Solr collections

A collection, in Solr terminology, is a set of indexed documents. Each collection has its own configuration and schema definitions, which can differ from those of other collections. A Solr collection can also be described as a group of shards (cores) that together form a single logical index. Solr splits data into shards, distributes the shards across the servers according to the allowed number of shards per node, and maintains replicas of each shard for reliability, according to the replication factor set for the collection. The logical concept is shown below.

Solr collections in a cluster

Create collection via Solr web UI

  1. Open your web browser and go to the URL of the Solr web interface. You can check this URL in ADCM.

    Solr page in ADCM
  2. The Collections tab lists the collections that exist in your cluster. Clicking a collection name shows basic metadata about how the collection is defined, along with its current shards and replicas, and provides options for adding and deleting individual replicas.

    The controls at the top of the page allow you to make various collection-related changes to your cluster, such as adding new collections or aliases, reloading or deleting a single collection, etc.

    To add a new collection, click Add Collection.

    Collections page
  3. Specify the collection parameters in the pane that opens:

    • name. A user-defined name of the collection. It is a mandatory parameter. We recommend following these rules:

      • Use only ASCII alphanumeric characters (A-Za-z0-9), hyphen (-), or underscore (_).

      • Avoid using the strings "shard" and "replica".

    • config set. The name of an existing collection configuration. You can choose the _default configuration or search for existing configurations.

    • numShards. The number of shards to be created as part of the collection. It is a mandatory parameter if the router field is set to Composite ID.

    • replicationFactor. The number of replicas to be created for each shard. It is an optional parameter.

      TIP
      The default value is 1. The maximum value is the number of running Solr server nodes.
      Create new collection
  4. (Optional) Click Show advanced to display the advanced options:

    • router. The name of the router that will be used. A router defines the distribution of documents among the shards. Possible values are Composite ID or Implicit. The shards parameter is required when using the Implicit router; when using the Composite ID router, the numShards parameter is required.

      • Implicit. Documents are not routed automatically to different shards. The shard you indicate on the indexing request (or within each document) is the destination for those documents.

      • Composite ID. This router hashes the value in the uniqueKey field. It searches for that hash in the collection’s cluster state to determine which shard will receive the document, with the additional ability to direct the routing manually.

    • maxShardsPerNode. When a collection is created, its shards and replicas are spread across all available (i.e., live) nodes, and two replicas of the same shard are never placed on the same node. A node that is not live when the CREATE operation is called receives no part of the new collection, which can lead to too many replicas being placed on a single live node. The maxShardsPerNode parameter limits the number of replicas that CREATE places on each node. If the entire collection cannot fit into the live nodes, no collection is created. The default value is 1.

    • shards. A comma-separated list of shard names (e.g., shard-x,shard-y,shard-z). This is a required parameter when using the Implicit router.

    • router.field. If this field is specified, the router uses the field’s value in an input document to compute the hash and identify a shard, instead of using the uniqueKey field. If the specified field is null in a document, the document is rejected.

    • autoAddReplicas. When set to true, enables automatic addition of replicas on shared file systems. The default value is false.

  5. After specifying all the parameters, click Add Collection to create the new collection.
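The naming rules and sizing limits described in the steps above can be checked before the form is submitted. The following is a minimal sketch, not part of Solr or ADH: the function names valid_collection_name and collection_fits are hypothetical, and the sizing check assumes that the number of live nodes is known in advance.

```shell
#!/bin/sh
# Sketch: pre-check collection parameters against the rules listed above.
# Both function names are hypothetical helpers, not Solr commands.

# Succeeds if the name uses only A-Za-z0-9, "-" or "_" and does not
# contain the strings "shard" or "replica".
valid_collection_name() {
  case "$1" in
    "" | *[!A-Za-z0-9_-]*) return 1 ;;
    *shard* | *replica*) return 1 ;;
  esac
  return 0
}

# Succeeds if replicationFactor does not exceed the number of live nodes
# and numShards * replicationFactor replicas fit into
# maxShardsPerNode * liveNodes slots.
collection_fits() {
  num_shards=$1; replication_factor=$2; max_per_node=$3; live_nodes=$4
  [ "$replication_factor" -le "$live_nodes" ] || return 1
  [ $((num_shards * replication_factor)) -le $((max_per_node * live_nodes)) ]
}
```

For example, `collection_fits 2 2 1 3` fails because four replicas do not fit into three nodes when maxShardsPerNode keeps its default value of 1, while `collection_fits 2 2 2 3` succeeds.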

Created collection
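
The idea behind the Composite ID router described in the advanced options (hash the document key, then find the shard whose hash range contains the result) can be illustrated with a short sketch. This is only an illustration under stated assumptions: it uses the CRC-32 value printed by cksum as a stand-in hash and splits the 32-bit hash space into equal ranges, whereas Solr uses its own hash function and reads the actual ranges from the cluster state. The function name route_doc is hypothetical.

```shell
#!/bin/sh
# Sketch of hash-range routing. Assumption: cksum (CRC-32) stands in for
# the real hash; Solr's own hash and range layout differ.

# Prints a shard name for a document ID, given the number of shards.
route_doc() {
  doc_id=$1; num_shards=$2
  # Unsigned 32-bit hash of the document ID.
  hash=$(printf '%s' "$doc_id" | cksum | cut -d ' ' -f 1)
  # Split the 32-bit hash space into num_shards contiguous ranges
  # (range size rounded up so the last index never exceeds num_shards).
  range_size=$(( (4294967296 + num_shards - 1) / num_shards ))
  echo "shard$(( hash / range_size + 1 ))"
}
```

Calling `route_doc doc-1 4` always prints the same shard name for the same ID, which is the property that lets a router find a document again at query time.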

Create collections via CLI

This section shows basic Solr operations using the CLI. You can perform operations on Solr collections using the /usr/lib/solr/bin/solr script and the solrctl utility.

NOTE
The examples below assume no Kerberos usage in the ADH cluster.

Create a collection

To create a new collection, use one of the following commands:

$ /usr/lib/solr/bin/solr create -c <collection_name> -s <shards_num> -rf <replicas_num>
# or
$ solrctl collection --create <collection_name> -s <shards_num> -c <config_set>

Where:

  • <collection_name> — an alphanumerical name;

  • <shards_num> — a number of shards to split the collection into;

  • <replicas_num> — a number of copies of each document in the collection.

In the solrctl command, <config_set> is the name of an existing collection configuration.

Example:

$ /usr/lib/solr/bin/solr create -c Collection0 -s 2 -rf 2
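
The same request can also be sent to the Solr Collections API over HTTP, the endpoint that appears in the delete output later in this section. The sketch below only builds the CREATE URL. The helper name create_collection_url and the host solr-host are assumptions for illustration, and the name is inserted as-is, so it should follow the naming rules given earlier.

```shell
#!/bin/sh
# Sketch: build a Collections API CREATE URL from the same values that
# are passed to the CLI. The helper name is hypothetical.

create_collection_url() {
  base_url=$1; name=$2; num_shards=$3; replication_factor=$4
  printf '%s/admin/collections?action=CREATE&name=%s&numShards=%s&replicationFactor=%s\n' \
    "$base_url" "$name" "$num_shards" "$replication_factor"
}

# Usage (not executed here; solr-host is a placeholder host name):
#   curl "$(create_collection_url http://solr-host:8983/solr Collection0 2 2)"
```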

List collections

To view all available collections, run:

$ solrctl collection --list

The command output looks as follows:

Collection0 (5)
demoCollection_1 (5)
...
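
In scripts, the list output can be used to check whether a collection already exists before creating or deleting it. The following sketch assumes the line format shown above, with the collection name as the first whitespace-separated field. The function name collection_exists is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed if the given name appears as the first field of any
# line read from stdin. Assumes the "name (N)" format shown above.
collection_exists() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Usage (not executed here):
#   solrctl collection --list | collection_exists Collection0 && echo "exists"
```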

Delete a collection

To delete a collection, use one of the following commands:

$ /usr/lib/solr/bin/solr delete -c <collection_name>
# or
$ solrctl collection --delete <collection_name>

The command output looks as follows:

{
  "responseHeader":{
    "status":0,
    "QTime":236},
  "success":{
    "ka-adh-3.ru-central1.internal:8983_solr":{"responseHeader":{
        "status":0,
        "QTime":26}},
    "ka-adh-1.ru-central1.internal:8983_solr":{"responseHeader":{
        "status":0,
        "QTime":26}},
    "ka-adh-2.ru-central1.internal:8983_solr":{"responseHeader":{
        "status":0,
        "QTime":42}}}}


Deleted collection 'collection1' using command:
http://ka-adh-1.ru-central1.internal:8983/solr/admin/collections?action=DELETE&name=collection1

Collection health check

To check the state of a specific collection and get diagnostics information, use the command:

$ /usr/lib/solr/bin/solr healthcheck -c <collection_name>

Example:

$ /usr/lib/solr/bin/solr healthcheck -c Collection0

The output looks as follows:

{
  "collection":"Collection0",
  "status":"healthy",
  "numDocs":0,
  "numShards":2,
  "shards":[
    {
      "shard":"shard1",
      "status":"healthy",
      "replicas":[
        {
          "name":"core_node3",
          "url":"http://ka-adh-2.ru-central1.internal:8983/solr/Collection0_shard1_replica_n1/",
          "numDocs":0,
          "status":"active",
          "uptime":"2 days, 22 hours, 21 minutes, 49 seconds",
          "memory":"158.9 MB (%32.8) of 485 MB"},
        {
          "name":"core_node5",
          "url":"http://ka-adh-1.ru-central1.internal:8983/solr/Collection0_shard1_replica_n2/",
          "numDocs":0,
          "status":"active",
          "uptime":"2 days, 22 hours, 21 minutes, 49 seconds",
          "memory":"240.4 MB (%49.9) of 481.5 MB",
          "leader":true}]},
    {
      "shard":"shard2",
      "status":"healthy",
      "replicas":[
        {
          "name":"core_node7",
          "url":"http://ka-adh-3.ru-central1.internal:8983/solr/Collection0_shard2_replica_n4/",
          "numDocs":0,
          "status":"active",
          "uptime":"2 days, 22 hours, 21 minutes, 49 seconds",
          "memory":"196 MB (%40) of 489.5 MB",
          "leader":true},
        {
          "name":"core_node8",
          "url":"http://ka-adh-2.ru-central1.internal:8983/solr/Collection0_shard2_replica_n6/",
          "numDocs":0,
          "status":"active",
          "uptime":"2 days, 22 hours, 21 minutes, 49 seconds",
          "memory":"160.8 MB (%33.2) of 485 MB"}]}]}
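
For monitoring scripts, the healthcheck output can be scanned for any status other than the two values that appear in the sample above. This is a rough sketch that relies on the compact "status":"value" formatting shown in the output and on a grep that supports the -o option. The function name unhealthy_statuses is hypothetical, and which other status values Solr reports is not covered here.

```shell
#!/bin/sh
# Sketch: print every "status" entry in healthcheck output (read from
# stdin) that is neither "healthy" nor "active". The exit status is 0
# when at least one such entry is found and non-zero otherwise.
unhealthy_statuses() {
  grep -o '"status":"[^"]*"' | grep -v -e '"healthy"' -e '"active"'
}

# Usage (not executed here):
#   /usr/lib/solr/bin/solr healthcheck -c Collection0 | unhealthy_statuses \
#     && echo "Collection0 needs attention"
```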