Solr collections
In Solr terminology, a collection is a set of indexed documents. Each collection has its own configuration and schema definitions, which can differ from those of other collections. A Solr collection can also be described as a group of shards (cores) that make up a single logical index. Solr splits the data into shards, distributes the shards among the servers according to the allowed number of shards per node, and maintains replicas of each shard for reliability, as defined by the replication factor. The logical concept is shown below.
Create collection via Solr web UI
- Open your web browser and go to the URL assigned to the Solr web interface. You can find this URL in ADCM.
  Figure: Solr page in ADCM
- The Collections tab displays the collections that exist in your cluster. Clicking a collection name shows basic metadata about how the collection is defined and its current shards and replicas, with options for adding and deleting individual replicas.
  The controls at the top of the page allow you to make collection-related changes to your cluster, such as adding new collections or aliases, or reloading or deleting a single collection.
  To add a new collection, click Add Collection.
  Figure: Collections page
- Specify the collection parameters in the opened pane:
  - name. A user-defined name of the collection. This parameter is mandatory. We recommend following these rules:
    - Use only ASCII alphanumeric characters (A-Za-z0-9), hyphens (-), or underscores (_).
    - Avoid using the strings "shard" and "replica".
  - config set. The name of an existing collection configuration. You can choose the _default config or search for existing configurations.
  - numShards. The number of shards to be created as part of the collection. This parameter is mandatory if the router field is set to Composite ID.
  - replicationFactor. The number of replicas to be created for each shard. This parameter is optional.
    TIP
    The default value is 1. The maximum value is the number of running Solr server nodes.
  Figure: Create new collection
- (Optional) Click Show advanced to see the optional Advanced options:
  - router. The name of the router to use. A router defines how documents are distributed among the shards. Possible values are Composite ID or Implicit. The shards parameter is required when using the Implicit router; the numShards parameter is required when using the Composite ID router.
    - Implicit. Documents are not routed to different shards automatically. The shard you indicate in the indexing request (or within each document) is the destination for those documents.
    - Composite ID. This router hashes the value in the uniqueKey field and looks up that hash in the collection's cluster state to determine which shard receives the document. It also allows you to direct the routing manually.
  - maxShardsPerNode. When a collection is created, its shards and/or replicas are spread across all available (i.e., live) nodes, and two replicas of the same shard are never placed on the same node. If a node is not alive when the CREATE operation is called, it does not get any parts of the new collection, which can lead to too many replicas being created on a single live node. Defining maxShardsPerNode sets a limit on the number of replicas that CREATE spreads to each node. If the entire collection cannot fit into the live nodes, no collection is created at all. The default value is 1.
  - shards. A comma-separated list of shard names (e.g., shard-x,shard-y,shard-z). This parameter is required when using the Implicit router.
  - router.field. If this field is specified, the router uses the value of that field in an input document, instead of the uniqueKey field, to compute the hash and identify a shard. If the specified field is null in a document, the document is rejected.
  - autoAddReplicas. When set to true, enables automatic addition of replicas on shared file systems. The default value is false.
- After specifying all the parameters, click Add Collection to create the new collection.
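The form fields above correspond to parameters of the Solr Collections API, the same /solr/admin/collections endpoint that appears in the delete output later on this page. As a rough sketch, assuming a Solr release that still accepts the maxShardsPerNode parameter and a hypothetical host name solr-host.example.com, the naming rules and the resulting CREATE request could be scripted as follows. The helper names are illustrative and are not part of Solr or ADH, and the check for "shard" and "replica" is case-sensitive.

```shell
# Sketch only: helper names and the host used in the example are hypothetical.

# Succeeds if the name follows the recommended rules:
# ASCII letters, digits, hyphens, underscores; no "shard" or "replica".
valid_collection_name() {
  case "$1" in
    ''|*[!A-Za-z0-9_-]*) return 1 ;;   # empty or contains a disallowed character
    *shard*|*replica*)   return 1 ;;   # contains a discouraged string
  esac
  return 0
}

# Prints a Collections API CREATE URL (Composite ID router).
# Arguments: host:port, name, config set, numShards,
#            replicationFactor, maxShardsPerNode
build_create_url() {
  valid_collection_name "$2" || {
    echo "invalid collection name: $2" >&2
    return 1
  }
  printf 'http://%s/solr/admin/collections?action=CREATE&name=%s&collection.configName=%s&numShards=%s&replicationFactor=%s&maxShardsPerNode=%s&router.name=compositeId\n' \
    "$1" "$2" "$3" "$4" "$5" "$6"
}

# Example (assumed host; send the request with curl once the URL looks right):
# url=$(build_create_url solr-host.example.com:8983 Collection0 _default 2 2 2) && curl "$url"
```

Building the URL separately from sending it makes it easy to review the request before anything is created in the cluster.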
Create collections via CLI
This section shows basic Solr operations using the CLI.
You can perform operations on Solr collections using the /usr/lib/solr/bin/solr script and the solrctl utility.
NOTE
The examples below assume that Kerberos is not used in the ADH cluster.
Create a collection
To create a new collection, use one of the following commands:
$ /usr/lib/solr/bin/solr create -c <collection_name> -s <shards_num> -rf <replicas_num>
# or
$ solrctl collection --create <collection_name> -s <shards_num> -c <config_set>
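Where:
- <collection_name>: an alphanumeric name of the collection;
- <shards_num>: the number of shards to split the collection into;
- <replicas_num>: the number of copies of each document in the collection, that is, the number of replicas per shard;
- <config_set>: the name of an existing collection configuration.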
Example:
$ /usr/lib/solr/bin/solr create -c Collection0 -s 2 -rf 2
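To see what this command asks Solr for, multiply the two numbers: every shard gets the requested number of replicas, so the cluster has to host shards_num × replicas_num cores. The sketch below shows that arithmetic together with a simplified reading of the maxShardsPerNode limit described in the web UI section (a per-node cap on replicas). The helper names are hypothetical, and the node count and limit passed to fits_cluster are assumptions you would supply yourself.

```shell
# Sketch only: helper names are hypothetical, not part of Solr.

# Total number of cores (shard replicas) a collection needs.
# Arguments: shards_num, replicas_num
total_replicas() {
  echo $(( $1 * $2 ))
}

# Succeeds if the collection fits when each live node may host
# at most max_per_node replicas.
# Arguments: shards_num, replicas_num, live_nodes, max_per_node
fits_cluster() {
  [ "$(total_replicas "$1" "$2")" -le $(( $3 * $4 )) ]
}

total_replicas 2 2   # prints 4
```

For Collection0 the result matches the four replicas listed in the health check output later in this section.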
List collections
To view all available collections, run:
$ solrctl collection --list
The command output looks as follows:
Collection0 (5)
demoCollection_1 (5)
...
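In scripts it is often useful to check whether a collection already exists before creating or deleting it. The sketch below assumes that the listing contains one collection per line, with the name as the first whitespace-separated field followed by a number in parentheses. The helper names are hypothetical.

```shell
# Sketch only: helper names are hypothetical; the one-entry-per-line
# layout of the listing is an assumption.

# Reads a collection listing on stdin and prints only the names.
collection_names() {
  awk 'NF > 0 { print $1 }'
}

# Reads a listing on stdin; succeeds if the named collection is present
# (exact, whole-name match).
collection_exists() {
  collection_names | grep -Fxq -- "$1"
}

# Example against a live cluster:
# solrctl collection --list | collection_exists Collection0 && echo "found"
```

Matching the whole name rather than a substring keeps Collection0 from being confused with a collection such as Collection01.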
Delete a collection
To delete a collection, use one of the following commands:
$ /usr/lib/solr/bin/solr delete -c <collection_name>
# or
$ solrctl collection --delete <collection_name>
The command output looks as follows:
{
  "responseHeader":{
    "status":0,
    "QTime":236},
  "success":{
    "ka-adh-3.ru-central1.internal:8983_solr":{
      "responseHeader":{
        "status":0,
        "QTime":26}},
    "ka-adh-1.ru-central1.internal:8983_solr":{
      "responseHeader":{
        "status":0,
        "QTime":26}},
    "ka-adh-2.ru-central1.internal:8983_solr":{
      "responseHeader":{
        "status":0,
        "QTime":42}}}}

Deleted collection 'collection1' using command:
http://ka-adh-1.ru-central1.internal:8983/solr/admin/collections?action=DELETE&name=collection1
Collection health check
To check the state of a specific collection and get diagnostics information, use the command:
$ /usr/lib/solr/bin/solr healthcheck -c <collection_name>
Example:
$ /usr/lib/solr/bin/solr healthcheck -c Collection0
The output looks as follows:
{
  "collection":"Collection0",
  "status":"healthy",
  "numDocs":0,
  "numShards":2,
  "shards":[
    {
      "shard":"shard1",
      "status":"healthy",
      "replicas":[
        {
          "name":"core_node3",
          "url":"http://ka-adh-2.ru-central1.internal:8983/solr/Collection0_shard1_replica_n1/",
          "numDocs":0,
          "status":"active",
          "uptime":"2 days, 22 hours, 21 minutes, 49 seconds",
          "memory":"158.9 MB (%32.8) of 485 MB"},
        {
          "name":"core_node5",
          "url":"http://ka-adh-1.ru-central1.internal:8983/solr/Collection0_shard1_replica_n2/",
          "numDocs":0,
          "status":"active",
          "uptime":"2 days, 22 hours, 21 minutes, 49 seconds",
          "memory":"240.4 MB (%49.9) of 481.5 MB",
          "leader":true}]},
    {
      "shard":"shard2",
      "status":"healthy",
      "replicas":[
        {
          "name":"core_node7",
          "url":"http://ka-adh-3.ru-central1.internal:8983/solr/Collection0_shard2_replica_n4/",
          "numDocs":0,
          "status":"active",
          "uptime":"2 days, 22 hours, 21 minutes, 49 seconds",
          "memory":"196 MB (%40) of 489.5 MB",
          "leader":true},
        {
          "name":"core_node8",
          "url":"http://ka-adh-2.ru-central1.internal:8983/solr/Collection0_shard2_replica_n6/",
          "numDocs":0,
          "status":"active",
          "uptime":"2 days, 22 hours, 21 minutes, 49 seconds",
          "memory":"160.8 MB (%33.2) of 485 MB"}]}]}
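For monitoring scripts, the collection-level status can be pulled out of this output. The sketch below is deliberately rough: it reads the text with grep instead of a JSON parser and relies on the collection's own "status" key appearing before the shard and replica statuses, as it does in the sample output. The helper name is hypothetical, and a JSON-aware tool would be the more robust choice where one is available.

```shell
# Sketch only: the helper name is hypothetical, and the key order of
# the healthcheck output is assumed to match the sample above.

# Reads healthcheck output on stdin and prints the first "status" value,
# which in the sample output is the collection-level status.
collection_status() {
  grep -o '"status" *: *"[^"]*"' | head -n 1 | cut -d '"' -f 4
}

# Example against a live cluster:
# status=$(/usr/lib/solr/bin/solr healthcheck -c Collection0 | collection_status)
# [ "$status" = "healthy" ] || echo "Collection0 needs attention: $status"
```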