Partial document updates in Solr
Overview
Solr supports three major approaches to partial document updates. This article describes each approach with examples:
- Atomic updates. Allow changing one or more document fields by reindexing the target document internally.
- In-place updates. A subset of atomic updates that allow updating numeric fields without reindexing the entire document.
- Optimistic concurrency. Allows conditional updates based on the document version.
In this article, the update examples are designed to modify the following sample document:
{
"id": 1,
"post_name": "Sample blog post...",
"categories": ["leisure", "hobby"],
"post_rank": 75.0,
"post_date": "2024-01-02",
"post_text": "Lorem ipsum dolor sit amet ...",
"description": "A sample post for testing purposes"
}
The sample document is stored in a Solr collection named test_collection.
For information on adding a document to the index, see Solr indexing overview.
Atomic updates
This type of update changes individual document fields without requiring the client to resubmit the whole document. Choose this approach when the speed of index modifications is critical to the client applications.
Solr provides several modifiers that can be used to update individual document fields. These modifiers are as follows:
- set: sets a new value or replaces the existing field's value. Removes the value if null or an empty list is specified. Accepts a single value or a list.
- add: adds the specified value(s) to a multiValued field. Accepts a single value or a list.
- add-distinct: adds the specified value(s) to a multiValued field only if the value is not already present in the field. Accepts a single value or a list.
- remove: removes all occurrences of the specified value from a multiValued field. Accepts a single value or a list.
- removeregex: removes all occurrences matching the specified regex from a multiValued field. Accepts a single value or a list.
- inc: increments a numeric field by a specific value. Accepts a single numeric value.
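The effect of each modifier can be illustrated with a small toy model in Python. This is only a sketch of the semantics listed above, not Solr's implementation: the apply_modifiers function is hypothetical, it assumes that removeregex must match the whole value, and it drops a field once its last value is removed.

```python
import re


def apply_modifiers(doc, update):
    """Return a copy of `doc` with atomic-update modifiers applied (toy model)."""
    result = dict(doc)
    for field, spec in update.items():
        if not isinstance(spec, dict):
            result[field] = spec  # plain value: overwrite the field as is
            continue
        for modifier, arg in spec.items():
            args = arg if isinstance(arg, list) else [arg]
            current = result.get(field, [])
            current = list(current) if isinstance(current, list) else [current]
            if modifier == "set":
                current = [] if arg is None else arg
            elif modifier == "add":
                current = current + args
            elif modifier == "add-distinct":
                for item in args:
                    if item not in current:
                        current.append(item)
            elif modifier == "remove":
                current = [v for v in current if v not in args]
            elif modifier == "removeregex":
                # Assumption of this sketch: the regex must match the whole value.
                patterns = [re.compile(p) for p in args]
                current = [v for v in current
                           if not any(p.fullmatch(str(v)) for p in patterns)]
            elif modifier == "inc":
                current = result.get(field, 0) + arg
            else:
                raise ValueError(f"unsupported modifier: {modifier}")
            if current == []:
                result.pop(field, None)  # an emptied field disappears
            else:
                result[field] = current
    return result


sample = {
    "id": 1,
    "post_name": "Sample blog post...",
    "categories": ["leisure", "hobby"],
    "post_rank": 75.0,
    "description": "A sample post for testing purposes",
}
updated = apply_modifiers(sample, {
    "id": 1,
    "post_name": {"set": "Updated post name ..."},
    "categories": {"add": ["science"]},
    "post_rank": {"inc": 5},
    "description": {"removeregex": ".*testing.*"},
})
```

The model is useful for reasoning about what an update document will do before sending it. The actual merge is performed by Solr on the server side.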
To perform an update, specify the corresponding modifier as a value of the field to be updated and submit the document to Solr. The following example updates several fields of the sample document:
{
"id": 1,
"post_name": {"set": "Updated post name ..."},
"categories": {"add": ["science"]},
"post_rank": {"inc": 5},
"post_date": "2024-01-02",
"post_text": "Lorem ipsum dolor sit amet ...",
"description": {"removeregex": ".*testing.*"}
}
$ curl -X POST 'http://ka-adh-1.ru-central1.internal:8983/solr/test_collection/update?commit=true' -H 'Content-Type: application/json' --data-binary '[{
"id": 1,
"post_name": {"set": "Updated post name ..."},
"categories": {"add": ["science"]},
"post_rank": {"inc": 5},
"post_date": "2024-01-02",
"post_text": "Lorem ipsum dolor sit amet ...",
"description": {"removeregex": ".*testing.*"}
}]'
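The same request can be prepared from application code. The following Python sketch uses only the standard library. The build_update_request helper and the localhost address are assumptions made for illustration, so substitute the address of your own Solr node.

```python
import json
import urllib.request


def build_update_request(base_url, collection, docs, commit=True):
    """Build (but do not send) a POST request to the collection's /update handler."""
    url = f"{base_url}/solr/{collection}/update"
    if commit:
        url += "?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


update_doc = {
    "id": 1,
    "post_name": {"set": "Updated post name ..."},
    "categories": {"add": ["science"]},
    "post_rank": {"inc": 5},
    "description": {"removeregex": ".*testing.*"},
}

# Assumed address; replace it with the host and port of your Solr node.
request = build_update_request("http://localhost:8983", "test_collection", [update_doc])

# Sending is a separate, explicit step:
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before it reaches the index.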
When Solr receives such a document, it recognizes the modifier expressions and updates only the corresponding fields of the document. After the update, the document looks as follows:
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":0,
"params":{
"q":"*:*",
"indent":"true",
"q.op":"OR",
"_":"1726172218281"}},
"response":{"numFound":1,"start":0,"numFoundExact":true,"docs":[
{
"id":"1",
"post_name":["Updated post name ..."],
"categories":["leisure",
"hobby",
"science"],
"post_rank":[80.0],
"post_date":["2024-01-02T00:00:00Z"],
"post_text":["Lorem ipsum dolor sit amet ..."],
"_version_":1810022770405277696}]
}}
Update nested documents
Solr allows modifying, adding, and removing nested documents using atomic updates. The process is very similar to updating non-nested documents. The major distinctions to keep in mind are as follows:
- An update operation on a nested object must be routed to the appropriate Solr shard (the one that stores the parent document). Since Solr selects the target shard based on the document's ID, it is important to use the ID of the root document, not the ID of the child document being updated. To address this requirement, you can either specify the route explicitly (via the _route_ parameter) or use the compositeId router (used by default) to direct update operations to the appropriate shard. For more information on routing rules for updating child elements, see Solr documentation.
- When updating nested objects, it is mandatory to tell Solr which document is the parent of the object being updated. This can be done by specifying the _root_ field in the update request.
- Solr attempts to apply the updates to the entire tree of nested documents instead of modifying individual objects, which may result in additional overhead.
Assume there is the following document with nested objects (comments) that needs to be updated:
{
"id": "post2",
"name_s": "Another demo post...",
"text_t": "Foo Bar Buzz ....",
"content_type": "post",
"comments":
[
{
"id": "post2!comment3",
"author_s": "Luke",
"text_t": "I like this post!",
"rated_i": 4,
"content_type": "comment"
}
]
}
For example, to increment the numeric rated_i field using atomic updates, you have to submit the following document to Solr:
{
"id": "post2!comment3", (1)
"_root_": "post2", (2)
"rated_i": { "inc": 1 }
}
(1) The ID of the nested document to update. Notice that the ID is composite (contains the ! character). This allows the default compositeId router to route the update operation to the appropriate Solr shard.
(2) The _root_ field is mandatory to allow Solr to identify the parent object whose child will be updated. If not specified, Solr rejects the update.
The result of the update:
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":0,
"params":{
"q":"*:*",
"indent":"true",
"q.op":"OR",
"_":"1726172218281"}},
"response":{"numFound":2,"start":0,"numFoundExact":true,"docs":[
{
"id":"post2!comment3",
"author_s":"Luke",
"text_t":"I like this post!",
"rated_i":5,
"content_type":["comment"],
"_version_":1810022942789074944},
{
"id":"post2",
"name_s":"Another demo post...",
"text_t":"Foo Bar Buzz ....",
"content_type":["post"],
"_version_":1810022942789074944}]
}}
Submitting the following document to Solr adds another nested object to the parent document ("id": "post2"):
{
"id": "post2",
"comments": { "add": { "id": "post2!comment4",
"author_s": "Rick Sanchez",
"text_t": "Another added comment here..",
"rated_i": 5,
"content_type": "comment"
} }
}
The result:
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":0,
"params":{
"q":"*:*",
"indent":"true",
"q.op":"OR",
"_":"1726172218281"}},
"response":{"numFound":3,"start":0,"numFoundExact":true,"docs":[
{
"id":"post2!comment3",
"author_s":"Luke",
"text_t":"I like this post!",
"rated_i":5,
"content_type":["comment"],
"_version_":1810023009958756352},
{
"id":"post2!comment4",
"author_s":"Rick Sanchez",
"text_t":"Another added comment here..",
"rated_i":5,
"content_type":["comment"],
"_version_":1810023009958756352},
{
"id":"post2",
"name_s":"Another demo post...",
"text_t":"Foo Bar Buzz ....",
"content_type":["post"],
"_version_":1810023009958756352}]
}}
Submitting the following document to Solr removes the nested document by ID:
{
"id": "post2",
"comments": {
"remove": {
"id": "post2!comment4"
}
}
}
The result:
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":0,
"params":{
"q":"*:*",
"indent":"true",
"q.op":"OR",
"_":"1726172218281"}},
"response":{"numFound":2,"start":0,"numFoundExact":true,"docs":[
{
"id":"post2!comment3",
"author_s":"Luke",
"text_t":"I like this post!",
"rated_i":5,
"content_type":["comment"],
"_version_":1810023111907606528},
{
"id":"post2",
"name_s":"Another demo post...",
"text_t":"Foo Bar Buzz ....",
"content_type":["post"],
"_version_":1810023111907606528}]
}}
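The three nested-document operations shown in this section differ only in the shape of the update document. The following Python sketch builds each of them. The helper names are hypothetical, and the prefix check only reflects the ID convention used in the examples (child IDs of the form parent!child), not a validation performed by Solr.

```python
def shares_route_prefix(child_id, root_id):
    """True if a composite child ID starts with '<root_id>!' (example convention)."""
    prefix, separator, _ = child_id.partition("!")
    return separator == "!" and prefix == root_id


def child_field_update(child_id, root_id, field, modifier, value):
    """Update one field of a nested document; _root_ names the root document."""
    return {"id": child_id, "_root_": root_id, field: {modifier: value}}


def add_child(parent_id, children_field, child):
    """Attach a new nested document to the parent."""
    return {"id": parent_id, children_field: {"add": child}}


def remove_child(parent_id, children_field, child_id):
    """Detach a nested document from the parent by its ID."""
    return {"id": parent_id, children_field: {"remove": {"id": child_id}}}


increment_rating = child_field_update("post2!comment3", "post2", "rated_i", "inc", 1)
new_comment = add_child("post2", "comments", {
    "id": "post2!comment4",
    "author_s": "Rick Sanchez",
    "text_t": "Another added comment here..",
    "rated_i": 5,
    "content_type": "comment",
})
drop_comment = remove_child("post2", "comments", "post2!comment4")
```

Each returned dictionary can be serialized to JSON and posted to the update handler in the same way as any other atomic update.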
In-place updates
In-place updates are a subset of atomic updates. However, during a regular atomic update, the entire document is reindexed internally. With in-place updates, only the required fields are affected, and the rest of the document remains intact. Thus, the efficiency of in-place updates does not depend on the number of fields in the updated document. Apart from the internal differences regarding efficiency, there is no functional difference between atomic updates and in-place updates.
An update operation runs as an in-place update if the fields to be updated meet the following requirements:
- The fields are numeric and have the following properties: indexed="false", stored="false", multiValued="false", and docValues="true".
- The updated document has the _version_ field with the following properties: indexed="false", stored="false", multiValued="false", and docValues="true".
- The copyField destination of the updated field (if any) is numeric and has the following properties: indexed="false", stored="false", multiValued="false", and docValues="true".
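The requirements above can be expressed as a simple check. The following Python sketch works on a hand-written description of the schema. The FieldDef class and the supports_in_place function are hypothetical helpers for illustration and do not query Solr. The default property values in FieldDef are arbitrary choices of this sketch, not Solr defaults.

```python
from dataclasses import dataclass, field


@dataclass
class FieldDef:
    """Minimal description of a schema field (illustrative, not a Solr class)."""
    name: str
    numeric: bool
    indexed: bool = True
    stored: bool = True
    multi_valued: bool = False
    doc_values: bool = False
    copy_dests: list = field(default_factory=list)


def _doc_values_only(f):
    # indexed="false", stored="false", multiValued="false", docValues="true"
    return not f.indexed and not f.stored and not f.multi_valued and f.doc_values


def supports_in_place(name, schema):
    """Check the three requirements for an in-place update of field `name`."""
    target = schema[name]
    version = schema.get("_version_")
    if version is None or not _doc_values_only(version):
        return False
    if not (target.numeric and _doc_values_only(target)):
        return False
    # Every copyField destination must satisfy the same constraints.
    return all(
        dest in schema and schema[dest].numeric and _doc_values_only(schema[dest])
        for dest in target.copy_dests
    )


schema = {
    "_version_": FieldDef("_version_", numeric=True, indexed=False,
                          stored=False, doc_values=True),
    "post_rank": FieldDef("post_rank", numeric=True, indexed=False,
                          stored=False, doc_values=True),
    "post_name": FieldDef("post_name", numeric=False),
}
```

With this description, supports_in_place("post_rank", schema) holds, while an update of post_name would fall back to a regular atomic update.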
For in-place updates, Solr supports the following modifiers:
- set: sets a new value or replaces the existing field's value.
- inc: increments a numeric field by a specific value.
To update numeric fields using the in-place approach, the corresponding fields must be defined in the schema. For example:
<field name="post_rank" type="float" indexed="false" stored="false" docValues="true"/>
To update the document from the example, submit the following document to Solr:
{
"id": 1,
"post_rank": {
"set": 100
}
}
In this case, the document will be modified using the in-place update strategy. The results of the update operation:
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":0,
"params":{
"q":"*:*",
"indent":"true",
"q.op":"OR",
"_":"1726172218281"}},
"response":{"numFound":1,"start":0,"numFoundExact":true,"docs":[
{
"id":"1",
"post_name":["Updated post name ..."],
"categories":["leisure",
"hobby",
"science"],
"post_rank":[100.0],
"post_date":["2024-01-02T00:00:00Z"],
"post_text":["Lorem ipsum dolor sit amet ..."],
"_version_":1810023165979525120}]
}}
Optimistic concurrency
Optimistic concurrency (a.k.a. optimistic locking) is a Solr feature that allows client applications to ensure that the same document is not modified by multiple clients in parallel.
This feature makes use of the _version_ field and expects this field to be present in every document in the Solr index.
By default, Solr schema includes the _version_ field, and the field is automatically added to each new document.
During an update, Solr compares the _version_ value from the index with the value provided in the update request, thus ensuring that the document was not modified by other clients.
To update a document using the optimistic concurrency approach, the client has to pass a correct _version_ value along with the document.
_version_ is an ordinary Solr field that can be queried just like any other field, for example:
$ curl -X GET 'http://ka-adh-1.ru-central1.internal:8983/solr/test_collection/query?q=*:*&fl=id,post_name,_version_&omitHeader=true' -H 'Content-Type: application/json'
The response:
{
"response":{"numFound":1,"start":0,"numFoundExact":true,"docs":[
{
"id":"1",
"post_name":["Sample blog post..."],
"_version_":1809999244663193600}]
}}
You can pass the document’s version to Solr in two ways:
- Using the _version_=<version> query parameter in the URL. For example:
  $ curl -X POST 'http://ka-adh-1.ru-central1.internal:8983/solr/test_collection/update?_version_=123' -H 'Content-Type: application/json' --data-binary '[{"id": 1, "updated_field": "updated value"}]'
- By embedding the _version_ value in the update document. For example:
  { "id": 1, "_version_": 123, "post_rank": {"set": 80} }
  This approach is handy when documents are sent in a batch and different _version_ values need to be specified for each document.
If the version in the request does not match the version stored in the index, Solr rejects the update and returns an HTTP 409 response with the following error:
"error":{
"metadata":[
"error-class","org.apache.solr.common.SolrException",
"root-error-class","org.apache.solr.common.SolrException"],
"msg":"version conflict for 1 expected=12345 actual=1809999244663193600",
"code":409}
If a valid document version is provided, the update succeeds and Solr generates a new version for the updated document. The following request successfully updates the test document by passing a valid version number as a query parameter.
$ curl -X POST 'http://ka-adh-1.ru-central1.internal:8983/solr/test_collection/update?_version_=1810003184734699520&versions=true&omitHeader=true' -H 'Content-Type: application/json' --data-binary '[{"id":1,"post_rank": {"set": 80}}]'
The successful update response:
{
"adds":[
"1",1810019385852559360]
}
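A client usually combines these steps into a loop: read the current version, send the update with that version, and start over if the update is rejected with a conflict. The following Python sketch shows the pattern against an in-memory stand-in. InMemoryIndex, VersionConflict, and update_with_retry are hypothetical names, and the stand-in only imitates the version check described above. In a real client, version_of would be a query for _version_ and update would be a POST that maps HTTP 409 to the exception.

```python
class VersionConflict(Exception):
    """Signals that the supplied _version_ did not match (HTTP 409 in Solr)."""


class InMemoryIndex:
    """Hypothetical stand-in for a Solr collection; not a Solr API."""

    def __init__(self):
        self._docs = {}
        self._clock = 1000  # arbitrary starting point for fake version numbers

    def _next_version(self):
        self._clock += 1
        return self._clock

    def add(self, doc):
        stored = dict(doc, _version_=self._next_version())
        self._docs[doc["id"]] = stored
        return stored["_version_"]

    def version_of(self, doc_id):
        return self._docs[doc_id]["_version_"]

    def update(self, doc):
        current = self._docs[doc["id"]]
        if doc["_version_"] != current["_version_"]:
            raise VersionConflict(
                f"version conflict for {doc['id']} "
                f"expected={doc['_version_']} actual={current['_version_']}"
            )
        for name, change in doc.items():
            if isinstance(change, dict) and "set" in change:
                current[name] = change["set"]  # only "set" is modelled here
        current["_version_"] = self._next_version()
        return current["_version_"]


def update_with_retry(index, doc_id, changes, max_attempts=3):
    """Re-read the version and retry when another writer got there first."""
    for _ in range(max_attempts):
        version = index.version_of(doc_id)
        try:
            return index.update({"id": doc_id, "_version_": version, **changes})
        except VersionConflict:
            continue  # the document changed in between; fetch the new version
    raise VersionConflict(f"gave up on {doc_id} after {max_attempts} attempts")
```

Limiting the number of attempts keeps a heavily contended document from blocking the client indefinitely. Choose the limit to suit the workload.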
TIP
The versions=true query parameter forces Solr to include the document version in each response.
This may be useful to avoid redundant GET requests to learn a document’s version.
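In the response shown above, the adds value is a flat list in which document IDs and their new versions alternate. A minimal Python sketch for turning it into a lookup table follows. The parse_adds helper is hypothetical and assumes exactly this flat layout.

```python
import json


def parse_adds(adds):
    """Convert ["id1", version1, "id2", version2, ...] into {id: version}."""
    if len(adds) % 2 != 0:
        raise ValueError("expected alternating document IDs and versions")
    return {str(adds[i]): int(adds[i + 1]) for i in range(0, len(adds), 2)}


response_text = '{"adds": ["1", 1810019385852559360]}'
versions = parse_adds(json.loads(response_text)["adds"])
```

The resulting dictionary maps each document ID to the version to send with the next update of that document.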