When updating a Solr index with the DataImportHandler or one of the available Solr clients, you don’t really need to bother with the details of updates. Most clients just offer a simplified “add”, “delete” and “commit” interface to Solr updates, with each operation issued as a separate command.
For most clients these commands are actually translated into Solr XML update messages. Taking a look at the documentation in the Solr wiki (http://wiki.apache.org/solr/UpdateXmlMessages) shows all the options available. It also shows that it is possible to combine multiple commands in a single request.
So if you need to do several deletes and add some documents, this can all be done in a single request. This got me wondering: how does this actually work? Are all ‘commands’ executed in the order in which they appear in the XML message?
To test this I have created a very simple Solr index with just two fields:
- id (int)
- name (string)
For posting XML update messages I use curl:
curl "http://localhost:8983/solr/core0/update" -H "Content-Type: text/xml" --data-binary @data.xml
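If you would rather script the request than call curl, the same POST can be approximated with Python's standard library. This is a minimal sketch, not part of the original test: the function names build_update_request and post_update are hypothetical, and the URL is simply the one from the curl command above.

```python
import urllib.request

# Same endpoint as in the curl command; adjust host, port and core as needed.
SOLR_UPDATE_URL = "http://localhost:8983/solr/core0/update"


def build_update_request(xml_bytes, url=SOLR_UPDATE_URL):
    """Build (but do not send) a POST request carrying an XML update message."""
    return urllib.request.Request(
        url,
        data=xml_bytes,  # supplying a body makes urllib issue a POST
        headers={"Content-Type": "text/xml"},
    )


def post_update(xml_bytes, url=SOLR_UPDATE_URL):
    """Send the update message and return (HTTP status, response body)."""
    request = build_update_request(xml_bytes, url)
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")
```

Reading data.xml in binary mode and passing its contents to post_update should then behave like the curl call, assuming a Solr instance is listening on that URL.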
This is the content of the data.xml file:
<update>
  <delete>
    <query>*:*</query>
  </delete>
  <add>
    <doc>
      <field name="id">1</field>
      <field name="name">item 1</field>
    </doc>
    <doc>
      <field name="id">2</field>
      <field name="name">item 2</field>
    </doc>
    <doc>
      <field name="id">3</field>
      <field name="name">item 3</field>
    </doc>
  </add>
  <commit/>
</update>
The index can easily be reset by re-posting this XML. The delete all query ensures a complete cleanup.
For testing a complex update I created this XML:
<update>
  <delete>
    <id>3</id>
  </delete>
  <commit/>
  <add>
    <doc>
      <field name="id">4</field>
      <field name="name">item 4</field>
    </doc>
  </add>
  <rollback/>
  <add>
    <doc>
      <field name="id">5</field>
      <field name="name">item 5</field>
    </doc>
    <doc>
      <field name="id">6</field>
      <field name="name">item 6</field>
    </doc>
  </add>
  <delete>
    <id>1</id>
  </delete>
  <commit/>
  <optimize/>
</update>
The expected behaviour is:
- we start with [1,2,3] in the index.
- we delete document 3 and commit, so we now have [1,2]
- we add document 4 and roll back, so nothing changes: [1,2]
- we add documents 5 and 6, delete document 1, then commit and optimize: [2,5,6]
The outcome: just as expected! After posting the update, the index contains documents 2, 5 and 6. So Solr does handle the commands in the update XML message in exactly the order given.
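The sequence of steps above can also be traced with a toy model in Python. This is only a sketch of the expected semantics, not of how Solr works internally: the function apply_commands and the tuple-based command format are made up for illustration, and optimize is modelled as having no effect on which documents are visible.

```python
def apply_commands(committed, commands):
    """Apply (operation, argument) pairs in order to a toy index.

    `committed` maps document id to name and stands for what a search sees.
    Uncommitted changes are tracked separately in `pending`.
    """
    committed = dict(committed)
    pending = dict(committed)
    for operation, argument in commands:
        if operation == "add":
            doc_id, name = argument
            pending[doc_id] = name
        elif operation == "delete":
            pending.pop(argument, None)  # deleting a missing id is harmless
        elif operation == "commit":
            committed = dict(pending)    # pending changes become visible
        elif operation == "rollback":
            pending = dict(committed)    # discard everything since last commit
        elif operation == "optimize":
            pass                         # assumed not to change the document set
        else:
            raise ValueError("unknown operation: %s" % operation)
    return committed


start = {1: "item 1", 2: "item 2", 3: "item 3"}
commands = [
    ("delete", 3), ("commit", None),
    ("add", (4, "item 4")), ("rollback", None),
    ("add", (5, "item 5")), ("add", (6, "item 6")),
    ("delete", 1), ("commit", None), ("optimize", None),
]
print(sorted(apply_commands(start, commands)))  # prints [2, 5, 6]
```

If the commands were applied in any other order, for example with the rollback moved to the end, the model would give a different result, which is why this particular message makes a good ordering test.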
An important note: this example was made to test Solr. In most cases where the order of the commands matters, something is wrong. In this test the order only matters because of the add and rollback of document 4, and in that case document 4 shouldn’t be in the update anyway. Normally you should only commit once, after all add/delete commands.
The other parts of this test are more useful, such as using curl for testing XML update messages and adding multiple commands to a single update message. The most obvious example is adding multiple documents in a single update, which most clients support. Combining multiple delete queries, or even combining delete queries with adding documents, is not supported by most clients, even though it can save a lot of requests.
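When a client does not support combined updates, the message can be assembled by hand. The following is a minimal sketch using Python's xml.etree.ElementTree; the helper name build_update and its parameters are hypothetical, and the element layout simply mirrors the data.xml examples above.

```python
import xml.etree.ElementTree as ET


def build_update(delete_ids=(), delete_queries=(), docs=(), commit=True):
    """Build one XML update message: deletes first, then adds, then a commit."""
    root = ET.Element("update")
    if delete_ids or delete_queries:
        delete = ET.SubElement(root, "delete")
        for doc_id in delete_ids:
            ET.SubElement(delete, "id").text = str(doc_id)
        for query in delete_queries:
            ET.SubElement(delete, "query").text = query
    if docs:
        add = ET.SubElement(root, "add")
        for doc in docs:
            doc_element = ET.SubElement(add, "doc")
            for field_name, value in doc.items():
                field = ET.SubElement(doc_element, "field", name=field_name)
                field.text = str(value)
    if commit:
        ET.SubElement(root, "commit")  # a single commit at the end
    return ET.tostring(root, encoding="unicode")


message = build_update(
    delete_ids=[1, 3],
    delete_queries=["name:obsolete"],
    docs=[{"id": 5, "name": "item 5"}, {"id": 6, "name": "item 6"}],
)
print(message)
```

The resulting string can be saved to a file and posted with the curl command shown earlier. The sketch places the single commit last, in line with the advice to commit only once per update.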
There is a downside, though: if any command fails, none of the commands after it will be executed.