Everything on this page is specific to the legacy setup of distributed search. Users trying out SolrCloud should not follow any of the steps or rely on any of the information below.
Solr handles update reorders (i.e., replica A may see update X then Y, and replica B may see update Y then X). deleteByQuery handles reorders the same way, to ensure replicas are consistent. All replicas of a shard are consistent, even if the updates arrive in a different order on different replicas.
When not using SolrCloud, it is up to you to get all your documents indexed on each shard of your server farm. Solr supports distributed indexing (routing) in its true form only in SolrCloud mode.
In the legacy distributed mode, Solr does not calculate universal term/doc frequencies. For most large-scale implementations, it is not likely to matter that Solr calculates TF/IDF at the shard level. However, if your collection is heavily skewed in its distribution across servers, you may find misleading relevancy results in your searches. In general, it is probably best to randomly distribute documents to your shards.
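Because legacy mode leaves routing to the indexing client, one way to approximate a random spread is to hash each document ID and use the result to choose a shard. The sketch below is only an illustration: the shard URLs, the pick_shard helper, and the use of cksum as the hash are assumptions, not something Solr provides.

```shell
# Hypothetical shard list; replace with your own core URLs.
SHARDS="http://localhost:8983/solr/core1 http://localhost:8984/solr/core1"

# Print the shard URL chosen for a document ID.
# cksum produces a CRC that is stable across runs, so a given ID
# always maps to the same shard.
pick_shard() {
  doc_id=$1
  set -- $SHARDS                       # split the list into positional parameters
  hash=$(printf '%s' "$doc_id" | cksum | cut -d ' ' -f 1)
  shift $(( hash % $# ))               # skip ahead to the selected shard
  printf '%s\n' "$1"
}

pick_shard 3007WFP
pick_shard VA902B
```

Keeping the mapping deterministic means a re-indexed document goes back to the shard that already holds it, rather than creating a second copy elsewhere.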
If a query request includes the shards parameter, the Solr server distributes the request across all the shards listed as arguments to the parameter. The shards parameter uses this syntax:
host:port/base_url[,host:port/base_url]*
For example, the shards parameter below causes the search to be distributed across two Solr servers: solr1 and solr2, both of which are running on port 8983:
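A request of that shape might look like the following sketch. The core name core1, the receiving node localhost:8983, and the query text are illustrative assumptions; only the shards syntax comes from the description above.

```shell
# Comma-separated list of shards, each written as host:port/base_url.
SHARDS="solr1:8983/solr/core1,solr2:8983/solr/core1"

# The node that receives the request distributes it to every listed shard.
URL="http://localhost:8983/solr/core1/select?shards=${SHARDS}&indent=true&q=ipod+solr"
echo "$URL"

# To send it (requires running Solr servers at those addresses):
# curl "$URL"
```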
Rather than require users to include the shards parameter explicitly, it is usually preferred to configure this parameter as a default in the RequestHandler section of solrconfig.xml.
Do not add the shards parameter to the standard requestHandler; otherwise, search queries may enter an infinite loop. Instead, define a new requestHandler that uses the shards parameter, and pass distributed search requests to that handler.
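One possible shape for such a handler is sketched below. The handler name /distrib and the shard addresses are hypothetical placeholders, and the definition is printed from a shell here-document only so that it can be reviewed and pasted into solrconfig.xml by hand.

```shell
# Print a requestHandler definition intended for the <config> element of solrconfig.xml.
# The name "/distrib" and the shard list are placeholders, not Solr defaults.
HANDLER_XML=$(cat <<'EOF'
<requestHandler name="/distrib" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="shards">localhost:8983/solr/core1,localhost:8984/solr/core1</str>
  </lst>
</requestHandler>
EOF
)
printf '%s\n' "$HANDLER_XML"
```

Clients that want a distributed search would then send their queries to that handler's path rather than to the standard handler.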
In legacy mode, only query requests are distributed. This includes requests to the SearchHandler (or any handler extending from org.apache.solr.handler.component.SearchHandler) using standard components that support distributed search.
As in SolrCloud mode, when shards.info=true, distributed responses will include information about the shard (where each shard represents a logically different index or physical location).
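As a rough illustration, shards.info is passed like any other query parameter. The hosts, ports, and core name below are assumptions chosen to match a two-node local setup.

```shell
# Shards to query; the addresses are placeholders for a local two-node setup.
SHARDS="localhost:8983/solr/core1,localhost:8984/solr/core1"

# shards.info=true asks for per-shard details alongside the merged results.
INFO_URL="http://localhost:8983/solr/core1/select?q=*:*&shards=${SHARDS}&shards.info=true"
echo "$INFO_URL"

# To send it (requires both Solr nodes to be running):
# curl "$INFO_URL"
```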
The following components support distributed search:
Distributed searching in Solr has the following limitations:
Formerly a limitation was that TF/IDF relevancy computations only used shard-local statistics. This is still the case by default. If your data isn't randomly distributed, or if you want more exact statistics, then remember to configure the ExactStatsCache.
As in SolrCloud mode, inter-shard requests can lead to a distributed deadlock. It can be avoided by following the instructions here.
For simple functionality testing, it's easiest to just set up two local Solr servers on different ports. (In a production environment, of course, these servers would be deployed on separate machines.)
Make two Solr home directories:
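A minimal sketch of this step, assuming the commands run from the root of a Solr installation where server/solr/solr.xml is the stock solr.xml. The example/nodes location is just a convenient choice.

```shell
# Create one Solr home per node and give each its own copy of solr.xml.
for node in node1 node2; do
  mkdir -p "example/nodes/$node"
  # Skip the copy when the stock file is absent (for example, outside a Solr install).
  if [ -f server/solr/solr.xml ]; then
    cp server/solr/solr.xml "example/nodes/$node/"
  fi
done
```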
Start the two Solr instances:
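Assuming the two home directories live under example/nodes and the commands run from the Solr installation root, the instances could be started as sketched here. The -s (Solr home) and -p (port) options are the classic bin/solr spellings; check the script's help output for your version.

```shell
# Solr home directories for the two nodes (assumed layout).
NODE1_HOME=example/nodes/node1
NODE2_HOME=example/nodes/node2

if [ -x bin/solr ]; then
  # -s selects the Solr home, -p the port to listen on.
  bin/solr start -s "$NODE1_HOME" -p 8983
  bin/solr start -s "$NODE2_HOME" -p 8984
else
  echo "bin/solr not found; run this from the Solr installation directory" >&2
fi
```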
Create a core on both nodes with the sample_techproducts_configs configset:
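One way to do this is sketched below, assuming both nodes are already running and that the core is named core1 (the name used in the search URLs on this page). Option spellings for bin/solr create vary between Solr versions, so treat the flags as an assumption to verify.

```shell
CORE=core1
CONFIGSET=sample_techproducts_configs

if [ -x bin/solr ]; then
  # -c names the core, -p picks the node by port, -d selects the configset.
  bin/solr create -c "$CORE" -p 8983 -d "$CONFIGSET"
  bin/solr create -c "$CORE" -p 8984 -d "$CONFIGSET"
else
  echo "bin/solr not found; run this from the Solr installation directory" >&2
fi
```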
In a third window, index an example document to each of the servers:
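The sketch below posts one sample file to each node through the update handler. It assumes the sample files example/exampledocs/monitor.xml and monitor2.xml that ship with Solr and a core named core1; post_xml is a hypothetical helper defined only for this example.

```shell
# Send an XML file of documents to a node and commit so they become searchable.
post_xml() {
  file=$1
  port=$2
  if [ -f "$file" ]; then
    curl -s "http://localhost:${port}/solr/core1/update?commit=true" \
      -H "Content-Type: application/xml" --data-binary "@${file}"
  else
    echo "sample file not found: $file" >&2
  fi
}

post_xml example/exampledocs/monitor.xml 8983
post_xml example/exampledocs/monitor2.xml 8984
```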
Search on the node on port 8983:
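A match-all query against the first node could look like this, assuming the core is named core1:

```shell
URL_8983="http://localhost:8983/solr/core1/select?q=*:*&wt=xml&indent=true"

# Print the XML response, or a short note if the node cannot be reached.
curl -s --max-time 10 "$URL_8983" || echo "no response from $URL_8983" >&2
```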
This should bring back one document.
Search on the node on port 8984:
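The same match-all query, pointed at the second node (again assuming a core named core1):

```shell
URL_8984="http://localhost:8984/solr/core1/select?q=*:*&wt=xml&indent=true"

# Print the XML response, or a short note if the node cannot be reached.
curl -s --max-time 10 "$URL_8984" || echo "no response from $URL_8984" >&2
```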
This should also bring back a single document.
Now do a distributed search across both servers with your browser or curl. In the example below, an extra parameter 'fl' is passed to restrict the returned fields to id and name.
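With curl, the distributed request could be issued as follows. The shard list and the core name core1 are assumptions matching a two-node local setup.

```shell
# Both local nodes act as shards; the request itself is sent to the node on port 8983.
SHARDS="localhost:8983/solr/core1,localhost:8984/solr/core1"
DIST_URL="http://localhost:8983/solr/core1/select?q=*:*&indent=true&shards=${SHARDS}&fl=id,name&wt=xml"

# Print the merged XML response, or a short note if the node cannot be reached.
curl -s --max-time 10 "$DIST_URL" || echo "no response from $DIST_URL" >&2
```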
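This should contain both the documents as shown below:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">8</int>
    <lst name="params">
      <str name="q">*:*</str>
      <str name="shards">localhost:8983/solr/core1,localhost:8984/solr/core1</str>
      <str name="indent">true</str>
      <str name="fl">id,name</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc>
      <str name="id">3007WFP</str>
      <str name="name">Dell Widescreen UltraSharp 3007WFP</str>
    </doc>
    <doc>
      <str name="id">VA902B</str>
      <str name="name">ViewSonic VA902B - flat panel display - TFT - 19"</str>
    </doc>
  </result>
</response>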