I’m back from a very relaxing week in Mexico. I strongly recommend this resort: Valentin Imperial Maya in Riviera Maya. Great place, great food!
Alright. Here are the show notes of my presentation on SOLR at Montreal on Rails on August 19th.
First, the slides are on SlideShare.
SOLR is a Java-based search server built on the Lucene library; acts_as_solr is the Rails plugin that talks to it. Other possible full-text search solutions are Ferret, Ultrasphinx, and Xapian.
You basically install the acts_as_solr plugin, create the logs and tmp folders inside the plugin folder, configure it, and start the server using a rake task: rake solr:start.
script/plugin install git://github.com/railsfreaks/acts_as_solr.git
mkdir vendor/plugins/acts_as_solr/solr/logs
mkdir vendor/plugins/acts_as_solr/solr/tmp
Now look at the file config/solr.yml that was created by the plugin. You can customize it if you want. Then, generate documentation (very handy) and start SOLR:
rake doc:plugins
rake solr:start
Then, you can verify that SOLR is in fact running by going to http://localhost:8982/solr/. The admin interface there is very handy for testing searches and verifying that model instances have been properly indexed.
In your model, you simply have to add “acts_as_solr” and the model will be fulltext indexed. In my example, my model is named Tip. SOLR will index model instances when they are saved. To reindex existing instances, you can simply go through each of them and call save() or you can call rebuild_solr_index from the script/console:
script/console
> reload!
> Tip.rebuild_solr_index
Searching is very easy: Tip.find_by_solr('something').
Scores
Give the :scores option to the find method and results will have a solr_score attribute.
Tip.find_by_solr('foo', :scores => true)
number_to_percentage( tip.solr_score*100, :precision => 0 )
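Outside a view, where number_to_percentage is not available, the same formatting can be sketched in plain Ruby. The helper name score_to_percent is made up for this example, and it assumes the score is a float between 0 and 1, as the multiplication by 100 above suggests.

```ruby
# Hypothetical helper: format a relevance score as a whole percentage.
# Assumes the score is a Float in the 0.0..1.0 range.
def score_to_percent(score)
  "#{(score * 100).round}%"
end

puts score_to_percent(0.8734)
```

In a real app you would pass tip.solr_score instead of a literal.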
Additional fields
By default, SOLR indexes all model attributes. If you want to index a virtual attribute, give the option :additional_fields to acts_as_solr:
acts_as_solr :additional_fields => [:searchable_tags]
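A virtual attribute is just a method on the model that returns text worth indexing. Here is a minimal sketch using a plain Ruby class as a stand-in for the ActiveRecord model; the tags array and the way searchable_tags is built are assumptions for illustration, not code from my app.

```ruby
# Plain-Ruby stand-in for an ActiveRecord model (illustrative only).
# In a real app, tags would likely be an association, not an Array.
class Tip
  attr_reader :title, :tags

  def initialize(title, tags)
    @title = title
    @tags = tags
  end

  # Virtual attribute: not a database column, just a method whose
  # return value can be handed to the indexer as text.
  def searchable_tags
    tags.join(' ')
  end
end

tip = Tip.new('Speed up your tests', %w[rails testing])
puts tip.searchable_tags
```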
Specific fields
If you don’t want all the attributes to be indexed, use the :fields option to specify the attributes you want to have indexed (you can include virtual attributes):
acts_as_solr :fields => [:title, :body, :searchable_tags]
Boost
By default, all attributes have the same weight in the search. You can boost models/attributes by using the :boost option:
acts_as_solr :fields => [:body, {:title => {:boost => 100.0 }}, :featured, :searchable_tags], :boost => 10.0
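To get an intuition for what a field boost does, here is a toy ranking in plain Ruby. It is not Lucene's scoring formula, and toy_score and FIELD_BOOSTS are made-up names; the only point is that a match in a boosted field counts for more than a match elsewhere.

```ruby
# Toy scoring, NOT Lucene's real formula: each field that contains the
# term contributes its boost (1.0 when none is given) to the score.
FIELD_BOOSTS = { :title => 100.0, :body => 1.0 }

def toy_score(doc, term)
  FIELD_BOOSTS.inject(0.0) do |sum, (field, boost)|
    text = doc[field].to_s.downcase
    text.include?(term.downcase) ? sum + boost : sum
  end
end

docs = [
  { :title => 'Deploying with Capistrano', :body => 'Notes on Solr in production' },
  { :title => 'Solr basics',               :body => 'Getting started' }
]

# Highest score first: the title match outranks the body match.
ranked = docs.sort_by { |doc| -toy_score(doc, 'solr') }
puts ranked.first[:title]
```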
Range
You can tell SOLR to treat an attribute as an integer or float range. This will allow you to search for intervals:
acts_as_solr :additional_fields => [ {:seconds => :range_integer} ]
Then, you can search for an interval:
Tip.find_by_solr('seconds:[0 TO 30]')
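If the bounds come from user input, it can help to build the range clause in one place. A small sketch; solr_range is a hypothetical helper name, not part of the plugin.

```ruby
# Hypothetical helper: build an inclusive Lucene range clause
# of the form field:[from TO to].
def solr_range(field, from, to)
  "#{field}:[#{from} TO #{to}]"
end

puts solr_range(:seconds, 0, 30)
```

The resulting string is what you would hand to find_by_solr in place of the literal query above.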
Pagination
The find_by_solr method accepts pagination and sorting options: :limit, :offset, and :order.
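For page-based navigation, the :limit and :offset values can be derived from a page number. A minimal sketch, assuming 1-based pages; solr_page_options is a made-up helper name.

```ruby
# Hypothetical helper: translate a 1-based page number into the
# :limit and :offset options.
def solr_page_options(page, per_page = 10)
  page = [page.to_i, 1].max # treat nil, 0 and negatives as page 1
  { :limit => per_page, :offset => (page - 1) * per_page }
end

p solr_page_options(3, 10)
```

The hash would then be passed as the options argument, for example Tip.find_by_solr('foo', solr_page_options(params[:page])).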
Multi-model search
You can search in multiple models by giving :models to the multi_solr_search method. You invoke the method on one model and list the other ones:
Tip.multi_solr_search('pure', :models => [Category, Comment])
Return IDs only
Sometimes you only want the instance IDs instead of all their attributes. You might want to do that in order to perform a SQL query after the full-text search and limit that query to the IDs SOLR returned.
Tip.find_id_by_solr('pure').docs
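The follow-up SQL query can be sketched like this. conditions_for_ids is a hypothetical helper that builds Rails 2 style :conditions, and the featured column is only an example filter.

```ruby
# Hypothetical helper: build Rails 2 style :conditions that restrict a
# SQL query to the IDs returned by the full-text search.
def conditions_for_ids(ids, extra_sql = nil, *extra_values)
  sql = 'id IN (?)'
  sql = "#{sql} AND #{extra_sql}" if extra_sql
  [sql, ids] + extra_values
end

ids = [3, 8, 15] # stand-in for Tip.find_id_by_solr('pure').docs
p conditions_for_ids(ids, 'featured = ?', true)
```

You would then call something like Tip.find(:all, :conditions => conditions_for_ids(ids, 'featured = ?', true)). Keep in mind that the SQL result will not follow the relevance order unless you re-sort it yourself.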
Facets
Faceting gives you statistics on groups of results. For example, you could have the number of results per Tip category. This is an “advanced” topic and I encourage you to read the faceting article that you will find in my resources list below.
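To make the idea concrete, here is what "number of results per category" means, computed by hand in plain Ruby. This only illustrates the output of faceting; SOLR computes the counts itself, and Result and facet_counts are names invented for this sketch.

```ruby
# Illustration only: count results per category in plain Ruby.
# Nothing here calls SOLR.
Result = Struct.new(:title, :category)

def facet_counts(results)
  counts = Hash.new(0)
  results.each { |r| counts[r.category] += 1 }
  counts
end

results = [
  Result.new('Pure Ruby tips', 'ruby'),
  Result.new('Pure functions', 'ruby'),
  Result.new('Pure CSS menus', 'css')
]

facet_counts(results).each { |category, n| puts "#{category}: #{n}" }
```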
French accents
Now, what about French accents in a field? Boom… out of the box, this SOLR plugin will treat accented characters as whitespace. So if you have “crédit” in a model, you will not be able to find it with “credit”. Look at the SOLR analyzer and you will see how it processes text at indexing and search time: http://localhost:8982/solr/admin/analysis.jsp?highlight=on
There is a way to fix this. You basically have to modify the filtering sequence in the SOLR schema (configuration). This is in the schema.xml file under vendor/plugins/acts_as_solr/solr/solr/conf. Modify the file with the following lines:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
The ISOLatin1AccentFilterFactory filter replaces accented characters with their unaccented equivalents, so “crédit” and “credit” end up as the same token. The SnowballPorterFilterFactory with the French option stems words, so that plural forms of a word match the singular. You can add additional filters (such as an HTML stripper to remove HTML markup). Have a look at this page, it lists them all.
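If you want to see the effect of accent folding without going through the analyzer page, it can be approximated in plain Ruby. This is a sketch using Unicode decomposition, not the mapping table the SOLR filter actually uses; fold_accents is a made-up name, and String#unicode_normalize needs Ruby 2.2 or later.

```ruby
# Approximation of accent folding: decompose each character into its
# base letter plus combining marks (NFD), then drop the marks.
# The SOLR filter uses its own character mapping; this is only a sketch.
def fold_accents(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

puts fold_accents('crédit')
```

Applying the same folding to both the indexed text and the query is what lets “credit” match “crédit”, which is why the filter appears in both analyzers.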
Caveat: those filters apply to all the attributes. This works well if you are integrating SOLR in a French-only site, but not so well on a bilingual site. This is where I want to eventually spend some time creating new field types based on language (e.g. text_fr, text_en). This would allow having a different set of filters per field type. I’ll write a blog entry when I get this done.
Resources
Look at the following links for additional information:
Acts as Solr Plugin
acts_as_solr : search and faceting
Advanced acts_as_solr
Solr: Indexing XML with Lucene and REST
acts_as_solr on GitHub
And read recipe 11, “Faceted Search with SOLR”, in the book Advanced Rails Recipes.








[...] Here are the slides and more info! This entry was posted on Wednesday, August 20th, 2008 at 1:40 pm and is filed under Follow-up. [...]
good article.
Does anyone know how to start SOLR in the background in production mode?
Do you use Capistrano? I remember having problems putting SOLR in background (capistrano would not stop).
If so, here’s the command I have in my recipe to start SOLR in the background:
run "cd #{current_path} && nohup rake solr:start RAILS_ENV=#{rails_env} > log/solr.log 2> log/solr.err.log"
I also changed the SOLR rake file (plugins/acts_as_solr/lib/tasks/solr.rake) to not send STDOUT and STDERR to log files (just remove the redirections in the exec statement). My Capistrano command takes care of redirecting those outputs to the log files.
Maybe this will help.
Please make it will_paginate compatible. There are some solutions out there but they break facets.