Test-Driven Fulltext Search in 2 commits with Solr, Sunspot and sunspot_matchers

Search is a feature request we get frequently at Pivotal Labs. It’s easy to understand why: if your users can search your app, they can navigate by thinking about what they want instead of trying to remember where they put it.

There are a number of tools for implementing search: dedicated fulltext datastores such as ElasticSearch and Apache Solr, plus conventional relational databases like PostgreSQL that have fulltext search support built in. Of these, Solr is one of the more established options, and it’s the one I’ve seen used most here at Pivotal Labs. For Ruby projects, Solr has gem support in the form of Sunspot, which provides simple declarative DSLs for indexing and searching your data and gets you out the door with working search in a very modest amount of code. To show just how simple TDD-ing out Sunspot search can be, let’s implement search.

Setting up

Let’s pretend we’re working on a video sharing app. Anybody can comment on any video, and we want comments to be searchable.

Gemfile

gem 'sunspot_rails'

group :development, :test do
  gem 'sunspot_matchers'
  gem 'sunspot_solr'
  gem 'rspec-rails'
end

spec/support/sunspot_matchers.rb

RSpec.configure do |c|
  c.include SunspotMatchers
  c.before do
    Sunspot.session = SunspotMatchers::SunspotSessionSpy.new(Sunspot.session)
  end
end

Let’s use scaffolds so we’ll have a functional app running right away:

% bundle install 
% rails g rspec:install
% rails g scaffold video url:string --no-view-specs
% rails g scaffold comment video_id:integer text:text --no-view-specs
% rake db:migrate test:prepare
% rake

You should see a few dozen passing tests, and you can try creating a video and a comment or two.

The first commit: Making comments searchable

We’re storing the comment text typed by our users in the ‘text’ column, and that’s what we want to make searchable. The sunspot_matchers gem we added to our Gemfile makes this easy to express in our spec:

spec/models/comment_spec.rb

require 'spec_helper'

describe Comment do
  it { should have_searchable_field(:text) }
end

The test fails:


% bundle exec rspec spec/models/comment_spec.rb
F

Failures:
1) Comment should should have searchable field text
 Failure/Error: it { should have_searchable_field(:text) }
 expected class: Comment to have searchable field: text, but Sunspot was not configured on Comment
 # ./spec/models/comment_spec.rb:4:in `block (2 levels) in <top (required)>'

We can make the test green by telling Sunspot which fields to make searchable on our Comment model.

app/models/comment.rb

class Comment < ActiveRecord::Base
  belongs_to :video
  attr_accessible :text, :video_id

  searchable do
    text :text
  end
end

% bundle exec rspec spec/models/comment_spec.rb
.

Finished in 0.02179 seconds
1 example, 0 failures

Perfect. That’s our first commit.

% git add -A
% git commit -m "make Comment model searchable"

We’ve now got our Rails app sending updates to Solr every time we create, update, or delete a comment. You may already get the sense that quite a bit more must be going on than the three lines of implementation code let on, but we’ll come back to that in a moment. In the meantime, let’s do the other half of the search implementation: querying the index.

The second commit: Making a controller perform a search

To add search to our app, we’re going to modify the #index method on our CommentsController so that when we provide a search query to the controller, we get back a filtered list of comments. This test is almost as easy as the last one we wrote:

spec/controllers/comments_controller_spec.rb

describe "GET index" do
  context "with a search term" do
    it "performs a search for matching comment text" do
      get :index, {search: "sandwiches"}, valid_session
      Sunspot.session.should be_a_search_for(Comment)
      Sunspot.session.should have_search_params(:fulltext, "sandwiches")
    end
  end
end

The test should fail because we’re not yet doing a search:

% bundle exec rspec spec/controllers/comments_controller_spec.rb
  1) CommentsController GET index with a search term performs a search for matching comment text
     Failure/Error: Sunspot.session.should be_a_search_for(Comment)
     RuntimeError:
       no search found
     # ./spec/controllers/comments_controller_spec.rb:65:in `block (5 levels) in <top (required)>'
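
The “no search found” error comes from the session spy we installed in spec/support/sunspot_matchers.rb. As far as I understand it, the spy stands in for the real Sunspot session and records searches instead of sending them to Solr, which is why these specs need no running server. Here is a rough plain-Ruby sketch of that idea. The class and method names are hypothetical, and this is not the actual sunspot_matchers source:

```ruby
# Toy session spy: records searches instead of sending them to Solr.
# Hypothetical names; not the actual sunspot_matchers implementation.
class ToySessionSpy
  attr_reader :searches

  def initialize(original_session)
    @original_session = original_session # kept, but never contacted in this sketch
    @searches = []
  end

  # Record the searched class and the fulltext keywords, then return no results.
  def search(klass, keywords)
    @searches << { class: klass, fulltext: keywords }
    []
  end

  # Indexing and commits become no-ops, so nothing leaves the process.
  def index(*_objects); end

  def commit; end

  # Raises, like the failing spec output, when nothing was recorded.
  def last_search
    @searches.last || raise("no search found")
  end
end

spy = ToySessionSpy.new(:real_session)

begin
  spy.last_search
rescue RuntimeError => e
  puts e.message
end

spy.search(String, "sandwiches")
puts spy.last_search[:fulltext]
```

Run as a script, this prints “no search found” and then “sandwiches”: the first lookup fails because no search was recorded, and the second succeeds once one has been.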

To pass that spec, we’ll tell the controller to perform a Sunspot search of comments if it receives a search term:

app/controllers/comments_controller.rb

class CommentsController < ApplicationController
  # GET /comments
  # GET /comments.json
  def index
    if params[:search]
      @comments = Sunspot.search(Comment) do
        fulltext params[:search]
      end.results
    else
      @comments = Comment.all
    end

    respond_to do |format|
      format.html # index.html.erb
      format.json { render json: @comments }
    end
  end

  # ... more controller methods
end

% bundle exec rspec spec/controllers/comments_controller_spec.rb
..................

Finished in 0.83291 seconds
18 examples, 0 failures

Hooray! That’s our second commit right there.

% git add -A
% git commit -m "teach the CommentsController how to do searches."

Meanwhile, back in Reality

Ok, so we’ve written some application code, and we’ve written some tests. But does it actually work?

In fact, it’s pretty easy to verify. Open up a terminal window, cd to your app’s directory, and start up the test Solr server that we sneakily bundled while updating the Gemfile.

% rails generate sunspot_rails:install
% bundle exec rake sunspot:solr:start

And start your rails server if it’s not running yet.

% rails s

You can type the URL by hand if you want: visiting localhost:3000/comments?search=omg will return all the comments matching that search term. You’ve now got a rudimentary search feature, waiting to be fleshed out with proper pagination and sorting[1].

Your API is very nice but what did you just do with my data?

Sunspot’s big contribution is minimizing the amount of code you have to write to integrate search into your app’s business domain. With Sunspot, you only write code for the behavior that’s specific to your application: which of your domain models you want to be able to search and how you want your users to initiate those searches.

To make use of the minimal code you add to your models, sunspot_rails also injects a lot of code of its own that handles communication with the Solr server: it adds callbacks to the ActiveRecord models you’ve configured for indexing, and it sends Solr a crucial go-ahead “commit” message at the end of any controller action that modifies an indexed model, telling Solr to apply all of the changes you’ve made.
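
To make that division of labor concrete, here is a toy sketch in plain Ruby of a declarative searchable block that records field names, plus a save method standing in for the callback that sends a document to the server. Every name here (ToySearchable, ToySolrClient, ToyComment) is made up for illustration. It only mimics the shape of the behavior described above, not how sunspot_rails is actually implemented:

```ruby
# Toy sketch of a declarative "searchable" DSL. All names are hypothetical;
# this is not how sunspot_rails is actually implemented.
module ToySearchable
  # Collects the field declarations made inside the `searchable` block.
  class FieldCollector
    attr_reader :fields

    def initialize
      @fields = []
    end

    # Mirrors the `text :text` line in the Comment model.
    def text(name)
      @fields << name
    end
  end

  def searchable(&block)
    collector = FieldCollector.new
    collector.instance_eval(&block)
    @search_fields = collector.fields
  end

  def search_fields
    @search_fields || []
  end
end

# Stand-in for the connection to the Solr server: it just logs messages.
class ToySolrClient
  attr_reader :messages

  def initialize
    @messages = []
  end

  def add(document)
    @messages << [:add, document]
  end

  def commit
    @messages << [:commit]
  end
end

class ToyComment
  extend ToySearchable

  attr_reader :text

  searchable do
    text :text
  end

  def initialize(text)
    @text = text
  end

  # Stand-in for an ActiveRecord after-save callback.
  def save(client)
    document = self.class.search_fields.map { |field| [field, public_send(field)] }.to_h
    client.add(document)
  end
end

client = ToySolrClient.new
ToyComment.new("I love sandwiches").save(client)
client.commit # stands in for the commit sent at the end of a controller action
p client.messages
```

The model only declares which fields matter. The add and commit messages are produced by code the model never sees, which is the part that is easy to forget about.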

Suspect the commit messages

These commit messages are very important: if Solr’s not doing what you expect, there’s a good chance that commit messages are involved. The tradeoff for Sunspot’s convenience is that it hides the complexity of commits from you, so it’s easy not to realize that there’s more you need to know about. Once you do know a bit about how the Solr server works, however, you’ll benefit from Sunspot’s simplicity on all your future projects.

Here are the important things to know about commits: they’re slow, they’re blocking calls, and a commit has to happen before documents actually show up in the index[2]. But Solr can also be configured to perform commits automatically in the background[3][4]. If you’re using a third-party host for your Solr servers (such as the WebSolr add-on for Heroku), you should experiment or simply ask how the Solr servers are configured, since they may already be configured to commit automatically.
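
The “commit before visible” rule is the one that bites most often, so here is a deliberately simplified model of it in plain Ruby. ToyIndex is a hypothetical class for illustration only. Solr’s real implementation is far more involved, but the visible behavior is similar in spirit:

```ruby
# Toy model of commit semantics: added documents stay pending until a commit.
# A deliberately simplified picture, not Solr's real implementation.
class ToyIndex
  def initialize
    @pending = []
    @committed = []
  end

  # Queue a document; it is not searchable yet.
  def add(text)
    @pending << text
  end

  # Make every pending document visible to searches.
  def commit
    @committed.concat(@pending)
    @pending.clear
  end

  # Only committed documents can match (case-insensitive substring match).
  def search(term)
    @committed.select { |text| text.downcase.include?(term.downcase) }
  end
end

index = ToyIndex.new
index.add("omg sandwiches")
p index.search("omg") # nothing yet: the document is still pending
index.commit
p index.search("omg") # now the document is visible
```

If a search comes back empty right after you saved a record, this is the first thing to check: was a commit ever sent?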

You’ll need to find a configuration that works for your application’s needs. You may want a guarantee that documents are indexed transactionally with changes to your primary database, in which case you need to commit explicitly from your Rails app or background jobs. If some lag time is OK, then you need to make sure that the autocommit interval used by your server matches the performance you want to get out of your app.
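
To get a feel for the lag an autocommit interval introduces, here is another toy model in plain Ruby. ToyAutoCommitIndex, max_time, and the injected clock are all assumptions made for this sketch. It checks the interval lazily at search time to keep the example deterministic, whereas a real server would commit in the background:

```ruby
# Toy autocommit: pending documents become searchable once `max_time` seconds
# have passed since the oldest uncommitted add. Hypothetical model only.
class ToyAutoCommitIndex
  def initialize(max_time:, clock: -> { Time.now.to_f })
    @max_time = max_time
    @clock = clock
    @pending = []
    @committed = []
    @oldest_pending_at = nil
  end

  def add(text)
    @oldest_pending_at ||= @clock.call
    @pending << text
  end

  def search(term)
    autocommit_if_due
    @committed.select { |text| text.include?(term) }
  end

  private

  # Commit pending documents only when the interval has elapsed.
  def autocommit_if_due
    return if @oldest_pending_at.nil?
    return if @clock.call - @oldest_pending_at < @max_time

    @committed.concat(@pending)
    @pending.clear
    @oldest_pending_at = nil
  end
end

now = 0.0 # fake clock, in seconds, so the example needs no sleeping
index = ToyAutoCommitIndex.new(max_time: 15, clock: -> { now })
index.add("omg sandwiches")
now = 5.0
p index.search("omg") # still inside the lag window, so no match
now = 20.0
p index.search("omg") # the interval has elapsed, so the document shows up
```

With a 15-second interval, a user who posts a comment and searches for it five seconds later won’t find it. Whether that’s acceptable is an application decision, not a Solr one.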

TL;DR

It’s extremely easy to integrate search functionality into your Rails app with a test-first approach. But Solr’s server-side configuration affects its behavior a lot, so make sure you understand the configuration of your Solr servers too.

[1] https://github.com/sunspot/sunspot/wiki/Ordering-and-pagination
[2] http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22
[3] http://wiki.apache.org/solr/SolrConfigXml#indexConfig_Section
[4] http://wiki.apache.org/solr/NearRealtimeSearch