Full text search with TurboLucene

Krys Wilken has done a great job writing TurboLucene, a simple interface on top of PyLucene. PyLucene is itself a Python binding for Lucene, a very popular text search engine used by large systems such as Wikipedia and CNet.

The TurboLucene tutorial uses Kid templates and SQLObject. I integrated it into my own project that uses Genshi and SQLAlchemy.

It’s worth mentioning that PyLucene must be installed manually because it is OS dependent, which makes it harder to keep your application easy to install.
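
Since PyLucene cannot be pulled in like an ordinary dependency, one option is to check for it at startup and report a clear message if it is missing. A minimal sketch, assuming the binding is importable as either `PyLucene` (older builds) or `lucene` (newer builds); the helper name `find_pylucene` is hypothetical and not part of TurboLucene:

```python
import importlib.util

# Candidate module names are an assumption: older PyLucene builds were
# imported as "PyLucene", later ones as "lucene".
CANDIDATE_MODULES = ("PyLucene", "lucene")


def find_pylucene(candidates=CANDIDATE_MODULES):
    """Return the first importable module name, or None if none is installed."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


if __name__ == "__main__":
    found = find_pylucene()
    if found is None:
        print("PyLucene not found; full text search will be unavailable.")
    else:
        print("PyLucene available as module %r" % found)
```

A check like this only tells you whether the module can be found; it does not verify that the installed build actually works on your platform.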

You must also disable Auto-Reload in your TurboGears configuration, which may slow down your development process.

More information about the query syntax and other details can be found on the Lucene homepage.
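
Because the query string is parsed as Lucene syntax, characters such as `:` or `(` in user input are interpreted as operators. A minimal sketch of escaping them, assuming the character set matches what Lucene's classic query parser treats as special; `escape_query` is a hypothetical helper, not part of TurboLucene:

```python
# Characters that Lucene's classic query parser treats as syntax.
# Assumption: this mirrors the set escaped by Lucene's QueryParser.escape.
LUCENE_SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&/')


def escape_query(text):
    """Backslash-escape Lucene query syntax characters in user input."""
    return "".join(
        "\\" + ch if ch in LUCENE_SPECIAL_CHARS else ch for ch in text
    )


if __name__ == "__main__":
    print(escape_query("C++ (draft)"))
```

The escaped string could then be passed to `turbolucene.search` when the input should be treated as literal words. Escaping everything also takes away the user's ability to write field queries or wildcards, so it is a trade-off rather than a default.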

Below are the additions to my project to get TurboLucene integrated (full repository here). So far I’m just indexing one type of object.

controllers.py:

import turbolucene
from turbolucene import *
...
def make_document(project):
   """Turn project into a TurboLucene document."""
   document = Document()
   document.add(Field('id', str(project.id), STORE, UN_TOKENIZED))
   document.add(Field('title', project.title, STORE, UN_TOKENIZED))
   document.add(Field('notes', project.notes, COMPRESS, TOKENIZED))
   return document

def results_formatter(results):
   """Return the projects that match the ids provided by TurboLucene"""
   if results:
      return session.query(Project).select(Project.c.id.in_(*results))

turbolucene.start(make_document, ['notes'], results_formatter)

class Root(controllers.RootController):
   ...
   @expose(template="kaizen.templates.search", content_type='text/html; charset=utf-8')
   @error_handler()
   def search(self, query, **keywords):
      results = turbolucene.search(query)
      return dict(results=results, query=query)
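
One detail of `results_formatter` worth noting: an SQL `IN` query does not guarantee that rows come back in the order of the ids passed in, so any ranking in the id list can be lost. A minimal sketch of restoring that order, assuming the ids arrive in ranked order and that each fetched object exposes an `id` attribute; `order_by_ids` and the `Project` stand-in below are hypothetical:

```python
from collections import namedtuple

# Stand-in for the mapped Project class, only for illustration.
Project = namedtuple("Project", ["id", "title"])


def order_by_ids(ids, objects):
    """Return objects sorted to follow the order of ids.

    ids may be strings (the index stores str(project.id)), so both sides
    are compared as strings. Ids with no matching object are skipped.
    """
    by_id = {str(obj.id): obj for obj in objects}
    return [by_id[str(i)] for i in ids if str(i) in by_id]


ranked_ids = ["3", "1", "2"]  # order as returned by the search (assumed)
fetched = [Project(1, "Alpha"), Project(2, "Beta"), Project(3, "Gamma")]
ordered = order_by_ids(ranked_ids, fetched)
print([p.title for p in ordered])
```

In the controller above, the same idea would mean wrapping the query result in such a helper before returning it from `results_formatter`.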

project.py:

   ...
   def new(self, tg_errors=None, **kw):
   ...
      turbolucene.add(project)  

   ...
   def update(self, id, tg_errors=None, **kw):
   ...
      turbolucene.update(project)

   ...
   def delete(self, id, tg_errors=None, **kw):
   ...
      turbolucene.remove(project)

search.html:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns:py="http://genshi.edgewall.org/"
xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="master.html" />
<head>
  <meta content="text/html; charset=UTF-8" http-equiv="content-type" py:replace="''"/>
  <title>Search projects</title>
</head>
<body>
  <div id="search">
    <form action="/search" method="post" >
      <fieldset>
        <legend>Search</legend>
        <p>
          <label for="query">Query</label>
          <input type="text" id="query" name="query"/>
        </p>
        <input type="submit" value="Search"/>
      </fieldset>
    </form>
  </div>

  <div id="res" py:if="query">
    <p py:if="not results">
      No match
    </p>
    <ul py:if="results">
      <li py:for="project in results">
        <a href="${tg.url('/project/load/' + str(project.id))}"
          py:content="project.title">Project title</a>
      </li>
    </ul>
  </div>
</body>
</html>

7 Responses to Full text search with TurboLucene

  1. Ansel says:

    Thanks, Krys! Concrete examples like this are really helpful. I’m looking forward to trying out TurboLucene.

  2. Krys Wilken says:

    Hi there,

    Nice article! Thanks for writing it! :-D

    I’m glad TurboLucene is useful for you.

    I will be releasing the next version soon and it will include multi-language support. :-)

    @Ansel: I did not write this. This is not my blog. But you are right, concrete examples are great. I hope TurboLucene meets your needs too. :-)

    Thanks again.

  3. PyArticles says:

    Very good articles. Please check out at Python Articles

  4. Druze says:

    Somehow i missed the point. Probably lost in translation :) Anyway … nice blog to visit.

    cheers, Druze!

  5. Alexwebmaster says:

    Hello webmaster
    I would like to share with you a link to your site
    write me here preonrelt@mail.ru

  6. Jason Madsen says:

    Cool Thanks for this post. I am starting django and this will be a big help.

  7. Thanks for finally talking about >Full text search with
    TurboLucene | From Java to Python <Loved it!
