Full text search with TurboLucene

Krys Wilken has done a great job writing TurboLucene, a simple interface on top of PyLucene. PyLucene itself wraps Lucene, a very popular text search engine used by large systems such as Wikipedia and CNet.

The TurboLucene tutorial uses Kid templates and SQLObject. I integrated it into my own project, which uses Genshi and SQLAlchemy instead.

It’s worth mentioning that PyLucene must be installed manually because it is OS dependent. This makes it harder to keep your application easy to install.
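One way to soften the installation problem is to treat search as an optional feature and check for PyLucene at startup. This is only a minimal sketch: the module name `PyLucene` is an assumption about how your install exposes it, and `optional_import` and `SEARCH_ENABLED` are hypothetical names, not part of TurboLucene.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# "PyLucene" is the assumed module name; adjust it to match your install.
pylucene = optional_import("PyLucene")
SEARCH_ENABLED = pylucene is not None

if not SEARCH_ENABLED:
    print("PyLucene not found: full text search is disabled.")
```

The application can then hide the search form or skip indexing when `SEARCH_ENABLED` is false, instead of failing on import.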

You must also disable Auto-Reload in your TurboGears configuration, which may slow down your development process.

More information about the query syntax and other details can be found on the Lucene homepage.
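Because Lucene's query syntax gives special meaning to characters such as `+`, `-`, `:` and `*`, raw user input can be parsed as operators. The helper below is a rough sketch of backslash-escaping those characters when you want the input treated literally. `escape_lucene_query` is a hypothetical name, the character set follows the Lucene query syntax documentation, and I am assuming the query string reaches Lucene's query parser unchanged.

```python
# Characters with special meaning in Lucene's query syntax.
LUCENE_SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\')


def escape_lucene_query(text):
    """Backslash-escape Lucene query syntax characters in text."""
    return "".join(
        "\\" + char if char in LUCENE_SPECIAL_CHARS else char
        for char in text
    )


# Example: escape the user's input before handing it to the search call.
safe_query = escape_lucene_query("title:foo")
```

Only escape when you do not want users to use the query syntax themselves; escaping everything also disables field queries and wildcards.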

Below are the additions to my project to get TurboLucene integrated (full repository here). So far I’m indexing just one type of object.

controllers.py:

import turbolucene
from turbolucene import *
...
def make_document(project):
   """Turn project into a TurboLucene document."""
   document = Document()
   document.add(Field('id', str(project.id), STORE, UN_TOKENIZED))
   document.add(Field('title', project.title, STORE, UN_TOKENIZED))
   document.add(Field('notes', project.notes, COMPRESS, TOKENIZED))
   return document

def results_formatter(results):
   """Return the projects that match the ids provided by TurboLucene"""
   if results:
      return session.query(Project).select(Project.c.id.in_(*results))

turbolucene.start(make_document, ['notes'], results_formatter)

class Root(controllers.RootController):
   ...
   @expose(template="kaizen.templates.search", content_type='text/html; charset=utf-8')
   @error_handler()
   def search(self, query, **keywords):
      results = turbolucene.search(query)
      return dict(results=results, query=query)
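To make the flow in controllers.py easier to follow, here is a toy, in-memory stand-in for the index. It is not TurboLucene's implementation; `ToyIndex`, the dict-based documents and the sample projects are all made up for illustration. It only shows how the three arguments to `start` fit together: one callback turns an object into a document, the list names the fields searched by default, and the other callback turns matching ids back into objects.

```python
class ToyIndex:
    """In-memory stand-in for a search index (not TurboLucene itself)."""

    def __init__(self, make_document, search_fields, results_formatter):
        self.make_document = make_document
        self.search_fields = search_fields
        self.results_formatter = results_formatter
        self.documents = {}  # id -> dict of field name -> text

    def add(self, obj):
        """Convert the object to a document and store it by id."""
        document = self.make_document(obj)
        self.documents[document["id"]] = document

    def search(self, query):
        """Match whole words in the default fields, then format the ids."""
        words = query.lower().split()
        ids = [
            doc_id
            for doc_id, document in self.documents.items()
            if any(
                word in document[field].lower().split()
                for field in self.search_fields
                for word in words
            )
        ]
        return self.results_formatter(ids)


# Made-up sample data standing in for database rows.
PROJECTS = {
    1: {"id": 1, "title": "Kaizen", "notes": "continuous improvement tracker"},
    2: {"id": 2, "title": "Garden", "notes": "planting schedule"},
}


def make_document(project):
    """Turn a project into a flat document keyed by field name."""
    return {
        "id": str(project["id"]),
        "title": project["title"],
        "notes": project["notes"],
    }


def results_formatter(ids):
    """Map the matching ids back to project objects."""
    return [PROJECTS[int(doc_id)] for doc_id in ids]


index = ToyIndex(make_document, ["notes"], results_formatter)
for project in PROJECTS.values():
    index.add(project)

matches = index.search("improvement")
```

Note that searching for a title finds nothing here, because only `notes` is listed as a default search field, just as in the `start` call above.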

project.py:

   ...
   def new(self, tg_errors=None, **kw):
   ...
      turbolucene.add(project)  

   ...
   def update(self, id, tg_errors=None, **kw):
   ...
      turbolucene.update(project)

   ...
   def delete(self, id, tg_errors=None, **kw):
   ...
      turbolucene.remove(project)

search.html:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns:py="http://genshi.edgewall.org/"
      xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="master.html" />
<head>
  <meta content="text/html; charset=UTF-8" http-equiv="content-type" py:replace="''"/>
  <title>Search projects</title>
</head>
<body>
  <div id="search">
    <form action="/search" method="post" >
      <fieldset>
        <legend>Search</legend>
        <p>
          <label for="query">Query</label>
          <input type="text" id="query" name="query"/>
        </p>
        <input type="submit" value="Search"/>
      </fieldset>
    </form>
  </div>

  <div id="res" py:if="query">
    <p py:if="not results">
      No match
    </p>
    <ul py:if="results">
      <li py:for="project in results">
        <a href="${tg.url('/project/load/' + str(project.id))}"
          py:content="project.title">Project title</a>
      </li>
    </ul>
  </div>
</body>
</html>

This entry was posted on Wednesday, March 14th, 2007 at 11:32 am and is filed under TurboGears.

Responses to Full text search with TurboLucene

Ansel says:

March 14, 2007 at 9:00 pm

Thanks, Krys! Concrete examples like this are really helpful. I’m looking forward to trying out TurboLucene.

Krys Wilken says:

March 15, 2007 at 6:31 am

Hi there,

Nice article! Thanks for writing it! 😀

I’m glad TurboLucene is useful for you.

I will be releasing the next version soon and it will include multi-language support. 🙂

@Ansel: I did not write this. This is not my blog. But you are right, concrete examples are great. I hope TurboLucene meets your needs too. 🙂

Thanks again.

Druze says:

June 19, 2008 at 2:38 pm

Somehow i missed the point. Probably lost in translation 🙂 Anyway … nice blog to visit.

cheers, Druze!

Jason Madsen says:

March 7, 2010 at 2:22 am

Cool Thanks for this post. I am starting django and this will be a big help.
