Full text search with TurboLucene

March 14, 2007

Krys Wilken has done a great job writing a simple interface on top of PyLucene named TurboLucene. PyLucene is itself a Python wrapper around Lucene, a very popular text search engine used by large systems such as Wikipedia and CNet.

The TurboLucene tutorial uses Kid templates and SQLObject. I integrated it into my own project that uses Genshi and SQLAlchemy.

It’s worth mentioning that PyLucene must be installed manually because it is OS dependent. This adds complexity to making your application easy to install.

You must also disable Auto-Reload in your TurboGears configuration, which may slow down your development process.

More information about the query syntax and other details can be found at the Lucene homepage.

Below are the additions to my project to get TurboLucene integrated (full repository here). So far I’m indexing just one type of object.



From Kid to Genshi – Changing template language

September 25, 2006

Genshi (formerly Markup) is a library that provides an integrated set of components for parsing, generating, and processing HTML, XML or other textual content for output generation on the web.

Replacing Kid with the Genshi XML Template Language makes a lot of sense:

  • Better performance
  • Easier to debug
  • Uses standards like XPath and XInclude
  • TurboGears 1.1 will use Genshi as its default template engine

XInclude alone is a reason to use Genshi. It is an inclusion mechanism that facilitates modularity, and it is a W3C Recommendation.

Migrating to Genshi is very simple. Most things will work without change.

Below is an example of how I use Genshi to iterate over actions and include a template to display action details. I use the same template in other places to avoid duplication.


Testing in TurboGears

September 11, 2006

I finally got time to continue on my GTD application. I wrote unit tests for all controller methods.

My first idea was to include a mock framework so I wouldn’t need to execute external logic in my tests, but then I realized that would be overkill. I’m using an in-memory database (SQLite) and executing the whole application stack (except for JavaScript). All tests run within 10 seconds, which is acceptable.

TurboGears includes Nose, a test discovery and runner tool for unittest. Typing “nosetests” on the command line runs all tests in the project.
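To give a rough idea of what such a test looks like, here is a minimal sketch using only the standard library. The save_project function and the one-table schema are hypothetical stand-ins for a controller method and the real model, not code from my project; the point is only the pattern of creating a fresh in-memory SQLite database for each test.

```python
import sqlite3
import unittest


def save_project(conn, title):
    """Hypothetical stand-in for a controller method that stores a project."""
    cur = conn.execute("INSERT INTO project (title) VALUES (?)", (title,))
    conn.commit()
    return cur.lastrowid


class TestSaveProject(unittest.TestCase):
    def setUp(self):
        # A fresh in-memory database per test keeps tests isolated and fast.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE project (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
        )

    def tearDown(self):
        self.conn.close()

    def test_save_stores_title(self):
        project_id = save_project(self.conn, "Plan holiday")
        row = self.conn.execute(
            "SELECT title FROM project WHERE id = ?", (project_id,)
        ).fetchone()
        self.assertEqual(row[0], "Plan holiday")

    def test_title_is_required(self):
        # The NOT NULL constraint rejects a missing title.
        with self.assertRaises(sqlite3.IntegrityError):
            save_project(self.conn, None)
```

Because the class derives from unittest.TestCase and its name starts with “Test”, a runner like nosetests should pick it up without any extra registration.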

I’ve decided to use AJAX in my application later, and a good way to test the whole stack including JavaScript is to test with Selenium. I’ll leave this for later.

It just works

I have never felt such a joy developing a web application. I get into flow more often. Things just work and I rarely show the ‘what the heck happened’-face.



Simplifying the model with assign_mapper

August 15, 2006

Thanks to all the comments on my previous post, I managed to further simplify the model.

Changes:

  • I don’t have to define any variables for the model classes.
  • The session and metadata are imported from TurboGears.

Same functionality as before, but shorter (model.py at the bottom of the post).

A developer who doesn’t write unit tests will surely write a bunch of hard-to-find bugs in a dynamically typed language like Python (for example, assigning to a misspelled attribute before an update: project.ttle=title). But hey, if it’s not tested, it’s broken.
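Here is a small sketch of why that kind of typo goes unnoticed and how a test catches it. The Project class and both rename functions are made up for illustration and are not the mapped classes from model.py.

```python
import unittest


class Project(object):
    """Minimal stand-in for a model class (illustration only)."""

    def __init__(self, title):
        self.title = title


def rename_buggy(project, title):
    # Typo: Python silently creates a new attribute 'ttle'
    # instead of updating 'title'. No error is raised.
    project.ttle = title


def rename(project, title):
    project.title = title


class TestRename(unittest.TestCase):
    def test_rename_updates_title(self):
        project = Project("old")
        rename(project, "new")
        # With rename_buggy in place of rename, this assertion fails
        # because project.title would still be "old".
        self.assertEqual(project.title, "new")
```

The buggy version runs without complaint, which is exactly the problem: only an assertion on the resulting state reveals that the title never changed.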

Bruce Eckel has a good article for Java developers about dynamic typing: Strong Typing vs. Strong Testing



Switching to SQLAlchemy

August 11, 2006

If you have a simple model, SQLObject may be enough. If you want more flexibility, check out SQLAlchemy.

SQLObject had been enough for me so far, but I stumbled upon a weird issue when doing updates. Once in a while my selects returned old data that had been overwritten by a previous update.

SteveA had the same problem and showed me a workaround for this, but it sounded like too much work, and it was scary that this could happen.

I decided to try out SQLAlchemy. After some hair pulling I got it working (you can see the reason for the pulling at the end of the post).

You can get the latest version by typing

easy_install sqlalchemy

Make sure you have the latest 0.9x version of TurboGears.

The model

Using the model is almost as simple as before. The model definition is more complicated, but that’s something you won’t change very often.

model.py (compare to the SQLObject version; note that I’ve added some columns not shown in the previous model):

from sqlalchemy import *
import cherrypy

# Engine settings are read from the CherryPy configuration.
db = create_engine(cherrypy.config.get('sqlalchemy.dburi'),
                   echo=cherrypy.config.get('sqlalchemy.echo', 0))
session = create_session(bind_to=db)
meta = BoundMetaData(db)

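projects = Table('project', meta,
    Column('id', Integer, primary_key=True),
    Column('title', String(1000), nullable = False),
    # Project.__init__ sets notes, so it needs a column to be persisted.
    Column('notes', String(1000)),
)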
contexts = Table('context', meta,
    Column('id', Integer, primary_key=True),
    Column('title', String(1000), nullable = False),
)
actions = Table('action', meta,
    Column('id', Integer, primary_key=True),
    Column('title', String(1000), nullable = False),
    Column('project_id', Integer, ForeignKey('project.id')),
    Column('context_id', Integer, ForeignKey('context.id')),
    Column('notes', String(1000)),
    Column('priority', Integer, default=0),
    Column('closed', Boolean, default=0),
)
tags = Table('tag', meta,
    Column('id', Integer, primary_key=True),
    Column('title', String(1000), nullable = False),
)
tagaction = Table('tag_action', meta,
    Column('action_id', Integer, ForeignKey('action.id')),
    Column('tag_id', Integer, ForeignKey('tag.id'))
)

# Plain classes; the mapper() calls below attach them to the tables.
class Project(object):
    def __init__(self, title, notes):
        self.title = title
        self.notes = notes

    def __repr__(self):
        return self.title


class Action(object):
    def __init__(self, title, notes, project, context,
                 priority, closed):
        self.title = title
        self.notes = notes
        self.context = context
        self.project = project
        self.priority = priority
        self.closed = closed

    def __repr__(self):
        return self.title


class Tag(object):
    def __init__(self, title):
        self.title = title

    def __repr__(self):
        return self.title


class Context(object):
    def __init__(self, title):
        self.title = title

    def __repr__(self):
        return self.title


mapper(Tag, tags)
actionmapper = mapper(Action, actions, properties = {
    'tags' : relation(Tag, secondary=tagaction)
})
projectmapper = mapper(Project, projects, properties={
    'actions' : relation(Action, backref='project'),
})
contextmapper = mapper(Context, contexts, properties={
    'actions' : relation(Action, backref='context'),
})
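
The 'tags' relation on Action goes through the tag_action table. As a rough illustration of what that secondary table does at the SQL level, here is a sketch that uses the standard-library sqlite3 module directly rather than SQLAlchemy. The table and column names follow the model above; the sample rows and the tags_for_action helper are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE action (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- One row per (action, tag) pair: this is the many-to-many link.
    CREATE TABLE tag_action (
        action_id INTEGER REFERENCES action(id),
        tag_id INTEGER REFERENCES tag(id)
    );
    INSERT INTO action (id, title) VALUES (1, 'Book flight');
    INSERT INTO tag (id, title) VALUES (1, 'travel');
    INSERT INTO tag (id, title) VALUES (2, 'urgent');
    INSERT INTO tag_action (action_id, tag_id) VALUES (1, 1);
    INSERT INTO tag_action (action_id, tag_id) VALUES (1, 2);
""")


def tags_for_action(conn, action_id):
    """Return the tag titles linked to one action, sorted by title."""
    rows = conn.execute(
        "SELECT tag.title FROM tag "
        "JOIN tag_action ON tag_action.tag_id = tag.id "
        "WHERE tag_action.action_id = ? "
        "ORDER BY tag.title",
        (action_id,),
    )
    return [title for (title,) in rows]
```

Roughly speaking, a join like this is what the mapper spares you from writing by hand whenever you read the tags of an action.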

Split the Controller

August 7, 2006

The default TurboGears skeleton project consists of one controller. I believe most projects would benefit from splitting it into sub-controllers. This lets you separate the logic for each area into its own module. There are many ways to do the split. I chose the same method used in the Fast Track project.
The code is now easier to work with.

Calling an action in, for example, the project sub-controller looks like this:

<form action="/project/save" method="post">

The main controller:


...
from subcontrollers.project import ProjectController
from subcontrollers.action import ActionController
from subcontrollers.tag import TagController
from subcontrollers.context import ContextController

class Root(controllers.RootController):
    project = ProjectController()
    action = ActionController()
    context = ContextController()
    tag = TagController()
    ...

Directory structure:
controllers.py
subcontrollers/
-project.py
-action.py
-context.py
-tag.py
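
To show why a URL like /project/save ends up in the sub-controller, here is a simplified sketch of attribute-based URL traversal. It is an illustration only, not CherryPy's or TurboGears' actual dispatcher: the resolve function is hypothetical, the controllers are bare classes, and details such as exposing methods explicitly are left out.

```python
class ProjectController(object):
    def save(self, title=None):
        return "saved project %s" % title


class Root(object):
    # Sub-controllers are plain class attributes on the root controller.
    project = ProjectController()

    def index(self):
        return "home"


def resolve(root, path):
    """Follow each URL segment as an attribute lookup, starting at root."""
    node = root
    for segment in path.strip("/").split("/"):
        if not segment:
            continue
        # Refuse private names and unknown attributes.
        if segment.startswith("_") or not hasattr(node, segment):
            raise LookupError("no handler for %r" % path)
        node = getattr(node, segment)
    if not callable(node):
        # A controller object itself was addressed: fall back to its index.
        node = getattr(node, "index", None)
        if node is None:
            raise LookupError("no handler for %r" % path)
    return node


handler = resolve(Root(), "/project/save")
print(handler(title="Garden"))
```

Each path segment becomes one attribute lookup, so adding a new sub-controller is just a matter of adding one more attribute to Root.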


Creating the model

August 2, 2006

I chose to create this simple model by hand. The code below is amazingly short, yet it is enough to handle CRUD operations for all objects. It also magically creates the intermediate table for the many-to-many relation between Action and Tag (I haven’t tested this yet, though).

I am a big fan of testing. Unit test coverage below 70% gives me the creeps. But in this case, what should I test? I trust generated code. If I had done this with Hibernate or iBatis, I would have written a bunch of test cases for it, simply because I don’t trust the amount of code and configuration lying around. But in this case it feels unnecessary to write any tests for the model. This saves a lot of time. Time will tell if this was a wise thing to do.

model.py:

from sqlobject import *
from datetime import datetime
from turbogears.database import PackageHub
from turbogears.identity.soprovider import \
      TG_User, TG_Group, TG_Permission
hub = PackageHub("turbogtd")
__connection__ = hub

class Project(SQLObject):
    title = UnicodeCol(notNone=True)
    notes = UnicodeCol()
    actions = MultipleJoin("Action")

class Action(SQLObject):
    title = UnicodeCol(notNone=True)
    notes = UnicodeCol()
    project = ForeignKey("Project")
    context = ForeignKey("Context")
    tags = RelatedJoin("Tag")

class Tag(SQLObject):
    title = UnicodeCol(notNone=True, alternateID=True)
    actions = RelatedJoin("Action")

class Context(SQLObject):
    title = UnicodeCol(notNone=True, alternateID=True)
