Full text search with TurboLucene

March 14, 2007

Krys Wilken has done a great job writing TurboLucene, a simple interface on top of PyLucene. PyLucene in turn makes Lucene available to Python. Lucene is a very popular text search engine, used by large systems such as Wikipedia and CNet.
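The core data structure behind a full-text engine like Lucene is an inverted index, which maps each term to the documents that contain it. As a rough illustration of that idea only (this is not Lucene's or TurboLucene's implementation or API, and it does no ranking), here is a toy version in plain Python with a naive tokenizer and AND-only queries; the function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words (a naive analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first term's postings so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


actions = {
    1: "Call the plumber about the kitchen sink",
    2: "Buy a new kitchen lamp",
    3: "Write unit tests for the project controller",
}
index = build_index(actions)
print(sorted(search(index, "kitchen")))          # [1, 2]
print(sorted(search(index, "kitchen plumber")))  # [1]
```

A real engine adds stemming, phrase and field queries, and relevance scoring on top of this lookup, which is why it pays to use Lucene rather than rolling your own.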

The TurboLucene tutorial uses Kid templates and SQLObject. I integrated it into my own project that uses Genshi and SQLAlchemy.

It’s worth mentioning that PyLucene must be installed manually because it is OS-dependent. This makes your application harder to install.

You must also disable auto-reload in your TurboGears configuration, which may slow down development, since you then have to restart the server yourself after changing code.

More information about the query syntax and other details can be found on the Lucene homepage.

Below are the additions I made to my project to integrate TurboLucene (full repository here). So far I’m indexing only one type of object.

Read the rest of this entry »


From Kid to Genshi – Changing template language

September 25, 2006

Genshi (formerly Markup) is a library that provides an integrated set of components for parsing, generating, and processing HTML, XML or other textual content for output generation on the web.

Replacing Kid with Genshi’s XML template language makes a lot of sense:

  • Better performance
  • Easier to debug
  • Uses standards like XPath and XInclude
  • TurboGears 1.1 will use Genshi as its default template engine

XInclude alone is a reason to use Genshi. It is a W3C Recommendation that provides an inclusion mechanism for building documents out of reusable, modular parts.
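To see what XInclude does on its own, independently of Genshi, note that Python’s standard library ships a small XInclude processor in `xml.etree.ElementInclude`. The sketch below uses it (not Genshi) to splice a shared fragment into a page. The file name is hypothetical, and the "files" live in a dictionary instead of on disk so the example is self-contained.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "files" standing in for template files on disk.
FILES = {
    "action_details.xml": "<div class='action'>action details</div>",
}

PAGE = """<html xmlns:xi="http://www.w3.org/2001/XInclude">
  <body>
    <h1>Actions</h1>
    <xi:include href="action_details.xml"/>
  </body>
</html>"""


def loader(href, parse, encoding=None):
    """Return the included resource from FILES instead of reading it from disk."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]  # parse="text" includes are returned as plain strings


root = ET.fromstring(PAGE)
# Replace every xi:include element with the document it points to.
ElementInclude.include(root, loader=loader)
html = ET.tostring(root, encoding="unicode")
print(html)
```

The same fragment can be included from any number of pages, which is the kind of reuse the Genshi example below relies on.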

Migrating to Genshi is very simple. Most things will work without change.

Below is an example of how I use Genshi to iterate over actions and include a template that displays action details. I use the same template in other places to avoid duplication.
Read the rest of this entry »


Testing in TurboGears

September 11, 2006

I finally found time to continue working on my GTD application, and wrote unit tests for all controller methods.

My first idea was to include a mock framework so I wouldn’t need to execute external logic in my tests, but then I realized that would be overkill. I’m using an in-memory database (SQLite) and executing the whole application stack (except for JavaScript). All tests run within 10 seconds, which is acceptable.
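A minimal sketch of the in-memory approach, using only the standard library’s `sqlite3` and `unittest` rather than TurboGears: each test gets a fresh in-memory database, so tests are fast and never touch real data. The `action` table and the `add_action`/`open_actions` helpers are hypothetical stand-ins for the application’s model and controller logic.

```python
import sqlite3
import unittest

SCHEMA = """
CREATE TABLE action (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    closed INTEGER NOT NULL DEFAULT 0
);
"""


def add_action(conn, title):
    """Hypothetical stand-in for a controller method that saves an action."""
    cur = conn.execute("INSERT INTO action (title) VALUES (?)", (title,))
    conn.commit()
    return cur.lastrowid


def open_actions(conn):
    """Return the titles of actions that are not closed, in insertion order."""
    rows = conn.execute("SELECT title FROM action WHERE closed = 0 ORDER BY id")
    return [title for (title,) in rows]


class ActionTest(unittest.TestCase):
    def setUp(self):
        # ":memory:" gives every test its own empty, throwaway database.
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript(SCHEMA)

    def tearDown(self):
        self.conn.close()

    def test_new_action_is_open(self):
        add_action(self.conn, "Call the dentist")
        self.assertEqual(open_actions(self.conn), ["Call the dentist"])

    def test_closed_action_is_hidden(self):
        action_id = add_action(self.conn, "Pay rent")
        self.conn.execute("UPDATE action SET closed = 1 WHERE id = ?", (action_id,))
        self.assertEqual(open_actions(self.conn), [])


if __name__ == "__main__":
    unittest.main()
```

Saved in a module named like `test_actions.py`, a `TestCase` such as this should also be picked up by Nose’s test discovery.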

TurboGears includes Nose, which discovers and runs unittest-based tests. Typing “nosetests” on the command line runs all tests in the project.

I plan to add AJAX to my application eventually, and a good way to test the whole stack, including JavaScript, is with Selenium. I’ll leave that for later.

It just works

I have never felt such joy developing a web application. I get into flow more often. Things just work, and I rarely make my ‘what the heck happened’ face.

Read the rest of this entry »


Simplifying the model with assign_mapper

August 15, 2006

Thanks to all the comments on my previous post, I managed to simplify the model further.

Changes:

  • I don’t have to define any variables for the model classes.
  • The session and metadata are imported from TurboGears.

Same functionality as before, but shorter (model.py at the bottom of the post).

A developer who doesn’t write unit tests in a dynamically typed language like Python is sure to introduce hard-to-find bugs (for example, assigning to a misspelled attribute before an update: project.ttle = title). But hey, if it’s not tested, it’s broken.
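To make the `ttle` example concrete: Python accepts an assignment to a misspelled attribute without complaint, so only a check on the result notices the bug. The `Project` class and the two rename functions below are hypothetical and kept free of any ORM.

```python
import unittest


class Project(object):
    def __init__(self, title):
        self.title = title


def rename_buggy(project, title):
    # Typo: this silently creates a new attribute 'ttle' instead of updating 'title'.
    project.ttle = title


def rename(project, title):
    project.title = title


class RenameTest(unittest.TestCase):
    def test_rename_updates_title(self):
        project = Project("Old title")
        rename(project, "New title")
        self.assertEqual(project.title, "New title")


# The buggy version raises no exception; the title simply never changes.
project = Project("Old title")
rename_buggy(project, "New title")
print(project.title)  # Old title

if __name__ == "__main__":
    unittest.main()
```

Pointing `test_rename_updates_title` at `rename_buggy` instead makes it fail, which is exactly the kind of bug that would otherwise slip through.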

Bruce Eckel has a good article for Java developers about weak typing: Strong Typing vs. Strong Testing

Read the rest of this entry »


Switching to SQLAlchemy

August 11, 2006

If you have a simple model, SQLObject may be enough. If you want more flexibility, check out SQLAlchemy.

So far SQLObject had been enough for me, but I ran into a weird issue with updates: once in a while, my selects returned stale data that had already been overwritten by a previous update.

SteveA had the same problem and showed me a workaround for this, but it sounded like too much work, and it was scary that this could happen.

I decided to try out SQLAlchemy. After some hair pulling I got it working (you can see the reason for the pulling at the end of the post).

You can get the latest version by typing

easy_install sqlalchemy

Make sure you have the latest 0.9x version of TurboGears.

The model

Using the model is almost as simple as before. The model definition is more complicated, but that’s something you won’t change very often.

model.py (compare it to the SQLObject version; note that I’ve added some columns not shown in the previous model):

from sqlalchemy import *
import cherrypy

db = create_engine(cherrypy.config.get('sqlalchemy.dburi'),
                   echo=cherrypy.config.get('sqlalchemy.echo', 0))
session = create_session(bind_to=db)
meta = BoundMetaData(db)

projects = Table('project', meta,
    Column('id', Integer, primary_key=True),
    Column('title', String(1000), nullable = False),
    Column('notes', String(1000)),
)
contexts = Table('context', meta,
    Column('id', Integer, primary_key=True),
    Column('title', String(1000), nullable = False),
)
actions = Table('action', meta,
    Column('id', Integer, primary_key=True),
    Column('title', String(1000), nullable = False),
    Column('project_id', Integer, ForeignKey('project.id')),
    Column('context_id', Integer, ForeignKey('context.id')),
    Column('notes', String(1000)),
    Column('priority', Integer, default=0),
    Column('closed', Boolean, default=0),
)
tags = Table('tag', meta,
    Column('id', Integer, primary_key=True),
    Column('title', String(1000), nullable = False),
)
tagaction = Table('tag_action', meta,
    Column('action_id', Integer, ForeignKey('action.id')),
    Column('tag_id', Integer, ForeignKey('tag.id'))
)

class Project(object):
    def __init__(self, title, notes):
        self.title = title
        self.notes = notes

    def __repr__(self):
        return self.title

class Action(object):
    def __init__(self, title, notes, project, context,
                 priority, closed):
        self.title = title
        self.notes = notes
        self.context = context
        self.project = project
        self.priority = priority
        self.closed = closed

    def __repr__(self):
        return self.title

class Tag(object):
    def __init__(self, title):
        self.title = title

    def __repr__(self):
        return self.title

class Context(object):
    def __init__(self, title):
        self.title = title

    def __repr__(self):
        return self.title

mapper(Tag, tags)
actionmapper = mapper(Action, actions, properties = {
    'tags' : relation(Tag, secondary=tagaction)
})
projectmapper = mapper(Project, projects, properties={
    'actions' : relation(Action, backref='project'),
})
contextmapper = mapper(Context, contexts, properties={
    'actions' : relation(Action, backref='context'),
})
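The `tag_action` table is a plain association table: each row links one action to one tag, and `relation(Tag, secondary=tagaction)` lets SQLAlchemy join through it for you. A rough sketch of the equivalent SQL, using only the standard library’s `sqlite3` (table names follow the model above; the sample rows are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE action (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE tag (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- One row per (action, tag) pair: this table is the many-to-many link.
CREATE TABLE tag_action (
    action_id INTEGER REFERENCES action(id),
    tag_id INTEGER REFERENCES tag(id)
);
INSERT INTO action (id, title) VALUES (1, 'Fix bike'), (2, 'Buy milk');
INSERT INTO tag (id, title) VALUES (1, 'errand'), (2, 'outdoor');
INSERT INTO tag_action (action_id, tag_id) VALUES (1, 1), (1, 2), (2, 1);
""")


def tags_for_action(conn, action_id):
    """Return the tag titles of one action by joining through the association table."""
    rows = conn.execute(
        """SELECT tag.title FROM tag
           JOIN tag_action ON tag_action.tag_id = tag.id
           WHERE tag_action.action_id = ?
           ORDER BY tag.title""",
        (action_id,),
    )
    return [title for (title,) in rows]


def actions_for_tag(conn, tag_title):
    """Walk the same table in the other direction: all actions carrying a tag."""
    rows = conn.execute(
        """SELECT action.title FROM action
           JOIN tag_action ON tag_action.action_id = action.id
           JOIN tag ON tag.id = tag_action.tag_id
           WHERE tag.title = ?
           ORDER BY action.title""",
        (tag_title,),
    )
    return [title for (title,) in rows]


print(tags_for_action(conn, 1))         # ['errand', 'outdoor']
print(actions_for_tag(conn, "errand"))  # ['Buy milk', 'Fix bike']
```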
 Read the rest of this entry »

Split the Controller

August 7, 2006

The default TurboGears skeleton project consists of one controller. I believe most projects would benefit from splitting it into sub-controllers, which lets you put each area of logic in its own module. There are many ways to do the split; I chose the same method used in the Fast Track project.
The code is now easier to work with.

Calling an action in, for example, the project sub-controller looks like this:

<form action="/project/save" method="post">

The main controller:


...
from subcontrollers.project import ProjectController
from subcontrollers.action import ActionController
from subcontrollers.tag import TagController
from subcontrollers.context import ContextController

class Root(controllers.RootController):
    project = ProjectController()
    action = ActionController()
    context = ContextController()
    tag = TagController()
    ...

Directory structure:
controllers.py
subcontrollers/
-__init__.py
-project.py
-action.py
-context.py
-tag.py
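CherryPy, which TurboGears 1.x builds on, maps a URL such as /project/save onto the controller object tree: it looks up `root.project` and then calls its `save` method, which must be marked as exposed. A much-simplified sketch of that lookup in plain Python follows; the `expose` and `dispatch` functions here are illustrative assumptions, not CherryPy’s or TurboGears’ real code (which also handles `index` methods, default handlers and much more).

```python
def expose(func):
    """Mark a method as reachable from a URL (mirrors the idea of CherryPy's 'exposed' flag)."""
    func.exposed = True
    return func


class ProjectController(object):
    @expose
    def save(self, title=None):
        return "saved project %s" % title

    def helper(self):
        # Not exposed, so it must not be reachable from a URL.
        return "internal"


class Root(object):
    project = ProjectController()


def dispatch(root, path, **params):
    """Resolve '/project/save' to root.project.save and call it with the request params."""
    obj = root
    for segment in path.strip("/").split("/"):
        obj = getattr(obj, segment, None)
        if obj is None:
            raise LookupError("404: %s" % path)
    # Bound methods forward attribute lookups to the function, so 'exposed' is visible here.
    if not getattr(obj, "exposed", False):
        raise LookupError("404: %s" % path)
    return obj(**params)


print(dispatch(Root(), "/project/save", title="GTD"))  # saved project GTD
```

Because every sub-controller is just an attribute of `Root`, adding a new section of the site is a matter of adding one line to the main controller.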


Creating the model

August 2, 2006

I chose to create this simple model by hand. The code below is amazingly short, yet it is enough to handle CRUD operations for all objects. It also takes care of the intermediate table for the many-to-many relation between Action and Tag behind the scenes (I haven’t tested this yet, though).

I am a big fan of testing. Unit test coverage below 70% gives me the creeps. But in this case, what should I test? I trust generated code. If I had done this with Hibernate or iBatis, I would have written a bunch of test cases for it, simply because I don’t trust the amount of code and configuration lying around. Here, though, it feels unnecessary to write any tests for the model, which saves a lot of time. Time will tell if this was a wise decision.

model.py:

from sqlobject import *
from datetime import datetime
from turbogears.database import PackageHub
from turbogears.identity.soprovider import (TG_User, TG_Group,
                                            TG_Permission)
hub = PackageHub("turbogtd")
__connection__ = hub

class Project(SQLObject):
    title = UnicodeCol(notNone=True)
    notes = UnicodeCol()
    actions = MultipleJoin("Action")

class Action(SQLObject):
    title = UnicodeCol(notNone=True)
    notes = UnicodeCol()
    project = ForeignKey("Project")
    context = ForeignKey("Context")
    tags = RelatedJoin("Tag")

class Tag(SQLObject):
    title = UnicodeCol(notNone=True, alternateID=True)
    actions = RelatedJoin("Action")

class Context(SQLObject):
    title = UnicodeCol(notNone=True, alternateID=True)

My first TurboGears project

July 25, 2006

My first TurboGears project will be a web-based GTD application. GTD stands for Getting Things Done, and is a method for managing time and commitment.

Of all my ideas, I chose this one because of its interesting model, which will contain both one-to-many and many-to-many relations. It also exercises all the basic database operations (CRUD).
I also hope to learn some advanced Kid templating, since the UI will consist of many panels: header, footer, left, right and middle.

The Model

The main object is the Project, which has Actions (tasks). When all of its actions are done, the project is complete. Each Action belongs to a Context, which describes where the Action can be done (home, computer, phone, work, etc.).
Actions are usually listed by Context or Project.

Actions also have Tags. They help with finding things, and give me a chance to try out many-to-many relations.
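Before any ORM is involved, the relations described above can be sketched in plain Python. This is a minimal sketch assuming Python 3.7+ dataclasses; the field names and the `is_complete` helper are illustrative, not part of the project’s code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Context:
    title: str  # where the action can be done, e.g. "phone"


@dataclass
class Tag:
    title: str


@dataclass
class Action:
    title: str
    context: Optional[Context] = None
    # Many-to-many: the same Tag object can appear in many actions' lists.
    tags: List[Tag] = field(default_factory=list)
    closed: bool = False


@dataclass
class Project:
    title: str
    actions: List[Action] = field(default_factory=list)  # one-to-many

    def is_complete(self):
        """A project is complete when all of its actions are done.

        Note: with all(), a project with no actions counts as complete.
        """
        return all(action.closed for action in self.actions)


phone = Context("phone")
errand = Tag("errand")
call = Action("Call the bank", context=phone, tags=[errand])
buy = Action("Buy stamps", tags=[errand])
project = Project("Sort out finances", actions=[call, buy])

call.closed = True
print(project.is_complete())  # False
buy.closed = True
print(project.is_complete())  # True
```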

I will add more functionality to the model later, for example a history view of changes.

Here’s a simplified diagram:

[Diagram: starter model]
The User Interface

I have previously used a GTD application called d3. Its interface is so good that I will use it as a base for my project.


Model Designer and Catwalk

July 18, 2006

TurboGears comes with a collection of web-based tools bundled into something called Toolbox. I checked out Model Designer and Catwalk.

[Screenshot: Toolbox]


Model Designer

The Model Designer lets you create your model through a web browser. It also comes with a diagram view of the tables and their relations. It doesn’t show any details about the type of relations, which would have been useful. You can also generate the model source code and the tables.

The problem with the tool is that if you manually change your model, there’s no way to continue using Model Designer, since your real model is out of sync with the state of the tool.

Wouldn’t it be great if it read the metadata from the database or the model source code, so you could keep using it even after making a couple of changes by hand?
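Reading the model back from the database is not far-fetched. As an illustration of the idea only (not something Model Designer does), SQLite exposes table and foreign-key metadata through PRAGMA statements, which Python’s `sqlite3` module can query directly. The schema is a cut-down, hypothetical version of the project and action tables, and `describe` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE action (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    project_id INTEGER REFERENCES project(id)
);
""")


def describe(conn):
    """Return {table: {"columns": [...], "foreign_keys": [...]}} read from the database itself."""
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # Table names come from sqlite_master, so interpolating them here is safe.
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [(row[1], row[2], bool(row[3]))
                   for row in conn.execute("PRAGMA table_info(%s)" % table)]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        foreign_keys = [(row[3], row[2], row[4])
                        for row in conn.execute("PRAGMA foreign_key_list(%s)" % table)]
        schema[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return schema


for table, info in describe(conn).items():
    print(table, info)
```

A designer tool could rebuild its diagram, including the direction of each relation, from exactly this kind of information.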

[Screenshot: Model Designer]


Catwalk

Catwalk is a great tool for testing your model. It allows you to do CRUD operations against your database. Very useful. The only glitch I found was that no warning was displayed when a database operation failed (for example, when I left a ‘not null’ field empty). The only way to find out is to watch the console you started Toolbox from.
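The error Catwalk swallowed is reported by the database itself; a tool only has to catch it and show it to the user. A small standard-library illustration with `sqlite3` (the `context` table mirrors the model; `save_context` and its message format are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE context (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")


def save_context(conn, title):
    """Insert a context and return a message for the user instead of failing silently."""
    try:
        with conn:  # commits on success, rolls back if the insert raises
            conn.execute("INSERT INTO context (title) VALUES (?)", (title,))
    except sqlite3.IntegrityError as exc:
        # Violating the NOT NULL constraint lands here with SQLite's own explanation.
        return "Could not save: %s" % exc
    return "Saved"


print(save_context(conn, "phone"))
print(save_context(conn, None))
```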

Overall the modeling was a joy. You get quick results, and you can experiment without any big hassles.
[Screenshot: Catwalk]

TurboGears 1.0 will include another great thing called ‘fastdata’. It will automatically create web pages for your model. This will save a lot of time.


Looking for documentation

July 12, 2006

Here’s a list of documentation that helped me get started with Python and TurboGears:

Python:

A Byte of Python
Python 2.4 Quick Reference

TurboGears:

Official documentation – Preview documentation for 1.0
A request’s journey through the TG stack – Simple and beautiful
The CherryPy Documentation – the web development framework
Kid documentation – template language
SQLObject – object-relational mapper

The TurboGears discussion group isn’t very active. It rarely gets more than ten new topics per day. The freenode channel #turbogears has about 50 users logged in at 13:00 (UTC+3). The IRC log shows that there isn’t much activity there either: 81 lines for one day.

Thanks to Trac it is easy to follow the status of the project and see who is doing what. Trac timeline is a good way to follow the activity of the project.

Conclusion

There is certainly enough documentation for a beginner to get started. I’ve heard complaints that there is too little of it, but compared to many other open source libraries, I feel overwhelmed with material. There are also some online videos to learn from.

The activity on the forum and the channel was surprisingly low for a project with so much buzz. I was hoping for more discussion. You don’t always get an answer in the channel either, which is not very nice for a newbie.

There is an upcoming book called Rapid Web Applications with TurboGears: Using Python to Create Ajax-Powered Sites.