A basic search box with Django

29 November 2009

To provide a simple search box for this blog, I had been using a search engine's custom search facility. This approach had a number of shortcomings.

Firstly, new posts were missing from the search results, because a post only enters the search engine's index when the search engine bothers to send its crawler to the site.

Secondly, the search engines do not distinguish between the content of the post and the general template of the site.

Thirdly, the results page from the search engine was cluttered and rather unhelpful. What I really wanted was a list of posts that contain the search terms, not the same post several times in different versions (normal view, category view, RSS feed, etc.).

The shortcomings go on and on; suffice it to say that I thought I should really write a proper search form. This post explains how I did it.

This site is written using the Django web framework, and I open sourced the code as the Soturi project. All the posts are held in a simple text field in an SQL database, so using a search engine library like Lucene would be somewhat overkill. We can just use something like:

from blog.models import Post
Post.objects.filter(body__contains='Django')

This gives the posts containing the word Django, ordered by the default ordering, which is last post date. We can of course order the results by anything we like, but this order is probably what we want anyway.

Website searches tend to be quite inexact, so we should at least support case-insensitive searches, so that 'django' will match 'Django'. For this we use the icontains lookup:

from blog.models import Post
Post.objects.filter(body__icontains='django')

However, we don't want to search by just one keyword; we want to be able to search by several keywords at once. We also want to treat "something in double quotes" as a single keyword.

So the first thing we need is a function that splits the query into keywords:

def split_query_into_keywords(query):
    """Split the query into keywords.

    Words inside double quotes are kept together as one keyword.
    """
    keywords = []
    # Deal with quoted keywords
    while '"' in query:
        first_quote = query.find('"')
        second_quote = query.find('"', first_quote + 1)
        if second_quote == -1:
            # Unmatched quote: discard it, otherwise this loop never ends
            query = query.replace('"', ' ')
            break
        quoted_keywords = query[first_quote:second_quote + 1]
        keyword = quoted_keywords.strip('"')
        if keyword.strip():
            # Skip empty pairs of quotes
            keywords.append(keyword)
        query = query.replace(quoted_keywords, ' ')
    # Split the rest by spaces
    keywords.extend(query.split())
    return keywords

MYQUERY = """Django form "aggregated values" """
split_query_into_keywords(MYQUERY)
# ['aggregated values', 'Django', 'form']
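
As an alternative way to do the same splitting, a regular expression can pick out quoted phrases and bare words in one pass. The following is a minimal sketch rather than the code this site uses; the name split_query_with_regex is made up for illustration. Unlike the function above, it returns the keywords in the order they appear in the query.

```python
import re

# A quoted phrase (captured without its quotes) or a run of non-space characters.
KEYWORD_PATTERN = re.compile(r'"([^"]*)"|(\S+)')


def split_query_with_regex(query):
    """Split a query into keywords, keeping "quoted phrases" together.

    Hypothetical alternative to split_query_into_keywords; keywords
    come back in the order they appear in the query.
    """
    keywords = []
    for phrase, word in KEYWORD_PATTERN.findall(query):
        # At most one of the two groups is non-empty for each match
        # (both are empty only for an empty pair of quotes).
        keyword = phrase if phrase else word.strip('"')
        if keyword.strip():
            keywords.append(keyword)
    return keywords


print(split_query_with_regex('Django form "aggregated values" '))
```

This prints ['Django', 'form', 'aggregated values']. A stray, unmatched quote is simply stripped from the word it is attached to.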

Django queries are lazy: they do not actually run until the results are used, which forces the query to be evaluated. Therefore we can chain filters together, and Django should only access the database once, when the results are finally needed.

from blog.models import Post
def search_for_keywords(keywords):
    """Make a search that contains all of the keywords."""
    posts = Post.objects.all()
    for keyword in keywords:
        posts = posts.filter(body__icontains=keyword)
    return posts

MYQUERY = """Django form "aggregated values" """
keywords = split_query_into_keywords(MYQUERY)
posts = search_for_keywords(keywords)
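
The laziness can be imitated in plain Python with generators, which may help to picture what happens. This is only an analogy, not how Django works internally: Django combines the filters into one SQL query, whereas the sketch below chains generators over an in-memory list. The names filter_icontains, search_texts and tracked, and the sample POSTS, are invented for illustration.

```python
def filter_icontains(texts, keyword):
    """Lazily yield the texts that contain keyword, ignoring case."""
    needle = keyword.lower()
    return (text for text in texts if needle in text.lower())


def search_texts(texts, keywords):
    """Chain one lazy filter per keyword; nothing is evaluated here."""
    results = iter(texts)
    for keyword in keywords:
        results = filter_icontains(results, keyword)
    return results


def tracked(texts, examined):
    """Yield texts one by one, recording each one that is actually read."""
    for text in texts:
        examined.append(text)
        yield text


POSTS = [
    "Django form handling with aggregated values",
    "A Django form without aggregates",
    "Notes on WSGI",
]

examined = []
matches = search_texts(tracked(POSTS, examined), ["django", "form", "aggregated values"])
print(len(examined))   # the filters are chained but nothing has been read yet
found = list(matches)  # iterating forces the evaluation
print(len(examined), found)
```

Building the chain reads no posts at all; only the call to list() walks through them, once, applying every filter on the way.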

One could of course also search the other fields such as title, author and comments, then combine the results together. However, this is good enough for a simple search.
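
As a rough illustration of searching several fields, the sketch below requires every keyword to appear in at least one of the chosen fields. It works on plain dictionaries rather than the database; the field names, the SAMPLE_POSTS data and the function names are all assumptions made for the example. In Django itself, the natural tool for OR-ing lookups across fields is the Q object from django.db.models.

```python
def post_matches(post, keywords, fields=("title", "body")):
    """Return True if every keyword occurs in at least one field, ignoring case."""
    haystacks = [post.get(field, "").lower() for field in fields]
    return all(
        any(keyword.lower() in haystack for haystack in haystacks)
        for keyword in keywords
    )


def search_posts(posts, keywords, fields=("title", "body")):
    """Return the posts matching all keywords across the given fields."""
    return [post for post in posts if post_matches(post, keywords, fields)]


# Made-up sample data standing in for rows of the Post table.
SAMPLE_POSTS = [
    {"title": "A basic search box with Django", "body": "Chaining icontains filters."},
    {"title": "Email Syntax Check in Python", "body": "Validating addresses."},
]

results = search_posts(SAMPLE_POSTS, ["django", "icontains"])
print([post["title"] for post in results])
```

Here the first post matches because 'django' is found in its title and 'icontains' in its body.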

The rest of the search is just a basic Django form, as explained in the Django forms documentation. That is, we have the form class:

from django import forms
class SearchForm(forms.Form):
    """Search posts by keywords"""
    keywords = forms.CharField(max_length=100)

For processing the form, I wrote something like this in the view:

# Inside the view function, where request is the incoming request
if request.method == 'POST':
    form = SearchForm(request.POST)
    if form.is_valid():
        keywords = form.cleaned_data['keywords']
        keyword_list = split_query_into_keywords(keywords)
        posts = search_for_keywords(keyword_list)
        if posts:
            # Show the results
            ...

Lastly, I put the HTML form tags in the template and added a line to urls.py, and that was about it.

You could create a new template for the search results. I personally didn't bother; I just reused the list of posts which already existed.

Merry coding!

About

Hello, my name is Zeth, I'll be your host here.

Command Line Warriors is about taking control of your own technology. It looks at our experiences of computing, especially using GNU/Linux, the Python programming language and the command line, and at issues such as techno-ethics, best practices and whatever is cool now. If you take control of your technology then you are a Warrior too!

This site is your site too which means that you can contribute and get involved. You can leave comments using the facility provided. For me, the comments and discussions are by far the best part of the site. So please do have your say!
