A basic search box with Django

29 November 2009

To provide a simple search box for this blog, I had been using a search engine's custom search facility. This approach had a number of shortcomings.

Firstly, new posts did not show up in the search results, because a post only gets into the search engine's index when the search engine bothers to send its indexing scripts to the site.

Secondly, the search engine does not distinguish between the content of a post and the general template of the site.

Thirdly, the results page from the search engine was cluttered and rather unhelpful. What I really wanted was a list of posts that contain the search term, not the same post several times in different versions (normal view, category view, RSS feed, etc).

The shortcomings go on and on. Suffice it to say that I thought I should really write a proper search form. This post explains how I did it.

This site is written using the Django web framework, and I open sourced the code as the Soturi project. All the posts are held as a simple text field in an SQL database, so using a search engine library like Lucene would be somewhat overkill. We can just use something like:

from blog.models import Post
Post.objects.filter(body__contains='Django')

This gives the posts containing the word Django, ordered by the default ordering, which is last post date. We can of course order the results by anything we like, but this order is probably what we want anyway.

Website searches tend to be quite inexact, so we should at least support case-insensitive searches, i.e. 'django' should match 'Django'. So we use the icontains lookup, which provides case-insensitive results:

from blog.models import Post
Post.objects.filter(body__icontains='django')

However, we don't want to search by just one keyword; we want to be able to search by lots of keywords. We also want to be able to treat "something in double quotes" as a single keyword.

So the first thing we need is a function that splits the query into keywords:

def split_query_into_keywords(query):
    """Split the query into keywords.

    Words inside a pair of double quotes are kept together
    and used as one keyword."""
    keywords = []
    # Deal with quoted keywords
    while '"' in query:
        first_quote = query.find('"')
        second_quote = query.find('"', first_quote + 1)
        if second_quote == -1:
            # Unmatched quote: drop it, otherwise this loop never ends
            query = query.replace('"', ' ')
            break
        quoted_keywords = query[first_quote:second_quote + 1]
        keyword = quoted_keywords.strip('"').strip()
        if keyword:
            # Skip empty quotes, which would match every post
            keywords.append(keyword)
        query = query.replace(quoted_keywords, ' ')
    # Split the rest by spaces
    keywords.extend(query.split())
    return keywords

MYQUERY = """Django form "aggregated values" """
split_query_into_keywords(MYQUERY)
# ['aggregated values', 'Django', 'form']
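If you prefer the keywords to come back in the order they were typed, a regular expression can do the same job. This is only a sketch of an alternative, not what this site runs: the name split_query_with_regex and the pattern are my own choices for illustration. An unmatched quote is simply dropped, and empty quotes are ignored.

```python
import re

# Hypothetical alternative splitter, not the function used on this site.
# First alternative: text inside a pair of double quotes.
# Second alternative: any run of non-space characters.
_KEYWORD_RE = re.compile(r'"([^"]*)"|(\S+)')


def split_query_with_regex(query):
    """Split the query into keywords, keeping quoted phrases together."""
    keywords = []
    for quoted, bare in _KEYWORD_RE.findall(query):
        # Exactly one of the two groups is non-empty for each match;
        # a stray unmatched quote ends up in `bare` and is stripped off.
        keyword = (quoted or bare).strip().strip('"')
        if keyword:
            keywords.append(keyword)
    return keywords


MYQUERY = """Django form "aggregated values" """
keywords = split_query_with_regex(MYQUERY)
```

The trade-off is readability: the loop-based version above is easier to follow, while the regex version is shorter and keeps the typed order.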

Django querysets are lazy, i.e. no SQL is actually run until the results are used, which forces the queryset to be evaluated. Therefore we can chain as many filters together as we like, and Django should only access the database once, when the final queryset is evaluated.

from blog.models import Post
def search_for_keywords(keywords):
    """Make a search that contains all of the keywords."""
    posts = Post.objects.all()
    for keyword in keywords:
        posts = posts.filter(body__icontains=keyword)
    return posts

MYQUERY = """Django form "aggregated values" """
keywords = split_query_into_keywords(MYQUERY)
posts = search_for_keywords(keywords)
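To check what the chained filters are asking for without a database, here is a rough pure-Python sketch of the same "must contain every keyword" rule. The names matches_all_keywords, search_bodies and SAMPLE_BODIES are made up for illustration, and comparing lower-cased strings only approximates icontains, whose exact behaviour depends on the database backend.

```python
def matches_all_keywords(body, keywords):
    """Return True if every keyword occurs in body, ignoring case."""
    lowered = body.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


def search_bodies(bodies, keywords):
    """Keep only the bodies that contain all of the keywords."""
    return [body for body in bodies if matches_all_keywords(body, keywords)]


# Made-up post bodies, standing in for the body field of Post
SAMPLE_BODIES = [
    'A Django form that shows aggregated values',
    'Django forms for beginners',
    'Aggregated values in plain SQL',
]

results = search_bodies(SAMPLE_BODIES, ['django', 'form', 'aggregated values'])
```

Note that an empty keyword list matches everything, just as search_for_keywords returns Post.objects.all() untouched when there are no keywords to filter on.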

One could of course also search the other fields such as title, author and comments, then combine the results together. However, this is good enough for a simple search.

The rest of the search is just a basic Django form, as explained in the Django forms documentation. That is, we have the form class:

from django import forms
class SearchForm(forms.Form):
    """Search posts by keywords"""
    keywords = forms.CharField(max_length=100)

For processing the form, I wrote something like this in the view:

# Inside the view function, where `request` is available
if request.method == 'POST':
    form = SearchForm(request.POST)
    if form.is_valid():
        keywords = form.cleaned_data['keywords']
        keyword_list = split_query_into_keywords(keywords)
        posts = search_for_keywords(keyword_list)
        if posts:
            pass  # Show the results

Lastly, I put the HTML form tags in the template and a line into urls.py, and that was about it.

You could create a new template for the search results. I personally didn't bother; I just reused the list of posts which already existed.

Merry coding!

About

Hello, my name is Zeth, I'll be your host here.

Command Line Warriors is about taking control of your own technology. It looks at our experiences of computing, especially using GNU/Linux, the Python programming language, the command-line and issues such as techno-ethics, best practices and whatever is cool now. If you take control of your technology then you are a Warrior too!

This site is your site too which means that you can contribute and get involved. You can leave comments using the facility provided. For me, the comments and discussions are by far the best part of the site. So please do have your say!
