How not to program WSGI

12 May 2008

or how not to serve robots.txt with PyBlosxom

So as you may have noticed, I moved this site from PyBlosxom to Django, which depending on your perspective is a fabulous thing to do or is tantamount to treason on the high seas. I will explain more about that later.

Old links to the site should, in the main, still work, as I have done some regular expression jujitsu which should send everyone to where they were supposed to be going.

However, some posts and comments will have their formatting up the creek. So I want the old version of the site to be available (at archive.commandline.org.uk) for a while longer.

Because the archived version is deprecated and on the way out, I do not want the search engines to index it. Therefore I needed to make a robots.txt file for that subdomain excluding them from indexing it.
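
For reference, a robots.txt that keeps every well-behaved crawler out is only two lines. The sketch below, written for Python 3, uses the standard library's urllib.robotparser to check that such a file really does deny everything. The bot name and the page path in the URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Deny-all rules: every user agent, every path.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" and the path are hypothetical; only the subdomain is real.
allowed = parser.can_fetch(
    "ExampleBot", "http://archive.commandline.org.uk/some/old/post"
)
print(allowed)
```

If the rules are right, no crawler that honours robots.txt should be allowed to fetch any page under the subdomain.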

The last version of this site, like many dynamic sites, is composed of a number of layers, part of which was a lot of my own nonsense code doing various things. Ignoring that, when a request for a page came in it would go to the WSGI layer, which would then pass the request on to PyBlosxom, which was at the bottom of it all doing the hard work.

To deploy it properly, one would normally put Apache at the front as well, but I never got around to that. In theory this is a bad thing to do. But in practice it worked really well without the huge and complicated server that is Apache in the mix. It actually ran fine for a year without stopping, and blazing fast too; if it also confused a few comment spam bots then all the merrier.

So I tried putting Apache into the mix so I could use a Location directive to direct /robots.txt to somewhere with the robots.txt file, but no joy: this would have required doing a lot of what I never got around to before.

So I then looked into how the test server was deploying the site, thinking that I could do some kind of smart regular-expression URL matching like in Django or Pylons. But nope.

Hack for the win

The next step down is PyBlosxom, so I looked, on the off chance, in Pyblosxom/pyblosxom.py and saw the following:

def __call__(self, env, start_response):
    """
    Runs the WSGI app.
    """
    # ensure that PATH_INFO exists. a few plugins break if this is
    # missing.
    if "PATH_INFO" not in env:
        env["PATH_INFO"] = ""

    p = PyBlosxom(self.config, env)
    p.run()

    pyresponse = p.getResponse()
    start_response(pyresponse.status, list(pyresponse.headers.items()))
    pyresponse.seek(0)
    return [pyresponse.read()]

Bingo! As soon as I saw it, I just somehow, on auto pilot, typed in the following lines before the line p = PyBlosxom(self.config, env):

if env["PATH_INFO"] == "/robots.txt":
    start_response('200 OK', [('Content-type','text/plain')])
    return ["""User-agent: * \nDisallow: /"""]

And unbelievably it worked. What I had subconsciously done was to see that we have some kind of string referred to by env["PATH_INFO"]. Then further on we have an object called start_response which is being passed a status and some headers. Then we are returning the response.
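
To make that pattern concrete, here is a minimal sketch of the WSGI calling convention, written for Python 3 (where the body must be bytes, unlike the Python 2 string used in the hack above). The names robots_app and call_app are made up for this example and are not part of PyBlosxom.

```python
def robots_app(environ, start_response):
    """A complete WSGI application: it always answers with a deny-all robots.txt."""
    body = b"User-agent: *\nDisallow: /\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)  # status line and headers go here
    return [body]                      # the body is an iterable of bytes


def call_app(app, path):
    """Call a WSGI app by hand, roughly as a server would, and collect the result."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path}
    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_app(robots_app, "/robots.txt")
print(status)
print(body.decode("ascii"))
```

The server supplies environ and start_response; the application reads the first, calls the second, and returns the body. That is all the hack above relies on.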

I was kidding around more than anything so I just replaced everything I didn't know about with reasonable looking constants (you will know these well if you have ever done Python CGI programming).

I am sure there are millions of far better ways to serve robots.txt with PyBlosxom. But this hack works for me until I no longer need the old site anymore.
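
One of those better ways, sketched here under assumptions, is to leave pyblosxom.py alone and wrap the WSGI application in a small piece of middleware instead. This is Python 3 code, and blog_app is a hypothetical stand-in for whatever WSGI callable PyBlosxom exposes. robots_middleware is likewise a name invented for this example.

```python
ROBOTS_BODY = b"User-agent: *\nDisallow: /\n"


def robots_middleware(app, body=ROBOTS_BODY):
    """Wrap a WSGI app so /robots.txt is answered before the app is reached."""

    def wrapped(environ, start_response):
        if environ.get("PATH_INFO", "") == "/robots.txt":
            start_response("200 OK", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Every other path falls through to the wrapped application untouched.
        return app(environ, start_response)

    return wrapped


def blog_app(environ, start_response):
    """Hypothetical stand-in for the real blog application."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<p>blog page</p>"]


application = robots_middleware(blog_app)
```

The behaviour is the same as the hack, but the special case lives outside the blog software, so upgrading or replacing the wrapped application would not lose it.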

1 John Reese says...

The new site looks good. Just thought I'd stop by long enough to say that. Cheers

Posted at 3:59 a.m. on May 13, 2008


2 Ryan says...

Yeah, good layout too. Very clear. :) Better than the last, in fact! I'm another python/django nerd, so I'll be listening even more now. I guess one of the things that's inspiring about Django is they're concerned pretty hardcore with security fixes. Just this week, an email came out and they released new sub-versions for each major Django release to include the fix. Very awesome.

For your blog post model, what did you do for entering posts? Do you still use the default admin interface, or did you make your own views for posting and whatnot? I haven't looked into it much, but does django automatically include much in the way of wysiwyg text editors for text fields?

Posted at 5:28 p.m. on May 15, 2008


3 dbr says...

I concur with the other two comments - this is one of the nicer blog'y site layouts I've seen. The comment system is also actually pleasant to use, unlike every single other one I've (not)-used \o/ One slight bug, you need to enter two backslashes to make it visible.

Anyway.. This hack seems an extremely convoluted and bad way to serve a simple text file, does it not? I've not used Pyblosxom, but it seems insane that it doesn't allow you to serve up static files (specifically /robots.txt and /favicon.ico)..?

As much as I now dislike PHP compared to Python, this does reaffirm my decision to stick with PHP for web-applications - no web-framework has gotten near the simplicity of shoving an index.php file in htdocs/

Posted at 2:53 p.m. on May 16, 2008


4 Zeth says...

Hi guys, thanks for your comments. I deal with them in more depth in a post that I will publish shortly.

However, one small thing: dbr, I really advise against inferring anything from this post. You are completely right that the hack is extremely convoluted and a crazy way to serve a text file.

It is crazy because of me, not because of the software: I am a bulldog and I won't let go until it is dead!

  • I want the old archive to be easily available to humans but not search engines.
  • I could not use any existing URLs, because the archive cannot interfere with every incoming link going to the right place on the new site.
  • The last website was deployed in a really experimental way that would need a lot of work to unravel.
  • I wanted to do as little work on the old version of the site as possible, yet have it available with no loss of data or formatting.

This blog has gone from Blogger to Wordpress to my own hacked Pyblosxom to Django. No other formatted text has gone this way, so it is not really something you can learn positive lessons from. Most other people would have used Apache, and got that to serve the robots file.

When moving from Wordpress to Pyblosxom, I made it easy on myself by using a Wordpress style of pseudo-HTML as my markup format in Pyblosxom.

If I had a time machine then I would have used something XML-compatible (such as reStructuredText) instead.

I still like Pyblosxom, it is a really nice way to get a simple blog up, and I still prefer it to any other pre-made blog application.

But the way I want to develop the site going forward involves a bit more freedom, and for that I want a real web framework. Django is by far my favourite so it is natural that I use that.

Posted at 4:37 p.m. on May 16, 2008



