Only the penitent man will pass - on captchas and cotton wool

3 July 2007

IP Blocking is Cool

Since I moved this blog to WordPress from Blogger, I have had 467 real comments and 44,379 spam comments. Every one of the 467 is precious, and it is worth dealing with the forty thousand robot comments for those. In the time it took me to write that last sentence, the spam count has gone up to 44,389.

This latter number would be far higher but for the fact that I have set up my blog to automatically block, at the Apache level, any IP address that is responsible for more than one spam comment.

This measure is surprisingly effective. The behaviour of spammers is that they notice my blog exists and then shoot several dozen (sometimes hundreds of) spam comments from the same IP address. Since setting up this measure, I have blocked well over 12,000 IP addresses. I'll save my bandwidth for someone special, you the reader.
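As a rough illustration of the rule above, here is a minimal Python sketch. It is not the script this blog actually runs: the log format (one IP address per spam comment) and the function names are assumptions made for the example. The output uses the same `Deny from` form that is quoted in the comments below.

```python
from collections import Counter


def ips_to_block(spam_log, threshold=1):
    """Return the IPs responsible for more than `threshold` spam comments."""
    counts = Counter(spam_log)
    return sorted(ip for ip, n in counts.items() if n > threshold)


def deny_lines(ips):
    """Format one Apache 'Deny from' directive per address."""
    return ["Deny from " + ip for ip in ips]


# Hypothetical log: one entry per spam comment, holding the sender's IP.
spam_log = [
    "192.0.2.10", "192.0.2.10", "198.51.100.7",
    "192.0.2.10", "203.0.113.5", "203.0.113.5",
]

blocked = ips_to_block(spam_log)
print("\n".join(deny_lines(blocked)))
```

The address that spammed only once (198.51.100.7 here) is left alone, which gives a single misfire the benefit of the doubt.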

One interesting idea would be to study this list by doing reverse IP lookups to plot these computers by location, operating system and so on. Most that I have looked up myself have appeared to be zombie Windows-based PCs on normal commercial ISPs.
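A first step towards such a study might look like the sketch below. It only does the reverse lookup and tallies hostnames by their last two labels, which is a crude proxy for the ISP; finding the location or operating system would need other tools (a geolocation database, fingerprinting) that are not shown. The function names and the `resolver` parameter are assumptions for the example.

```python
import socket
from collections import Counter


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the hostname for an IP, or None if no reverse record exists."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def tally_by_suffix(hostnames, parts=2):
    """Count hostnames by their last `parts` labels, e.g. 'example.net'."""
    counts = Counter()
    for name in hostnames:
        if name:  # skip addresses that had no reverse record
            counts[".".join(name.lower().split(".")[-parts:])] += 1
    return counts
```

Calling `tally_by_suffix(reverse_lookup(ip) for ip in blocked_ips)` on the block list would then give a rough ranking of the networks the spam comes from. The `resolver` argument is only there so the lookup can be swapped out for testing without touching the network.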

After all this, the odd spam still gets through: roughly 0.001% or so of attempted spam comments make it through my current set of tests, i.e. one or two per week. At some point the spammers just use humans, and there is little I can do about that.

My process has been to queue all surviving comments, just in case, so that I can weed out this small number of comments. Normally I approve real comments in a matter of hours. Introducing this time delay is still a bit annoying, however.

Captchas are not the answer

Even worse are captchas (little games that make you guess the characters). I hate trying to fill them out myself. I also use the text-only elinks browser quite a lot, and captchas will not work with that. Please type out the current picture:

"You can't read this, go away," they say.

Furthermore, captchas are also becoming less effective at their core aim of distinguishing a human from a machine, as it all descends into an arms race. Computer software is becoming better at beating them, while captchas are becoming more complex in response, meaning that humans have more trouble. The other day, someone's captcha was so obscured that it took me six tries to leave a comment on a blog. I won't bother to post there again.

Innocent until proven guilty

I do not want to make it like Indiana Jones and the Last Crusade just to leave a comment: roll under a flying saw, spell something in Hebrew and then take a flying leap into the abyss. Furthermore, moderation of spam is my problem, not yours. It is already inconvenient to make comments; making it harder is bad, bad news.

The ancient Greek philosophers and the Old Testament argue that it is better that any number of guilty people go free rather than one innocent man be punished.

This week I have tried something different: I have allowed all comments that do not have too much in the way of hyperlinks to go live automatically. Yes, there will be the odd spam, but I will try to squish them quickly.
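The check itself can be very small. The sketch below is only an illustration, not the code running on this blog: the threshold of two links and the function names are assumptions, and it counts nothing more clever than occurrences of `http://` or `https://`.

```python
import re

# Matches the start of any http or https URL in the comment body.
LINK_PATTERN = re.compile(r"https?://", re.IGNORECASE)

MAX_LINKS = 2  # assumed threshold; "too much" is a judgment call


def count_links(text):
    """Count the URLs that appear in a comment."""
    return len(LINK_PATTERN.findall(text))


def goes_live_immediately(text, max_links=MAX_LINKS):
    """True if the comment can skip the moderation queue."""
    return count_links(text) <= max_links
```

A comment that fails the check is not thrown away; it simply waits in the queue for a human, so a real reader who pastes several links is delayed rather than punished.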

It is sometimes handy to be able to carry on the conversation privately; for example, I once helped someone install Gentoo that way. But 90% of the time I do not really care what your email address is. Asking for it twice does mean that our form differs from other WordPress blogs, and so we break some spammers' scripts. This is dull though, so it might be worth getting rid of the email field entirely.

In the long run I might write my own blog software, then I would do comments very differently indeed. It would be a little more like a wiki and allow you to annotate and change the text itself, using different colours and layers, etc.

Teletubbies need not apply

This is not a blog which attracts small children; it is about taking control of your own technology, as well as ethics, Linux, Bash, Python and so on. None of these are hot topics in the primary school age bracket. Children should not be on the Internet unsupervised anyway. If you let your small kiddy browse the web alone then you are at best an idiot.

Therefore, the consequences of having the odd spam online for 10 minutes are not all that great. So I am going to try it without the cotton wool for a while longer and see how we get on.

What would you do?

What is your take on this? Have you dealt with this problem yourself somewhere? Do let me know.

1 Brian says...

44,389? Your blog must be a lot more popular than mine, or you must be really unlucky. I've had 4300 spams and around 380 legit comments since installing Wordpress. Akismet captures and automatically removes essentially all of the spam. It's very rare even for one to make it into the moderation queue (showing that Akismet thinks it's spam but isn't 100% sure). I don't have any other anti-spam tools running but that. No captchas and not even a required email address. Not sure what I'd do if I was facing that kind of spam volume.

I like using little logic puzzles instead of captchas, like a simple math problem or "Which of these pictures is a bunny?" sort of thing. If you were inventive you could make it actually a bit of fun. I share your distaste of captchas. On some message boards it takes me a dozen tries to make a post.

Posted at 12:52 a.m. on July 3, 2007


2 Zeth says...

Hi Brian, I like your blog design by the way!

> Your blog must be a lot more popular than mine, or you must be really unlucky.

I have quite a lot of people turn up but a very low percentage of them leave comments. I do not know what I am doing wrong :-( but recently I am trying to encourage you all to leave comments.

If it was about politics or something then I would get more perhaps. Of course, I did lose the old comments when moving from Blogger to WordPress.

So I think partly it has to do with the nature of the material and the fact that most readers seem to be just passing through. They read my blog within a 'Planet' or other aggregator or have come via Google, searching for the answer to some technical problem.

Posted at 1 a.m. on July 3, 2007


3 Brian says...

Thanks for liking my blog layout. I try to leave comments when I have anything interesting to say. I think most sites have a majority of people lurking and very few who participate so it's not just yours. (I read a statistic about it somewhere, so it must be true.) Personally I read your blog via Planet Larry.

Quality over quantity though, that's the way to go. Keep up the posts, they are good reading.

Posted at 1:54 a.m. on July 3, 2007


4 BTreeHugger says...

"moderation of spam is my problem, not yours."

Thank you for saying that. I generally abhor the "modern solutions", since adding tons of scripting and cookies makes me cringe. And I've written as a web designer about how much I hate modern comment forms and their extreme lack of usability.

Sadly, we're too sensible for the modern web. I just use Akismet and hope for the best, since I am not "blogging for points", simply to document things. But I never force the user to use cookies or a login, and if I ever stoop to using a CAPTCHA I have a small army of friends who will hurt me until I realise the error of my ways.

Among the most effective approaches I've seen were the simplest ones; forced previews, time limits between posts, and so on.
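Of these, a time limit between posts is easy to picture in code. The following is a minimal sketch under assumed names and an assumed 30-second interval, keyed on the poster's IP address and kept in memory only; it is not taken from any particular blog engine.

```python
class PostThrottle:
    """Refuse a comment if the same IP posted too recently."""

    def __init__(self, min_interval=30.0):
        self.min_interval = min_interval  # seconds between accepted posts
        self._last_post = {}  # IP address -> time of last accepted post

    def allow(self, ip, now):
        """Return True and record the post if enough time has passed."""
        last = self._last_post.get(ip)
        if last is not None and now - last < self.min_interval:
            return False  # too soon; the rejected attempt is not recorded
        self._last_post[ip] = now
        return True
```

In use, `now` would come from something like `time.time()`. A human rarely writes two comments within half a minute, while a bot firing dozens of them is slowed to a trickle.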

I've been wondering how spam bots harvest forms on a page, and whether they ignore some forms. If they do, we could set up a nonsense-like one that sends the legit comments to a weird server and bounces back to our real one. If the bots do not, we could instead just have two forms, and if both are used we ignore both comments and blacklist the IP for a time. Thoughts?

Posted at 2:02 a.m. on July 3, 2007


5 Andy Loughran says...

Zeth, the comment thing really gets to me too. I hate not being able to leave the comment box free so that two people visiting the site simultaneously can discuss my posts. Unfortunately I've not been too active recently (and moving my blog from a dynamic to static IP address seems to have reduced the spam) - so since installing Akismet on wordpress I've not had more than 79 spam.

I don't know what I'd do if I were faced with the levels of spam that you have - it must be incredibly frustrating. I have started to use the openID plugin on my blog - it means that anyone who's already posted won't double post under different nicknames, and therefore their comments go live straight away. It was pretty easy to install too - the actual openID site I also run myself and it's nothing more than a (very) short php script.

Posted at 9:37 a.m. on July 3, 2007


6 Zeth says...

@Brian, Planet Larry is really cool, it is nice to have fellow Gentooers visit.

@BTreeHugger I have been thinking about the "Spam Trap" approach since yesterday, when I added the extra email field.

'Spam Traps' are ways to try and divert the spam through hidden Javascript or whatever. They are very effective at distinguishing between a modern graphical web browser like Firefox and a spam machine. However, they normally have lots of false positives: people using text-only browsers, people with Javascript turned off, people using assistive technologies for the visually impaired, and people with certain Windows security technologies.

This is both making the spam the visitor's problem (client side Javascript), as well as punishing the innocent. However, if we can make the system recover gracefully in case of false positives, then this might be the way to go.

Posted at 9:54 a.m. on July 3, 2007


7 Zeth says...

@Andy, in the last week I have moved from a dynamic to static IP address. We will see if this makes a difference. I suspect they are following URLs though, and cool URLs do not change.

However, spammers often comment on the most linked to posts, i.e. the highest Google ranked posts, rather than to the most recent posts. So one idea is to treat comments on posts older than a month differently.
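A minimal sketch of that idea, assuming a 30-day cut-off and a hypothetical function name: comments on older posts go to the moderation queue, while comments on recent posts can go live.

```python
from datetime import datetime, timedelta

OLD_POST_AGE = timedelta(days=30)  # assumed meaning of "older than a month"


def needs_moderation(post_published, comment_time, max_age=OLD_POST_AGE):
    """True if the comment is on a post old enough to be treated warily."""
    return comment_time - post_published > max_age


# A comment left today on a post from early May would be queued.
print(needs_moderation(datetime(2007, 5, 1), datetime(2007, 7, 3)))
```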

Posted at 9:56 a.m. on July 3, 2007


8 Bug says...

Well, I have used too many news systems. I never wanted to have a 'blog', so I just used stand-alone news systems. For that reason I never bothered with Wordpress. It took me quite a while to find a good, usable news system that produces valid HTML code. But after a while, I found out the hard way that it gets spam, spam that I don't want. So I looked again for a news system. For a while it was clean; I guess the bots didn't know about the new system. The 2nd news system wasn't protected at all, and I had posted a digg link, so I ended up with 80 spam comments a day. Not fun at all [because I didn't have a DELETE ALL comments button]. After a while, I decided I couldn't stand it any longer and coded a system myself. So now I use my own news system. Again, for a while, the bots tried to understand the news system change. After some more time, 1 single spam comment got in. So I ended up coding a hidden input field, which was great for getting rid of the stupid bots.

Then the bots learned the lesson, and I received about 20 or so spam comments. I cleaned 'em all and added the math question. The trickiest part of my math question is that the input box has something totally unrelated inside it. What do you think a bot will type if it has a label that says "What's the colour of the sky?" while the input box default value is "pie"? So this seems to kill the spam. I haven't bothered with IP banning yet, or setting up a blocked spam count, though I think I might add a count just to see how much it kills :), so I'll know whether it's because they try and fail or because they just decided to skip my site.
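The server-side half of such a question could be sketched in Python as below. The question, the decoy default and the accepted answer come from the example in this comment; the names and the exact matching rule are assumptions, not the commenter's actual code.

```python
QUESTION = "What's the colour of the sky?"
DECOY_DEFAULT = "pie"  # pre-filled value that has nothing to do with the question
ACCEPTED_ANSWERS = {"blue"}  # assumed list of answers a human might give


def passes_question(answer):
    """True only if the visitor replaced the decoy with a sensible answer."""
    cleaned = answer.strip().lower()
    if cleaned == DECOY_DEFAULT:
        return False  # the field was left untouched, as a bot would leave it
    return cleaned in ACCEPTED_ANSWERS
```

Because the question is plain text, it still works in a text-only browser, which an image test does not.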

I'm totally against Captchas, though I do have one on my webbie in the contact section. Just because I'm too lazy to edit the code [It was something premade I took. Haven't touched it for years, not sure it even sends to the correct E-Mail XD].

@Brian: The bunny idea is nice, but I like to surf using w3m sometimes, therefore I don't like image tests in general, even if it's not as evil as captchas.

Posted at 2:18 p.m. on July 3, 2007


9 Phill says...

You must have quite a high Google ranking, Zeth - I find that the higher ranked on Google you are, the more spam you get :-/ I usually get around 50 or so spam comments a day, although fortunately Akismet filters most of these out.

What I have done on occasion is just disable comments for a particular post which seems to be generating a lot of traffic from spammers (it's usually an old one so doesn't really do any harm to disable comments).

I don't think CAPTCHAs are worth the bandwidth to be honest, spam bots are getting so much smarter that I don't think it's worth it.

Posted at 8:25 a.m. on July 4, 2007


10 Nothing says...

How do you block IP addresses at the Apache level?

Thanks !

Posted at 9:53 p.m. on July 8, 2007


11 Zeth says...

Easy, just add lines to your .htaccess file. E.g.

Deny from 81.26.51.108

will deny from that IP Address.

Posted at 9:16 p.m. on July 9, 2007


12 Bug says...

Though by doing that you stop the legitimate PC user from visiting the site. Also, think about the dynamic IP issue.

Posted at 11:51 a.m. on July 13, 2007


13 Phill says...

I agree with what bug said... I suppose what you could do instead is redirect them to a URL which says "Your IP Address is blocked. To unblock, please contact..." - although I suspect for the majority of web users they'd just skip past as they couldn't be bothered to get themselves unblocked.

Posted at 12:49 p.m. on July 13, 2007


14 Zeth says...

Well if someone's Windows PC has been hijacked and is now part of a botnet, they no doubt have rather more pressing problems than not being able to read my blog.

Posted at 12:56 p.m. on July 13, 2007


15 Dave says...

I found just having a hidden field in the form to do the trick.

  1. Hidden field that the user can not obviously fill out (can't fill it out if you can't see it)

  2. Bot will fill it out because they just read the html

  3. Upon submit- if that field is filled out...you know a bot filled it out. Ignore/ban.
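The server-side check for those three steps can be sketched in a few lines of Python. The field name `website_url` is a hypothetical choice (something a bot would be tempted to fill in), and the form is assumed to arrive as a plain dictionary of strings.

```python
HONEYPOT_FIELD = "website_url"  # hypothetical name; hidden from humans with CSS


def is_bot_submission(form):
    """True if the hidden field came back with anything in it."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())


# A human never sees the field, so it arrives empty or not at all.
human = {"name": "Dave", "comment": "Nice post", "website_url": ""}
bot = {"name": "x", "comment": "buy now", "website_url": "http://spam.example"}
print(is_bot_submission(human), is_bot_submission(bot))
```

As noted earlier in the thread, this can misfire for visitors whose browser ignores the stylesheet and shows the field, so ignoring the comment is gentler than banning the address outright.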

Posted at 6:33 p.m. on October 19, 2007



About

Hello, my name is Zeth, I'll be your host here.

Command Line Warriors is about taking control of your own technology, it looks at our experiences of computing; especially using GNU/Linux, the Python programming language, the command-line and issues such as techno-ethics, best practices and whatever is cool now. If you take control of your technology then you are a Warrior too!

This site is your site too which means that you can contribute and get involved. You can leave comments using the facility provided. For me, the comments and discussions are by far the best part of the site. So please do have your say!
