Friday, May 25, 2012

KittyCaptcha works!

So, I've got a site that I've had some problems keeping spam out of. I integrated with Akismet, which is hit-or-miss as far as this site is concerned, and I also do a little bit of heuristics, based on poster location, the number of links and keywords in a particular piece of content, etc.
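
To give an idea of what I mean by heuristics, here's a rough Python sketch of the link and keyword part. It's an illustration, not the site's actual code: the thresholds and the names `link_count`, `top_keyword_density` and `looks_like_spam` are made up, and the poster-location check is left out.

```python
import re
from collections import Counter

LINK_RE = re.compile(r"https?://", re.IGNORECASE)
WORD_RE = re.compile(r"[a-z0-9']+")

# Hypothetical thresholds; real values would need tuning per site.
MAX_LINKS = 5
MAX_KEYWORD_DENSITY = 0.2
MIN_WORDS_FOR_DENSITY = 20


def link_count(text):
    """Count things that look like the start of a URL."""
    return len(LINK_RE.findall(text))


def top_keyword_density(text):
    """Share of all words taken up by the single most repeated long word."""
    words = WORD_RE.findall(text.lower())
    if len(words) < MIN_WORDS_FOR_DENSITY:
        return 0.0  # too short to say anything meaningful
    # Skip short words so "the" and "and" don't dominate the count.
    long_words = [w for w in words if len(w) >= 4]
    if not long_words:
        return 0.0
    _, count = Counter(long_words).most_common(1)[0]
    return count / len(words)


def looks_like_spam(text):
    """Flag content with too many links or one keyword repeated too often."""
    return (link_count(text) > MAX_LINKS
            or top_keyword_density(text) > MAX_KEYWORD_DENSITY)
```

A post that repeats "cheap pills" thirty times with ten links trips both checks; a one-line note about adopting a kitten trips neither.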

Akismet catches a lot of valid content as spam, so every few days I go through the spam bucket via the admin console and blindly white-list anything that doesn't look like spam. That was working OK for a while.

Anyway, a few days ago I look and see that there are 163 identical bits of content, all of which completely skirted Akismet. Genius. It was obviously a bot. Now, part of the problem was that I had removed reCaptcha a few days earlier because, usability-wise, it's terrible. I actually failed a few of the reCaptcha tests myself, because the scrambled words it presents are almost impossible to read these days.

So I figure, "Akismet will catch most everything, why use a captcha?"

As it turns out, Akismet seems to catch very little, even when I'm marking items as spam. Even when things are *obviously* spam -- several dozen links, incredibly high keyword density -- it just completely misses the mark.

I rack my brain.

HashCash? No good implementation details, plus the concept hasn't been proven anywhere yet.
Simple math captchas? I hate math.

Then I remember way back when. A few years ago, people were talking about Microsoft Research's Asirra and some random dude's KittenAuth. Neither caught on: Asirra has good image breadth but looks like crap, and KittenAuth isn't very detailed regarding its implementation. Back then I dismissed the whole idea as a fruitless endeavor: where would you even get a large assortment of kitten pictures from? You'd have to partner with someone like Microsoft or come up with your own small batch of images that would quickly be "solved" by a human being paid 10 cents an hour.

But that was a long time ago. Now we've got Flickr, Instagram, Google Image Search. I have all the Internet's random kitten pictures at my disposal. I just need to use them to their fullest, cutest effect.

So I dig around, study some techniques for signing forms with encryption and for avoiding replay attacks -- cracking the captcha once, by hand, then reusing that solved form to resubmit over and over and over again -- and about 2 hours later I have a simple but adorable KittyCaptcha.
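
For the curious, here's roughly what the signing and anti-replay part can look like in Python. This is a sketch under my own assumptions, not the actual KittyCaptcha code: it uses an HMAC signature rather than encryption, it keeps used nonces in memory, and `SECRET_KEY`, `TOKEN_TTL_SECONDS`, `issue_challenge` and `verify_submission` are names made up for illustration.

```python
import hashlib
import hmac
import secrets
import time

# Hypothetical settings; a real site would load the key from config
# and keep used nonces in a shared store with expiry, not a Python set.
SECRET_KEY = secrets.token_bytes(32)
TOKEN_TTL_SECONDS = 600
_used_nonces = set()


def _sign(nonce, expires, answer):
    """HMAC over the nonce, the expiry time and the expected answer."""
    message = f"{nonce}:{expires}:{answer}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def issue_challenge(correct_answer, now=None):
    """Build the hidden form fields that go out with one captcha."""
    now = time.time() if now is None else now
    nonce = secrets.token_hex(16)
    expires = int(now) + TOKEN_TTL_SECONDS
    return {
        "nonce": nonce,
        "expires": expires,
        "signature": _sign(nonce, expires, correct_answer),
    }


def verify_submission(nonce, expires, signature, submitted_answer, now=None):
    """Accept a form only if it is fresh, unused and correctly answered."""
    now = time.time() if now is None else now
    if now > expires:
        return False  # challenge is too old
    if nonce in _used_nonces:
        return False  # replay: this form was already submitted once
    # Burn the nonce on every attempt, so each challenge gets one guess.
    _used_nonces.add(nonce)
    expected = _sign(nonce, expires, submitted_answer)
    return hmac.compare_digest(expected, signature)
```

The correct answer never appears in the form; the server just recomputes the signature from whatever the visitor picked. Because the nonce is burned on the first attempt, a bot can't solve one challenge by hand and resubmit it forever, and it can't keep guessing against the same form either.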

I deployed it into production last night, and haven't had a single drop of automated spam almost 24 hours later. Regular content is flowing through quite nicely.

This was a pretty poorly written post, I realize, but it's mostly stream of consciousness and I can't be bothered to go back and edit anything.