Tuesday, April 07, 2009

How they'll break the 3D CAPTCHA

Just a quick note to point out how the "unbreakable" 3D CAPTCHAs recently publicised could probably be broken rather easily. I don't want to turn this into a blog about CAPTCHA, but a friend called me on my off-hand claim last week that the 3D CAPTCHAs I mentioned are just as breakable as existing varieties (or more so).

There only seem to be a few dozen possible objects, judging by the many repeats you get when requesting new puzzles. The 2D images of an object can look very different from different viewpoints (staring down a toilet bowl as opposed to looking at it from the side), but if you had stored solutions for a few hundred views of each object, every puzzle displayed would be within about 10 degrees of a solved reference view, and hence similar enough to identify with standard 2D image processing. Since the object pool is so small, a modest amount of manual or semi-automated CAPTCHA solving would provide the necessary references. There aren't many "big rotations" of each object, and the "small rotations" are just another kind of light distortion to apply to an image, probably easier to handle than the wavy distortions, noisy backgrounds, and random squiggles of today's prevalent CAPTCHAs. Enlarging the object database to a useful size would take a lot of work, and there may not be more than a few thousand easily-distinguishable object types anyway.
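The matching step above needs nothing exotic. Here's a minimal sketch of the idea, assuming the attacker has already built a database of pre-solved reference views; the function names and the plain cosine-similarity matcher are my own illustration, not anything from the actual scheme:

```python
import numpy as np

def normalize(img):
    """Flatten to a zero-mean, unit-norm vector, so matching is
    insensitive to overall brightness and contrast."""
    v = img.astype(float).ravel()
    v -= v.mean()
    n = np.linalg.norm(v)
    return v / n if n else v

def classify(puzzle_img, reference_views):
    """Nearest-neighbour lookup against stored solved views.
    reference_views maps (object_label, view_id) -> reference image.
    Returns the best-matching object label and its similarity score."""
    best_label, best_score = None, -1.0
    p = normalize(puzzle_img)
    for (label, _view), ref in reference_views.items():
        score = float(p @ normalize(ref))  # cosine similarity in [-1, 1]
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score
```

With a few hundred stored views per object, a fresh puzzle rendered a few degrees away from some stored view would score close to 1.0 against it; in practice an attacker would use sturdier features than raw pixels, but the structure of the attack is the same.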

Now, the way many text CAPTCHA schemes got past dictionary-based attacks (which also rely on the list of possible puzzles being rather small) is by displaying not words but random letter sequences. Generating a random artificial 3D object is likely to make the puzzles unreasonably hard for us humans; in the puzzle shown above, the airplane is recognisable as two views of the same object only because we know what an airplane is. There are other ways to evade an attack based on the small pool of possible puzzles, and, as always, adding noisy backgrounds and squiggles will slow down the CAPTCHA-breakers, but by that point it's just back to the usual arms race between CAPTCHA makers and breakers...

Sunday, April 05, 2009

CAPTCHA GOTCHA

A recent wave of admiration for new 3D-flavoured CAPTCHAs got me thinking about CAPTCHAs. The whole model just doesn't hold up to technological or economic scrutiny. CAPTCHAs are doomed, because of three important "CAPTCHA gotchas".

The CAPTCHA idea sounds simple: prevent bots from massively abusing a website (e.g. to get many email or social network accounts, and send spam), by giving users a test which is easy for humans, but impossible for computers. Then the account-opening process can't be automated, which slows down the spammers and other Net nuisances. But does such a test really exist, and will it save us from cybercrime?

The new CAPTCHA, which you can see on YUNiTi's registration form, is of course billed as "unbreakable by current computer technology". Since current [CAPTCHA-breaking] computer technology is focused on reading squiggly letters with wavy lines, and this brand-new CAPTCHA is currently used to protect one relatively obscure service, this is perhaps less impressive than it sounds. With the right incentive, I believe hackers and researchers will soon be breaking this method, just as they've broken many squiggly-letter CAPTCHAs before.

And don't think for a moment the problem is with this or that specific puzzle-type, and that some new game can save the day. It seems CAPTCHAs must either become irrelevant, or fail, because of three inherent CAPTCHA gotchas:

The mental effort gotcha. Sure, we humans are much smarter than computers, but actually demonstrating that (where it counts) takes time and effort. Even a human interrogator can have a hard time telling humans and computers apart. Website users aren't willing to spend more than a few seconds solving a CAPTCHA, and they already frequently complain that CAPTCHAs are too hard. This means CAPTCHAs need to be simple puzzles we can solve reflexively and mechanically, without warming up the deeper thought processes. This just about implies the solving process is something we could emulate on a computer with reasonable effort.

The accessibility gotcha. CAPTCHAs are inherently terrible for people with disabilities (and are frequently reviled for this). The blind can't see image-based CAPTCHAs, and visually-impaired users don't have it much easier, because of the deliberately unclear images. The audio alternatives are frequently too hard for humans or too easy for bots, and of course they're inaccessible to deaf or hearing-impaired users. CAPTCHAs which use text can get too difficult for some dyslexics, and so on. And even if the mental effort gotcha didn't stop you trying to base a CAPTCHA on more "intelligent" puzzles, would you really want to build in inaccessibility to children or the mentally challenged? Trying to keep a site reasonably accessible means using multiple alternative CAPTCHAs, and (again!) keeping those puzzles quite simple.

The economic gotcha. This is the CAPTCHA gotcha most likely to eliminate CAPTCHAs as an effective tool. Suppose a genuinely hard-to-break CAPTCHA scheme does emerge, and is used to filter access to a valuable resource, for example webmail or social network accounts. Suppose you're a spam-baron and need to open one hundred thousand such accounts. You could pay a small team (in a third-world country with cheap labour, of course) just to solve CAPTCHAs manually for you. The experts say you need 10 seconds per puzzle, or 278 hours total. That's roughly a month and a half of one person's full-time work, which could set you back a few hundred dollars, if you insist on highly-qualified personnel (and even paying taxes!). If you made a business of it, you could probably knock that down to a hundred dollars. I'm not an expert on the malware economy, but I believe that's a fair price to pay, given other typical rates for resources. You'd certainly hope the many millions of spam messages you can now send will more than recover that investment. It's also a great outsourcing niche: just specialise in solving CAPTCHAs, and sell that service. And it's even been suggested that hackers may already be solving some CAPTCHAs with an alternative workforce: they require users of their own porn or pirated content websites to solve the CAPTCHAs from sites they wish to access. Effectively, they're paying them in pictures or MP3s (which may be even cheaper for them).
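The arithmetic above is easy to check; the hourly rate below is my own assumed figure for cheap outsourced labour, not a number from the post:

```python
accounts = 100_000           # CAPTCHAs the spam-baron needs solved
seconds_per_captcha = 10     # the experts' per-puzzle estimate quoted above

total_hours = accounts * seconds_per_captcha / 3600
print(round(total_hours))    # prints 278

# At an assumed $0.50/hour for cheap outsourced labour, the whole
# job lands in the "around a hundred dollars" range claimed above:
cost_dollars = total_hours * 0.50
print(round(cost_dollars))   # prints 139
```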

The thing about the economic gotcha is that it's pretty much built into the idea of a CAPTCHA: paying someone to solve CAPTCHAs for you by the thousand just about has to be an efficient attack. Users won't spend a lot of time on a CAPTCHA for something of petty value, so the time wasted solving a CAPTCHA must be worth considerably less than whatever the website is offering. The value of the same sort of access to a criminal user is likely to be even greater, whereas the cost of solving is likely to be lower for them (consider the economy of scale). In a nutshell:

value to criminals > value to legitimate users > equivalent cost to user of solving CAPTCHA > cost to criminals of solving CAPTCHA without automation

So it doesn't seem that CAPTCHAs can (or even should) retain their ubiquitous presence on free-service websites. But do we have any alternatives ready? Yes and no, but here are some techniques which don't get used enough:

  • Quotas and soft limits for anything with a free account. In the five years I've used a GMail account as my only personal email, I've sent just a few thousand emails. Google could cap my usage so I'd never feel it, and spammers would be stuck. Much the same goes for Facebook, etc.
  • Heuristic "profiling" for bots: at least block them when they've started spamming, or whatever it is they're doing to abuse the system.
  • Spam filtering, not just on incoming messages (I haven't seen spam in my GMail Inbox for ages and ages), but also on outgoing stuff, perhaps also feeding into the heuristic profiling just mentioned.
  • Requiring message senders (or users of other services) to place a small sum of money, per message, in escrow, which the recipient may collect if the message is unwanted. This method, suggested to combat spam, could make massive abuse of the system very unprofitable, without making the system expensive to use for legitimate users. (It also has several flaws, but that's food for thought for another post sometime.)
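The first two techniques above fit together naturally: a rolling per-account quota whose soft limit flags the account for heuristic review, and whose hard limit blocks it outright. A minimal sketch, with class name and thresholds entirely my own illustration:

```python
import time
from collections import deque

class SendLimiter:
    """Rolling-window quota for a free account: crossing the soft limit
    flags the account for abuse-profiling, crossing the hard limit blocks
    further sends. All thresholds are illustrative, not recommendations."""

    def __init__(self, soft=50, hard=200, window_seconds=86_400):
        self.soft, self.hard, self.window = soft, hard, window_seconds
        self.sent = {}  # account -> deque of send timestamps

    def allow(self, account, now=None):
        now = time.time() if now is None else now
        q = self.sent.setdefault(account, deque())
        while q and now - q[0] > self.window:  # forget sends outside the window
            q.popleft()
        if len(q) >= self.hard:
            return "blocked"                   # hard limit: refuse the send
        q.append(now)
        return "flagged" if len(q) > self.soft else "ok"
```

A legitimate user sending a few messages a day never notices the limiter; a bot blasting thousands of messages gets flagged, then blocked, within its first burst, with no puzzle-solving asked of anyone.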

What's common to these methods is that they go after the actual abuse of the service (which could very well be carried out by bots), rather than trying to enforce some human interaction, like a CAPTCHA. Use these well, and we'll never need to mangle another letter for CAPTCHA purposes.

[This post also attracted a fair number of comments and some controversy on Slashdot.]