Sunday, April 05, 2009

CAPTCHA GOTCHA

A recent wave of admiration for new 3D-flavoured CAPTCHAs got me thinking about CAPTCHAs. The whole model just doesn't hold up to technological or economic scrutiny. CAPTCHAs are doomed, because of three important "CAPTCHA gotchas".

The CAPTCHA idea sounds simple: prevent bots from massively abusing a website (e.g. to get many email or social network accounts, and send spam), by giving users a test which is easy for humans, but impossible for computers. Then the account-opening process can't be automated, which slows down the spammers and other Net nuisances. But does such a test really exist, and will it save us from cybercrime?

The new CAPTCHA, which you can see on YUNiTi's registration form, is of course billed as "unbreakable by current computer technology". Since current [CAPTCHA-breaking] computer technology is focused on reading squiggly letters with wavy lines, and this brand-new CAPTCHA is currently used to protect one relatively obscure service, this is perhaps less impressive than it sounds. With the right incentive, I believe hackers and researchers will soon be breaking this method, just as they've broken many squiggly-letter CAPTCHAs before.

And don't think for a moment the problem is with this or that specific puzzle-type, and that some new game can save the day. It seems CAPTCHAs must either become irrelevant, or fail, because of three inherent CAPTCHA gotchas:

The mental effort gotcha. Sure, we humans are much smarter than computers, but actually demonstrating that (where it counts) takes time and effort. Even a human interrogator can have a hard time telling humans and computers apart. Website users aren't willing to spend more than a few seconds solving a CAPTCHA, and they already frequently complain that CAPTCHAs are too hard. This means CAPTCHAs need to be simple puzzles we can solve reflexively and mechanically, without warming up the deeper thought processes. And a puzzle we can solve that mechanically is very likely one that a computer can be taught to solve with reasonable effort.

The accessibility gotcha. CAPTCHAs are inherently terrible for people with disabilities (and are frequently reviled for this). The blind can't see image-based CAPTCHAs, and visually-impaired users don't have it much easier, because of the deliberately unclear images. The audio alternatives are frequently too hard for humans or too easy for bots, and of course they're inaccessible to deaf or hearing-impaired users. CAPTCHAs which use text can get too difficult for some dyslexics, and so on. And even if the mental effort gotcha didn't stop you trying to base a CAPTCHA on more "intelligent" puzzles, would you really want to build in inaccessibility to children or the mentally challenged? Trying to keep a site reasonably accessible means using multiple alternative CAPTCHAs, and (again!) keeping those puzzles quite simple.

The economic gotcha. This is the CAPTCHA gotcha most likely to eliminate CAPTCHAs as an effective tool. Suppose a genuinely hard-to-break CAPTCHA scheme does emerge, and is used to filter access to a valuable resource, for example webmail or social network accounts. Suppose you're a spam-baron and need to open one hundred thousand such accounts. You could pay a small team (in a third-world country with cheap labour, of course) just to solve CAPTCHAs manually for you. The experts say you need 10 seconds per puzzle, or 278 hours total. That's a bit more than a month and a half of full-time work, which could set you back a few hundred dollars, if you insist on highly-qualified personnel (and even paying taxes!). If you made a business of it, you could probably knock that down to a hundred dollars. I'm not an expert on the malware economy, but I believe that's a fair price to pay, given other typical rates for resources. You'd certainly hope the many millions of spam messages you can now send will more than recover that investment. It's also a great outsourcing niche: just specialise in solving CAPTCHAs, and sell that service. And it's even been suggested that hackers may already be solving some CAPTCHAs with an alternative workforce: they require users of their own porn or pirated content websites to solve the CAPTCHAs from sites they wish to access. Effectively, they're paying them in pictures or MP3s (which may be even cheaper for them).
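
For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch in Python. The account count and the 10 seconds per puzzle come from the paragraph above; the hourly wages are purely hypothetical figures I picked for illustration, not real market rates.

```python
# Back-of-the-envelope check of the CAPTCHA-farm figures quoted above.
ACCOUNTS = 100_000         # accounts the spam-baron wants (from the post)
SECONDS_PER_CAPTCHA = 10   # solving time per puzzle (from the post)


def total_hours(accounts: int = ACCOUNTS,
                seconds_each: float = SECONDS_PER_CAPTCHA) -> float:
    """Hours of human labour needed to solve one CAPTCHA per account."""
    return accounts * seconds_each / 3600


def cost_per_account(hourly_wage_usd: float) -> float:
    """Labour cost of one solved CAPTCHA at a given (hypothetical) wage."""
    return hourly_wage_usd * SECONDS_PER_CAPTCHA / 3600


if __name__ == "__main__":
    hours = total_hours()
    print(f"{hours:.0f} hours of solving in total")
    for wage in (0.50, 1.00, 2.00):  # hypothetical wages, USD per hour
        print(f"at ${wage:.2f}/h: ${hours * wage:,.0f} total, "
              f"${cost_per_account(wage):.4f} per account")
```

At an assumed wage of one dollar an hour, the whole job comes to a few hundred dollars, or roughly a quarter of a cent per account opened.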

The thing about the economic gotcha is that it's pretty much built into the idea of a CAPTCHA: paying someone to solve CAPTCHAs for you by the thousand just about has to be an efficient attack. Users won't spend a lot of time on a CAPTCHA for something of petty value, so the time wasted solving a CAPTCHA must be worth considerably less than whatever the website is offering. The value of the same sort of access to a criminal user is likely to be even greater, whereas the cost of solving is likely to be cheaper for them (consider the economy of scale). In a nutshell:

value to criminals > value to legitimate users > equivalent cost to user of solving CAPTCHA > cost to criminals of solving CAPTCHA without automation

So it doesn't seem that CAPTCHAs can (or even should) retain their ubiquitous presence on free-service websites. But do we have any alternatives ready? Yes and no, but here are some techniques which don't get used enough:

  • Quotas and soft limits for anything with a free account. In the 5 years I've used a Gmail account as my only personal email, I've sent just a few thousand emails. Google could cap my usage at a level I'd never notice, and spammers would be stuck. Much the same goes for Facebook, etc.
  • Heuristic "profiling" for bots: at least block them when they've started spamming, or whatever it is they're doing to abuse the system.
  • Spam filtering, not just on incoming messages (I haven't seen spam in my Gmail Inbox for ages and ages), but also on outgoing messages, perhaps also feeding into the heuristic profiling just mentioned.
  • Requiring message senders (or users of other services) to place a small sum of money, per message, in escrow, which the recipient may collect if the message is unwanted. This method, suggested to combat spam, could make massive abuse of the system very unprofitable, without making the system expensive to use for legitimate users. (It also has several flaws, but that's food for thought for another post sometime.)
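
To make the first of these concrete, here is a minimal sketch of a per-account daily sending quota in Python. It is only an illustration: the class name `DailySendQuota`, the limit of 100 messages a day and the in-memory counter are all my own assumptions, and a real service would need persistent storage and some path for raising the cap on established senders.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


class DailySendQuota:
    """Hypothetical per-account cap on messages sent per calendar day."""

    def __init__(self, daily_limit: int = 100):
        self.daily_limit = daily_limit      # assumed cap, tune per service
        self._counts = defaultdict(int)     # (account, day) -> messages sent

    def try_send(self, account: str, today: Optional[date] = None) -> bool:
        """Record one send; return False if the account is over quota."""
        day = today or date.today()
        key = (account, day)
        if self._counts[key] >= self.daily_limit:
            return False                    # over the cap: refuse or delay
        self._counts[key] += 1
        return True


if __name__ == "__main__":
    quota = DailySendQuota(daily_limit=3)
    for attempt in range(1, 5):
        allowed = quota.try_send("alice@example.com")
        print(f"message {attempt}: {'sent' if allowed else 'blocked'}")
```

A legitimate user who sends a handful of messages a day never hits the limit, while a bot-controlled account is stopped after its first few hundred messages no matter who (or what) solved the sign-up CAPTCHA.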

What's common to these methods is that they go after the actual abuse of the service (which could very well be carried out by bots), rather than trying to enforce some human interaction, like a CAPTCHA. Use these well, and we'll never need to mangle another letter for CAPTCHA purposes.

[This post also attracted a fair number of comments and some controversy on Slashdot.]

48 comments:

  1. I have suggested on other fora the same thing that your first suggestion means: Impose default caps on sent emails per account, IP, whatever, until the sender has been established as a legit sender of mass mails. That would eliminate spambots running on "regular" people's computers, for example.

    I have been blocked from several services because my IP (DHCP-assigned, NATted) fell in a range assigned to an ISP that had too many spambots or portscanners running in its network, or some such. If this happens to enough people, they'll either leave the ISP or pressure it to clean up its act (other ISPs could play a role).

    That system would naturally be susceptible to abuse, but then so would any other system. Ultimately you will have to come to a solution that removes the profit from spamming, for example. Your fourth suggestion would go a long way towards that. I'm sure that many people would be willing to place a deposit to cover a reasonable number of messages. If I ever send a mass mail, it always goes to a listserv, which does the processing, and everybody on the list has subscribed to it. If I abuse the list, they complain, and I get blocked from it.

    There is always a catch in all these, but until we're willing to be educated and act civilized... besides, as someone said, "freedom is messy".

  2. Do you dream of unicorns?

    Yes - Spambot
    No - Human

  3. Simple to solve: charge 1p or 1¢ per email sent, with the first 10 per day free.

  4. On the economic (supposed) gotcha:

    People aren't rational.

    People will spend a few seconds doing something that isn't actually "worth" a few seconds of their time. They will do it to see a site that they fully expect will do little of value for them.

    Why?

    Because it's just a few seconds, and people are not thoroughly rational.

    If you multiplied the value of the site by 60, and the time required to access it by 60 (i.e. from 10 seconds to 600 seconds, or 10 minutes), they are unlikely to do it. But when the value is 1/60th and the time is 1/60th, 10 seconds, they'll do it without thinking.

    We throw away pennies without thinking, and we throw away the time equivalent just as easily.

    For the criminals, however, it doesn't work out. The economic benefit IS NOT there. There are much, much better ways to do things than paying people to do this with cash, or by requiring them to fill out captchas to see an image or download an mp3 -- there are simply too many competing sites for people to bother to see an mp3 or a picture, and, when it comes to pirated software, the quantity of downloads isn't high enough to build the massive quantities of accounts that would be genuinely useful.

    On accessibility: as you noted, there are captchas that are accessible to both the blind and the deaf. For people too mentally deficient, or young, or both, to figure out the captcha, there is only a small likelihood that the site would've been of much value to them anyhow. Seriously, this is not that complicated. Someone in a coma can't solve a captcha, sure, but they also can't read a website or understand the content if it's read to them, so why fault the captcha rather than cruel fate?!

    Even then, there are workarounds that would allow certain individuals to bypass the captcha by identifying themselves.

    The captcha is here to stay, period.

  5. Seriously, this was a good article to get some attention from Slashdot, but the fact that an extremely small minority of people cannot solve any of the forms of captchas, and that people with bad intentions can break a small number of captchas a day, is NOT a reason that captchas should, or will, end.

    Would you consider the fact that our murder laws do not effectively lead to the punishment of all murders a reason to give up on the laws? How about the fact that, no matter how hard we try, some innocent men will be convicted?

    The same goes for captchas.

    Good job on getting on Slashdot, but the argument is extremely weak.

  6. I see two problems with your last solution (money in escrow). First, you'd have to get everybody on the Internet to agree on the same escrow service. I don't see that happening. Second, the spammers would just pay with stolen credit-card information. Now you've created a system where it costs legitimate people money but costs the spammers nothing. That's the opposite of what you want.

    I've said before: to throw a monkey-wrench into the spammer's works, slow the process down. Require human intervention. Instead of instantly granting an account when someone asks for one, sometime in the next 6 hours (the delay is random) send them an e-mail with a validation code and a human-readable description of how to enter it. No direct links, nothing easy for a computer to automatically parse and use, just a description of what a human needs to do to activate the account. Come up with site-specific wording, and word-wrap each message to a slightly different width so messages aren't absolutely identical. Combined with OpenID to minimize how often people have to open new accounts just to for instance respond to a blog posting, it should be workable.

    Sure, it's not a big hurdle. Not for one account, anyway. But that's the idea. Legitimate users only open accounts occasionally, so it's not a big deal. Spammers, OTOH, open thousands of accounts at a time. They have to wait for the validation code to arrive, and they have to monitor the mailbox and check each message (since they're almost certain to arrive out-of-order), and they have to use addresses that they can receive mail at. Their volume turns a minor annoyance into a major burden for them.

  7. Q: What is five plus seven?
    A: Twelve

  8. I completely disagree with this post for a number of reasons:

    1. None of the alternatives you suggested are feasible at this time. Two of your solutions address email client side problems, and do not address the issue of a bot successfully sending a message via the form on your site.

    The 'profiling' is a nice idea, but who is going to create the profiles, store the list of spammers, and offer the service as an open-source platform? I'm not sure anyone is going to offer this as a free service when companies like Barracuda charge an arm and a leg to run similar services.

    The 'escrow' is also a nice idea, but again how is this implemented? Are you asking people to submit payment information every time they want to post? Again, what company is going to offer escrow services for free? Are you asking people to pay for an escrow service to then pay to post on your site? Good luck growing a user base that way.

    2. Captchas are not that difficult. Some are harder than others, but if you visit 10 random sites using Captchas I'd bet you successfully pass 8 of them on the first try. If Captchas were really so difficult to maneuver, why have they become so popular?

    I agree that there is a better solution out there, but we don't have one yet, and the cons of Captcha really aren't that bad...

  9. You are right. Captchas are just easier to "add" to software, while soft and hard quotas, heuristic and learning software, peer-to-peer authentication or message "billing" (1 EUR/USD cent) are hard to add to software. Hard means: it costs developer time and brain power and hardware power.

    Captchas are inherently bad: you only use them when you are lazy and need to fix that spam problem "fast".

  10. There's also an alternative approach to solving the problem of spam. That is, creating a system that allows pretty much anyone to send public messages that get delivered accurately to just the people who feel, after reading, that they liked reading them.

    When we have that, who will pay someone to send spam anymore, where spam is mass, unsolicited email? This approach has messages that resemble spam but are not, in effect, unsolicited.

    Right now, you probably have big question marks over your head that read "how?". We have tools that might work for this, but we only use them for weeding out spam: Bayesian filters.

    The data fed into the single-user Bayesian filters could also be used by other users whose "want to read" and "don't want to read" choices are similar enough, so new users could get up to speed faster.

    Similarly, instead of just relying on Bayesian analysis, if a significant percentage of people who have chosen like you in the past have chosen "don't want to read" on a message that the Bayesian filter thinks you do want to read, that could also be taken into account.

    Perhaps this kind of a system is in the works already? If so, please drop me a note about it.

  11. This is a well-written article. However, I do disagree on a couple points. I do not believe text based CAPTCHAs have been done well in the past, and if done correctly they are highly accessible, highly adaptive, and very difficult to automate answering. I also believe they can achieve this goal while generally requiring less mental effort than current CAPTCHAs.

    I have actually written a CAPTCHA system that I believe does these things. Its main goal is to be highly accessible (text based with very little obscuring, so dyslexic/blind/visually-impaired users should hopefully not have much trouble). It is highly customizable, extensible, easy on users, and very difficult to automate. I invite everyone to check out an example of this system here: http://linkleaf.com/acaptcha/acaptcha.php ... I have also implemented the captcha as a wordpress plugin on my blog (http://examancer.com/) which I will release on wordpress.org within a couple days.

    I do agree that CAPTCHA alone isn't an optimal solution. I think one spam/bot/abuse control mechanism that isn't utilized enough is sane throttling policies (ex: 1 blog comment per IP per minute ... stops floods of spam while allowing regular users to get through). I have yet to need to implement a policy like that as my spam filtering and CAPTCHA systems have been sufficient... but that would be my first solution in the event of increased spam.

    Determined spammers/abusers/irate-users will always find a way to get through. There is no fool-proof solution. These solutions will just slow them down... which is sometimes all an admin really wants. It's a lot easier to deal with 100 spam a week than 100,000.

  12. If CAPTCHA doesn't work, there are many internet sites containing known spammers identified by e-mail address, username and IP address. The information can be queried dynamically through simple API calls. Current resources I'm aware of for this are:

    fSpamList
    StopForumSpam
    BotScout
    ProjectHoneyPot
    Sorbs
    Spamhaus
    DroneBL

    Here's a terrific standalone tool.

    http://temerc.com/Check_Spammers/

    More on the subject:

    http://www.stopforumspam.com/

  13. I've said for a long time that telemarketers and now spammers wouldn't do what they do if idiots wouldn't purchase their products.

  14. The only true solution is to ask factual questions and verify the answers against a massive graph of data containing all the basic fact units for a certain topic. In other words, it is a perfect conundrum: you need an AI to verify intelligence, and that AI could then simulate intelligence. But since different AIs will have different databases, sites can specialize. For example, Disney's captchas would ask obvious questions about Disney movies, which presumably no one else would have a trained AI for. I would expect AI-based captchas 10-20 years out, but for now the current one has to stay because, as pointed out, it is in fact economical to be able to quickly and effectively reduce spam by 1000%.

  15. The argument that you can just limit the number of emails one can send to do away with the need for captchas doesn't hold up when you are talking about voting and polling. You need captchas with online polls, because people have found ways around simply blocking an IP from voting twice.

    BTW I love that there is a captcha to post a comment here :D

  16. The key point of the dissertation is that we are trying to solve the wrong problem. We are not trying to prevent people from creating an account (that is our business); we are trying to prevent abuse. So how about this: use any soft method to delay the creation of the account, and after that, if the account tries to go beyond some level of use, throw a random CAPTCHA at it. The computer then has to figure out which CAPTCHA it is, and then solve it. Each time it wants to break the limits again, it gets another random CAPTCHA. Normal users will never see the CAPTCHA. Bots will be fighting it all the time.

  17. I don't think we'll see the end of captchas, necessarily (it's still a first line of defense against black-hats looking for low-hanging fruit), but I *do* think that you make a good point about trying other mechanisms to solve the problem. You wouldn't protect something of value with just a single lock, and sites that don't do defense in depth are going to find that out.

    Unfortunately, I think the only way we'll be able to solve this problem "for good" is with some major (read: expensive) infrastructure change.

  18. ...Business as usual, without abusing the English language? UR kidding, right? lol

  19. I've never understood why the majority of CAPTCHAs fall into a sort of "puzzle" category. Anything that requires us to use logic to solve means that eventually our robotic overlords will be able to solve it as well. Why not go with the one determining factor that separates humans from machines: emotion.

    CAPTCHA: What makes you feel happy?
    Picture A: Fork
    Picture B: Fingernail Clippers
    Picture C: Ice Cream (with sprinkles)
    Picture D: Sand

    Problem solved. Humans win.

  20. Proof-of-work does not have accessibility problems and has a price that can adapt to the adversary. There's a good example at http://kapow.cs.pdx.edu

  21. The problem with CAPTCHAs is that they are OVERUSED by sites such as Yahoo and MySpace. Only an idiot would stay at Yahoo mail when they have to enter a dumb CAPTCHA every time they respond to an email. Yahoo inflicts this abuse on users who have already signed in with their username and password, and who are then spammed by CAPTCHAs. It is just abuse, period!

  22. >CAPTCHA: What makes you feel happy?
    >Picture A: Fork
    >Picture B: Fingernail Clippers
    >Picture C: Ice Cream (with sprinkles)
    >Picture D: Sand
    >
    >Problem solved. Humans win.

    And what if some weird person says ice cream? The beach and sand are really awesome and make me happy when I go. Or perhaps someone has a different view of forks than you or I do and thinks those make him happier.

    You can't do emotions because no one's emotions are completely the same.

  23. I've done away with CAPTCHAs. Regular email validation works just as well for me, and is less of a hassle to users. It also serves a valid purpose.

    I think the real clincher is gotcha 3. Since spammers use humans, the CAPTCHA does exactly what it's meant to do: let the humans through :-)

    I think spam filtering on sent mail is where the biggest dent can be made to email spam. Most spam filtering seems to be done on the receiving end, which means it's already too late. All you can do then is delete the mail, but the spam will keep on coming even if you yourself don't see it.

    Similarly with forum posts/comments, a user can automatically be banned if eg. the user sends x number of messages flagged as spam.

    Another method would be to have posts moderated by an admin. Only allow 1 comment per user unless they've had a comment approved. That way you get max 1 spam message from a spam bot (per account that has been set up), and you simply delete the spam message and ban the account in one go. This one could be easy to beat since you just need to post a legitimate post, and then you can spam away. But it would be fascinating to start seeing spammers post comments that actually contribute to the discussion rather than "Buy V1AgrA! Cheap cheap cheap! [url]http://www.you-are-a-dumbass-if-you-buy-meds-online-from-someone-you-dont-know.com[/url]"...

    Of course the methods I described require someone to set up an account in order to post, and some people don't like that requirement. In that case, you don't really have much choice.

  24. What a load of rubbish that new 3D image version is. From an image analysis point of view it is even simpler than the text-based one: you could match using just a Gaussian filter analysis (no need to go into shape analysis at all, which you could always fall back on). The off-axis rotation is not great enough to make any difference. They have tried a small amount of lighting change to alter the highlights (and hence affect the distribution of luminance), but nowhere near enough to make a difference.

  25. I've found a combo of the Akismet and Bad Behavior plug-ins plus moderating all comments by new commenters keeps my WordPress blog spam-free.

  26. @Examancer - Your CAPTCHA is only accessible to people who:

    1. Speak english

    2. Are not dyslexic and/or mathematically challenged

    I agree with the article - CAPTCHA is dead. Let's work harder to find better solutions instead!

    /M;

  27. This comment has been removed by the author.

  28. You missed one solution, probably the hardest though: get people to stop buying stuff sold by spammers!

  29. The "mental effort" gotcha is a non-starter; if you really believe that "simple for humans" implies "simple for computers", then you need to talk to some AI researchers :) The whole point of CAPTCHAs is to choose problems that are easy for humans, but that AI research has not yet been able to crack. That way, if the CAPTCHA is broken, at least AI research is advanced. (Furthermore, with reCAPTCHA, humans are actually solving *useful* problems that computers *were* unable to solve.)

    The "economics" gotcha is a known issue: von Ahn has talked in his presentations about the CAPTCHA sweatshops and dirty porn tricks that spammers have used to hack around CAPTCHAs. But as other commenters noted, this at least increases the effort it takes to spam, which almost certainly has a reductionary effect.

    The "accessibility" gotcha is the biggest one I think, but it doesn't seem insurmountable.
