Hacker News
Ask HN: Any good FOSS alternative to Google's reCAPTCHA?
241 points by bocytron on May 6, 2020 | 133 comments
Google's reCAPTCHA is everywhere; they seem to have a monopoly on checking whether the user is a robot.

CAPTCHA systems are essential to the web, and it seems important to me to have a (good) FOSS alternative, but I can't find any.

Are all CAPTCHAs closed-source to make it harder for attackers? Am I missing something?



Cloudflare recently moved away from Google's reCAPTCHA to hCaptcha.

Announcement: https://blog.cloudflare.com/moving-from-recaptcha-to-hcaptch... Discussion on HN: https://news.ycombinator.com/item?id=22812509


hCaptcha is the worst, most buggy captcha service I have ever encountered. As a matter of fact, DigitalOcean recently added it to their login screen and it made my life a complete nightmare. Because I had to solve it every time I wanted to log in, I genuinely decided to leave their infrastructure and move to AWS regardless of the higher bill.

But I must give 100% points to DO support. Before leaving them I sent one last support ticket to DO with a recording of my nightmare in an MP4 file and, lo and behold, they changed their entire login flow for me (1). I no longer see the captcha and life is good again.

(1) https://imgur.com/a/GKJHhtT


Not sure how this is FOSS?


hCaptcha is annoying. I see it every time I open codepen and a few other websites.


hCaptcha is less annoying when using Tor than ReCAPTCHA at least.


Exactly my experience. And not just Tor, just about every datacenter IP address.


I found the tasks for alternatives (might be hCaptcha) hard to complete correctly even for a human.


Yes, and it made half the web unusable. Getting captchaed twice in 60 seconds is an immediate bounce from me.


What? The number of captchas is the same. And in my experience, hcaptcha is a heck of a lot nicer than recaptcha. It also works on Tor. This move from cloudflare will probably save me multiple hours of time per year, that I previously spent waiting for pictures of fire hydrants to load.


If I understood their blog posts correctly, the captcha rate should be the same - it's shown to you when Cloudflare detects that you need one, not when reCaptcha/hCaptcha decide. CF just has them show the captcha.


The number of captchas you need to solve will likely go up if you previously got them but had a good enough ReCaptcha score to not need to solve their challenges every time. hCaptcha does not yet have something that makes one captcha solve bypass the following few captchas.


I get like 3-4 ReCaptchas literally every time I encounter one, so I have zero problems with them changing.


This. I chalked it up to aggressive Firefox settings.


I have not seen this at all. In my experience hCaptcha results in an experience equivalent to the one reCAPTCHA gave. Anecdotally my pass rate on hCaptcha is actually higher (reCAPTCHA gets suspicious more quickly and often asks to solve more than one prompt).


Founder of HCaptcha here. Have you tried our privacy pass or accessibility features?


Can you please open source your frontend code? The net absolutely needs a captcha that has made a promise to not run tracking code, and sticks to that by providing transparency. Google is not doing it and has only increased the amount of tracking code. If you're doing things correctly the obfuscation of the data coming from the backend is what would be critical to keeping bots out, not the actual code administering the turing test on the frontend.


https://imgur.com/tyZnANQ

Firefox through a VPN and every site that uses hcaptcha throws this up and doesn't let me in.

You're the one who is responsible for this web cancer? You purposefully disable access to VPN users and then try to pass off a "privacy pass?"


Cloudflare is responsible for deciding you need a captcha...


This is exactly the position Google tried to put themselves in with reCAPTCHAv3, where they simply provide a “score” and websites now make the decision of blocking users themselves, absolving Google of any blame. If your captcha gives scores to real users that would cause everyone to block them, that’s on you.


Your privacy pass is useless. You ruined shopping for me and several others on niche e-commerce sites, so much so that the e-commerce owners had to add my IP to a whitelist. I'm glad I buy a lot, otherwise both they and I would be screwed.

Destroying people's businesses is not nice. That's why no one cares about security, "security engineers" don't test or measure the impact of their strict measures.


HCaptcha makes me do a lot more work to be considered a human, and I end up having to do it more often.


I absolutely love Cloudflare's switch from Google's absolute shit of a reCaptcha to the new hCaptcha. I still have to fill it out twice, but it takes < 30 seconds to complete my action and get on with my life. With Google's reCaptcha I always wasted ~10 minutes (this is not an overstatement) and was sometimes still unable to do anything. I'm really grateful for hCaptcha as it somehow simply works.


Privacy pass (beta) has dubious privacy claims and does not work on (most?) mobile browsers because it requires a browser extension.


Why would I sign up for something from your service to get around your service? I'll just bounce and use one of your customers' competitors.


I just close the tab when I see a reCaptcha checkbox. hCaptcha is fine; I really cannot see what everyone's problem is with hCaptcha.


lol wut


Idem.


It's not FOSS though, is it?


I use hCaptcha on multiple sites and honestly, it does the job well. To those who claim it hurts business, yes it adds an extra step to the verification process but it also keeps you and the rest of my visitors safe.

It almost feels like complaining about the inconvenience of a multi-layered system is equivalent to wanting to get back to eating in restaurants in the midst of a pandemic and then complaining if you catch Covid-19...


This comment was brought to you by Intuition Machines, Inc, Labeling the World.


[flagged]


Even when you disagree, please be polite.


What is your use case? I get exactly 0 spam on my website (of 100,000s of users) by simply writing my user registration page in a nonstandard way that bots aren't familiar with filling out automatically. It uses JS to `fetch()` a custom API endpoint and then redirects to the homepage.

Or for example, a fixed question "What color is the sky?" or something can reduce spam by orders of magnitude relative to nothing at all.


I think a "honeypot" HTML input field works well for anything not written explicitly to target your site. If any text is entered, mark as bot/spam.

    <form>
    <div style="display:none">
    If you are human, please ignore this field:
    <input type="text" name="Name" value="My Name">
    </div>
    Name:
    <input type="text" name="actualfield">
    </form>
Bots can't resist. Accessibility is fine, I think.

(Edit: suggested earlier elsewhere in the thread by tyingq: https://news.ycombinator.com/item?id=23090550 )
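
For the server side of the honeypot above, a minimal Python sketch might look like this. It assumes the form fields arrive as a plain dict, that the hidden field is called `Name` and is pre-filled with `My Name` as in the snippet, and that a missing honeypot field is not treated as suspicious; the function name `is_probable_bot` is made up for illustration.

```python
def is_probable_bot(
    form: dict[str, str],
    honeypot_field: str = "Name",
    expected_value: str = "My Name",
) -> bool:
    """Return True when the hidden honeypot field was changed by the client.

    A real browser submits the hidden field untouched, so any other value
    suggests an automated form filler. Assumption: a missing field is
    treated as untouched rather than as a bot signal.
    """
    submitted = form.get(honeypot_field, expected_value)
    return submitted != expected_value


# A human leaves the hidden field alone and fills in only the visible one.
human = {"Name": "My Name", "actualfield": "Alice"}
# A generic bot overwrites every text input it finds.
bot = {"Name": "Buy cheap pills", "actualfield": "Buy cheap pills"}

print(is_probable_bot(human), is_probable_bot(bot))
```

Silently discarding flagged submissions, rather than returning an error, is one option here, since it gives a bot author nothing to tune against.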


I used this technique in my forms until I realised that the browser's auto-fill works similarly to a bot and will fill fields that have a familiar field name (email, name, phone, etc.). Many real users who use the browser's auto-fill feature will get blocked by this technique. If you add a field with a random field name, bots ignore that field.

One thing that still works is using JavaScript to create a hidden field and making that field mandatory. Run-of-the-mill bots don't run JavaScript yet. However, this will exclude people who have disabled JavaScript in their browsers.


This works to the extent that bots aren't contextually aware of accessibility semantics. If the bot is mindful of the fact that the field isn't displayed, it could skip it. Which is exactly what screen reader technology would do, due to the "display: none;" rule.


Perhaps the trick could work by displaying it but setting the opacity or the height to 0, and hiding it from screen readers with aria-hidden. But I guess that won't fool the smarter bots.


  > <div style="display:none">
No, don't do this. Just use:

  <form>
    If you are human, please leave this blank:
    <input type="text" name="Name" value="">
    Name:
    <input type="text" name="actualfield">
  </form>


Accessibility should be fine if screen readers haven't changed a lot. They skip, or are not even aware of, display:none blocks.

Does the above honeypot work well with bots using headless browsers? Or is actually rendering the page not common enough for bots still?


Does this break things with chrome or LastPass autocomplete?


It shouldn't; you don't fill anything at registration, so even if the password generators prefill it, it should remain empty and can be ignored.



A website I use used to have a question of "How do you spell 'blue'?" Then a bot figured it out and they had to change it to "How do you spell 'green'?".


I like a test that asks a question relevant to whatever the site is about. "What game is this forum about?"

That, or a slightly harder variation, might also have the benefit of slowing down human trolls. But the answer should be easy for any legitimate user of your site. And of course easy to check automatically.


When I used to manage web forums, I asked a hard question and put the answer within the question itself (e.g. "hint: the answer is xxx").


I've seen that. It's great for keeping out generic bots, while allowing anyone with the slightest reading comprehension in. And if your forum is small, nobody is going to bother writing a custom bot for it.


I like this approach, especially for niche websites. Usually a pool of themed questions is enough.
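
A pool of themed questions like this is simple to check automatically. Here is a rough Python sketch; the questions, the accepted answers, and the names `QUESTIONS`, `pick_question` and `check_answer` are all invented for illustration, and a real site would keep the question id in the server-side session rather than trusting the client.

```python
import secrets
import unicodedata

# Hypothetical pool: question id -> (prompt, set of accepted answers).
QUESTIONS: dict[str, tuple[str, set[str]]] = {
    "game": ("What game is this forum about?", {"chess"}),
    "pieces": ("How many squares are on the board?", {"64", "sixty-four"}),
}


def normalize(answer: str) -> str:
    """Fold case, width variants and surrounding whitespace before comparing."""
    return unicodedata.normalize("NFKC", answer).strip().casefold()


def pick_question() -> tuple[str, str]:
    """Return a random (question id, prompt) pair to show on the signup form."""
    qid = secrets.choice(list(QUESTIONS))
    return qid, QUESTIONS[qid][0]


def check_answer(qid: str, answer: str) -> bool:
    """Return True when the answer matches one accepted for that question."""
    entry = QUESTIONS.get(qid)
    if entry is None:
        return False
    return normalize(answer) in entry[1]


qid, prompt = pick_question()
print(prompt, check_answer("game", "  Chess "))
```

Accepting several spellings per question keeps the test easy for legitimate users, which is the whole point of the approach.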


This gets me thinking. What we're looking for here is a way for "small" players to be able to survive without having to lean on Google. But small players are smaller targets for bots. So they don't need to take drastic measures. Once you can get big enough to be noticed by more sophisticated bots, you would be more likely to be able to afford a more sophisticated defense.


how do you handle blind or colorblind visitors


Blind and colorblind people still know what color the sky is. It's impossible to live long enough to register on a website and have never heard that.


OP edited their comment. it was originally "What color is our logo?"


Ah, sorry about that then.


Our logo is black actually.


You just need to have heard the most famous song from The Mamas & The Papas.


Hopefully members on HN are smart enough to generalize my example to something that may be better or more suitable for their own website, and not just lazily copy-paste my example of a generic question. If you do, I'm not sure you pass the human test.


Too late. My CAPTCHA now reads "What colour is vortico's logo". It's pretty effective. Not one bot signup.


They know it culturally.


grey


UK grey or Seattle grey?


UK is grey, Seattle is gray


Television-tuned-to-a-dead-channel grey?


Usually black, actually; sometimes with white dots.


Octarine.


You mean gray? /s


darker than that


> "What color is the sky?"

Well, the answer is obvious:

> The sky above the port was the color of television, tuned to a dead channel.

I hope this is the answer your page accepts.

On the other hand there is no one answer to this question, as the proper answer should begin with "it depends...". Currently, the sky is totally dark grey, storm is coming. Soon, it will be dark, so the sky will be black.

I think your "captcha" is broken.


I think this falls under something like https://xkcd.com/810/ , where you would not be allowed access, and that would be deemed a benefit to other users.


Perhaps his forum doesn't want snarky people?


Here are some [0] and if your submission is not on there, pls consider PR'ing it. OP is right, we need more alternatives.

[0] https://github.com/ZYSzys/awesome-captcha



Do you need a CAPTCHA? Or do you need to slow down / stop spammers? Consider hashcash [1] instead of CAPTCHA if #2 is your goal. It can be used in any place where real users interact with your site at almost zero effort on their behalf, and can slow down spamming enough to make you an unattractive target.

I have a terrible / incomplete / janky proof-of-concept version at [2] that you could build from, or you could find one that was built for your CMS / language du jour.

[1] https://en.wikipedia.org/wiki/Hashcash

[2] https://github.com/007/hashcash
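
To make the idea concrete, here is a simplified proof-of-work sketch in Python. It is in the spirit of hashcash but is not the real Hashcash stamp format: it assumes SHA-256, a server-issued challenge string, and a difficulty counted in leading zero bits, and the function names are made up for illustration.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def _digest(challenge: str, nonce: int) -> bytes:
    return hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce whose hash has enough zero bits."""
    for nonce in count():
        if leading_zero_bits(_digest(challenge, nonce)) >= difficulty:
            return nonce
    raise AssertionError("unreachable: count() never ends")


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty


# The challenge would normally be random and single-use; this one is a placeholder.
nonce = solve("signup:example-session", 12)
print(nonce, verify("signup:example-session", nonce, 12))
```

Each extra bit of difficulty doubles the expected work for the client while verification stays at one hash, which is what makes bulk submissions expensive and single submissions nearly free.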


> [2] https://github.com/007/hashcash

Looks like your repo is https://github.com/007/hashcash-js

But, cool! Thanks for sharing.


Proof of work is better IMO


hashcash is proof of work


We are trying out https://www.hcaptcha.com/ in our application.

It's not FOSS, but it seems to be a viable alternative worth a try. So far it does the job, though the images load a little bit slower than reCAPTCHA's.


I'm not a big fan of hCAPTCHA in its current form. The challenges seem so much harder than the reCAPTCHA ones and I keep failing them. The images are just extremely low quality. Maybe I'm a bot.


I get easy challenges. Perhaps even a bit easier than the reCAPTCHA ones, and less of them for sure.


It's way worse than Google's for me. I am using Firefox and I don't even try anymore whenever I get exposed to it on any Cloudflare website.


Founder of HCaptcha here. Have you tried our privacy pass or accessibility features?


The answer to "your basic service doesn't work well" should not be "well have you tried our additional services?"


I am scrolling through HN on my phone and saw privacy pass isn't supported for android. Is porting privacy pass to the android version of Firefox on your roadmap at all? If not, would you consider adding it to your roadmap? Thanks.


No, I haven't but your service is making it literally impossible to browse with Firefox and Tor (understandably because you're here to make money from your customers labeling people like me as a threat in their dashboards not to enable people to actually browse easily). Even reCAPTCHA doesn't do that.


I suspect you will see a considerable bounce rate once you switch. Pretty much the day Cloudflare flipped over half the web became 'captcha every 5 seconds' garbage.

Some sites that are the only source of what I'm looking for will be fine, but most I just bounce from now.


If you aren't a big target, sometimes just a visually hidden form field that shouldn't contain anything is good enough.


That's if you aren't a target at all, which applies to very few services. Any inexperienced attacker could use Burp Suite or inspect element to see and imitate this hidden field.


Sure, but that's often the case. You might only be a target, for example, because you're running WordPress. Nobody is deliberately going after your site specifically. There's plenty of sites that don't need a full-on captcha.


Oooo, I like this, thanks for the tip :)


Do try something more obscure than just display:none or similar. The bots seem wise to that.


Also call the field “website” or any variation of standard Wordpress field names.


Previous reCAPTCHA discussion, with some alternatives https://news.ycombinator.com/item?id=20158386



Click-captcha has tiny touchpoints, so I fail a couple times on mobile, just from that. Phpcaptcha has autocorrect enabled on the text box, so "alk" was changed to "all", causing a fail. Captcheck worked well.


I know it's not as usable for visually impaired people, but I kinda like those that ask you to drag a piece of puzzle on an image : https://github.com/ArgoZhang/SliderCaptcha


I second phpcaptcha.org.


What's your threat model? Maybe a CAPTCHA is not your only solution, or not even a good one. What about blind users, or users with some other disability?

Think: rate-limit, IP rating/scoring, your own filter on messages, etc.
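
As one example of the rate-limit option, here is a minimal in-memory token bucket in Python. It is only a sketch: it assumes a single process, keys requests by whatever string you pass in (an IP address, say), and the class name `RateLimiter` and its parameters are invented for illustration.

```python
import time
from typing import Callable


class RateLimiter:
    """Per-key token bucket: `capacity` burst, refilled at `refill_per_second`."""

    def __init__(
        self,
        capacity: float,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        # key -> (tokens remaining, time of last update)
        self._buckets: dict[str, tuple[float, float]] = {}

    def allow(self, key: str) -> bool:
        """Spend one token for `key` if available; return whether it was allowed."""
        now = self.clock()
        tokens, last = self._buckets.get(key, (self.capacity, now))
        # Top the bucket up for the time elapsed since the last request.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_second)
        if tokens >= 1:
            self._buckets[key] = (tokens - 1, now)
            return True
        self._buckets[key] = (tokens, now)
        return False


# Allow bursts of 5 submissions per client, refilling one every 10 seconds.
limiter = RateLimiter(capacity=5, refill_per_second=0.1)
print([limiter.allow("203.0.113.7") for _ in range(6)])
```

A production setup would also need to expire idle keys and share state between workers, which this sketch leaves out.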


Latest version of recaptcha doesn't even require any solving.

It just analyses the traffic and gives the site owner a score [0.0 - 1.0] for how sure they are that the visitor is human.

They don't explain how they calculate the score, but from my usage it's pretty accurate. They suggest initially treating anything higher than 0.5 as human.


And they also disregard the GDPR completely. Maybe OP has strict legal requirements that make using Google's tools a no-go.


Is that a new version that isn’t in use anywhere yet? Because for me reCaptcha is still a tool that analyses how much they know about you and if you dgaf about privacy, it lets you through.


It came out at the end of 2018. You probably just haven't noticed it because the only visual is a little badge at the bottom right hand corner of the page.


Haven’t seen any site that used reCaptcha that lost it. Any site you know where I could see it?


Here's a random site from builtwith that uses it https://press.priceline.com/

But yeah, it's probably not common for sites to 'upgrade' to v3. Recaptchas are a feature you begrudgingly add, and v3 puts the onus on the product to decide what to do when there's a low score, so it's pretty different.


Huh. I saw no logo or text referring to recaptcha at all. Yet before I allowed the Google domains for recaptcha, the message did not send. I guess they allowed low scores? Thank you.


Just FYI: google recaptcha works with screen readers too and blind users can use it.


Yep. Apache beam with fraud detection heuristics.


I did some research a few days back, and there are none that can't be beaten with some OCR/TensorFlow tech. If it's anything simple, the question is why you need it at all. Anything hard enough that bots can't beat it will also fail many humans.

Add a rate limiter instead and put CF or something similar in front. It's a way better experience than any captcha.

In case you still want one, here is a solid one:

https://github.com/dchest/captcha


Not exactly answering the question, but I recently used aliexpress.com and their captcha system is super easy: it shows a sliding button like the one to answer a call on a phone. The prompt just asks you to slide it to validate your input. Not sure how it works, but it sure is a much better UX than when I have to spend 3 minutes identifying fire hydrants. Maybe we could make a FLOSS version of it?


Otherwise, the common reCAPTCHA is just a "check box to prove you're human" if Google knows you even slightly.


But is that good for accessibility?


What some wikis do is just ask a question (in text) to which you then type in the answer (and if you don't know, you can look it up in a book, Wikipedia, Google, or wherever you want, or ask someone who does know the answer). I think that works much better than reCAPTCHA.


I would say it really depends on your use case.

Let's say you have a comment section on your site where any user can write stuff.

More often than not a hidden field which should not be filled (the honeypot method) and a spam filter gets the job done no problem.

For registrations it can be more problematic because the spam filter does not work that well.

I have yet to find a good alternative to commercial captchas as well but rolling your own solution is possible.

And probably even the best idea because if every site has its own weird system it would make the life of bots quite hard.

In the end a dedicated attacker can always hire people to fill the captchas and circumvent any system for an astonishingly low amount of money.


I just want a version of captcha that isn't tied to my google account. This is particularly an issue on anonymous message boards like 4chan. If google wanted to, they could tie pretty much every 4chan post to a google account.


The problem is that CAPTCHA is a hard problem to solve nowadays. It’s not like before, when you could just display an image and ask what the letters are. It takes machine learning, lots of training data, etc.


Also it has to take into account that the puzzle may be sent to a farm of human puzzle solvers, or relayed to unsuspecting users of another website.

I think this is what makes Google's approach powerful because they have the best view on IP addresses used worldwide. (Whether that's desirable is still another question).



The latest reCAPTCHA isn't even detectable. It runs in the background of the browser and gives predictions for bot traffic.

The days of reading images as validation are going to be one of those "remember when" moments on the internet.


It works well when you’re signed into the Google advertising ecosystem. But god forbid you try to use Tor to browse the web, you’re going to spend hours of time every week deciding whether the pole counts as part of the street light and waiting for pictures of fire hydrants to load.


I use a VPN and Privacy Badger + uBlock Origin in Firefox containers (I use Google in its own container). I swear Google hardcoded this particular setup to go on the shit-list. I'm sick of buses, bikes and traffic lights. I've written a lot of angry emails about reCAPTCHA to local government agencies and to services where I'm logged in and a paying fucking customer.


My favorite is when they refuse to even deliver the audio challenge. Surely that’s a violation of some law (ADA?) if it happens on certain websites.


I still pick bikes and road signs and markets at the same frequency as before. Didn't notice it anywhere before reading this comment. I use Firefox.


Same here. My SO teases me whenever she sees one on my laptop, because she thinks I’m getting punished for being “bad” at identifying crosswalks and boats. Of course she uses chrome and never sees them.

I swear the algorithm is:

    (if isFirefox and isUMatrixLoaded)
It makes me wonder, are bots really more likely to use FF and ad blockers? Whenever I write scrapers and want to come off as human, I always use headless chrome spoofing non-headless chrome, and I never get captchas...

I’m also vaguely aware of headless chrome botnets used to mine ads. They definitely don’t use uMatrix. :/

Is this anticompetitive in some way?


I don't think there's anything going on here explicitly targeting Firefox. Adblock/UMatrix/etc are doing their job, blocking cookies and other tracking and heuristics that allow sites to decide you are "likely human" and skip showing you a captcha. Using Chrome (and being logged in with your account connected to Chrome) is providing a heuristic that Firefox cannot, allowing you to skip captcha since you are being tracked by the browser itself. So it's more implicit than explicit.


Headless browser scrapers might blacklist things they don't need, like ad domains and trackers, since all they do is slow down how long it takes for their desired content to load.

The algorithm simply checks to see how often you check in with Google and how "human" your actions are based on the data you send to them via things like Search, Maps Location History, YouTube, Gmail, probably Chrome Sync, etc.

If you're using a completely new Google Account and block all of Google's Trackers, choose to use FastMail for your actual email communications, and only occasionally watch YouTube, Google has every reason to believe your IP is a residential proxy that's being used by a bot or perhaps someone in a third world country working at a recaptcha farm. They can't completely prevent the latter from solving their Captchas but at that point all they need is for the farm worker to help Google tune its self driving dataset.


They're talking about ReCaptcha v3[0], which is completely different from the current ReCaptcha, which requires you to tick a checkbox. The new version analyzes your behavior across an entire website and, when the website requests your threat score, it returns 0.0-1.0 with 1.0 being considered human.

The problem with this new system is that it doesn't solve the problem developers wanted to solve before: preventing unauthorized signups. Most developers don't want to know how likely it is that a user is actually a bot, they just don't want bots signing up for user accounts. With the new system, developers have to decide what to do with a threat score. Should 0.5 require some interstitial page with a v2 captcha? Does a 1 or 0.9 mean they should give them access to certain data that might be otherwise hidden from bots? It's something developers generally didn't need before.

We likely won't see the "i'm not a robot" checkbox (/invisible recaptcha[1]) go away anytime soon since v3 solves different problems, but it does mean websites might start sending you down different verification funnels depending on how human Google thinks you are with no way to solve a Captcha that proves Google otherwise.

0: https://developers.google.com/recaptcha/docs/v3

1: https://developers.google.com/recaptcha/docs/invisible
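
To illustrate the kind of decision v3 pushes onto the site, here is a small Python sketch. It assumes you have already called the verification endpoint described in [0] and parsed its JSON reply into a dict with `success` and `score` keys; the `Outcome` enum, the `decide` function and the 0.5 default threshold (the starting value mentioned elsewhere in this thread) are illustrative choices, not anything Google prescribes.

```python
from enum import Enum


class Outcome(Enum):
    ALLOW = "allow"                # let the request through
    FALLBACK_CHALLENGE = "v2"      # show an interactive captcha instead


def decide(verify_response: dict, allow_at: float = 0.5) -> Outcome:
    """Map a parsed v3 verification response to what the site does next.

    Low scores and failed verifications fall back to a solvable challenge
    rather than a hard block, so a real user always has a way through.
    """
    if not verify_response.get("success", False):
        return Outcome.FALLBACK_CHALLENGE
    score = float(verify_response.get("score", 0.0))
    if score >= allow_at:
        return Outcome.ALLOW
    return Outcome.FALLBACK_CHALLENGE


print(decide({"success": True, "score": 0.9}))
print(decide({"success": True, "score": 0.3}))
```

Whether a low score should lead to a fallback challenge, extra email verification or something else is exactly the product decision described above; the sketch only shows where that decision lives.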


Firefox + VPN and I have to do that and I get pretty blatant fake failures and end up having to do it numerous times. It honestly feels like Google is trying to punish me.


It has been 2–3 years already since I heard of the new reCAPTCHA running in the background. However, that must be dependent on Chrome or on some JS that Firefox NoScript users may leave disabled, because privacy-aware internet users will still encounter image CAPTCHA hell doing many ordinary things on the web, and the situation only gets worse if you add Tor to the mix.


The worst are the sites that don't have a fallback. If the new CAPTCHA decides you are a bot because, for example, you are running a VPN or something there's absolutely nothing you can do. Has happened to me two or three times.


Except when you use a couple of privacy-related extensions in Firefox and every other website forces you to mark bicycles or buses.

Or do these websites use older versions?


So long as you're using Chrome. I encounter them all the time.


doubt this very much since google is currently getting waaaaay too much free ML training data from captcha, and most users are blissfully unaware and happy to continue filling out their puzzles.



It is not really a captcha, but I used email for people to submit comments to my website. You could rely on a third-party mail provider for filtering, which would make it even simpler.


I wonder what all those darkmarkets use. I assume they are pretty resilient!


Cloudflare should run a captcha service and not sell the data it acquires that way.

The data can be used by their DDoS protection scheme but that's about it; it should not be sold to advertisers or other firms.


captcha is the cancer of the web



