Hacker News
List of Dirty, Naughty, Obscene, and Otherwise Bad Words (github.com/shutterstock)
54 points by eorge_g on April 6, 2016 | 66 comments


For anyone thinking of using this: You will never be able to make people not say “bad” things.

I want to stick my long-necked Giraffe up your fluffy white bunny.

http://habitatchronicles.com/2007/03/the-untold-history-of-t...


One good reason to use a list like this is to filter suggestions. If a user types 'cun' into a search box, almost all websites will not want to suggest you-know-what as a completion.

Edit: and alas, at Blekko they never let me ship an April-fool's "Did you mean: Mother trucking son of a blintz?" module. No sense of humor.
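
The suggestion-filtering use case described above could look roughly like the sketch below. It is only an illustration: the function names, the tiny corpus, and the blocklist entry are all hypothetical, and it handles only single-word blocklist entries, not multi-word phrases.

```python
def filter_suggestions(candidates, blocklist):
    """Drop any candidate completion that contains a blocklisted word."""
    blocked = {word.lower() for word in blocklist}
    safe = []
    for candidate in candidates:
        words = candidate.lower().split()
        if not any(word in blocked for word in words):
            safe.append(candidate)
    return safe


def suggest(prefix, corpus, blocklist, limit=5):
    """Return up to `limit` completions for `prefix`, minus blocked ones."""
    prefix = prefix.lower()
    # Hypothetical matching rule: plain case-insensitive prefix match.
    matches = [term for term in corpus if term.lower().startswith(prefix)]
    return filter_suggestions(matches, blocklist)[:limit]


# Hypothetical data, chosen to be harmless.
corpus = ["duck pond", "duct tape", "dung beetle"]
print(suggest("du", corpus, blocklist=["dung"]))
```

Note that this only hides terms from the suggestion list. A user who types the full term can still search for it, which is the distinction between filtering suggestions and filtering usage.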


I agree in spirit, but then you end up with aggravating cases where Apple refuses to let me write "fucking" and replaces it with "ducking" instead ;)


That isn't filtering suggestions, so no, that's not what I was talking about.


He's talking about text autocomplete, which is a suggestion mechanism. Autocorrect knows full well that "fucking" is a word, and doesn't put a squiggle under it or anything. But autocomplete (the thing that happens if you write "fuc" and then move the insertion cursor) will skip right past that word in the Autocorrect database and give you "duck" as the best fuzzy match.



Where possible I use "sofa king" instead of "fucking".


So true. People will always find a way to talk euphemistically. And on the other end of the spectrum, you can't really filter out borderline-offensive phrases.

Reminds me of a system I worked on where we generated temporary passwords for tens of thousands of invitations to a big event. Somebody tried to be clever and generate "user-friendly" passwords by combining words from a public 1st grade vocab list. It seemed like overkill for a temporary password, which had to be changed upon initial login, but whatevs, we had extra time on the project. It looked good in testing.

Within hours of sending out the first batch of invitations, we started getting complaints from people "not comfortable" with their passwords. I don't remember all of them, but some great examples were things like "donkeybanana1014", "drunkgod9488", "devilboy4593". It wasn't a huge PR problem, mostly just caused some laughs and a little support scrambling, but I filed it away as yet another example of someone getting burnt by trying to be clever.
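
A generator of the kind described above might look like the following minimal sketch. The word-word-four-digits shape is an assumption inferred from the quoted examples, and `temp_password` and the word list are hypothetical names, not the original system's code.

```python
import secrets


def temp_password(words, digits=4):
    """Join two randomly chosen words and a zero-padded random number."""
    first = secrets.choice(words)
    second = secrets.choice(words)
    number = secrets.randbelow(10 ** digits)
    # Assumed format: <word><word><4 digits>, e.g. "donkeybanana1014".
    return f"{first}{second}{number:0{digits}d}"


# Hypothetical stand-in for a public first-grade vocabulary list.
vocab = ["donkey", "banana", "boy", "garden"]
print(temp_password(vocab))
```

Each word is harmless on its own. The trouble comes from the pairs, and a per-word blocklist cannot catch those.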


Tangential but related: Toontown Online had two communication systems: "speed chat" and "secret friends". Speed chat is basically safe, prefabricated messages. Secret friends is unfiltered text chat, but only between people who have swapped friend codes. Since the game didn't support swapping friend codes, they were trying to make sure you only connected with people you knew.

However, players figured out a language to encode friend codes using speed chat phrases. So you would send a series of speed chat messages, the other player would send some back, and then you'd have each other's friend codes.
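
As a rough illustration of how such an encoding could work, the sketch below maps each digit of a code to one canned phrase. The phrase list and the digits-only code format are assumptions for the example, not what Toontown players actually used.

```python
# Hypothetical canned phrases, one per digit 0-9.
PHRASES = [
    "Hello!", "Goodbye!", "Follow me.", "Thanks!", "Good job!",
    "Let's go!", "Wait here.", "Sorry.", "Yes.", "No.",
]


def encode(code):
    """Turn a digits-only code into a sequence of canned phrases."""
    return [PHRASES[int(ch)] for ch in code]


def decode(messages):
    """Recover the digit string from a sequence of canned phrases."""
    return "".join(str(PHRASES.index(message)) for message in messages)


messages = encode("4072")
print(messages)
print(decode(messages))
```

Any channel that lets users choose among N distinct messages can carry arbitrary data this way, which is why a whitelist of safe phrases does not by itself prevent out-of-band exchange.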


>In 1992, I co-founded a company with Chip Morningstar and Douglas Crockford named Electric Communities.

Oh wow, it's really that Douglas Crockford. Embarrassed to say I only knew of his Javascript work.


Chip Morningstar! :-) What a name!


We need AI that can generate these 0_o ... Think of the possibilities!


Made me snigger. Incidentally, you can't use that word on many forums: snigger.

I live near a town called Scunthorpe. Scunthorpe residents tend to use the nickname Scunny when they're on-line, because the full name gets blocked so often.



I made a reasonable whack at solving the Scunthorpe problem by building a whitelist of terms containing those 4 letters, and a similar list for the Hiroshita problem. Alas, now IBM owns all that code. As I build a search engine with autocomplete for the Wayback Machine, I'm really missing having it.
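
The allowlist approach mentioned above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code referred to in the comment: `contains_blocked`, the blocked substring, and the allowlist entries are all hypothetical.

```python
import re


def contains_blocked(text, blocked, allowlist):
    """Return True if a blocked substring occurs outside an allowlisted word."""
    allowed = {word.lower() for word in allowlist}
    blocked = [bad.lower() for bad in blocked]
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in allowed:
            continue  # known-innocent word: skip the substring check
        if any(bad in word for bad in blocked):
            return True
    return False


# Hypothetical lists: one blocked substring, two innocent words containing it.
blocked = ["ass"]
allowlist = ["classic", "assistant"]

print(contains_blocked("a classic film", blocked, allowlist))  # allowlisted
print(contains_blocked("my passport", blocked, allowlist))     # not allowlisted
```

The second call shows the maintenance cost: every innocent word containing a blocked substring has to be added to the allowlist by hand.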


Yes. I used to frequent a message board that implemented a really dumb profanity filter. It got to the point that its users would greet one another in face to face meetings with "Hecko!"


I see people on dating websites writing c*cktail with the '*'. Not nearly as funny as Hecko, though!


A clbuttic issue.
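
Both "Hecko" and "clbuttic" are what a plain substring replacement produces. The sketch below contrasts that with a word-boundary match. The function names and the replacement table are hypothetical, and matching is case-sensitive to keep the example short.

```python
import re


def naive_censor(text, replacements):
    """Replace every occurrence of each term, even inside longer words."""
    for bad, clean in replacements.items():
        text = text.replace(bad, clean)
    return text


def word_boundary_censor(text, replacements):
    """Replace a term only where it stands alone as a whole word."""
    for bad, clean in replacements.items():
        pattern = r"\b" + re.escape(bad) + r"\b"
        text = re.sub(pattern, clean, text)
    return text


# Hypothetical replacement table.
replacements = {"ass": "butt", "hell": "heck"}

print(naive_censor("a classic hello", replacements))          # a clbuttic hecko
print(word_boundary_censor("a classic hello", replacements))  # a classic hello
```

Word boundaries fix the false positives but are trivially evaded with spacing or punctuation, which is the point several commenters make about filtering being futile.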


Hilarious. See if you can find:

- a word from the 10 commandments

- a great jazz album from Herbie Hancock

- a James Bond film

- the surname of a US presidential candidate for the 2012 republican nomination


What a fun and educational quiz! Thank you.

And the answers are:

- a word from the 10 commandments

ass, of course. An easy start.

- a great jazz album from Herbie Hancock

Today I learned that Herbie Hancock is a jazz musician who made an album called Mr Hands. I also learned that there is a video of a man being buggered by a horse called Mr Hands.

Thank you for enriching my life.

- a James Bond film

And at this point I lose all respect for the compiler of this list. Octopussy is a James Bond film. That is all. Any other usages are just silly.

- the surname of a US presidential candidate for the 2012 republican nomination

Santorum! Santorum! Do people actually use that word, or is it just a running joke?


Mr. Hands died in that video, hence why it's kind of legendary.


Well, he died later at the hospital. It's not strictly a snuff film, to be pedantic.


> - the surname of a US presidential candidate for the 2012 republican nomination

That one is a bit of a special case. His name is a swear word because it's his name. As a protest against him, people decided to make his name into a dirty word, and seem to have succeeded.


Who was this?


(Rick) Santorum, as coined by Dan Savage, in recognition of his services to the cause of LGB bigotry.


Rick Santorum IIRC


Rick Santorum


easy.

- a word from the 10 commandments: blumpkin

- a great jazz album from Herbie Hancock: two girls one cup

- a James Bond film: donkey punch

- the surname of a US presidential candidate: yiffy


"Huge tits" but not "huge melons", "nigga" but not "niggaz"? Who wrote this thing? I think this is about 0.1% of the "naughty" words out there, and it's futile anyway (former school sys admin here, I know what I'm talking about ;) This is before we get onto the desirability of blindly blocking words like "nigger" which have different meanings depending on who is using them, ref. "my nigger", or "tits", ref. "blue tits are eating the nuts again".


You can check the README. It's from ShutterStock, and it's their list of words to filter image suggestions.

https://github.com/shutterstock/List-of-Dirty-Naughty-Obscen...


Well it's still a very basic effort. If the job it's trying to do is annoy users, it's fine, but if it's trying to limit use of bad language, it mostly fails.


It's trying to prevent accidental autocomplete, not limit usage.


Not to be confused with the big list of naughty strings... https://github.com/minimaxir/big-list-of-naughty-strings


Someone tried to PR a large amount of curse words which I vetoed because it was redundant. Which is why I'm happy to see a more comprehensive list. :)


For the curious, the last entry on the list is the single unicode character:

U+1F595 REVERSED HAND WITH MIDDLE FINGER EXTENDED


There are so many words here about sex, but none that I can see about violence. Sometimes I am so puzzled by American culture (assuming this list was compiled by an American).


My favorite story on this: back in the BBS days, one board would change the f-word to "gently caressing", and it really changed the tone of heated arguments.


"There was a girl on TV who was talking about..."

What do you mean I'm banned?


~380 lines? What a poverty of imagination.


Why was the swastika included in the Japanese list? While it has pretty bad connotations, particularly in the West, in Japan it retains a pre-World War II meaning of holy and sacred.


Bonus points for the most creative pull requests. Chocolate starfish, rusty sheriff's badge and brown eye are all missing.


Protip: if your list of words doesn't include Carlin's Seven, you're not trying hard enough.

Seriously: "splooge moose" gets an entry but "cocksucker" doesn't? Even Urban Dictionary, usually a canonical source for profane euphemisms, doesn't have a definition for that first one...


I guess the lyrics of Thomas Campion's "I Care Not For These Ladies" are right out, then...


Hard to keep a straight face when singing about "golden showers", I must admit.


I hear Rick Santorum has his money tied up in a shrimping operation.


I wouldn't blindly use this as a blacklist for spam etc, lots of words here like "vagina" might be OK on a medical site or even some racial slurs if a news agency is reporting a quote etc.

Definitely a good list to use as a starting point.


Was this list written by a guy?


I'm not sure, but the inclusion of "vagina" would not seem to indicate it, as "penis" is included in the list as well. This seems to be a list of sex related words, without additional considerations.


Who uses the word "vagina" for obscene purposes anyway?


Wouldn't vulva be more accurate?


Neo-victorians.


Escort really shouldn't be on there, nor "jelly donut," and words like "hardcore" and "neonazi" are really quite questionable. And yet it's missing words like condom, scrotum, and labia.


George Carlin would be proud...


I think he'd be disappointed. Here is his extended list of dirty words: https://www.youtube.com/watch?v=TSlbEq0roEM#t=35s or https://www.youtube.com/watch?v=N0ee4wqZvf8 (try the closed captions button!)

I'm especially fond of bearded clam, and I've actually used 'gleet' in a poem.


#2 is there in some forms, but #5 is surprisingly absent!


In the PT version we have "bissexual" and "homem gay" (gay man). Bad words? Huh?


This is hilarious, can't be serious. We have also:

- burro (donkey)

- cerveja (beer)

- inferno (hell)

- torneira (water tap)

- frango assado (roasted chicken)

- aranha (spider)


In this repository are also lists of words in other languages, including Esperanto.


I knew more of them in third class... Maybe crawling 4chan with an ANN?!


The people who live in Dildo, Newfoundland will be disappointed.


"alaskan pipeline"?


Just an obvious heads up, this is NSFW.


Pretty sure the description is a warning enough, plus it's just words, not ASCII porn.


Yeah, serious question: Is this really not safe for work?

Are there workplaces where you'd be reprimanded for looking at a list of vulgar words, especially in this context? (More than you would for any other "unproductive" activity, like say, reading HN.) And if it's just about getting flagged by some monitoring software, I'd think this thread would be just about as likely to get you in trouble.

I get that NSFW is inherently subjective and context-dependent, but for me personally this is not NSFW, i.e., totally SFW.


I like pull requests.


wrapping men!



