Hate Language Wanted

by Translation Guy on April 2, 2013

Burma Arakan conflict

Hatebase is recruiting linguists with good hate language skills to contribute to their multilingual hate language glossary.

By ferreting out all the haters on the World Wide Web by the language they use, Hatebase’s sponsors hope to map hate speech on the Internet in order to predict impending ethnic violence.

It’s part of the Sentinel Project to build an early warning system to identify communities at risk of genocide.

Hatebase developer Timothy Quinn explained to Foreign Policy blogger Joshua Keating, “The real value is the sightings. As soon as you have logged incidents of hate speech you can start mapping that stuff, looking at frequency, severity, the migration of terms geographically. There’s a whole lot of value when people start mapping it against the real world.”

According to their website, Hatebase is now the world’s largest online repository of structured, multilingual, usage-based hate speech.  Tim tells me they’ve got around a thousand terms so far, with three-quarters or so in English. That’s because this initiative is English-speaker dominated, and because of the rich diversity of hate language available in this language we share.

Any language pro can tell you that for a domain as large as hate, a term base of only a thousand words is way too small. And whenever you have proscribed speech, variants will abound. There is a lot more hate out there, folks.

Linguists with local hate knowledge are needed to address this deficiency. Contributing to the database of hate is a snap, and it takes but a moment to sign up. The form has fields for the hate term itself, pronunciation, meaning, language and type of hate. Types of hate include language about ethnicity, sexual orientation, class or disability.
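For the programmers in the audience, here is a rough sketch of what one submission might look like as a record. It is only an illustration of the form fields described above: the names `TermEntry`, `HateType` and `missing_fields` are my own inventions, not Hatebase's actual schema or API, and the real form may offer more types than the four mentioned here.

```python
from dataclasses import dataclass, field
from enum import Enum


class HateType(Enum):
    # Only the four types named in this post; an assumption, not the full list.
    ETHNICITY = "ethnicity"
    SEXUAL_ORIENTATION = "sexual orientation"
    CLASS = "class"
    DISABILITY = "disability"


@dataclass
class TermEntry:
    """Hypothetical record mirroring the form fields described in the post."""
    term: str
    pronunciation: str
    meaning: str
    language: str
    hate_types: set = field(default_factory=set)

    def missing_fields(self):
        """Return the names of fields that were left blank."""
        text_fields = ("term", "pronunciation", "meaning", "language")
        missing = [name for name in text_fields if not getattr(self, name).strip()]
        if not self.hate_types:
            missing.append("hate_types")
        return missing


# Placeholder values only; no real glossary entry is reproduced here.
entry = TermEntry(
    term="example-term",
    pronunciation="",
    meaning="placeholder meaning",
    language="English",
)
print(entry.missing_fields())
```

A contributor-facing tool could run a check like `missing_fields()` before submitting, so that half-filled entries do not end up in the glossary.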

By mapping the location and frequency of hate speech on the Web, researchers will be able to detect spikes in hatred against the high ambient level of web and social media hate speech. The hope is that this will provide an early warning system for areas where genocidal violence is about to break out.

“In a nutshell, we’re trying to create ‘soft’ data points that have meaning when meshed together with other axes. So for instance, someone defining ‘cockroach’ as a Tutsi (as happened in Rwanda in the early ’90s) in and of itself isn’t a strong predictor for genocide. But when you start logging multiple sightings within a demarcated region over a compressed period of time, that vocabulary starts to become more meaningful. Of course, it could still be that there’s an actual cockroach infestation being discussed, so the data now needs to connect with external systems and processes,” says Tim. “So if you’re an NGO who’s monitoring a dozen different confirmed conflict zones and all of a sudden you see a regionalized spike in ANY vocabulary (even something that could have a benign second meaning), that’s a strong indicator that something’s going on which deserves further study.”
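To make the idea of a "regionalized spike" concrete, here is a minimal sketch of one way it could be computed. This is my own illustration, not Hatebase's method: the function name `regional_spikes`, the weekly windows, the eight-week baseline and the threshold of three standard deviations are all assumptions. It counts sightings per region per week and flags any region whose latest week stands far above its own recent history, which is the "ambient level" a spike has to beat.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import mean, pstdev


def regional_spikes(sightings, today, window_days=7, history_windows=8, threshold=3.0):
    """Flag regions whose latest window of sightings far exceeds their baseline.

    sightings: iterable of (region, date) pairs (a hypothetical log format).
    Returns {region: score}, where score is how many standard deviations
    the latest window sits above the mean of the earlier windows.
    """
    counts = Counter()
    for region, day in sightings:
        age = (today - day).days
        if 0 <= age < window_days * (history_windows + 1):
            counts[(region, age // window_days)] += 1  # window 0 is the most recent

    spikes = {}
    for region in {region for region, _ in counts}:
        history = [counts[(region, w)] for w in range(1, history_windows + 1)]
        baseline = mean(history)
        spread = pstdev(history) or 1.0  # flat history: avoid dividing by zero
        score = (counts[(region, 0)] - baseline) / spread
        if score >= threshold:
            spikes[region] = score
    return spikes


# Made-up demo data: both regions see 2 sightings a week for 8 weeks,
# then region "A" jumps to 20 in the latest week while "B" stays at 2.
today = date(2013, 4, 2)
demo = []
for week in range(1, 9):
    for region in ("A", "B"):
        demo += [(region, today - timedelta(days=7 * week))] * 2
demo += [("A", today - timedelta(days=1))] * 20
demo += [("B", today - timedelta(days=1))] * 2

print(regional_spikes(demo, today))
```

As Tim points out, a flag like this is only a prompt for further study. It cannot tell an ethnic slur from an actual cockroach infestation, so it would still need to be checked against other sources.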

Current languages of interest are those spoken in Kenya and Burma, the hotspots for ethnic violence over the last couple of weeks. Right now in Burma, the Rohingya are the targets of Burman ethnic cleansers, but since that conflict is occurring mostly offline, it must be tough to detect patterns.

Sadly, Hatebase can only grow, as the Web and social media are likely to become much more powerful tools for mobilizing haters in the future.  A little counter-mobilizing is in order.  Please spread the word.

Here’s a breakout of hate speech on the Web by type as compiled by Hatebase:



  1. Finlay Laube says:

    If you coupled this kind of thing with stock market variations or economic data, I think you would have a more accurate idea as to real hot spots for violence, as poor economic conditions plus growing ethnic problems usually equal violence.

  2. Trish says:

    How much of this can actually be monitored online, considering these kinds of conflicts tend to happen in third world countries with a lack of infrastructure like widespread internet access?

  3. The problem is that the data will show potential genocides all over the planet, so which one do you focus on?

  4. This right here is a great idea.

  5. I’m surprised that there is so little hate speech based on sexual orientation, considering the stuff I find in comment sections on the internet.

  6. Matt says:

    I saw this on FP and signed up, I think it’s a fabulous idea.

  7. Luc Allgut says:

    I imagine that if you looked at the data, you would find a surprising amount of this stuff originating in Western countries that won’t necessarily result in ethnic violence.

  8. It has its failings, but overall I think that it could be a good tool to complement other monitoring techniques.

  9. Tiffani says:

    How do they find the hate speech? And where are they looking? Twitter, Facebook, YouTube, forums, comment sections?

  10. Well, good luck to them, hopefully we never have to know whether it works or not.

  11. Laurie says:

    It’s a remarkably well-intentioned project, but keeping up with what is essentially slang is an incredibly hard thing to do. I can’t help feeling that they might always be a step behind.

    • Ken says:

      Keep your algorithms crossed that the software will work when it needs to.

  12. Good to see these kinds of projects being done, although I’m not sure how effective this will be. Unfortunately, I guess it will take another genocide to determine its effectiveness.

    • Ken says:

      That would be un-American, Maureen.
