Hatebase is recruiting linguists with good hate language skills to contribute to its multilingual hate language glossary.
By ferreting out haters on the World Wide Web through the language they use, Hatebase’s sponsors hope to map hate speech on the Internet and predict impending ethnic violence.
It’s part of the Sentinel Project’s effort to build an early warning system that identifies communities at risk of genocide.
Hatebase developer Timothy Quinn explained to Foreign Policy blogger Joshua Keating, “The real value is the sightings. As soon as you have logged incidents of hate speech you can start mapping that stuff, looking at frequency, severity, the migration of terms geographically. There’s a whole lot of value when people start mapping it against the real world.”
According to their website, Hatebase is now the world’s largest online repository of structured, multilingual, usage-based hate speech. Tim tells me they’ve got around a thousand terms so far, with three-quarters or so in English. That’s because this initiative is English-speaker dominated, and because of the rich diversity of hate language available in this language we share.
Any language pro can tell you that for a domain as large as hate, a term base of only a thousand words is way too small. And whenever you have proscribed speech, variants will abound. There is a lot more hate out there, folks.
Linguists with local hate knowledge are needed to address this deficiency. Contributing to the database of hate is a snap, and it takes but a moment to sign up. The form has fields for the hate term itself, its pronunciation, meaning and language, and the type of hate. Types of hate include language about ethnicity, sexual orientation, class and disability.
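For readers who think in data structures, here is a rough sketch of what one glossary entry might look like, based only on the form fields listed above. The names `TermEntry` and `HateType` are made up for illustration and are not Hatebase's actual schema, and the enum covers only the four types mentioned here; the real form may offer more.

```python
from dataclasses import dataclass
from enum import Enum


class HateType(Enum):
    """Hypothetical categories: only the four types named in the post."""
    ETHNICITY = "ethnicity"
    SEXUAL_ORIENTATION = "sexual orientation"
    CLASS = "class"
    DISABILITY = "disability"


@dataclass(frozen=True)
class TermEntry:
    """One glossary entry, mirroring the fields on the contribution form."""
    term: str           # the hate term itself
    pronunciation: str  # how the term is pronounced
    meaning: str        # what the term means in use
    language: str       # language the term belongs to
    hate_type: HateType # which kind of hate the term expresses


# Placeholder values only; no real term is reproduced here.
entry = TermEntry(
    term="example-term",
    pronunciation="ex-AM-pul term",
    meaning="placeholder definition",
    language="English",
    hate_type=HateType.ETHNICITY,
)
```

Keeping the type of hate as a fixed set of categories, rather than free text, is what would make a breakdown by type easy to compute later.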
By mapping the location and frequency of hate speech on the Web, researchers will be able to detect spikes in hatred against the high ambient level of web and social media hate speech. The hope is that this will provide an early warning system for areas where genocidal violence is about to break out.
“In a nutshell, we’re trying to create ‘soft’ data points that have meaning when meshed together with other axes. So for instance, someone defining ‘cockroach’ as a Tutsi (as happened in Rwanda in the early ’90s) in and of itself isn’t a strong predictor for genocide. But when you start logging multiple sightings within a demarcated region over a compressed period of time, that vocabulary starts to become more meaningful. Of course, it could still be that there’s an actual cockroach infestation being discussed, so the data now needs to connect with external systems and processes,” says Tim. “So if you’re an NGO who’s monitoring a dozen different confirmed conflict zones and all of a sudden you see a regionalized spike in ANY vocabulary (even something that could have a benign second meaning), that’s a strong indicator that something’s going on which deserves further study.”
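To make the idea of a regional spike against an ambient level concrete, here is a minimal sketch. It is not Hatebase's method: the functions `count_sightings` and `find_spikes`, the mean-plus-three-deviations rule and the sample numbers are all assumptions chosen for illustration. Each sighting is reduced to a (region, day) pair, and a region is flagged when its count on the current day sits well above its own recent baseline.

```python
from collections import Counter
from statistics import mean, pstdev


def count_sightings(sightings):
    """Count sightings per (region, day) from an iterable of (region, day) pairs."""
    return Counter(sightings)


def find_spikes(counts, regions, baseline_days, current_day, threshold=3.0):
    """Return regions whose current-day count exceeds the baseline mean
    by more than `threshold` baseline standard deviations (assumed rule)."""
    spikes = []
    for region in regions:
        baseline = [counts[(region, day)] for day in baseline_days]
        ambient = mean(baseline)
        # Floor the spread at 1.0 so a perfectly flat baseline does not
        # turn a single extra sighting into a "spike".
        spread = max(pstdev(baseline), 1.0)
        if counts[(region, current_day)] > ambient + threshold * spread:
            spikes.append(region)
    return spikes


# Invented sample data: two sightings a day in each region for a week,
# then a jump in region "A" on day 7.
sightings = []
for day in range(7):
    sightings += [("A", day)] * 2 + [("B", day)] * 2
sightings += [("A", 7)] * 12 + [("B", 7)] * 3

spikes = find_spikes(count_sightings(sightings), ["A", "B"], range(7), 7)
print(spikes)  # ['A']
```

As Tim points out, a flagged region is only a prompt for further study: the vocabulary may have a benign meaning, so a result like this would still need to be checked against other sources.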
Current languages of interest are those spoken in Kenya and Burma, which have become hotspots for ethnic violence over the last couple of weeks. Right now in Burma, the Rohingya are the targets of Burman ethnic cleansers, but since that conflict is occurring mostly offline, patterns must be tough to detect on the Web.
Sadly, Hatebase can only grow, as the Web and social media are likely to become much more powerful tools for mobilizing haters in the future. A little counter-mobilizing is in order. Please spread the word.
Here’s a breakdown of hate speech on the Web by type, as compiled by Hatebase: