Developing an online hate classifier for multiple social media platforms

Maximilian Hopf; Shammur A. Chowdhury; Joni Salminen; Hind Almerekhi; Bernard Jansen; Soon-gyo Jung

Springer

doi:10.1186/s13673-019-0205-6

The permanent address of this publication is:
https://urn.fi/URN:NBN:fi-fe2021042824119

Abstract

The proliferation of social media enables people to express their
opinions widely online. At the same time, it has given rise to
conflict and hate, making online environments uninviting for users.
Although researchers have found that hate is a
problem across multiple platforms, there is a lack of models for online
hate detection using multi-platform data. To address this research gap,
we collect a total of 197,566 comments from four platforms: YouTube,
Reddit, Wikipedia, and Twitter, with 80% of the comments labeled as
non-hateful and the remaining 20% labeled as hateful. We then experiment
with several classification algorithms (Logistic Regression, Naïve
Bayes, Support Vector Machines, XGBoost, and Neural Networks) and
feature representations (Bag-of-Words, TF-IDF, Word2Vec, BERT, and their
combination). While all the models significantly outperform the
keyword-based baseline classifier, XGBoost using all features performs
the best (F1 = 0.92). Feature importance analysis indicates that BERT
features are the most impactful for the predictions. Findings support
the generalizability of the best model, as the platform-specific results
from Twitter and Wikipedia are comparable to their respective source
papers. We make our code publicly available for application in real
software systems as well as for further development by online hate
researchers.
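
To make the comparison against the keyword-based baseline concrete, the following is a minimal sketch of what such a baseline and an F1 computation could look like. It is not the authors' code: the keyword list, the toy comments, and the function names `keyword_baseline` and `f1_score` are all hypothetical, and the F1 shown is for the hateful class only, which may differ from the averaging used in the paper.

```python
import re

# Hypothetical keyword list for illustration only; the lexicon used in the
# paper is not given in this abstract.
HATE_KEYWORDS = {"idiot", "stupid", "hate"}


def keyword_baseline(comment, keywords=HATE_KEYWORDS):
    """Return 1 (hateful) if any keyword appears as a whole token, else 0."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    return int(any(token in keywords for token in tokens))


def f1_score(y_true, y_pred):
    """F1 for the positive (hateful) class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up comments (not taken from the paper's dataset).
comments = [
    "You are an idiot",                  # hateful, contains a keyword
    "Thanks for the helpful video",      # non-hateful, no keyword
    "I hate rainy Mondays",              # non-hateful, but contains a keyword
    "People like you should disappear",  # hateful, but contains no keyword
]
labels = [1, 0, 0, 1]

predictions = [keyword_baseline(c) for c in comments]
print(predictions, f1_score(labels, predictions))
```

The third and fourth toy comments show the two typical failure modes of keyword matching: a keyword used in a harmless sense, and hostility expressed without any listed keyword. Learned feature representations such as those named in the abstract are intended to reduce both kinds of error.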
