Multimodal Hate Speech Detection from Videos and Texts

EasyChair Preprint 10743
12 pages • Date: August 19, 2023

Abstract

Since social media posts often include videos with associated comments, and many of these videos or comments convey hate speech, detecting it in this multimodal setting is crucial. We focus on the early detection of hate speech in videos by exploiting features from an initial set of comments. We devise the Text Video Classifier (TVC), a multimodal hate classifier based on four feature modalities: character, word, sentence, and video frame. We develop a Cross Attention Fusion Mechanism (CA-FM) to learn global feature embeddings from the inter-modal features. We report the architectural details and the experiments performed. Using several sampling techniques, we train this architecture on a Vine dataset of videos and their comments. At an output probability threshold of 0.5, our proposed design improves on models previously constructed for this dataset, demonstrating the positive effect of the CA-FM and TVC.

Keyphrases: Cross Attention Fusion Mechanism, TVC, multimodal
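The abstract does not give implementation details of the CA-FM, but the general idea of cross-attention fusion between two modalities can be sketched as follows. This is a minimal single-head NumPy illustration in which the dimensions, pooling strategy, and function names are our own illustrative assumptions, not the authors' actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # Scaled dot-product attention: queries from one modality
    # attend over keys/values from another modality.
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)  # (Lq, Lk)
    weights = softmax(scores, axis=-1)      # each row sums to 1
    return weights @ values                 # (Lq, d)

def fuse(text_feats, video_feats):
    # Hypothetical fusion step: cross-attend in both directions,
    # then mean-pool and concatenate into one global embedding.
    t2v = cross_attention(text_feats, video_feats, video_feats)
    v2t = cross_attention(video_feats, text_feats, text_feats)
    return np.concatenate([t2v.mean(axis=0), v2t.mean(axis=0)])

rng = np.random.default_rng(0)
text = rng.standard_normal((5, 16))   # 5 comment-token embeddings, dim 16
video = rng.standard_normal((8, 16))  # 8 video-frame embeddings, dim 16
global_emb = fuse(text, video)
print(global_emb.shape)  # (32,)
```

In a real multimodal classifier, a fused embedding like `global_emb` would feed a final classification head producing the hate/non-hate probability that is thresholded at 0.5.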