Multimodal Sentiment Analysis in Indonesian: A Comparative Study of Deep Learning Models for Hate Speech Detection on Social Media

Keywords: Deep Learning, Indonesian Language, Social Media


July 27, 2025


With the rapid expansion of social media, hate speech has become a critical issue, particularly in the context of Indonesian language and culture. Detecting hate speech on social media platforms is difficult because online communication is multimodal: text, images, and videos are often combined to express sentiment. This study compares deep learning models for multimodal sentiment analysis, focusing on their effectiveness in detecting hate speech in Indonesian social media content. By analyzing textual and visual data together, the study seeks to improve the accuracy of sentiment classification, specifically the identification of hate speech. The research applies several state-of-the-art deep learning models, including Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) networks, and Transformer-based models, to a multimodal dataset of text and images from Indonesian social media posts labeled for hate speech detection. The results show that multimodal models outperform text-only models, with the Transformer-based model yielding the highest accuracy and F1-score. Including visual data significantly improved the model’s ability to classify complex and subtle expressions of hate speech. The study concludes that multimodal deep learning models offer a promising approach to detecting hate speech in Indonesian social media, with implications for content moderation and online safety.
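The abstract compares models by accuracy and F1-score but does not spell out how these are computed. As a minimal sketch, assuming a binary labeling (1 = hate speech, 0 = not hate speech) that the abstract does not confirm, the two metrics could be computed as below. The function name `accuracy_and_f1` and the toy labels are hypothetical and are not taken from the study's dataset or results.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F1) for binary labels, treating 1 as the positive class.

    Assumption: 1 = hate speech, 0 = not hate speech. The study's actual
    label scheme is not stated in the abstract.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Hypothetical toy labels for illustration only (not data from the study).
toy_true = [1, 0, 1, 1, 0, 0, 1, 0]
toy_pred = [1, 0, 0, 1, 0, 1, 1, 0]
toy_accuracy, toy_f1 = accuracy_and_f1(toy_true, toy_pred)
print(f"accuracy={toy_accuracy:.2f}, F1={toy_f1:.2f}")
```

Reporting F1 alongside accuracy is useful when hate speech posts are a minority of the data, since a model that rarely predicts the positive class can still score high accuracy while missing most of them.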