BigTokDetect: A Clinically-Informed Vision-Language Modeling Framework for Detecting Pro-Bigorexia Videos on TikTok

Main: 10 Pages
11 Figures
Bibliography: 2 Pages
15 Tables
Appendix: 13 Pages
Abstract

Social media platforms face escalating challenges in detecting harmful content that promotes muscle dysmorphic behaviors and cognitions (bigorexia). This content can evade moderation by camouflaging as legitimate fitness advice and disproportionately affects adolescent males. We address this challenge with BigTokDetect, a clinically informed framework for identifying pro-bigorexia content on TikTok. We introduce BigTok, the first expert-annotated multimodal benchmark dataset of over 2,200 TikTok videos labeled by clinical psychiatrists across five categories and eighteen fine-grained subcategories. Comprehensive evaluation of state-of-the-art vision-language models reveals that while commercial zero-shot models achieve the highest accuracy on broad primary categories, supervised fine-tuning enables smaller open-source models to perform better on fine-grained subcategory detection. Ablation studies show that multimodal fusion improves performance by 5 to 15 percent, with video features providing the most discriminative signals. These findings support a grounded moderation approach that automates detection of explicit harms while flagging ambiguous content for human review, and they establish a scalable framework for harm mitigation in emerging mental health domains.
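The moderation approach described at the end of the abstract, automating detection of explicit harms while sending ambiguous content to human review, can be illustrated with a minimal routing sketch. Everything here is an assumption made for illustration: the `Prediction` type, the `route` function, the placeholder label names, and both threshold values are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the abstract does not specify any values.
AUTO_FLAG_THRESHOLD = 0.90  # at or above this, a harmful prediction is flagged automatically
REVIEW_THRESHOLD = 0.50     # below this, even a benign prediction is treated as ambiguous


@dataclass
class Prediction:
    video_id: str
    label: str         # predicted category; names used here are placeholders
    confidence: float  # classifier probability in [0, 1]


def route(pred: Prediction, benign_label: str = "benign") -> str:
    """Return 'allow', 'human_review', or 'auto_flag' for one prediction."""
    if pred.label == benign_label:
        # A confident benign prediction passes; an uncertain one is ambiguous.
        return "allow" if pred.confidence >= REVIEW_THRESHOLD else "human_review"
    if pred.confidence >= AUTO_FLAG_THRESHOLD:
        # Confident harmful prediction: handled without a human in the loop.
        return "auto_flag"
    # Harmful prediction without enough confidence: defer to a reviewer.
    return "human_review"
```

In practice the thresholds would have to be tuned on held-out annotated data, since they trade reviewer workload against the risk of missed or wrongly flagged videos.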
