The integration of Large Language Models (LLMs) and Vision-Language Models (VLMs) opens new avenues for addressing complex challenges in multimodal content analysis, particularly in detecting bias in news. This study introduces VilBias, a framework that leverages state-of-the-art LLMs and VLMs to detect linguistic and visual biases in news content. We present a multimodal dataset comprising textual content and corresponding images from diverse news sources. We propose a hybrid annotation framework that combines LLM-based annotations with human review to ensure high-quality labeling while reducing costs and enhancing scalability. Our evaluation compares the performance of state-of-the-art Small Language Models (SLMs) and LLMs across both modalities (text and images). The results reveal that while SLMs are computationally efficient, LLMs demonstrate superior accuracy in identifying subtle framing and text-visual inconsistencies. Furthermore, empirical analysis shows that incorporating visual cues alongside textual data improves bias detection accuracy by 3 to 5%. This study provides a comprehensive exploration of LLMs, SLMs, and VLMs as tools for detecting multimodal bias in news content and highlights their respective strengths, limitations, and potential for future applications.
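To make the hybrid annotation idea above concrete (LLM-based labels first, with human review reserved for uncertain cases), the sketch below shows one minimal way such a loop could be wired up in Python. It is a hypothetical illustration under stated assumptions, not the authors' implementation: the NewsItem and BiasLabel types, the annotate function, and the 0.8 confidence threshold are all illustrative choices, and the model call is stubbed out so the example runs without any external API.

```python
# Hypothetical sketch of a hybrid LLM + human-review annotation loop for
# text-image news pairs. All names and the threshold value are illustrative
# assumptions, not the paper's actual pipeline.
from dataclasses import dataclass
from typing import Callable


@dataclass
class NewsItem:
    article_text: str
    image_url: str


@dataclass
class BiasLabel:
    label: str            # e.g. "biased" or "unbiased"
    confidence: float     # model-reported confidence in [0, 1]
    needs_review: bool = False


def annotate(item: NewsItem,
             llm_label: Callable[[NewsItem], BiasLabel],
             human_label: Callable[[NewsItem], str],
             review_threshold: float = 0.8) -> BiasLabel:
    """Label an item with the LLM first; escalate low-confidence cases to a human."""
    auto = llm_label(item)
    if auto.confidence < review_threshold:
        # Low confidence: a human annotator confirms or corrects the label.
        return BiasLabel(label=human_label(item), confidence=1.0, needs_review=True)
    return auto


if __name__ == "__main__":
    # Stub callables stand in for a real LLM/VLM call and a human annotator.
    item = NewsItem(article_text="Example headline and body ...",
                    image_url="https://example.com/img.jpg")
    stub_llm = lambda _: BiasLabel(label="biased", confidence=0.65)
    stub_human = lambda _: "unbiased"
    print(annotate(item, stub_llm, stub_human))
```

In practice the llm_label callable would wrap a multimodal model that sees both the article text and the image, which is where the reported 3 to 5% gain from visual cues would come into play.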
@article{raza2025_2412.17052,
  title   = {VilBias: A Study of Bias Detection through Linguistic and Visual Cues, presenting Annotation Strategies, Evaluation, and Key Challenges},
  author  = {Shaina Raza and Caesar Saleh and Emrul Hasan and Franklin Ogidi and Maximus Powers and Veronica Chatrath and Marcelo Lotif and Roya Javadi and Anam Zahid and Vahid Reza Khazaie},
  journal = {arXiv preprint arXiv:2412.17052},
  year    = {2025}
}