A Comprehensive Social Bias Audit of Contrastive Vision Language Models

In text-to-image generative models, biases inherent in training datasets often propagate into generated content, posing significant ethical challenges, particularly in socially sensitive contexts. We introduce FairCoT, a novel framework that enhances fairness in text-to-image models through Chain-of-Thought (CoT) reasoning within multimodal generative large language models. FairCoT employs iterative CoT refinement to systematically mitigate biases and dynamically adjusts textual prompts in real time, ensuring diverse and equitable representation in generated images. By integrating iterative reasoning processes, FairCoT addresses the limitations of zero-shot CoT in sensitive scenarios, balancing creativity with ethical responsibility. Experimental evaluations across popular text-to-image systems, including DALL-E and various Stable Diffusion variants, demonstrate that FairCoT significantly enhances fairness and diversity without sacrificing image quality or semantic fidelity. By combining robust reasoning, lightweight deployment, and extensibility to multiple models, FairCoT represents a promising step toward more socially responsible and transparent AI-driven content generation.
@article{sahili2025_2501.13223,
  title={A Comprehensive Social Bias Audit of Contrastive Vision Language Models},
  author={Zahraa Al Sahili and Ioannis Patras and Matthew Purver},
  journal={arXiv preprint arXiv:2501.13223},
  year={2025}
}