Yin and Yang: Balancing and Answering Binary Visual Questions

16 November 2015

Devi Parikh

Papers citing "Yin and Yang: Balancing and Answering Binary Visual Questions"

49 / 49 papers shown

Title
FilterRAG: Zero-Shot Informed Retrieval-Augmented Generation to Mitigate Hallucinations in VQA S M Sarwar 66 1 0 25 Feb 2025
MASS: Overcoming Language Bias in Image-Text Matching Jiwan Chung Seungwon Lim Sangkyu Lee Youngjae Yu VLM 30 0 0 20 Jan 2025
QTG-VQA: Question-Type-Guided Architectural for VideoQA Systems Zhixian He Pengcheng Zhao Fuwei Zhang Shujin Lin 36 0 0 14 Sep 2024
Causal Understanding For Video Question Answering Bhanu Prakash Reddy Guda Tanmay Kulkarni Adithya Sampath Swarnashree Mysore Sathyendra CML 41 0 0 23 Jul 2024
VideoDistill: Language-aware Vision Distillation for Video Question Answering Bo Zou Chao Yang Yu Qiao Chengbin Quan Youjian Zhao VGen 42 1 0 01 Apr 2024
Multimodal Transformer With a Low-Computational-Cost Guarantee Sungjin Park Edward Choi 44 1 0 23 Feb 2024
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA Asmar Nadeem Adrian Hilton R. Dawes Graham A. Thomas A. Mustafa 21 9 0 25 Oct 2023
Evaluating Object Hallucination in Large Vision-Language Models Yifan Li Yifan Du Kun Zhou Jinpeng Wang Wayne Xin Zhao Ji-Rong Wen MLLM LRM 88 691 0 17 May 2023
AlignVE: Visual Entailment Recognition Based on Alignment Relations Biwei Cao Jiuxin Cao Jie Gui Jiayun Shen Bo Liu Lei He Yuan Yan Tang James T. Kwok 18 7 0 16 Nov 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering Pan Lu Swaroop Mishra Tony Xia Liang Qiu Kai-Wei Chang Song-Chun Zhu Oyvind Tafjord Peter Clark A. Kalyan ELM ReLM LRM 211 1,105 0 20 Sep 2022
WildQA: In-the-Wild Video Question Answering Santiago Castro Naihao Deng Pingxuan Huang Mihai Burzo Rada Mihalcea 68 7 0 14 Sep 2022
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks Zhecan Wang Noel Codella Yen-Chun Chen Luowei Zhou Xiyang Dai ... Jianwei Yang Haoxuan You Kai-Wei Chang Shih-Fu Chang Lu Yuan VLM OffRL 23 22 0 22 Apr 2022
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations Leonard Salewski A. Sophia Koepke Hendrik P. A. Lensch Zeynep Akata LRM NAI 25 20 0 05 Apr 2022
Learning to Answer Questions in Dynamic Audio-Visual Scenarios Guangyao Li Yake Wei Yapeng Tian Chenliang Xu Ji-Rong Wen Di Hu 29 136 0 26 Mar 2022
Catch Me if You Can: A Novel Task for Detection of Covert Geo-Locations (CGL) Binoy Saha Sukhendu Das 14 1 0 05 Feb 2022
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks Zhecan Wang Noel Codella Yen-Chun Chen Luowei Zhou Jianwei Yang Xiyang Dai Bin Xiao Haoxuan You Shih-Fu Chang Lu Yuan CLIP VLM 22 39 0 15 Jan 2022
Contrastive Vision-Language Pre-training with Limited Resources Quan Cui Boyan Zhou Yu Guo Weidong Yin Hao Wu Osamu Yoshie Yubo Chen VLM CLIP 13 32 0 17 Dec 2021
SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning Zhecan Wang Haoxuan You Liunian Harold Li Alireza Zareian Suji Park Yiqing Liang Kai-Wei Chang Shih-Fu Chang ReLM LRM 15 30 0 16 Dec 2021
3D Question Answering Shuquan Ye Dongdong Chen Songfang Han Jing Liao ViT 24 46 0 15 Dec 2021
IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning Pan Lu Liang Qiu Jiaqi Chen Tony Xia Yizhou Zhao Wei Zhang Zhou Yu Xiaodan Liang Song-Chun Zhu AIMat 28 183 0 25 Oct 2021
Multimodal Integration of Human-Like Attention in Visual Question Answering Ekta Sood Fabian Kögel Philippe Muller Dominike Thomas Mihai Bâce Andreas Bulling 33 16 0 27 Sep 2021
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering Ekta Sood Fabian Kögel Florian Strohm Prajit Dhar Andreas Bulling 29 19 0 27 Sep 2021
On Semantic Similarity in Video Retrieval Michael Wray Hazel Doughty Dima Damen 21 66 0 18 Mar 2021
Large-Scale Zero-Shot Image Classification from Rich and Diverse Textual Descriptions Sebastian Bujwid Josephine Sullivan VLM 23 28 0 17 Mar 2021
Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA Yonatan Bitton Gabriel Stanovsky Roy Schwartz Michael Elhadad CoGe 17 33 0 17 Mar 2021
SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering Bo Liu Li-Ming Zhan Li Xu Lin Ma Y. Yang Xiao-Ming Wu 19 234 0 18 Feb 2021
Answer Questions with Right Image Regions: A Visual Attention Regularization Approach Y. Liu Yangyang Guo Jianhua Yin Xuemeng Song Weifeng Liu Liqiang Nie 24 28 0 03 Feb 2021
OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual Contexts Yuxian Meng Shuhe Wang Qinghong Han Xiaofei Sun Fei Wu Rui Yan Jiwei Li 16 28 0 30 Dec 2020
WeaQA: Weak Supervision via Captions for Visual Question Answering Pratyay Banerjee Tejas Gokhale Yezhou Yang Chitta Baral 17 34 0 04 Dec 2020
Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles Christopher Clark Mark Yatskar Luke Zettlemoyer 18 61 0 07 Nov 2020
MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering Aisha Urooj Khan Amir Mazaheri N. Lobo M. Shah 24 56 0 27 Oct 2020
New Ideas and Trends in Deep Multimodal Content Understanding: A Review Wei-Neng Chen Weiping Wang Li Liu M. Lew VLM 110 31 0 16 Oct 2020
A Study of Compositional Generalization in Neural Models Tim Klinger D. Adjodah Vincent Marois Joshua Joseph Matthew D Riemer Alex Pentland Murray Campbell CoGe NAI 10 12 0 16 Jun 2020
Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing Vedika Agarwal Rakshith Shetty Mario Fritz CML AAML 21 155 0 16 Dec 2019
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines Jingxiang Lin Unnat Jain A. Schwing LRM ReLM 26 9 0 31 Oct 2019
Fusion of Detected Objects in Text for Visual Question Answering Chris Alberti Jeffrey Ling Michael Collins David Reitter 6 173 0 14 Aug 2019
Don't Take the Premise for Granted: Mitigating Artifacts in Natural Language Inference Yonatan Belinkov Adam Poliak Stuart M. Shieber Benjamin Van Durme Alexander M. Rush 19 94 0 09 Jul 2019
Overcoming Language Priors in Visual Question Answering with Adversarial Regularization S. Ramakrishnan Aishwarya Agrawal Stefan Lee AAML 20 235 0 08 Oct 2018
End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features Chiori Hori Huda AlAmri Jue Wang G. Wichern Takaaki Hori ... Raphael Gontijo-Lopes Abhishek Das Irfan Essa Dhruv Batra Devi Parikh VGen 16 125 0 21 Jun 2018
Did the Model Understand the Question? Pramod Kaushik Mudrakarta Ankur Taly Mukund Sundararajan Kedar Dhamdhere ELM OOD FAtt 27 196 0 14 May 2018
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering Aishwarya Agrawal Dhruv Batra Devi Parikh Aniruddha Kembhavi OOD 51 581 0 01 Dec 2017
Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge Damien Teney Peter Anderson Xiaodong He A. Hengel 45 380 0 09 Aug 2017
Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets Wei-Lun Chao Hexiang Hu Fei Sha 22 37 0 24 Apr 2017
An Analysis of Visual Question Answering Algorithms Kushal Kafle Christopher Kanan 19 230 0 28 Mar 2017
Who is Mistaken? Benjamin Eysenbach Carl Vondrick Antonio Torralba 19 15 0 04 Dec 2016
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering Yash Goyal Tejas Khot D. Summers-Stay Dhruv Batra Devi Parikh CoGe 99 3,116 0 02 Dec 2016
FVQA: Fact-based Visual Question Answering Peng Wang Qi Wu Chunhua Shen Anton van den Hengel A. Dick CoGe 33 453 0 17 Jun 2016
Adversarial Feature Learning Jiasen Lu Philipp Krahenbuhl Trevor Darrell GAN 13 1,824 0 31 May 2016
VQA: Visual Question Answering Aishwarya Agrawal Jiasen Lu Stanislaw Antol Margaret Mitchell C. L. Zitnick Dhruv Batra Devi Parikh CoGe 40 5,362 0 03 May 2015