ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1901.06706
  4. Cited By
Visual Entailment: A Novel Task for Fine-Grained Image Understanding

Visual Entailment: A Novel Task for Fine-Grained Image Understanding

20 January 2019
Ning Xie
Farley Lai
Derek Doran
Asim Kadav
    CoGe
ArXivPDFHTML

Papers citing "Visual Entailment: A Novel Task for Fine-Grained Image Understanding"

50 / 229 papers shown
Title
Conditional Prompt Tuning for Multimodal Fusion
Conditional Prompt Tuning for Multimodal Fusion
Ruixia Jiang
Lingbo Liu
Changwen Chen
16
0
0
28 Nov 2023
MultiDelete for Multimodal Machine Unlearning
MultiDelete for Multimodal Machine Unlearning
Jiali Cheng
Hadi Amiri
MU
36
7
0
18 Nov 2023
VideoCon: Robust Video-Language Alignment via Contrast Captions
VideoCon: Robust Video-Language Alignment via Contrast Captions
Hritik Bansal
Yonatan Bitton
Idan Szpektor
Kai-Wei Chang
Aditya Grover
28
14
0
15 Nov 2023
FAITHSCORE: Evaluating Hallucinations in Large Vision-Language Models
FAITHSCORE: Evaluating Hallucinations in Large Vision-Language Models
Liqiang Jing
Ruosen Li
Yunmo Chen
Mengzhao Jia
Xinya Du
MLLM
11
6
0
02 Nov 2023
From Image to Language: A Critical Analysis of Visual Question Answering
  (VQA) Approaches, Challenges, and Opportunities
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
35
35
0
01 Nov 2023
Large Language Models are Visual Reasoning Coordinators
Large Language Models are Visual Reasoning Coordinators
Liangyu Chen
Bo Li
Sheng Shen
Jingkang Yang
Chunyuan Li
Kurt Keutzer
Trevor Darrell
Ziwei Liu
VLM
LRM
28
46
0
23 Oct 2023
Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and
  Beyond
Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond
Zhecan Wang
Long Chen
Haoxuan You
Keyang Xu
Yicheng He
Wenhao Li
Noal Codella
Kai-Wei Chang
Shih-Fu Chang
10
3
0
23 Oct 2023
Lightweight In-Context Tuning for Multimodal Unified Models
Lightweight In-Context Tuning for Multimodal Unified Models
Yixin Chen
Shuai Zhang
Boran Han
Jiaya Jia
11
2
0
08 Oct 2023
VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via
  Pre-trained Models
VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models
Ziyi Yin
Muchao Ye
Tianrong Zhang
Tianyu Du
Jinguo Zhu
Han Liu
Jinghui Chen
Ting Wang
Fenglong Ma
AAML
VLM
CoGe
25
36
0
07 Oct 2023
Module-wise Adaptive Distillation for Multimodality Foundation Models
Module-wise Adaptive Distillation for Multimodality Foundation Models
Chen Liang
Jiahui Yu
Ming-Hsuan Yang
Matthew A. Brown
Yin Cui
Tuo Zhao
Boqing Gong
Tianyi Zhou
6
10
0
06 Oct 2023
ReForm-Eval: Evaluating Large Vision Language Models via Unified
  Re-Formulation of Task-Oriented Benchmarks
ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks
Zejun Li
Ye Wang
Mengfei Du
Qingwen Liu
Binhao Wu
...
Zhihao Fan
Jie Fu
Jingjing Chen
Xuanjing Huang
Zhongyu Wei
20
13
0
04 Oct 2023
ViLTA: Enhancing Vision-Language Pre-training through Textual
  Augmentation
ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation
Weihan Wang
Z. Yang
Bin Xu
Juanzi Li
Yankui Sun
VLM
12
8
0
31 Aug 2023
FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal
  Heterogeneous Federated Learning
FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning
Haokun Chen
Yao Zhang
Denis Krompass
Jindong Gu
Volker Tresp
FedML
65
39
0
21 Aug 2023
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
Mustafa Shukor
Corentin Dancette
Alexandre Ramé
Matthieu Cord
MoMe
MLLM
21
42
0
30 Jul 2023
SINC: Self-Supervised In-Context Learning for Vision-Language Tasks
SINC: Self-Supervised In-Context Learning for Vision-Language Tasks
Yi-Syuan Chen
Yun-Zhu Song
Cheng Yu Yeo
Bei Liu
Jianlong Fu
Hong-Han Shuai
VLM
LRM
18
4
0
15 Jul 2023
mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs
mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs
Gregor Geigle
Abhay Jain
Radu Timofte
Goran Glavavs
VLM
MLLM
11
29
0
13 Jul 2023
What Matters in Training a GPT4-Style Language Model with Multimodal
  Inputs?
What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?
Yan Zeng
Hanbo Zhang
Jiani Zheng
Jiangnan Xia
Guoqiang Wei
Yang Wei
Yuchen Zhang
Tao Kong
MLLM
13
71
0
05 Jul 2023
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
Rui Sun
Zhecan Wang
Haoxuan You
Noel Codella
Kai-Wei Chang
Shih-Fu Chang
CLIP
17
3
0
03 Jul 2023
Towards Language Models That Can See: Computer Vision Through the LENS
  of Natural Language
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language
William Berrios
Gautam Mittal
Tristan Thrush
Douwe Kiela
Amanpreet Singh
MLLM
VLM
11
59
0
28 Jun 2023
A Survey on Multimodal Large Language Models
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Bill Xu
Enhong Chen
MLLM
LRM
33
551
0
23 Jun 2023
Cross-Modal Attribute Insertions for Assessing the Robustness of
  Vision-and-Language Learning
Cross-Modal Attribute Insertions for Assessing the Robustness of Vision-and-Language Learning
Shivaen Ramshetty
Gaurav Verma
Srijan Kumar
30
1
0
19 Jun 2023
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large
  Vision-Language Models
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models
Peng-Tao Xu
Wenqi Shao
Kaipeng Zhang
Peng Gao
Shuo Liu
Meng Lei
Fanqing Meng
Siyuan Huang
Yu Qiao
Ping Luo
ELM
MLLM
23
158
0
15 Jun 2023
Improving Selective Visual Question Answering by Learning from Your
  Peers
Improving Selective Visual Question Answering by Learning from Your Peers
Corentin Dancette
Spencer Whitehead
Rishabh Maheshwary
Ramakrishna Vedantam
Stefan Scherer
Xinlei Chen
Matthieu Cord
Marcus Rohrbach
AAML
OOD
20
15
0
14 Jun 2023
Multimodal Explainable Artificial Intelligence: A Comprehensive Review
  of Methodological Advances and Future Research Directions
Multimodal Explainable Artificial Intelligence: A Comprehensive Review of Methodological Advances and Future Research Directions
N. Rodis
Christos Sardianos
Panagiotis I. Radoglou-Grammatikis
Panagiotis G. Sarigiannidis
Iraklis Varlamis
Georgios Th. Papadopoulos
12
22
0
09 Jun 2023
Diversifying Joint Vision-Language Tokenization Learning
Diversifying Joint Vision-Language Tokenization Learning
Vardaan Pahuja
A. Piergiovanni
A. Angelova
16
0
0
06 Jun 2023
Towards In-context Scene Understanding
Towards In-context Scene Understanding
Ivana Balazevic
David Steiner
Nikhil Parthasarathy
Relja Arandjelović
Olivier J. Hénaff
15
28
0
02 Jun 2023
"Let's not Quote out of Context": Unified Vision-Language Pretraining
  for Context Assisted Image Captioning
"Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning
Abisek Rajakumar Kalarani
P. Bhattacharyya
Niyati Chhaya
Sumit Shekhar
CoGe
VLM
16
9
0
01 Jun 2023
Adapting Pre-trained Language Models to Vision-Language Tasks via
  Dynamic Visual Prompting
Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting
Shubin Huang
Qiong Wu
Yiyi Zhou
Weijie Chen
Rongsheng Zhang
Xiaoshuai Sun
Rongrong Ji
VLM
VPVLM
LRM
16
0
0
01 Jun 2023
ManagerTower: Aggregating the Insights of Uni-Modal Experts for
  Vision-Language Representation Learning
ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
Xiao Xu
Bei Li
Chenfei Wu
Shao-Yen Tseng
Anahita Bhiwandiwalla
Shachar Rosenman
Vasudev Lal
Wanxiang Che
Nan Duan
AIFin
VLM
21
2
0
31 May 2023
Joint Adaptive Representations for Image-Language Learning
Joint Adaptive Representations for Image-Language Learning
A. Piergiovanni
A. Angelova
VLM
14
0
0
31 May 2023
MPCHAT: Towards Multimodal Persona-Grounded Conversation
MPCHAT: Towards Multimodal Persona-Grounded Conversation
Jaewoo Ahn
Yeda Song
Sangdoo Yun
Gunhee Kim
15
18
0
27 May 2023
Weakly Supervised Vision-and-Language Pre-training with Relative
  Representations
Weakly Supervised Vision-and-Language Pre-training with Relative Representations
Chi Chen
Peng Li
Maosong Sun
Yang Liu
14
1
0
24 May 2023
SmartTrim: Adaptive Tokens and Attention Pruning for Efficient
  Vision-Language Models
SmartTrim: Adaptive Tokens and Attention Pruning for Efficient Vision-Language Models
Zekun Wang
Jingchang Chen
Wangchunshu Zhou
Haichao Zhu
Jiafeng Liang
Liping Shan
Ming Liu
Dongliang Xu
Qing Yang
Bing Qin
VLM
6
4
0
24 May 2023
Meta-learning For Vision-and-language Cross-lingual Transfer
Meta-learning For Vision-and-language Cross-lingual Transfer
Hanxu Hu
Frank Keller
VLM
15
1
0
24 May 2023
I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create
  Visual Metaphors
I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create Visual Metaphors
Tuhin Chakrabarty
Arkadiy Saakyan
Olivia Winn
Artemis Panagopoulou
Yue Yang
Marianna Apidianaki
Smaranda Muresan
DiffM
19
27
0
24 May 2023
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image
  Regions
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions
Woojeong Jin
Subhabrata Mukherjee
Yu Cheng
Yelong Shen
Weizhu Chen
Ahmed Hassan Awadallah
Damien Jose
Xiang Ren
ObjD
VLM
17
8
0
24 May 2023
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
Haoxuan You
Rui Sun
Zhecan Wang
Long Chen
Gengyu Wang
Hammad A. Ayyubi
Kai-Wei Chang
Shih-Fu Chang
VLM
MLLM
LRM
35
42
0
24 May 2023
UNIMO-3: Multi-granularity Interaction for Vision-Language
  Representation Learning
UNIMO-3: Multi-granularity Interaction for Vision-Language Representation Learning
Hao-Yu Yang
Can Gao
Hao Liu
Xinyan Xiao
Yanyan Zhao
Bing Qin
18
2
0
23 May 2023
What You See is What You Read? Improving Text-Image Alignment Evaluation
What You See is What You Read? Improving Text-Image Alignment Evaluation
Michal Yarom
Yonatan Bitton
Soravit Changpinyo
Roee Aharoni
Jonathan Herzig
Oran Lang
E. Ofek
Idan Szpektor
EGVM
31
72
0
17 May 2023
RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training
RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training
Chulun Zhou
Yunlong Liang
Fandong Meng
Jinan Xu
Jinsong Su
Jie Zhou
VLM
16
4
0
13 May 2023
Musketeer: Joint Training for Multi-task Vision Language Model with Task
  Explanation Prompts
Musketeer: Joint Training for Multi-task Vision Language Model with Task Explanation Prompts
Zhaoyang Zhang
Yantao Shen
Kunyu Shi
Zhaowei Cai
Jun Fang
Siqi Deng
Hao-Yu Yang
Davide Modolo
Z. Tu
Stefano Soatto
VLM
22
2
0
11 May 2023
A Multi-Modal Context Reasoning Approach for Conditional Inference on
  Joint Textual and Visual Clues
A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues
Yunxin Li
Baotian Hu
Xinyu Chen
Yuxin Ding
Lin Ma
Min Zhang
LRM
35
14
0
08 May 2023
Visual Reasoning: from State to Transformation
Visual Reasoning: from State to Transformation
Xin Hong
Yanyan Lan
Liang Pang
J. Guo
Xueqi Cheng
LRM
6
3
0
02 May 2023
$π$-Tuning: Transferring Multimodal Foundation Models with Optimal
  Multi-task Interpolation
πππ-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation
Chengyue Wu
Teng Wang
Yixiao Ge
Zeyu Lu
Rui-Zhi Zhou
Ying Shan
Ping Luo
MoMe
70
35
0
27 Apr 2023
Retrieval-based Knowledge Augmented Vision Language Pre-training
Retrieval-based Knowledge Augmented Vision Language Pre-training
Jiahua Rao
Zifei Shan
Long Liu
Yao Zhou
Yuedong Yang
VLM
78
13
0
27 Apr 2023
Efficient Multimodal Fusion via Interactive Prompting
Efficient Multimodal Fusion via Interactive Prompting
Yaowei Li
Ruijie Quan
Linchao Zhu
Yezhou Yang
20
42
0
13 Apr 2023
Multi-Modal Representation Learning with Text-Driven Soft Masks
Multi-Modal Representation Learning with Text-Driven Soft Masks
Jaeyoo Park
Bohyung Han
SSL
9
4
0
03 Apr 2023
Exposing and Addressing Cross-Task Inconsistency in Unified
  Vision-Language Models
Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models
A. Maharana
Amita Kamath
Christopher Clark
Mohit Bansal
Aniruddha Kembhavi
17
3
0
28 Mar 2023
Task-Attentive Transformer Architecture for Continual Learning of
  Vision-and-Language Tasks Using Knowledge Distillation
Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation
Yuliang Cai
Jesse Thomason
Mohammad Rostami
VLM
CLL
19
11
0
25 Mar 2023
CoBIT: A Contrastive Bi-directional Image-Text Generation Model
CoBIT: A Contrastive Bi-directional Image-Text Generation Model
Haoxuan You
Mandy Guo
Zhecan Wang
Kai-Wei Chang
Jason Baldridge
Jiahui Yu
DiffM
37
12
0
23 Mar 2023
Previous
12345
Next