Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1908.02265
Cited By
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"
50 / 2,088 papers shown
Title
AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?
Shouwei Ruan
Hanqin Liu
Yao Huang
Xiaoqi Wang
Caixin Kang
Hang Su
Yinpeng Dong
Xingxing Wei
VGen
88
0
0
04 Dec 2024
Data Uncertainty-Aware Learning for Multimodal Aspect-based Sentiment Analysis
Hao-Yu Yang
Zhenyu Zhang
Yanyan Zhao
Bing Qin
69
0
0
02 Dec 2024
Eyes on the Road: State-of-the-Art Video Question Answering Models Assessment for Traffic Monitoring Tasks
Joseph Raj Vishal
Divesh Basina
Aarya Choudhary
Bharatesh Chakravarthi
64
1
0
02 Dec 2024
AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment
Yan Li
Yifei Xing
X. Lan
X. Li
Haifeng Chen
D. Jiang
Mamba
71
0
0
01 Dec 2024
MIMIC: Multimodal Islamophobic Meme Identification and Classification
Safrin Sanzida Islam
Sahid Hossain Mustakim
Sadia Ahmmed
Md. Faiyaz Abdullah Sayeedi
Swapnil Khandoker
Syed Tasdid Azam Dhrubo
Nahid Md Lokman Hossain
64
0
0
01 Dec 2024
Planning from Imagination: Episodic Simulation and Episodic Memory for Vision-and-Language Navigation
Yiyuan Pan
Yunzhe Xu
Zhe Liu
Hesheng Wang
LM&Ro
73
0
0
30 Nov 2024
Approximate Fiber Product: A Preliminary Algebraic-Geometric Perspective on Multimodal Embedding Alignment
Dongfang Zhao
64
0
0
30 Nov 2024
LMSeg: Unleashing the Power of Large-Scale Models for Open-Vocabulary Semantic Segmentation
Huadong Tang
Youpeng Zhao
Y. Huang
Min Xu
Jun Wang
Qiang Wu
MLLM
VLM
78
0
0
30 Nov 2024
SentiXRL: An advanced large language Model Framework for Multilingual Fine-Grained Emotion Classification in Complex Text Environment
Jie Wang
Yichen Wang
Zhilin Zhang
Jianhao Zeng
Kaidi Wang
Zhiyang Chen
62
0
0
27 Nov 2024
Cross-Modal Pre-Aligned Method with Global and Local Information for Remote-Sensing Image and Text Retrieval
Zengbao Sun
Ming Zhao
Gaorui Liu
Andre Kaup
88
3
0
22 Nov 2024
A Comprehensive Survey on Visual Question Answering Datasets and Algorithms
Raihan Kabir
Naznin Haque
Md. Saiful Islam
Marium-E. Jannat
CoGe
29
1
0
17 Nov 2024
CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation
Dengke Zhang
Fagui Liu
Quan Tang
VLM
47
1
0
15 Nov 2024
AD-DINO: Attention-Dynamic DINO for Distance-Aware Embodied Reference Understanding
Hao Guo
Wei Fan
Baichun Wei
Jianfei Zhu
Jin Tian
Chunzhi Yi
Feng Jiang
34
0
0
13 Nov 2024
Prompt-enhanced Network for Hateful Meme Classification
Junxi Liu
Yanyan Feng
Jiehai Chen
Yun Xue
Fenghuan Li
VLM
53
0
0
12 Nov 2024
Renaissance: Investigating the Pretraining of Vision-Language Encoders
Clayton Fields
C. Kennington
VLM
19
0
0
11 Nov 2024
MEANT: Multimodal Encoder for Antecedent Information
Benjamin Iyoya Irving
Annika Marie Schoene
AIFin
19
0
0
10 Nov 2024
ViTOC: Vision Transformer and Object-aware Captioner
Feiyang Huang
25
0
0
09 Nov 2024
Can Multimodal Large Language Model Think Analogically?
Diandian Guo
Cong Cao
Fangfang Yuan
Dakui Wang
Wei Ma
Yanbing Liu
Jianhui Fu
LRM
26
0
0
02 Nov 2024
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
Nam V. Nguyen
Thong T. Doan
Luong Tran
Van Nguyen
Quang Pham
MoE
59
1
0
01 Nov 2024
IO Transformer: Evaluating SwinV2-Based Reward Models for Computer Vision
Maxwell Meyer
Jack Spruyt
ViT
21
0
0
31 Oct 2024
An Information Criterion for Controlled Disentanglement of Multimodal Data
Chenyu Wang
Sharut Gupta
Xinyi Zhang
Sana Tonekaboni
Stefanie Jegelka
Tommi Jaakkola
Caroline Uhler
DRL
32
1
0
31 Oct 2024
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
Bo Jiang
Shaoyu Chen
Bencheng Liao
Xingyu Zhang
Wei Yin
Qian Zhang
Chang Huang
W. Liu
X. Wang
VLM
MLLM
LRM
35
12
0
29 Oct 2024
Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models
Donghoon Kim
Gusang Lee
Kyuhong Shim
B. Shim
46
1
0
29 Oct 2024
Improving Generalization in Visual Reasoning via Self-Ensemble
Tien-Huy Nguyen
Quang-Khai Tran
Anh-Tuan Quang-Hoang
VLM
LRM
47
5
0
28 Oct 2024
R-LLaVA: Improving Med-VQA Understanding through Visual Region of Interest
Xupeng Chen
Zhixin Lai
Kangrui Ruan
Shichu Chen
Jiaxiang Liu
Zuozhu Liu
33
1
0
27 Oct 2024
Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques
David Ortiz-Perez
Manuel Benavent-Lledo
José García Rodríguez
David Tomás
M. Flores Vizcaya-Moreno
28
0
0
24 Oct 2024
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning
Zhiwei Hao
Jianyuan Guo
Li Shen
Yong Luo
Han Hu
Yonggang Wen
VLM
21
0
0
23 Oct 2024
ViConsFormer: Constituting Meaningful Phrases of Scene Texts using Transformer-based Method in Vietnamese Text-based Visual Question Answering
Nghia Hieu Nguyen
Tho Thanh Quan
Ngan Luu-Thuy Nguyen
26
0
0
18 Oct 2024
Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment
Chenhang Cui
An Zhang
Yiyang Zhou
Zhaorun Chen
Gelei Deng
Huaxiu Yao
Tat-Seng Chua
63
4
0
18 Oct 2024
VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks
Shailaja Keyur Sampat
Mutsumi Nakamura
Shankar Kailas
Kartik Aggarwal
Mandy Zhou
Yezhou Yang
Chitta Baral
MLLM
CoGe
ReLM
VLM
LRM
29
0
0
17 Oct 2024
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
Zhiyuan Ma
Jianjun Li
Guohui Li
Kaiyan Huang
VLM
52
9
0
16 Oct 2024
OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities
L. Chen
Hexiang Hu
Mingda Zhang
Y. Chen
Zifeng Wang
...
Pranav Shyam
Tianyi Zhou
Heng-Chiao Huang
Ming Yang
Boqing Gong
26
2
0
16 Oct 2024
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
Jian Yang
Dacheng Yin
Yizhou Zhou
Fengyun Rao
Wei-dong Zhai
Yang Cao
Zheng-jun Zha
DiffM
28
2
0
14 Oct 2024
X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing
Xinyan Chen
Jianfei Yang
28
1
0
14 Oct 2024
Leveraging Customer Feedback for Multi-modal Insight Extraction
Sandeep Sricharan Mukku
Abinesh Kanagarajan
Pushpendu Ghosh
Chetan Aggarwal
27
0
0
13 Oct 2024
nach0-pc: Multi-task Language Model with Molecular Point Cloud Encoder
Maksim Kuznetsov
Airat Valiev
Alex Aliper
Daniil Polykovskiy
E. Tutubalina
Rim Shayakhmetov
Z. Miftahutdinov
20
0
0
11 Oct 2024
A Social Context-aware Graph-based Multimodal Attentive Learning Framework for Disaster Content Classification during Emergencies
Shahid Shafi Dar
Mohammad Zia Ur Rehman
Karan Bais
Mohammed Abdul Haseeb
Nagendra Kumara
22
10
0
11 Oct 2024
Exploring Foundation Models in Remote Sensing Image Change Detection: A Comprehensive Survey
Zihan Yu
Tianxiao Li
Yuxin Zhu
Rongze Pan
33
0
0
10 Oct 2024
Multimodal Clickbait Detection by De-confounding Biases Using Causal Representation Inference
Jianxing Yu
Shiqi Wang
Han Yin
Zhenlong Sun
Ruobing Xie
Bo Zhang
Yanghui Rao
CML
30
0
0
10 Oct 2024
FLIER: Few-shot Language Image Models Embedded with Latent Representations
Zhinuo Zhou
Peng Zhou
Xiaoyong Pan
VLM
26
0
0
10 Oct 2024
CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection
Guankun Wang
Han Xiao
Huxin Gao
Renrui Zhang
Long Bai
Xiaoxiao Yang
Zhen Li
Hongsheng Li
Hongliang Ren
31
4
0
10 Oct 2024
Structured Spatial Reasoning with Open Vocabulary Object Detectors
Negar Nejatishahidin
Madhukar Reddy Vongala
Jana Kosecka
35
2
0
09 Oct 2024
Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models
Zeman Li
Xinwei Zhang
Peilin Zhong
Yuan Deng
Meisam Razaviyayn
Vahab Mirrokni
15
2
0
09 Oct 2024
DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models
Sungnyun Kim
Haofu Liao
Srikar Appalaraju
Peng Tang
Zhuowen Tu
R. Satzoda
R. Manmatha
Vijay Mahadevan
Stefano Soatto
34
0
0
04 Oct 2024
Multi-modal clothing recommendation model based on large model and VAE enhancement
Bingjie Huang
Qingyi Lu
Shuaishuai Huang
Xue-she Wang
Haowei Yang
29
3
0
03 Oct 2024
Advancing Medical Radiograph Representation Learning: A Hybrid Pre-training Paradigm with Multilevel Semantic Granularity
Hanqi Jiang
Xixuan Hao
Yuzhou Huang
Chong Ma
Jiaxun Zhang
Yi Pan
Ruimao Zhang
MedIm
30
0
0
01 Oct 2024
Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels
Heeseong Shin
Chaehyun Kim
Sunghwan Hong
Seokju Cho
Anurag Arnab
Paul Hongsuck Seo
Seungryong Kim
VLM
34
1
0
30 Sep 2024
DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning
Kazuki Matsuda
Yuiga Wada
Komei Sugiura
21
1
0
28 Sep 2024
Improving Agent Behaviors with RL Fine-tuning for Autonomous Driving
Zhenghao Peng
Wenjie Luo
Yiren Lu
Tianyi Shen
Cole Gulino
Ari Seff
Justin Fu
26
6
0
26 Sep 2024
A Multimodal Single-Branch Embedding Network for Recommendation in Cold-Start and Missing Modality Scenarios
Christian Ganhor
Marta Moscati
Anna Hausberger
Shah Nawaz
Markus Schedl
26
2
0
26 Sep 2024
Previous
1
2
3
4
5
6
...
40
41
42
Next