VisualBERT: A Simple and Performant Baseline for Vision and Language

9 August 2019
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
    VLM

Papers citing "VisualBERT: A Simple and Performant Baseline for Vision and Language"

50 / 1,256 papers shown
Chameleon: Fast-slow Neuro-symbolic Lane Topology Extraction
IEEE International Conference on Robotics and Automation (ICRA), 2025
Zongzheng Zhang
Xinrun Li
Sizhe Zou
Guoxuan Chi
Siqi Li
...
Guoliang Wang
Guantian Zheng
Leichen Wang
Hang Zhao
Hao Zhao
305
9
0
10 Mar 2025
Enhancing Vietnamese VQA through Curriculum Learning on Raw and Augmented Text Representations
Khoi Anh Nguyen
Linh Yen Vu
Thang Dinh Duong
Thuan Nguyen Duong
Huy Thanh Nguyen
V. Q. Dinh
192
4
0
05 Mar 2025
Vision-Language Model IP Protection via Prompt-based Learning
Computer Vision and Pattern Recognition (CVPR), 2025
Lianyu Wang
Ming Wang
Huazhu Fu
Daoqiang Zhang
VLM
361
0
0
04 Mar 2025
FilterRAG: Zero-Shot Informed Retrieval-Augmented Generation to Mitigate Hallucinations in VQA
S M Sarwar
350
2
0
25 Feb 2025
Vision Language Models in Medicine
Beria Chingnabe Kalpelbe
Angel Gabriel Adaambiik
Wei Peng
VLM LM&MA
339
5
0
24 Feb 2025
ESANS: Effective and Semantic-Aware Negative Sampling for Large-Scale Retrieval Systems
The Web Conference (WWW), 2025
Haibo Xing
Kanefumi Matsuyama
Hao Deng
Jinxin Hu
Yu Zhang
Xiaoyi Zeng
249
4
0
22 Feb 2025
Learning Generalizable Prompt for CLIP with Class Similarity Knowledge
Sehun Jung
Hyang-won Lee
VLM VPVLM
207
2
0
17 Feb 2025
Multi-Turn Multi-Modal Question Clarification for Enhanced Conversational Understanding
Kimia Ramezan
Alireza Amiri Bavandpour
Yifei Yuan
Clemencia Siro
Mohammad Aliannejadi
153
0
0
17 Feb 2025
Demystifying Hateful Content: Leveraging Large Multimodal Models for Hateful Meme Detection with Explainable Decisions
International Conference on Web and Social Media (ICWSM), 2025
Ming Shan Hee
Roy Ka-wei Lee
VLM
206
9
0
16 Feb 2025
Vision-Language Models for Edge Networks: A Comprehensive Survey
IEEE Internet of Things Journal (IEEE IoT J.), 2025
Ahmed Sharshar
Latif U. Khan
Waseem Ullah
Mohsen Guizani
VLM
313
3
0
11 Feb 2025
Multi-Branch Collaborative Learning Network for Video Quality Assessment in Industrial Video Search
Knowledge Discovery and Data Mining (KDD), 2025
Hengzhu Tang
Zefeng Zhang
Zhiping Li
Zhenyu Zhang
Xing Wu
Li Gao
Suqi Cheng
D. Yin
256
2
0
09 Feb 2025
A Multimodal PDE Foundation Model for Prediction and Scientific Text Descriptions
Elisa Negrini
Yuxuan Liu
Liu Yang
Stanley Osher
Hayden Schaeffer
AI4CE
295
2
0
09 Feb 2025
Mitigating GenAI-powered Evidence Pollution for Out-of-Context Multimodal Misinformation Detection
Zehong Yan
Peng Qi
Wynne Hsu
Yang Deng
247
0
0
24 Jan 2025
MASS: Overcoming Language Bias in Image-Text Matching
AAAI Conference on Artificial Intelligence (AAAI), 2025
Jiwan Chung
Seungwon Lim
Sangkyu Lee
Youngjae Yu
VLM
185
0
0
20 Jan 2025
Leveraging Taxonomy and LLMs for Improved Multimodal Hierarchical Classification
International Conference on Computational Linguistics (COLING), 2025
Shijing Chen
Mohamed Reda Bouadjenek
Shoaib Jameel
Usman Naseem
Basem Suleiman
Flora D. Salim
Hakim Hacid
Imran Razzak
156
3
0
12 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Jiayi Zhang
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
414
32
0
06 Jan 2025
SAFE-MEME: Structured Reasoning Framework for Robust Hate Speech Detection in Memes
Palash Nandi
Shivam Sharma
Tanmoy Chakraborty
174
4
0
31 Dec 2024
MATCHED: Multimodal Authorship-Attribution To Combat Human Trafficking in Escort-Advertisement Data
V. Saxena
Benjamin Bashpole
Gijs van Dijck
Gerasimos Spanakis
205
0
0
18 Dec 2024
Bringing Multimodality to Amazon Visual Search System
Knowledge Discovery and Data Mining (KDD), 2024
Xinliang Zhu
Michael Huang
Han Ding
Jinyu Yang
Kelvin Chen
...
Son Dinh Tran
Benjamin Z. Yao
Doug Gray
Anuj Bindal
Arnab Dhua
229
7
0
17 Dec 2024
Does VLM Classification Benefit from LLM Description Semantics?
AAAI Conference on Artificial Intelligence (AAAI), 2024
Pingchuan Ma
Lennart Rietdorf
Dmytro Kotovenko
Vincent Tao Hu
Bjorn Ommer
VLM
338
2
0
16 Dec 2024
Advances in Transformers for Robotic Applications: A Review
Nikunj Sanghai
Nik Bear Brown
AI4CE
331
5
0
13 Dec 2024
Unified Framework for Open-World Compositional Zero-shot Learning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Hirunima Jayasekara
Khoi Pham
Nirat Saini
Abhinav Shrivastava
230
1
0
05 Dec 2024
AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?
Shouwei Ruan
Hanqin Liu
Yao Huang
Xiaoqi Wang
Caixin Kang
Hang Su
Yinpeng Dong
Xingxing Wei
VGen
554
1
0
04 Dec 2024
Eyes on the Road: State-of-the-Art Video Question Answering Models Assessment for Traffic Monitoring Tasks
Joseph Raj Vishal
Divesh Basina
Aarya Choudhary
Bharatesh Chakravarthi
332
3
0
02 Dec 2024
Exploring Large Vision-Language Models for Robust and Efficient Industrial Anomaly Detection
Kun Qian
Tianyu Sun
Wenhong Wang
206
1
0
01 Dec 2024
AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment
Computer Vision and Pattern Recognition (CVPR), 2024
Yan Li
Yifei Xing
X. Lan
Xuzhao Li
Haifeng Chen
Shihong Deng
Mamba
240
16
0
01 Dec 2024
MIMIC: Multimodal Islamophobic Meme Identification and Classification
Safrin Sanzida Islam
Sahid Hossain Mustakim
Sadia Ahmmed
Md. Faiyaz Abdullah Sayeedi
Swapnil Khandoker
Syed Tasdid Azam Dhrubo
Nahid Md Lokman Hossain
182
1
0
01 Dec 2024
Approximate Fiber Product: A Preliminary Algebraic-Geometric Perspective on Multimodal Embedding Alignment
Dongfang Zhao
131
1
0
30 Nov 2024
VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis
Donggoo Kang
Dasol Jeong
Hyunmin Lee
Sangwoo Park
Hasil Park
Sunkyu Kwon
Yeongjoon Kim
Joonki Paik
MLLM VLM
304
1
0
27 Nov 2024
Enhancing Few-Shot Out-of-Distribution Detection with Gradient Aligned Context Optimization
Baoshun Tong
Kaiyu Song
Hanjiang Lai
OODD
178
1
0
24 Nov 2024
Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios
Neural Information Processing Systems (NeurIPS), 2024
Shantanu Jaiswal
Debaditya Roy
Basura Fernando
Cheston Tan
ReLM LRM
315
4
0
20 Nov 2024
A Comprehensive Survey on Visual Question Answering Datasets and Algorithms
Raihan Kabir
Naznin Haque
Md. Saiful Islam
Marium-E. Jannat
CoGe
229
8
0
17 Nov 2024
Prompt-enhanced Network for Hateful Meme Classification
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Junxi Liu
Yanyan Feng
Jiehai Chen
Yun Xue
Fenghuan Li
VLM
280
3
0
12 Nov 2024
Renaissance: Investigating the Pretraining of Vision-Language Encoders
Clayton Fields
C. Kennington
VLM
117
1
0
11 Nov 2024
Harmful YouTube Video Detection: A Taxonomy of Online Harm and MLLMs as Alternative Annotators
Claire Jo
Miki Wesołowska
Magdalena Wojcieszak
212
7
0
06 Nov 2024
Multimodal Commonsense Knowledge Distillation for Visual Question Answering
Shuo Yang
Siwen Luo
S. Han
LRM
91
1
0
05 Nov 2024
Can Multimodal Large Language Model Think Analogically?
Diandian Guo
Cong Cao
Fangfang Yuan
Dakui Wang
Wei Ma
Yanbing Liu
Jianhui Fu
LRM
229
1
0
02 Nov 2024
R-LLaVA: Improving Med-VQA Understanding through Visual Region of Interest
Xupeng Chen
Zhixin Lai
Kangrui Ruan
Shichu Chen
Jiaxiang Liu
Zuozhu Liu
553
14
0
27 Oct 2024
MAD-Sherlock: Multi-Agent Debate for Visual Misinformation Detection
Kumud Lakara
Juil Sock
Christian Rupprecht
Philip Torr
John Collomosse
Christian Schroeder de Witt
196
5
0
26 Oct 2024
A Survey of Multimodal Sarcasm Detection
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Shafkat Farabi
Tharindu Ranasinghe
Helen Treharne
Yu Kong
Marcos Zampieri
163
14
0
24 Oct 2024
Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques
Applied Soft Computing (Appl. Soft Comput.), 2024
David Ortiz-Perez
Manuel Benavent-Lledo
José García Rodríguez
David Tomás
M. Flores Vizcaya-Moreno
191
3
0
24 Oct 2024
Exploiting Text-Image Latent Spaces for the Description of Visual Concepts
International Conference on Pattern Recognition (ICPR), 2024
Laines Schmalwasser
J. Gawlikowski
Joachim Denzler
Julia Niebling
131
3
0
23 Oct 2024
Reducing Hallucinations in Vision-Language Models via Latent Space Steering
Sheng Liu
Haotian Ye
Lei Xing
James Zou
VLM LLMSV
289
33
0
21 Oct 2024
ChitroJera: A Regionally Relevant Visual Question Answering Dataset for Bangla
Deeparghya Dutta Barua
Md Sakib Ul Rahman Sourove
Md Fahim
Fabiha Haider
Fariha Tanjim Shifat
Md Tasmim Rahman Adib
Anam Borhan Uddin
Md Farhan Ishmam
Md Farhad Alam
178
1
0
19 Oct 2024
ViConsFormer: Constituting Meaningful Phrases of Scene Texts using Transformer-based Method in Vietnamese Text-based Visual Question Answering
Pacific Asia Conference on Language, Information and Computation (PACLIC), 2024
Nghia Hieu Nguyen
Tho Thanh Quan
Ngan Luu-Thuy Nguyen
195
0
0
18 Oct 2024
VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks
Shailaja Keyur Sampat
Mutsumi Nakamura
Shankar Kailas
Kartik Aggarwal
Mandy Zhou
Yezhou Yang
Chitta Baral
MLLM CoGe ReLM VLM LRM
187
1
0
17 Oct 2024
Seeing Through VisualBERT: A Causal Adventure on Memetic Landscapes
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Dibyanayan Bandyopadhyay
Mohammed Hasanuzzaman
Asif Ekbal
AAML
221
5
0
17 Oct 2024
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
ACM Multimedia (ACM MM), 2022
Zhiyuan Ma
Jianjun Li
Guohui Li
Kaiyan Huang
VLM
297
9
0
16 Oct 2024
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
Computer Vision and Pattern Recognition (CVPR), 2024
Jian Yang
Dacheng Yin
Yizhou Zhou
Fengyun Rao
Wei-dong Zhai
Yang Cao
Zheng-jun Zha
DiffM
241
8
0
14 Oct 2024
Leveraging Customer Feedback for Multi-modal Insight Extraction
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Sandeep Sricharan Mukku
Abinesh Kanagarajan
Pushpendu Ghosh
Chetan Aggarwal
158
0
0
13 Oct 2024