ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for
  Vision-and-Language Tasks


Neural Information Processing Systems (NeurIPS), 2019
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
    SSL, VLM

Papers citing "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"

50 / 2,232 papers shown
FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
Cheng-Yu Hsieh
Pavan Kumar Anasosalu Vasu
Fartash Faghri
Raviteja Vemulapalli
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Hadi Pouransari
VLM
941
0
0
11 Apr 2025
TokenFocus-VQA: Enhancing Text-to-Image Alignment with Position-Aware Focus and Multi-Perspective Aggregations on LVLMs
Zijian Zhang
Xuhui Zheng
X. Wu
Chong Peng
Xuezhi Cao
186
3
0
10 Apr 2025
Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging
International Journal of Machine Learning and Cybernetics (IJMLC), 2025
Siyuan Dai
Kai Ye
Guodong Liu
Haoteng Tang
Chen Tang
MedIm
195
4
0
09 Apr 2025
Locations of Characters in Narratives: Andersen and Persuasion Datasets
Batuhan Ozyurt
Roya Arkhmammadova
Deniz Yuret
150
4
0
04 Apr 2025
Neutralizing the Narrative: AI-Powered Debiasing of Online News Articles
Chen Wei Kuo
Kevin Chu
Nouar Aldahoul
Hazem Ibrahim
Talal Rahwan
Yasir Zaki
SyDa
377
0
0
04 Apr 2025
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision
Information Fusion (Inf. Fusion), 2025
Xiaofeng Han
Shunpeng Chen
Zenghuang Fu
Zhe Feng
Lue Fan
...
Li Guo
Weiliang Meng
Xiaopeng Zhang
Rongtao Xu
Shibiao Xu
395
37
0
03 Apr 2025
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
International Journal of Computer Vision (IJCV), 2024
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
351
1
0
03 Apr 2025
ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction
Computer Vision and Pattern Recognition (CVPR), 2025
Yuejiao Su
Yi Wang
Qiongyang Hu
Chuang Yang
Lap-Pui Chau
224
4
0
02 Apr 2025
RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2025
Alexander Vogel
Omar Moured
Yufan Chen
Kailai Li
Rainer Stiefelhagen
353
4
0
29 Mar 2025
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2025
Aniket Didolkar
Antonios Tragoudaras
Rabiul Awal
Maximilian Seitzer
E. Gavves
Aishwarya Agrawal
OCL, VLM
405
4
0
27 Mar 2025
VGAT: A Cancer Survival Analysis Framework Transitioning from Generative Visual Question Answering to Genomic Reconstruction
Zizhi Chen
Minghao Han
Xukun Zhang
Shuwei Ma
Tao Liu
Xing Wei
Li Zhang
418
0
0
25 Mar 2025
VisualQuest: A Benchmark for Abstract Visual Reasoning in MLLMs
Kelaiti Xiao
Liang Yang
Paerhati Tulajiang
Hongfei Lin
MLLM
324
0
0
25 Mar 2025
Unseen from Seen: Rewriting Observation-Instruction Using Foundation Models for Augmenting Vision-Language Navigation
Ziming Wei
Bingqian Lin
Yunshuang Nie
Jiaqi Chen
Shikui Ma
Hang Xu
Xiaodan Liang
460
3
0
23 Mar 2025
A Language Anchor-Guided Method for Robust Noisy Domain Generalization
Zilin Dai
Lehong Wang
Fangzhou Lin
Yidong Wang
Zhigang Li
Kazunori D Yamada
Ziming Zhang
Wang Lu
839
2
0
21 Mar 2025
A Survey on fMRI-based Brain Decoding for Reconstructing Multimodal Stimuli
Pengyu Liu
Guohua Dong
D. Guo
Kun Li
Fengling Li
Xun Yang
Meng Wang
Xiaomin Ying
AI4CE
234
5
0
20 Mar 2025
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Sara Sarto
Marcella Cornia
Rita Cucchiara
351
6
0
18 Mar 2025
HA-VLN 2.0: An Open Benchmark and Leaderboard for Human-Aware Navigation in Discrete and Continuous Environments with Dynamic Multi-Human Interactions
Yifei Dong
Fengyi Wu
Qi He
Heng Li
...
Yuxuan Zhou
Jingdong Sun
Zhi-Qi Cheng
Alexander G. Hauptmann
LM&Ro
290
1
0
18 Mar 2025
DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2025
Haoyang Li
Liang Wang
Chunbai Zhang
Jing Jiang
Yan Peng
Guodong Long
VLM
325
8
0
17 Mar 2025
Quantum EigenGame for excited state calculation
David Quiroga
Jason Han
Anastasios Kyrillidis
276
4
0
17 Mar 2025
Learning Privacy from Visual Entities
Proceedings on Privacy Enhancing Technologies (PoPETs), 2025
Alessio Xompero
Andrea Cavallaro
SSL, GNN
256
2
0
16 Mar 2025
DynRsl-VLM: Enhancing Autonomous Driving Perception with Dynamic Resolution Vision-Language Models
Xirui Zhou
Lianlei Shan
Xiaolin Gui
200
16
0
14 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
496
11
0
13 Mar 2025
Can LLMs Understand Time Series Anomalies?
International Conference on Learning Representations (ICLR), 2024
Zihao Zhou
Rose Yu
AI4TS
373
30
0
13 Mar 2025
Towards Understanding Graphical Perception in Large Multimodal Models
Kai Zhang
Jianwei Yang
J. Inala
Chandan Singh
Jianfeng Gao
Eric Fosler-Lussier
Chenglong Wang
295
2
0
13 Mar 2025
Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework
Zhuo Zhi
Chen Feng
Adam Daneshmend
Mine Orlu
Andreas Demosthenous
L. Yin
Da Li
Ziquan Liu
Miguel R. D. Rodrigues
LRM
248
8
0
11 Mar 2025
Federated Multimodal Learning with Dual Adapters and Selective Pruning for Communication and Computational Efficiency
IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2025
Duy Phuong Nguyen
J. P. Muñoz
Tanya Roosta
Ali Jannesari
FedML
260
2
0
10 Mar 2025
Anatomy-Aware Conditional Image-Text Retrieval
Meng Zheng
Jiajin Zhang
Benjamin Planche
Zhongpai Gao
Terrence Chen
Ziyan Wu
MedIm
243
0
0
10 Mar 2025
TI-JEPA: An Innovative Energy-based Joint Embedding Strategy for Text-Image Multimodal Systems
Khang H. N. Vo
D. Q. Nguyen
T. Nguyen
Tho Quan
246
5
0
09 Mar 2025
Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions
Computer Vision and Pattern Recognition (CVPR), 2025
Chan hur
Jeong-hun Hong
Dong-hun Lee
Dabin Kang
Semin Myeong
Sang-hyo Park
Hyeyoung Park
591
5
0
07 Mar 2025
RCRank: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems
Proceedings of the VLDB Endowment (PVLDB), 2024
Biao Ouyang
Yingying Zhang
Hanyin Cheng
Yang Shu
Chenjuan Guo
Bin Yang
Qingsong Wen
L. Fan
Christian S. Jensen
193
6
0
06 Mar 2025
Enhancing Collective Intelligence in Large Language Models Through Emotional Integration
Likith Kadiyala
Ramteja Sajja
Y. Sermet
Ibrahim Demir
843
3
0
05 Mar 2025
Composed Multi-modal Retrieval: A Survey of Approaches and Applications
Kun Zhang
Jingyu Li
Zhiyu Li
Jingjing Zhang
F. Li
...
Nan Chen
Lei Zhang
Yongdong Zhang
Zhendong Mao
S.Kevin Zhou
389
1
0
03 Mar 2025
Perceptual Visual Quality Assessment: Principles, Methods, and Future Directions
Wei Zhou
Hadi Amirpour
Christian Timmerer
Guoquan Zheng
P. Callet
Alan C. Bovik
245
6
0
01 Mar 2025
Solar Multimodal Transformer: Intraday Solar Irradiance Predictor using Public Cameras and Time Series
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2025
Yanan Niu
Roy Sarkis
D. Psaltis
Mario Paolone
Christophe Moser
Luisa Lambertini
239
3
0
28 Feb 2025
RTGen: Real-Time Generative Detection Transformer
Chi Ruan
Jiying Zhao
Wenhu Chen
ObjD, VLM
372
0
0
28 Feb 2025
Multimodal Learning for Just-In-Time Software Defect Prediction in Autonomous Driving Systems
International Conference on Big Data and Smart Computing (BigComp), 2025
Faisal Mohammad
Duksan Ryu
216
0
0
28 Feb 2025
Deciphering the complaint aspects: Towards an aspect-based complaint identification model with video complaint dataset in finance
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2025
Sarmistha Das
Basha Mujavarsheik
R E Zera Lyngkhoi
Sriparna Saha
Alka Maurya
133
0
0
26 Feb 2025
FilterRAG: Zero-Shot Informed Retrieval-Augmented Generation to Mitigate Hallucinations in VQA
S M Sarwar
443
2
0
25 Feb 2025
Vision Language Models in Medicine
Beria Chingnabe Kalpelbe
Angel Gabriel Adaambiik
Wei Peng
VLM, LM&MA
367
5
0
24 Feb 2025
Are Large Language Models Good Data Preprocessors?
The Web Conference (WWW), 2025
Elyas Meguellati
Nardiena A. Pratama
S. Sadiq
Gianluca Demartini
269
2
0
24 Feb 2025
Beyond Pattern Recognition: Probing Mental Representations of LMs
Moritz Miller
Kumar Shridhar
ReLM, LRM
244
0
0
23 Feb 2025
Modular Prompt Learning Improves Vision-Language Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Zhenhan Huang
Tejaswini Pedapati
Pin-Yu Chen
Jianxi Gao
VLM
113
2
0
21 Feb 2025
Enhancing Adversarial Robustness of Vision-Language Models through Low-Rank Adaptation
International Conference on Multimedia Retrieval (ICMR), 2024
Yuheng Ji
Yue Liu
Zhicheng Zhang
Zhao Zhang
Yuting Zhao
Gang Zhou
Xingwei Zhang
Xinwang Liu
Xiaolong Zheng
VLM
390
4
0
21 Feb 2025
Demystifying Hateful Content: Leveraging Large Multimodal Models for Hateful Meme Detection with Explainable Decisions
International Conference on Web and Social Media (ICWSM), 2025
Ming Shan Hee
Roy Ka-wei Lee
VLM
238
10
0
16 Feb 2025
Handwritten Text Recognition: A Survey
Carlos Garrido-Munoz
Antonio Ríos-Vila
Jorge Calvo-Zaragoza
295
6
0
12 Feb 2025
Vision-Language Models for Edge Networks: A Comprehensive Survey
IEEE Internet of Things Journal (IEEE IoT J.), 2025
Ahmed Sharshar
Latif U. Khan
Waseem Ullah
Mohsen Guizani
VLM
361
20
0
11 Feb 2025
Foundation Models for Anomaly Detection: Vision and Challenges
Jing Ren
Tao Tang
Hong Jia
Haytham Fayek
Xiaodong Li
Suyu Ma
Xiwei Xu
Feng Xia
453
2
0
10 Feb 2025
A Multimodal PDE Foundation Model for Prediction and Scientific Text Descriptions
Elisa Negrini
Yuxuan Liu
Liu Yang
Stanley Osher
Hayden Schaeffer
AI4CE
315
2
0
09 Feb 2025
Performance Analysis of Traditional VQA Models Under Limited Computational Resources
Jihao Gu
278
1
0
09 Feb 2025
Multi-Branch Collaborative Learning Network for Video Quality Assessment in Industrial Video Search
Knowledge Discovery and Data Mining (KDD), 2025
Hengzhu Tang
Zefeng Zhang
Zhiping Li
Zhenyu Zhang
Xing Wu
Li Gao
Suqi Cheng
D. Yin
276
2
0
09 Feb 2025