ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for
  Vision-and-Language Tasks

Neural Information Processing Systems (NeurIPS), 2019
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL, VLM
arXiv: 1908.02265

Papers citing "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"

50 / 2,223 papers shown
Knowledge-Guided Adaptive Mixture of Experts for Precipitation Prediction
Chen Jiang
Kofi Osei
Sai Deepthi Yeddula
Dongji Feng
Wei-Shinn Ku
41
0
0
14 Sep 2025
Towards Understanding Visual Grounding in Visual Language Models
Georgios Pantazopoulos
Eda B. Özyiğit
ObjD
236
1
0
12 Sep 2025
DualTrack: Sensorless 3D Ultrasound needs Local and Global Context
P. Wilson
Matteo Ronchetti
Rüdiger Göbl
Viktoria Markova
Sebastian Rosenzweig
R. Prevost
P. Mousavi
O. Zettinig
52
0
0
11 Sep 2025
SimCroP: Radiograph Representation Learning with Similarity-driven Cross-granularity Pre-training
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Rongsheng Wang
Fenghe Tang
Qingsong Yao
Rui Yan
Xu Zhang
...
Haoran Lai
Zhiyang He
Xiaodong Tao
Zihang Jiang
S. Kevin Zhou
MedIm
70
0
0
10 Sep 2025
Parse Graph-Based Visual-Language Interaction for Human Pose Estimation
Shibang Liu
Xuemei Xie
G. Shi
68
0
0
09 Sep 2025
Artificial intelligence for representing and characterizing quantum systems
Yuxuan Du
Yan Zhu
Y. Zhang
Min-hsiu Hsieh
Patrick Rebentrost
...
Ya-Dong Wu
Jens Eisert
G. Chiribella
Dacheng Tao
B. Sanders
131
3
0
05 Sep 2025
Hybrid-Tower: Fine-grained Pseudo-query Interaction and Generation for Text-to-Video Retrieval
Bangxiang Lan
Ruobing Xie
Ruixiang Zhao
Xingwu Sun
Zhanhui Kang
Gang Yang
Xirong Li
76
0
0
05 Sep 2025
Attn-Adapter: Attention Is All You Need for Online Few-shot Learner of Vision-Language Model
Phuoc-Nguyen Bui
Khanh-Binh Nguyen
Hyunseung Choo
VLM
224
0
0
04 Sep 2025
Structure-aware Contrastive Learning for Diagram Understanding of Multimodal Models
Hiroshi Sasaki
VLM
72
0
0
02 Sep 2025
Street-Level Geolocalization Using Multimodal Large Language Models and Retrieval-Augmented Generation
Yunus Serhat Bicakci
Joseph Shingleton
Anahid Basiri
80
0
0
01 Sep 2025
SEAM: Semantically Equivalent Across Modalities Benchmark for Vision-Language Models
Zhenwei Tang
Difan Jiao
Blair Yang
Ashton Anderson
VLM, CoGe
114
1
0
25 Aug 2025
Limitations of Normalization in Attention Mechanism
Timur Mudarisov
Mikhail Burtsev
Tatiana Petrova
Radu State
70
2
0
25 Aug 2025
Explain Before You Answer: A Survey on Compositional Visual Reasoning
Fucai Ke
Joy Hsu
Zhixi Cai
Zixian Ma
Xin Zheng
...
P. D. Haghighi
Gholamreza Haffari
Ranjay Krishna
Jiajun Wu
H. Rezatofighi
ReLM, CoGe, LRM
248
6
0
24 Aug 2025
Cross-Attention Multimodal Fusion for Breast Cancer Diagnosis: Integrating Mammography and Clinical Data with Explainability
Muhaisin Tiyumba Nantogmah
Abdul-Barik Alhassan
Salamudeen Alhassan
78
0
0
21 Aug 2025
GazeProphet: Software-Only Gaze Prediction for VR Foveated Rendering
Farhaan Ebadulla
Chiraag Mudlapur
Gaurav BV
80
0
0
19 Aug 2025
VELVET-Med: Vision and Efficient Language Pre-training for Volumetric Imaging Tasks in Medicine
Ziyang Zhang
Yang Yu
Xulei Yang
S. Yeo
VLM
94
0
0
16 Aug 2025
Recent Advances in Transformer and Large Language Models for UAV Applications
Hamza Kheddar
Yassine Habchi
Mohamed Chahine Ghanem
Mustapha Hemis
Dusit Niyato
110
2
0
15 Aug 2025
A Curriculum Learning Approach to Reinforcement Learning: Leveraging RAG for Multimodal Question Answering
Chenliang Zhang
Lin Wang
Yuanyuan Lu
Yusheng Qi
Kexin Wang
P. Hou
Wenshi Chen
RALM
67
0
0
14 Aug 2025
AME: Aligned Manifold Entropy for Robust Vision-Language Distillation
Guiming Cao
Yuming Ou
AAML, VLM
131
2
0
12 Aug 2025
FLUID: Flow-Latent Unified Integration via Token Distillation for Expert Specialization in Multimodal Learning
Van Duc Cuong
Ta Dinh Tam
Tran Duc Chinh
Nguyen Thi Hanh
64
1
0
10 Aug 2025
Remote Sensing Image Intelligent Interpretation with the Language-Centered Perspective: Principles, Methods and Challenges
Haifeng Li
Wang Guo
Haiyang Wu
Mengwei Wu
Jipeng Zhang
Qing Zhu
Yu Liu
Xin Huang
Chao Tao
106
0
0
09 Aug 2025
Natural Language-Driven Viewpoint Navigation for Volume Exploration via Semantic Block Representation
Xuan Zhao
Jun Tao
69
0
0
09 Aug 2025
Adversarial Video Promotion Against Text-to-Video Retrieval
Qiwei Tian
Chenhao Lin
Zhengyu Zhao
Qian Li
Shuai Liu
Chao Shen
AAML
103
0
0
09 Aug 2025
Does Multimodality Improve Recommender Systems as Expected? A Critical Analysis and Future Directions
Hongyu Zhou
Yinan Zhang
Aixin Sun
Zhiqi Shen
80
0
0
07 Aug 2025
Accelerating Conditional Prompt Learning via Masked Image Modeling for Vision-Language Models
Phuoc-Nguyen Bui
Khanh-Binh Nguyen
Hyunseung Choo
VLM
169
1
0
07 Aug 2025
RegionMed-CLIP: A Region-Aware Multimodal Contrastive Learning Pre-trained Model for Medical Image Understanding
Tianchen Fang
Guiru Liu
MedIm, VLM
69
2
0
07 Aug 2025
Surformer v1: Transformer-Based Surface Classification Using Tactile and Vision Features
Manish Kansana
Elias Hossain
Shahram Rahimi
Noorbakhsh Amiri Golilarz
ViT
57
3
0
07 Aug 2025
Latent Expression Generation for Referring Image Segmentation and Grounding
S. Yu
Joonbeom Hong
Joonseok Lee
Jeany Son
ObjD
133
1
0
07 Aug 2025
Multimodal Fact Checking with Unified Visual, Textual, and Contextual Representations
Aditya Kishore
Gaurav Kumar
Jasabanta Patro
68
0
0
07 Aug 2025
Chain of Questions: Guiding Multimodal Curiosity in Language Models
Nima Iji
Kia Dashtipour
LRM
116
0
0
06 Aug 2025
Parameter-Efficient Single Collaborative Branch for Recommendation
ACM Conference on Recommender Systems (RecSys), 2025
Marta Moscati
Shah Nawaz
Markus Schedl
BDL
133
0
0
05 Aug 2025
VLM4D: Towards Spatiotemporal Awareness in Vision Language Models
Shijie Zhou
Alexander Vilesov
Xuehai He
Ziyu Wan
Shuwang Zhang
Aditya Nagachandra
Di Chang
DongDong Chen
Xin Eric Wang
A. Kadambi
VLM
158
0
0
04 Aug 2025
A Unified Perception-Language-Action Framework for Adaptive Autonomous Driving
Yi Zhang
Erik Leo Haß
Kuo-Yi Chao
Nenad Petrovic
Yinglei Song
Chengdong Wu
Alois C. Knoll
101
1
0
31 Jul 2025
From Image Captioning to Visual Storytelling
Admitos Passadakis
Yingjin Song
Albert Gatt
DiffM
150
0
0
31 Jul 2025
DeltaVLM: Interactive Remote Sensing Image Change Analysis via Instruction-guided Difference Perception
Pei Deng
Wenqian Zhou
Hanlin Wu
80
0
0
30 Jul 2025
Goal-Based Vision-Language Driving
Santosh Patapati
Trisanth Srinivasan
103
0
0
30 Jul 2025
Modality-Aware Feature Matching: A Comprehensive Review of Single- and Cross-Modality Techniques
Weide Liu
Wei Zhou
Jun Liu
Ping Hu
Jun Cheng
Jungong Han
Weisi Lin
3DV
159
3
0
30 Jul 2025
Color as the Impetus: Transforming Few-Shot Learner
Chaofei Qi
Zhitai Liu
Jianbin Qiu
VLM
191
0
0
29 Jul 2025
A Survey on Generative Model Unlearning: Fundamentals, Taxonomy, Evaluation, and Future Direction
Xiaohua Feng
Jiaming Zhang
Fengyuan Yu
C. Wang
Li Zhang
Kaixiang Li
Yuyuan Li
Chaochao Chen
Jianwei Yin
MU
190
2
0
26 Jul 2025
Closing the Modality Gap for Mixed Modality Search
Binxu Li
Yuhui Zhang
Xiaohan Wang
Weixin Liang
Ludwig Schmidt
Serena Yeung-Levy
VLM
88
4
0
25 Jul 2025
T2VWorldBench: A Benchmark for Evaluating World Knowledge in Text-to-Video Generation
Yubin Chen
Xuyang Guo
Zhenmei Shi
Zhao Song
Jiahao Zhang
VGen
535
7
0
24 Jul 2025
Describe Anything Model for Visual Question Answering on Text-rich Images
Yen-Linh Vu
Dinh-Thang Duong
Truong-Binh Duong
Anh-Khoi Nguyen
Thanh-Huy Nguyen
...
Jianhua Xing
Xingjian Li
Tianyang Wang
Ulas Bagci
Min Xu
VLM
223
2
0
16 Jul 2025
ChordPrompt: Orchestrating Cross-Modal Prompt Synergy for Multi-Domain Incremental Learning in CLIP
Zhiyuan Wang
Bokui Chen
VLM, LRM
149
0
0
24 Jun 2025
LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation
Tongtian Yue
Longteng Guo
Yepeng Tang
Zijia Zhao
Xinxin Zhu
Hua Huang
Jing Liu
MLLM, VLM
138
1
0
20 Jun 2025
Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation
Yong-Jin Liu
SongLi Wu
Sule Bai
Jiahao Wang
Yitong Wang
Yansong Tang
VLM, VOS
242
0
0
19 Jun 2025
Understanding GUI Agent Localization Biases through Logit Sharpness
Xingjian Tao
Yiwei Wang
Yujun Cai
Zhicheng YANG
Jing Tang
LLMAG
126
4
0
18 Jun 2025
Segmenting Visuals With Querying Words: Language Anchors For Semi-Supervised Image Segmentation
Numair Nadeem
Saeed Anwar
Muhammad Asad
Abdul Bais
VLM
184
0
0
16 Jun 2025
Dynamic Modality Scheduling for Multimodal Large Models via Confidence, Uncertainty, and Semantic Consistency
Hiroshi Tanaka
Anika Rao
Hana Satou
Michael Johnson
Sofia García
115
0
0
15 Jun 2025
Generative or Discriminative? Revisiting Text Classification in the Era of Transformers
Siva Rajesh Kasa
Karan Gupta
Sumegh Roychowdhury
Ashutosh Kumar
Yaswanth Biruduraju
Santhosh Kumar Kasa
Nikhil Pattisapu
Arindam Bhattacharya
Shailendra Agarwal
Vijay huddar
148
2
0
13 Jun 2025
Manager: Aggregating Insights from Unimodal Experts in Two-Tower VLMs and MLLMs
Xiao Xu
L. Qin
Wanxiang Che
Min-Yen Kan
MoE, VLM
236
0
0
13 Jun 2025