ResearchTrend.AI
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

Neural Information Processing Systems (NeurIPS), 2019
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL, VLM

Papers citing "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"

50 / 2,231 papers shown
Title
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
Nam V. Nguyen
Thong T. Doan
Luong Tran
Van Nguyen
Quang Pham
MoE
568
4
0
01 Nov 2024
IO Transformer: Evaluating SwinV2-Based Reward Models for Computer Vision
Maxwell Meyer
Jack Spruyt
ViT
98
0
0
31 Oct 2024
An Information Criterion for Controlled Disentanglement of Multimodal Data
International Conference on Learning Representations (ICLR), 2024
Chenyu Wang
Sharut Gupta
Xinyi Zhang
Sana Tonekaboni
Stefanie Jegelka
Tommi Jaakkola
Caroline Uhler
DRL
336
6
0
31 Oct 2024
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
Bo Jiang
Shaoyu Chen
Bencheng Liao
Xingyu Zhang
Wei Yin
Qian Zhang
Chang Huang
Wen Liu
Xinyu Wang
VLM, MLLM, LRM
225
71
0
29 Oct 2024
Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Donghoon Kim
Gusang Lee
Kyuhong Shim
B. Shim
250
5
0
29 Oct 2024
Improving Generalization in Visual Reasoning via Self-Ensemble
Tien-Huy Nguyen
Quang-Khai Tran
Anh-Tuan Quang-Hoang
VLM, LRM
242
9
0
28 Oct 2024
R-LLaVA: Improving Med-VQA Understanding through Visual Region of Interest
Xupeng Chen
Zhixin Lai
Kangrui Ruan
Shichu Chen
Jiaxiang Liu
Zuozhu Liu
557
16
0
27 Oct 2024
Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques
Applied Soft Computing (Appl. Soft Comput.), 2024
David Ortiz-Perez
Manuel Benavent-Lledo
José García Rodríguez
David Tomás
M. Flores Vizcaya-Moreno
207
3
0
24 Oct 2024
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning
International Journal of Computer Vision (IJCV), 2024
Zhiwei Hao
Jianyuan Guo
Li Shen
Yong Luo
Han Hu
Yonggang Wen
VLM
255
3
0
23 Oct 2024
ViConsFormer: Constituting Meaningful Phrases of Scene Texts using Transformer-based Method in Vietnamese Text-based Visual Question Answering
Pacific Asia Conference on Language, Information and Computation (PACLIC), 2024
Nghia Hieu Nguyen
Tho Thanh Quan
Ngan Luu-Thuy Nguyen
199
0
0
18 Oct 2024
Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment
International Conference on Learning Representations (ICLR), 2024
Chenhang Cui
An Zhang
Yiyang Zhou
Zhaorun Chen
Gelei Deng
Huaxiu Yao
Tat-Seng Chua
572
12
0
18 Oct 2024
VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks
Shailaja Keyur Sampat
Mutsumi Nakamura
Shankar Kailas
Kartik Aggarwal
Mandy Zhou
Yezhou Yang
Chitta Baral
MLLM, CoGe, ReLM, VLM, LRM
203
1
0
17 Oct 2024
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
ACM Multimedia (ACM MM), 2022
Zhiyuan Ma
Jianjun Li
Guohui Li
Kaiyan Huang
VLM
337
9
0
16 Oct 2024
OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities
Lawrence Yunliang Chen
Hexiang Hu
Ruotong Wang
Yiran Chen
Zifeng Wang
...
Pranav Shyam
Tianyi Zhou
Heng-Chiao Huang
Ming-Hsuan Yang
Boqing Gong
133
8
0
16 Oct 2024
X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing
International Conference on Learning Representations (ICLR), 2024
Xinyan Chen
Jianfei Yang
326
9
0
14 Oct 2024
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
Computer Vision and Pattern Recognition (CVPR), 2024
Jian Yang
Dacheng Yin
Yizhou Zhou
Fengyun Rao
Wei-dong Zhai
Yang Cao
Zheng-jun Zha
DiffM
257
10
0
14 Oct 2024
Leveraging Customer Feedback for Multi-modal Insight Extraction
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Sandeep Sricharan Mukku
Abinesh Kanagarajan
Pushpendu Ghosh
Chetan Aggarwal
162
0
0
13 Oct 2024
nach0-pc: Multi-task Language Model with Molecular Point Cloud Encoder
AAAI Conference on Artificial Intelligence (AAAI), 2024
Maksim Kuznetsov
Airat Valiev
Alex Aliper
Daniil Polykovskiy
E. Tutubalina
Rim Shayakhmetov
Z. Miftahutdinov
185
3
0
11 Oct 2024
A social context-aware graph-based multimodal attentive learning framework for disaster content classification during emergencies: a benchmark dataset and method
Expert Systems with Applications (ESWA), 2024
Shahid Shafi Dar
Mohammad Zia Ur Rehman
Karan Bais
Mohammed Abdul Haseeb
Nagendra Kumara
159
19
0
11 Oct 2024
Exploring Foundation Models in Remote Sensing Image Change Detection: A Comprehensive Survey
Zihan Yu
Tianxiao Li
Yuxin Zhu
Rongze Pan
208
4
0
10 Oct 2024
Multimodal Clickbait Detection by De-confounding Biases Using Causal Representation Inference
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Jianxing Yu
Shiqi Wang
Han Yin
Zhenlong Sun
Ruobing Xie
Bo Zhang
Yanghui Rao
CML
155
0
0
10 Oct 2024
FLIER: Few-shot Language Image Models Embedded with Latent Representations
Zhinuo Zhou
Peng Zhou
Xiaoyong Pan
VLM
127
1
0
10 Oct 2024
CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection
Guankun Wang
Han Xiao
Huxin Gao
Renrui Zhang
Long Bai
Xiaoxiao Yang
Zhen Li
Hongsheng Li
Hongliang Ren
203
9
0
10 Oct 2024
Structured Spatial Reasoning with Open Vocabulary Object Detectors
Negar Nejatishahidin
Madhukar Reddy Vongala
Jana Kosecka
200
3
0
09 Oct 2024
Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models
International Conference on Learning Representations (ICLR), 2024
Zeman Li
Xinwei Zhang
Peilin Zhong
Yuan Deng
Meisam Razaviyayn
Vahab Mirrokni
274
8
0
09 Oct 2024
DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Sungnyun Kim
Haofu Liao
Srikar Appalaraju
Peng Tang
Zhuowen Tu
R. Satzoda
R. Manmatha
Vijay Mahadevan
Stefano Soatto
252
4
0
04 Oct 2024
Multi-modal clothing recommendation model based on large model and VAE enhancement
Artificial Intelligence and Cloud Computing Conference (AICC), 2024
Bingjie Huang
Qingyi Lu
Shuaishuai Huang
Xue-she Wang
Haowei Yang
219
7
0
03 Oct 2024
Advancing Medical Radiograph Representation Learning: A Hybrid Pre-training Paradigm with Multilevel Semantic Granularity
Hanqi Jiang
Xixuan Hao
Yuzhou Huang
Chong Ma
Jiaxun Zhang
Yi Pan
Ruimao Zhang
MedIm
341
1
0
01 Oct 2024
Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels
Neural Information Processing Systems (NeurIPS), 2024
Heeseong Shin
Chaehyun Kim
Sunghwan Hong
Seokju Cho
Anurag Arnab
Paul Hongsuck Seo
Seungryong Kim
VLM
231
9
0
30 Sep 2024
DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning
Asian Conference on Computer Vision (ACCV), 2024
Kazuki Matsuda
Yuiga Wada
Komei Sugiura
232
6
0
28 Sep 2024
Improving Agent Behaviors with RL Fine-tuning for Autonomous Driving
European Conference on Computer Vision (ECCV), 2024
Zhenghao Peng
Wenjie Luo
Yiren Lu
Tianyi Shen
Cole Gulino
Ari Seff
Justin Fu
167
22
0
26 Sep 2024
A Multimodal Single-Branch Embedding Network for Recommendation in Cold-Start and Missing Modality Scenarios
ACM Conference on Recommender Systems (RecSys), 2024
Christian Ganhor
Marta Moscati
Anna Hausberger
Shah Nawaz
Markus Schedl
197
16
0
26 Sep 2024
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM, AuLLM
414
20
0
26 Sep 2024
PASS: Path-selective State Space Model for Event-based Recognition
Jiazhou Zhou
Kanghao Chen
Lei Zhang
Lin Wang
307
0
0
25 Sep 2024
DIAL: Dense Image-text ALignment for Weakly Supervised Semantic Segmentation
European Conference on Computer Vision (ECCV), 2024
Soojin Jang
Jungmin Yun
Junehyoung Kwon
Eunju Lee
Youngbin Kim
325
7
0
24 Sep 2024
Multi-modal Generative AI: Multi-modal LLMs, Diffusions, and the Unification
X. Wang
Yuwei Zhou
Bin Huang
Hong Chen
Wenwu Zhu
DiffM
434
9
0
23 Sep 2024
LARE: Latent Augmentation using Regional Embedding with Vision-Language Model
Machine Learning with Applications (MLWA), 2024
Kosuke Sakurai
Tatsuya Ishii
Ryotaro Shimizu
Linxin Song
Masayuki Goto
VLM
220
1
0
19 Sep 2024
Multi-Cohort Framework with Cohort-Aware Attention and Adversarial Mutual-Information Minimization for Whole Slide Image Classification
Sharon Peled
Y. Maruvka
Moti Freiman
179
1
0
17 Sep 2024
Resolving Inconsistent Semantics in Multi-Dataset Image Segmentation
Qilong Zhangli
Di Liu
Abhishek Aich
Dimitris Metaxas
S. Schulter
183
1
0
15 Sep 2024
PROSE-FD: A Multimodal PDE Foundation Model for Learning Multiple Operators for Forecasting Fluid Dynamics
Yuxuan Liu
Jingmin Sun
Xinjie He
Griffin Pinney
Zecheng Zhang
Hayden Schaeffer
AI4CE
211
19
0
15 Sep 2024
ComAlign: Compositional Alignment in Vision-Language Models
Ali Abdollah
Amirmohammad Izadi
Armin Saghafian
Reza Vahidimajd
Mohammad Mozafari
Amirreza Mirzaei
Mohammadmahdi Samiei
M. Baghshah
CoGe, VLM
186
1
0
12 Sep 2024
Recent Trends of Multimodal Affective Computing: A Survey from NLP Perspective
Guimin Hu
Yi Xin
Weimin Lyu
Haojian Huang
Chang Sun
Zehan Zhu
Lin Gui
Ruichu Cai
Erik Cambria
Hasti Seifi
320
15
0
11 Sep 2024
What to align in multimodal contrastive learning?
International Conference on Learning Representations (ICLR), 2024
Benoit Dufumier
J. Castillo-Navarro
D. Tuia
Jean-Philippe Thiran
305
26
0
11 Sep 2024
MathGLM-Vision: Solving Mathematical Problems with Multi-Modal Large Language Model
Zhen Yang
Jinhao Chen
Zhengxiao Du
Wenmeng Yu
Weihan Wang
Wenyi Hong
Zhihuan Jiang
Bin Xu
Yuxiao Dong
Jie Tang
VLM, LRM
163
14
0
10 Sep 2024
VidLPRO: A Video-Language Pre-training Framework for Robotic and Laparoscopic Surgery
Mohammadmahdi Honarmand
Muhammad Abdullah Jamal
Omid Mohareri
328
5
0
07 Sep 2024
MuAP: Multi-step Adaptive Prompt Learning for Vision-Language Model with Missing Modality
Ruiting Dai
Yuqiao Tan
Lisi Mo
Tao He
Ke Qin
Shuang Liang
VLM
209
4
0
07 Sep 2024
Spindle: Efficient Distributed Training of Multi-Task Large Models via Wavefront Scheduling
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024
Yujie Wang
Shenhan Zhu
Fangcheng Fu
Xupeng Miao
Jie Zhang
Juan Zhu
Fan Hong
Yongbin Li
Bin Cui
113
0
0
05 Sep 2024
Vision-Language Navigation with Continual Learning
Zhiyuan Li
Yanfeng Lv
Ziqin Tu
Di Shang
Hong Qiao
239
3
0
04 Sep 2024
CV-Probes: Studying the interplay of lexical and world knowledge in visually grounded verb understanding
Ivana Beňová
Michal Gregor
Albert Gatt
276
1
0
02 Sep 2024
HiTSR: A Hierarchical Transformer for Reference-based Super-Resolution
Masoomeh Aslahishahri
Jordan R. Ubbens
Ian Stavness
229
1
0
30 Aug 2024