ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

Neural Information Processing Systems (NeurIPS), 2019
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL, VLM

Papers citing "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"

50 / 2,223 papers shown
Title
LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Soumya Dutta
Sriram Ganapathy
224
14
0
20 Jan 2025
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Computer Vision and Pattern Recognition (CVPR), 2025
Miran Heo
Min-Hung Chen
De-An Huang
Sifei Liu
Subhashree Radhakrishnan
Seon Joo Kim
Yu-Chun Wang
Ryo Hachiuma
ObjD, VLM
468
6
0
14 Jan 2025
The Quest for Visual Understanding: A Journey Through the Evolution of Visual Question Answering
Anupam Pandey
Deepjyoti Bodo
Arpan Phukan
Asif Ekbal
359
2
0
13 Jan 2025
MTPareto: A MultiModal Targeted Pareto Framework for Fake News Detection
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Kaiying Yan
Moyang Liu
Yukun Liu
Ruibo Fu
Zhengqi Wen
Jianhua Tao
Zhengqi Wen
Guanjun Li
225
1
0
12 Jan 2025
AllSpark: A Multimodal Spatio-Temporal General Intelligence Model with Ten Modalities via Language as a Reference Framework
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2023
Run Shao
Cheng Yang
Qiujun Li
Qing Zhu
Yongjun Zhang
...
Yu Liu
Yong Tang
Dapeng Liu
Shizhong Yang
Haifeng Li
392
0
0
08 Jan 2025
Multimodal Multihop Source Retrieval for Web Question Answering
Navya Yarrabelly
Saloni Mittal
100
0
0
07 Jan 2025
Language and Planning in Robotic Navigation: A Multilingual Evaluation of State-of-the-Art Models
Malak Mansour
Ahmed Aly
Bahey Tharwat
Sarim Hashmi
Dong An
Ian Reid
LM&Ro, ELM, LRM
279
3
0
07 Jan 2025
Foundations of GenIR
Jiaxin Mao
Jingtao Zhan
Wenshu Fan
214
0
0
06 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Jiayi Zhang
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
410
32
0
06 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Neural Information Processing Systems (NeurIPS), 2024
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM, VLM, LRM
671
113
0
03 Jan 2025
Beyond Text: Implementing Multimodal Large Language Model-Powered Multi-Agent Systems Using a No-Code Platform
Journal of Intelligence and Information Systems (JIIS), 2025
Cheonsu Jeong
399
11
0
01 Jan 2025
SAFE-MEME: Structured Reasoning Framework for Robust Hate Speech Detection in Memes
Palash Nandi
Shivam Sharma
Tanmoy Chakraborty
170
4
0
31 Dec 2024
Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models
Yue Zhang
Ziqiao Ma
Jialu Li
Yanyuan Qiao
Zun Wang
J. Chai
Qi Wu
Joey Tianyi Zhou
Parisa Kordjamshidi
LRM
335
57
0
31 Dec 2024
Towards Visual Grounding: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
763
26
0
28 Dec 2024
Improving Generated and Retrieved Knowledge Combination Through Zero-shot Generation
Xinkai Du
Quanjie Han
Chao Lv
Yi Liu
Yalin Sun
Hao Shu
Hongbo Shan
Maosong Sun
RALM
333
2
0
25 Dec 2024
Multi-Agents Based on Large Language Models for Knowledge-based Visual Question Answering
Zhongjian Hu
Peng Yang
Bing Li
Zhenqi Wang
205
2
0
24 Dec 2024
Rationale-guided Prompting for Knowledge-based Visual Question Answering
Zhongjian Hu
Peng Yang
Bing Li
Fengyuan Liu
LRM
355
79
0
22 Dec 2024
Bringing Multimodality to Amazon Visual Search System
Knowledge Discovery and Data Mining (KDD), 2024
Xinliang Zhu
Michael Huang
Han Ding
Jinyu Yang
Kelvin Chen
...
Son Dinh Tran
Benjamin Z. Yao
Doug Gray
Anuj Bindal
Arnab Dhua
229
7
0
17 Dec 2024
BioBridge: Unified Bio-Embedding with Bridging Modality in Code-Switched EMR
IEEE Access, 2024
Jangyeong Jeon
Sangyeon Cho
Dongjoon Lee
Changhee Lee
Junyeong Kim
171
0
0
16 Dec 2024
ViSymRe: Vision-guided Multimodal Symbolic Regression
Da Li
Junping Yin
Jin Xu
Xinxin Li
Juan Zhang
289
1
0
15 Dec 2024
Rebalanced Vision-Language Retrieval Considering Structure-Aware Distillation
IEEE Transactions on Image Processing (TIP), 2024
Yang Yang
Wenjuan Xi
Luping Zhou
Jinhui Tang
281
5
0
14 Dec 2024
Advances in Transformers for Robotic Applications: A Review
Nikunj Sanghai
Nik Bear Brown
AI4CE
323
4
0
13 Dec 2024
Unified Framework for Open-World Compositional Zero-shot Learning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Hirunima Jayasekara
Khoi Pham
Nirat Saini
Abhinav Shrivastava
230
1
0
05 Dec 2024
AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?
Shouwei Ruan
Hanqin Liu
Yao Huang
Xiaoqi Wang
Caixin Kang
Hang Su
Yinpeng Dong
Xingxing Wei
VGen
538
1
0
04 Dec 2024
Data Uncertainty-Aware Learning for Multimodal Aspect-based Sentiment Analysis
Hao Yang
Zhenyu Zhang
Yanyan Zhao
Bing Qin
196
0
0
02 Dec 2024
Eyes on the Road: State-of-the-Art Video Question Answering Models Assessment for Traffic Monitoring Tasks
Joseph Raj Vishal
Divesh Basina
Aarya Choudhary
Bharatesh Chakravarthi
320
3
0
02 Dec 2024
AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment
Computer Vision and Pattern Recognition (CVPR), 2024
Yan Li
Yifei Xing
X. Lan
Xuzhao Li
Haifeng Chen
Shihong Deng
Mamba
229
16
0
01 Dec 2024
MIMIC: Multimodal Islamophobic Meme Identification and Classification
Safrin Sanzida Islam
Sahid Hossain Mustakim
Sadia Ahmmed
Md. Faiyaz Abdullah Sayeedi
Swapnil Khandoker
Syed Tasdid Azam Dhrubo
Nahid Md Lokman Hossain
182
1
0
01 Dec 2024
Planning from Imagination: Episodic Simulation and Episodic Memory for Vision-and-Language Navigation
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yiyuan Pan
Yunzhe Xu
Yanfeng Guo
Hesheng Wang
LM&Ro
356
6
0
30 Nov 2024
Approximate Fiber Product: A Preliminary Algebraic-Geometric Perspective on Multimodal Embedding Alignment
Dongfang Zhao
131
1
0
30 Nov 2024
LMSeg: Unleashing the Power of Large-Scale Models for Open-Vocabulary Semantic Segmentation
Huadong Tang
Youpeng Zhao
Y. Huang
Min Xu
Jun Wang
Qiang Wu
MLLM, VLM
233
1
0
30 Nov 2024
SentiXRL: An advanced large language Model Framework for Multilingual Fine-Grained Emotion Classification in Complex Text Environment
Jie Wang
Yichen Wang
Zhilin Zhang
Jianhao Zeng
Kaidi Wang
Zhiyang Chen
277
1
0
27 Nov 2024
Cross-Modal Pre-Aligned Method with Global and Local Information for Remote-Sensing Image and Text Retrieval
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2024
Zengbao Sun
Ming Zhao
Gaorui Liu
Andre Kaup
219
11
0
22 Nov 2024
A Comprehensive Survey on Visual Question Answering Datasets and Algorithms
Raihan Kabir
Naznin Haque
Md. Saiful Islam
Marium-E. Jannat
CoGe
229
7
0
17 Nov 2024
CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation
Dengke Zhang
Fagui Liu
Quan Tang
VLM
546
2
0
15 Nov 2024
AD-DINO: Attention-Dynamic DINO for Distance-Aware Embodied Reference Understanding
Hao Guo
Wei Fan
Baichun Wei
Jianfei Zhu
Jin Tian
Chunzhi Yi
Feng Jiang
217
0
0
13 Nov 2024
Prompt-enhanced Network for Hateful Meme Classification
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Junxi Liu
Yanyan Feng
Jiehai Chen
Yun Xue
Fenghuan Li
VLM
280
3
0
12 Nov 2024
Renaissance: Investigating the Pretraining of Vision-Language Encoders
Clayton Fields
C. Kennington
VLM
117
1
0
11 Nov 2024
MEANT: Multimodal Encoder for Antecedent Information
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Benjamin Iyoya Irving
Annika Marie Schoene
AIFin
132
0
1
10 Nov 2024
ViTOC: Vision Transformer and Object-aware Captioner
Feiyang Huang
315
2
0
09 Nov 2024
Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding
Neural Information Processing Systems (NeurIPS), 2024
Jaeyoo Park
Jin Young Choi
Jeonghyung Park
Bohyung Han
VLM
87
7
0
08 Nov 2024
Can Multimodal Large Language Model Think Analogically?
Diandian Guo
Cong Cao
Fangfang Yuan
Dakui Wang
Wei Ma
Yanbing Liu
Jianhui Fu
LRM
217
1
0
02 Nov 2024
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
Nam V. Nguyen
Thong T. Doan
Luong Tran
Van Nguyen
Quang Pham
MoE
516
4
0
01 Nov 2024
IO Transformer: Evaluating SwinV2-Based Reward Models for Computer Vision
Maxwell Meyer
Jack Spruyt
ViT
94
0
0
31 Oct 2024
An Information Criterion for Controlled Disentanglement of Multimodal Data
International Conference on Learning Representations (ICLR), 2024
Chenyu Wang
Sharut Gupta
Xinyi Zhang
Sana Tonekaboni
Stefanie Jegelka
Tommi Jaakkola
Caroline Uhler
DRL
324
6
0
31 Oct 2024
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
Bo Jiang
Shaoyu Chen
Bencheng Liao
Xingyu Zhang
Wei Yin
Qian Zhang
Chang Huang
Wen Liu
Xinyu Wang
VLM, MLLM, LRM
201
67
0
29 Oct 2024
Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Donghoon Kim
Gusang Lee
Kyuhong Shim
B. Shim
242
5
0
29 Oct 2024
Improving Generalization in Visual Reasoning via Self-Ensemble
Tien-Huy Nguyen
Quang-Khai Tran
Anh-Tuan Quang-Hoang
VLM, LRM
222
7
0
28 Oct 2024
R-LLaVA: Improving Med-VQA Understanding through Visual Region of Interest
Xupeng Chen
Zhixin Lai
Kangrui Ruan
Shichu Chen
Jiaxiang Liu
Zuozhu Liu
541
13
0
27 Oct 2024
Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques
Applied Soft Computing (Appl. Soft Comput.), 2024
David Ortiz-Perez
Manuel Benavent-Lledo
José García Rodríguez
David Tomás
M. Flores Vizcaya-Moreno
187
3
0
24 Oct 2024