ResearchTrend.AI
arXiv: 1908.02265
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
    SSL
    VLM

Papers citing "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"

50 / 2,088 papers shown
AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?
Shouwei Ruan
Hanqin Liu
Yao Huang
Xiaoqi Wang
Caixin Kang
Hang Su
Yinpeng Dong
Xingxing Wei
VGen
88
0
0
04 Dec 2024
Data Uncertainty-Aware Learning for Multimodal Aspect-based Sentiment Analysis
Hao-Yu Yang
Zhenyu Zhang
Yanyan Zhao
Bing Qin
69
0
0
02 Dec 2024
Eyes on the Road: State-of-the-Art Video Question Answering Models Assessment for Traffic Monitoring Tasks
Joseph Raj Vishal
Divesh Basina
Aarya Choudhary
Bharatesh Chakravarthi
64
1
0
02 Dec 2024
AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment
Yan Li
Yifei Xing
X. Lan
X. Li
Haifeng Chen
D. Jiang
Mamba
71
0
0
01 Dec 2024
MIMIC: Multimodal Islamophobic Meme Identification and Classification
Safrin Sanzida Islam
Sahid Hossain Mustakim
Sadia Ahmmed
Md. Faiyaz Abdullah Sayeedi
Swapnil Khandoker
Syed Tasdid Azam Dhrubo
Nahid Md Lokman Hossain
64
0
0
01 Dec 2024
Planning from Imagination: Episodic Simulation and Episodic Memory for Vision-and-Language Navigation
Yiyuan Pan
Yunzhe Xu
Zhe Liu
Hesheng Wang
LM&Ro
73
0
0
30 Nov 2024
Approximate Fiber Product: A Preliminary Algebraic-Geometric Perspective on Multimodal Embedding Alignment
Dongfang Zhao
64
0
0
30 Nov 2024
LMSeg: Unleashing the Power of Large-Scale Models for Open-Vocabulary Semantic Segmentation
Huadong Tang
Youpeng Zhao
Y. Huang
Min Xu
Jun Wang
Qiang Wu
MLLM
VLM
78
0
0
30 Nov 2024
SentiXRL: An advanced large language Model Framework for Multilingual Fine-Grained Emotion Classification in Complex Text Environment
Jie Wang
Yichen Wang
Zhilin Zhang
Jianhao Zeng
Kaidi Wang
Zhiyang Chen
62
0
0
27 Nov 2024
Cross-Modal Pre-Aligned Method with Global and Local Information for Remote-Sensing Image and Text Retrieval
Zengbao Sun
Ming Zhao
Gaorui Liu
Andre Kaup
88
3
0
22 Nov 2024
A Comprehensive Survey on Visual Question Answering Datasets and Algorithms
Raihan Kabir
Naznin Haque
Md. Saiful Islam
Marium-E. Jannat
CoGe
29
1
0
17 Nov 2024
CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation
Dengke Zhang
Fagui Liu
Quan Tang
VLM
47
1
0
15 Nov 2024
AD-DINO: Attention-Dynamic DINO for Distance-Aware Embodied Reference Understanding
Hao Guo
Wei Fan
Baichun Wei
Jianfei Zhu
Jin Tian
Chunzhi Yi
Feng Jiang
34
0
0
13 Nov 2024
Prompt-enhanced Network for Hateful Meme Classification
Junxi Liu
Yanyan Feng
Jiehai Chen
Yun Xue
Fenghuan Li
VLM
53
0
0
12 Nov 2024
Renaissance: Investigating the Pretraining of Vision-Language Encoders
Clayton Fields
C. Kennington
VLM
19
0
0
11 Nov 2024
MEANT: Multimodal Encoder for Antecedent Information
Benjamin Iyoya Irving
Annika Marie Schoene
AIFin
19
0
0
10 Nov 2024
ViTOC: Vision Transformer and Object-aware Captioner
Feiyang Huang
25
0
0
09 Nov 2024
Can Multimodal Large Language Model Think Analogically?
Diandian Guo
Cong Cao
Fangfang Yuan
Dakui Wang
Wei Ma
Yanbing Liu
Jianhui Fu
LRM
26
0
0
02 Nov 2024
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
Nam V. Nguyen
Thong T. Doan
Luong Tran
Van Nguyen
Quang Pham
MoE
59
1
0
01 Nov 2024
IO Transformer: Evaluating SwinV2-Based Reward Models for Computer Vision
Maxwell Meyer
Jack Spruyt
ViT
21
0
0
31 Oct 2024
An Information Criterion for Controlled Disentanglement of Multimodal Data
Chenyu Wang
Sharut Gupta
Xinyi Zhang
Sana Tonekaboni
Stefanie Jegelka
Tommi Jaakkola
Caroline Uhler
DRL
32
1
0
31 Oct 2024
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
Bo Jiang
Shaoyu Chen
Bencheng Liao
Xingyu Zhang
Wei Yin
Qian Zhang
Chang Huang
W. Liu
X. Wang
VLM
MLLM
LRM
35
12
0
29 Oct 2024
Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models
Donghoon Kim
Gusang Lee
Kyuhong Shim
B. Shim
46
1
0
29 Oct 2024
Improving Generalization in Visual Reasoning via Self-Ensemble
Tien-Huy Nguyen
Quang-Khai Tran
Anh-Tuan Quang-Hoang
VLM
LRM
47
5
0
28 Oct 2024
R-LLaVA: Improving Med-VQA Understanding through Visual Region of Interest
Xupeng Chen
Zhixin Lai
Kangrui Ruan
Shichu Chen
Jiaxiang Liu
Zuozhu Liu
33
1
0
27 Oct 2024
Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques
David Ortiz-Perez
Manuel Benavent-Lledo
José García Rodríguez
David Tomás
M. Flores Vizcaya-Moreno
28
0
0
24 Oct 2024
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning
Zhiwei Hao
Jianyuan Guo
Li Shen
Yong Luo
Han Hu
Yonggang Wen
VLM
21
0
0
23 Oct 2024
ViConsFormer: Constituting Meaningful Phrases of Scene Texts using Transformer-based Method in Vietnamese Text-based Visual Question Answering
Nghia Hieu Nguyen
Tho Thanh Quan
Ngan Luu-Thuy Nguyen
26
0
0
18 Oct 2024
Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment
Chenhang Cui
An Zhang
Yiyang Zhou
Zhaorun Chen
Gelei Deng
Huaxiu Yao
Tat-Seng Chua
63
4
0
18 Oct 2024
VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks
Shailaja Keyur Sampat
Mutsumi Nakamura
Shankar Kailas
Kartik Aggarwal
Mandy Zhou
Yezhou Yang
Chitta Baral
MLLM
CoGe
ReLM
VLM
LRM
29
0
0
17 Oct 2024
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
Zhiyuan Ma
Jianjun Li
Guohui Li
Kaiyan Huang
VLM
52
9
0
16 Oct 2024
OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities
L. Chen
Hexiang Hu
Mingda Zhang
Y. Chen
Zifeng Wang
...
Pranav Shyam
Tianyi Zhou
Heng-Chiao Huang
Ming Yang
Boqing Gong
26
2
0
16 Oct 2024
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
Jian Yang
Dacheng Yin
Yizhou Zhou
Fengyun Rao
Wei-dong Zhai
Yang Cao
Zheng-jun Zha
DiffM
28
2
0
14 Oct 2024
X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing
Xinyan Chen
Jianfei Yang
28
1
0
14 Oct 2024
Leveraging Customer Feedback for Multi-modal Insight Extraction
Sandeep Sricharan Mukku
Abinesh Kanagarajan
Pushpendu Ghosh
Chetan Aggarwal
27
0
0
13 Oct 2024
nach0-pc: Multi-task Language Model with Molecular Point Cloud Encoder
Maksim Kuznetsov
Airat Valiev
Alex Aliper
Daniil Polykovskiy
E. Tutubalina
Rim Shayakhmetov
Z. Miftahutdinov
20
0
0
11 Oct 2024
A Social Context-aware Graph-based Multimodal Attentive Learning Framework for Disaster Content Classification during Emergencies
Shahid Shafi Dar
Mohammad Zia Ur Rehman
Karan Bais
Mohammed Abdul Haseeb
Nagendra Kumara
22
10
0
11 Oct 2024
Exploring Foundation Models in Remote Sensing Image Change Detection: A Comprehensive Survey
Zihan Yu
Tianxiao Li
Yuxin Zhu
Rongze Pan
33
0
0
10 Oct 2024
Multimodal Clickbait Detection by De-confounding Biases Using Causal Representation Inference
Jianxing Yu
Shiqi Wang
Han Yin
Zhenlong Sun
Ruobing Xie
Bo Zhang
Yanghui Rao
CML
30
0
0
10 Oct 2024
FLIER: Few-shot Language Image Models Embedded with Latent Representations
Zhinuo Zhou
Peng Zhou
Xiaoyong Pan
VLM
26
0
0
10 Oct 2024
CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection
Guankun Wang
Han Xiao
Huxin Gao
Renrui Zhang
Long Bai
Xiaoxiao Yang
Zhen Li
Hongsheng Li
Hongliang Ren
31
4
0
10 Oct 2024
Structured Spatial Reasoning with Open Vocabulary Object Detectors
Negar Nejatishahidin
Madhukar Reddy Vongala
Jana Kosecka
35
2
0
09 Oct 2024
Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models
Zeman Li
Xinwei Zhang
Peilin Zhong
Yuan Deng
Meisam Razaviyayn
Vahab Mirrokni
15
2
0
09 Oct 2024
DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models
Sungnyun Kim
Haofu Liao
Srikar Appalaraju
Peng Tang
Zhuowen Tu
R. Satzoda
R. Manmatha
Vijay Mahadevan
Stefano Soatto
34
0
0
04 Oct 2024
Multi-modal clothing recommendation model based on large model and VAE enhancement
Bingjie Huang
Qingyi Lu
Shuaishuai Huang
Xue-she Wang
Haowei Yang
29
3
0
03 Oct 2024
Advancing Medical Radiograph Representation Learning: A Hybrid Pre-training Paradigm with Multilevel Semantic Granularity
Hanqi Jiang
Xixuan Hao
Yuzhou Huang
Chong Ma
Jiaxun Zhang
Yi Pan
Ruimao Zhang
MedIm
30
0
0
01 Oct 2024
Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels
Heeseong Shin
Chaehyun Kim
Sunghwan Hong
Seokju Cho
Anurag Arnab
Paul Hongsuck Seo
Seungryong Kim
VLM
34
1
0
30 Sep 2024
DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning
Kazuki Matsuda
Yuiga Wada
Komei Sugiura
21
1
0
28 Sep 2024
Improving Agent Behaviors with RL Fine-tuning for Autonomous Driving
Zhenghao Peng
Wenjie Luo
Yiren Lu
Tianyi Shen
Cole Gulino
Ari Seff
Justin Fu
26
6
0
26 Sep 2024
A Multimodal Single-Branch Embedding Network for Recommendation in Cold-Start and Missing Modality Scenarios
Christian Ganhor
Marta Moscati
Anna Hausberger
Shah Nawaz
Markus Schedl
26
2
0
26 Sep 2024