arXiv:2303.12513

Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding

Computer Vision and Pattern Recognition (CVPR), 2023
21 March 2023
Morris Alper
Michael Fiman
Hadar Averbuch-Elor
VLM, LRM
Links: arXiv (abs), PDF, HTML, GitHub

Papers citing "Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding"

16 / 16 papers shown
AuditoryBench++: Can Language Models Understand Auditory Knowledge without Hearing?
Hyunjong Ok
Suho Yoo
Hyeonjun Kim
Jaeho Lee
AuLLM, RALM, LRM
22 Sep 2025

Imagine to Hear: Auditory Knowledge Generation can be an Effective Assistant for Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Suho Yoo
Hyunjong Ok
Jaeho Lee
AuLLM, RALM
21 Mar 2025

AudioBERT: Audio Knowledge Augmented Language Model
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Hyunjong Ok
Suho Yoo
Jaeho Lee
AuLLM, RALM, VLM
17 Jan 2025

VLM's Eye Examination: Instruct and Inspect Visual Competency of Vision Language Models
Nam Hyeon-Woo
Moon Ye-Bin
Wonseok Choi
Lee Hyun
Tae-Hyun Oh
CoGe
23 Sep 2024

Improving the Efficiency of Visually Augmented Language Models
International Conference on Computational Linguistics (COLING), 2024
Paula Ontalvilla
Aitor Ormazabal
Gorka Azkune
VLM
17 Sep 2024

What does Kiki look like? Cross-modal associations between speech sounds and visual shapes in vision-and-language models
Tessa Verhoef
Kiana Shahrasbi
Tom Kouwenhoven
VLM
25 Jul 2024

Emergent Visual-Semantic Hierarchies in Image-Text Representations
Morris Alper
Hadar Averbuch-Elor
VLM
11 Jul 2024

SUGARCREPE++ Dataset: Vision-Language Model Sensitivity to Semantic and Lexical Alterations
Sri Harsha Dumpala
Aman Jaiswal
Chandramouli Shama Sastry
E. Milios
Sageev Oore
Hassan Sajjad
CoGe
17 Jun 2024

A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns
Asaf Yehudai
Taelin Karidi
Gabriel Stanovsky
Ariel Goldstein
Omri Abend
23 May 2024

VISLA Benchmark: Evaluating Embedding Sensitivity to Semantic and Lexical Alterations
Sri Harsha Dumpala
Aman Jaiswal
Chandramouli Shama Sastry
E. Milios
Sageev Oore
Hassan Sajjad
VLM, CoGe
25 Apr 2024

Template-Based Probes Are Imperfect Lenses for Counterfactual Bias Evaluation in LLMs
Farnaz Kohankhaki
David B. Emerson
Laleh Seyyed-Kalantari
Faiza Khan Khattak
04 Apr 2024

VCD: A Dataset for Visual Commonsense Discovery in Images
Xiangqing Shen
Yurun Song
Siwei Wu
Rui Xia
27 Feb 2024

Mitigating Open-Vocabulary Caption Hallucinations
Assaf Ben-Kish
Moran Yanuka
Morris Alper
Raja Giryes
Hadar Averbuch-Elor
MLLM, VLM
06 Dec 2023

Kiki or Bouba? Sound Symbolism in Vision-and-Language Models
Neural Information Processing Systems (NeurIPS), 2023
Morris Alper
Hadar Averbuch-Elor
25 Oct 2023

A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
Noriyuki Kojima
Hadar Averbuch-Elor
Yoav Artzi
06 Sep 2023

Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining?
Haiwei Yang
Liang Ding
Jun Rao
Ye Liu
Li Shen
Changxing Ding
24 Aug 2023