Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2010.06775
Cited By
Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision
14 October 2020
Hao Tan
Mohit Bansal
CLIP
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision"
18 / 18 papers shown
Title
VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs?
Mohamed Gado
Towhid Taliee
Muhammad Memon
D. Ignatov
Radu Timofte
70
0
0
27 Apr 2025
AudioBERT: Audio Knowledge Augmented Language Model
Hyunjong Ok
Suho Yoo
Jaeho Lee
AuLLM
RALM
VLM
42
0
0
17 Jan 2025
Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding
Zilin Du
Haoxin Li
Jianfei Yu
Boyang Li
117
0
0
01 Dec 2024
Grounding Spatial Relations in Text-Only Language Models
Gorka Azkune
Ander Salaberria
Eneko Agirre
34
0
0
20 Mar 2024
Detecting Concrete Visual Tokens for Multimodal Machine Translation
Braeden Bowen
Vipin Vijayan
Scott Grigsby
Timothy Anderson
Jeremy Gwinnup
26
2
0
05 Mar 2024
WinoViz: Probing Visual Properties of Objects Under Different States
Woojeong Jin
Tejas Srinivasan
Jesse Thomason
Xiang Ren
28
1
0
21 Feb 2024
Training on Synthetic Data Beats Real Data in Multimodal Relation Extraction
Zilin Du
Haoxin Li
Xu Guo
Boyang Li
25
1
0
05 Dec 2023
Movie Box Office Prediction With Self-Supervised and Visually Grounded Pretraining
Qin Chao
Eunsoo Kim
Boyang Albert Li
11
1
0
20 Apr 2023
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
Zineng Tang
Jaemin Cho
Jie Lei
Mohit Bansal
VLM
16
9
0
21 Nov 2022
How direct is the link between words and images?
Hassan Shahmohammadi
Maria Heitmeier
Elnaz Shafaei-Bajestan
Hendrik P. A. Lensch
Harald Baayen
24
0
0
30 Jun 2022
VALHALLA: Visual Hallucination for Machine Translation
Yi Li
Rameswar Panda
Yoon Kim
Chun-Fu Chen
Rogerio Feris
David D. Cox
Nuno Vasconcelos
MLLM
36
38
0
31 May 2022
UNIMO-2: End-to-End Unified Vision-Language Grounded Learning
Wei Li
Can Gao
Guocheng Niu
Xinyan Xiao
Hao Liu
Jiachen Liu
Hua-Hong Wu
Haifeng Wang
MLLM
11
21
0
17 Mar 2022
Productivity, Portability, Performance: Data-Centric Python
Yiheng Wang
Yao Zhang
Yanzhang Wang
Yan Wan
Jiao Wang
Zhongyuan Wu
Yuhao Yang
Bowen She
43
94
0
01 Jul 2021
Good for Misconceived Reasons: An Empirical Revisiting on the Need for Visual Context in Multimodal Machine Translation
Zhiyong Wu
Lingpeng Kong
W. Bi
Xiang Li
B. Kao
LRM
15
76
0
30 May 2021
Deep Learning and the Global Workspace Theory
R. V. Rullen
Ryota Kanai
29
65
0
04 Dec 2020
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
250
926
0
24 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,943
0
20 Apr 2018
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Z. Tu
Kaiming He
268
10,196
0
16 Nov 2016
1