1705.00823
STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset
2 May 2017
Yuya Yoshikawa
Yutaro Shigeto
A. Takeuchi
3DV
Papers citing
"STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset"
50 / 65 papers shown
Multilingual Vision-Language Models, A Survey
Andrei-Alexandru Manea
Jindřich Libovický
VLM
107
1
0
26 Sep 2025
Pre-Trained CNN Architecture for Transformer-Based Image Caption Generation Model
Amanuel Tafese Dufera
ViT
VLM
60
0
0
22 Sep 2025
Competition and Attraction Improve Model Fusion
Annual Conference on Genetic and Evolutionary Computation (GECCO), 2025
João Abrantes
Robert Tjarko Lange
Yujin Tang
MoMe
183
0
0
22 Aug 2025
MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs
Zaid Alyafeai
Maged S. Al-Shaibani
Bernard Ghanem
233
2
0
26 May 2025
A Multimodal Recaptioning Framework to Account for Perceptual Diversity Across Languages in Vision-Language Modeling
Kyle Buettner
Jacob Emmerson
Adriana Kovashka
119
0
0
19 Apr 2025
A Benchmark for Multi-Lingual Vision-Language Learning in Remote Sensing Image Captioning
Qing Zhou
Tao Yang
Junyu Gao
W. Ni
Junzheng Wu
Qi Wang
208
2
0
06 Mar 2025
No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 Languages
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Youssef Mohamed
Runjia Li
Ibrahim Said Ahmad
Kilichbek Haydarov
Juil Sock
Kenneth Church
Mohamed Elhoseiny
VLM
163
15
0
06 Nov 2024
Quantifying the Gaps Between Translation and Native Perception in Training for Multimodal, Multilingual Retrieval
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Kyle Buettner
Adriana Kovashka
VLM
223
6
0
02 Oct 2024
Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval
ACM Multimedia (MM), 2024
Yabing Wang
Le Wang
Qiang-feng Zhou
Zhibin Wang
Hao Li
Gang Hua
Wei Tang
202
20
0
30 Sep 2024
FoodMLLM-JP: Leveraging Multimodal Large Language Models for Japanese Recipe Generation
Conference on Multimedia Modeling (MMM), 2024
Yuki Imajuku
Yoko Yamakata
Kiyoharu Aizawa
173
3
0
27 Sep 2024
Cross-Lingual and Cross-Cultural Variation in Image Descriptions
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Uri Berger
Edoardo M. Ponti
277
4
0
25 Sep 2024
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
Yuxuan Wang
Yijun Liu
Fei Yu
Chen Huang
Kexin Li
Zhiguo Wan
Wanxiang Che
VLM
CoGe
129
7
0
01 Jul 2024
Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning
Zhijie Nie
Richong Zhang
Zhangchi Feng
Hailang Huang
Xudong Liu
166
5
0
26 Jun 2024
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
Matthieu Futeral
A. Zebaze
Pedro Ortiz Suarez
Julien Abadji
Rémi Lacroix
Cordelia Schmid
Rachel Bawden
Benoît Sagot
384
6
0
13 Jun 2024
Image captioning in different languages
Emiel van Miltenburg
VLM
397
0
0
31 May 2024
Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models
Jesse Atuhurra
Iqra Ali
Tatsuya Hiraoka
Hidetaka Kamigaito
Tomoya Iwakura
Taro Watanabe
178
1
0
29 Mar 2024
A Gaze-grounded Visual Question Answering Dataset for Clarifying Ambiguous Japanese Questions
Shun Inadumi
Seiya Kawano
Akishige Yuguchi
Yasutomo Kawanishi
Koichiro Yoshino
151
4
0
26 Mar 2024
KTVIC: A Vietnamese Image Captioning Dataset on the Life Domain
Anh-Cuong Pham
Van-Quang Nguyen
Thi-Hong Vuong
Quang-Thuy Ha
213
2
0
16 Jan 2024
CL2CM: Improving Cross-Lingual Cross-Modal Retrieval via Cross-Lingual Knowledge Transfer
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yabing Wang
Fan Wang
Jianfeng Dong
Hao Luo
VLM
149
19
0
14 Dec 2023
JaSPICE: Automatic Evaluation Metric Using Predicate-Argument Structures for Image Captioning Models
Yuiga Wada
Kanta Kaneda
Komei Sugiura
184
5
0
07 Nov 2023
NLLB-CLIP -- train performant multilingual image retrieval model on a budget
Alexander Visheratin
VLM
316
24
0
04 Sep 2023
Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations
Gregor Geigle
Radu Timofte
Goran Glavaš
VLM
MLLM
120
6
0
14 Jun 2023
Document Understanding Dataset and Evaluation (DUDE)
IEEE International Conference on Computer Vision (ICCV), 2023
Jordy Van Landeghem
Rubèn Pérez Tito
Łukasz Borchmann
Michal Pietruszka
Pawel Józiak
...
Bertrand Ackaert
Ernest Valveny
Matthew Blaschko
Sien Moens
Tomasz Stanislawek
VGen
236
107
0
15 May 2023
Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation
Zhiwei Zhang
Yuliang Liu
MLLM
287
0
0
10 Mar 2023
A Large-Scale Multilingual Study of Visual Constraints on Linguistic Selection of Descriptions
Findings (Findings), 2023
Uri Berger
Lea Frermann
Gabriel Stanovsky
Omri Abend
117
3
0
09 Feb 2023
Universal Multimodal Representation for Language Understanding
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Zhuosheng Zhang
Kehai Chen
Rui Wang
Masao Utiyama
Eiichiro Sumita
Z. Li
Hai Zhao
SSL
254
29
0
09 Jan 2023
X²-VLM: All-In-One Pre-trained Model For Vision-Language Tasks
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Yan Zeng
Xinsong Zhang
Hang Li
Jiawei Wang
Jipeng Zhang
Wangchunshu Zhou
VLM
MLLM
179
24
0
22 Nov 2022
GLAMI-1M: A Multilingual Image-Text Fashion Dataset
British Machine Vision Conference (BMVC), 2022
Vaclav Kosar
A. Hoskovec
Milan Šulc
Radek Bartyzal
VLM
136
5
0
17 Nov 2022
Multilingual Multimodality: A Taxonomical Survey of Datasets, Techniques, Challenges and Opportunities
Khyathi Chandu
A. Geramifard
177
3
0
30 Oct 2022
MaXM: Towards Multilingual Visual Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Soravit Changpinyo
Linting Xue
Michal Yarom
Ashish V. Thapliyal
Idan Szpektor
J. Amelot
Xi Chen
Radu Soricut
205
8
0
12 Sep 2022
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yan Zeng
Wangchunshu Zhou
Ao Luo
Ziming Cheng
Xinsong Zhang
VLM
241
37
0
01 Jun 2022
Generalizing Multimodal Pre-training into Multilingual via Language Acquisition
Liang Zhang
Anwen Hu
Qin Jin
VLM
129
6
0
29 May 2022
Recent Advances in Neural Text Generation: A Task-Agnostic Survey
Chen Tang
Frank Guerin
Chenghua Lin
AI4CE
OOD
320
20
0
06 Mar 2022
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages
International Conference on Machine Learning (ICML), 2022
Emanuele Bugliarello
Fangyu Liu
Jonas Pfeiffer
Siva Reddy
Desmond Elliott
Edoardo Ponti
Ivan Vulić
MLLM
VLM
ELM
318
69
0
27 Jan 2022
Visually Grounded Reasoning across Languages and Cultures
Fangyu Liu
Emanuele Bugliarello
Edoardo Ponti
Siva Reddy
Nigel Collier
Desmond Elliott
VLM
LRM
401
199
0
28 Sep 2021
Towards Zero-shot Cross-lingual Image Retrieval and Tagging
Pranav Aggarwal
Ritiz Tambi
Ajinkya Kale
VLM
237
7
0
15 Sep 2021
Bornon: Bengali Image Captioning with Transformer-based Deep learning approach
Faisal Muhammad Shah
Mayeesha Humaira
Md Abidur Rahman Khan Jim
Amit Saha Ami
Shimul Paul
95
20
0
11 Sep 2021
MURAL: Multimodal, Multitask Retrieval Across Languages
Aashi Jain
Mandy Guo
Krishna Srinivasan
Ting-Li Chen
Sneha Kudugunta
Chao Jia
Yinfei Yang
Jason Baldridge
VLM
257
57
0
10 Sep 2021
Egocentric Image Captioning for Privacy-Preserved Passive Dietary Intake Monitoring
IEEE Transactions on Cybernetics (IEEE Trans. Cybern.), 2021
Jianing Qiu
Frank P.-W. Lo
Xiao Gu
M. Jobarteh
Wenyan Jia
...
M. McCrory
Edward Sazonov
Mingui Sun
Gary Frost
Benny Lo
EgoV
118
20
0
01 Jul 2021
Grounding 'Grounding' in NLP
Findings (Findings), 2021
Khyathi Chandu
Yonatan Bisk
A. Black
141
57
0
04 Jun 2021
Backretrieval: An Image-Pivoted Evaluation Metric for Cross-Lingual Text Representations Without Parallel Corpora
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2021
Mikhail Fain
Niall Twomey
Danushka Bollegala
120
2
0
11 May 2021
UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
Computer Vision and Pattern Recognition (CVPR), 2021
Mingyang Zhou
Luowei Zhou
Shuohang Wang
Yu Cheng
Linjie Li
Zhou Yu
Jingjing Liu
MLLM
VLM
197
102
0
01 Apr 2021
LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Siqi Sun
Yen-Chun Chen
Linjie Li
Shuohang Wang
Yuwei Fang
Jingjing Liu
VLM
171
89
0
16 Mar 2021
Improved Bengali Image Captioning via deep convolutional neural network based encoder-decoder model
Mohammad Faiyaz Khan
S. M. S. Shifath
Md. Saiful Islam
VLM
106
23
0
14 Feb 2021
MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish
Machine Translation (MT), 2020
Begum Citamak
Ozan Caglayan
Menekse Kuyu
Erkut Erdem
Aykut Erdem
Pranava Madhyastha
Lucia Specia
170
9
0
13 Dec 2020
Towards Zero-shot Cross-lingual Image Retrieval
Pranav Aggarwal
Ajinkya Kale
VLM
212
29
0
24 Nov 2020
Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale
International Conference on Computational Linguistics (COLING), 2020
Ozan Caglayan
Pranava Madhyastha
Lucia Specia
ELM
172
39
0
26 Oct 2020
A Corpus for English-Japanese Multimodal Neural Machine Translation with Comparable Sentences
Andrew C. Merritt
Chenhui Chu
Yuki Arase
114
6
0
17 Oct 2020
Denoising Large-Scale Image Captioning from Alt-text Data using Content Selection Models
International Conference on Computational Linguistics (COLING), 2020
Khyathi Chandu
Piyush Sharma
Soravit Changpinyo
Ashish V. Thapliyal
Radu Soricut
DiffM
VLM
201
3
0
10 Sep 2020
M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training
Minheng Ni
Haoyang Huang
Lin Su
Edward Cui
Taroon Bharti
Lijuan Wang
Jianfeng Gao
Dongdong Zhang
Nan Duan
244
7
0
04 Jun 2020