1805.08661
Cited By
COCO-CN for Cross-Lingual Image Tagging, Captioning and Retrieval
22 May 2018
Xirong Li
Chaoxi Xu
Xiaoxu Wang
Weiyu Lan
Zhengxiong Jia
Gang Yang
Jieping Xu
Papers citing
"COCO-CN for Cross-Lingual Image Tagging, Captioning and Retrieval"
23 / 23 papers shown
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
98
4
0
12 Feb 2025
Unsupervised Translation of Emergent Communication
Ido Levy
Orr Paradise
Boaz Carmeli
Ron Meir
S. Goldwasser
Yonatan Belinkov
120
0
0
11 Feb 2025
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
82
25
0
04 Oct 2024
MATE: Meet At The Embedding -- Connecting Images with Long Texts
Young Kyun Jang
Junmo Kang
Yong Jae Lee
Donghyun Kim
VLM
36
5
0
26 Jun 2024
Image captioning in different languages
Emiel van Miltenburg
VLM
37
0
0
31 May 2024
Semantic and Expressive Variation in Image Captions Across Languages
Andre Ye
Sebastin Santy
Jena D. Hwang
Amy X. Zhang
Ranjay Krishna
VLM
48
3
0
22 Oct 2023
Improving Emotional Expression and Cohesion in Image-Based Playlist Description and Music Topics: A Continuous Parameterization Approach
Yuelyu Ji
Yuheng Song
Wei Wang
Ruoyi Xu
Zhongqian Xie
Huiyun Liu
DiffM
30
1
0
02 Oct 2023
GLAMI-1M: A Multilingual Image-Text Fashion Dataset
Vaclav Kosar
A. Hoskovec
Milan Šulc
Radek Bartyzal
VLM
26
3
0
17 Nov 2022
ERNIE-UniX2: A Unified Cross-lingual Cross-modal Framework for Understanding and Generation
Bin Shan
Yaqian Han
Weichong Yin
Shuohuan Wang
Yu Sun
Hao Tian
Hua-Hong Wu
Haifeng Wang
MLLM
VLM
11
7
0
09 Nov 2022
ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training
Bin Shan
Weichong Yin
Yu Sun
Hao Tian
Hua-Hong Wu
Haifeng Wang
VLM
22
19
0
30 Sep 2022
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
Jiaxi Gu
Xiaojun Meng
Guansong Lu
Lu Hou
Minzhe Niu
...
Runhu Huang
Wei Zhang
Xingda Jiang
Chunjing Xu
Hang Xu
VLM
35
87
0
14 Feb 2022
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages
Emanuele Bugliarello
Fangyu Liu
Jonas Pfeiffer
Siva Reddy
Desmond Elliott
E. Ponti
Ivan Vulić
MLLM
VLM
ELM
40
62
0
27 Jan 2022
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation
Han Zhang
Weichong Yin
Yewei Fang
Lanxin Li
Boqiang Duan
Zhihua Wu
Yu Sun
Hao Tian
Hua-Hong Wu
Haifeng Wang
27
58
0
31 Dec 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
55
254
0
14 Jul 2021
GEM: A General Evaluation Benchmark for Multimodal Tasks
Lin Su
Nan Duan
Edward Cui
Lei Ji
Chenfei Wu
Huaishao Luo
Yongfei Liu
Ming Zhong
Taroon Bharti
Arun Sacheti
VLM
19
19
0
18 Jun 2021
FootApp: an AI-Powered System for Football Match Annotation
Silvio Barra
S. Carta
Alessandro Giuliani
Alessia Pisu
Alessandro Sebastian Podda
Daniele Riboni
14
4
0
04 Mar 2021
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
Krishna Srinivasan
K. Raman
Jiecao Chen
Michael Bendersky
Marc Najork
VLM
197
310
0
02 Mar 2021
Answer Questions with Right Image Regions: A Visual Attention Regularization Approach
Y. Liu
Yangyang Guo
Jianhua Yin
Xuemeng Song
Weifeng Liu
Liqiang Nie
24
28
0
03 Feb 2021
Gaussian Smoothen Semantic Features (GSSF) -- Exploring the Linguistic Aspects of Visual Captioning in Indian Languages (Bengali) Using MSCOCO Framework
C. Sur
11
7
0
16 Feb 2020
MRRC: Multiple Role Representation Crossover Interpretation for Image Captioning With R-CNN Feature Distribution Composition (FDC)
C. Sur
23
16
0
15 Feb 2020
Learn to Segment Retinal Lesions and Beyond
Qijie Wei
Xirong Li
Weihong Yu
Xiao Zhang
Yongpeng Zhang
...
Bin Mo
Di Gong
N. Chen
Dayong Ding
You-xin Chen
MedIm
8
47
0
25 Dec 2019
MULE: Multimodal Universal Language Embedding
Donghyun Kim
Kuniaki Saito
Kate Saenko
Stan Sclaroff
Bryan A. Plummer
VLM
24
39
0
08 Sep 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Eric Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-fang Wang
William Yang Wang
15
540
0
06 Apr 2019