2105.03236
Cited By
Towards Accurate Text-based Image Captioning with Content Diversity Exploration
Computer Vision and Pattern Recognition (CVPR), 2021
23 April 2021
Guanghui Xu
Shuaicheng Niu
Zhuliang Yu
Yucheng Luo
Qing Du
Qi Wu
DiffM
Papers citing
"Towards Accurate Text-based Image Captioning with Content Diversity Exploration"
22 / 22 papers shown
Title
MSAM: Multi-Semantic Adaptive Mining for Cross-Modal Drone Video-Text Retrieval
J. Huang
Yaxiong Chen
Ganchao Liu
92
0
0
17 Oct 2025
TSalV360: A Method and Dataset for Text-driven Saliency Detection in 360-Degrees Videos
Ioannis Kontostathis
Evlampios Apostolidis
Vasileios Mezaris
3DGS
84
0
0
30 Sep 2025
LLaMA-XR: A Novel Framework for Radiology Report Generation using LLaMA and QLoRA Fine Tuning
Md. Zihad Bin Jahangir
Muhammad Ashad Kabir
Sumaiya Akter
Israt Jahan
Minh Chau
LM&MA
101
0
0
29 May 2025
Image Embedding Sampling Method for Diverse Captioning
Sania Waheed
Na Min An
251
0
0
14 Feb 2025
CoVLM: Leveraging Consensus from Vision-Language Models for Semi-supervised Multi-modal Fake News Detection
Asian Conference on Computer Vision (ACCV), 2024
Devank
Jayateja Kalla
Soma Biswas
150
5
0
06 Oct 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
340
0
0
09 Aug 2024
Ultrasound Report Generation with Cross-Modality Feature Alignment via Unsupervised Guidance
Jun Li
Tongkun Su
Baoliang Zhao
Faqin Lv
Qiong Wang
Nassir Navab
Yin Hu
Zhongliang Jiang
MedIm
187
17
0
02 Jun 2024
Semi-Supervised Image Captioning Considering Wasserstein Graph Matching
Yang Yang
260
0
0
26 Mar 2024
Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting
Zhuliang Yu
Guohao Chen
Jiaxiang Wu
Yifan Zhang
Yaofo Chen
Peilin Zhao
Shuaicheng Niu
TTA
OOD
307
9
0
18 Mar 2024
DEVICE: Depth and Visual Concepts Aware Transformer for OCR-based Image Captioning
Pattern Recognition (Pattern Recogn.), 2023
Dongsheng Xu
Qingbao Huang
Shuang Feng
Yiru Cai
Feng Shuang
Yi Cai
ViT
VLM
402
1
0
03 Feb 2023
Towards Models that Can See and Read
IEEE International Conference on Computer Vision (ICCV), 2023
Roy Ganz
Oren Nuriel
Aviad Aberdam
Yair Kittenplon
Shai Mazor
Ron Litman
261
16
0
18 Jan 2023
Improving Radiology Summarization with Radiograph and Anatomy Prompts
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Jinpeng Hu
Zhihong Chen
Yang Liu
Xiang Wan
Tsung-Hui Chang
MedIm
148
10
0
15 Oct 2022
GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement
ACM Multimedia (ACM MM), 2022
Zhi-Qi Cheng
Qianwen Dai
Siyao Li
Teruko Mitamura
Alexander G. Hauptmann
169
47
0
18 Aug 2022
A Self-Guided Framework for Radiology Report Generation
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2022
Jun Li
Shibo Li
Ying Hu
Huiren Tao
MedIm
164
31
0
19 Jun 2022
Multimodal Learning with Transformers: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Peng Xu
Xiatian Zhu
David Clifton
ViT
475
814
0
13 Jun 2022
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
VLM
582
696
0
27 May 2022
Efficient Test-Time Model Adaptation without Forgetting
International Conference on Machine Learning (ICML), 2022
Shuaicheng Niu
Jiaxiang Wu
Yifan Zhang
Yaofo Chen
S. Zheng
P. Zhao
Zhuliang Yu
OOD
VLM
TTA
305
476
0
06 Apr 2022
Towards End-to-End Unified Scene Text Detection and Layout Analysis
Computer Vision and Pattern Recognition (CVPR), 2022
Shangbang Long
Siyang Qin
Dmitry Panteleev
Alessandro Bissacco
Yasuhisa Fujii
Michalis Raptis
256
112
0
28 Mar 2022
MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-based Image Captioning
Wenqiao Zhang
Haochen Shi
Jiannan Guo
Shengyu Zhang
Qingpeng Cai
Juncheng Li
Sihui Luo
Yueting Zhuang
DiffM
223
48
0
13 Dec 2021
Exploiting Cross-Modal Prediction and Relation Consistency for Semi-Supervised Image Captioning
IEEE Transactions on Cybernetics (IEEE Trans. Cybern.), 2021
Yang Yang
Haoran Wei
Hengshu Zhu
Dianhai Yu
Hui Xiong
Jian Yang
SSL
85
42
0
22 Oct 2021
Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning
Ali Furkan Biten
L. G. I. Bigorda
Dimosthenis Karatzas
322
80
0
04 Oct 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
379
342
0
14 Jul 2021