ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1906.05963
  4. Cited By
Image Captioning: Transforming Objects into Words

Image Captioning: Transforming Objects into Words

14 June 2019
Simão Herdade
Armin Kappeler
K. Boakye
Joao Soares
    ViT
ArXivPDFHTML

Papers citing "Image Captioning: Transforming Objects into Words"

50 / 161 papers shown
Title
LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning
LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning
Joy Lim Jia Yin
Daniel Zhang-Li
Jifan Yu
H. Li
Shangqing Tu
...
Zhiyuan Liu
Huiqin Liu
Lei Hou
Juanzi Li
Bin Xu
24
0
0
04 May 2025
Semantic-Spatial Feature Fusion with Dynamic Graph Refinement for Remote Sensing Image Captioning
Semantic-Spatial Feature Fusion with Dynamic Graph Refinement for Remote Sensing Image Captioning
Maofu Liu
Jiahui Liu
Xiaokang Zhang
37
0
0
30 Mar 2025
AC-Lite : A Lightweight Image Captioning Model for Low-Resource Assamese Language
AC-Lite : A Lightweight Image Captioning Model for Low-Resource Assamese Language
Pankaj Choudhury
Yogesh Aggarwal
Prabhanjan Jadhav
Prithwijit Guha
Sukumar Nandi
77
0
0
03 Mar 2025
Evaluating Hallucination in Large Vision-Language Models based on Context-Aware Object Similarities
Shounak Datta
Dhanasekar Sundararaman
39
1
0
28 Jan 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM
VLM
45
0
0
03 Jan 2025
CEGI: Measuring the trade-off between efficiency and carbon emissions
  for SLMs and VLMs
CEGI: Measuring the trade-off between efficiency and carbon emissions for SLMs and VLMs
Abhas Kumar
Kapil Pathak
Rajesh Kavuru
Prabhakar Srinivasan
65
0
0
03 Dec 2024
A Monte Carlo Framework for Calibrated Uncertainty Estimation in
  Sequence Prediction
A Monte Carlo Framework for Calibrated Uncertainty Estimation in Sequence Prediction
Qidong Yang
Weicheng Zhu
Joseph Keslin
L. Zanna
Tim G. J. Rudner
Carlos Fernandez-Granda
BDL
UQCV
AI4TS
44
0
0
30 Oct 2024
Pixels to Prose: Understanding the art of Image Captioning
Pixels to Prose: Understanding the art of Image Captioning
Hrishikesh Singh
Aarti Sharma
Millie Pant
3DV
VLM
25
0
0
28 Aug 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
27
0
0
09 Aug 2024
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger
  Visual Cues
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
33
6
0
29 Jul 2024
Continual Panoptic Perception: Towards Multi-modal Incremental
  Interpretation of Remote Sensing Images
Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images
Bo Yuan
Danpei Zhao
Zhuoran Liu
Wentao Li
Tian Li
CLL
VLM
28
2
0
19 Jul 2024
MC-MKE: A Fine-Grained Multimodal Knowledge Editing Benchmark
  Emphasizing Modality Consistency
MC-MKE: A Fine-Grained Multimodal Knowledge Editing Benchmark Emphasizing Modality Consistency
Junzhe Zhang
Huixuan Zhang
Xunjian Yin
Baizhou Huang
Xu Zhang
Xinyu Hu
Xiaojun Wan
29
7
0
19 Jun 2024
M3T: Multi-Modal Medical Transformer to bridge Clinical Context with
  Visual Insights for Retinal Image Medical Description Generation
M3T: Multi-Modal Medical Transformer to bridge Clinical Context with Visual Insights for Retinal Image Medical Description Generation
Nagur Shareef Shaik
T. Cherukuri
Dong Hye Ye
MedIm
32
0
0
19 Jun 2024
Image Captioning via Dynamic Path Customization
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Yiyi Zhou
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
32
0
0
01 Jun 2024
Towards Retrieval-Augmented Architectures for Image Captioning
Towards Retrieval-Augmented Architectures for Image Captioning
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Alessandro Nicolosi
Rita Cucchiara
VLM
19
9
0
21 May 2024
New Benchmark Dataset and Fine-Grained Cross-Modal Fusion Framework for
  Vietnamese Multimodal Aspect-Category Sentiment Analysis
New Benchmark Dataset and Fine-Grained Cross-Modal Fusion Framework for Vietnamese Multimodal Aspect-Category Sentiment Analysis
Quy Hoang Nguyen
Minh-Van Truong Nguyen
Kiet Van Nguyen
24
2
0
01 May 2024
Exploring the Distinctiveness and Fidelity of the Descriptions Generated
  by Large Vision-Language Models
Exploring the Distinctiveness and Fidelity of the Descriptions Generated by Large Vision-Language Models
Yuhang Huang
Zihan Wu
Chongyang Gao
Jiawei Peng
Xu Yang
24
2
0
26 Apr 2024
Enhancing Visual Question Answering through Question-Driven Image
  Captions as Prompts
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts
Övgü Özdemir
Erdem Akagündüz
36
10
0
12 Apr 2024
Text Data-Centric Image Captioning with Interactive Prompts
Text Data-Centric Image Captioning with Interactive Prompts
Yiyu Wang
Hao Luo
Jungang Xu
Yingfei Sun
Fan Wang
VLM
30
0
0
28 Mar 2024
Semi-Supervised Image Captioning Considering Wasserstein Graph Matching
Semi-Supervised Image Captioning Considering Wasserstein Graph Matching
Yang Yang
36
0
0
26 Mar 2024
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing
  Objects in 3D Scenes
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes
Ting Yu
Xiaojun Lin
Shuhui Wang
Weiguo Sheng
Qingming Huang
Jun-chen Yu
3DV
46
10
0
12 Mar 2024
Polos: Multimodal Metric Learning from Human Feedback for Image
  Captioning
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada
Kanta Kaneda
Daichi Saito
Komei Sugiura
34
24
0
28 Feb 2024
ChartAssisstant: A Universal Chart Multimodal Language Model via
  Chart-to-Table Pre-training and Multitask Instruction Tuning
ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning
Fanqing Meng
Wenqi Shao
Quanfeng Lu
Peng Gao
Kaipeng Zhang
Yu Qiao
Ping Luo
27
45
0
04 Jan 2024
Jack of All Tasks, Master of Many: Designing General-purpose
  Coarse-to-Fine Vision-Language Model
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick
Guangxing Han
Rui Hou
Sayan Nag
Ser-Nam Lim
Nicolas Ballas
Qifan Wang
Rama Chellappa
Amjad Almahairi
VLM
MLLM
38
29
0
19 Dec 2023
Improving Image Captioning via Predicting Structured Concepts
Improving Image Captioning via Predicting Structured Concepts
Ting Wang
Weidong Chen
Yuanhe Tian
Yan Song
Zhendong Mao
24
8
0
14 Nov 2023
Neural Network Methods for Radiation Detectors and Imaging
Neural Network Methods for Radiation Detectors and Imaging
S. Lin
S. Ning
H. Zhu
T. Zhou
C. L. Morris
S. Clayton
M. Cherukara
R. T. Chen
Z. Wang
AI4CE
24
5
0
09 Nov 2023
JaSPICE: Automatic Evaluation Metric Using Predicate-Argument Structures
  for Image Captioning Models
JaSPICE: Automatic Evaluation Metric Using Predicate-Argument Structures for Image Captioning Models
Yuiga Wada
Kanta Kaneda
Komei Sugiura
23
4
0
07 Nov 2023
Knowledge Editing for Large Language Models: A Survey
Knowledge Editing for Large Language Models: A Survey
Song Wang
Yaochen Zhu
Haochen Liu
Zaiyi Zheng
Chen Chen
Jundong Li
KELM
66
133
0
24 Oct 2023
Can We Edit Multimodal Large Language Models?
Can We Edit Multimodal Large Language Models?
Siyuan Cheng
Bo Tian
Qingbin Liu
Xi Chen
Yongheng Wang
Huajun Chen
Ningyu Zhang
MLLM
28
28
0
12 Oct 2023
Targeted Image Data Augmentation Increases Basic Skills Captioning
  Robustness
Targeted Image Data Augmentation Increases Basic Skills Captioning Robustness
Valentin Barriere
Felipe del Rio
Andres Carvallo De Ferari
Carlos Aspillaga
Eugenio Herrera-Berg
Cristian Buc Calderon
DiffM
22
0
0
27 Sep 2023
PoseFix: Correcting 3D Human Poses with Natural Language
PoseFix: Correcting 3D Human Poses with Natural Language
Ginger Delmas
Philippe Weinzaepfel
Francesc Moreno-Noguer
Grégory Rogez
22
22
0
15 Sep 2023
With a Little Help from your own Past: Prototypical Memory Networks for
  Image Captioning
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
Manuele Barraco
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
53
19
0
23 Aug 2023
A Comprehensive Analysis of Real-World Image Captioning and Scene
  Identification
A Comprehensive Analysis of Real-World Image Captioning and Scene Identification
Sai Suprabhanu Nallapaneni
Subrahmanyam Konakanchi
30
2
0
05 Aug 2023
Reverse Stable Diffusion: What prompt was used to generate this image?
Reverse Stable Diffusion: What prompt was used to generate this image?
Florinel-Alin Croitoru
Vlad Hondru
Radu Tudor Ionescu
M. Shah
VLM
DiffM
34
6
0
02 Aug 2023
Enhancing image captioning with depth information using a
  Transformer-based framework
Enhancing image captioning with depth information using a Transformer-based framework
Aya Mahmoud Ahmed
Mohamed Yousef
K. Hussain
Yousef B. Mahdy
ViT
17
4
0
24 Jul 2023
Embedded Heterogeneous Attention Transformer for Cross-lingual Image
  Captioning
Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning
Zijie Song
Zhenzhen Hu
Yuanen Zhou
Ye Zhao
Richang Hong
Meng Wang
19
2
0
19 Jul 2023
Self-Supervised Image Captioning with CLIP
Self-Supervised Image Captioning with CLIP
Chuanyang Jin
VLM
SSL
21
2
0
26 Jun 2023
KiUT: Knowledge-injected U-Transformer for Radiology Report Generation
KiUT: Knowledge-injected U-Transformer for Radiology Report Generation
Zhongzhen Huang
Xiaofan Zhang
Shaoting Zhang
MedIm
17
51
0
20 Jun 2023
Top-Down Framework for Weakly-supervised Grounded Image Captioning
Top-Down Framework for Weakly-supervised Grounded Image Captioning
Chen Cai
Suchen Wang
Kim-Hui Yap
Yi Wang
ObjD
16
3
0
13 Jun 2023
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
Chia-Wen Kuo
Z. Kira
27
21
0
25 May 2023
UniChart: A Universal Vision-language Pretrained Model for Chart
  Comprehension and Reasoning
UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning
Ahmed Masry
P. Kavehzadeh
Do Xuan Long
Enamul Hoque
Shafiq R. Joty
LRM
19
100
0
24 May 2023
A request for clarity over the End of Sequence token in the
  Self-Critical Sequence Training
A request for clarity over the End of Sequence token in the Self-Critical Sequence Training
J. Hu
Roberto Cavicchioli
Alessandro Capotondi
24
6
0
20 May 2023
UIT-OpenViIC: A Novel Benchmark for Evaluating Image Captioning in
  Vietnamese
UIT-OpenViIC: A Novel Benchmark for Evaluating Image Captioning in Vietnamese
Doanh C. Bui
Nghia Hieu Nguyen
Khang Phuoc-Quy Nguyen
VLM
9
3
0
07 May 2023
Transforming Visual Scene Graphs to Image Captions
Transforming Visual Scene Graphs to Image Captions
Xu Yang
Jiawei Peng
Zihua Wang
Haiyang Xu
Qinghao Ye
Chenliang Li
Mingshi Yan
Feisi Huang
Zhangzikang Li
Yu Zhang
39
19
0
03 May 2023
Relational Context Learning for Human-Object Interaction Detection
Relational Context Learning for Human-Object Interaction Detection
Sanghyun Kim
Deunsol Jung
Minsu Cho
19
36
0
11 Apr 2023
Positive-Augmented Contrastive Learning for Image and Video Captioning
  Evaluation
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
Sara Sarto
Manuele Barraco
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
13
55
0
21 Mar 2023
A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to
  GPT-5 All You Need?
A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?
Chaoning Zhang
Chenshuang Zhang
Sheng Zheng
Yu Qiao
Chenghao Li
...
Lik-Hang Lee
Yang Yang
Heng Tao Shen
In So Kweon
Choong Seon Hong
75
159
0
21 Mar 2023
GNNFormer: A Graph-based Framework for Cytopathology Report Generation
GNNFormer: A Graph-based Framework for Cytopathology Report Generation
Yangqiaoyu Zhou
Kai-Lang Yao
Wusuo Li
MedIm
11
1
0
17 Mar 2023
Graph Neural Networks in Vision-Language Image Understanding: A Survey
Graph Neural Networks in Vision-Language Image Understanding: A Survey
Henry Senior
Greg Slabaugh
Shanxin Yuan
Luca Rossi
GNN
23
13
0
07 Mar 2023
Retrieval-augmented Image Captioning
Retrieval-augmented Image Captioning
R. Ramos
Desmond Elliott
Bruno Martins
VLM
22
29
0
16 Feb 2023
1234
Next