ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2112.05230
  4. Cited By
Injecting Semantic Concepts into End-to-End Image Captioning

Injecting Semantic Concepts into End-to-End Image Captioning

9 December 2021
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lin Liang
Zhe Gan
Lijuan Wang
Yezhou Yang
Zicheng Liu
    ViT
    VLM
ArXivPDFHTML

Papers citing "Injecting Semantic Concepts into End-to-End Image Captioning"

46 / 46 papers shown
Title
SuperCap: Multi-resolution Superpixel-based Image Captioning
Henry Senior
Luca Rossi
Gregory Slabaugh
Shanxin Yuan
VLM
63
0
0
11 Mar 2025
Image Embedding Sampling Method for Diverse Captioning
Image Embedding Sampling Method for Diverse Captioning
Sania Waheed
Na Min An
52
0
0
14 Feb 2025
VIKSER: Visual Knowledge-Driven Self-Reinforcing Reasoning Framework
VIKSER: Visual Knowledge-Driven Self-Reinforcing Reasoning Framework
Chunbai Zhang
Chao Wang
Yang Zhou
Yan Peng
LRM
ReLM
51
0
0
02 Feb 2025
Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile
  Vision-Language Model
Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Model
Qianhan Feng
Wenshuo Li
Tong Lin
Xinghao Chen
VLM
67
0
0
02 Dec 2024
ViTOC: Vision Transformer and Object-aware Captioner
ViTOC: Vision Transformer and Object-aware Captioner
Feiyang Huang
18
0
0
09 Nov 2024
EVC-MF: End-to-end Video Captioning Network with Multi-scale Features
EVC-MF: End-to-end Video Captioning Network with Multi-scale Features
Tian-Zi Niu
Zhen-Duo Chen
Xin Luo
Xin-Shun Xu
22
0
0
22 Oct 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
24
0
0
09 Aug 2024
Stealthy Targeted Backdoor Attacks against Image Captioning
Stealthy Targeted Backdoor Attacks against Image Captioning
Wenshu Fan
Hongwei Li
Wenbo Jiang
Meng Hao
Shui Yu
Xiao Zhang
DiffM
14
6
0
09 Jun 2024
LaDiC: Are Diffusion Models Really Inferior to Autoregressive
  Counterparts for Image-to-Text Generation?
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?
Yuchi Wang
Shuhuai Ren
Rundong Gao
Linli Yao
Qingyan Guo
Kaikai An
Jianhong Bai
Xu Sun
DiffM
VLM
36
6
0
16 Apr 2024
MeaCap: Memory-Augmented Zero-shot Image Captioning
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng
Yan Xie
Hao Zhang
Chiyu Chen
Zhengjue Wang
Boli Chen
VLM
18
13
0
06 Mar 2024
Set Prediction Guided by Semantic Concepts for Diverse Video Captioning
Set Prediction Guided by Semantic Concepts for Diverse Video Captioning
Yifan Lu
Ziqi Zhang
Chunfen Yuan
Peng Li
Yan Wang
Bing Li
Weiming Hu
16
3
0
25 Dec 2023
E-ViLM: Efficient Video-Language Model via Masked Video Modeling with
  Semantic Vector-Quantized Tokenizer
E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer
Jacob Zhiyuan Fang
Skyler Zheng
Vasu Sharma
Robinson Piramuthu
VLM
30
0
0
28 Nov 2023
Improving Image Captioning via Predicting Structured Concepts
Improving Image Captioning via Predicting Structured Concepts
Ting Wang
Weidong Chen
Yuanhe Tian
Yan Song
Zhendong Mao
10
5
0
14 Nov 2023
NEUCORE: Neural Concept Reasoning for Composed Image Retrieval
NEUCORE: Neural Concept Reasoning for Composed Image Retrieval
Shu Zhao
Huijuan Xu
17
6
0
02 Oct 2023
Artificial-Spiking Hierarchical Networks for Vision-Language
  Representation Learning
Artificial-Spiking Hierarchical Networks for Vision-Language Representation Learning
Ye-Ting Chen
Siyu Zhang
Yaoru Sun
Weijian Liang
Haoran Wang
26
0
0
18 Aug 2023
Top-Down Framework for Weakly-supervised Grounded Image Captioning
Top-Down Framework for Weakly-supervised Grounded Image Captioning
Chen Cai
Suchen Wang
Kim-Hui Yap
Yi Wang
ObjD
14
3
0
13 Jun 2023
Recent Advances of Local Mechanisms in Computer Vision: A Survey and
  Outlook of Recent Work
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Qiangchang Wang
Yilong Yin
18
0
0
02 Jun 2023
Image Captioning with Multi-Context Synthetic Data
Image Captioning with Multi-Context Synthetic Data
Feipeng Ma
Y. Zhou
Fengyun Rao
Yueyi Zhang
Xiaoyan Sun
DiffM
22
7
0
29 May 2023
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
Chia-Wen Kuo
Z. Kira
25
21
0
25 May 2023
DiffCap: Exploring Continuous Diffusion on Image Captioning
DiffCap: Exploring Continuous Diffusion on Image Captioning
Yufeng He
Zefan Cai
Xu Gan
Baobao Chang
DiffM
14
5
0
20 May 2023
Caption Anything: Interactive Image Description with Diverse Multimodal
  Controls
Caption Anything: Interactive Image Description with Diverse Multimodal Controls
Teng Wang
Jinrui Zhang
Junjie Fei
Hao Zheng
Yunlong Tang
Zhe Li
Mingqi Gao
Shanshan Zhao
MLLM
102
81
0
04 May 2023
Transforming Visual Scene Graphs to Image Captions
Transforming Visual Scene Graphs to Image Captions
Xu Yang
Jiawei Peng
Zihua Wang
Haiyang Xu
Qinghao Ye
Chenliang Li
Mingshi Yan
Feisi Huang
Zhangzikang Li
Yu Zhang
37
18
0
03 May 2023
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology
  Report Generation
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation
Yaowei Li
Bang-ju Yang
Xuxin Cheng
Zhihong Zhu
Hongxiang Li
Yuexian Zou
11
31
0
28 Mar 2023
Tag2Text: Guiding Vision-Language Model via Image Tagging
Tag2Text: Guiding Vision-Language Model via Image Tagging
Xinyu Huang
Youcai Zhang
Jinyu Ma
Weiwei Tian
Rui Feng
Yuejie Zhang
Yaqian Li
Yandong Guo
Lei Zhang
CLIP
MLLM
VLM
3DV
59
73
0
10 Mar 2023
ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based
  Polishing
ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing
Zequn Zeng
Hao Zhang
Zhengjue Wang
Ruiying Lu
Dongsheng Wang
Bo Chen
BDL
DiffM
11
32
0
04 Mar 2023
Enhancing Classification with Hierarchical Scalable Query on Fusion
  Transformer
Enhancing Classification with Hierarchical Scalable Query on Fusion Transformer
S. K. Sahoo
Sathish Chalasani
Abhishek Joshi
K. N. Iyer
17
2
0
28 Feb 2023
IC3: Image Captioning by Committee Consensus
IC3: Image Captioning by Committee Consensus
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
John F. Canny
19
17
0
02 Feb 2023
Adaptively Clustering Neighbor Elements for Image-Text Generation
Adaptively Clustering Neighbor Elements for Image-Text Generation
Zihua Wang
Xu Yang
Hanwang Zhang
Haiyang Xu
Mingshi Yan
Feisi Huang
Yu Zhang
VLM
61
0
0
05 Jan 2023
Efficient Image Captioning for Edge Devices
Efficient Image Captioning for Edge Devices
Ning Wang
Jiangrong Xie
Hangzai Luo
Qinglin Cheng
Jihao Wu
Mingbo Jia
Linlin Li
VLM
CLIP
12
20
0
18 Dec 2022
Semantic-Conditional Diffusion Networks for Image Captioning
Semantic-Conditional Diffusion Networks for Image Captioning
Jianjie Luo
Yehao Li
Yingwei Pan
Ting Yao
Jianlin Feng
Hongyang Chao
Tao Mei
DiffM
19
61
0
06 Dec 2022
Controllable Image Captioning via Prompting
Controllable Image Captioning via Prompting
Ning Wang
Jiahao Xie
Jihao Wu
Mingbo Jia
Linlin Li
11
23
0
04 Dec 2022
Exploring Discrete Diffusion Models for Image Captioning
Exploring Discrete Diffusion Models for Image Captioning
Zixin Zhu
Yixuan Wei
Jianfeng Wang
Zhe Gan
Zheng-Wei Zhang
Le Wang
G. Hua
Lijuan Wang
Zicheng Liu
Han Hu
DiffM
VLM
21
16
0
21 Nov 2022
Progressive Tree-Structured Prototype Network for End-to-End Image
  Captioning
Progressive Tree-Structured Prototype Network for End-to-End Image Captioning
Pengpeng Zeng
Jinkuan Zhu
Jingkuan Song
Lianli Gao
VLM
12
27
0
17 Nov 2022
OSIC: A New One-Stage Image Captioner Coined
OSIC: A New One-Stage Image Captioner Coined
Bo Wang
Zhao Zhang
Ming Zhao
Xiaojie Jin
Mingliang Xu
Meng Wang
VLM
12
4
0
04 Nov 2022
Rethinking Surgical Captioning: End-to-End Window-Based MLP Transformer
  Using Patches
Rethinking Surgical Captioning: End-to-End Window-Based MLP Transformer Using Patches
Mengya Xu
Mobarakol Islam
Hongliang Ren
MedIm
22
11
0
30 Jun 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Zi-Yi Dou
Aishwarya Kamath
Zhe Gan
Pengchuan Zhang
Jianfeng Wang
...
Ce Liu
Yann LeCun
Nanyun Peng
Jianfeng Gao
Lijuan Wang
VLM
ObjD
6
106
0
15 Jun 2022
GIT: A Generative Image-to-text Transformer for Vision and Language
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
VLM
18
524
0
27 May 2022
Deep Learning Approaches on Image Captioning: A Review
Deep Learning Approaches on Image Captioning: A Review
Taraneh Ghandi
H. Pourreza
H. Mahyar
VLM
8
46
0
31 Jan 2022
A Frustratingly Simple Approach for End-to-End Image Captioning
A Frustratingly Simple Approach for End-to-End Image Captioning
Ziyang Luo
Yadong Xi
Rongsheng Zhang
Jing Ma
VLM
MLLM
17
16
0
30 Jan 2022
KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object
  Knowledge Distillation
KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation
Yongfei Liu
Chenfei Wu
Shao-Yen Tseng
Vasudev Lal
Xuming He
Nan Duan
CLIP
VLM
39
28
0
22 Sep 2021
Playing Lottery Tickets with Vision and Language
Playing Lottery Tickets with Vision and Language
Zhe Gan
Yen-Chun Chen
Linjie Li
Tianlong Chen
Yu Cheng
Shuohang Wang
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
101
53
0
23 Apr 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize
  Long-Tail Visual Concepts
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
273
845
0
17 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Mohit Bansal
MLLM
249
518
0
04 Feb 2021
Unified Vision-Language Pre-Training for Image Captioning and VQA
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
250
922
0
24 Sep 2019
Weakly Supervised Attention Learning for Textual Phrases Grounding
Weakly Supervised Attention Learning for Textual Phrases Grounding
Zhiyuan Fang
Shu Kong
Tianshu Yu
Yezhou Yang
11
12
0
01 May 2018
Range Loss for Deep Face Recognition with Long-tail
Range Loss for Deep Face Recognition with Long-tail
Xiao Zhang
Zhiyuan Fang
Yandong Wen
Zhifeng Li
Yu Qiao
CVBM
229
443
0
28 Nov 2016
1