Image Captioning with Semantic Attention


12 March 2016
Quanzeng You
Hailin Jin
Zhaowen Wang
Chen Fang
Jiebo Luo
    VLM

Papers citing "Image Captioning with Semantic Attention"

50 / 191 papers shown
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
42
0
0
03 Apr 2025
ChatBEV: A Visual Language Model that Understands BEV Maps
Qingyao Xu
S. Chen
Guang Chen
Yanfeng Wang
Y. Zhang
51
0
0
18 Mar 2025
Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding
Xin Gu
Yaojie Shen
Chenxi Luo
Tiejian Luo
Yan Huang
Yuewei Lin
Heng Fan
L. Zhang
63
1
0
16 Feb 2025
An Ensemble Model with Attention Based Mechanism for Image Captioning
Israa Al Badarneh
Bassam Hammo
Omar Al-Kadi
45
3
0
28 Jan 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM
VLM
48
0
0
03 Jan 2025
Progress-Aware Video Frame Captioning
Zihui Xue
Joungbin An
Xitong Yang
Kristen Grauman
100
1
0
03 Dec 2024
Visual Grounding with Attention-Driven Constraint Balancing
Weitai Kang
Luowei Zhou
Junyi Wu
Changchang Sun
Yan Yan
35
4
0
03 Jul 2024
Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags
Daiqing Qi
Handong Zhao
Zijun Wei
Sheng Li
42
2
0
16 Jun 2024
Understanding attention-based encoder-decoder networks: a case study with chess scoresheet recognition
Sergio Y. Hayashi
N. Hirata
48
0
0
23 Apr 2024
Semi-Supervised Image Captioning Considering Wasserstein Graph Matching
Yang Yang
36
0
0
26 Mar 2024
4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling
Sherwin Bahmani
Ivan Skorokhodov
Victor Rong
Gordon Wetzstein
Leonidas J. Guibas
Peter Wonka
Sergey Tulyakov
Jeong Joon Park
Andrea Tagliasacchi
David B. Lindell
DiffM
46
103
0
29 Nov 2023
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning
Junke Wang
Lingchen Meng
Zejia Weng
Bo He
Zuxuan Wu
Yu-Gang Jiang
MLLM
VLM
27
94
0
13 Nov 2023
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
Manuele Barraco
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
55
19
0
23 Aug 2023
Asynchronous Evolution of Deep Neural Network Architectures
J. Liang
H. Shahrzad
Risto Miikkulainen
23
0
0
08 Aug 2023
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning
Junjie Fei
Teng Wang
Jinrui Zhang
Zhenyu He
Chengjie Wang
Feng Zheng
VLM
28
34
0
31 Jul 2023
GEST: the Graph of Events in Space and Time as a Common Representation between Vision and Language
Mihai Masala
Nicolae Cudlenco
Traian Rebedea
Marius Leordeanu
14
0
0
22 May 2023
Generation-Guided Multi-Level Unified Network for Video Grounding
Xingyi Cheng
Xiangyu Wu
Dong Shen
Hezheng Lin
Fan Yang
19
0
0
14 Mar 2023
On The Coherence of Quantitative Evaluation of Visual Explanations
Benjamin Vandersmissen
José Oramas
XAI
FAtt
26
3
0
14 Feb 2023
Stacked Cross-modal Feature Consolidation Attention Networks for Image Captioning
Mozhgan Pourkeshavarz
Shahabedin Nabavi
Mohsen Moghaddam
M. Shamsfard
31
4
0
08 Feb 2023
Training Integer-Only Deep Recurrent Neural Networks
V. Nia
Eyyub Sari
Vanessa Courville
M. Asgharian
MQ
45
2
0
22 Dec 2022
Backdoor Attack Detection in Computer Vision by Applying Matrix Factorization on the Weights of Deep Networks
Khondoker Murad Hossain
Tim Oates
AAML
23
4
0
15 Dec 2022
How to Describe Images in a More Funny Way? Towards a Modular Approach to Cross-Modal Sarcasm Generation
Jie Ruan
Yue Wu
Xiaojun Wan
Yuesheng Zhu
29
1
0
20 Nov 2022
Progressive Tree-Structured Prototype Network for End-to-End Image Captioning
Pengpeng Zeng
Jinkuan Zhu
Jingkuan Song
Lianli Gao
VLM
22
27
0
17 Nov 2022
Prophet Attention: Predicting Attention with Future Attention for Image Captioning
Fenglin Liu
Xuancheng Ren
Xian Wu
Wei Fan
Yuexian Zou
Xu Sun
24
46
0
19 Oct 2022
Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning
Xu Yang
Hanwang Zhang
Chongyang Gao
Jianfei Cai
MLLM
37
10
0
04 Oct 2022
Show, Interpret and Tell: Entity-aware Contextualised Image Captioning in Wikipedia
K. Nguyen
Ali Furkan Biten
Andrés Mafla
Lluís Gómez
Dimosthenis Karatzas
28
10
0
21 Sep 2022
Belief Revision based Caption Re-ranker with Visual Semantic Information
Ahmed Sabir
Francesc Moreno-Noguer
Pranava Madhyastha
Lluís Padró
BDL
14
2
0
16 Sep 2022
M^4I: Multi-modal Models Membership Inference
Pingyi Hu
Zihan Wang
Ruoxi Sun
Hu Wang
Minhui Xue
39
26
0
15 Sep 2022
Disentangle and Remerge: Interventional Knowledge Distillation for Few-Shot Object Detection from A Conditional Causal Perspective
Jiangmeng Li
Yanan Zhang
Wenwen Qiang
Lingyu Si
Chengbo Jiao
Xiaohui Hu
Changwen Zheng
Fuchun Sun
CML
34
28
0
26 Aug 2022
A Medical Semantic-Assisted Transformer for Radiographic Report Generation
Zhanyu Wang
Mingkang Tang
Lei Wang
Xiu Li
Luping Zhou
ViT
MedIm
24
56
0
22 Aug 2022
Vision-Language Matching for Text-to-Image Synthesis via Generative Adversarial Networks
Qingrong Cheng
Keyu Wen
X. Gu
VLM
EGVM
26
16
0
20 Aug 2022
GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement
Zhi-Qi Cheng
Qianwen Dai
Siyao Li
Teruko Mitamura
Alexander G. Hauptmann
16
34
0
18 Aug 2022
CSSAM:Code Search via Attention Matching of Code Semantics and Structures
Y. Hu
Bowen Cai
Yaoxiang Yu
21
3
0
08 Aug 2022
Invariant Feature Learning for Generalized Long-Tailed Classification
Kaihua Tang
Mingyuan Tao
Jiaxin Qi
Zhenguang Liu
Hanwang Zhang
VLM
24
52
0
19 Jul 2022
Image Captioning based on Feature Refinement and Reflective Decoding
G. Alabduljabbar
Hafida Benhidour
Said Kerrache
3DV
14
3
0
16 Jun 2022
Comprehending and Ordering Semantics for Image Captioning
Yehao Li
Yingwei Pan
Ting Yao
Tao Mei
17
87
0
14 Jun 2022
Dual Windows Are Significant: Learning from Mediastinal Window and Focusing on Lung Window
Qiuli Wang
Xin Tan
Chen Liu
15
0
0
08 Jun 2022
Prompt-based Learning for Unpaired Image Captioning
Peipei Zhu
Xiao Wang
Lin Zhu
Zhenglong Sun
Weishi Zheng
Yaowei Wang
C. L. P. Chen
VLM
21
31
0
26 May 2022
Importance Weighted Structure Learning for Scene Graph Generation
Daqing Liu
M. Bober
J. Kittler
21
5
0
14 May 2022
UTC: A Unified Transformer with Inter-Task Contrastive Learning for Visual Dialog
Cheng Chen
Yudong Zhu
Zhenshan Tan
Qingrong Cheng
Xin Jiang
Qun Liu
X. Gu
25
39
0
01 May 2022
Collaborative Transformers for Grounded Situation Recognition
Junhyeong Cho
Youngseok Yoon
Suha Kwak
ViT
19
25
0
30 Mar 2022
Interactive Audio-text Representation for Automated Audio Captioning with Contrastive Learning
Chen Chen
Nana Hou
Yuchen Hu
Heqing Zou
Xiaofeng Qi
Chng Eng Siong
VLM
26
21
0
29 Mar 2022
Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding
Jiabo Ye
Junfeng Tian
Ming Yan
Xiaoshan Yang
Xuwu Wang
Ji Zhang
Liang He
Xin Lin
ObjD
11
61
0
29 Mar 2022
Quantifying Societal Bias Amplification in Image Captioning
Yusuke Hirota
Yuta Nakashima
Noa Garcia
19
49
0
29 Mar 2022
AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation
Di You
Fenglin Liu
Shen Ge
Xiaoxia Xie
Jing Zhang
Xian Wu
ViT
MedIm
21
106
0
18 Mar 2022
Knowledge-enriched Attention Network with Group-wise Semantic for Visual Storytelling
Tengpeng Li
Hanli Wang
Bin He
Changan Chen
DiffM
19
9
0
10 Mar 2022
TableFormer: Table Structure Understanding with Transformers
A. Nassar
Nikolaos Livathinos
Maksym Lysak
Peter W. J. Staar
LMTD
ViT
11
73
0
02 Mar 2022
CaMEL: Mean Teacher Learning for Image Captioning
Manuele Barraco
Matteo Stefanini
Marcella Cornia
S. Cascianelli
Lorenzo Baraldi
Rita Cucchiara
ViT
VLM
28
27
0
21 Feb 2022
When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs
Oana Ignat
Santiago Castro
Yuhang Zhou
Jiajun Bao
Dandan Shan
Rada Mihalcea
18
3
0
16 Feb 2022
Adversarial Attack and Defense of YOLO Detectors in Autonomous Driving Scenarios
Jung Im Choi
Qing Tian
AAML
22
38
0
10 Feb 2022