Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

arXiv: 1502.03044

10 February 2015
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
    DiffM

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,507 papers shown
A New Era in Computational Pathology: A Survey on Foundation and Vision-Language Models
Dibaloke Chanda
Milan Aryal
Nasim Yahya Soltani
Masoud Ganji
AI4CE
VLM
34
7
0
23 Aug 2024
VALE: A Multimodal Visual and Language Explanation Framework for Image Classifiers using eXplainable AI and Language Models
Purushothaman Natarajan
Athira Nambiar
AAML
17
3
0
23 Aug 2024
EAGLE: Elevating Geometric Reasoning through LLM-empowered Visual Instruction Tuning
Zhihao Li
Yao Du
Yang Liu
Yan Zhang
Yufang Liu
M. Zhang
Xunliang Cai
LRM
29
6
0
21 Aug 2024
TraDiffusion: Trajectory-Based Training-Free Image Generation
Mingrui Wu
Oucheng Huang
Jiayi Ji
Jiale Li
Xinyue Cai
Huafeng Kuang
Jianzhuang Liu
Xiaoshuai Sun
Rongrong Ji
40
3
0
19 Aug 2024
Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models
Qingyuan Zeng
Zhenzhong Wang
Yiu-ming Cheung
Min Jiang
AAML
40
1
0
16 Aug 2024
The Dawn of KAN in Image-to-Image (I2I) Translation: Integrating Kolmogorov-Arnold Networks with GANs for Unpaired I2I Translation
Arpan Mahara
N. Rishe
Liangdong Deng
VLM
GAN
32
2
0
15 Aug 2024
LLMI3D: MLLM-based 3D Perception from a Single 2D Image
Fan Yang
Sicheng Zhao
Yanhao Zhang
Haoxiang Chen
Hui Chen
Wenbo Tang
Guiguang Ding
33
4
0
14 Aug 2024
Bi-directional Contextual Attention for 3D Dense Captioning
Minjung Kim
Hyung Suk Lim
Soonyoung Lee
Bumsoo Kim
Gunhee Kim
35
3
0
13 Aug 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
27
0
0
09 Aug 2024
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
William Y. Zhu
Keren Ye
Junjie Ke
Jiahui Yu
Leonidas J. Guibas
P. Milanfar
Feng Yang
43
2
0
07 Aug 2024
GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths
Xianyu Chen
Ming Jiang
Qi Zhao
19
2
0
05 Aug 2024
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance
Mrinal Verghese
Brian Chen
H. Eghbalzadeh
Tushar Nagarajan
Ruta Desai
LRM
45
1
0
04 Aug 2024
ST-SACLF: Style Transfer Informed Self-Attention Classifier for Bias-Aware Painting Classification
Mridula Vijendran
Frederick W. B. Li
Jingjing Deng
Hubert P. H. Shum
48
0
0
03 Aug 2024
Review of Cloud Service Composition for Intelligent Manufacturing
Cuixia Li
Liqiang Liu
Li Shi
19
0
0
03 Aug 2024
Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning
Yaming Yang
Zhe Wang
Fuhai Chen
Wei Zhao
Weigang Lu
Joemon M. Jose
CVBM
23
1
0
01 Aug 2024
Block-Operations: Using Modular Routing to Improve Compositional Generalization
Florian Dietz
Dietrich Klakow
AI4CE
19
0
0
01 Aug 2024
GEGA: Graph Convolutional Networks and Evidence Retrieval Guided Attention for Enhanced Document-level Relation Extraction
Yanxu Mao
Peipei Liu
Tiehan Cui
24
0
0
31 Jul 2024
Faithful and Plausible Natural Language Explanations for Image Classification: A Pipeline Approach
Adam Wojciechowski
Mateusz Lango
Ondrej Dusek
FAtt
41
0
0
30 Jul 2024
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
33
6
0
29 Jul 2024
HICEScore: A Hierarchical Metric for Image Captioning Evaluation
Zequn Zeng
Jianqiao Sun
Hao Zhang
Tiansheng Wen
Yudi Su
Yan Xie
Zhengjue Wang
Boli Chen
44
3
0
26 Jul 2024
Attention Beats Linear for Fast Implicit Neural Representation Generation
Shuyi Zhang
Ke Liu
Jingjun Gu
Xiaoxu Cai
Zhihua Wang
Jiajun Bu
Haishuai Wang
40
1
0
22 Jul 2024
HERGen: Elevating Radiology Report Generation with Longitudinal Data
Fuying Wang
Shenghui Du
Lequan Yu
MedIm
40
5
0
21 Jul 2024
Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2
Chun Xu
En-Wei Sun
28
0
0
19 Jul 2024
Impact of Model Size on Fine-tuned LLM Performance in Data-to-Text Generation: A State-of-the-Art Investigation
Joy Mahapatra
Utpal Garain
31
8
0
19 Jul 2024
Nearest Neighbor Future Captioning: Generating Descriptions for Possible Collisions in Object Placement Tasks
Takumi Komatsu
Motonari Kambara
Shumpei Hatanaka
Haruka Matsuo
Tsubasa Hirakawa
Takayoshi Yamashita
H. Fujiyoshi
Komei Sugiura
27
0
0
18 Jul 2024
XEdgeAI: A Human-centered Industrial Inspection Framework with Data-centric Explainable Edge AI Approach
Truong Thanh Hung Nguyen
Phuc Truong Loc Nguyen
Hung Cao
24
2
0
16 Jul 2024
Backdoor Attacks against Image-to-Image Networks
Wenbo Jiang
Hongwei Li
Jiaming He
Rui Zhang
Guowen Xu
Tianwei Zhang
Rongxing Lu
AAML
33
2
0
15 Jul 2024
Predicting Winning Captions for Weekly New Yorker Comics
Stanley Cao
Sonny Young
ViT
VLM
27
1
0
12 Jul 2024
LEMoN: Label Error Detection using Multimodal Neighbors
Haoran Zhang
Aparna Balagopalan
Nassim Oufattole
Hyewon Jeong
Yan Wu
Jiacheng Zhu
Marzyeh Ghassemi
42
0
0
10 Jul 2024
Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model
Danni Yang
Ruohan Dong
Jiayi Ji
Yiwei Ma
Haowei Wang
Xiaoshuai Sun
Rongrong Ji
44
3
0
07 Jul 2024
Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference
Kai Shen
Lingfei Wu
Siliang Tang
Fangli Xu
Bo Long
Yueting Zhuang
Jian Pei
22
0
0
06 Jul 2024
Towards Context-Aware Emotion Recognition Debiasing from a Causal Demystification Perspective via De-confounded Training
Dingkang Yang
Kun Yang
Haopeng Kuang
Zhaoyu Chen
Yuzheng Wang
Lihua Zhang
CML
36
4
0
06 Jul 2024
Explainable Image Captioning using CNN- CNN architecture and Hierarchical Attention
Rishi Mohan
Sanjay Sureshkumar
Vignesh Sivasubramaniam
25
1
0
28 Jun 2024
Analyzing Quality, Bias, and Performance in Text-to-Image Generative Models
Nila Masrourisaadat
Nazanin Sedaghatkish
Fatemeh Sarshartehrani
Edward A. Fox
37
6
0
28 Jun 2024
Brain Tumor Classification using Vision Transformer with Selective Cross-Attention Mechanism and Feature Calibration
M. Khaniki
Alireza Golkarieh
Mohammad Manthouri
MedIm
24
4
0
25 Jun 2024
Enhancing Scientific Figure Captioning Through Cross-modal Learning
Mateo Alejandro Rojas
Rafael Carranza
42
0
0
24 Jun 2024
Reading Is Believing: Revisiting Language Bottleneck Models for Image Classification
Honori Udo
Takafumi Koshinaka
VLM
32
0
0
22 Jun 2024
A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning
Panagiotis Kaliosis
John Pavlopoulos
Foivos Charalampakos
Georgios Moschovis
Ion Androutsopoulos
MedIm
19
1
0
20 Jun 2024
Using Multimodal Large Language Models for Automated Detection of Traffic Safety Critical Events
M. Tami
Huthaifa I. Ashqar
Mohammed Elhenawy
37
3
0
19 Jun 2024
DDLNet: Boosting Remote Sensing Change Detection with Dual-Domain Learning
Xiaowen Ma
Jiawei Yang
Rui Che
Huanting Zhang
Wei Zhang
16
4
0
19 Jun 2024
M3T: Multi-Modal Medical Transformer to bridge Clinical Context with Visual Insights for Retinal Image Medical Description Generation
Nagur Shareef Shaik
T. Cherukuri
Dong Hye Ye
MedIm
30
0
0
19 Jun 2024
Improving Large Models with Small models: Lower Costs and Better Performance
Dong Chen
Shuo Zhang
Yueting Zhuang
Siliang Tang
Qidong Liu
Hua Wang
Mingliang Xu
37
4
0
15 Jun 2024
Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A Survey
Hao-Yu Yang
Yanyan Zhao
Yang Wu
Shilong Wang
Tian Zheng
Hongbo Zhang
Zongyang Ma
Wanxiang Che
Bing Qin
34
8
0
12 Jun 2024
Stealthy Targeted Backdoor Attacks against Image Captioning
Wenshu Fan
Hongwei Li
Wenbo Jiang
Meng Hao
Shui Yu
Xiao Zhang
DiffM
22
6
0
09 Jun 2024
Story Generation from Visual Inputs: Techniques, Related Tasks, and Challenges
Daniel A. P. Oliveira
Eugénio Ribeiro
David Martins de Matos
VGen
23
3
0
04 Jun 2024
Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning
Wenyan Li
Jiaang Li
R. Ramos
Raphael Tang
Desmond Elliott
VLM
36
3
0
04 Jun 2024
CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models
Junho Kim
Hyunjun Kim
Yeonju Kim
Yong Man Ro
MLLM
39
10
0
04 Jun 2024
Ultrasound Report Generation with Cross-Modality Feature Alignment via Unsupervised Guidance
Jun Li
Tongkun Su
Baoliang Zhao
Faqin Lv
Qiong Wang
Nassir Navab
Yin Hu
Zhongliang Jiang
MedIm
18
3
0
02 Jun 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Yiyi Zhou
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
27
0
0
01 Jun 2024
DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models
Linli Yao
Lei Li
Shuhuai Ren
Lean Wang
Yuanxin Liu
Xu Sun
Lu Hou
35
28
0
31 May 2024