2004.14231
Image Captioning through Image Transformer
Asian Conference on Computer Vision (ACCV), 2020
29 April 2020
Sen He
Wentong Liao
Hamed R. Tavakoli
M. Yang
Bodo Rosenhahn
N. Pugeault
ViT
Papers citing "Image Captioning through Image Transformer" (31 papers)
From Image Captioning to Visual Storytelling
Admitos Passadakis
Yingjin Song
Albert Gatt
DiffM
31 Jul 2025
Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation
Computer Science Review (CSR), 2025
Israa A. Albadarneh
Bassam Hammo
Omar Al-Kadi
VLM
03 Jun 2025
An Ensemble Model with Attention Based Mechanism for Image Captioning
Computers & Electrical Engineering (Comput. Electr. Eng.), 2025
Israa Al Badarneh
Bassam Hammo
Omar Al-Kadi
28 Jan 2025
ViTOC: Vision Transformer and Object-aware Captioner
Feiyang Huang
09 Nov 2024
M²PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Taowen Wang
Yiyang Liu
James Liang
Junhan Zhao
Yiming Cui
...
Zenglin Xu
Cheng Han
Lifu Huang
Qifan Wang
Dongfang Liu
MLLM
VLM
LRM
24 Sep 2024
Channel-Partitioned Windowed Attention And Frequency Learning for Single Image Super-Resolution
Dinh Phu Tran
Dao Duy Hung
Daeyoung Kim
SupR
23 Jul 2024
β-Variational autoencoders and transformers for reduced-order modelling of fluid flows
Nature Communications (Nat. Commun.), 2023
Alberto Solera-Rico
Carlos Sanmiguel Vila
Miguel Gómez-López
Yuning Wang
Abdulrahman Almashjary
Scott T. M. Dawson
Ricardo Vinuesa
DRL
07 Apr 2023
Towards Universal Vision-language Omni-supervised Segmentation
Bowen Dong
Jiaxi Gu
Jianhua Han
Hang Xu
W. Zuo
VLM
12 Mar 2023
Graph Neural Networks in Vision-Language Image Understanding: A Survey
The Visual Computer (TVC), 2023
Henry Senior
Greg Slabaugh
Shanxin Yuan
Luca Rossi
GNN
07 Mar 2023
Multi-modal Machine Learning in Engineering Design: A Review and Future Directions
Journal of Computing and Information Science in Engineering (JCISE), 2023
Binyang Song
Ruilin Zhou
Faez Ahmed
AI4CE
14 Feb 2023
Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning
IEEE International Conference on Computer Vision (ICCV), 2023
Woohyun Kang
Jonghwan Mun
Sungjun Lee
Byungseok Roh
VLM
27 Dec 2022
VieCap4H-VLSP 2021: ObjectAoA-Enhancing performance of Object Relation Transformer with Attention on Attention for Vietnamese image captioning
Nghia Hieu Nguyen
Duong T.D. Vo
Minh-Quan Ha
ViT
10 Nov 2022
A Spatio-Temporal Attentive Network for Video-Based Crowd Counting
International Symposium on Computers and Communications (ISCC), 2022
Marco Avvenuti
Marco Bongiovanni
Luca Ciampi
Fabrizio Falchi
Claudio Gennaro
Nicola Messina
24 Aug 2022
Iterative Scene Graph Generation
Neural Information Processing Systems (NeurIPS), 2022
Siddhesh Khandelwal
Leonid Sigal
OCL
27 Jul 2022
The impact of memory on learning sequence-to-sequence tasks
Alireza Seif
S. Loos
Gennaro Tucci
É. Roldán
Sebastian Goldt
29 May 2022
BodyMap: Learning Full-Body Dense Correspondence Map
Computer Vision and Pattern Recognition (CVPR), 2022
A. Ianina
N. Sarafianos
Yuanlu Xu
Ignacio Rocco
Tony Tung
3DH
18 May 2022
Controllable Image Captioning
Luka Maxwell
28 Apr 2022
Deep Learning Approaches on Image Captioning: A Review
ACM Computing Surveys (ACM CSUR), 2022
Taraneh Ghandi
H. Pourreza
H. Mahyar
VLM
31 Jan 2022
Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation
IEEE International Conference on Image Processing (ICIP), 2021
Philipp Harzig
Moritz Einfalt
Rainer Lienhart
ViT
28 Dec 2021
Technical Language Supervision for Intelligent Fault Diagnosis in Process Industry
Karl Lowenmark
C. Taal
S. Schnabel
Marcus Liwicki
Fredrik Sandin
11 Dec 2021
Explaining Face Presentation Attack Detection Using Natural Language
H. Mirzaalian
Mohamed E. Hussein
L. Spinoulas
Jonathan May
Wael AbdAlmageed
CVBM
FAtt
AAML
08 Nov 2021
Bornon: Bengali Image Captioning with Transformer-based Deep learning approach
Faisal Muhammad Shah
Mayeesha Humaira
Md Abidur Rahman Khan Jim
Amit Saha Ami
Shimul Paul
11 Sep 2021
Journalistic Guidelines Aware News Image Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Xuewen Yang
Svebor Karaman
Joel R. Tetreault
Alex Jaimes
07 Sep 2021
LAViTeR: Learning Aligned Visual and Textual Representations Assisted by Image and Caption Generation
Mohammad Abuzar Shaikh
Zhanghexuan Ji
Dana Moukheiber
Yan Shen
S. Srihari
Mingchen Gao
VLM
04 Sep 2021
Dual Graph Convolutional Networks with Transformer and Curriculum Learning for Image Captioning
ACM Multimedia (ACM MM), 2021
Xinzhi Dong
Chengjiang Long
Wenju Xu
Chunxia Xiao
ViT
05 Aug 2021
ReFormer: The Relational Transformer for Image Captioning
ACM Multimedia (ACM MM), 2021
Xuewen Yang
Yingru Liu
Xin Wang
ViT
29 Jul 2021
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
IEEE International Conference on Computer Vision (ICCV), 2021
Yuren Cong
Wentong Liao
H. Ackermann
Bodo Rosenhahn
M. Yang
ViT
26 Jul 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
14 Jul 2021
Exploring Dynamic Context for Multi-path Trajectory Prediction
IEEE International Conference on Robotics and Automation (ICRA), 2020
Hao Cheng
Wentong Liao
Xuejiao Tang
M. Yang
Monika Sester
Bodo Rosenhahn
30 Oct 2020
Skeleton-based Action Recognition via Spatial and Temporal Transformer Networks
Chiara Plizzari
Marco Cannici
Matteo Matteucci
ViT
MedIm
17 Aug 2020
AMENet: Attentive Maps Encoder Network for Trajectory Prediction
Hao Cheng
Wentong Liao
M. Yang
Bodo Rosenhahn
Monika Sester
15 Jun 2020