Image Captioning through Image Transformer

Asian Conference on Computer Vision (ACCV), 2020

29 April 2020

Papers citing "Image Captioning through Image Transformer"

31 / 31 papers shown

From Image Captioning to Visual Storytelling

291

31 Jul 2025

Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation

Computer Science Review (CSR), 2025

305

03 Jun 2025

An Ensemble Model with Attention Based Mechanism for Image Captioning

Computers & Electrical Engineering (Comput. Electr. Eng.), 2025

Israa Al Badarneh

Bassam Hammo

Omar Al-Kadi

419

28 Jan 2025

ViTOC: Vision Transformer and Object-aware Captioner

Feiyang Huang

506

09 Nov 2024

M$^2$PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

...

Lifu Huang

483

24 Sep 2024

Channel-Partitioned Windowed Attention And Frequency Learning for Single Image Super-Resolution

312

23 Jul 2024

$\beta$-Variational autoencoders and transformers for reduced-order modelling of fluid flows

Nature Communications (Nat. Commun.), 2023

Alberto Solera-Rico

Carlos Sanmiguel Vila

Miguel Gómez-López

Yuning Wang

Abdulrahman Almashjary

Scott T. M. Dawson

Ricardo Vinuesa

DRL

343

185

07 Apr 2023

Towards Universal Vision-language Omni-supervised Segmentation

Bowen Dong

Jiaxi Gu

Jianhua Han

Hang Xu

W. Zuo

VLM

315

12 Mar 2023

Graph Neural Networks in Vision-Language Image Understanding: A Survey

The Visual Computer (TVC), 2023

373

07 Mar 2023

Multi-modal Machine Learning in Engineering Design: A Review and Future Directions

Journal of Computing and Information Science in Engineering (JCISE), 2023

439

14 Feb 2023

Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning

IEEE International Conference on Computer Vision (ICCV), 2022

317

27 Dec 2022

VieCap4H-VLSP 2021: ObjectAoA - Enhancing performance of Object Relation Transformer with Attention on Attention for Vietnamese image captioning

255

10 Nov 2022

A Spatio-Temporal Attentive Network for Video-Based Crowd Counting

International Symposium on Computers and Communications (ISCC), 2022

278

24 Aug 2022

Iterative Scene Graph Generation

Neural Information Processing Systems (NeurIPS), 2022

Siddhesh Khandelwal

Leonid Sigal

OCL

299

27 Jul 2022

The impact of memory on learning sequence-to-sequence tasks

249

29 May 2022

BodyMap: Learning Full-Body Dense Correspondence Map

Computer Vision and Pattern Recognition (CVPR), 2022

Ignacio Rocco

162

18 May 2022

Controllable Image Captioning

Luka Maxwell

443

28 Apr 2022

Deep Learning Approaches on Image Captioning: A Review

ACM Computing Surveys (ACM CSUR), 2022

687

177

31 Jan 2022

Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation

International Conference on Information Photonics (ICIP), 2021

186

28 Dec 2021

Technical Language Supervision for Intelligent Fault Diagnosis in Process Industry

270

11 Dec 2021

Explaining Face Presentation Attack Detection Using Natural Language

126

08 Nov 2021

Bornon: Bengali Image Captioning with Transformer-based Deep learning approach

Faisal Muhammad Shah

Mayeesha Humaira

Md Abidur Rahman Khan Jim

Amit Saha Ami

Shimul Paul

160

11 Sep 2021

Journalistic Guidelines Aware News Image Captioning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

354

07 Sep 2021

LAViTeR: Learning Aligned Visual and Textual Representations Assisted by Image and Caption Generation

Mohammad Abuzar Shaikh

211

04 Sep 2021

Dual Graph Convolutional Networks with Transformer and Curriculum Learning for Image Captioning

ACM Multimedia (ACM MM), 2021

384

05 Aug 2021

ReFormer: The Relational Transformer for Image Captioning

ACM Multimedia (ACM MM), 2021

273

29 Jul 2021

Spatial-Temporal Transformer for Dynamic Scene Graph Generation

IEEE International Conference on Computer Vision (ICCV), 2021

452

157

26 Jul 2021

From Show to Tell: A Survey on Deep Learning-based Image Captioning

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021

Lorenzo Baraldi

585

373

14 Jul 2021

Exploring Dynamic Context for Multi-path Trajectory Prediction

IEEE International Conference on Robotics and Automation (ICRA), 2020

491

30 Oct 2020

Skeleton-based Action Recognition via Spatial and Temporal Transformer Networks

Chiara Plizzari

Marco Cannici

Matteo Matteucci

ViT MedIm

453

384

17 Aug 2020

AMENet: Attentive Maps Encoder Network for Trajectory Prediction

276

15 Jun 2020