ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2207.09666
  4. Cited By
GRIT: Faster and Better Image captioning Transformer Using Dual Visual
  Features

GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features

20 July 2022
Van-Quang Nguyen
Masanori Suganuma
Takayuki Okatani
    ViT
ArXivPDFHTML

Papers citing "GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features"

20 / 20 papers shown
Title
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
Zongchuang Zhao
Haoyu Fu
Dingkang Liang
Xin Zhou
Dingyuan Zhang
Hongwei Xie
Bing Wang
Xiang Bai
MLLM
VLM
39
0
0
13 May 2025
Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
Kei Katsumata
Motonari Kambara
Daichi Yashima
Ryosuke Korekata
Komei Sugiura
56
0
0
28 Jan 2025
ExpertAF: Expert Actionable Feedback from Video
ExpertAF: Expert Actionable Feedback from Video
Kumar Ashutosh
Tushar Nagarajan
Georgios Pavlakos
Kris M. Kitani
Kristen Grauman
VGen
42
2
0
01 Aug 2024
Controllable Contextualized Image Captioning: Directing the Visual
  Narrative through User-Defined Highlights
Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights
Shunqi Mao
Chaoyi Zhang
Hang Su
Hwanjun Song
Igor Shalyminov
Weidong Cai
26
1
0
16 Jul 2024
Evaluating Durability: Benchmark Insights into Multimodal Watermarking
Evaluating Durability: Benchmark Insights into Multimodal Watermarking
Jielin Qiu
William Jongwon Han
Xuandong Zhao
Shangbang Long
Christos Faloutsos
Lei Li
51
1
0
06 Jun 2024
MeaCap: Memory-Augmented Zero-shot Image Captioning
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng
Yan Xie
Hao Zhang
Chiyu Chen
Zhengjue Wang
Boli Chen
VLM
18
13
0
06 Mar 2024
Polos: Multimodal Metric Learning from Human Feedback for Image
  Captioning
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada
Kanta Kaneda
Daichi Saito
Komei Sugiura
22
24
0
28 Feb 2024
Return-Aligned Decision Transformer
Return-Aligned Decision Transformer
Tsunehiko Tanaka
Kenshi Abe
Kaito Ariu
Tetsuro Morimura
Edgar Simo-Serra
OffRL
54
1
0
06 Feb 2024
With a Little Help from your own Past: Prototypical Memory Networks for
  Image Captioning
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
Manuele Barraco
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
51
18
0
23 Aug 2023
A request for clarity over the End of Sequence token in the
  Self-Critical Sequence Training
A request for clarity over the End of Sequence token in the Self-Critical Sequence Training
J. Hu
Roberto Cavicchioli
Alessandro Capotondi
16
6
0
20 May 2023
DiffCap: Exploring Continuous Diffusion on Image Captioning
DiffCap: Exploring Continuous Diffusion on Image Captioning
Yufeng He
Zefan Cai
Xu Gan
Baobao Chang
DiffM
19
5
0
20 May 2023
Cross-Modal Causal Intervention for Medical Report Generation
Cross-Modal Causal Intervention for Medical Report Generation
Weixing Chen
Yang Liu
Ce Wang
Jiarui Zhu
Shen Zhao
Guanbin Li
Cheng-Lin Liu
Liang Lin
19
5
0
16 Mar 2023
Multilingual Communication System with Deaf Individuals Utilizing
  Natural and Visual Languages
Multilingual Communication System with Deaf Individuals Utilizing Natural and Visual Languages
Tuan-Luc Huynh
Khoi-Nguyen Nguyen-Ngoc
Chi-Bien Chu
Minh-Triet Tran
Trung-Nghia Le
SLR
11
0
0
01 Dec 2022
Deep Learning Approaches on Image Captioning: A Review
Deep Learning Approaches on Image Captioning: A Review
Taraneh Ghandi
H. Pourreza
H. Mahyar
VLM
8
88
0
31 Jan 2022
ViDT: An Efficient and Effective Fully Transformer-based Object Detector
ViDT: An Efficient and Effective Fully Transformer-based Object Detector
Hwanjun Song
Deqing Sun
Sanghyuk Chun
Varun Jampani
Dongyoon Han
Byeongho Heo
Wonjae Kim
Ming-Hsuan Yang
78
75
0
08 Oct 2021
Improving Image Captioning by Leveraging Intra- and Inter-layer Global
  Representation in Transformer Network
Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network
Jiayi Ji
Yunpeng Luo
Xiaoshuai Sun
Fuhai Chen
Gen Luo
Yongjian Wu
Yue Gao
Rongrong Ji
ViT
38
163
0
13 Dec 2020
Normalized and Geometry-Aware Self-Attention Network for Image
  Captioning
Normalized and Geometry-Aware Self-Attention Network for Image Captioning
Longteng Guo
Jing Liu
Xinxin Zhu
Peng Yao
Shichen Lu
Hanqing Lu
ViT
110
188
0
19 Mar 2020
Unified Vision-Language Pre-Training for Image Captioning and VQA
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
250
922
0
24 Sep 2019
Neural Baby Talk
Neural Baby Talk
Jiasen Lu
Jianwei Yang
Dhruv Batra
Devi Parikh
VLM
189
432
0
27 Mar 2018
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image
  Captioning
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning
Jiasen Lu
Caiming Xiong
Devi Parikh
R. Socher
83
1,440
0
06 Dec 2016
1