ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1905.07841
  4. Cited By
Multimodal Transformer with Multi-View Visual Representation for Image
  Captioning

Multimodal Transformer with Multi-View Visual Representation for Image Captioning

20 May 2019
Jun-chen Yu
Jing Li
Zhou Yu
Qingming Huang
    ViT
ArXivPDFHTML

Papers citing "Multimodal Transformer with Multi-View Visual Representation for Image Captioning"

25 / 25 papers shown
Title
How to Coordinate UAVs and UGVs for Efficient Mission Planning? Optimizing Energy-Constrained Cooperative Routing with a DRL Framework
How to Coordinate UAVs and UGVs for Efficient Mission Planning? Optimizing Energy-Constrained Cooperative Routing with a DRL Framework
Md Safwan Mondal
S. Ramasamy
Luca Russo
James D. Humann
James M. Dotterweich
Pranav A. Bhounsule
36
0
0
29 Apr 2025
Tri-FusionNet: Enhancing Image Description Generation with Transformer-based Fusion Network and Dual Attention Mechanism
Tri-FusionNet: Enhancing Image Description Generation with Transformer-based Fusion Network and Dual Attention Mechanism
Lakshita Agarwal
Bindu Verma
ViT
27
0
0
23 Apr 2025
An Ensemble Model with Attention Based Mechanism for Image Captioning
Israa Al Badarneh
Bassam Hammo
Omar Al-Kadi
45
3
0
28 Jan 2025
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing
  Objects in 3D Scenes
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes
Ting Yu
Xiaojun Lin
Shuhui Wang
Weiguo Sheng
Qingming Huang
Jun-chen Yu
3DV
48
10
0
12 Mar 2024
Pretrain like Your Inference: Masked Tuning Improves Zero-Shot Composed Image Retrieval
Pretrain like Your Inference: Masked Tuning Improves Zero-Shot Composed Image Retrieval
Junyang Chen
Hanjiang Lai
VLM
34
15
0
13 Nov 2023
LOIS: Looking Out of Instance Semantics for Visual Question Answering
LOIS: Looking Out of Instance Semantics for Visual Question Answering
Siyu Zhang
Ye Chen
Yaoru Sun
Fang Wang
Haibo Shi
Haoran Wang
25
4
0
26 Jul 2023
HaVQA: A Dataset for Visual Question Answering and Multimodal Research
  in Hausa Language
HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language
Shantipriya Parida
Idris Abdulmumin
Shamsuddeen Hassan Muhammad
Aneesh Bose
Guneet Singh Kohli
I. Ahmad
Ketan Kotwal
S. Sarkar
Ondrej Bojar
Habeebah Adamu Kakudi
22
4
0
28 May 2023
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
Zhou Yu
Xuecheng Ouyang
Zhenwei Shao
Mei Wang
Jun Yu
MLLM
94
11
0
03 Mar 2023
KENGIC: KEyword-driven and N-Gram Graph based Image Captioning
KENGIC: KEyword-driven and N-Gram Graph based Image Captioning
Brandon Birmingham
A. Muscat
22
1
0
07 Feb 2023
Betrayed by Captions: Joint Caption Grounding and Generation for Open
  Vocabulary Instance Segmentation
Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation
Jianzong Wu
Xiangtai Li
Henghui Ding
Xia Li
Guangliang Cheng
Yu Tong
Chen Change Loy
VLM
82
31
0
02 Jan 2023
Multi-Attention Network for Compressed Video Referring Object
  Segmentation
Multi-Attention Network for Compressed Video Referring Object Segmentation
Weidong Chen
Dexiang Hong
Yuankai Qi
Zhenjun Han
Shuhui Wang
Laiyun Qing
Qingming Huang
Guorong Li
VOS
18
35
0
26 Jul 2022
MedFuse: Multi-modal fusion with clinical time-series data and chest
  X-ray images
MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images
Nasir Hayat
Krzysztof J. Geras
Farah E. Shamout
MedIm
19
40
0
14 Jul 2022
BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid
  Counterfactual Training for Robust Content-based Image Retrieval
BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval
Wenqiao Zhang
Jiannan Guo
Meng Li
Haochen Shi
Shengyu Zhang
Juncheng Li
Siliang Tang
Yueting Zhuang
47
6
0
09 Jul 2022
ACORT: A Compact Object Relation Transformer for Parameter Efficient
  Image Captioning
ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning
J. Tan
Y. Tan
C. Chan
Joon Huang Chuah
VLM
ViT
24
15
0
11 Feb 2022
A Transformer-Based Feature Segmentation and Region Alignment Method For
  UAV-View Geo-Localization
A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization
Ming Dai
Jian Hu
Jiedong Zhuang
E. Zheng
ViT
42
111
0
23 Jan 2022
MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and
  Unpaired Text-based Image Captioning
MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-based Image Captioning
Wenqiao Zhang
Haochen Shi
Jiannan Guo
Shengyu Zhang
Qingpeng Cai
Juncheng Li
Sihui Luo
Yueting Zhuang
DiffM
19
46
0
13 Dec 2021
All-In-One: Artificial Association Neural Networks
All-In-One: Artificial Association Neural Networks
Seokjun Kim
Jaeeun Jang
Hyeoncheol Kim
19
0
0
31 Oct 2021
Multimodality in Meta-Learning: A Comprehensive Survey
Multimodality in Meta-Learning: A Comprehensive Survey
Yao Ma
Shilin Zhao
Weixiao Wang
Yaoman Li
Irwin King
47
53
0
28 Sep 2021
VisGraphNet: a complex network interpretation of convolutional neural
  features
VisGraphNet: a complex network interpretation of convolutional neural features
J. Florindo
Young-Sup Lee
Kyungkoo Jun
Gwanggil Jeon
M. Albertini
FAtt
GNN
13
14
0
27 Aug 2021
Multiscale Vision Transformers
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
33
1,221
0
22 Apr 2021
Learning Emergent Discrete Message Communication for Cooperative
  Reinforcement Learning
Learning Emergent Discrete Message Communication for Cooperative Reinforcement Learning
Sheng Li
Yutai Zhou
R. Allen
Mykel J. Kochenderfer
26
13
0
24 Feb 2021
Multiresolution and Multimodal Speech Recognition with Transformers
Multiresolution and Multimodal Speech Recognition with Transformers
Georgios Paraskevopoulos
Srinivas Parthasarathy
Aparna Khare
Shiva Sundaram
18
29
0
29 Apr 2020
Normalized and Geometry-Aware Self-Attention Network for Image
  Captioning
Normalized and Geometry-Aware Self-Attention Network for Image Captioning
Longteng Guo
Jing Liu
Xinxin Zhu
Peng Yao
Shichen Lu
Hanqing Lu
ViT
117
189
0
19 Mar 2020
VisualBERT: A Simple and Performant Baseline for Vision and Language
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
35
1,912
0
09 Aug 2019
Multimodal Compact Bilinear Pooling for Visual Question Answering and
  Visual Grounding
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
152
1,465
0
06 Jun 2016
1