Aligning Linguistic Words and Visual Semantic Units for Image Captioning

ACM Multimedia (ACM MM), 2019
6 August 2019
Longteng Guo
Jing Liu
Jinhui Tang
Jiangwei Li
W. Luo
Hanqing Lu

Papers citing "Aligning Linguistic Words and Visual Semantic Units for Image Captioning"

27 / 27 papers shown
SGDiff: Scene Graph Guided Diffusion Model for Image Collaborative SegCaptioning
AAAI Conference on Artificial Intelligence (AAAI), 2025
Xu Zhang
Jin Yuan
Hanwang Zhang
Guojin Zhong
Yongsheng Zang
Jiacheng Lin
Zhiyong Li
DiffM VLM
178
2
0
01 Dec 2025
A Comprehensive Analysis of Real-World Image Captioning and Scene Identification
Sai Suprabhanu Nallapaneni
Subrahmanyam Konakanchi
227
2
0
05 Aug 2023
Semantic Composition in Visually Grounded Language Models
Rohan Pandey
CoGe
254
1
0
15 May 2023
Graph Neural Networks in Vision-Language Image Understanding: A Survey
The Visual Computer (TVC), 2023
Henry Senior
Greg Slabaugh
Shanxin Yuan
Luca Rossi
GNN
357
38
0
07 Mar 2023
Cross-modal Attention Congruence Regularization for Vision-Language Relation Alignment
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Rohan Pandey
Rulin Shao
Paul Pu Liang
Ruslan Salakhutdinov
Louis-Philippe Morency
253
21
0
20 Dec 2022
A Survey of Knowledge Graph Reasoning on Graph Types: Static, Dynamic, and Multimodal
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
K. Liang
Lingyuan Meng
Meng Liu
Yue Liu
Wenxuan Tu
Siwei Wang
Sihang Zhou
Xinwang Liu
Fu Sun
LRM
577
256
0
12 Dec 2022
Controllable Image Captioning
Luka Maxwell
420
0
0
28 Apr 2022
Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets
Marcella Cornia
Lorenzo Baraldi
G. Fiameni
Rita Cucchiara
388
15
0
24 Nov 2021
Unifying Multimodal Transformer for Bi-directional Image and Text Generation
Yupan Huang
Hongwei Xue
Bei Liu
Yutong Lu
259
65
0
19 Oct 2021
Domain Adaptive Semantic Segmentation without Source Data
Fuming You
Jingjing Li
Lei Zhu
Ke Lu
Zhi Chen
Zi Huang
264
67
0
13 Oct 2021
Geometry-Entangled Visual Semantic Transformer for Image Captioning
Ling Cheng
Wei Wei
Feida Zhu
Yong Liu
Chunyan Miao
ViT
241
3
0
29 Sep 2021
Similar Scenes arouse Similar Emotions: Parallel Data Augmentation for Stylized Image Captioning
ACM Multimedia (ACM MM), 2021
Guodun Li
Yuchen Zhai
Peng Liu
Yin Zhang
327
22
0
26 Aug 2021
Scene Designer: a Unified Model for Scene Search and Synthesis from Sketch
Leo Sampaio Ferraz Ribeiro
Tu Bui
John Collomosse
M. Ponti
3DV
242
8
0
16 Aug 2021
OSCAR-Net: Object-centric Scene Graph Attention for Image Attribution
IEEE International Conference on Computer Vision (ICCV), 2021
Eric N. D. Nguyen
Tu Bui
Vishy Swaminathan
John Collomosse
197
19
0
07 Aug 2021
ReFormer: The Relational Transformer for Image Captioning
ACM Multimedia (ACM MM), 2021
Xuewen Yang
Yingru Liu
Xin Wang
ViT
268
67
0
29 Jul 2021
X-GGM: Graph Generative Modeling for Out-of-Distribution Generalization in Visual Question Answering
ACM Multimedia (ACM MM), 2021
Jingjing Jiang
Zi-yi Liu
Yifan Liu
Jingjing Jiang
N. Zheng
OOD
290
20
0
24 Jul 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV VLM MLLM
575
368
0
14 Jul 2021
Productivity, Portability, Performance: Data-Centric Python
Yiheng Wang
Yao Zhang
Yanzhang Wang
Yan Wan
Jiao Wang
Zhongyuan Wu
Yuhao Yang
Bowen She
456
116
0
01 Jul 2021
LayoutGMN: Neural Graph Matching for Structural Layout Similarity
Computer Vision and Pattern Recognition (CVPR), 2020
A. Patil
Manyi Li
Matthew Fisher
Manolis Savva
Hao Zhang
316
41
0
11 Dec 2020
DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection
AAAI Conference on Artificial Intelligence (AAAI), 2020
Haoshu Fang
Yichen Xie
Dian Shao
Cewu Lu
347
64
0
02 Oct 2020
Dynamic Context-guided Capsule Network for Multimodal Machine Translation
ACM Multimedia (ACM MM), 2020
Huan Lin
Fandong Meng
Jinsong Su
Yongjing Yin
Zhengyuan Yang
Yubin Ge
Jie Zhou
Jiebo Luo
258
94
0
04 Sep 2020
HOSE-Net: Higher Order Structure Embedded Network for Scene Graph Generation
ACM Multimedia (ACM MM), 2020
Meng Wei
C. Yuan
Xiaoyu Yue
Kuo Zhong
371
19
0
12 Aug 2020
Improving Image Captioning with Better Use of Captions
Zhan Shi
Xu Zhou
Xipeng Qiu
Xiao-Dan Zhu
209
155
0
21 Jun 2020
Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning
Longteng Guo
Jing Liu
Xinxin Zhu
Xingjian He
Jie Jiang
Hanqing Lu
BDL
201
68
0
10 May 2020
Image Captioning through Image Transformer
Asian Conference on Computer Vision (ACCV), 2020
Sen He
Wentong Liao
Hamed R. Tavakoli
M. Yang
Bodo Rosenhahn
N. Pugeault
ViT
310
116
0
29 Apr 2020
More Grounded Image Captioning by Distilling Image-Text Matching Model
Computer Vision and Pattern Recognition (CVPR), 2020
Yuanen Zhou
Meng Wang
Daqing Liu
Zhenzhen Hu
Hanwang Zhang
270
145
0
01 Apr 2020
Normalized and Geometry-Aware Self-Attention Network for Image Captioning
Computer Vision and Pattern Recognition (CVPR), 2020
Longteng Guo
Jing Liu
Xinxin Zhu
Peng Yao
Shichen Lu
Hanqing Lu
ViT
362
221
0
19 Mar 2020