1707.07102
Cited By
OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts
22 July 2017
Xuwang Yin
Vicente Ordonez
VLM
Papers citing
"OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts"
30 / 30 papers shown
Resilience through Scene Context in Visual Referring Expression Generation
Simeon Junker
Sina Zarrieß
91
4
0
18 Apr 2024
Generalized Decoding for Pixel, Image, and Language
Computer Vision and Pattern Recognition (CVPR), 2022
Xueyan Zou
Zi-Yi Dou
Jianwei Yang
Zhe Gan
Linjie Li
...
Lu Yuan
Nanyun Peng
Lijuan Wang
Yong Jae Lee
Jianfeng Gao
VLM
MLLM
ObjD
252
322
0
21 Dec 2022
Harnessing Knowledge and Reasoning for Human-Like Natural Language Generation: A Brief Review
IEEE Data Engineering Bulletin (DEB), 2022
Jiangjie Chen
Yanghua Xiao
200
5
0
07 Dec 2022
Caption Generation on Scenes with Seen and Unseen Object Categories
Image and Vision Computing (IVC), 2021
B. Demirel
R. G. Cinbis
VLM
226
2
0
13 Aug 2021
ReFormer: The Relational Transformer for Image Captioning
ACM Multimedia (ACM MM), 2021
Xuewen Yang
Yingru Liu
Xin Wang
ViT
179
62
0
29 Jul 2021
VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
Computer Vision and Pattern Recognition (CVPR), 2021
Xudong Lin
Gedas Bertasius
Jue Wang
Shih-Fu Chang
Devi Parikh
Lorenzo Torresani
VGen
184
73
0
28 Jan 2021
End-to-End Video Question-Answer Generation with Generator-Pretester Network
Hung-Ting Su
Chen-Hsi Chang
Po-Wei Shen
Yu-Siang Wang
Ya-Liang Chang
Yu-Cheng Chang
Pu-Jen Cheng
Winston H. Hsu
119
36
0
05 Jan 2021
Language-Mediated, Object-Centric Representation Learning
Findings (Findings), 2020
Ruocheng Wang
Jiayuan Mao
S. Gershman
Jiajun Wu
203
13
0
31 Dec 2020
Trying Bilinear Pooling in Video-QA
T. Winterbottom
S. Xiao
A. McLean
Noura Al Moubayed
155
4
0
18 Dec 2020
On Modality Bias in the TVQA Dataset
British Machine Vision Conference (BMVC), 2020
T. Winterbottom
S. Xiao
A. McLean
Noura Al Moubayed
153
44
0
18 Dec 2020
Image Captioning with Visual Object Representations Grounded in the Textual Modality
Dušan Variš
Katsuhito Sudoh
Satoshi Nakamura
140
1
0
19 Oct 2020
Interpretable Neural Computation for Real-World Compositional Visual Question Answering
Chinese Conference on Pattern Recognition and Computer Vision (CPRCV), 2020
Ruixue Tang
Chao Ma
CoGe
70
2
0
10 Oct 2020
Fine-Grained Grounding for Multimodal Speech Recognition
Findings (Findings), 2020
Tejas Srinivasan
Ramon Sanabria
Florian Metze
Desmond Elliott
134
11
0
05 Oct 2020
DRG: Dual Relation Graph for Human-Object Interaction Detection
European Conference on Computer Vision (ECCV), 2020
Chen Gao
Jiarui Xu
Yuliang Zou
Jia-Bin Huang
215
228
0
26 Aug 2020
Comprehensive Image Captioning via Scene Graph Decomposition
European Conference on Computer Vision (ECCV), 2020
Yiwu Zhong
Liwei Wang
Jianshu Chen
Dong Yu
Yin Li
191
136
0
23 Jul 2020
Incorporating Textual Evidence in Visual Storytelling
Tianyi Li
Sujian Li
DiffM
84
3
0
21 Nov 2019
ShapeCaptioner: Generative Caption Network for 3D Shapes by Learning a Mapping from Parts Detected in Multiple Views to Sentences
ACM Multimedia (ACM MM), 2019
Zhizhong Han
Chao Chen
Yu-Shen Liu
Matthias Zwicker
3DPC
165
49
0
31 Jul 2019
Image Captioning with Unseen Objects
British Machine Vision Conference (BMVC), 2019
B. Demirel
R. G. Cinbis
Nazli Ikizler-Cinbis
VLM
199
17
0
31 Jul 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Journal of Artificial Intelligence Research (JAIR), 2019
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
332
141
0
22 Jul 2019
VIFIDEL: Evaluating the Visual Fidelity of Image Descriptions
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Pranava Madhyastha
Josiah Wang
Lucia Specia
130
38
0
22 Jul 2019
Video Question Generation via Cross-Modal Self-Attention Networks Learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
Yu-Siang Wang
Hung-Ting Su
Chen-Hsi Chang
Zhe-Yu Liu
Winston H. Hsu
135
12
0
05 Jul 2019
Gaining Extra Supervision via Multi-task learning for Multi-Modal Video Question Answering
IEEE International Joint Conference on Neural Networks (IJCNN), 2019
Junyeong Kim
Minuk Ma
Kyungsu Kim
Sungjin Kim
Chang D. Yoo
120
27
0
28 May 2019
Referring to Objects in Videos using Spatio-Temporal Identifying Descriptions
Peratham Wiriyathammabhum
Abhinav Shrivastava
Vlad I. Morariu
L. Davis
92
5
0
08 Apr 2019
Chat-crowd: A Dialog-based Platform for Visual Layout Composition
Paola Cascante-Bonilla
Xuwang Yin
Vicente Ordonez
Song Feng
149
8
0
10 Dec 2018
Explainable and Explicit Visual Reasoning over Scene Graphs
Jiaxin Shi
Hanwang Zhang
Juan-Zi Li
OCL
378
249
0
05 Dec 2018
End-to-end Image Captioning Exploits Multimodal Distributional Similarity
Pranava Madhyastha
Josiah Wang
Lucia Specia
CoGe
148
7
0
11 Sep 2018
TVQA: Localized, Compositional Video Question Answering
Jie Lei
Licheng Yu
Mohit Bansal
Tamara L. Berg
372
710
0
05 Sep 2018
Text2Scene: Generating Compositional Scenes from Textual Descriptions
Fuwen Tan
Song Feng
Vicente Ordonez
200
18
0
04 Sep 2018
Object Counts! Bringing Explicit Detections Back into Image Captioning
North American Chapter of the Association for Computational Linguistics (NAACL), 2018
Josiah Wang
Pranava Madhyastha
Lucia Specia
ObjD
116
38
0
23 Apr 2018
Neural Motifs: Scene Graph Parsing with Global Context
Rowan Zellers
Mark Yatskar
Sam Thomson
Yejin Choi
GNN
240
1,080
0
17 Nov 2017