arXiv:1412.6632
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
20 December 2014
Junhua Mao
W. Xu
Yi Yang
Jiang Wang
Zhiheng Huang
Alan Yuille
VLM
Papers citing
"Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)"
50 / 417 papers shown
Automatic Rule Induction for Interpretable Semi-Supervised Learning
Reid Pryzant
Ziyi Yang
Yichong Xu
Chenguang Zhu
Michael Zeng
28
9
0
18 May 2022
Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning
Chia-Wen Kuo
Z. Kira
6
52
0
09 May 2022
Diverse Image Captioning with Grounded Style
Franz Klein
Shweta Mahajan
S. Roth
14
7
0
03 May 2022
X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks
Zhaowei Cai
Gukyeong Kwon
Avinash Ravichandran
Erhan Bas
Z. Tu
Rahul Bhotika
Stefano Soatto
ObjD
MLLM
VLM
17
49
0
12 Apr 2022
On Distinctive Image Captioning via Comparing and Reweighting
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
30
16
0
08 Apr 2022
Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition
Peipei Zhu
Xiao Wang
Yong Luo
Zhenglong Sun
Wei-Shi Zheng
Yaowei Wang
C. L. P. Chen
22
12
0
07 Mar 2022
Vision-Language Intelligence: Tasks, Representation Learning, and Large Models
Feng Li
Hao Zhang
Yi-Fan Zhang
S. Liu
Jian Guo
L. Ni
Pengchuan Zhang
Lei Zhang
AI4TS
VLM
16
36
0
03 Mar 2022
Inference of captions from histopathological patches
M. Tsuneki
F. Kanavati
8
29
0
07 Feb 2022
EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators
Lois Orosa
Skanda Koppula
Yaman Umuroglu
Konstantinos Kanellopoulos
Juan Gómez Luna
Michaela Blott
K. Vissers
O. Mutlu
26
4
0
04 Feb 2022
Multi-Label Classification on Remote-Sensing Images
A. Singh
B. Uma Shankar
6
0
0
06 Jan 2022
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Yoad Tewel
Yoav Shalev
Idan Schwartz
Lior Wolf
VLM
32
192
0
29 Nov 2021
Contrastive Learning of Visual-Semantic Embeddings
Anurag Jain
Yashaswi Verma
SSL
25
1
0
17 Oct 2021
Geometry Attention Transformer with Position-aware LSTMs for Image Captioning
Chi-Yin Wang
Yulin Shen
Luping Ji
ViT
39
49
0
01 Oct 2021
Cross Modification Attention Based Deliberation Model for Image Captioning
Zheng Lian
Yanan Zhang
Haichang Li
Rui Wang
Xiaohui Hu
19
4
0
17 Sep 2021
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention
Katsuyuki Nakamura
Hiroki Ohashi
Mitsuhiro Okada
EgoV
31
12
0
07 Sep 2021
Group-based Distinctive Image Captioning with Memory Attention
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
6
18
0
20 Aug 2021
Caption Generation on Scenes with Seen and Unseen Object Categories
B. Demirel
R. G. Cinbis
VLM
15
1
0
13 Aug 2021
A Better Loss for Visual-Textual Grounding
Davide Rigoni
Luciano Serafini
A. Sperduti
ObjD
17
3
0
11 Aug 2021
Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
Bryan Wang
Gang Li
Xin Zhou
Zhourong Chen
Tovi Grossman
Yang Li
162
152
0
07 Aug 2021
Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval
Xuri Ge
Fuhai Chen
J. Jose
Zhilong Ji
Zhongqin Wu
Xiao-Chang Liu
18
53
0
05 Aug 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
53
254
0
14 Jul 2021
A comparison of LSTM and GRU networks for learning symbolic sequences
Roberto Cahuantzi
Xinye Chen
S. Güttel
11
135
0
05 Jul 2021
Parts2Words: Learning Joint Embedding of Point Clouds and Texts by Bidirectional Matching between Parts and Words
Chuan Tang
Xi Yang
Bojian Wu
Zhizhong Han
Yi Chang
3DPC
28
13
0
05 Jul 2021
Case Relation Transformer: A Crossmodal Language Generation Model for Fetching Instructions
Motonari Kambara
K. Sugiura
ViT
11
6
0
02 Jul 2021
OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation
Jing Liu
Xinxin Zhu
Fei Liu
Longteng Guo
Zijia Zhao
...
Weining Wang
Hanqing Lu
Shiyu Zhou
Jiajun Zhang
Jinqiao Wang
23
36
0
01 Jul 2021
New Encoder Learning for Captioning Heavy Rain Images via Semantic Visual Feature Matching
Chang-Hwan Son
Pung-Hwi Ye
15
3
0
28 May 2021
Writing by Memorizing: Hierarchical Retrieval-based Medical Report Generation
Xingyi Yang
Muchao Ye
Quanzeng You
Fenglong Ma
MedIm
8
37
0
25 May 2021
Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval
K. Ueki
13
3
0
16 May 2021
End-to-End Attention-based Image Captioning
Carola Sundaramoorthy
Lin Ziwen Kelvin
Mahak Sarin
Shubham Gupta
ViT
17
6
0
30 Apr 2021
Multi-view Deep One-class Classification: A Systematic Exploration
Siqi Wang
Jiyuan Liu
Guang Yu
Xinwang Liu
Sihang Zhou
En Zhu
Yuexiang Yang
Jianping Yin
11
1
0
27 Apr 2021
Towards Open-World Text-Guided Face Image Generation and Manipulation
Weihao Xia
Yujiu Yang
Jing-Hao Xue
Baoyuan Wu
DiffM
33
39
0
18 Apr 2021
Integrating Information Theory and Adversarial Learning for Cross-modal Retrieval
Wei-Neng Chen
Yu Liu
E. Bakker
M. Lew
GAN
11
27
0
11 Apr 2021
A Comprehensive Review of the Video-to-Text Problem
Jesus Perez-Martin
B. Bustos
S. Guimarães
I. Sipiran
Jorge A. Pérez
Grethel Coello Said
13
17
0
27 Mar 2021
Sequential Learning on Liver Tumor Boundary Semantics and Prognostic Biomarker Mining
Jieneng Chen
K. Yan
Yu-Dong Zhang
Youbao Tang
Xun Xu
...
Lingyun Huang
Jing Xiao
Alan Yuille
Ya-Qin Zhang
Le Lu
6
2
0
09 Mar 2021
Analysis of Convolutional Decoder for Image Caption Generation
Sulabh Katiyar
S. Borgohain
13
0
0
08 Mar 2021
A Universal Model for Cross Modality Mapping by Relational Reasoning
Zun Li
Congyan Lang
Liqian Liang
Tao Wang
Songhe Feng
Jun Wu
Yidong Li
11
2
0
26 Feb 2021
Comparative evaluation of CNN architectures for Image Caption Generation
Sulabh Katiyar
S. Borgohain
8
24
0
23 Feb 2021
Image Captioning using Deep Stacked LSTMs, Contextual Word Embeddings and Data Augmentation
Sulabh Katiyar
S. Borgohain
VLM
8
14
0
22 Feb 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units
Wei-Ning Hsu
David F. Harwath
Christopher Song
James R. Glass
CLIP
27
66
0
31 Dec 2020
SubICap: Towards Subword-informed Image Captioning
Naeha Sharif
Bennamoun
Wei Liu
Syed Afaq Ali Shah
20
2
0
24 Dec 2020
AutoCaption: Image Captioning with Neural Architecture Search
Xinxin Zhu
Weining Wang
Longteng Guo
Jing Liu
16
9
0
16 Dec 2020
StacMR: Scene-Text Aware Cross-Modal Retrieval
Andrés Mafla
Rafael Sampaio de Rezende
Lluís Gómez
Diane Larlus
Dimosthenis Karatzas
3DV
37
14
0
08 Dec 2020
TediGAN: Text-Guided Diverse Face Image Generation and Manipulation
Weihao Xia
Yujiu Yang
Jing-Hao Xue
Baoyuan Wu
DiffM
38
23
0
06 Dec 2020
Robust Image Captioning
Daniel Yarnell
Xian Wang
11
0
0
06 Dec 2020
Understanding Guided Image Captioning Performance across Domains
Edwin G. Ng
Bo Pang
P. Sharma
Radu Soricut
21
24
0
04 Dec 2020
BERT-hLSTMs: BERT and Hierarchical LSTMs for Visual Storytelling
Jing Su
Qingyun Dai
Frank Guerin
Mian Zhou
16
24
0
03 Dec 2020
Diverse Image Captioning with Context-Object Split Latent Spaces
Shweta Mahajan
Stefan Roth
11
41
0
02 Nov 2020
Personalized Multimodal Feedback Generation in Education
Haochen Liu
Zitao Liu
Zhongqin Wu
Jiliang Tang
24
9
0
31 Oct 2020
DialogueTRM: Exploring the Intra- and Inter-Modal Emotional Behaviors in the Conversation
Yuzhao Mao
Qi Sun
Guang Liu
Xiaojie Wang
Weiguo Gao
Xuan Li
Jianping Shen
19
24
0
15 Oct 2020
Spatial Attention as an Interface for Image Captioning Models
P. Sadler
12
0
0
29 Sep 2020