Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)

20 December 2014

Yi Yang

Papers citing "Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)"

Showing 50 of 417 citing papers.

Automatic Rule Induction for Interpretable Semi-Supervised Learning. Reid Pryzant, Ziyi Yang, Yichong Xu, Chenguang Zhu, Michael Zeng. 18 May 2022.
Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning. Chia-Wen Kuo, Z. Kira. 09 May 2022.
Diverse Image Captioning with Grounded Style. Franz Klein, Shweta Mahajan, S. Roth. 03 May 2022.
X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks. Zhaowei Cai, Gukyeong Kwon, Avinash Ravichandran, Erhan Bas, Z. Tu, Rahul Bhotika, Stefano Soatto. 12 Apr 2022.
On Distinctive Image Captioning via Comparing and Reweighting. Jiuniu Wang, Wenjia Xu, Qingzhong Wang, Antoni B. Chan. 08 Apr 2022.
Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition. Peipei Zhu, Xiao Wang, Yong Luo, Zhenglong Sun, Wei-Shi Zheng, Yaowei Wang, C. L. P. Chen. 07 Mar 2022.
Vision-Language Intelligence: Tasks, Representation Learning, and Large Models. Feng Li, Hao Zhang, Yi-Fan Zhang, S. Liu, Jian Guo, L. Ni, Pengchuan Zhang, Lei Zhang. 03 Mar 2022.
Inference of captions from histopathological patches. M. Tsuneki, F. Kanavati. 07 Feb 2022.
EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators. Lois Orosa, Skanda Koppula, Yaman Umuroglu, Konstantinos Kanellopoulos, Juan Gómez Luna, Michaela Blott, K. Vissers, O. Mutlu. 04 Feb 2022.
Multi-Label Classification on Remote-Sensing Images. A. Singh, B. Uma Shankar. 06 Jan 2022.
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic. Yoad Tewel, Yoav Shalev, Idan Schwartz, Lior Wolf. 29 Nov 2021.
Contrastive Learning of Visual-Semantic Embeddings. Anurag Jain, Yashaswi Verma. 17 Oct 2021.
Geometry Attention Transformer with Position-aware LSTMs for Image Captioning. Chi-Yin Wang, Yulin Shen, Luping Ji. 01 Oct 2021.
Cross Modification Attention Based Deliberation Model for Image Captioning. Zheng Lian, Yanan Zhang, Haichang Li, Rui Wang, Xiaohui Hu. 17 Sep 2021.
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention. Katsuyuki Nakamura, Hiroki Ohashi, Mitsuhiro Okada. 07 Sep 2021.
Group-based Distinctive Image Captioning with Memory Attention. Jiuniu Wang, Wenjia Xu, Qingzhong Wang, Antoni B. Chan. 20 Aug 2021.
Caption Generation on Scenes with Seen and Unseen Object Categories. B. Demirel, R. G. Cinbis. 13 Aug 2021.
A Better Loss for Visual-Textual Grounding. Davide Rigoni, Luciano Serafini, A. Sperduti. 11 Aug 2021.
Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning. Bryan Wang, Gang Li, Xin Zhou, Zhourong Chen, Tovi Grossman, Yang Li. 07 Aug 2021.
Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval. Xuri Ge, Fuhai Chen, J. Jose, Zhilong Ji, Zhongqin Wu, Xiao-Chang Liu. 05 Aug 2021.
From Show to Tell: A Survey on Deep Learning-based Image Captioning. Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, S. Cascianelli, G. Fiameni, Rita Cucchiara. 14 Jul 2021.
A comparison of LSTM and GRU networks for learning symbolic sequences. Roberto Cahuantzi, Xinye Chen, S. Güttel. 05 Jul 2021.
Parts2Words: Learning Joint Embedding of Point Clouds and Texts by Bidirectional Matching between Parts and Words. Chuan Tang, Xi Yang, Bojian Wu, Zhizhong Han, Yi Chang. 05 Jul 2021.
Case Relation Transformer: A Crossmodal Language Generation Model for Fetching Instructions. Motonari Kambara, K. Sugiura. 02 Jul 2021.
OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation. Jing Liu, Xinxin Zhu, Fei Liu, Longteng Guo, Zijia Zhao, ..., Weining Wang, Hanqing Lu, Shiyu Zhou, Jiajun Zhang, Jinqiao Wang. 01 Jul 2021.
New Encoder Learning for Captioning Heavy Rain Images via Semantic Visual Feature Matching. Chang-Hwan Son, Pung-Hwi Ye. 28 May 2021.
Writing by Memorizing: Hierarchical Retrieval-based Medical Report Generation. Xingyi Yang, Muchao Ye, Quanzeng You, Fenglong Ma. 25 May 2021.
Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval. K. Ueki. 16 May 2021.
End-to-End Attention-based Image Captioning. Carola Sundaramoorthy, Lin Ziwen Kelvin, Mahak Sarin, Shubham Gupta. 30 Apr 2021.
Multi-view Deep One-class Classification: A Systematic Exploration. Siqi Wang, Jiyuan Liu, Guang Yu, Xinwang Liu, Sihang Zhou, En Zhu, Yuexiang Yang, Jianping Yin. 27 Apr 2021.
Towards Open-World Text-Guided Face Image Generation and Manipulation. Weihao Xia, Yujiu Yang, Jing-Hao Xue, Baoyuan Wu. 18 Apr 2021.
Integrating Information Theory and Adversarial Learning for Cross-modal Retrieval. Wei-Neng Chen, Yu Liu, E. Bakker, M. Lew. 11 Apr 2021.
A Comprehensive Review of the Video-to-Text Problem. Jesus Perez-Martin, B. Bustos, S. Guimarães, I. Sipiran, Jorge A. Pérez, Grethel Coello Said. 27 Mar 2021.
Sequential Learning on Liver Tumor Boundary Semantics and Prognostic Biomarker Mining. Jieneng Chen, K. Yan, Yu-Dong Zhang, Youbao Tang, Xun Xu, ..., Lingyun Huang, Jing Xiao, Alan Yuille, Ya-Qin Zhang, Le Lu. 09 Mar 2021.
Analysis of Convolutional Decoder for Image Caption Generation. Sulabh Katiyar, S. Borgohain. 08 Mar 2021.
A Universal Model for Cross Modality Mapping by Relational Reasoning. Zun Li, Congyan Lang, Liqian Liang, Tao Wang, Songhe Feng, Jun Wu, Yidong Li. 26 Feb 2021.
Comparative evaluation of CNN architectures for Image Caption Generation. Sulabh Katiyar, S. Borgohain. 23 Feb 2021.
Image Captioning using Deep Stacked LSTMs, Contextual Word Embeddings and Data Augmentation. Sulabh Katiyar, S. Borgohain. 22 Feb 2021.
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units. Wei-Ning Hsu, David F. Harwath, Christopher Song, James R. Glass. 31 Dec 2020.
SubICap: Towards Subword-informed Image Captioning. Naeha Sharif, Bennamoun, Wei Liu, Syed Afaq Ali Shah. 24 Dec 2020.
AutoCaption: Image Captioning with Neural Architecture Search. Xinxin Zhu, Weining Wang, Longteng Guo, Jing Liu. 16 Dec 2020.
StacMR: Scene-Text Aware Cross-Modal Retrieval. Andrés Mafla, Rafael Sampaio de Rezende, Lluís Gómez, Diane Larlus, Dimosthenis Karatzas. 08 Dec 2020.
TediGAN: Text-Guided Diverse Face Image Generation and Manipulation. Weihao Xia, Yujiu Yang, Jing-Hao Xue, Baoyuan Wu. 06 Dec 2020.
Robust Image Captioning. Daniel Yarnell, Xian Wang. 06 Dec 2020.
Understanding Guided Image Captioning Performance across Domains. Edwin G. Ng, Bo Pang, P. Sharma, Radu Soricut. 04 Dec 2020.
BERT-hLSTMs: BERT and Hierarchical LSTMs for Visual Storytelling. Jing Su, Qingyun Dai, Frank Guerin, Mian Zhou. 03 Dec 2020.
Diverse Image Captioning with Context-Object Split Latent Spaces. Shweta Mahajan, Stefan Roth. 02 Nov 2020.
Personalized Multimodal Feedback Generation in Education. Haochen Liu, Zitao Liu, Zhongqin Wu, Jiliang Tang. 31 Oct 2020.
DialogueTRM: Exploring the Intra- and Inter-Modal Emotional Behaviors in the Conversation. Yuzhao Mao, Qi Sun, Guang Liu, Xiaojie Wang, Weiguo Gao, Xuan Li, Jianping Shen. 15 Oct 2020.
Spatial Attention as an Interface for Image Captioning Models. P. Sadler. 29 Sep 2020.