ResearchTrend.AI

Learning to Discretely Compose Reasoning Module Networks for Video Captioning

17 July 2020
Ganchao Tan
Daqing Liu
Meng Wang
Zhengjun Zha
    LRM

Papers citing "Learning to Discretely Compose Reasoning Module Networks for Video Captioning"

21 / 21 papers shown
Reasoning is All You Need for Video Generalization: A Counterfactual Benchmark with Sub-question Evaluation
Qiji Zhou
Yifan Gong
Guangsheng Bao
Hongjie Qiu
Jinqiang Li
Xiangrong Zhu
Huajian Zhang
Yue Zhang
LRM
44
0
0
12 Mar 2025
Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting
Fengyi Fu
Shancheng Fang
Weidong Chen
Zhendong Mao
ViT
VGen
24
3
0
19 Apr 2024
JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups
Simindokht Jahangard
Zhixi Cai
Shiki Wen
Hamid Rezatofighi
26
6
0
06 Apr 2024
Video Captioning with Aggregated Features Based on Dual Graphs and Gated Fusion
Yutao Jin
Bin Liu
Jing Wang
19
1
0
13 Aug 2023
Valley: Video Assistant with Large Language model Enhanced abilitY
Ruipu Luo
Ziwang Zhao
Min Yang
Junwei Dong
Da Li
Pengcheng Lu
Tao Wang
Linmei Hu
Ming-Hui Qiu
MLLM
36
185
0
12 Jun 2023
ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst
Zijia Zhao
Longteng Guo
Tongtian Yue
Si-Qing Chen
Shuai Shao
Xinxin Zhu
Zehuan Yuan
Jing Liu
MLLM
22
51
0
25 May 2023
TCR: Short Video Title Generation and Cover Selection with Attention Refinement
Yu
Jiuding Yang
Weidong Guo
Hui Liu
Yu-Syuan Xu
Di Niu
17
2
0
25 Apr 2023
A Review of Deep Learning for Video Captioning
Moloud Abdar
Meenakshi Kollati
Swaraja Kuraparthi
Farhad Pourpanah
Daniel J. McDuff
...
Shuicheng Yan
Abduallah A. Mohamed
Abbas Khosravi
Erik Cambria
Fatih Porikli
3DV
14
20
0
22 Apr 2023
Spatial-Aware Token for Weakly Supervised Object Localization
Ping Wu
Wei Zhai
Yang Cao
Jiebo Luo
Zhengjun Zha
WSOL
19
9
0
18 Mar 2023
Grounding 3D Object Affordance from 2D Interactions in Images
Yuhang Yang
Wei Zhai
Hongcheng Luo
Yang Cao
Jiebo Luo
Zhengjun Zha
14
31
0
18 Mar 2023
Visual Commonsense-aware Representation Network for Video Captioning
Pengpeng Zeng
Haonan Zhang
Lianli Gao
Xiangpeng Li
Jin Qian
Hengtao Shen
8
16
0
17 Nov 2022
Robustness Analysis of Video-Language Models Against Visual and Language Perturbations
Madeline Chantry Schiappa
Shruti Vyas
Hamid Palangi
Y. S. Rawat
Vibhav Vineet
VLM
101
17
0
05 Jul 2022
Support-set based Multi-modal Representation Enhancement for Video Captioning
Xiaoya Chen
Jingkuan Song
Pengpeng Zeng
Lianli Gao
Hengtao Shen
14
4
0
19 May 2022
Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos
Arnav Chakravarthy
Zhiyuan Fang
Yezhou Yang
13
2
0
28 Apr 2022
Video Captioning: a comparative review of where we are and which could be the route
Daniela Moctezuma
Tania A. Ramirez-delreal
Guillermo Ruiz
Othón González-Chávez
16
11
0
12 Apr 2022
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
Yaya Shi
Xu Yang
Haiyang Xu
Chunfen Yuan
Bing Li
Weiming Hu
Zhengjun Zha
25
33
0
17 Nov 2021
Visual-aware Attention Dual-stream Decoder for Video Captioning
Zhixin Sun
X. Zhong
Shuqin Chen
Lin Li
Luo Zhong
10
3
0
16 Oct 2021
Discriminative Latent Semantic Graph for Video Captioning
Yang Bai
Junyan Wang
Yang Long
Bingzhang Hu
Yang Song
M. Pagnucco
Yu Guan
36
31
0
08 Aug 2021
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language
Hassan Akbari
Hamid Palangi
Jianwei Yang
Sudha Rao
Asli Celikyilmaz
Roland Fernandez
P. Smolensky
Jianfeng Gao
Shih-Fu Chang
12
3
0
18 Nov 2020
Dense Relational Image Captioning via Multi-task Triple-Stream Networks
Dong-Jin Kim
Tae-Hyun Oh
Jinsoo Choi
In So Kweon
19
27
0
08 Oct 2020
Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network
Bairui Wang
Lin Ma
Wei Zhang
Wenhao Jiang
Jingwen Wang
Wei Liu
60
158
0
27 Aug 2019