Learning to Discretely Compose Reasoning Module Networks for Video Captioning

International Joint Conference on Artificial Intelligence (IJCAI), 2020
17 July 2020
Ganchao Tan
Daqing Liu
Meng Wang
Zhengjun Zha
LRM
ArXiv (abs) | PDF | HTML | GitHub (79★)

Papers citing "Learning to Discretely Compose Reasoning Module Networks for Video Captioning"

21 / 21 papers shown
Reasoning is All You Need for Video Generalization: A Counterfactual Benchmark with Sub-question Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Qiji Zhou
Yifan Gong
Guangsheng Bao
Hongjie Qiu
Jinqiang Li
Xiangrong Zhu
Huajian Zhang
Yue Zhang
LRM
336
3
0
12 Mar 2025
LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
Neural Information Processing Systems (NeurIPS), 2024
Wei Wu
Kecheng Zheng
Shuailei Ma
Fan Lu
Yuxin Guo
Yifei Zhang
Wei Chen
Qingpei Guo
Yujun Shen
Zheng-Jun Zha
VLM
533
28
0
07 Oct 2024
Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting
Fengyi Fu
Shancheng Fang
Weidong Chen
Zhendong Mao
ViT
VGen
210
9
0
19 Apr 2024
JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups
Simindokht Jahangard
Zhixi Cai
Shiki Wen
Hamid Rezatofighi
209
19
0
06 Apr 2024
Video Captioning with Aggregated Features Based on Dual Graphs and Gated Fusion
Yutao Jin
Yinan Han
Jing Wang
196
2
0
13 Aug 2023
Valley: Video Assistant with Large Language model Enhanced abilitY
Ruipu Luo
Ziwang Zhao
Min Yang
Junwei Dong
Da Li
Pengcheng Lu
Tao Wang
Linmei Hu
Ming-Hui Qiu
MLLM
712
262
0
12 Jun 2023
ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst
Zijia Zhao
Longteng Guo
Tongtian Yue
Si-Qing Chen
Shuai Shao
Xinxin Zhu
Zehuan Yuan
Jing Liu
MLLM
385
78
0
25 May 2023
TCR: Short Video Title Generation and Cover Selection with Attention Refinement
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2023
Yu
Jiuding Yang
Weidong Guo
Hui Liu
Yu-Syuan Xu
Di Niu
177
5
0
25 Apr 2023
A Review of Deep Learning for Video Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Moloud Abdar
Meenakshi Kollati
Swaraja Kuraparthi
Farhad Pourpanah
Daniel J. McDuff
...
Shuicheng Yan
Abduallah A. Mohamed
Abbas Khosravi
Xiaoshi Zhong
Fatih Porikli
3DV
273
48
0
22 Apr 2023
Spatial-Aware Token for Weakly Supervised Object Localization
IEEE International Conference on Computer Vision (ICCV), 2023
Ping Wu
Wei Zhai
Yang Cao
Jiebo Luo
Zhengjun Zha
WSOL
356
17
0
18 Mar 2023
Grounding 3D Object Affordance from 2D Interactions in Images
IEEE International Conference on Computer Vision (ICCV), 2023
Yuhang Yang
Wei Zhai
Hongcheng Luo
Yang Cao
Jiebo Luo
Zhengjun Zha
378
67
0
18 Mar 2023
Visual Commonsense-aware Representation Network for Video Captioning
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Pengpeng Zeng
Haonan Zhang
Lianli Gao
Xiangpeng Li
Jin Qian
Hengtao Shen
195
25
0
17 Nov 2022
Robustness Analysis of Video-Language Models Against Visual and Language Perturbations
Neural Information Processing Systems (NeurIPS), 2022
Madeline Chantry Schiappa
Shruti Vyas
Hamid Palangi
Yogesh S Rawat
Vibhav Vineet
VLM
657
32
0
05 Jul 2022
Support-set based Multi-modal Representation Enhancement for Video Captioning
IEEE International Conference on Multimedia and Expo (ICME), 2022
Xiaoya Chen
Jingkuan Song
Pengpeng Zeng
Lianli Gao
Hengtao Shen
155
5
0
19 May 2022
Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos
Arnav Chakravarthy
Zhiyuan Fang
Yezhou Yang
184
2
0
28 Apr 2022
Video Captioning: a comparative review of where we are and which could be the route
Computer Vision and Image Understanding (CVIU), 2022
Daniela Moctezuma
Tania A. Ramirez-delreal
Guillermo Ruiz
Othón González-Chávez
291
17
0
12 Apr 2022
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
Yaya Shi
Xu Yang
Haiyang Xu
Chunfen Yuan
Bing Li
Weiming Hu
Zhengjun Zha
298
44
0
17 Nov 2021
Visual-aware Attention Dual-stream Decoder for Video Captioning
Zhixin Sun
Zhuo Zhou
Shuqin Chen
Lin Li
Luo Zhong
226
4
0
16 Oct 2021
Discriminative Latent Semantic Graph for Video Captioning
ACM Multimedia (ACM MM), 2021
Yang Bai
Junyan Wang
Yang Long
Bingzhang Hu
Yang Song
Maurice Pagnucco
Yu Guan
315
33
0
08 Aug 2021
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language
Hassan Akbari
Hamid Palangi
Jianwei Yang
Sudha Rao
Asli Celikyilmaz
Roland Fernandez
P. Smolensky
Jianfeng Gao
Shih-Fu Chang
259
3
0
18 Nov 2020
Dense Relational Image Captioning via Multi-task Triple-Stream Networks
Dong-Jin Kim
Tae-Hyun Oh
Jinsoo Choi
In So Kweon
397
39
0
08 Oct 2020