DSTC8-AVSD: Multimodal Semantic Transformer Network with Retrieval Style Word Generator

1 April 2020
Hwanhee Lee
Seunghyun Yoon
Franck Dernoncourt
Doo Soon Kim
Trung Bui
Kyomin Jung

Papers citing "DSTC8-AVSD: Multimodal Semantic Transformer Network with Retrieval Style Word Generator"

11 / 11 papers shown
HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Sunjae Yoon
Dahyun Kim
Eunseop Yoon
Hee Suk Yoon
Junyeong Kim
C. Yoo
475
13
0
15 Dec 2023
Uncovering Hidden Connections: Iterative Search and Reasoning for Video-grounded Dialog
Haoyu Zhang
Meng Liu
Yaowei Wang
Da Cao
Weili Guan
Liqiang Nie
447
1
0
11 Oct 2023
Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Sunjae Yoon
Eunseop Yoon
Hee Suk Yoon
Junyeong Kim
Changdong Yoo
213
27
0
12 Dec 2022
End-to-End Multimodal Representation Learning for Video Dialog
Huda AlAmri
Anthony Bilic
Michael Hu
Apoorva Beedu
Irfan Essa
251
7
0
26 Oct 2022
Video Dialog as Conversation about Objects Living in Space-Time
European Conference on Computer Vision (ECCV), 2022
H. Pham
T. Le
Vuong Le
Tu Minh Phuong
T. Tran
261
14
0
08 Jul 2022
$C^3$: Compositional Counterfactual Contrastive Learning for Video-grounded Dialogues
Hung Le
Nancy F. Chen
Guosheng Lin
189
2
0
16 Jun 2021
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Hung Le
Nancy F. Chen
Guosheng Lin
MLLM
334
21
0
16 Apr 2021
Structured Co-reference Graph Attention for Video-grounded Dialogue
AAAI Conference on Artificial Intelligence (AAAI), 2021
Junyeong Kim
Sunjae Yoon
Dahyun Kim
Chang D. Yoo
239
31
0
24 Mar 2021
Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues
International Conference on Learning Representations (ICLR), 2021
Hung Le
Nancy F. Chen
Guosheng Lin
261
18
0
01 Mar 2021
Look Before you Speak: Visually Contextualized Utterances
Computer Vision and Pattern Recognition (CVPR), 2020
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
393
71
0
10 Dec 2020
TMT: A Transformer-based Modal Translator for Improving Multimodal Sequence Representations in Audio Visual Scene-aware Dialog
Interspeech, 2020
Wubo Li
Dongwei Jiang
Wei Zou
Xiangang Li
210
6
0
21 Oct 2020