ResearchTrend.AI
SlideSpeech: A Large-Scale Slide-Enriched Audio-Visual Corpus

11 September 2023
Haoxu Wang
Fan Yu
Xian Shi
Yuezhang Wang
Shiliang Zhang
Ming Li

Papers citing "SlideSpeech: A Large-Scale Slide-Enriched Audio-Visual Corpus"

6 / 6 papers shown
DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering
H. Wang
Kai Hu
Liangcai Gao
129
0
0
20 Mar 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-Wei Chang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
82
3
0
26 Feb 2025
Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy?
Yiwen Guan
V. Trinh
Vivek Voleti
Jacob Whitehill
34
1
0
13 Sep 2024
MaLa-ASR: Multimedia-Assisted LLM-Based ASR
Guanrou Yang
Ziyang Ma
Fan Yu
Zhifu Gao
Shiliang Zhang
Xie Chen
AuLLM
36
2
0
09 Jun 2024
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma
Stavros Petridis
M. Pantic
79
224
0
12 Feb 2021
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
214
2,233
0
14 Jun 2018