MMMORRF: Multimodal Multilingual Modularized Reciprocal Rank Fusion

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025

Saron Samuel

Dan DeGenaro

Jimena Guallar-Blasco

Kate Sanders

...

518

26 Mar 2025

4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding

...

362

22 Mar 2025

VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning

1.1K

17 Mar 2025

OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding

448

13 Mar 2025

Towards Fine-Grained Video Question Answering

329

10 Mar 2025

Towards Data-Efficient Language Models: A Child-Inspired Approach to Language Learning

Mohammad Amin Ghanizadeh

Mohammad Javad Dousti

235

06 Mar 2025

MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval

AAAI Conference on Artificial Intelligence (AAAI), 2024

417

24 Feb 2025

LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection

413

18 Jan 2025

Audio-Language Datasets of Scenes and Events: A Survey

IEEE Access (IEEE Access), 2024

626

10 Jan 2025

Detection, Retrieval, and Explanation Unified: A Violence Detection System Based on Knowledge Graphs and GAT

Wen-Dong Jiang

Chih-Yung Chang

Diptendu Sinha Roy

638

07 Jan 2025

Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning

AAAI Conference on Artificial Intelligence (AAAI), 2024

355

18 Dec 2024

Do Language Models Understand Time?

The Web Conference (WWW), 2024

Xi Ding

Lei Wang

1.0K

18 Dec 2024

VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval

406

02 Dec 2024

Dual-task Mutual Reinforcing Embedded Joint Video Paragraph Retrieval and Grounding

IEEE Transactions on Multimedia (IEEE TMM), 2024

294

26 Nov 2024

Grounded Video Caption Generation

Evangelos Kazakos

Cordelia Schmid

Josef Sivic

328

12 Nov 2024

Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding

ACM Multimedia (MM), 2024

262

17 Oct 2024

MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval

Computer Vision and Pattern Recognition (CVPR), 2024

...

599

15 Oct 2024

Audio Description Generation in the Era of LLMs and VLMs: A Review of Transferable Generative AI Technologies

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

304

11 Oct 2024

VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding

Neural Information Processing Systems (NeurIPS), 2024

388

11 Oct 2024

Realizing Video Summarization from the Path of Language-based Semantic Understanding

Kuan-Chen Mu

Zhi-Yi Chin

Wei-Chen Chiu

216

06 Oct 2024

Language-based Audio Moment Retrieval

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

607

24 Sep 2024

Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding

Haojian Huang

Haodong Chen

...

284

29 Aug 2024

QD-VMR: Query Debiasing with Contextual Understanding Enhancement for Video Moment Retrieval

192

23 Aug 2024

Disentangle and denoise: Tackling context misalignment for video moment retrieval

Yongxiang Li

272

14 Aug 2024

ActPrompt: In-Domain Feature Adaptation via Action Cues for Video Temporal Grounding

Yubin Wang

279

13 Aug 2024

SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses

ACM Multimedia (MM), 2024

478

03 Aug 2024