ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2406.10079
  4. Cited By
Localizing Events in Videos with Multimodal Queries

Localizing Events in Videos with Multimodal Queries

14 June 2024
Gengyuan Zhang
Mang Ling Ada Fok
Yan Xia
Yansong Tang
Daniel Cremers
Philip H. S. Torr
Volker Tresp
Jindong Gu
ArXivPDFHTML

Papers citing "Localizing Events in Videos with Multimodal Queries"

9 / 9 papers shown
Title
Glance and Focus: Memory Prompting for Multi-Event Video Question
  Answering
Glance and Focus: Memory Prompting for Multi-Event Video Question Answering
Ziyi Bai
Ruiping Wang
Xilin Chen
85
8
0
03 Jan 2024
MH-DETR: Video Moment and Highlight Detection with Cross-modal
  Transformer
MH-DETR: Video Moment and Highlight Detection with Cross-modal Transformer
Yifang Xu
Yunzhuo Sun
Yang Li
Yilei Shi
Xiaoxia Zhu
S. Du
ViT
35
33
0
29 Apr 2023
Query-Dependent Video Representation for Moment Retrieval and Highlight
  Detection
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection
WonJun Moon
Sangeek Hyun
S. Park
Dongchan Park
Jae-Pil Heo
ViT
41
104
0
24 Mar 2023
Learning Grounded Vision-Language Representation for Versatile
  Understanding in Untrimmed Videos
Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos
Teng Wang
Jinrui Zhang
Feng Zheng
Wenhao Jiang
Ran Cheng
Ping Luo
VLM
26
10
0
11 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video
  Temporal Grounding
CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding
Zhijian Hou
Wanjun Zhong
Lei Ji
Difei Gao
Kun Yan
W. Chan
Chong-Wah Ngo
Zheng Shou
Nan Duan
AI4TS
27
23
0
22 Sep 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
224
1,017
0
13 Oct 2021
Zero-shot Natural Language Video Localization
Zero-shot Natural Language Video Localization
Jinwoo Nam
Daechul Ahn
Dongyeop Kang
S. Ha
Jonghyun Choi
87
43
0
29 Aug 2021
Natural Language Video Localization: A Revisit in Span-based Question
  Answering Framework
Natural Language Video Localization: A Revisit in Span-based Question Answering Framework
Hao Zhang
Aixin Sun
Wei Jing
Liangli Zhen
Joey Tianyi Zhou
Rick Siow Mong Goh
111
84
0
26 Feb 2021
1