ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2212.04501
  4. Cited By
Learning Video Representations from Large Language Models

Learning Video Representations from Large Language Models

8 December 2022
Yue Zhao
Ishan Misra
Philipp Krahenbuhl
Rohit Girdhar
    VLM
    AI4TS
ArXivPDFHTML

Papers citing "Learning Video Representations from Large Language Models"

31 / 31 papers shown
Title
Object-Shot Enhanced Grounding Network for Egocentric Video
Object-Shot Enhanced Grounding Network for Egocentric Video
Yisen Feng
Haoyu Zhang
Meng Liu
Weili Guan
Liqiang Nie
36
0
0
07 May 2025
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?
Shreyank N. Gowda
Boyan Gao
Xiao Gu
Xiaobo Jin
VLM
34
0
0
02 Apr 2025
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
Y. Liu
Kevin Qinghong Lin
C. Chen
Mike Zheng Shou
LM&Ro
LRM
76
0
0
17 Mar 2025
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
Xin Ding
Hao Wu
Y. Yang
Shiqi Jiang
Donglin Bai
Zhibo Chen
Ting Cao
74
0
0
08 Mar 2025
LAGUNA: LAnguage Guided UNsupervised Adaptation with structured spaces
LAGUNA: LAnguage Guided UNsupervised Adaptation with structured spaces
Anxhelo Diko
Antonino Furnari
Luigi Cinque
G. Farinella
88
0
0
23 Nov 2024
Mamba Fusion: Learning Actions Through Questioning
Mamba Fusion: Learning Actions Through Questioning
Zhikang Dong
Apoorva Beedu
Jason Sheinkopf
Irfan Essa
Mamba
60
2
0
17 Sep 2024
Ego-VPA: Egocentric Video Understanding with Parameter-efficient Adaptation
Ego-VPA: Egocentric Video Understanding with Parameter-efficient Adaptation
Tz-Ying Wu
Kyle Min
Subarna Tripathi
Nuno Vasconcelos
EgoV
53
0
0
28 Jul 2024
Can Large Language Models Understand Spatial Audio?
Can Large Language Models Understand Spatial Audio?
Changli Tang
Wenyi Yu
Guangzhi Sun
Xianzhao Chen
Tian Tan
...
Jun Zhang
Lu Lu
Zejun Ma
Yuxuan Wang
Chao Zhang
44
4
0
12 Jun 2024
HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision
HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision
Siddhant Bansal
Michael Wray
Dima Damen
31
3
0
15 Apr 2024
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action
  Generalization
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization
Anna Kukleva
Fadime Sener
Edoardo Remelli
Bugra Tekin
Eric Sauser
Bernt Schiele
Shugao Ma
VLM
EgoV
29
1
0
28 Mar 2024
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
Yifei Huang
Guo Chen
Jilan Xu
Mingfang Zhang
Lijin Yang
...
Hongjie Zhang
Lu Dong
Yali Wang
Limin Wang
Yu Qiao
EgoV
54
35
0
24 Mar 2024
Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency
Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency
Akila Wickramasekara
F. Breitinger
Mark Scanlon
42
7
0
29 Feb 2024
RobWE: Robust Watermark Embedding for Personalized Federated Learning
  Model Ownership Protection
RobWE: Robust Watermark Embedding for Personalized Federated Learning Model Ownership Protection
Yang Xu
Yunlin Tan
Cheng Zhang
Kai Chi
Peng Sun
Wenyuan Yang
Ju Ren
Hongbo Jiang
Yaoxue Zhang
FedML
43
2
0
29 Feb 2024
Video ReCap: Recursive Captioning of Hour-Long Videos
Video ReCap: Recursive Captioning of Hour-Long Videos
Md. Mohaiminul Islam
Ngan Ho
Xitong Yang
Tushar Nagarajan
Lorenzo Torresani
Gedas Bertasius
VGen
VLM
27
44
0
20 Feb 2024
Enhancing Zero-shot Counting via Language-guided Exemplar Learning
Enhancing Zero-shot Counting via Language-guided Exemplar Learning
Mingjie Wang
Jun Zhou
Yong Dai
Eric Buys
Minglun Gong
31
0
0
08 Feb 2024
A Strong Baseline for Temporal Video-Text Alignment
A Strong Baseline for Temporal Video-Text Alignment
Zeqian Li
Qirui Chen
Tengda Han
Ya-Qin Zhang
Yanfeng Wang
Weidi Xie
AI4TS
VGen
24
5
0
21 Dec 2023
LEAP: LLM-Generation of Egocentric Action Programs
LEAP: LLM-Generation of Egocentric Action Programs
Eadom Dessalene
Michael Maynord
Cornelia Fermuller
Yiannis Aloimonos
16
3
0
29 Nov 2023
GG-LLM: Geometrically Grounding Large Language Models for Zero-shot
  Human Activity Forecasting in Human-Aware Task Planning
GG-LLM: Geometrically Grounding Large Language Models for Zero-shot Human Activity Forecasting in Human-Aware Task Planning
Moritz Graule
Volkan Isler
LM&Ro
19
14
0
30 Oct 2023
SALMONN: Towards Generic Hearing Abilities for Large Language Models
SALMONN: Towards Generic Hearing Abilities for Large Language Models
Changli Tang
Wenyi Yu
Guangzhi Sun
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
LM&MA
AuLLM
35
199
0
20 Oct 2023
Large Models for Time Series and Spatio-Temporal Data: A Survey and
  Outlook
Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Ming Jin
Qingsong Wen
Yuxuan Liang
Chaoli Zhang
Siqiao Xue
...
Shirui Pan
Vincent S. Tseng
Yu Zheng
Lei Chen
Hui Xiong
AI4TS
SyDa
31
116
0
16 Oct 2023
Fine-grained Audio-Visual Joint Representations for Multimodal Large
  Language Models
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models
Guangzhi Sun
Wenyi Yu
Changli Tang
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
28
12
0
09 Oct 2023
Training a Large Video Model on a Single Machine in a Day
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
25
15
0
28 Sep 2023
MAGIC-TBR: Multiview Attention Fusion for Transformer-based Bodily
  Behavior Recognition in Group Settings
MAGIC-TBR: Multiview Attention Fusion for Transformer-based Bodily Behavior Recognition in Group Settings
Surbhi Madan
Rishabh Jain
Gulshan Sharma
Ramanathan Subramanian
Abhinav Dhall
25
2
0
19 Sep 2023
Language-based Action Concept Spaces Improve Video Self-Supervised
  Learning
Language-based Action Concept Spaces Improve Video Self-Supervised Learning
Kanchana Ranasinghe
Michael S. Ryoo
SSL
VLM
20
12
0
20 Jul 2023
VicTR: Video-conditioned Text Representations for Activity Recognition
VicTR: Video-conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya
Anurag Arnab
Arsha Nagrani
Michael S. Ryoo
22
19
0
05 Apr 2023
Omnivore: A Single Model for Many Visual Modalities
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar
Mannat Singh
Nikhil Ravi
L. V. D. van der Maaten
Armand Joulin
Ishan Misra
209
225
0
20 Jan 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
224
1,017
0
13 Oct 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text
  Understanding
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Florian Metze Luke Zettlemoyer Christoph Feichtenhofer
CLIP
VLM
245
557
0
28 Sep 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy
  Text Supervision
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
278
1,978
0
09 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Mohit Bansal
MLLM
249
525
0
04 Feb 2021
1