ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Egocentric Video-Language Pretraining

3 June 2022
Kevin Qinghong Lin
Alex Jinpeng Wang
Mattia Soldan
Michael Wray
Rui Yan
Eric Z. Xu
Difei Gao
Rong-Cheng Tu
Wenzhe Zhao
Weijie Kong
Chengfei Cai
Hongfa Wang
Dima Damen
Bernard Ghanem
Wei Liu
Mike Zheng Shou
    VLM
    EgoV
arXiv: 2206.01670

Papers citing "Egocentric Video-Language Pretraining"

37 / 37 papers shown
Object-Shot Enhanced Grounding Network for Egocentric Video
Yisen Feng
Haoyu Zhang
Meng Liu
Weili Guan
Liqiang Nie
36
0
0
07 May 2025
Reducing Annotation Burden in Physical Activity Research Using Vision-Language Models
Abram Schonfeldt
Benjamin Maylor
Xiaofang Chen
Ronald Clark
Aiden Doherty
62
0
0
06 May 2025
Vision and Intention Boost Large Language Model in Long-Term Action Anticipation
Congqi Cao
Lanshu Hu
Yating Yu
Y. Zhang
VLM
73
0
0
03 May 2025
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
Y. Liu
Kevin Qinghong Lin
C. Chen
Mike Zheng Shou
LM&Ro
LRM
76
0
0
17 Mar 2025
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
Baoqi Pei
Y. Huang
Jilan Xu
Guo Chen
Yuping He
...
Yali Wang
Weidi Xie
Yu Qiao
Fei Wu
Limin Wang
41
0
0
02 Mar 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
79
2
0
10 Jan 2025
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
Lu Qiu
Yuying Ge
Yi Chen
Yixiao Ge
Ying Shan
Xihui Liu
LLMAG
LRM
94
5
0
05 Dec 2024
SEAL: Semantic Attention Learning for Long Video Representation
Lan Wang
Yujia Chen
Wen-Sheng Chu
Vishnu Naresh Boddeti
Du Tran
VLM
72
0
0
02 Dec 2024
LAGUNA: LAnguage Guided UNsupervised Adaptation with structured spaces
Anxhelo Diko
Antonino Furnari
Luigi Cinque
G. Farinella
88
0
0
23 Nov 2024
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
Kun Yuan
V. Srivastav
Nassir Navab
N. Padoy
44
7
0
30 Sep 2024
ExpertAF: Expert Actionable Feedback from Video
Kumar Ashutosh
Tushar Nagarajan
Georgios Pavlakos
Kris M. Kitani
Kristen Grauman
VGen
42
2
0
01 Aug 2024
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
41
5
0
31 Jul 2024
Ego-VPA: Egocentric Video Understanding with Parameter-efficient Adaptation
Tz-Ying Wu
Kyle Min
Subarna Tripathi
Nuno Vasconcelos
EgoV
53
0
0
28 Jul 2024
CaRe-Ego: Contact-aware Relationship Modeling for Egocentric Interactive Hand-object Segmentation
Yuejiao Su
Yi Wang
Lap-Pui Chau
57
1
0
08 Jul 2024
HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision
Siddhant Bansal
Michael Wray
Dima Damen
31
3
0
15 Apr 2024
VideoDistill: Language-aware Vision Distillation for Video Question Answering
Bo Zou
Chao Yang
Yu Qiao
Chengbin Quan
Youjian Zhao
VGen
39
1
0
01 Apr 2024
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization
Anna Kukleva
Fadime Sener
Edoardo Remelli
Bugra Tekin
Eric Sauser
Bernt Schiele
Shugao Ma
VLM
EgoV
29
1
0
28 Mar 2024
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
Yifei Huang
Guo Chen
Jilan Xu
Mingfang Zhang
Lijin Yang
...
Hongjie Zhang
Lu Dong
Yali Wang
Limin Wang
Yu Qiao
EgoV
54
35
0
24 Mar 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro
LLMAG
48
35
0
16 Jan 2024
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos
Tanveer Hannan
Md. Mohaiminul Islam
Thomas Seidl
Gedas Bertasius
26
3
0
11 Dec 2023
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
Rohan Myer Krishnan
Zitian Tang
Zhiqiu Yu
Chen Sun
40
1
0
30 Nov 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
25
15
0
28 Sep 2023
EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding
Yue Xu
Yong-Lu Li
Zhemin Huang
Michael Xu Liu
Cewu Lu
Yu-Wing Tai
Chi-Keung Tang
EgoV
18
9
0
05 Sep 2023
ARGUS: Visualization of AI-Assisted Task Guidance in AR
Sonia Castelo
Joao Rulff
Erin McGowan
Bea Steers
Guande Wu
...
Qinghong Sun
Huy Q. Vo
J. P. Bello
M. Krone
Claudio Silva
29
18
0
11 Aug 2023
NMS Threshold matters for Ego4D Moment Queries -- 2nd place solution to the Ego4D Moment Queries Challenge 2023
Lin Sui
Fangzhou Mu
Yin Li
20
2
0
05 Jul 2023
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment
Zihui Xue
Kristen Grauman
EgoV
14
30
0
08 Jun 2023
Procedure-Aware Pretraining for Instructional Video Understanding
Honglu Zhou
Roberto Martín-Martín
Mubbasir Kapadia
Silvio Savarese
Juan Carlos Niebles
23
38
0
31 Mar 2023
MINOTAUR: Multi-task Video Grounding From Multimodal Queries
Raghav Goyal
E. Mavroudi
Xitong Yang
Sainbayar Sukhbaatar
Leonid Sigal
Matt Feiszli
Lorenzo Torresani
Du Tran
8
7
0
16 Feb 2023
NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory
Santhosh Kumar Ramakrishnan
Ziad Al-Halah
Kristen Grauman
77
39
0
02 Jan 2023
Egocentric Video Task Translation
Zihui Xue
Yale Song
Kristen Grauman
Lorenzo Torresani
EgoV
21
13
0
13 Dec 2022
PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data
Roei Herzig
Ofir Abramovich
Elad Ben-Avraham
Assaf Arbelle
Leonid Karlinsky
Ariel Shamir
Trevor Darrell
Amir Globerson
32
16
0
08 Dec 2022
Where a Strong Backbone Meets Strong Features -- ActionFormer for Ego4D Moment Queries Challenge
Fangzhou Mu
Sicheng Mo
Gillian Wang
Yin Li
10
3
0
16 Nov 2022
Egocentric Video-Language Pretraining @ Ego4D Challenge 2022
Kevin Qinghong Lin
Alex Jinpeng Wang
Mattia Soldan
Michael Wray
Rui Yan
...
Hongfa Wang
Dima Damen
Bernard Ghanem
Wei Liu
Mike Zheng Shou
EgoV
16
7
0
04 Jul 2022
The Metaverse Data Deluge: What Can We Do About It?
Beng Chin Ooi
Gang Chen
Mike Zheng Shou
K. Tan
A. Tung
X. Xiao
J. Yip
Meihui Zhang
11
10
0
14 Jun 2022
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
B. Wong
Joya Chen
You Wu
Stan Weixian Lei
Dongxing Mao
Difei Gao
Mike Zheng Shou
EgoV
27
27
0
08 Mar 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
224
1,017
0
13 Oct 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
278
1,978
0
09 Feb 2021