2203.14104
Cited By
Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos
26 March 2022
Muheng Li
Lei Chen
Yueqi Duan
Zhilan Hu
Jianjiang Feng
Jie Zhou
Jiwen Lu
Papers citing
"Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos"
16 / 16 papers shown
Title
M2R2: MultiModal Robotic Representation for Temporal Action Segmentation
Daniel Sliwowski
Dongheui Lee
19
1
0
25 Apr 2025
LEAP: LLM-Generation of Egocentric Action Programs
Eadom Dessalene
Michael Maynord
Cornelia Fermuller
Yiannis Aloimonos
16
3
0
29 Nov 2023
Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition
Jiaming Zhou
Hanjun Li
Kun-Yu Lin
Junwei Liang
16
1
0
28 Nov 2023
BIT: Bi-Level Temporal Modeling for Efficient Supervised Action Segmentation
Zijia Lu
Ehsan Elhamifar
38
2
0
28 Aug 2023
Enhancing Transformer Backbone for Egocentric Video Action Segmentation
Sakib Reza
Balaji Sundareshan
Mohsen Moghaddam
Octavia Camps
ViT
16
4
0
19 May 2023
Procedure-Aware Pretraining for Instructional Video Understanding
Honglu Zhou
Roberto Martín-Martín
Mubbasir Kapadia
Silvio Savarese
Juan Carlos Niebles
23
38
0
31 Mar 2023
SelfPromer: Self-Prompt Dehazing Transformers with Depth-Consistency
Cong Wang
Jin-shan Pan
Wanyu Lin
Jiangxin Dong
Xiaomei Wu
VLM
MDE
26
39
0
13 Mar 2023
I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification
Muhammad Ferjad Naeem
Muhammad Gul Zain Ali Khan
Yongqin Xian
Muhammad Zeshan Afzal
D. Stricker
Luc Van Gool
F. Tombari
VLM
22
51
0
05 Dec 2022
ASFormer: Transformer for Action Segmentation
Fangqiu Yi
Hongyu Wen
Tingting Jiang
ViT
69
168
0
16 Oct 2021
ActionCLIP: A New Paradigm for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Yong Liu
VLM
149
360
0
17 Sep 2021
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLM
CLIP
VLM
322
2,249
0
02 Sep 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
Video Summarization Using Deep Neural Networks: A Survey
Evlampios Apostolidis
E. Adamantidou
Alexandros I. Metsai
Vasileios Mezaris
Ioannis Patras
AI4TS
55
196
0
15 Jan 2021
Global2Local: Efficient Structure Search for Video Action Segmentation
Shanghua Gao
Qi Han
Zhong-Yu Li
Pai Peng
Liang Wang
Ming-Ming Cheng
EgoV
81
72
0
04 Jan 2021
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Karteek Alahari
Cordelia Schmid
ViT
401
594
0
21 Jul 2020
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
Timo Schick
Hinrich Schütze
256
1,584
0
21 Jan 2020