arXiv:2205.01657
Cross-modal Representation Learning for Zero-shot Action Recognition
3 May 2022
Chung-Ching Lin, Kevin Qinghong Lin, Linjie Li, Lijuan Wang, Zicheng Liu
Tags: ViT
Papers citing
"Cross-modal Representation Learning for Zero-shot Action Recognition"
23 / 23 papers shown
| Title | Authors | Tags | Metrics | Date |
| --- | --- | --- | --- | --- |
| ACE: Action Concept Enhancement of Video-Language Models in Procedural Videos | Reza Ghoddoosian, Nakul Agarwal, Isht Dwivedi, Behzad Dariush | | 57 / 0 / 0 | 23 Nov 2024 |
| Hierarchical Multi-modal Transformer for Cross-modal Long Document Classification | Tengfei Liu, Yongli Hu, Junbin Gao, Yanfeng Sun, Baocai Yin | | 26 / 0 / 0 | 14 Jul 2024 |
| ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition | Jiaming Zhou, Junwei Liang, Kun-Yu Lin, Jinrui Yang, Wei-Shi Zheng | VLM | 16 / 8 / 0 | 22 Jan 2024 |
| Source-Free Cross-Modal Knowledge Transfer by Unleashing the Potential of Task-Irrelevant Data | Jinjin Zhu, Yucheng Chen, Lin Wang | | 25 / 2 / 0 | 10 Jan 2024 |
| TENT: Connect Language Models with IoT Sensors for Zero-Shot Activity Recognition | Yunjiao Zhou, Jianfei Yang, Han Zou, Lihua Xie | VLM | 21 / 16 / 0 | 14 Nov 2023 |
| Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Zuxuan Wu, Zejia Weng, Wujian Peng, Xitong Yang, Ang Li, Larry S. Davis, Yu-Gang Jiang | CLIP, VLM | 28 / 21 / 0 | 08 Oct 2023 |
| Orthogonal Temporal Interpolation for Zero-Shot Video Recognition | Yan Zhu, Junbao Zhuo, B. Ma, Jiajia Geng, Xiaoming Wei, Xiaolin K. Wei, Shuhui Wang | VLM | 22 / 5 / 0 | 14 Aug 2023 |
| Synthetic Sample Selection for Generalized Zero-Shot Learning | Shreyank N. Gowda | | 11 / 16 / 0 | 06 Apr 2023 |
| Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos | Sixun Dong, Huazhang Hu, Dongze Lian, Weixin Luo, Yichen Qian, Shenghua Gao | ViT, AI4TS | 21 / 11 / 0 | 22 Mar 2023 |
| Improving Zero-Shot Action Recognition using Human Instruction with Text Description | Na Wu, Hiroshi Kera, K. Kawamoto | | 13 / 7 / 0 | 21 Jan 2023 |
| Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang | | 94 / 47 / 0 | 31 Dec 2022 |
| REST: REtrieve & Self-Train for generative action recognition | Adrian Bulat, Enrique Sanchez, Brais Martínez, Georgios Tzimiropoulos | VLM | 14 / 4 / 0 | 29 Sep 2022 |
| Vision Transformers for Action Recognition: A Survey | Anwaar Ulhaq, Naveed Akhtar, Ganna Pogrebna, Ajmal Saeed Mian | ViT | 19 / 43 / 0 | 13 Sep 2022 |
| Temporal and cross-modal attention for audio-visual zero-shot learning | Otniel-Bogdan Mercea, Thomas Hummel, A. Sophia Koepke, Zeynep Akata | | 19 / 25 / 0 | 20 Jul 2022 |
| Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models | Rui Qian, Yeqing Li, Zheng Xu, Ming Yang, Serge J. Belongie, Yin Cui | VLM | 30 / 22 / 0 | 15 Jul 2022 |
| Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | Wenhao Wu, Zhun Sun, Wanli Ouyang | VLM | 87 / 93 / 0 | 04 Jul 2022 |
| Universal Prototype Transport for Zero-Shot Action Recognition and Localization | Pascal Mettes | | 14 / 5 / 0 | 08 Mar 2022 |
| Pix2seq: A Language Modeling Framework for Object Detection | Ting Chen, Saurabh Saxena, Lala Li, David J. Fleet, Geoffrey E. Hinton | MLLM, ViT, VLM | 233 / 341 / 0 | 22 Sep 2021 |
| Zero-Shot Text-to-Image Generation | Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever | VLM | 253 / 4,735 / 0 | 24 Feb 2021 |
| Human Action Recognition from Various Data Modalities: A Review | Zehua Sun, Qiuhong Ke, Hossein Rahmani, Mohammed Bennamoun, Gang Wang, Jun Liu | MU | 35 / 492 / 0 | 22 Dec 2020 |
| Multi-modal Transformer for Video Retrieval | Valentin Gabeur, Chen Sun, Alahari Karteek, Cordelia Schmid | ViT | 410 / 594 / 0 | 21 Jul 2020 |
| Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications | Biagio Brattoli, Joseph Tighe, Fedor Zhdanov, Pietro Perona, Krzysztof Chalupka | VLM | 129 / 127 / 0 | 03 Mar 2020 |
| Efficient Estimation of Word Representations in Vector Space | Tomáš Mikolov, Kai Chen, G. Corrado, J. Dean | 3DV | 228 / 31,150 / 0 | 16 Jan 2013 |