2304.06708
Verbs in Action: Improving verb understanding in video-language models
IEEE International Conference on Computer Vision (ICCV), 2023
13 April 2023
Liliane Momeni
Mathilde Caron
Arsha Nagrani
Andrew Zisserman
Cordelia Schmid
Papers citing
"Verbs in Action: Improving verb understanding in video-language models"
50 / 59 papers shown
LAST: LeArning to Think in Space and Time for Generalist Vision-Language Models
Shuai Wang
D. Zhang
Tianyi Bai
Shitong Shao
Jiebo Luo
Jiaheng Wei
VLM
182
1
0
24 Nov 2025
Chirality in Action: Time-Aware Video Representation Learning by Latent Straightening
Piyush Bagad
Andrew Zisserman
AI4TS
252
4
0
10 Sep 2025
Punching Bag vs. Punching Person: Motion Transferability in Videos
Raiyaan Abdullah
Jared Claypoole
Michael Cogswell
Ajay Divakaran
Yogesh S Rawat
180
0
0
31 Jul 2025
SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image Captioning
Si-Woo Kim
MinJu Jeon
Ye-Chan Kim
Soeun Lee
Taewhan Kim
Dong-Jin Kim
195
3
0
24 Jul 2025
CatchPhrase: EXPrompt-Guided Encoder Adaptation for Audio-to-Image Generation
Hyunwoo Oh
SeungJu Cha
Kwanyoung Lee
Si-Woo Kim
Dong-Jin Kim
236
2
0
24 Jul 2025
Reducing Annotation Burden in Physical Activity Research Using Vision-Language Models
Abram Schonfeldt
Benjamin Maylor
Xiaofang Chen
Ronald Clark
Aiden Doherty
374
2
0
06 May 2025
RAVU: Retrieval Augmented Video Understanding with Compositional Reasoning over Graph
Sameer Malik
Moyuru Yamada
Ayush Singh
Dishank Aggarwal
1.1K
1
0
06 May 2025
FedMVP: Federated Multimodal Visual Prompt Tuning for Vision-Language Models
Mainak Singha
Subhankar Roy
Sarthak Mehrotra
Ankit Jha
Moloud Abdar
Biplab Banerjee
Elisa Ricci
VLM
VPVLM
613
2
0
29 Apr 2025
REVEAL: Relation-based Video Representation Learning for Video-Question-Answering
Sofian Chaybouti
Walid Bousselham
Moritz Wolter
Hilde Kuehne
915
1
0
07 Apr 2025
Can Text-to-Video Generation help Video-Language Alignment?
Computer Vision and Pattern Recognition (CVPR), 2025
Luca Zanella
Goran Frehse
Willi Menapace
Sergey Tulyakov
Yiming Wang
Elisa Ricci
DiffM
VGen
347
1
0
24 Mar 2025
OSLoPrompt: Bridging Low-Supervision Challenges and Open-Set Domain Generalization in CLIP
Computer Vision and Pattern Recognition (CVPR), 2025
M. Cui
Divyam Gupta
Mainak Singha
Sai Bhargav Rongali
Ankit Jha
Muhammad Haris Khan
Biplab Banerjee
VLM
395
5
0
20 Mar 2025
VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness
Computer Vision and Pattern Recognition (CVPR), 2025
SeungJu Cha
Kwanyoung Lee
Ye-Chan Kim
Hyunwoo Oh
Dong-Jin Kim
246
4
0
20 Mar 2025
AA-CLIP: Enhancing Zero-shot Anomaly Detection via Anomaly-Aware CLIP
Computer Vision and Pattern Recognition (CVPR), 2025
Wenxin Ma
Xu Zhang
Qingsong Yao
Fenghe Tang
Chenxu Wu
Yongbin Li
Rui Yan
Zihang Jiang
S. Kevin Zhou
VLM
323
46
0
09 Mar 2025
VidCtx: Context-aware Video Question Answering with Image Models
Andreas Goulas
Vasileios Mezaris
Ioannis Patras
991
2
0
23 Dec 2024
Compositional Zero-Shot Learning with Contextualized Cues and Adaptive Contrastive Training
Yun Yvonna Li
Zhe Liu
Lina Yao
CoGe
156
1
0
10 Dec 2024
Progress-Aware Video Frame Captioning
Computer Vision and Pattern Recognition (CVPR), 2024
Zihui Xue
Joungbin An
Xitong Yang
Kristen Grauman
620
7
0
03 Dec 2024
ACE: Action Concept Enhancement of Video-Language Models in Procedural Videos
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Reza Ghoddoosian
Nakul Agarwal
Isht Dwivedi
Behzad Dariush
300
0
0
23 Nov 2024
Extending Video Masked Autoencoders to 128 frames
Neural Information Processing Systems (NeurIPS), 2024
N. B. Gundavarapu
Luke Friedman
Raghav Goyal
Chaitra Hegde
Eirikur Agustsson
...
Mikhail Sirotenko
Ming-Hsuan Yang
Tobias Weyand
Boqing Gong
Leonid Sigal
318
4
0
20 Nov 2024
Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Wentao Bao
Keqin Li
Yuxiao Chen
Deep Patel
Martin Renqiang Min
Yu Kong
VLM
ObjD
295
7
0
17 Nov 2024
Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning
Neural Information Processing Systems (NeurIPS), 2024
Penghui Ruan
Pichao Wang
Divya Saxena
Jiannong Cao
Yuhui Shi
DiffM
VGen
231
1
0
31 Oct 2024
Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach
Neural Information Processing Systems (NeurIPS), 2024
Mathilde Caron
Alireza Fathi
Cordelia Schmid
Ahmet Iscen
243
3
0
31 Oct 2024
Beyond Coarse-Grained Matching in Video-Text Retrieval
Asian Conference on Computer Vision (ACCV), 2024
Aozhu Chen
Hazel Doughty
Xirong Li
Cees G. M. Snoek
308
0
0
16 Oct 2024
Prompting Video-Language Foundation Models with Domain-specific Fine-grained Heuristics for Video Question Answering
Ting Yu
Kunhao Fu
Shuhui Wang
Qingming Huang
Jun Yu
303
9
0
12 Oct 2024
Question-Answering Dense Video Events
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2024
Hangyu Qin
Junbin Xiao
Angela Yao
VLM
612
9
0
06 Sep 2024
Semantically Controllable Augmentations for Generalizable Robot Learning
Zoey Chen
Zhao Mandi
Homanga Bharadhwaj
Mohit Sharma
Shuran Song
Abhishek Gupta
Vikash Kumar
LM&Ro
355
21
0
02 Sep 2024
Text-Enhanced Zero-Shot Action Recognition: A training-free approach
International Conference on Pattern Recognition (ICPR), 2024
Massimo Bosetti
Shibingfeng Zhang
Benedetta Liberatori
Giacomo Zara
Elisa Ricci
Paolo Rota
VLM
239
6
0
29 Aug 2024
ReCorD: Reasoning and Correcting Diffusion for HOI Generation
Jian-Yu Jiang-Lin
Kang-Yang Huang
Ling Lo
Yi-Ning Huang
Terence Lin
Jhih-Ciang Wu
Hong-Han Shuai
Wen-Huang Cheng
DiffM
281
9
0
25 Jul 2024
Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA
Jongwoo Park
Kanchana Ranasinghe
Kumara Kahatapitiya
Wonjeong Ryoo
Donghyun Kim
Michael S. Ryoo
395
62
0
13 Jun 2024
Diving Deep into the Motion Representation of Video-Text Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Chinmaya Devaraj
Cornelia Fermuller
Yiannis Aloimonos
DiffM
VGen
228
0
0
07 Jun 2024
Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View
Jin Wang
Shichao Dong
Yapeng Zhu
Kelu Yao
Weidong Zhao
Chao Li
Ping Luo
CoGe
LRM
266
5
0
27 May 2024
Open-Vocabulary Spatio-Temporal Action Detection
Tao Wu
Shuqiu Ge
Jie Qin
Gangshan Wu
Limin Wang
ObjD
248
9
0
17 May 2024
Learning from Observer Gaze: Zero-Shot Attention Prediction Oriented by Human-Object Interaction Recognition
Computer Vision and Pattern Recognition (CVPR), 2024
Yuchen Zhou
Linkai Liu
Chao Gou
254
17
0
16 May 2024
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min
Shyamal Buch
Arsha Nagrani
Minsu Cho
Cordelia Schmid
LRM
436
67
0
09 Apr 2024
Test-Time Zero-Shot Temporal Action Localization
Benedetta Liberatori
Alessandro Conti
Paolo Rota
Yiming Wang
Elisa Ricci
372
11
0
08 Apr 2024
Beyond Embeddings: The Promise of Visual Table in Visual Reasoning
Yiwu Zhong
Zi-Yuan Hu
Michael R. Lyu
Liwei Wang
255
6
0
27 Mar 2024
VidLA: Video-Language Alignment at Scale
Computer Vision and Pattern Recognition (CVPR), 2024
Mamshad Nayeem Rizve
Fan Fei
Jayakrishnan Unnikrishnan
Son Tran
Benjamin Z. Yao
Belinda Zeng
Mubarak Shah
Trishul Chilimbi
VLM
AI4TS
236
8
0
21 Mar 2024
VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Xiaohan Wang
Yuhui Zhang
Orr Zohar
Serena Yeung-Levy
VLM
400
233
0
15 Mar 2024
Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition
Kun-Yu Lin
Henghui Ding
Jiaming Zhou
Yu-Ming Tang
Yi-Xing Peng
Zhilin Zhao
Chen Change Loy
Wei-Shi Zheng
VLM
364
21
0
03 Mar 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
VGen
399
77
0
20 Feb 2024
Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering
David Romero
Thamar Solorio
303
5
0
16 Feb 2024
Using Left and Right Brains Together: Towards Vision and Language Planning
Jun Cen
Chenfei Wu
Xiao Liu
Sheng-Siang Yin
Yixuan Pei
Jinglong Yang
Qifeng Chen
Nan Duan
Jianguo Zhang
286
11
0
16 Feb 2024
FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition
International Conference on Learning Representations (ICLR), 2024
Xiaohui Huang
Hao Zhou
Kun Yao
Kai Han
VLM
278
49
0
05 Feb 2024
FiGCLIP: Fine-Grained CLIP Adaptation via Densely Annotated Videos
S. DarshanSingh
Zeeshan Khan
Makarand Tapaswi
VLM
CLIP
224
6
0
15 Jan 2024
A Simple LLM Framework for Long-Range Video Question-Answering
Ce Zhang
Taixi Lu
Md. Mohaiminul Islam
Ziyang Wang
Shoubin Yu
Mohit Bansal
Gedas Bertasius
400
158
0
28 Dec 2023
Collaborating Foundation Models for Domain Generalized Semantic Segmentation
Computer Vision and Pattern Recognition (CVPR), 2023
Yasser Benigmim
Subhankar Roy
S. Essid
Vicky Kalogeiton
Stéphane Lathuilière
432
35
0
15 Dec 2023
Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning
Zaber Ibn Abdul Hakim
Najibul Haque Sarker
Rahul Pratap Singh
Bishmoy Paul
Ali Dabouei
Min Xu
352
1
0
10 Dec 2023
Vision-Language Models Learn Super Images for Efficient Partially Relevant Video Retrieval
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP) (TOMM), 2023
Taichi Nishimura
Shota Nakada
Masayoshi Kondo
VLM
328
6
0
01 Dec 2023
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
Computer Vision and Pattern Recognition (CVPR), 2023
Tom Tongjia Chen
Hongshan Yu
Zhengeng Yang
Zechuan Li
Wei Sun
Chen Chen
416
13
0
30 Nov 2023
The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding
Computer Vision and Pattern Recognition (CVPR), 2023
Lorenzo Bianchi
F. Carrara
Nicola Messina
Claudio Gennaro
Fabrizio Falchi
ObjD
371
29
0
29 Nov 2023
LEAP: LLM-Generation of Egocentric Action Programs
Eadom Dessalene
Michael Maynord
Cornelia Fermuller
Yiannis Aloimonos
331
2
0
29 Nov 2023