ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2310.04914
  4. Cited By
Analyzing Zero-Shot Abilities of Vision-Language Models on Video
  Understanding Tasks

Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks

7 October 2023
Avinash Madasu
Anahita Bhiwandiwalla
Vasudev Lal
    VLM
ArXivPDFHTML

Papers citing "Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks"

6 / 6 papers shown
Title
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action
  Recognition with Language Knowledge
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge
Wei Lin
Leonid Karlinsky
Nina Shvetsova
Horst Possegger
Mateusz Koziñski
Rameswar Panda
Rogerio Feris
Hilde Kuehne
Horst Bischof
VLM
100
38
0
15 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language
  Understanding
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
Kenton Lee
Mandar Joshi
Iulia Turc
Hexiang Hu
Fangyu Liu
Julian Martin Eisenschlos
Urvashi Khandelwal
Peter Shaw
Ming-Wei Chang
Kristina Toutanova
CLIP
VLM
148
259
0
07 Oct 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
385
4,010
0
28 Jan 2022
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize
  Long-Tail Visual Concepts
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
273
1,077
0
17 Feb 2021
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
Jie Lei
Licheng Yu
Tamara L. Berg
Mohit Bansal
106
268
0
24 Jan 2020
1