ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2404.15785
  4. Cited By
Seeing Beyond Classes: Zero-Shot Grounded Situation Recognition via
  Language Explainer

Seeing Beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer

24 April 2024
Jiaming Lei
Lin Li
Chunping Wang
Jun Xiao
Long Chen
ArXivPDFHTML

Papers citing "Seeing Beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer"

10 / 10 papers shown
Title
Relation-R1: Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relational Comprehension
Relation-R1: Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relational Comprehension
Lin Li
Wei Chen
Jiahui Li
L. Chen
LRM
33
1
0
20 Apr 2025
InteractEdit: Zero-Shot Editing of Human-Object Interactions in Images
Jiun Tian Hoe
Weipeng Hu
Wei Zhou
Chao Xie
Ziwei Wang
Chee Seng Chan
Xudong Jiang
Y. Tan
58
0
0
12 Mar 2025
Scene Graph Generation with Role-Playing Large Language Models
Scene Graph Generation with Role-Playing Large Language Models
Guikun Chen
Jin Li
Wenguan Wang
VLM
37
5
0
20 Oct 2024
Hyperbolic Learning with Synthetic Captions for Open-World Detection
Hyperbolic Learning with Synthetic Captions for Open-World Detection
Fanjie Kong
Yanbei Chen
Jiarui Cai
Davide Modolo
VLM
ObjD
23
7
0
07 Apr 2024
Annolid: Annotate, Segment, and Track Anything You Need
Annolid: Annotate, Segment, and Track Anything You Need
Chen Yang
Thomas A. Cleland
VOS
16
2
0
27 Mar 2024
Towards Unified Interactive Visual Grounding in The Wild
Towards Unified Interactive Visual Grounding in The Wild
Jie Xu
Hanbo Zhang
Qingyi Si
Yifeng Li
Xuguang Lan
Tao Kong
LM&Ro
25
2
0
30 Jan 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro
LLMAG
39
35
0
16 Jan 2024
CATR: Combinatorial-Dependence Audio-Queried Transformer for
  Audio-Visual Video Segmentation
CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation
Kexin Li
Zongxin Yang
Lei Chen
Yezhou Yang
Jun Xiao
VOS
28
49
0
18 Sep 2023
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action
  Recognition with Language Knowledge
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge
Wei Lin
Leonid Karlinsky
Nina Shvetsova
Horst Possegger
Mateusz Koziñski
Rameswar Panda
Rogerio Feris
Hilde Kuehne
Horst Bischof
VLM
100
38
0
15 Mar 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
380
4,010
0
28 Jan 2022
1