ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2311.05152
  4. Cited By
Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual
  Downstream Tasks

Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks

9 November 2023
Haoyi Duan
Yan Xia
Mingze Zhou
Li Tang
Jieming Zhu
Zhou Zhao
    VLM
ArXivPDFHTML

Papers citing "Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks"

12 / 12 papers shown
Title
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video
Shuhang Xun
Sicheng Tao
J. Li
Yibo Shi
Zhixin Lin
...
Shikang Wang
Y. Liu
H. Zhang
Ying Ma
Xuming Hu
VLM
LRM
43
0
0
04 May 2025
Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds
Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds
E. Shaar
Ariel Shaulov
Gal Chechik
Lior Wolf
VLM
41
0
0
17 Mar 2025
Learning Musical Representations for Music Performance Question Answering
Xingjian Diao
Chunhui Zhang
Tingxuan Wu
Ming Cheng
Z. Ouyang
Weiyi Wu
Jiang Gui
65
5
0
10 Feb 2025
Transforming Game Play: A Comparative Study of DCQN and DTQN
  Architectures in Reinforcement Learning
Transforming Game Play: A Comparative Study of DCQN and DTQN Architectures in Reinforcement Learning
William A. Stigall
45
0
0
14 Oct 2024
Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization
Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization
Ling Xing
Hongyu Qu
Rui Yan
Xiangbo Shu
Jinhui Tang
45
0
0
12 Sep 2024
Beyond Two-Tower Matching: Learning Sparse Retrievable
  Cross-Interactions for Recommendation
Beyond Two-Tower Matching: Learning Sparse Retrievable Cross-Interactions for Recommendation
Liangcai Su
Fan Yan
Jieming Zhu
Xi Xiao
Haoyi Duan
Zhou Zhao
Zhenhua Dong
Ruiming Tang
16
9
0
30 Nov 2023
MaPLe: Multi-modal Prompt Learning
MaPLe: Multi-modal Prompt Learning
Muhammad Uzair Khattak
H. Rasheed
Muhammad Maaz
Salman Khan
F. Khan
VPVLM
VLM
186
530
0
06 Oct 2022
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound
  Classification and Detection
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection
Ke Chen
Xingjian Du
Bilei Zhu
Zejun Ma
Taylor Berg-Kirkpatrick
Shlomo Dubnov
ViT
114
264
0
02 Feb 2022
P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally
  Across Scales and Tasks
P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks
Xiao Liu
Kaixuan Ji
Yicheng Fu
Weng Lam Tam
Zhengxiao Du
Zhilin Yang
Jie Tang
VLM
236
804
0
14 Oct 2021
Learning to Prompt for Vision-Language Models
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLM
CLIP
VLM
322
2,261
0
02 Sep 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
280
3,843
0
18 Apr 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
185
198
0
08 Jan 2021
1