arXiv:2412.08158
How Vision-Language Tasks Benefit from Large Pre-trained Models: A Survey
11 December 2024
Yayun Qi
Hongxi Li
Yiqi Song
Xinxiao Wu
Jiebo Luo
LRM
VLM
Papers citing "How Vision-Language Tasks Benefit from Large Pre-trained Models: A Survey" (3 papers)
TIP and Polish: Text-Image-Prototype Guided Multi-Modal Generation via Commonality-Discrepancy Modeling and Refinement
Zhiyong Ma
Jiahao Chen
Qingyuan Chuai
Zhengping Li
12 Nov 2025
Multi-Level LVLM Guidance for Untrimmed Video Action Recognition
Liyang Peng
Sihan Zhu
Yunjie Guo
24 Aug 2025
DeepInsert: Early Layer Bypass for Efficient and Performant Multimodal Understanding
Moulik Choraria
Xinbo Wu
Akhil Bhimaraju
Nitesh Sekhar
Yue Wu
Xu Zhang
Prateek Singhal
Lav Varshney
27 Apr 2025