2403.07874
Cited By
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension
12 March 2024
Lei Zhu
Fangyun Wei
Yanye Lu
MLLM
VLM
Papers citing "Beyond Text: Frozen Large Language Models in Visual Signal Comprehension"
18 / 18 papers shown
Title
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Siyuan Li
L. Zhang
Zedong Wang
Juanxi Tian
Cheng Tan
...
Chang Yu
Qingsong Xie
Haonan Lu
Haoqian Wang
Zhen Lei
46
0
0
01 Apr 2025
V2Flow: Unifying Visual Tokenization and Large Language Model Vocabularies for Autoregressive Image Generation
Guiwei Zhang
Tianyu Zhang
Mohan Zhou
Yalong Bai
Biye Li
59
0
0
10 Mar 2025
Towards Improved Text-Aligned Codebook Learning: Multi-Hierarchical Codebook-Text Alignment with Long Text
Guotao Liang
Baoquan Zhang
Zhiyuan Wen
Junteng Zhao
Yunming Ye
Kola Ye
Yao He
50
0
0
03 Mar 2025
Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech
Rui Liu
Shuwei He
Yifan Hu
H. Li
VLM
87
1
0
16 Dec 2024
Factorized Visual Tokenization and Generation
Zechen Bai
Jianxiong Gao
Ziteng Gao
Pichao Wang
Zheng Zhang
Tong He
Mike Zheng Shou
75
3
0
25 Nov 2024
Autoregressive Models in Vision: A Survey
Jing Xiong
Gongye Liu
Lun Huang
Chengyue Wu
Taiqiang Wu
...
M. Zhang
Guillermo Sapiro
Jiebo Luo
Ping Luo
Ngai Wong
VGen
46
9
0
08 Nov 2024
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond
Hong Chen
Xin Wang
Yuwei Zhou
Bin Huang
Yipeng Zhang
Wei Feng
Houlun Chen
Zeyang Zhang
Siao Tang
Wenwu Zhu
DiffM
47
7
0
23 Sep 2024
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models
Shengsheng Qian
Zuyi Zhou
Dizhan Xue
Bing Wang
Changsheng Xu
LRM
34
1
0
19 Sep 2024
Bridging Compressed Image Latents and Multimodal Large Language Models
Chia-Hao Kao
Cheng Chien
Yu-Jen Tseng
Yi-Hsin Chen
Alessandro Gnutti
Shao-Yuan Lo
Wen-Hsiao Peng
Riccardo Leonardi
42
0
0
29 Jul 2024
Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%
Lei Zhu
Fangyun Wei
Yanye Lu
Dong Chen
VLM
41
33
0
17 Jun 2024
UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
Dongchao Yang
Haohan Guo
Yuanyuan Wang
Rongjie Huang
Xiang Li
Xu Tan
Xixin Wu
Helen Meng
AuLLM
39
15
0
14 Jun 2024
A Survey on Multimodal Wearable Sensor-based Human Action Recognition
Jianyuan Ni
Hao Tang
Syed Tousiful Haque
Yan Yan
A. Ngu
69
6
0
14 Apr 2024
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
206
900
0
27 Apr 2023
Regularized Vector Quantization for Tokenized Image Synthesis
Jiahui Zhang
Fangneng Zhan
Christian Theobalt
Shijian Lu
DiffM
MQ
33
30
0
11 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
265
4,229
0
30 Jan 2023
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trębacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
225
500
0
28 Sep 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
308
11,909
0
04 Mar 2022
Autoregressive Image Generation using Residual Quantization
Doyup Lee
Chiheon Kim
Saehoon Kim
Minsu Cho
Wook-Shin Han
VGen
170
325
0
03 Mar 2022