1711.00937
Neural Discrete Representation Learning
2 November 2017
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDL
SSL
OCL
Papers citing "Neural Discrete Representation Learning"
50 / 3,807 papers shown
An Efficient Transfer Learning Method Based on Adapter with Local Attributes for Speech Emotion Recognition
Haoyu Song
Ian Mcloughlin
Qing Gu
Nan Jiang
Yan Song
71
0
0
28 Sep 2025
Language Model Planning from an Information Theoretic Perspective
Muhammed Ustaomeroglu
Baris Askin
Gauri Joshi
Carlee Joe-Wong
Guannan Qu
143
0
0
28 Sep 2025
MotionVerse: A Unified Multimodal Framework for Motion Comprehension, Generation and Editing
Ruibing Hou
Mingshuang Luo
Hongyu Pan
Hong Chang
Shiguang Shan
144
2
0
28 Sep 2025
GSID: Generative Semantic Indexing for E-Commerce Product Understanding
Haiyang Yang
Qinye Xie
Qingheng Zhang
Liyu Chen
Huike Zou
Chengbao Lian
Shuguang Han
Fei Huang
Jufeng Chen
Bo Zheng
109
1
0
28 Sep 2025
ResAD++: Towards Class Agnostic Anomaly Detection via Residual Feature Learning
Xincheng Yao
Chao Shi
Muming Zhao
Guangtao Zhai
Chongyang Zhang
OODD
182
1
0
28 Sep 2025
AudioMoG: Guiding Audio Generation with Mixture-of-Guidance
Junyou Wang
Zehua Chen
Binjie Yuan
Kaiwen Zheng
Chang Li
Yuxuan Jiang
Jun Zhu
161
0
0
28 Sep 2025
Entering the Era of Discrete Diffusion Models: A Benchmark for Schrödinger Bridges and Entropic Optimal Transport
Xavier Aramayo Carrasco
Grigoriy Ksenofontov
Aleksei Leonov
Iaroslav Koshelev
Alexander Korotin
OT
222
0
0
27 Sep 2025
Geometry-Aware Losses for Structure-Preserving Text-to-Sign Language Generation
Zetian Wu
Tianshuo Zhou
Stefan Lee
Liang Huang
SLR
269
0
0
27 Sep 2025
ARSS: Taming Decoder-only Autoregressive Visual Generation for View Synthesis From Single View
Wenbin Teng
Gonglin Chen
Haiwei Chen
Yajie Zhao
DiffM
VGen
169
0
0
27 Sep 2025
StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs
Yuhan Song
Linhao Zhang
Chuhan Wu
Aiwei Liu
Wei Jia
Houfeng Wang
Xiao-bin Zhou
135
0
0
26 Sep 2025
Developing Vision-Language-Action Model from Egocentric Videos
Tomoya Yoshida
Shuhei Kurita
Taichi Nishimura
Shinsuke Mori
121
1
0
26 Sep 2025
AUV: Teaching Audio Universal Vector Quantization with Single Nested Codebook
Yihao Chen
Kai Hu
Long Zhou
Shulin Feng
Xusheng Yang
Hangting Chen
Xie Chen
165
2
0
26 Sep 2025
Group Critical-token Policy Optimization for Autoregressive Image Generation
Guohui Zhang
Hu Yu
Xiaoxiao Ma
Jinghao Zhang
Yaning Pan
Mingde Yao
Jie Xiao
Linjiang Huang
Feng Zhao
159
2
0
26 Sep 2025
One Prompt Fits All: Universal Graph Adaptation for Pretrained Models
Yongqi Huang
Jitao Zhao
Dongxiao He
Xiaobao Wang
Yawen Li
Yuxiao Huang
Di Jin
Zhiyong Feng
VLM
246
1
0
26 Sep 2025
Pushing Toward the Simplex Vertices: A Simple Remedy for Code Collapse in Smoothed Vector Quantization
Takashi Morita
MQ
178
0
0
26 Sep 2025
Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings
Yuanzhi Zhu
Xi Wang
Stéphane Lathuilière
Vicky Kalogeiton
145
2
0
26 Sep 2025
Rate-Distortion Optimized Communication for Collaborative Perception
Genjia Liu
Anning Hu
Yue Hu
Wenjun Zhang
Siheng Chen
125
0
0
26 Sep 2025
Residual Vector Quantization For Communication-Efficient Multi-Agent Perception
Dereje Shenkut
B.V.K Vijaya Kumar
330
1
0
25 Sep 2025
CAD-Tokenizer: Towards Text-based CAD Prototyping via Modality-Specific Tokenization
Ruiyu Wang
Shizhao Sun
Weijian Ma
Jiang Bian
92
0
0
25 Sep 2025
AJAHR: Amputated Joint Aware 3D Human Mesh Recovery
Hyunjin Cho
Giyun Choi
Jongwon Choi
3DH
135
0
0
24 Sep 2025
COLT: Enhancing Video Large Language Models with Continual Tool Usage
Yuyang Liu
Xinyuan Shi
Xiaondan Liang
KELM
CLL
290
0
0
23 Sep 2025
DiSSECT: Structuring Transfer-Ready Medical Image Representations through Discrete Self-Supervision
Azad Singh
Deepak Mishra
134
0
0
23 Sep 2025
Online Adaptation via Dual-Stage Alignment and Self-Supervision for Fast-Calibration Brain-Computer Interfaces
Sheng-Bin Duan
Jian-Long Hao
Tian-Yu Xiang
Xiao-Hu Zhou
Mei-Jiang Gui
Xiao-Liang Xie
Shi-Qi Liu
Zeng-Guang Hou
117
0
0
23 Sep 2025
Adversarially-Refined VQ-GAN with Dense Motion Tokenization for Spatio-Temporal Heatmaps
Gabriel Maldonado
Narges Rashvand
Armin Danesh Pazho
Ghazal Alinezhad Noghre
Vinit Katariya
Hamed Tabkhi
146
0
0
23 Sep 2025
Improving Test-Time Performance of RVQ-based Neural Codecs
Hyeongju Kim
Junhyeok Lee
Jacob Morton
Juheon Lee
Jinhyeok Yang
104
0
0
23 Sep 2025
Understanding-in-Generation: Reinforcing Generative Capability of Unified Model via Infusing Understanding into Generation
Yuanhuiyi Lyu
Chi Kit Wong
Chenfei Liao
Lutao Jiang
Xu Zheng
Zexin Lu
Linfeng Zhang
Xuming Hu
374
2
0
23 Sep 2025
Learning Dexterous Manipulation with Quantized Hand State
Ying Feng
Hongjie Fang
Yinong He
Jingjing Chen
Chenxi Wang
Zihao He
Ruonan Liu
Cewu Lu
139
0
0
22 Sep 2025
VCE: Safe Autoregressive Image Generation via Visual Contrast Exploitation
Feng Han
Chao Gong
Zhipeng Wei
Yue Yu
Yu Jiang
DiffM
188
0
0
21 Sep 2025
Drum-to-Vocal Percussion Sound Conversion and Its Evaluation Methodology
Rinka Nobukawa
Makito Kitamura
Tomohiko Nakamura
Shinnosuke Takamichi
Hiroshi Saruwatari
120
0
0
21 Sep 2025
DA-Font: Few-Shot Font Generation via Dual-Attention Hybrid Integration
Weiran Chen
Guiqian Zhu
Ying Li
Yi Ji
Chunping Liu
106
1
0
20 Sep 2025
Mental Accounts for Actions: EWA-Inspired Attention in Decision Transformers
Zahra Aref
Narayan B. Mandayam
OffRL
117
0
0
19 Sep 2025
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
Yanghao Li
Rui Qian
Bowen Pan
Haotian Zhang
Haoshuo Huang
...
Zhengdong Zhang
Chen Chen
Yang Zhao
Ruoming Pang
Zhifeng Chen
MLLM
205
5
0
19 Sep 2025
Inverse Optimization Latent Variable Models for Learning Costs Applied to Route Problems
Alan A. Lahoud
Erik Schaffernicht
J. A. Stork
99
0
0
19 Sep 2025
Attention Schema-based Attention Control (ASAC): A Cognitive-Inspired Approach for Attention Management in Transformers
Krati Saxena
Federico Jurado Ruiz
Guido Manzi
Dianbo Liu
Alex Lamb
206
0
0
19 Sep 2025
Purely Semantic Indexing for LLM-based Generative Recommendation and Retrieval
Ruohan Zhang
Jiacheng Li
Julian McAuley
Yupeng Hou
VLM
116
0
0
19 Sep 2025
SAMPO:Scale-wise Autoregression with Motion PrOmpt for generative world models
Sen Wang
Jingyi Tian
Le Wang
Zhimin Liao
Jiayi Li
Huaiyi Dong
Kun Xia
Sanping Zhou
Wei Tang
Hua Gang
VGen
LRM
187
0
0
19 Sep 2025
Generative AI Meets Wireless Sensing: Towards Wireless Foundation Model
Zheng Yang
Guoxuan Chi
Chenshu Wu
Hanyu Liu
Yuchong Gao
Yunhao Liu
Jie Xu
Tony Xiao Han
132
2
0
18 Sep 2025
Back to Ear: Perceptually Driven High Fidelity Music Reconstruction
Kangdi Wang
Zhiyue Wu
Dinghao Zhou
Rui Lin
Junyu Dai
Tao Jiang
DiffM
166
0
0
18 Sep 2025
OpenViGA: Video Generation for Automotive Driving Scenes by Streamlining and Fine-Tuning Open Source Models with Public Data
Björn Möller
Zhengyang Li
Malte Stelzer
Thomas Graave
Fabian Bettels
Muaaz Ataya
Tim Fingscheidt
VGen
186
0
0
18 Sep 2025
AToken: A Unified Tokenizer for Vision
Jiasen Lu
Liangchen Song
Mingze Xu
Byeongjoo Ahn
Yanjun Wang
Chen Chen
Afshin Dehghan
Yinfei Yang
ViT
266
9
0
17 Sep 2025
AnyAccomp: Generalizable Accompaniment Generation via Quantized Melodic Bottleneck
Junan Zhang
Yunjia Zhang
Xueyao Zhang
Zhizheng Wu
167
0
0
17 Sep 2025
VQT-Light:Lightweight HDR Illumination Map Prediction with Richer Texture.pdf
Kunliang Xie
3DV
76
0
0
16 Sep 2025
SPGen: Spherical Projection as Consistent and Flexible Representation for Single Image 3D Shape Generation
Jingdong Zhang
Weikai Chen
Y. Liu
Jionghao Wang
Zhengming Yu
Zhuowen Shen
B. Yang
Wenping Wang
Xin Li
3DGS
123
0
0
16 Sep 2025
Improving 3D Gaussian Splatting Compression by Scene-Adaptive Lattice Vector Quantization
Hao Xu
Xiaolin Wu
Xi Zhang
3DGS
MQ
180
2
0
16 Sep 2025
Image Tokenizer Needs Post-Training
Kai Qiu
Xiang Li
Hao Chen
Jason Kuen
Xiaohao Xu
Jiuxiang Gu
Yinyi Luo
Bhiksha Raj
Zhe Lin
Marios Savvides
VLM
204
4
0
15 Sep 2025
AvatarSync: Rethinking Talking-Head Animation through Phoneme-Guided Autoregressive Perspective
Yuchen Deng
Xiuyang Wu
Hai-Tao Zheng
Suiyang Zhang
Yi He
Yuxing Han
VGen
128
0
0
15 Sep 2025
PoolingVQ: A VQVAE Variant for Reducing Audio Redundancy and Boosting Multi-Modal Fusion in Music Emotion Analysis
Dinghao Zou
Yicheng Gong
Xiaokang Li
Xin Cao
Sunbowen Lee
251
0
0
15 Sep 2025
CoachMe: Decoding Sport Elements with a Reference-Based Coaching Instruction Generation Model
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Wei-Hsin Yeh
Yu-An Su
Chih-Ning Chen
Yi-Hsueh Lin
Calvin Ku
Wen-Hsin Chiu
Min-Chun Hu
Lun-Wei Ku
112
0
0
15 Sep 2025
Lost in Embeddings: Information Loss in Vision-Language Models
Wenyan Li
Raphael Tang
Chengzu Li
Caiqi Zhang
Ivan Vulić
Anders Søgaard
VLM
131
5
0
15 Sep 2025
FuseCodec: Semantic-Contextual Fusion and Supervision for Neural Codecs
Md Mubtasim Ahasan
Rafat Hasan Khan
Tasnim Mohiuddin
Vasu Sharma
Tariq Iqbal
M. A. Amin
Amin Ahsan Ali
M. Islam
A. K. M. Mahbubur Rahman
256
1
0
14 Sep 2025