Neural Discrete Representation Learning

2 November 2017
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDL SSL OCL

Papers citing "Neural Discrete Representation Learning"

50 / 3,807 papers shown
OpenHA: A Series of Open-Source Hierarchical Agentic Models in Minecraft
Zihao Wang
Muyao Li
K. He
Xiangyu Wang
Zhancun Mu
Anji Liu
Yitao Liang
LM&Ro
181
2
0
13 Sep 2025
Scalable Training for Vector-Quantized Networks with 100% Codebook Utilization
Yifan Chang
Jie Qin
Limeng Qiao
Xiaofeng Wang
Zheng Zhu
Lin Ma
Xingang Wang
MQ
151
3
0
12 Sep 2025
A Discrepancy-Based Perspective on Dataset Condensation
Tong Chen
Raghavendra Selvan
DD
275
0
0
12 Sep 2025
InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis
Tao Han
Wanghan Xu
Junchao Gong
Xiaoyu Yue
Song Guo
Luping Zhou
Lei Bai
126
2
0
12 Sep 2025
Finite Scalar Quantization Enables Redundant and Transmission-Robust Neural Audio Compression at Low Bit-rates
Harry Julian
Rachel Beeson
Lohith Konathala
Johanna Ulin
Jiameng Gao
188
1
0
11 Sep 2025
DiFlow-TTS: Compact and Low-Latency Zero-Shot Text-to-Speech with Factorized Discrete Flow Matching
Ngoc Son Nguyen
Hieu-Nghia Huynh-Nguyen
Thanh V. T. Tran
Truong-Son Hy
Van Nguyen
169
0
0
11 Sep 2025
CoDiCodec: Unifying Continuous and Discrete Compressed Representations of Audio
Marco Pasini
Stefan Lattner
George Fazekas
143
1
0
11 Sep 2025
DeCodec: Rethinking Audio Codecs as Universal Disentangled Representation Learners
Xiaoxue Luo
Jinwei Huang
Runyan Yang
Yingying Gao
Junlan Feng
Chao Deng
Shilei Zhang
146
2
0
11 Sep 2025
World Modeling with Probabilistic Structure Integration
Klemen Kotar
Wanhee Lee
Rahul Venkatesh
Honglin Chen
Daniel M. Bear
...
Imran Thobani
Alex Durango
Khaled Jedoui
Atlas Kazemian
Dan Yamins
150
3
0
10 Sep 2025
Integrating Anatomical Priors into a Causal Diffusion Model
Binxu Li
Wei Peng
Mingjie Li
Ehsan Adeli
K. Pohl
DiffM MedIm
149
0
0
10 Sep 2025
LatentVoiceGrad: Nonparallel Voice Conversion with Latent Diffusion/Flow-Matching Models
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Hirokazu Kameoka
Takuhiro Kaneko
Kou Tanaka
Yuto Kondo
DiffM
117
0
0
10 Sep 2025
Bitrate-Controlled Diffusion for Disentangling Motion and Content in Video
Xiao Li
Qi Chen
Xiulian Peng
K. Yu
Xie Chen
Yan Lu
DiffM VGen
137
1
0
10 Sep 2025
Tokenizing Loops of Antibodies
Ada Fang
Robert G. Alberstein
Simon Kelow
Frédéric A. Dreyer
74
2
0
10 Sep 2025
Learning Turbulent Flows with Generative Models: Super-resolution, Forecasting, and Sparse Flow Reconstruction
Vivek Oommen
Siavash Khodakarami
Aniruddha Bora
Zhicheng Wang
George Karniadakis
DiffM AI4CE
171
3
0
10 Sep 2025
TA-VLA: Elucidating the Design Space of Torque-aware Vision-Language-Action Models
Z. Zhang
Haobo Xu
Zhuo Yang
Chenghao Yue
Zehao Lin
Huan-ang Gao
Ziwei Wang
Hang Zhao
115
7
0
09 Sep 2025
Reconstruction Alignment Improves Unified Multimodal Models
Ji Xie
Trevor Darrell
Luke Zettlemoyer
Xudong Wang
223
15
0
08 Sep 2025
Continuous Audio Language Models
Simon Rouard
Manu Orsini
Axel Roebel
Neil Zeghidour
Alexandre Défossez
AuLLM KELM
284
2
0
08 Sep 2025
UniSearch: Rethinking Search System with a Unified Generative Architecture
Jiahui Chen
Xiaoze Jiang
Zhibo Wang
Quanzhi Zhu
Junyao Zhao
...
Cheng Chen
Jingshan Lv
Yupeng Huang
Xiao Liang
Han Li
171
2
0
08 Sep 2025
1 bit is all we need: binary normalized neural networks
Eduardo Lobo Lustosa Cabral
Paulo Pirozelli
Larissa Driemeier
MQ
166
0
0
07 Sep 2025
Compression Beyond Pixels: Semantic Compression with Multimodal Foundation Models
Ruiqi Shen
Haotian Wu
Wenjing Zhang
Jiangjing Hu
Deniz Gündüz
VLM
127
3
0
07 Sep 2025
LatinX: Aligning a Multilingual TTS Model with Direct Preference Optimization
Luis Felipe Chary
Miguel Arjona Ramirez
84
0
0
06 Sep 2025
Phonological Representation Learning for Isolated Signs Improves Out-of-Vocabulary Generalization
Lee Kezar
Zed Sevcikova Sehyr
Jesse Thomason
96
0
0
05 Sep 2025
Missing Fine Details in Images: Last Seen in High Frequencies
Tejaswini Medi
Hsien-Yi Wang
Arianna Rampini
Margret Keuper
307
2
0
05 Sep 2025
Human Motion Video Generation: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Haiwei Xue
Xiangyang Luo
Zhanghao Hu
Shu Zhang
Xunzhi Xiang
...
Fei Ma
Zhiyong Wu
Changpeng Yang
Zonghong Dai
Fei Richard Yu
EGVM VGen
235
25
0
04 Sep 2025
Skywork UniPic 2.0: Building Kontext Model with Online RL for Unified Multimodal Model
Hongyang Wei
Baixin Xu
Hongbo Liu
Cyrus Wu
J. Liu
...
Ying He
Yang Liu
Xuchen Song
Eric Li
Y. Zhou
185
15
0
04 Sep 2025
Say More with Less: Variable-Frame-Rate Speech Tokenization via Adaptive Clustering and Implicit Duration Coding
Rui Zheng
Wenrui Liu
Hui-Peng Du
Qinglin Zhang
Chong Deng
Qian Chen
Wen Wang
Yang Ai
Zhen-Hua Ling
242
3
0
04 Sep 2025
OneSearch: A Preliminary Exploration of the Unified End-to-End Generative Framework for E-commerce Search
Ben Chen
X. Guo
Siyuan Wang
Zihan Liang
Yue Lv
...
Jing Chen
Chenyi Lei
Wenwu Ou
Han Li
Kun Gai
254
6
0
03 Sep 2025
RecBase: Generative Foundation Model Pretraining for Zero-Shot Recommendation
Sashuai Zhou
Weinan Gan
Qijiong Liu
Ke Lei
Jieming Zhu
Hai Huang
Yan Xia
Ruiming Tang
Zhenhua Dong
Zhou Zhao
135
4
0
03 Sep 2025
SynBT: High-quality Tumor Synthesis for Breast Tumor Segmentation by 3D Diffusion Model
Hongxu Yang
Edina Timko
Levente Lippenszky
Vanda Czipczer
L. Ferenczi
MedIm
79
0
0
03 Sep 2025
Discrete Noise Inversion for Next-scale Autoregressive Text-based Image Editing
Quan Dao
Xiaoxiao He
Ligong Han
Ngan Hoai Nguyen
Amin Heyrani Nobar
Faez Ahmed
Han Zhang
Viet Anh Nguyen
Dimitris N. Metaxas
DiffM
217
0
0
02 Sep 2025
Spectrogram Patch Codec: A 2D Block-Quantized VQ-VAE and HiFi-GAN for Neural Speech Coding
Luis Felipe Chary
Miguel Arjona Ramirez
69
2
0
02 Sep 2025
Hierarchical Motion Captioning Utilizing External Text Data Source
Clayton Frederick Souza Leite
Yu Xiao
107
0
0
01 Sep 2025
GPSToken: Gaussian Parameterized Spatially-adaptive Tokenization for Image Representation and Generation
Zhengqiang Zhang
Rongyuan Wu
Lingchen Sun
Lei Zhang
286
2
0
01 Sep 2025
Distillation of a tractable model from the VQ-VAE
Armin Hadžić
Milan Papez
Tomáš Pevný
TPM BDL
269
0
0
01 Sep 2025
Entropy-based Coarse and Compressed Semantic Speech Representation Learning
Jialong Zuo
Guangyan Zhang
Minghui Fang
Shengpeng Ji
Xiaoqi Jiao
Jingyu Li
Yiwen Guo
Zhou Zhao
106
0
0
30 Aug 2025
Generative AI for Industrial Contour Detection: A Language-Guided Vision System
Liang Gong
Tommy Wang
Sara Chaker
Yanchen Dong
Fouad Bousetouane
Brenden Morton
Mark Mendez
99
0
0
29 Aug 2025
Physics Informed Generative Models for Magnetic Field Images
Aye Phyu Phyu Aung
Lucas Lum
Zhansen Shi
Wen Qiu
Bernice Zee
JM Chin
Yeow Kheng Lim
J. Senthilnath
DiffM MedIm
64
1
0
28 Aug 2025
FORGE: Foundational Optimization Representations from Graph Embeddings
Zohair Shafi
Serdar Kadioglu
AI4CE
304
0
0
28 Aug 2025
Embracing Aleatoric Uncertainty: Generating Diverse 3D Human Motion
Zheng Qin
Yabing Wang
Minghui Yang
Sanping Zhou
Ming Yang
Le Wang
146
1
0
28 Aug 2025
Quantum latent distributions in deep generative models
Omar Bacarreza
Thorin Farnsworth
Alexander Makarovskiy
Hugo Wallner
Tessa Hicks
Santiago Sempere-Llagostera
John Price
Robert J. A. Francis-Jones
William R. Clements
DiffM
107
3
0
27 Aug 2025
Disentangling Latent Embeddings with Sparse Linear Concept Subspaces (SLiCS)
Zhi Li
Hau Phan
Matthew Emigh
Austin J. Brockmeier
CoGe
161
0
0
27 Aug 2025
Controllable Skin Synthesis via Lesion-Focused Vector Autoregression Model
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Jiajun Sun
Zhen Yu
Siyuan Yan
Jason J. Ong
Zongyuan Ge
Lei Zhang
MedIm
119
0
0
27 Aug 2025
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
Zhixuan Liang
Yizhuo Li
Tianshuo Yang
Chengyue Wu
Sitong Mao
...
Jiangmiao Pang
Xiaokang Yang
Yao Mu
Ping Luo
179
30
0
27 Aug 2025
Fast 3D Diffusion for Scalable Granular Media Synthesis
Muhammad Moeeze Hassan
Régis Cottereau
Filippo Gatti
Patryk Dec
DiffM
88
0
0
27 Aug 2025
MRExtrap: Longitudinal Aging of Brain MRIs using Linear Modeling in Latent Space
J. Kapoor
Jakob H Macke
Christian F. Baumgartner
MedIm
163
0
0
26 Aug 2025
Interpretable by AI Mother Tongue: Native Symbolic Reasoning in Neural Models
Hung Ming Liu
LRM
80
0
0
26 Aug 2025
EEG-FM-Bench: A Comprehensive Benchmark for the Systematic Evaluation of EEG Foundation Models
Wei Xiong
Jiangtong Li
Jie Li
Kun Zhu
119
3
0
25 Aug 2025
PCR-CA: Parallel Codebook Representations with Contrastive Alignment for Multiple-Category App Recommendation
Bin Tan
Wangyao Ge
Y. Wang
Xin Liu
Jeff Burtoft
Hao Fan
Hui Wang
222
2
0
25 Aug 2025
MoSA: Motion-Coherent Human Video Generation via Structure-Appearance Decoupling
Haoyu Wang
Hao Tang
Donglin Di
Zhilu Zhang
W. Zuo
Feng Gao
Siwei Ma
Shiliang Zhang
DiffM VGen
154
0
0
24 Aug 2025
T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation
Kaiyue Sun
Rongyao Fang
Chengqi Duan
Xian Liu
Xihui Liu
181
20
0
24 Aug 2025