Neural Discrete Representation Learning

2 November 2017

Aaron van den Oord

Koray Kavukcuoglu

Papers citing "Neural Discrete Representation Learning"

50 / 3,807 papers shown

An Efficient Transfer Learning Method Based on Adapter with Local Attributes for Speech Emotion Recognition

71

0

0

28 Sep 2025

Language Model Planning from an Information Theoretic Perspective

Muhammed Ustaomeroglu

Carlee Joe-Wong

143

0

0

28 Sep 2025

MotionVerse: A Unified Multimodal Framework for Motion Comprehension, Generation and Editing

144

2

0

28 Sep 2025

GSID: Generative Semantic Indexing for E-Commerce Product Understanding

109

1

0

28 Sep 2025

ResAD++: Towards Class Agnostic Anomaly Detection via Residual Feature Learning

Chongyang Zhang

182

1

0

28 Sep 2025

AudioMoG: Guiding Audio Generation with Mixture-of-Guidance

161

0

0

28 Sep 2025

Entering the Era of Discrete Diffusion Models: A Benchmark for Schrödinger Bridges and Entropic Optimal Transport

Xavier Aramayo Carrasco

Grigoriy Ksenofontov

Iaroslav Koshelev

Alexander Korotin

222

0

0

27 Sep 2025

Geometry-Aware Losses for Structure-Preserving Text-to-Sign Language Generation

269

0

0

27 Sep 2025

ARSS: Taming Decoder-only Autoregressive Visual Generation for View Synthesis From Single View

169

0

0

27 Sep 2025

StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs

135

0

0

26 Sep 2025

Developing Vision-Language-Action Model from Egocentric Videos

Taichi Nishimura

121

1

0

26 Sep 2025

AUV: Teaching Audio Universal Vector Quantization with Single Nested Codebook

165

2

0

26 Sep 2025

Group Critical-token Policy Optimization for Autoregressive Image Generation

159

2

0

26 Sep 2025

One Prompt Fits All: Universal Graph Adaptation for Pretrained Models

246

1

0

26 Sep 2025

Pushing Toward the Simplex Vertices: A Simple Remedy for Code Collapse in Smoothed Vector Quantization

178

0

0

26 Sep 2025

Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings

Stéphane Lathuilière

Vicky Kalogeiton

145

2

0

26 Sep 2025

Rate-Distortion Optimized Communication for Collaborative Perception

125

0

0

26 Sep 2025

Residual Vector Quantization For Communication-Efficient Multi-Agent Perception

B.V.K Vijaya Kumar

330

1

0

25 Sep 2025

CAD-Tokenizer: Towards Text-based CAD Prototyping via Modality-Specific Tokenization

92

0

0

25 Sep 2025

AJAHR: Amputated Joint Aware 3D Human Mesh Recovery

135

0

0

24 Sep 2025

COLT: Enhancing Video Large Language Models with Continual Tool Usage

Xiaodan Liang

290

0

0

23 Sep 2025

DiSSECT: Structuring Transfer-Ready Medical Image Representations through Discrete Self-Supervision

134

0

0

23 Sep 2025

Online Adaptation via Dual-Stage Alignment and Self-Supervision for Fast-Calibration Brain-Computer Interfaces

117

0

0

23 Sep 2025

Adversarially-Refined VQ-GAN with Dense Motion Tokenization for Spatio-Temporal Heatmaps

Gabriel Maldonado

Narges Rashvand

Armin Danesh Pazho

Ghazal Alinezhad Noghre

146

0

0

23 Sep 2025

Improving Test-Time Performance of RVQ-based Neural Codecs

104

0

0

23 Sep 2025

Understanding-in-Generation: Reinforcing Generative Capability of Unified Model via Infusing Understanding into Generation

374

2

0

23 Sep 2025

Learning Dexterous Manipulation with Quantized Hand State

139

0

0

22 Sep 2025

VCE: Safe Autoregressive Image Generation via Visual Contrast Exploitation

188

0

0

21 Sep 2025

Drum-to-Vocal Percussion Sound Conversion and Its Evaluation Methodology

Makito Kitamura

Tomohiko Nakamura

Shinnosuke Takamichi

Hiroshi Saruwatari

120

0

0

21 Sep 2025

DA-Font: Few-Shot Font Generation via Dual-Attention Hybrid Integration

106

1

0

20 Sep 2025

Mental Accounts for Actions: EWA-Inspired Attention in Decision Transformers

Narayan B. Mandayam

117

0

0

19 Sep 2025

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

...

Zhengdong Zhang

205

5

0

19 Sep 2025

Inverse Optimization Latent Variable Models for Learning Costs Applied to Route Problems

Erik Schaffernicht

99

0

0

19 Sep 2025

Attention Schema-based Attention Control (ASAC): A Cognitive-Inspired Approach for Attention Management in Transformers

Federico Jurado Ruiz

206

0

0

19 Sep 2025

Purely Semantic Indexing for LLM-based Generative Recommendation and Retrieval

116

0

0

19 Sep 2025

SAMPO:Scale-wise Autoregression with Motion PrOmpt for generative world models

187

0

0

19 Sep 2025

Generative AI Meets Wireless Sensing: Towards Wireless Foundation Model

132

2

0

18 Sep 2025

Back to Ear: Perceptually Driven High Fidelity Music Reconstruction

166

0

0

18 Sep 2025

OpenViGA: Video Generation for Automotive Driving Scenes by Streamlining and Fine-Tuning Open Source Models with Public Data

Tim Fingscheidt

186

0

0

18 Sep 2025

AToken: A Unified Tokenizer for Vision

266

9

0

17 Sep 2025

AnyAccomp: Generalizable Accompaniment Generation via Quantized Melodic Bottleneck

167

0

0

17 Sep 2025

VQT-Light: Lightweight HDR Illumination Map Prediction with Richer Texture

76

0

0

16 Sep 2025

SPGen: Spherical Projection as Consistent and Flexible Representation for Single Image 3D Shape Generation

123

0

0

16 Sep 2025

Improving 3D Gaussian Splatting Compression by Scene-Adaptive Lattice Vector Quantization

180

2

0

16 Sep 2025

Image Tokenizer Needs Post-Training

Marios Savvides

204

4

0

15 Sep 2025

AvatarSync: Rethinking Talking-Head Animation through Phoneme-Guided Autoregressive Perspective

128

0

0

15 Sep 2025

PoolingVQ: A VQVAE Variant for Reducing Audio Redundancy and Boosting Multi-Modal Fusion in Music Emotion Analysis

251

0

0

15 Sep 2025

CoachMe: Decoding Sport Elements with a Reference-Based Coaching Instruction Generation Model

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

112

0

0

15 Sep 2025

Lost in Embeddings: Information Loss in Vision-Language Models

Anders Søgaard

131

5

0

15 Sep 2025

FuseCodec: Semantic-Contextual Fusion and Supervision for Neural Codecs

Md Mubtasim Ahasan

Rafat Hasan Khan

Tasnim Mohiuddin

A. K. M. Mahbubur Rahman

256

1

0

14 Sep 2025
