A-JEPA: Joint-Embedding Predictive Architecture Can Listen

27 November 2023

Zhengcong Fei

Junshi Huang

Papers citing "A-JEPA: Joint-Embedding Predictive Architecture Can Listen"

25 / 25 papers shown

CrossJEPA: Cross-Modal Joint-Embedding Predictive Architecture for Efficient 3D Representation Learning from 2D Images

Kumal Hewagamage

Kavishka Abeywardana

Hasitha Gallella

176

0

0

23 Nov 2025

Unsupervised Transformer Pre-Training for Images: Self-Distillation, Mean Teachers, and Random Crops

Mattia Scardecchia

160

0

0

04 Oct 2025

WavJEPA: Semantic learning unlocks robust audio foundation models for raw waveforms

Goksenin Yuksel

Pierre Guetschel

Michael Tangermann

Marcel van Gerven

Kiki van der Heijden

151

1

0

27 Sep 2025

Embodied AI: From LLMs to World Models

330

11

0

24 Sep 2025

BiRQ: Bi-Level Self-Labeling Random Quantization for Self-Supervised Speech Recognition

Brian Kingsbury

130

0

0

18 Sep 2025

Discrete JEPA: Learning Discrete Token Representations without Reconstruction

Christopher Hoang

223

0

0

17 Jun 2025

SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes

International Conference on Learning Representations (ICLR), 2025

Philip J. B. Jackson

170

10

0

13 Jun 2025

A Survey on Cross-Modal Interaction Between Music and Multimodal Data

310

1

0

17 Apr 2025

SkyReels-A2: Compose Anything in Video Diffusion Transformers

...

332

34

0

03 Apr 2025

Chirp Localization via Fine-Tuned Transformer Model: A Proof-of-Concept Study

316

3

0

24 Mar 2025

Predict, Cluster, Refine: A Joint Embedding Predictive Self-Supervised Framework for Graph Representation Learning

Srinitish Srinivasan

486

0

0

02 Feb 2025

Video Diffusion Transformers are In-Context Learners

882

7

0

14 Dec 2024

Sparsh: Self-supervised touch representations for vision-based tactile sensing

Conference on Robot Learning (CoRL), 2024

Carolina Higuera

Chaithanya Krishna Bodduluri

Patrick E. Lancaster

...

Mustafa Mukadam

270

47

0

31 Oct 2024

Learning Latent Wireless Dynamics from Channel State Information

IEEE Wireless Communications Letters (WCL), 2024

Charbel Bou Chaaya

Abanoub M. Girgis

Mehdi Bennis

187

8

0

16 Sep 2024

FLUX that Plays Music

316

17

0

01 Sep 2024

Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI

Xiaodan Liang

Liang Lin

LM&Ro SyDa AI4CE

619

185

0

09 Jul 2024

Time-Series JEPA for Predictive Remote Control under Capacity-Limited Networks

Abanoub M. Girgis

Álvaro Valcarce

225

6

0

07 Jun 2024

LaT-PFN: A Joint Embedding Predictive Architecture for In-context Time-series Forecasting

Stijn Verdenius

BDL AI4TS AI4CE

297

4

0

16 May 2024

Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond

...

362

81

0

06 May 2024

Music Consistency Models

Zhengcong Fei

Junshi Huang

213

7

0

20 Apr 2024

World Models for Autonomous Driving: An Initial Survey

Haicheng Liao

Chengzhong Xu

428

79

0

05 Mar 2024

Prospective Role of Foundation Models in Advancing Autonomous Vehicles

...

316

17

0

08 Dec 2023

Graph-level Representation Learning with Joint-Embedding Predictive Architectures

509

11

0

27 Sep 2023

Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

Piotr Bojanowski

1.2K

4,653

0

17 Jun 2020

Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019

463

391

0

25 Oct 2019