MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations | Computer Vision and Pattern Recognition (CVPR), 2025
Detecting Neurodegenerative Diseases using Frame-Level Handwriting Embeddings | IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Conditional Latent Diffusion-Based Speech Enhancement Via Dual Context Learning | IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer | IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
SSM2Mel: State Space Model to Reconstruct Mel Spectrogram from the EEG | IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
SF-Speech: Straightened Flow for Zero-Shot Voice Clone | IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2024