Vision Generalist Model: A Survey. International Journal of Computer Vision (IJCV), 2025
Synthetic Visual Genome. Computer Vision and Pattern Recognition (CVPR), 2025
WebUIBench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in WebUI-to-Code. Annual Meeting of the Association for Computational Linguistics (ACL), 2025
PARC: A Quantitative Framework Uncovering the Symmetries within Vision Language Models. Computer Vision and Pattern Recognition (CVPR), 2025
Learning Sparsity for Effective and Efficient Music Performance Question Answering. Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Spatial Knowledge Graph-Guided Multimodal Synthesis. IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025