Has Multimodal Learning Delivered Universal Intelligence in Healthcare? A Comprehensive Survey | Information Fusion (Inf. Fusion), 2024
Semantic Alignment for Multimodal Large Language Models | ACM Multimedia (MM), 2024
Sapiens: Foundation for Human Vision Models | European Conference on Computer Vision (ECCV), 2024
Pedestrian Attribute Recognition: A New Benchmark Dataset and A Large Language Model Augmented Framework | AAAI Conference on Artificial Intelligence (AAAI), 2024
5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks | Computer Vision and Pattern Recognition (CVPR), 2024
Masked Image Modeling: A Survey | International Journal of Computer Vision (IJCV), 2024
Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators | European Conference on Computer Vision (ECCV), 2024
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling | Neural Information Processing Systems (NeurIPS), 2024