Vision Generalist Model: A Survey | International Journal of Computer Vision (IJCV), 2025 |
RoboFlamingo-Plus: Fusion of Depth and RGB Perception with Vision-Language Models for Enhanced Robotic Manipulation | International Conference on Real-time Computing and Robotics (ICRCR), 2025 |
Scalable, Training-Free Visual Language Robotics: A Modular Multi-Model Framework for Consumer-Grade GPUs | IEEE/SICE International Symposium on System Integration (SII), 2025 |
DocVLM: Make Your VLM an Efficient Reader | Computer Vision and Pattern Recognition (CVPR), 2024 |
ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models | IEEE International Conference on Robotics and Automation (ICRA), 2024 |
VideoQA in the Era of LLMs: An Empirical Study | International Journal of Computer Vision (IJCV), 2024 |
VL-TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments | IEEE Robotics and Automation Letters (RA-L), 2024 |