
Title |
|---|
Grid-LOGAT: Grid Based Local and Global Area Transcription for Video Question Answering. International Conference on Information Photonics (ICIP), 2025 |
Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model. Computer Vision and Pattern Recognition (CVPR), 2025 |
HuMoCon: Concept Discovery for Human Motion Understanding. Computer Vision and Pattern Recognition (CVPR), 2025 |
Enhancing the Learning Experience: Using Vision-Language Models to Generate Questions for Educational Videos. International Conference on Artificial Intelligence in Education (AIED), 2025 |
SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding. Computer Vision and Pattern Recognition (CVPR), 2025 |