
Title |
|---|
A Survey on Video Temporal Grounding with Multimodal Large Language Model. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025 |
Vector-ICL: In-context Learning with Continuous Vector Representations. International Conference on Learning Representations (ICLR), 2024 |
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks. Neural Information Processing Systems (NeurIPS), 2024 |
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation. Computer Vision and Pattern Recognition (CVPR), 2024 |
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts. AAAI Conference on Artificial Intelligence (AAAI), 2023 |