RoboLLM: Robotic Vision Tasks Grounded on Multimodal Large Language Models

16 October 2023

Papers citing "RoboLLM: Robotic Vision Tasks Grounded on Multimodal Large Language Models"

Title | Authors | Tags | Metrics | Date
From Decision to Action in Surgical Autonomy: Multi-Modal Large Language Models for Robot-Assisted Blood Suction | Sadra Zargarzadeh, Maryam Mirzaei, Yafei Ou, Mahdi Tavakoli | - | 21 1 0 | 14 Aug 2024
Robot Instance Segmentation with Few Annotations for Grasping | Moshe Kimhi, David Vainshtein, Chaim Baskin, Dotan Di Castro | - | 45 2 0 | 01 Jul 2024
CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for Optimized Learning Fusion | Zijun Long, George Killick, Lipeng Zhuang, Gerardo Aragon Camarasa, Zaiqiao Meng, R. McCreadie | VLM | 29 2 0 | 22 Feb 2024
CrisisViT: A Robust Vision Transformer for Crisis Image Classification | Zijun Long, R. McCreadie, Muhammad Imran | - | 48 9 0 | 05 Jan 2024
Elucidating and Overcoming the Challenges of Label Noise in Supervised Contrastive Learning | Zijun Long, George Killick, Lipeng Zhuang, R. McCreadie, Gerardo Aragon Camarasa, Paul Henderson | - | 18 5 0 | 25 Nov 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | Junnan Li, Dongxu Li, Silvio Savarese, Steven C. H. Hoi | VLM, MLLM | 244 4,186 0 | 30 Jan 2023
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions | Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao | ViT | 263 3,538 0 | 24 Feb 2021