ResearchTrend.AI
RoboLLM: Robotic Vision Tasks Grounded on Multimodal Large Language Models

16 October 2023
Zijun Long
George Killick
R. McCreadie
Gerardo Aragon Camarasa
    VLM
arXiv: 2310.10221

Papers citing "RoboLLM: Robotic Vision Tasks Grounded on Multimodal Large Language Models"

7 / 7 papers shown
From Decision to Action in Surgical Autonomy: Multi-Modal Large Language Models for Robot-Assisted Blood Suction
Sadra Zargarzadeh
Maryam Mirzaei
Yafei Ou
Mahdi Tavakoli
21
1
0
14 Aug 2024
Robot Instance Segmentation with Few Annotations for Grasping
Moshe Kimhi
David Vainshtein
Chaim Baskin
Dotan Di Castro
45
2
0
01 Jul 2024
CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for Optimized Learning Fusion
Zijun Long
George Killick
Lipeng Zhuang
Gerardo Aragon Camarasa
Zaiqiao Meng
R. McCreadie
VLM
29
2
0
22 Feb 2024
CrisisViT: A Robust Vision Transformer for Crisis Image Classification
Zijun Long
R. McCreadie
Muhammad Imran
48
9
0
05 Jan 2024
Elucidating and Overcoming the Challenges of Label Noise in Supervised Contrastive Learning
Zijun Long
George Killick
Lipeng Zhuang
R. McCreadie
Gerardo Aragon Camarasa
Paul Henderson
18
5
0
25 Nov 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
263
3,538
0
24 Feb 2021