VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks

1 March 2024
Xiangxiang Chu
Jianlin Su
Bo Zhang
Chunhua Shen
    MLLM
arXiv:2403.00522 | HuggingFace (47 upvotes) | GitHub (384★)

Papers citing "VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks"

19 / 19 papers shown
Eevee: Towards Close-up High-resolution Video-based Virtual Try-on
Jianhao Zeng
Y. Bai
Ruidong Chen
Xuanpu Zhang
Lei-huan Sun
Dongyang Jin
Ryan Xu
Nannan Zhang
Dan Song
Xiangxiang Chu
3DH
255
4
0
24 Nov 2025
VisPlay: Self-Evolving Vision-Language Models from Images
Yicheng He
Chengsong Huang
Zongxia Li
Jiaxin Huang
Yonghui Yang
OffRL, LRM, VLM
503
23
0
19 Nov 2025
Agent-Omni: Test-Time Multimodal Reasoning via Model Coordination for Understanding Anything
Huawei Lin
Yunzhi Shi
Tong Geng
Weijie Zhao
Wei Wang
Ravender Pal Singh
LLMAG, VLM, LRM
340
3
0
04 Nov 2025
Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance
Y. X. Wei
Shiwei Zhang
Hangjie Yuan
Yujin Han
Zhekai Chen
...
Difan Zou
Xihui Liu
Yingya Zhang
Yu Liu
Hongming Shan
DiffM, MoE
309
11
0
28 Oct 2025
ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints
Meiqi Wu
Jiashu Zhu
Xiaokun Feng
C. L. Philip Chen
Chen Zhu
Bingze Song
Fangyuan Mao
Jiahong Wu
Xiangxiang Chu
Kaiqi Huang
VGen, EGVM, VLM
462
6
0
16 Oct 2025
From Editor to Dense Geometry Estimator
Jiyuan Wang
Chunyu Lin
Lei-huan Sun
Rongying Liu
Lang Nie
Mingxing Li
K. Liao
Xiangxiang Chu
DiffM, MDE
335
12
0
04 Sep 2025
A Novel Framework for Automated Explain Vision Model Using Vision-Language Models
Phu-Vinh Nguyen
Tan-Hanh Pham
Chris Ngo
Truong-Son Hy
237
0
0
27 Aug 2025
Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models
Chubin Chen
Jiashu Zhu
Xiaokun Feng
Nisha Huang
Meiqi Wu
Fangyuan Mao
Jiahong Wu
Xiangxiang Chu
Xiu Li
388
1
0
18 Aug 2025
Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation
Fangyuan Mao
Aiming Hao
Jintao Chen
Dongxia Liu
Xiaokun Feng
Jiashu Zhu
Meiqi Wu
Chubin Chen
Jiahong Wu
Xiangxiang Chu
DiffM, VGen
451
20
0
11 Aug 2025
Embodied AI with Foundation Models for Mobile Service Robots: A Systematic Review
Matthew Lisondra
B. Benhabib
G. Nejat
LM&Ro
375
8
0
26 May 2025
FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing
Rui Lan
Y. Bai
Xu Duan
Mingxing Li
Dongyang Jin
Xiaowen Chu
Lei Sun
Lei-huan Sun
Xiangxiang Chu
DiffM
1.1K
27
0
06 May 2025
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition
Jongseo Lee
Joohyun Chang
Dongho Lee
Jinwoo Choi
644
2
0
30 Mar 2025
Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?
Aabid Karim
Abdul Karim
Bhoomika Lohana
Matt Keon
Jaswinder Singh
A. Sattar
LRM
297
5
0
23 Mar 2025
USP: Unified Self-Supervised Pretraining for Image Generation and Understanding
Xiangxiang Chu
Renda Li
Yong Wang
677
21
0
08 Mar 2025
X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation
Jian Ma
Qirong Peng
Xu Guo
Chen Chen
H. Lu
Zhenyu Yang
VLM
645
9
0
08 Mar 2025
HalCECE: A Framework for Explainable Hallucination Detection through Conceptual Counterfactuals in Image Captioning
Maria Lymperaiou
Giorgos Filandrianos
Angeliki Dimitriou
Athanasios Voulodimos
Giorgos Stamou
MLLM
249
0
0
01 Mar 2025
FlowDreamer: exploring high fidelity text-to-3D generation via rectified flow
Hangyu Li
Xiangxiang Chu
Dingyuan Shi
Lin Wang
434
1
0
09 Aug 2024
SCHEME: Scalable Channel Mixer for Vision Transformers
Deepak Sridhar
Yunsheng Li
Nuno Vasconcelos
961
1
0
01 Dec 2023
Baichuan 2: Open Large-scale Language Models
Ai Ming Yang
Bin Xiao
Bingning Wang
Borong Zhang
Ce Bian
...
Youxin Jiang
Yuchen Gao
Yupeng Zhang
Guosheng Dong
Zhiying Wu
ELM, LRM
1.0K
966
0
19 Sep 2023