arXiv:2403.00522

VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks

1 March 2024

Chunhua Shen

ArXiv (abs) | PDF | HTML | HuggingFace (47 upvotes) | GitHub (384★)

Papers citing "VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks"

19 / 19 papers shown

Eevee: Towards Close-up High-resolution Video-based Virtual Try-on

255

4

0

24 Nov 2025

VisPlay: Self-Evolving Vision-Language Models from Images

Chengsong Huang

503

23

0

19 Nov 2025

Agent-Omni: Test-Time Multimodal Reasoning via Model Coordination for Understanding Anything

Ravender Pal Singh

340

3

0

04 Nov 2025

Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance

...

309

11

0

28 Oct 2025

ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints

C. L. Philip Chen

462

6

0

16 Oct 2025

From Editor to Dense Geometry Estimator

335

12

0

04 Sep 2025

A Novel Framework for Automated Explain Vision Model Using Vision-Language Models

Phu-Vinh Nguyen

237

0

0

27 Aug 2025

Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models

Xiu Li

388

1

0

18 Aug 2025

Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation

451

20

0

11 Aug 2025

Embodied AI with Foundation Models for Mobile Service Robots: A Systematic Review

Matthew Lisondra

375

8

0

26 May 2025

FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing

1.1K

27

0

06 May 2025

CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition

644

2

0

30 Mar 2025

Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?

Bhoomika Lohana

Jaswinder Singh

297

5

0

23 Mar 2025

USP: Unified Self-Supervised Pretraining for Image Generation and Understanding

677

21

0

08 Mar 2025

X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation

645

9

0

08 Mar 2025

HalCECE: A Framework for Explainable Hallucination Detection through Conceptual Counterfactuals in Image Captioning

Maria Lymperaiou

Giorgos Filandrianos

Angeliki Dimitriou

Athanasios Voulodimos

249

0

0

01 Mar 2025

FlowDreamer: exploring high fidelity text-to-3D generation via rectified flow

Lin Wang

434

1

0

09 Aug 2024

SCHEME: Scalable Channel Mixer for Vision Transformers

Nuno Vasconcelos

961

1

0

01 Dec 2023

Baichuan 2: Open Large-scale Language Models

...

1.0K

966

0

19 Sep 2023
