LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale

10 August 2024
Jaehong Cho, Minsu Kim, Hyunmin Choi, Guseul Heo, Jongse Park

Papers citing "LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale"

6 papers shown

Understanding and Optimizing Multi-Stage AI Inference Pipelines
A. Bambhaniya, Hanjiang Wu, Suvinay Subramanian, S. Srinivasan, Souvik Kundu, Amir Yazdanbakhsh, Midhilesh Elavazhagan, Madhu Kumar, Tushar Krishna
14 Apr 2025

Medha: Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations
A. Agrawal, Haoran Qiu, Junda Chen, Íñigo Goiri, Chaojie Zhang, Rayyan Shahid, R. Ramjee, Alexey Tumanov, Esha Choukse
Topics: RALM, LRM
25 Sep 2024

MiniGPT-v2: Large Language Model as a Unified Interface for Vision-Language Multi-Task Learning
Jun Chen, Deyao Zhu, Xiaoqian Shen, Xiang Li, Zechun Liu, Pengchuan Zhang, Raghuraman Krishnamoorthi, Vikas Chandra, Yunyang Xiong, Mohamed Elhoseiny
Topics: MLLM
14 Oct 2023

ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
William Won, Taekyung Heo, Saeed Rashidi, Srinivas Sridharan, S. Srinivasan, T. Krishna
24 Mar 2023

DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation
Seongmin Hong, Seungjae Moon, Junsoo Kim, Sungjae Lee, Minsub Kim, Dongsoo Lee, Joo-Young Kim
22 Sep 2022

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
Topics: MoE
17 Sep 2019