All Papers
| Title | Venue | Year |
|---|---|---|
| APT-LLM: Exploiting Arbitrary-Precision Tensor Core Computing for LLM Acceleration | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) | 2025 |
| MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts | Annual Meeting of the Association for Computational Linguistics (ACL) | 2025 |
| Model-Distributed Inference for Large Language Models at the Edge | IEEE Workshop on Local and Metropolitan Area Networks (LAN/MAN) | 2025 |
| Turning LLM Activations Quantization-Friendly | International Symposium on Applied Computational Intelligence and Informatics (SACI) | 2025 |
| Achieving binary weight and activation for LLMs using Post-Training Quantization | Annual Meeting of the Association for Computational Linguistics (ACL) | 2025 |
| Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference | Design, Automation and Test in Europe (DATE) | 2025 |