
Title |
|---|
APT-LLM: Exploiting Arbitrary-Precision Tensor Core Computing for LLM Acceleration. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2025 |
eMamba: Efficient Acceleration Framework for Mamba Models in Edge Computing. ACM Transactions on Embedded Computing Systems (ACM TECS), 2025 |
Where and How to Enhance: Discovering Bit-Width Contribution for Mixed Precision Quantization. International Joint Conference on Artificial Intelligence (IJCAI), 2025 |
Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques. Annual International Computer Software and Applications Conference (COMPSAC), 2025 |
Empowering Edge Intelligence: A Comprehensive Survey on On-Device AI Models. ACM Computing Surveys (ACM Comput. Surv.), 2025 |
Optimizing DNN Inference on Multi-Accelerator SoCs at Training-time. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2024 |