RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation. Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Constrained Decoding with Speculative Lookaheads. North American Chapter of the Association for Computational Linguistics (NAACL), 2024
PLD+: Accelerating LLM inference by leveraging Language Model Artifacts. North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration. Neural Information Processing Systems (NeurIPS), 2024
Debiasing Watermarks for Large Language Models via Maximal Coupling. Journal of the American Statistical Association (JASA), 2024
SAM Decoding: Speculative Decoding via Suffix Automaton. Annual Meeting of the Association for Computational Linguistics (ACL), 2024
SpecHub: Provable Acceleration to Multi-Draft Speculative Decoding. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
A Theoretical Perspective for Speculative Decoding Algorithm. Neural Information Processing Systems (NeurIPS), 2024
The Impact of Inference Acceleration on Bias of LLMs. North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Transferable Post-training via Inverse Value Learning. North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA. International Conference on Learning Representations (ICLR), 2024
FIRP: Faster LLM inference via future intermediate representation prediction. Natural Language Processing and Chinese Computing (NLPCC), 2024
Fast Best-of-N Decoding via Speculative Rejection. Neural Information Processing Systems (NeurIPS), 2024
Watermarking Large Language Models and the Generated Content: Opportunities and Challenges. Asilomar Conference on Signals, Systems and Computers (ACSSC), 2024
Multi-Draft Speculative Sampling: Canonical Decomposition and Theoretical Limits. International Conference on Learning Representations (ICLR), 2024
Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models. International Conference on Machine Learning (ICML), 2024
MoDification: Mixture of Depths Made Easy. North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Learning from Imperfect Data: Towards Efficient Knowledge Distillation of Autoregressive Language Models for Text-to-SQL. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Probabilistic Degeneracy Detection for Point-to-Plane Error Minimization. IEEE Robotics and Automation Letters (RA-L), 2024
QEFT: Quantization for Efficient Fine-Tuning of LLMs. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024