Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code. International Conference on Information Control Systems & Technologies (ICICST), 2023 |
Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-context Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 |