MinosEval: Distinguishing Factoid and Non-Factoid for Tailored Open-Ended QA Evaluation with LLMsAnnual Meeting of the Association for Computational Linguistics (ACL), 2025 |
LIFBench: Evaluating the Instruction Following Performance and Stability of Large Language Models in Long-Context ScenariosAnnual Meeting of the Association for Computational Linguistics (ACL), 2024 |