Benchmarking Cognitive Biases in Large Language Models as Evaluators | Annual Meeting of the Association for Computational Linguistics (ACL), 2023 |
Time Travel in LLMs: Tracing Data Contamination in Large Language Models | International Conference on Learning Representations (ICLR), 2023 |
AgentBench: Evaluating LLMs as Agents | International Conference on Learning Representations (ICLR), 2023 |
Won't Get Fooled Again: Answering Questions with False Premises | Annual Meeting of the Association for Computational Linguistics (ACL), 2023 |
Assisting Language Learners: Automated Trans-Lingual Definition Generation via Contrastive Prompt Learning | Workshop on Innovative Use of NLP for Building Educational Applications (BEA), 2023 |
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena | Neural Information Processing Systems (NeurIPS), 2023 |
A New Dataset and Empirical Study for Sentence Simplification in Chinese | Annual Meeting of the Association for Computational Linguistics (ACL), 2023 |
Direct Preference Optimization: Your Language Model is Secretly a Reward Model | Neural Information Processing Systems (NeurIPS), 2023 |
OpenAssistant Conversations -- Democratizing Large Language Model Alignment | Neural Information Processing Systems (NeurIPS), 2023 |
Towards a Unified Multi-Dimensional Evaluator for Text Generation | Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022 |
TruthfulQA: Measuring How Models Mimic Human Falsehoods | Annual Meeting of the Association for Computational Linguistics (ACL), 2021 |
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus | Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021 |
Memorization vs. Generalization: Quantifying Data Leakage in NLP Performance Evaluation | Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2021 |