M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation. Annual Meeting of the Association for Computational Linguistics (ACL), 2024.
Drop your Decoder: Pre-training with Bag-of-Word Prediction for Dense Passage Retrieval. Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2024.
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning. International Conference on Learning Representations (ICLR), 2023.
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining. Neural Information Processing Systems (NeurIPS), 2023.
ConTextual Masked Auto-Encoder for Dense Passage Retrieval. AAAI Conference on Artificial Intelligence (AAAI), 2022.
Training language models to follow instructions with human feedback. Neural Information Processing Systems (NeurIPS), 2022.
Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval. Annual Meeting of the Association for Computational Linguistics (ACL), 2021.
SimCSE: Simple Contrastive Learning of Sentence Embeddings. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
GooAQ: Open Question Answering with Diverse Answer Types. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup. Workshop on Representation Learning for NLP (RepL4NLP), 2021.
Dense Passage Retrieval for Open-Domain Question Answering. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
ELI5: Long Form Question Answering. Annual Meeting of the Association for Computational Linguistics (ACL), 2019.
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018.
A large annotated corpus for learning natural language inference. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2015.