Enhancing elusive clues in knowledge learning by contrasting attention of language models

AAAI Conference on Artificial Intelligence (AAAI), 2024

26 September 2024

Jian Gao

Xiao Zhang

Ji Wu

Chenyi Guo

Papers citing "Enhancing elusive clues in knowledge learning by contrasting attention of language models"

Gemma 2: Improving Open Language Models at a Practical Size

Gemma Team

Morgane Riviere

...

617

1,556

31 Jul 2024

Source-Aware Training Enables Knowledge Attribution in Language Models

Hao Peng

401

01 Apr 2024

Reverse Training to Nurse the Reversal Curse

O. Yu. Golovneva

Zeyuan Allen-Zhu

Jason Weston

Sainbayar Sukhbaatar

345

20 Mar 2024

Where is the answer? Investigating Positional Bias in Language Model Knowledge Extraction

456

16 Feb 2024

Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models

International Conference on Learning Representations (ICLR), 2023

Mert Yuksekgonul

192

26 Sep 2023

Physics of Language Models: Part 3.1, Knowledge Storage and Extraction

International Conference on Machine Learning (ICML), 2023

Zeyuan Allen-Zhu

Yuanzhi Li

KELM

521

233

25 Sep 2023

AttentionMix: Data augmentation method that relies on BERT attention mechanism

Dominik Lewy

Jacek Mańdziuk

272

20 Sep 2023

MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning

International Conference on Learning Representations (ICLR), 2023

Ge Zhang

512

11 Sep 2023

Code Llama: Open Foundation Models for Code

Baptiste Rozière

...

Louis Martin

451

2,755

24 Aug 2023

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

International Conference on Learning Representations (ICLR), 2023

...

800

622

18 Aug 2023

Textbooks Are All You Need

...

Adam Tauman Kalai

367

512

20 Jun 2023

LLaMA: Open and Efficient Foundation Language Models

...

4.9K

17,636

27 Feb 2023

Self-Instruct: Aligning Language Models with Self-Generated Instructions

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Daniel Khashabi

757

2,804

20 Dec 2022

Large Language Models Are Reasoning Teachers

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

328

434

20 Dec 2022

Scaling Instruction-Finetuned Language Models

Journal of Machine Learning Research (JMLR), 2022

...

1.3K

3,790

20 Oct 2022

Solving Quantitative Reasoning Problems with Language Models

Neural Information Processing Systems (NeurIPS), 2022

Henryk Michalewski

...

661

1,295

29 Jun 2022

Training language models to follow instructions with human feedback

Neural Information Processing Systems (NeurIPS), 2022

Carroll L. Wainwright

...

2.1K

17,490

04 Mar 2022

Towards Continual Knowledge Learning of Language Models

591

186

07 Oct 2021

AEDA: An Easier Data Augmentation Technique for Text Classification

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

Akbar Karimi

L. Rossi

Andrea Prati

169

185

30 Aug 2021

RoFormer: Enhanced Transformer with Rotary Position Embedding

821

3,918

20 Apr 2021

Attention is not not Explanation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019

Sarah Wiegreffe

Yuval Pinter

XAI AAML FAtt

472

1,025

13 Aug 2019

What Does BERT Look At? An Analysis of BERT's Attention

Kevin Clark

Urvashi Khandelwal

Omer Levy

Christopher D. Manning

MILM

613

1,826

11 Jun 2019

Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned

Annual Meeting of the Association for Computational Linguistics (ACL), 2019

697

1,329

23 May 2019

Unsupervised Data Augmentation for Consistency Training

Neural Information Processing Systems (NeurIPS), 2019

790

2,537

29 Apr 2019

Attention is not Explanation

North American Chapter of the Association for Computational Linguistics (NAACL), 2019

Sarthak Jain

Byron C. Wallace

FAtt

1.1K

1,523

26 Feb 2019

Contextual Augmentation: Data Augmentation by Words with Paradigmatic Relations

Sosuke Kobayashi

177

656

16 May 2018

Harvesting Paragraph-Level Question-Answer Pairs from Wikipedia

Xinya Du

Claire Cardie

KELM

260

173

15 May 2018

mixup: Beyond Empirical Risk Minimization

International Conference on Learning Representations (ICLR), 2017

714

11,100

25 Oct 2017

SQuAD: 100,000+ Questions for Machine Comprehension of Text

708

8,904

16 Jun 2016

Improving Neural Machine Translation Models with Monolingual Data

Rico Sennrich

Barry Haddow

Alexandra Birch

791

2,850

20 Nov 2015

Distilling the Knowledge in a Neural Network

797

22,387

09 Mar 2015