arXiv: 2409.17954
Enhancing elusive clues in knowledge learning by contrasting attention of language models

AAAI Conference on Artificial Intelligence (AAAI), 2024
26 September 2024
Jian Gao
Xiao Zhang
Ji Wu
Chenyi Guo

Papers citing "Enhancing elusive clues in knowledge learning by contrasting attention of language models"

31 / 31 papers shown
Gemma 2: Improving Open Language Models at a Practical Size
Gemma Team
Morgane Riviere
Shreya Pathak
Pier Giuseppe Sessa
Cassidy Hardin
...
Noah Fiedel
Armand Joulin
Kathleen Kenealy
Robert Dadashi
Alek Andreev
VLM, MoE, OSLM
617
1,556
0
31 Jul 2024
Source-Aware Training Enables Knowledge Attribution in Language Models
Muhammad Khalifa
Aman Rangapur
Emma Strubell
Honglak Lee
Lu Wang
Iz Beltagy
Hao Peng
HILM
401
25
0
01 Apr 2024
Reverse Training to Nurse the Reversal Curse
O. Yu. Golovneva
Zeyuan Allen-Zhu
Jason Weston
Sainbayar Sukhbaatar
345
48
0
20 Mar 2024
Where is the answer? Investigating Positional Bias in Language Model Knowledge Extraction
Kuniaki Saito
Kihyuk Sohn
Chen-Yu Lee
Yoshitaka Ushiku
456
10
0
16 Feb 2024
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
International Conference on Learning Representations (ICLR), 2023
Mert Yuksekgonul
Varun Chandrasekaran
Erik Jones
Suriya Gunasekar
Ranjita Naik
Hamid Palangi
Ece Kamar
Besmira Nushi
HILM
192
67
0
26 Sep 2023
Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
International Conference on Machine Learning (ICML), 2023
Zeyuan Allen-Zhu
Yuanzhi Li
KELM
521
233
0
25 Sep 2023
AttentionMix: Data augmentation method that relies on BERT attention mechanism
Dominik Lewy
Jacek Mańdziuk
272
4
0
20 Sep 2023
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning
International Conference on Learning Representations (ICLR), 2023
Xiang Yue
Xingwei Qu
Ge Zhang
Yao Fu
Wenhao Huang
Huan Sun
Yu Su
Wenhu Chen
AIMat, LRM
512
512
0
11 Sep 2023
Code Llama: Open Foundation Models for Code
Baptiste Rozière
Jonas Gehring
Fabian Gloeckle
Sten Sootla
Itai Gat
...
Hugo Touvron
Louis Martin
Nicolas Usunier
Thomas Scialom
Gabriel Synnaeve
ELM, ALM
451
2,755
0
24 Aug 2023
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
International Conference on Learning Representations (ICLR), 2023
Haipeng Luo
Qingfeng Sun
Can Xu
Lu Wang
Jian-Guang Lou
...
Xiubo Geng
Qingwei Lin
Shifeng Chen
Yansong Tang
Dongmei Zhang
LRM, OSLM
800
622
0
18 Aug 2023
Textbooks Are All You Need
Suriya Gunasekar
Yi Zhang
J. Aneja
C. C. T. Mendes
Allison Del Giorno
...
Sébastien Bubeck
Ronen Eldan
Adam Tauman Kalai
Y. Lee
Yuanzhi Li
AI4CE, ALM, SyDa
367
512
0
20 Jun 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM, PILM
4.9K
17,636
0
27 Feb 2023
Self-Instruct: Aligning Language Models with Self-Generated Instructions
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yizhong Wang
Yeganeh Kordi
Swaroop Mishra
Alisa Liu
Noah A. Smith
Daniel Khashabi
Hannaneh Hajishirzi
ALM, SyDa, LRM
757
2,804
0
20 Dec 2022
Large Language Models Are Reasoning Teachers
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Namgyu Ho
Laura Schmid
Se-Young Yun
ReLM, ELM, LRM
328
434
0
20 Dec 2022
Scaling Instruction-Finetuned Language Models
Journal of Machine Learning Research (JMLR), 2022
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM, LRM
1.3K
3,790
0
20 Oct 2022
Solving Quantitative Reasoning Problems with Language Models
Neural Information Processing Systems (NeurIPS), 2022
Aitor Lewkowycz
Anders Andreassen
David Dohan
Ethan Dyer
Henryk Michalewski
...
Theo Gutman-Solo
Yuhuai Wu
Behnam Neyshabur
Guy Gur-Ari
Vedant Misra
ReLM, ELM, LRM
661
1,295
0
29 Jun 2022
Training language models to follow instructions with human feedback
Neural Information Processing Systems (NeurIPS), 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM, ALM
2.1K
17,490
0
04 Mar 2022
Towards Continual Knowledge Learning of Language Models
Joel Jang
Seonghyeon Ye
Sohee Yang
Joongbo Shin
Janghoon Han
Gyeonghun Kim
Stanley Jungkyu Choi
Minjoon Seo
CLL, KELM
591
186
0
07 Oct 2021
AEDA: An Easier Data Augmentation Technique for Text Classification
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Akbar Karimi
L. Rossi
Andrea Prati
169
185
0
30 Aug 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
821
3,918
0
20 Apr 2021
Attention is not not Explanation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Sarah Wiegreffe
Yuval Pinter
XAI, AAML, FAtt
472
1,025
0
13 Aug 2019
What Does BERT Look At? An Analysis of BERT's Attention
Kevin Clark
Urvashi Khandelwal
Omer Levy
Christopher D. Manning
MILM
613
1,826
0
11 Jun 2019
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Elena Voita
David Talbot
F. Moiseev
Rico Sennrich
Ivan Titov
697
1,329
0
23 May 2019
Unsupervised Data Augmentation for Consistency Training
Neural Information Processing Systems (NeurIPS), 2019
Qizhe Xie
Zihang Dai
Eduard H. Hovy
Minh-Thang Luong
Quoc V. Le
790
2,537
0
29 Apr 2019
Attention is not Explanation
North American Chapter of the Association for Computational Linguistics (NAACL), 2019
Sarthak Jain
Byron C. Wallace
FAtt
1.1K
1,523
0
26 Feb 2019
Contextual Augmentation: Data Augmentation by Words with Paradigmatic Relations
Sosuke Kobayashi
177
656
0
16 May 2018
Harvesting Paragraph-Level Question-Answer Pairs from Wikipedia
Xinya Du
Claire Cardie
KELM
260
173
0
15 May 2018
mixup: Beyond Empirical Risk Minimization
International Conference on Learning Representations (ICLR), 2017
Hongyi Zhang
Moustapha Cissé
Yann N. Dauphin
David Lopez-Paz
NoLa
714
11,100
0
25 Oct 2017
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar
Jian Zhang
Konstantin Lopyrev
Percy Liang
RALM
708
8,904
0
16 Jun 2016
Improving Neural Machine Translation Models with Monolingual Data
Rico Sennrich
Barry Haddow
Alexandra Birch
791
2,850
0
20 Nov 2015
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
FedML
797
22,387
0
09 Mar 2015