Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1904.09675
Cited By
BERTScore: Evaluating Text Generation with BERT
21 April 2019
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
Re-assign community
ArXiv
PDF
HTML
Papers citing
"BERTScore: Evaluating Text Generation with BERT"
50 / 664 papers shown
Title
Unequal Opportunities: Examining the Bias in Geographical Recommendations by Large Language Models
Shiran Dudy
Thulasi Tholeti
R. Ramachandranpillai
Muhammad Ali
Toby Jia-Jun Li
Ricardo Baeza-Yates
21
0
0
16 Mar 2025
CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning
Hao Cui
Zahra Shamsi
Gowoon Cheon
Xuejian Ma
Shutong Li
...
Eun-Ah Kim
M. Brenner
Viren Jain
Sameera Ponda
Subhashini Venugopalan
ELM
LRM
52
0
0
14 Mar 2025
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
Yucheng Suo
Fan Ma
Kaixin Shen
Linchao Zhu
Yi Yang
VLM
47
0
0
12 Mar 2025
Leveraging Retrieval Augmented Generative LLMs For Automated Metadata Description Generation to Enhance Data Catalogs
Mayank Singh
Abhijeet Kumar
Sasidhar Donaparthi
Gayatri Karambelkar
40
0
0
12 Mar 2025
Dynamic Knowledge Integration for Evidence-Driven Counter-Argument Generation with Large Language Models
Anar Yeginbergen
Maite Oronoz
Rodrigo Agerri
38
0
0
07 Mar 2025
Statistical Deficiency for Task Inclusion Estimation
Loïc Fosse
Frédéric Béchet
Benoit Favre
Géraldine Damnati
Gwénolé Lecorvé
Maxime Darrin
Philippe Formont
Pablo Piantanida
79
0
0
07 Mar 2025
Tgea: An error-annotated dataset and benchmark tasks for text generation from pretrained language models
Jie He
Bo Peng
Yi-Lun Liao
Qun Liu
Deyi Xiong
58
8
0
06 Mar 2025
SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction
Lu Dai
Yijie Xu
Jinhui Ye
Hao Liu
Hui Xiong
3DV
RALM
74
2
0
03 Mar 2025
Enabling AI Scientists to Recognize Innovation: A Domain-Agnostic Algorithm for Assessing Novelty
Yao Wang
Mingxuan Cui
Arthur Jiang
56
0
0
03 Mar 2025
Arabizi vs LLMs: Can the Genie Understand the Language of Aladdin?
Perla Al Almaoui
Pierrette Bouillon
Simon Hengchen
57
0
0
28 Feb 2025
Advancements in Natural Language Processing for Automatic Text Summarization
Nevidu Jayatilleke
Ruvan Weerasinghe
Nipuna Senanayake
97
1
0
27 Feb 2025
Evaluating LLMs and Pre-trained Models for Text Summarization Across Diverse Datasets
Tohida Rehman
Soumabha Ghosh
Kuntal Das
Souvik Bhattacharjee
Debarshi Kumar Sanyal
S. Chattopadhyay
50
0
0
26 Feb 2025
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Jiawei Huang
Bingcong Li
Christoph Dann
Niao He
OffRL
77
0
0
26 Feb 2025
BP-GPT: Auditory Neural Decoding Using fMRI-prompted LLM
Xiaoyu Chen
Changde Du
Che Liu
Yizhe Wang
Huiguang He
65
0
0
24 Feb 2025
TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking
Shahriar Kabir Nahin
R. N. Nandi
Sagor Sarker
Quazi Sarwar Muhtaseem
Md. Kowsher
Apu Chandraw Shill
Md Ibrahim
Mehadi Hasan Menon
Tareq Al Muntasir
Firoj Alam
66
0
0
24 Feb 2025
Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge
Heegyu Kim
Taeyang Jeon
Seungtaek Choi
Jihoon Hong
Dongwon Jeon
...
Jisu Bae
Chihoon Lee
Yunseo Kim
Jinsung Park
Hyunsouk Cho
ELM
55
0
1
23 Feb 2025
Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents
Ivoline Ngong
Swanand Kadhe
Hao Wang
K. Murugesan
Justin D. Weisz
Amit Dhurandhar
K. Ramamurthy
44
2
0
22 Feb 2025
Are Rules Meant to be Broken? Understanding Multilingual Moral Reasoning as a Computational Pipeline with UniMoral
Shivani Kumar
David Jurgens
LRM
41
0
0
21 Feb 2025
RAG-Optimized Tibetan Tourism LLMs: Enhancing Accuracy and Personalization
Jinhu Qi
Shuai Yan
Yibo Zhang
Wentao Zhang
R. L. Jin
Y. Hu
Ke Wang
3DV
47
1
0
21 Feb 2025
M-MAD: Multidimensional Multi-Agent Debate for Advanced Machine Translation Evaluation
Zhaopeng Feng
Jiayuan Su
Jiamei Zheng
Jiahan Ren
Yan Zhang
Jian Wu
Hongwei Wang
Zuozhu Liu
ELM
201
0
0
21 Feb 2025
Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation
SeongYeub Chu
JongWoo Kim
MunYong Yi
55
1
0
21 Feb 2025
A Survey on Bridging EEG Signals and Generative AI: From Image and Text to Beyond
Shreya Shukla
Jose Torres
Abhijit Mishra
Jacek Gwizdka
Shounak Roychowdhury
43
0
0
20 Feb 2025
A distributional simplicity bias in the learning dynamics of transformers
Riccardo Rende
Federica Gerace
A. Laio
Sebastian Goldt
68
8
0
17 Feb 2025
PropaInsight: Toward Deeper Understanding of Propaganda in Terms of Techniques, Appeals, and Intent
Jiateng Liu
Lin Ai
Zizhou Liu
Payam Karisani
Zheng Hui
May Fung
Preslav Nakov
Julia Hirschberg
Heng Ji
DiffM
83
4
0
17 Feb 2025
Towards Cross-Lingual Explanation of Artwork in Large-scale Vision Language Models
Shintaro Ozaki
Kazuki Hayashi
Yusuke Sakai
Hidetaka Kamigaito
Katsuhiko Hayashi
Taro Watanabe
LRM
91
1
0
17 Feb 2025
M-ABSA: A Multilingual Dataset for Aspect-Based Sentiment Analysis
Chengyan Wu
Bolei Ma
Y. Liu
Zheyu Zhang
Ningyuan Deng
Y. Li
Baolan Chen
Yi Zhang
Barbara Plank
Yun Xue
42
0
0
17 Feb 2025
Accelerating Unbiased LLM Evaluation via Synthetic Feedback
Zhaoyi Zhou
Yuda Song
Andrea Zanette
ALM
68
0
0
14 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
98
4
0
12 Feb 2025
Unsupervised Translation of Emergent Communication
Ido Levy
Orr Paradise
Boaz Carmeli
Ron Meir
S. Goldwasser
Yonatan Belinkov
72
0
0
11 Feb 2025
A Large-Scale Benchmark for Vietnamese Sentence Paraphrases
Sang Quang Nguyen
Kiet Van Nguyen
60
0
0
11 Feb 2025
LegalViz: Legal Text Visualization by Text To Diagram Generation
Eri Onami
Taiki Miyanishi
Koki Maeda
Shuhei Kurita
AILaw
66
1
0
10 Feb 2025
Learning to Substitute Words with Model-based Score Ranking
Hongye Liu
Ricardo Henao
41
0
0
09 Feb 2025
On Memory Construction and Retrieval for Personalized Conversational Agents
Zhuoshi Pan
Qianhui Wu
Huiqiang Jiang
Xufang Luo
Hao Cheng
...
Y. Yang
Chin-Yew Lin
H. V. Zhao
Lili Qiu
Jianfeng Gao
RALM
56
3
0
08 Feb 2025
Self-Rationalization in the Wild: A Large Scale Out-of-Distribution Evaluation on NLI-related tasks
Jing Yang
Max Glockner
Anderson de Rezende Rocha
Iryna Gurevych
LRM
62
1
0
07 Feb 2025
MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot
Xuejiao Zhao
Siyan Liu
Su-Yin Yang
C. Miao
118
4
0
06 Feb 2025
The Cake that is Intelligence and Who Gets to Bake it: An AI Analogy and its Implications for Participation
Martin Mundt
Anaelia Ovalle
Felix Friedrich
A Pranav
Subarnaduti Paul
Manuel Brack
Kristian Kersting
William Agnew
202
0
0
05 Feb 2025
Evaluating Small Language Models for News Summarization: Implications and Factors Influencing Performance
Borui Xu
Yao Chen
Zeyi Wen
Weiguo Liu
Bingsheng He
64
1
0
02 Feb 2025
Multilingual State Space Models for Structured Question Answering in Indic Languages
A. Vats
Rahul Raja
Mrinal Mathur
Vinija Jain
Aman Chadha
68
1
0
01 Feb 2025
Inkspire: Supporting Design Exploration with Generative AI through Analogical Sketching
David Chuan-En Lin
Hyeonsu B Kang
Nikolas Martelaro
A. Kittur
Yan-Ying Chen
Matthew K. Hong
97
3
0
30 Jan 2025
A Video-grounded Dialogue Dataset and Metric for Event-driven Activities
Wiradee Imrattanatrai
Masaki Asada
Kimihiro Hasegawa
Zhi-Qi Cheng
Ken Fukuda
Teruko Mitamura
VGen
56
0
0
30 Jan 2025
Fake News Detection After LLM Laundering: Measurement and Explanation
Rupak Kumar Das
Jonathan Dodge
81
0
0
29 Jan 2025
Beyond correlation: The Impact of Human Uncertainty in Measuring the Effectiveness of Automatic Evaluation and LLM-as-a-Judge
Aparna Elangovan
Jongwoo Ko
Lei Xu
Mahsa Elyasi
Ling Liu
S. Bodapati
Dan Roth
41
5
0
28 Jan 2025
mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation
Nishat Raihan
Antonios Anastasopoulos
Marcos Zampieri
ELM
43
5
0
28 Jan 2025
DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students' Hand-Drawn Math Images
Sami Baral
L. Lucy
Ryan Knight
Alice Ng
Luca Soldaini
Neil T. Heffernan
Kyle Lo
41
3
0
28 Jan 2025
Speech Translation Refinement using Large Language Models
Huaixia Dou
Xinyu Tian
Xinglin Lyu
Jie Zhu
Junhui Li
Lifan Guo
68
0
0
28 Jan 2025
Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators
Dingkang Yang
Dongling Xiao
Jinjie Wei
Mingcheng Li
Zhaoyu Chen
Ke Li
L. Zhang
HILM
92
3
0
28 Jan 2025
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation
Mingqi Gao
Xinyu Hu
Li Lin
Xiaojun Wan
28
1
0
28 Jan 2025
Learning to Summarize from LLM-generated Feedback
Hwanjun Song
Taewon Yun
Yuho Lee
Jihwan Oh
Gihun Lee
Jason (Jinglun) Cai
Hang Su
73
2
0
28 Jan 2025
MDEval: Evaluating and Enhancing Markdown Awareness in Large Language Models
Zhongpu Chen
Y. Liu
Long Shi
Zhi-Jie Wang
Xingyan Chen
Yu Zhao
Fuji Ren
41
0
0
28 Jan 2025
Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs
Nicolas Boizard
Kevin El Haddad
C´eline Hudelot
Pierre Colombo
68
14
0
28 Jan 2025
Previous
1
2
3
4
5
...
12
13
14
Next