Benchmarking Chinese Medical LLMs: A Medbench-based Analysis of Performance Gaps and Hierarchical Optimization Strategies

10 March 2025

Papers citing "Benchmarking Chinese Medical LLMs: A Medbench-based Analysis of Performance Gaps and Hierarchical Optimization Strategies"

LettuceDetect: A Hallucination Detection Framework for RAG Applications

Adam Kovacs

Gábor Recski

201

24 Feb 2025

How Far are LLMs from Being Our Digital Twins? A Benchmark for Persona-Based Behavior Chain Simulation

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

199

20 Feb 2025

Stepwise Perplexity-Guided Refinement for Efficient Chain-of-Thought Reasoning in Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

...

574

18 Feb 2025

Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

283

17 Feb 2025

LLMs for Drug-Drug Interaction Prediction: A Comprehensive Comparison

340

09 Feb 2025

Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization

778

06 Feb 2025

SOK: Exploring Hallucinations and Security Risks in AI-Assisted Software Development with Insights for LLM Deployment

304

31 Jan 2025

ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario

224

20 Jan 2025

A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy

222

17 Jan 2025