ResearchTrend.AI
Post Turing: Mapping the landscape of LLM Evaluation


3 November 2023
Alexey Tikhonov
Ivan P. Yamshchikov
    ELM

Papers citing "Post Turing: Mapping the landscape of LLM Evaluation"

8 / 8 papers shown
AI Predicts AGI: Leveraging AGI Forecasting and Peer Review to Explore LLMs' Complex Reasoning Capabilities
Fabrizio Davide
Pietro Torre
Andrea Gaggioli
ELM
101
0
0
12 Dec 2024
Beyond Turing Test: Can GPT-4 Sway Experts' Decisions?
Takehiro Takayanagi
Hiroya Takamura
Kiyoshi Izumi
Chung-Chi Chen
ELM
DeLMO
20
1
0
25 Sep 2024
PLUGH: A Benchmark for Spatial Understanding and Reasoning in Large Language Models
Alexey Tikhonov
ELM
ReLM
LRM
18
0
0
03 Aug 2024
Humor Mechanics: Advancing Humor Generation with Multistep Reasoning
Alexey Tikhonov
Pavel Shtykovskiy
LRM
ReLM
21
1
0
12 May 2024
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Jiawei Liu
Chun Xia
Yuyao Wang
Lingming Zhang
ELM
ALM
178
780
0
02 May 2023
Open-Domain Dialog Evaluation using Follow-Ups Likelihood
Maxime De Bruyn
Ehsan Lotfi
Jeska Buhmann
Walter Daelemans
24
9
0
12 Sep 2022
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
243
1,815
0
17 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,943
0
20 Apr 2018