News

We introduce GUIDE-LLM

23 Mar 2026

GUIDE-LLM is a reporting checklist for using large language models (LLMs) in behavioral and social science

Purpose
GUIDE-LLM is a reporting checklist designed by 80+ experts to improve transparency, reproducibility, and ethical accountability of LLM-based research in the behavioral and social sciences. In particular, GUIDE-LLM supports researchers in clearly describing how LLMs were used, why specific methodological choices were made, and what steps were taken to ensure responsible research practices.

Motivation
LLMs offer new opportunities to study human behavior, yet their rapidly evolving nature poses challenges for research rigor. For example:

  • The label “ChatGPT” can refer to different underlying models (e.g., GPT-4, GPT-4o), each with multiple versions often marked by timestamps (e.g., gpt-4o-2024-11-20).
  • LLMs can behave very differently when accessed via the official API versus a web interface (due in part to differences in system prompts).
  • Outputs also vary with sampling parameters such as "temperature" (which trades off more random against more deterministic outputs). Even with temperature=0, outputs can vary because of hardware-level non-determinism.
  • Memorization (where LLMs recall their training data) can further challenge internal validity and lead to bias.
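The configuration details above are precisely the kind of information that often goes unreported. A minimal sketch in Python (with hypothetical field names; GUIDE-LLM does not prescribe any particular format) of logging them alongside each run so they can later be reported verbatim:

```python
# Hypothetical run-metadata record, illustrating the details discussed above.
# Field names are illustrative, not part of the GUIDE-LLM checklist itself.

run_metadata = {
    "model_id": "gpt-4o-2024-11-20",  # exact versioned identifier, not just "ChatGPT"
    "access_method": "API",           # API vs. web interface can change behavior
    "system_prompt": "",              # record verbatim; web interfaces add their own
    "temperature": 0.0,               # reduces, but does not eliminate, output variation
    "collection_date": "2026-03-23",  # models behind a label change over time
}

# A pinned, timestamped model ID lets readers verify which model version was queried.
assert "-20" in run_metadata["model_id"]
```

Storing such a record per run (e.g., as a JSON sidecar file next to the outputs) makes the reporting step a matter of copying rather than reconstruction from memory.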

What the checklist looks like

GUIDE-LLM consists of 14 reporting items, covering aspects such as:

  • Where and how the model is used
  • Model configuration and prompting

How it was developed
GUIDE-LLM was developed through a two-round Delphi process with a large international expert panel (N=80) drawn from different fields (e.g., psychology, political science, economics, management), all with strong expertise in LLM use. Every item included in the checklist reached strong consensus among the panel (>2/3 of the votes).

Read the checklist