Prompt Observatory
Weekly benchmarks tracking how Claude, GPT-4o, Gemini, and Llama handle a curated set of real-world prompts. Every score comes from a structured rubric: no subjective ratings, no marketing claims.
Updated weekly · 5 prompts in corpus
Recent runs
Top movers (latest run)
| Prompt | Model | Change | Direction |
|---|---|---|---|
| Explain a recursive function in plain English | gpt-4o | -2 | degraded |
Prompt corpus
- Summarise a financial statement for a non-finance audience (analysis)
  4 models · added 2026-05-17
- Explain a recursive function in plain English (code)
  4 models · added 2026-05-17
- Rewrite a legal clause in plain English without losing meaning (legal)
  8 models · added 2026-05-23
- Handle a common sales objection without being pushy (sales)
  8 models · added 2026-05-23
- Write a cold email from a SaaS founder to a prospect (writing)
  4 models · added 2026-05-17