ExploringDatabyLLMs Runs

Published benchmark runs, compare notes, and dashboards across Claude, Codex, and Gemini. Reports open through the Markdown viewer, dashboards open as live HTML, and SQL / JSON artifacts link to the public GitHub repo.

2026-03-26

2026-03-24

2026-03-23

2026-03-20

2026-03-19

2026-03-18

2026-03-17

q004_worst_origin_airport_otp_thresholded

gemini / gemini-3.1-pro-preview
run-001

q005_worst_winter_carrier_origin_pair

gemini / gemini-3.1-pro-preview
run-001