ExploringDatabyLLMs Runs
Published benchmark runs, comparison notes, and dashboards for Claude, Codex, and Gemini. Reports open in the Markdown viewer, dashboards open as live HTML, and SQL/JSON artifacts link to the public GitHub repo.
2026-03-24
q001_hops_per_day
q005_worst_winter_carrier_origin_pair
Compare
q006_peak_aa_delay_month_network
Compare
gemini / gemini-3.1-pro-preview
2026-03-23
gemini / gemini-3.1-pro-preview
2026-03-19
q001_hops_per_day
gemini / gemini-3-flash-preview
gemini / gemini-3.1-pro-preview
q007_hops_per_day_semantic_discovery
2026-03-17
gemini / gemini-3.1-pro-preview
q002_top_carrier_by_flights_leadership
Compare
gemini / gemini-3.1-pro-preview
q003_delta_atl_departure_delay_hotspots
Compare
gemini / gemini-3.1-pro-preview
q004_worst_origin_airport_otp_thresholded
gemini / gemini-3.1-pro-preview
q005_worst_winter_carrier_origin_pair
gemini / gemini-3.1-pro-preview