Aug 7, 2025 · 2 min read

GPT-5

Summary

We pushed GPT-5 through planet generation, challenge mode, and a two-part business reasoning test to see how far the new model actually goes.

Episode context

GPT-5 landed on the channel’s anniversary, so we put it through our standard gauntlet: planet generation with Three.js, a first-person challenge mode, and a two-part business reasoning workflow that tests retrieval, synthesis, and reporting.

Summary

The initial planet render was one of the best first attempts we’ve seen, with animated clouds and biome variation right out of the gate. After feedback, it added the requested controls but briefly broke the terrain and triggered a WebGL error. Once corrected, the final result delivered the best planet we’ve seen so far, with strong shaders, responsive sliders, and terrain controls that finally respected the Minecraft-style constraints in our prompt.

Challenge mode still fell short on scale and object parenting, but it preserved the planet context and kept the full control panel—showing progress even if it didn’t fully clear the test. On the business reasoning side, GPT-5 did a thorough retrieval pass, then produced a robust HTML report with charts, sources, and a structured recommendation framework. The analysis felt more human and more specific than past runs, with novel insights and useful operational framing.

Key takeaways

  • GPT-5 produced the strongest planet generation we’ve seen, especially after iterative feedback.
  • The model handled complex shader and control requirements, even though an intermediate step broke the terrain.
  • Challenge mode still struggles with scale and object parenting, but retained core planet controls.
  • Business reasoning was a standout: richer sourcing, useful charts, and more human-like synthesis.
  • The analysis emphasized “spiky intelligence,” reinforcing that the best model depends on the task.

Highlight moments

  • 01:42 — One of the best first planet attempts we’ve seen
  • 03:12 — Control sliders land, but terrain breaks and a WebGL error appears
  • 04:35 — Final planet pass becomes the best result to date
  • 08:08 — Challenge mode shows progress but still misses scale and parenting
  • 12:38 — GPT-5 delivers a full HTML report with charts, sources, and novel takeaways

Scorecard

Coming soon.

In our words

GPT-5 feels like a real step forward in both creative generation and analytical usefulness. The planet test crossed a threshold we’ve been chasing for a year, and the reasoning workflow produced a structured, source-backed report that felt closer to how a strong analyst would work. It’s not flawless: challenge mode still exposes the same scale and object-parenting gaps. But the combination of stronger generation and better reasoning makes this a compelling upgrade.