
Sample size of 1 but GPT-5 seems horrendous at coding?

My go to benchmark is a 3d snake game Claude does almost flawlessly (or at least in 3-4 iterations)

The prompt:

write a 3d snake game in js and html. you can use any libraries you want. the game still happens inside a single plane, left arrow turns the snake left, right arrow turns it right. the plane is black and there's a green grid. there are multiple rewards of random colors at a given time. each time a reward is eaten, it becomes the snake's new head. The camera follows the snake's head, it is above an a bit behind it, looking forward. When the snake moves right or left, the camera follows gradually left or right, no snap movements. write everything in a single html file.
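For reference, the "no snap movements" requirement is typically met by easing the camera's yaw toward the snake's heading every frame instead of copying it. Below is a minimal sketch of that idea with no rendering library involved. The function names, the smoothing rate, and the axis convention (yaw 0 looks along +z, the plane is y = 0) are all assumptions for illustration, not part of the prompt or of any model's output.

```javascript
// Hypothetical helpers: names, constants, and axis convention are assumptions.

// Wrap an angle difference into (-PI, PI] so the camera turns the short way round.
function shortestAngleDelta(from, to) {
  const twoPi = 2 * Math.PI;
  let d = (to - from) % twoPi;
  if (d > Math.PI) d -= twoPi;
  if (d <= -Math.PI) d += twoPi;
  return d;
}

// Ease the camera yaw toward the snake's heading.
// rate is in 1/seconds and dt is the frame time in seconds, so the
// smoothing behaves the same at different frame rates.
function easeYaw(cameraYaw, targetYaw, rate, dt) {
  const t = 1 - Math.exp(-rate * dt);
  return cameraYaw + shortestAngleDelta(cameraYaw, targetYaw) * t;
}

// Place the camera above and a bit behind the head for a given eased yaw.
function cameraPosition(head, yaw, back, height) {
  return {
    x: head.x - Math.sin(yaw) * back,
    y: height,
    z: head.z - Math.cos(yaw) * back,
  };
}
```

Each frame you would call `easeYaw` with the current camera yaw and the snake's heading, then aim the camera from `cameraPosition` toward a point ahead of the head. A higher `rate` makes the camera catch up faster.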

EDIT: I'm not trying to shit on GPT-5. So many people here seem to be getting very good results; am I doing something wrong with my prompt?



This is what I got from your prompt in one shot with GPT-5 Thinking:

Game: https://chatgpt.com/canvas/shared/6895f722f2708191ac4a6d1645...

Conversation: https://chatgpt.com/share/6895f74a-0c5c-8004-b349-69da096531...

The controls are inverted for some reason and it could be a bit faster, but I fixed both of these easily with one prompt and here's the corrected version: https://chatgpt.com/canvas/shared/6895f82759f88191ba41c9fcd5...


Thanks, the issue was indeed that I wasn't explicitly using the thinking model, or they changed something over the weekend -- it's at least on par with Claude now.

EDIT: clearly better than Claude or any other model that I tried before. I had a bonus benchmark: add a narrow triangle on the head of the snake that indicates the direction of movement. GPT-5 got it right after a single iteration, whereas Claude could never get the rotation of the triangle right, nor could o3 the last time I tried.
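The triangle rotation usually comes down to one `atan2` call with the arguments in the right order. Here is a rough sketch, assuming the marker's tip points along +z at yaw 0 and that a yaw of θ about the vertical axis sends +z to (sin θ, 0, cos θ). Both function names are made up for illustration, and other axis conventions would need the arguments swapped or negated.

```javascript
// Hypothetical helper: yaw (rotation about the vertical axis) that makes a
// marker whose tip points along +z at yaw 0 point along the movement
// direction (dx, dz) in the ground plane.
function yawFromDirection(dx, dz) {
  return Math.atan2(dx, dz);
}

// Apply a yaw to the local +z axis; useful for sanity-checking the convention.
function forwardFromYaw(yaw) {
  return { x: Math.sin(yaw), z: Math.cos(yaw) };
}
```

Feeding each of the four grid directions through `yawFromDirection` and back through `forwardFromYaw` should return the same direction. If it comes back mirrored, the scene uses a different handedness than assumed here.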


> My go to benchmark is a 3d snake game Claude does almost flawlessly (or at least in 3-4 iterations)

If you need to know how the snake game should look to get the code, then Claude is not doing the work; you are.



