IIRC, ChatGPT is based on GPT-3.5 (possibly an even larger model) rather than GPT-3. It's also been fine-tuned with reinforcement learning from human feedback (RLHF).
I've noticed that when I ask ChatGPT to determine the type of a variable in a given code block, its reasoning has fewer holes than GPT-3's for the same prompt. Stands to reason that its other outputs are similarly improved.
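A toy example of the kind of prompt I mean (hypothetical snippet, just to illustrate; any type-inference question works). You paste something like this and ask "what is the type of `y`?":

```python
def double(x):
    # works on anything supporting * with an int
    return x * 2

y = double("ab")
# str * int repeats the string, so y is "abab" and its type is str
print(type(y).__name__)  # str
```

GPT-3 would sometimes miss that `str * int` is string repetition rather than an error; ChatGPT tends to walk through that step explicitly.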
It also doesn't appear to hit the same token limit? Not sure how that was accomplished.