Hacker News | bturtel's comments

Great question! It's probabilistic, so it's not really "right vs. wrong" on any single question, but a matter of who better estimated the likelihood. One big difference shows up when there's no useful context - we ran the same eval WITHOUT including any up-to-date context with the questions. In this case, GPT-5 stays overconfident and its Brier Skill Score (BSS) drops to -11.3% (vs. -4.3% for ours) - worse than just guessing the base rate. So one advantage of the RL training is just learning to know what you don't know, and to identify when there's real signal.
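To make the BSS comparison concrete, here is a minimal sketch, reading BSS as the Brier Skill Score measured against a constant base-rate forecast. The function names and the toy forecasts are made up for illustration, and taking the base rate from the same outcomes being scored is an assumption, not necessarily how the eval above defines its reference.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes):
    """1 - BS / BS_ref, where the reference always forecasts the base rate.

    Assumption: the base rate is computed from the scored outcomes themselves.
    """
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    if reference == 0:
        raise ValueError("reference score is zero (all outcomes identical)")
    return 1.0 - brier_score(probs, outcomes) / reference


# Toy data (hypothetical): 3 of 10 events happened.
outcomes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

# Overconfident forecaster: near-certain everywhere, badly wrong on three.
overconfident = [0.95, 0.9, 0.05, 0.1, 0.05, 0.9, 0.05, 0.95, 0.05, 0.05]

# Forecaster that knows it has no signal and falls back to the base rate.
hedged = [0.3] * 10

print(brier_skill_score(overconfident, outcomes))  # negative: worse than base rate
print(brier_skill_score(hedged, outcomes))         # zero: matches the base rate
```

A BSS below zero means the forecasts scored worse than always answering with the base rate, which is what confident misses produce when there is no real signal.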

Great question!

The key advantage of self-play is that we don't actually have labels for the "right" probability to assign any given question, only binary outcomes - each event either happened (1.0) or did not happen (0.0).

Our thinking was that by generating multiple predictions and ranking them by proximity to the ground truth, self-play incentivizes each agent to produce more finely calibrated probabilities - or else the other agent might come just slightly closer to the actual outcome.
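As a rough sketch of the ranking idea (not the actual training pipeline): sample several probability estimates for one question, then order them by distance to the binary outcome. The helper names, the `min_gap` parameter, and the (closer, farther) pair format are assumptions made for illustration.

```python
from itertools import combinations


def rank_by_proximity(predictions, outcome):
    """Sort predicted probabilities from closest to farthest from the outcome (0.0 or 1.0)."""
    return sorted(predictions, key=lambda p: abs(p - outcome))


def preference_pairs(predictions, outcome, min_gap=0.0):
    """Build (closer, farther) pairs from every pair of predictions.

    Pairs whose distances to the outcome differ by at most `min_gap`
    are skipped, since they carry little ranking signal (hypothetical knob).
    """
    pairs = []
    for a, b in combinations(predictions, 2):
        dist_a, dist_b = abs(a - outcome), abs(b - outcome)
        if abs(dist_a - dist_b) <= min_gap:
            continue
        pairs.append((a, b) if dist_a < dist_b else (b, a))
    return pairs


# Three sampled forecasts for an event that did happen (outcome = 1.0).
samples = [0.6, 0.9, 0.75]
print(rank_by_proximity(samples, 1.0))
print(preference_pairs(samples, 1.0))
```

Even without a label for the "right" probability, the binary outcome is enough to say which of two forecasts came closer, and that comparison is the only supervision this sketch uses.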


We're working on a follow-up paper now to show similar results with larger models!


Great read! Thanks for sharing.


This could be huge. IIUC this is basically an AI-enabled version of MyFitnessPal, which has like 200M+ users, but with a massively streamlined user experience. Great idea.


This looks really cool - the UI in particular feels really approachable and polished. I like how you detect and call out "What you're doing wrong" to help build awareness of unhelpful thought patterns. Upvoted!

I recently launched something pretty similar (www.pensiveapp.com) but we took a very different approach (and I think the space is huge). Interesting to see how differently you approached the problem.


This is really cool - I think it's really helpful in difficult conversations when you can encourage people to choose a single branch / claim of the argument and stick to resolving that before confounding things by mixing in other claims. Personally, I'd love to be able to see a visual branching of different arguments / counterarguments, with some visual indicator for how much support each has.


Thank you. Yes, this was my exact vision for it. Like a spider's web, so that you can easily see which objections apply to which claims. And yes, with some kind of voting feature. I just figured this would be an easy place to start and see if people were interested in it.


I think this has a TON of potential. Situations like these are very non-obvious and anxiety-inducing for lots of people, so if you can make this a way for people to gain proficiency and confidence at navigating tricky social interactions, it could be a very powerful value prop. My only feedback would be that it took too long to get into the first challenge - lots of instructions / introduction / scene setting. Well done!


Thank you for the kind words, vision, and feedback! Will be thinking more along the direction of true 'life situation rehearsal.'

Re: taking too long, I 100% agree. Wrestled with what to cut. Do you think skipping all the setup screens and story intro would have worked well for you, dropping right into Vincent('s missed birthday)?


I like teaching by doing best. Once I started playing the game I felt hooked, so getting users to that first speaking opportunity will draw them in as they learn.


Yea, I think so - I could imagine this being really streamlined by just dropping me immediately into a conversation, with maybe the goal just written on a screen somewhere - no setup, no storyline, etc. I guess it just depends on whether most of your users are there for a gameplay experience vs a "practice" experience.


Thanks! Great questions. We just launched - anecdotally we've had really great feedback from early users, but we're working with PhDs in the field to design an external validation study while tracking user-reported outcomes. We've also heard from a few providers that have started recommending Pensive to their clients for on-demand support between sessions. We see Pensive as complementary to therapy for some users and a standalone tool for others. Many people who don't need therapy can still benefit significantly from consistent, evidence-based practice. The key is making these proven techniques more accessible. Like physical health, most people don’t need a medical intervention - they need to exercise, but sorting through the research and implementing effective practice is a big barrier. Our focus is on delivering established practices in a convenient format with just a few minutes of guided conversation a day.


This is very cool. Reminds me of Quantum Country (https://news.ycombinator.com/item?id=30467585) but for everything else in life.

