> the tokens are actually generated by the user and the server never sees them (unblinded) before their first usage
Here is how I see it:
1. The user generates a token/nonce => T
2. The user blinds the token with secret blinding factor b => Blinded token TB = T*b
3. The user sends the blinded token for signing. The server signs it and returns it to the user => Signed blinded token TBS = Sign(TB)
4. The user unblinds the token (this does not break the signature) => Signed Unblinded token TS = TBS/b
5. The user sends TS along with its search query.
The server signed TB, then later received TS. Even if it logged that TB came from the user, it cannot link TS back to TB, because it does not know the blinding factor b. Thus, it cannot link the search query carrying TS to the user.
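The five steps above can be sketched with textbook RSA blind signatures. This is a toy with tiny key sizes, for illustration only; a real deployment would use full-size keys or a dedicated scheme (e.g. RFC 9474 blind signatures or a VOPRF), and the exact scheme the parent has in mind may differ:

```python
# Server's RSA key (toy parameters, insecure on purpose)
p, q = 61, 53
n = p * q                           # public modulus (3233)
e = 17                              # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (Python 3.8+ modular inverse)

# 1. User generates a token T (reduced to an integer mod n for this toy)
T = 1234

# 2. User blinds it with a random r coprime to n: TB = T * r^e mod n
r = 99
TB = (T * pow(r, e, n)) % n

# 3. Server signs the blinded token without ever seeing T: TBS = TB^d mod n
TBS = pow(TB, d, n)

# 4. User unblinds: TS = TBS * r^-1 mod n, which equals T^d mod n
TS = (TBS * pow(r, -1, n)) % n

# 5. The signature on the unblinded token verifies against T: TS^e mod n == T
assert pow(TS, e, n) == T
assert TS == pow(T, d, n)  # identical to what signing T directly would produce
```

The unblinding in step 4 works because TBS = (T * r^e)^d = T^d * r^(ed) = T^d * r (mod n), so multiplying by r^-1 leaves exactly T^d, the signature the server would have produced on T itself.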
The LLM UIs that integrate this kind of tool use all show visible indicators while it's happening: in ChatGPT you would see it say "Analyzing..." while it ran Python code, and in Claude you would see a similar message while it ran JavaScript (in your browser) instead.
If you didn't see the "analyzing" message, then no external tool was called.
Just a clarification: they tuned on the public training dataset, not the semi-private one. The 87.5% score was on the semi-private eval, which means the model was still able to generalize well.
That being said, the fact that this is not a "raw" base model, but one tuned on the ARC-AGI test distribution, takes away from the impressiveness of the result. How much? I'm not sure; we'd need the score of the un-tuned base o3 model for that.
In the meantime, comparing this tuned o3 model to other un-tuned base models is unfair (an apples-to-oranges comparison).
Clickbait is BLUF (bottom line up front) with a deceptive bottom line. Clickbait is bad. You can choose to write in BLUF style without that.
In my experience, I only prefer "classical philosophical writing" when I'm already committed to reading the content (e.g. I know the author, or the subject interests me).
In almost all other cases, I prefer BLUF format: i.e. "get to the point, I'll read more if I'm intrigued".
That might be a bit too strict. I'd still expect my private repos (no forks involved) to be private, unless we discover another footnote in GH's docs in a few years ¯\_(ツ)_/¯
But I'll avoid using forks except for contributing publicly to public repos.
> Users should never be expected to know these gotchas for a feature called "private".
Yes, the principle of least astonishment[0] should apply to security as well.
> People don't believe it's possible for software to be secure
Rightfully so. Statistically, you'd almost always be right to consider a piece of software insecure, given enough time for vulnerabilities to be introduced and then found.
> need a secondary defense to "protect them"
Nothing wrong with that. It's called Defense in Depth, and it's generally advised. Once you understand that no single security measure is bulletproof, stacking them proves to be an easy way to increase protection.
The case of fail2ban is not trivial: reducing log noise is a great perk, and it can indirectly help with monitoring (you'd more easily notice suspicious behaviour if it's the only thing in your logs), but it comes at the small cost of setting it up and accepting the risk of unwillingly blocking a shared IP.
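For a sense of what that setup cost looks like, and how the shared-IP risk can be mitigated with an allowlist, here is a minimal hypothetical `jail.local` (defaults and paths vary by distro; the allowlisted address is a placeholder from the documentation range):

```ini
# /etc/fail2ban/jail.local — minimal sshd jail (illustrative values)
[sshd]
enabled  = true
maxretry = 5
bantime  = 1h
# Allowlist addresses you never want banned (e.g. your own shared IP)
ignoreip = 127.0.0.1/8 203.0.113.10
```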
Probably nitpicking, but these types of measurements are usually tricky to interpret, because there is a high chance your indexes (maybe even your rows) are still in PostgreSQL's shared buffers and the OS cache, so they might not reflect real-usage performance.
To get a more "worst-case" measurement, after your inserts and index creation, restart your database server and flush the OS page cache (e.g. drop_caches on Linux), then take the measurement.
Sometimes the difference is huge, although I don't suspect it will be in this case.
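The restart-and-flush step can be sketched as follows. This is a minimal sketch assuming Linux with a systemd-managed PostgreSQL; the service name is an assumption, and it must run as root:

```python
# Hedged sketch: reset caches before a cold-cache benchmark run.
# Assumes Linux + systemd and a service named "postgresql"; requires root.
import subprocess

def cold_cache_reset(service: str = "postgresql") -> None:
    """Restart the DB server and drop the OS page cache."""
    subprocess.run(["systemctl", "stop", service], check=True)
    subprocess.run(["sync"], check=True)  # flush dirty pages to disk first
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")                    # drop page cache, dentries, inodes
    subprocess.run(["systemctl", "start", service], check=True)
```

Call `cold_cache_reset()` between runs, then re-run the query (e.g. with `EXPLAIN (ANALYZE, BUFFERS)`) to compare cold-cache timings against the warm ones.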