> the tokens are actually generated by the user and the server never sees them (unblinded) before their first usage
Here is how I see it:
1. The user generates a token/nonce => T
2. The user blinds the token with secret blinding factor b => Blinded token TB = T*b
3. The user sends the blinded token for signing. The server signs it and returns it to the user => Signed blinded token TBS = Sign(TB)
4. The user unblinds the token (this does not break the signature) => Signed Unblinded token TS = TBS/b
5. The user sends TS along with its search query.
The server signed TB, then later received TS. Even if it logged that TB came from the user, it cannot link TS back to TB, because it does not know the blinding factor b. Thus, it cannot link the search query carrying TS to the user.
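The five steps above can be sketched with textbook RSA blind signatures. This is a toy with tiny key sizes, for illustration only; a real deployment would use full-size keys or a dedicated scheme (e.g. RFC 9474 blind signatures or a VOPRF), and the exact scheme the parent has in mind may differ:

```python
# Server's RSA key (toy parameters, insecure on purpose)
p, q = 61, 53
n = p * q                           # public modulus (3233)
e = 17                              # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (Python 3.8+ modular inverse)

# 1. User generates a token T (reduced to an integer mod n for this toy)
T = 1234

# 2. User blinds it with a random r coprime to n: TB = T * r^e mod n
r = 99
TB = (T * pow(r, e, n)) % n

# 3. Server signs the blinded token without ever seeing T: TBS = TB^d mod n
TBS = pow(TB, d, n)

# 4. User unblinds: TS = TBS * r^-1 mod n, which equals T^d mod n
TS = (TBS * pow(r, -1, n)) % n

# 5. The signature on the unblinded token verifies against T: TS^e mod n == T
assert pow(TS, e, n) == T
assert TS == pow(T, d, n)  # identical to what signing T directly would produce
```

The unblinding in step 4 works because TBS = (T * r^e)^d = T^d * r^(ed) = T^d * r (mod n), so multiplying by r^-1 leaves exactly T^d, the signature the server would have produced on T itself.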
The LLM UIs that integrate this kind of tool use all show visible indicators while it's happening: in ChatGPT you would see it say "Analyzing..." while it ran Python code, and in Claude you would see a similar message while it ran JavaScript (in your browser) instead.
If you didn't see the "analyzing" message, then no external tool was called.
Just a clarification: they tuned on the public training dataset, not the semi-private one. The 87.5% score was on the semi-private eval, which means the model was still able to generalize well.
That being said, the fact that this is not a "raw" base model, but one tuned on the ARC-AGI test distribution, takes away from the impressiveness of the result. How much? I'm not sure; we'd need the score of the un-tuned base o3 model for that.
In the meantime, comparing this tuned o3 model to other un-tuned base models is unfair (an apples-to-oranges comparison).
Clickbait is BLUF (bottom line up front) with a deceptive bottom line. Clickbait is bad. You can choose to write in BLUF style without that.
In my experience, I only prefer "classical philosophical writing" when I'm already committed to reading the content (e.g. I know the author, or the subject interests me).
In almost all other cases, I prefer BLUF format: i.e. "get to the point, I'll read more if I'm intrigued".
That might be a bit too strict. I'd still expect my private repos (no forks involved) to be private, unless we discover another footnote in GH's docs in a few years ¯\_(ツ)_/¯
But I'll avoid using forks except for contributing publicly to public repos.
> Users should never be expected to know these gotchas for a feature called "private".
Yes, the principle of least astonishment[0] should apply to security as well.
> People don't believe it's possible for software to be secure
Rightfully so. Statistically, you'd almost always be right to consider a piece of software insecure, given enough time for vulnerabilities to be introduced and then found.
> need a secondary defense to "protect them"
Nothing wrong with that. It's called Defense in Depth, and it's generally advised. Once you understand that no single security measure is bulletproof, stacking them proves to be an easy way to increase protection.
The case of fail2ban is not trivial: reducing log noise is a great perk, and it can indirectly help with monitoring (you'd more easily notice suspicious behaviour if it's the only thing in your logs), but it comes at the small cost of setting it up and accepting the risk of unwillingly blocking a shared IP.
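For a sense of what that setup cost looks like, and how the shared-IP risk can be mitigated with an allowlist, here is a minimal hypothetical `jail.local` (defaults and paths vary by distro; the allowlisted address is a placeholder from the documentation range):

```ini
# /etc/fail2ban/jail.local — minimal sshd jail (illustrative values)
[sshd]
enabled  = true
maxretry = 5
bantime  = 1h
# Allowlist addresses you never want banned (e.g. your own shared IP)
ignoreip = 127.0.0.1/8 203.0.113.10
```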
Probably nitpicking, but these types of measurements are usually tricky to interpret, because there is a high chance your indexes (maybe even your rows) are still in PostgreSQL's shared buffers and the OS cache, so they might not reflect real-usage performance.
To get a more "worst-case" measurement, after your inserts and index creation, restart your database server and flush the OS page cache (e.g. drop_caches on Linux), then take the measurement.
Sometimes the difference is huge, although I don't suspect it will be in this case.
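The restart-and-flush step can be sketched as follows. This is a minimal sketch assuming Linux with a systemd-managed PostgreSQL; the service name is an assumption, and it must run as root:

```python
# Hedged sketch: reset caches before a cold-cache benchmark run.
# Assumes Linux + systemd and a service named "postgresql"; requires root.
import subprocess

def cold_cache_reset(service: str = "postgresql") -> None:
    """Restart the DB server and drop the OS page cache."""
    subprocess.run(["systemctl", "stop", service], check=True)
    subprocess.run(["sync"], check=True)  # flush dirty pages to disk first
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")                    # drop page cache, dentries, inodes
    subprocess.run(["systemctl", "start", service], check=True)
```

Call `cold_cache_reset()` between runs, then re-run the query (e.g. with `EXPLAIN (ANALYZE, BUFFERS)`) to compare cold-cache timings against the warm ones.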