
curioussquirrel · 2025-10-30T22:14:16 1761862456

Photoshop now has a bunch of features that get used in professional environments. And in the end user space, facial recognition or magic eraser are features in apps like Google Photos that people actively use and like. People probably don't care that it's AI under the hood, in fact they probably don't even realize.

There is a lot of unchecked hype, but that doesn't mean there is no substance.

hbcondo714 · 2025-10-30T23:50:28 1761868228

Similarly, I'm accustomed to using the Magic Wand tool in Paint.NET[1] and Pinta[2] to select pixels based on color. I can't find this anywhere in Affinity.

[1] https://www.getpaint.net/doc/latest/MagicWand.html

[2] https://www.pinta-project.com/user-guide/wand/

isodev · 2025-10-31T06:38:20 1761892700

This is not “AI” :)

isodev · 2025-10-31T06:37:10 1761892630

When people say AI, they mean LLMs. Your examples are models in general, which were around long before OpenAI and the techbros had their AGI wet dream.

zajio1am · 2025-10-31T14:27:37 1761920857

In the context of graphics, AI is usually associated with Stable diffusion / Midjourney, which are not LLMs.

curioussquirrel · 2025-10-27T17:04:05 1761584645

100% agree. Using it to polish your sentences or fix small grammar/syntax issues is a great use case in my opinion. I specifically ask it not to completely rewrite or change my voice.

It can also double as a peer reviewer and point out potential counterarguments, so you can address them upfront.

philipwhiuk · 2025-10-29T02:30:53 1761705053

> I specifically ask it not to completely rewrite or change my voice.

And LLMs always do what you say, absolutely always, no issues there.

curioussquirrel · 2025-10-27T10:54:07 1761562447

Interesting experiment, but I'd say aggregating the scores across models is far from ideal. Gemini 1.5 Flash got close-to-perfect scores on most languages (probably boils down to small variances in temp/top_k and statistical error). Small models are generally quite bad at non-English languages and tank the overall performance.

BTW, newer generations of models seem to have made some real progress in multilingual performance.

curioussquirrel · 2025-10-23T05:17:46 1761196666

Looking forward to Louis Rossmann's reaction. Wouldn't be surprised if this leads to a lawsuit over monopolistic behavior - this is clearly abusing their dominant position in the browser space to eliminate competitors in photo sharing.

skrebbel · 2025-10-23T05:45:07 1761198307

Who is that and why is his reaction relevant?

jeroenhd · 2025-10-23T06:28:41 1761200921

He's a right-to-repair activist YouTuber who is quite involved in GrayJay, another app made by this company, which is a video player client for other platforms like YouTube.

I'm not sure why his reaction would be relevant, though. It'll just be another rant about how Google has too much control like he's done in the past. He may be right, but there's nothing new to say.

archargelod · 2025-10-23T09:06:16 1761210376

He wasn't just involved with GrayJay, he's actually a member of FUTO - the company behind Immich and GrayJay. Now read grandparent comment one more time:

> Wouldn't be surprised if this leads to a lawsuit over monopolistic behavior

His reaction also matters because he's basically the public face for the company on YouTube and has a huge following. You've probably seen a bunch of social media accounts with the "clippy" character as their avatar. That's a movement started by Louis Rossmann.

TiredOfLife · 2025-10-23T23:50:34 1761263434

Seems that Rossmann left FUTO in February and started his own foundation in March.

curioussquirrel · 2025-10-23T05:04:34 1761195874

I wonder how much more telemetry and behavioral data Atlas needs to collect on its users given the rapid response system. And how well all the session data is stripped of sensitive information when transferred.

curioussquirrel · 2025-10-14T16:33:13 1760459593

This is a very good answer and I'm commenting only to bring more attention to it apart from voting up. Well put!

curioussquirrel · 2025-10-14T05:22:51 1760419371

Yes, but it would hurt its contextual understanding and effectively reduce the context window several times.

viraptor · 2025-10-14T09:50:08 1760435408

Only in the current most popular architectures. Mamba and RWKV style LLMs may suffer a bit but don't get a reduced context in the same sense.

curioussquirrel · 2025-10-14T16:37:13 1760459833

You're right. There was also an experiment in Meta which tokenized bytes directly and it didn't hurt performance much in very small models.
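To make the context-window point concrete, here is a minimal sketch that assumes the thread is about byte-level tokenization. It compares the number of UTF-8 bytes in a sentence with its whitespace-separated word count. The word count is only a crude stand-in for a subword tokenizer (real tokenizers split differently), and the helper name `byte_tokens` is made up for illustration.

```python
def byte_tokens(text: str) -> list[int]:
    """Treat every UTF-8 byte as one token (byte-level tokenization)."""
    return list(text.encode("utf-8"))


sentence = "Powiedz mi co jest w środku ziemi?"

n_bytes = len(byte_tokens(sentence))
n_words = len(sentence.split())  # crude proxy, NOT a real subword tokenizer

# Non-ASCII characters cost more than one byte each, so the gap grows
# for languages that rely on them.
print(f"{n_bytes} byte tokens vs {n_words} whitespace words")
print(len("środku"), "characters ->", len(byte_tokens("środku")), "bytes")
```

With a standard attention-based model, a sequence that is several times longer uses up the same fixed window several times faster, which is the effect described above.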

curioussquirrel · 2025-10-14T05:16:15 1760418975

Thanks for the explanation and for the tokenizer playground link!

curioussquirrel · 2025-10-14T05:13:37 1760418817

Why test for something? I find it fascinating if something starts being good at a task it is "explicitly not designed for" (which I don't necessarily agree with - it's more of a side effect of their architecture).

I also don't agree that nobody is using this - there are real-life use cases today, such as people trying to find meaning of misspelled words.

On a side note, I remember testing Claude 3.7 with the classic "R's in the word strawberry" question through their chat interface, and given that it's really good at tool calls, it actually created a website to a) count it with JavaScript, b) visualize it on a page. Other models I tested for the blog post were also giving me python code for solving the issue. This is definitely already a thing and it works well for some isolated problems.
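For illustration, the kind of Python a model might hand back for this question is roughly the following. This is a sketch of the approach, not the actual output of any model, and the function name `count_letter` is made up.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


print(count_letter("strawberry", "r"))  # 3
```

Delegating the count to code sidesteps the tokenizer entirely, since the string is processed character by character.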

viraptor · 2025-10-14T07:21:36 1760426496

> such as people trying to find meaning of misspelled words.

That worked just fine for quite a while. There's apparently enough misspelling in the training data, we don't need precise spelling for it. You can literally write drunken gibberish and it will work.

curioussquirrel · 2025-10-14T16:35:36 1760459736

True. But does that scale to less common words? Or to other languages than English?

viraptor · 2025-10-15T14:41:40 1760539300

> The phrase "Pweiz mo cco ejst w sprdku zmi?" appears to be a distorted or misspelled version of a Polish sentence. The closest meaningful phrase in Polish is "Powiedz mi co jest w środku ziemi?" which translates to "Tell me what is inside the Earth?"

I'm not sure I could figure out the mangled words there.

curioussquirrel · 2025-10-14T05:07:10 1760418430

Even GPT 3.5 is okay (but far from great) at Base64, especially shorter sequences of English or JSON data. Newer models might be post-trained on Base64-specific data, but I don't believe it was the case for 3.5. My guess is that as you say, given the abundance of examples on the internet, it became one of the emergent capabilities, in spite of its design.
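If you want to check a model's Base64 output yourself, a reference round trip with Python's standard `base64` module looks roughly like this. The helper names `encode_text` and `decode_text` are made up for the example.

```python
import base64


def encode_text(text: str) -> str:
    """Base64-encode a string via its UTF-8 bytes."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_text(encoded: str) -> str:
    """Decode a Base64 string back into UTF-8 text."""
    return base64.b64decode(encoded).decode("utf-8")


print(encode_text("Hello"))     # SGVsbG8=
print(decode_text("SGVsbG8="))  # Hello

# Short JSON payloads round-trip the same way.
payload = '{"a": 1}'
print(decode_text(encode_text(payload)) == payload)  # True
```

Comparing a model's answer against `encode_text` on short English or JSON strings is an easy way to see where it starts to drift.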

ACCount37 · 2025-10-14T05:38:49 1760420329

No one does RL for better base64 performance. LLMs are just superhuman at base64, as a natural capability.

If an LLM wants a message to be read only by another LLM? Base64 is occasionally the obfuscation method of choice. Which is weird for a number of reasons.

necovek · 2025-10-14T17:28:02 1760462882

Why are you so confident about this? I am honestly interested if you were part of any LLM training data collection team, because that's the only way to be so certain.

It's trivial to generate a full mapping of all base64 4-byte sequences which map to all 3-byte 8-bit sequences (there is only 8^3 of different "tokens", or 2048), and especially to any sequences coming out as ASCII (obviously even fewer). If I was building a training set, I would include the mapping in multiple shapes and formats, because why not?

If it's an emergent "property", have you tried asking an LLM to do a base48 for instance? Or maybe even something crazier like base55 (keeping it a subset of base64 set).

ACCount37 · 2025-10-14T20:58:59 1760475539

The conventional wisdom is that real world text is the most valuable pre-training data.

There is some experimentation on using algorithmically generated synthetic data in pre-training, as well as some intentional inclusions of "weird" data - like CSV logs of weather readings. But generally, it's seen as computationally inefficient - compared to "normal" pre-training done on natural data.

In a world where compute is much cheaper and getting new data is much more expensive, I would expect this kind of thing to be pursued more. We're heading for that world. But we aren't there yet.

I haven't experimented with baseN encodings myself, no. But if I were to write down my expectations in advance:

1. Base64 is by far the best-known baseN encoding in LLMs.

2. This is driven mainly by how well represented meaningful base64 strings are in the natural "scraped web" datasets. LLMs learn base64 the way they learn languages.

3. Every LLM pre-trained on "scraped web" data will be somewhat capable of reading and writing base64.

4. Base64-encoded text is easier to read for an LLM than encoded non-text binary data.

5. The existence of a strict, learnable "4 characters -> 3 bytes" map is quite beneficial, but not vital.

Timwi · 2025-10-16T07:28:44 1760599724

> It's trivial to generate a full mapping of all base64 4-byte sequences which map to all 3-byte 8-bit sequences (there is only 8^3 of different "tokens", or 2048)

How did you get this number? The correct number is 64^4 = 256^3 = 16777216.
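The arithmetic is quick to check in Python. The printable-ASCII figure at the end assumes "ASCII" means the 95 printable characters 0x20 to 0x7E, which is my reading rather than something stated above.

```python
import base64

# Each Base64 group is 4 characters drawn from a 64-symbol alphabet,
# and it encodes exactly 3 bytes of 8 bits each.
groups_by_chars = 64 ** 4
groups_by_bytes = 256 ** 3
print(groups_by_chars, groups_by_bytes)  # 16777216 16777216

# One concrete entry in that mapping: 3 bytes in, 4 characters out.
print(base64.b64encode(b"Man"))  # b'TWFu'

# Assumption: restrict inputs to the 95 printable ASCII characters.
printable_ascii_groups = 95 ** 3
print(printable_ascii_groups)  # 857375
```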

necovek · 2025-10-16T17:20:30 1760635230

My bad, just a total brainfart on my part: you are completely right! :)

necovek · 2025-10-14T20:02:40 1760472160

For kicks, I've tried this out with ChatGPT5: it nicely explained how it will use A-Za-z0123 as the alphabet for base55, and then duly went and produced a string with a 4 in it. It's not even base64, so it's all sorts of messy :)
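For comparison, a base55 encoder that actually stays inside its alphabet can be sketched in a few lines. Everything here is an assumption made for illustration: the alphabet (A-Z, a-z and the digits 0 to 2, which gives 55 symbols), the big-integer scheme, and the 0x01 sentinel byte used to preserve leading zero bytes. There is no standard "base55" to follow.

```python
import string

# Hypothetical 55-symbol alphabet: 26 + 26 letters plus the digits 0, 1, 2.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + "012"
BASE = len(ALPHABET)  # 55


def encode(data: bytes) -> str:
    """Encode bytes as a base55 string by treating them as one big integer."""
    # Prefix a 0x01 sentinel so leading zero bytes are not lost.
    n = int.from_bytes(b"\x01" + data, "big")
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode(text: str) -> bytes:
    """Invert encode(); raises ValueError on symbols outside the alphabet."""
    n = 0
    for ch in text:
        n = n * BASE + ALPHABET.index(ch)
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:]  # drop the sentinel byte


encoded = encode(b"hello")
print(encoded, decode(encoded))
```

Unlike Base64, 55 is not a power of two, so there is no fixed "N characters to M bytes" grouping to memorize, which may be part of why a model struggles with it.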