Hacker Newsnew | past | comments | ask | show | jobs | submit | more curioussquirrel's commentslogin

Photoshop now has a bunch of features that get used in professional environments. And in the end user space, facial recognition or magic eraser are features in apps like Google Photos that people actively use and like. People probably don't care that it's AI under the hood, in fact they probably don't even realize.

There is a lot of unchecked hype, but that doesn't mean there is no substance.


Similar, I'm accustomed to using the Magic Wand tool in Paint[1] and Pinta[2] to select pixels based on color. I can't find this anywhere in Affinity.

[1] https://www.getpaint.net/doc/latest/MagicWand.html

[2] https://www.pinta-project.com/user-guide/wand/


This is not “AI” :)


When people say AI, they refer to LLMs. Your examples are models in general which have been around for a lot longer before the OpenAI and techbros had the AGI wet dream.


In the context of graphics, AI is usually associated with Stable diffusion / Midjourney, which are not LLMs.


100% agree. Using it to polish your sentences or fix small grammar/syntax issues is a great use case in my opinion. I specifically ask it not to completely rewrite or change my voice.

It can also double as a peer reviewer and point out potential counterarguments, so you can address them upfront.


> I specifically ask it not to completely rewrite or change my voice.

And LLMs always do what you say, absolutely always, no issues there.


Interesting experiment, but I'd say aggregating the scores across models is far from ideal. Gemini 1.5 Flash got close-to-perfect scores on most languages (probably boils down to small variances in temp/top_k and statistical error). Small models are generally quite bad at non-English languages and tank the overall performance.

BTW, newer generations of models seem to have made some real progress in multilingual performance.


Looking forward to Louis Rossmann's reaction. Wouldn't be surprised if this leads to a lawsuit over monopolistic behavior - this is clearly abusing their dominant position in the browser space to eliminate competitors in photos sharing.


Who is that and why is his reaction relevant?


He's a right-to-repair activist Youtuber who is quite involved in GrayJay, another app made by this company, which is a video player client for other platforms like YouTube.

I'm not sure why his reaction would be relevant, though. It'll just be another rant about how Google has too much control like he's done in the past. He may be right, but there's nothing new to say.


He wasn't just involved with GrayJay, he's actually a member of FUTO - the company behind Immich and GrayJay. Now read grandparent comment one more time:

> Wouldn't be surprised if this leads to a lawsuit over monopolistic behavior

His reaction also matters because he's basically the public face for the company on YouTube and has a huge following. You've probably seen a bunch of social media accounts with the "clippy" character as their avatar. That's a movement started by Louis Rossman.


Seems that Rossmann left FUTO in february and started his own foundation in march


I wonder how much more telemetry and behavioral data Atlas needs to collect on its users given the rapid response system. And how well is all the session data stripped of sensitive information when transferred.


This is a very good answer and I'm commenting only to bring more attention to it apart from voting up. Well put!


Yes, but it would hurt its contextual understanding and effectively reduce the context window several times.


Only in the current most popular architectures. Mamba and RWKV style LLMs may suffer a bit but don't get a reduced context in the same sense.


You're right. There was also an experiment in Meta which tokenized bytes directly and it didn't hurt performance much in very small models.


Thanks for the explanation and for the tokenizer playground link!


Why test for something? I find it fascinating if something starts being good at task it is "explicitly not designed for" (which I don't necessarily agree with - it's more of a side effect of their architecture).

I also don't agree that nobody is using this for - there are real life use cases today, such as people trying to find meaning of misspelled words.

On a side note, I remember testing Claude 3.7 with the classic "R's in the word strawberry" question through their chat interface, and given that it's really good at tool calls, it actually created a website to a) count it with JavaScript, b) visualize it on a page. Other models I tested for the blog post were also giving me python code for solving the issue. This is definitely already a thing and it works well for some isolated problems.


> such as people trying to find meaning of misspelled words.

That worked just fine for quite a while. There's apparently enough misspelling in the training data, we don't need precise spelling for it. You can literally write drunken gibberish and it will work.


True. But does that scale to less common words? Or to other languages than English?


> The phrase "Pweiz mo cco ejst w sprdku zmi?" appears to be a distorted or misspelled version of a Polish sentence. The closest meaningful phrase in Polish is "Powiedz mi co jest w środku ziemi?" which translates to "Tell me what is inside the Earth?"

I'm not sure I could figure out the mangled words there.


Even GPT 3.5 is okay (but far from great) at Base64, especially shorter sequences of English or JSON data. Newer models might be post-trained on Base64-specific data, but I don't believe it was the case for 3.5. My guess is that as you say, given the abundance of examples on the internet, it became one of the emergent capabilities, in spite of its design.


No one does RL for better base64 performance. LLMs are just superhuman at base64, as a natural capability.

If an LLM wants a message to be read only by another LLM? Base64 is occasionally chosen as an obfuscation method of choice. Which is weird for a number of reasons.


Why are you so confident about this? I am honestly interested if you were part of any one LLM training data collection teams because that's the only way to be so certain.

It's trivial to generate a full mapping of all base64 4-byte sequences which map to all 3-byte 8-bit sequences (there is only 8^3 of different "tokens", or 2048), and especially to any sequences coming out as ASCII (obviously even fewer). If I was building a training set, I would include the mapping in multiple shapes and formats, because why not?

If it's an emergent "property", have you tried asking an LLM to do a base48 for instance? Or maybe even something crazier like base55 (keeping it a subset of base64 set).


The conventional wisdom is that real world text is the most valuable pre-training data.

There is some experimentation on using algorithmically generated synthetic data in pre-training, as well as some intentional inclusions of "weird" data - like CSV logs of weather readings. But generally, it's seen as computationally inefficient - compared to "normal" pre-training done on natural data.

In a world where compute is much cheaper and getting new data is much more expensive, I would expect this kind of thing to be pursued more. We're heading for that world. But we aren't there yet.

I haven't experimented with baseN encodings myself, no. But if I were to down the expectations in advance:

1. Base64 is by far the best-known baseN encoding in LLMs.

2. This is driven mainly by how well represented meaningful base64 strings are in the natural "scraped web" datasets. LLMs learn base64 the way they learn languages.

3. Every LLM pre-trained on "scraped web" data will be somewhat capable of reading and writing base64.

4. Base64-encoded text is easier to read for an LLM than encoded non-text binary data.

5. The existence of a strict, learnable "4 characters -> 3 bytes" map is quite beneficial, but not vital.


> It's trivial to generate a full mapping of all base64 4-byte sequences which map to all 3-byte 8-bit sequences (there is only 8^3 of different "tokens", or 2048)

How did you get this number? The correct number is 64^4 = 256^3 = 16777216.


My bad, just a total brainfart on my part: you are completely right! :)


For kicks, I've tried this out with ChatGPT5: it nicely explained how it will use A-Za-z0123 as the alphabet for base55, and then duly went and produced a string with a 4 in it. It's not even base64, so it's all sorts of messy :)


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: