yusufozkan's comments

yusufozkan · 2026-02-28T20:50:28

This is the same company that started as a nonprofit dedicated to open AI safety research, then became a capped-profit entity, then effectively closed-source, then dropped the cap, and is now pursuing full for-profit conversion. Every single guardrail they've set for themselves has been quietly revised or removed once it became inconvenient. Anyone want to bet on how long those exclusions last?

cebert · 2026-02-28T21:17:25

Money always wins

zoklet-enjoyer · 2026-02-28T21:32:22

The comment below mine is flagged but it shouldn't be. I believe Annie Altman.

xvector · 2026-02-28T22:40:59

I used to write off Annie's statements as mad raving, but the more I see how Sam acts the more I'm starting to think she might be telling the truth after all.

jiggawatts · 2026-02-28T21:21:23

Those exclusions are very carefully worded to sound iron-clad while actually having the strength of wet tissue paper.

yusufozkan · 2025-04-05T19:14:25

> while pre-training our Llama 4 Behemoth model using FP8 and 32K GPUs

I thought they used a lot more GPUs to train frontier models (e.g. xAI training on 100k). Can someone explain why they are using so few?

joaogui1 · 2025-04-05T19:47:22

I don't want to hunt down the details of each of these releases, but

* You can use fewer GPUs if you decrease batch size and increase number of steps, which would lead to a longer training time

* FP8 is pretty efficient; if Grok was trained with BF16, then Llama 4 could need fewer GPUs because of that

* It also depends on the size of the model and the number of tokens used for training; it's unclear whether the total FLOPs for each model are the same

* MFU (Model FLOPs Utilization) can also vary depending on the setup, which means that if you use better kernels and/or better sharding you can reduce the number of GPUs needed
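
To make the points above concrete, here is a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration: the `6 * params * tokens` training-cost rule of thumb, the rounded per-GPU peak throughputs, the 40% MFU, and the model and token counts. None of these are published figures for Llama 4 or Grok.

```python
import math


def gpus_needed(params, tokens, peak_flops_per_gpu, mfu, days):
    """Estimate how many GPUs a training run needs for a given wall-clock length."""
    total_flops = 6 * params * tokens       # rule-of-thumb cost of training (assumption)
    sustained = peak_flops_per_gpu * mfu    # FLOP/s each GPU actually delivers
    seconds = days * 24 * 3600
    return math.ceil(total_flops / (sustained * seconds))


# Hypothetical inputs, chosen only to show the trends.
PARAMS = 300e9       # active parameters per token
TOKENS = 30e12       # training tokens
BF16_PEAK = 1.0e15   # rounded per-GPU dense BF16 peak, FLOP/s (assumption)
FP8_PEAK = 2.0e15    # rounded per-GPU dense FP8 peak, FLOP/s (assumption)

gpus_bf16 = gpus_needed(PARAMS, TOKENS, BF16_PEAK, mfu=0.40, days=90)
gpus_fp8 = gpus_needed(PARAMS, TOKENS, FP8_PEAK, mfu=0.40, days=90)
gpus_fp8_longer = gpus_needed(PARAMS, TOKENS, FP8_PEAK, mfu=0.40, days=180)

print("BF16, 90 days: ", gpus_bf16)
print("FP8, 90 days:  ", gpus_fp8)
print("FP8, 180 days: ", gpus_fp8_longer)
```

Under these assumptions, doubling the peak throughput (FP8 instead of BF16), doubling MFU, or doubling the training time each roughly halves the GPU count, so two clusters of very different sizes can plausibly train comparable models.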

yusufozkan · on March 25, 2025

Are you sure you are using the new 4o image generation?

https://imgur.com/a/wGkBa0v

minimaxir · on March 25, 2025

That is an unexpectedly literal definition of "full glass".

Loeffelmann · on March 25, 2025

That's the point. The old models all failed to produce a wine glass that is completely full to the brim, because you can't find many of those in the data they used for training.

colecut · on March 25, 2025

Imagine if they just actually trained the model on a bunch of photographs of a full glass of wine, knowing of this litmus test

gorkish · on March 25, 2025

I obviously have no idea if they added real or synthetic data to the training set specifically regarding the full-to-the-brim wineglass test, but I fully expect that this prompt is now compromised in the sense that, because it is being discussed in the public sphere, it has inherently become part of the test suite.

Remember the old internet adage that the fastest way to get a correct answer online is to post an incorrect one? I'm not entirely convinced this type of iterative gap finding and filling is really much different than natural human learning behavior.

friendzis · on March 26, 2025

> I'm not entirely convinced this type of iterative gap finding and filling is really much different than natural human learning behavior.

Take some artisan, I'll go with a barber. This person is not the best of the best, but still a capable barber who can implement several styles on any head you throw at them. A client comes in and describes a certain style they want. The barber is not sure how to implement such a style and consults the master barber beside them, who describes the technique required for that particular style. Our barber then goes and implements that style. Probably not perfectly, as they need to train their mind-body coordination a bit, but the cut is good enough that the client is happy.

There was no traditional training with "gap finding and filling" involved. The artisan already possessed the core skill and knowledge required, was filled in on the particulars of the task at hand, and successfully implemented it. There was no looking at examples of finished work, no looking at examples of the process, no iterative learning by redoing the task a bunch of times.

So no, human learning, at least advanced human learning, is very much different from these techniques. Not that they are not impressive on their own, but let's be real here.

wegfawefgawefg · on March 26, 2025

overfitting vs generalizing

also we all know real people who fail to generalize, and overfit. copycats, potentially even with great skill, no creativity.

vlovich123 · on March 25, 2025

Humans don’t train on the entire contents of the Internet, so I’d wager that they do learn differently

sayamqazi · on March 26, 2025

I think there is a critical aspect of human visual learning which machine learning can't replicate because it is prohibitively expensive. When we look at things as children we are not just looking at a single snapshot. When you stare at an object for a few seconds you have practically ingested hundreds of slightly varied images of that object. This gets even more interesting when you take into account that the real world is moving all the time, so you are seeing so many things from so many angles. This is simply undoable with compute.

vlovich123 · on March 26, 2025

Then explain blind children? Or blind & deaf children? There's obviously some role senses play in development but there's clearly capabilities at play here that are drastically more efficient and powerful than what we have with modern transformers. While humans learn through example, they clearly need a lot fewer examples to generalize off of and reason against.

sayamqazi · on March 29, 2025

> Then explain blind children

I was only talking about vision tasks as an example. You can extend the idea to any sense.

> While humans learn through example, they clearly need a lot fewer examples to generalize off of and reason against.

The human brain has been developing over millennia. Machines start from zero. What if this few-example learning is just an emergent capability of any "learning function" given enough compute and training?

wegfawefgawefg · on March 26, 2025

they take in many samples of touch data

vlovich123 · on March 26, 2025

I think my point is that communication is the biggest contributor to brain development more than anything and communication is what powers our learning. Effective learners learn to communicate more with themselves and to communicate virtually with past authors through literature. That isn’t how LLMs work. Not sure why that would be considered objectionable. LLMs are great but we don’t have to pretend like they’re actually how brains work. They’re a decent approximation for neurons on today’s silicon - useful but nowhere near the efficiency and power of wetware.

Also as for touch, you’re going to have a hard time convincing me that the amount of data from touch rivals the amount of content on the internet or that you just learn about mistakes one example at a time.

wegfawefgawefg · on March 26, 2025

There are so many points to consider here im not sure i can address them all.

- Airplanes dont have wings like birds but can fly. and in some ways are superior to birds. (some ways not)

- Human brains may be doing some analogue of sample augmentation which gives you some multiple more equivalent samples of data to train on per real input state of environment. This is done for ml too.

- Whether that input data is text, or embodied is sort of irrelevant to cognition in general, but may be necessary for solving problems in a particular domain. (text only vs sight vs blind)
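
For the sample augmentation point, a minimal sketch of what that looks like on the ML side, assuming a tiny grayscale image stored as a list of rows. The helper names (`hflip`, `shift_right`, `augment`) are made up for illustration; real pipelines use library transforms and many more variations.

```python
def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def shift_right(img, fill=0):
    """Shift every row one pixel to the right, padding the left edge with `fill`."""
    return [[fill] + row[:-1] for row in img]


def augment(img):
    """Turn one real sample into several slightly different training samples."""
    shifted = shift_right(img)
    return [img, hflip(img), shifted, hflip(shifted)]


sample = [
    [1, 2, 0],
    [3, 4, 0],
]
variants = augment(sample)
print(len(variants), "training samples from 1 real input")
```

One real observation becomes several slightly different training samples, which is the "multiple more equivalent samples per real input" idea in the bullet above.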

vlovich123 · on March 26, 2025

> Airplanes dont have wings like birds but can fly. and in some ways are superior to birds. (some ways not)

I think you're saying exactly what I'm saying. Human brains work differently from LLMs and the OP comment that started this thread is claiming that they work very similarly. In some ways they do but there's very clear differences and while clarifying examples in the training set can improve human understanding and performance, it's pretty clear we're doing something beyond that - just from a power efficiency perspective humans consume far less energy for significantly more performance and it's pretty likely we need less training data.

wegfawefgawefg · on March 27, 2025

sure.

to be honest i dont really care if they work the same or not. I just like that they do work and find it interesting.

i dont even think peoples brains work the same as eachother. half of people cant even visually imagine an apple.

Neural networks seem to notice and remember very small details, as if they have access to signals from early layers. Humans often miss the minor details. Theres probably a lot more signal normalization happening. That limits calorie usage and artifacts the features.

I dont think that this is necessarily a property neural networks cant have. I think it could be engineered in. For now though seems like were making a lot of progress even without efficiency constraints so nobody cares.

sayamqazi · 2025-04-08T08:34:55

> half of people cant even visually imagine an apple.

What is the evidence for this? We are just taking people's word for it?

wegfawefgawefg · 2025-04-09T16:01:55

youre one of todays lucky few. about to have your mind blown. look this one up.

HelloImSteven · on March 25, 2025

Even if they did, I’d assume the association of “full” and this correct representation would benefit other areas of the model. I.e., there could (/should?) be general improvement for prompts where objects have unusual adjectives.

So maybe training for litmus tests isn’t the worst strategy in the absence of another entire internet of training data…

orbital-decay · on March 25, 2025

A lot of other things are rare in datasets, let alone correctly labeled. Overturned cars (showing the underside), views from under the table, people walking on the ceiling with plausible upside down hair, clothes, and facial features etc etc

myaccountonhn · on March 26, 2025

They still can't generate a watch that shows arbitrary times I believe, so it could be the case?

nefarious_ends · on March 25, 2025

imagine!

sejje · on March 26, 2025

I did coax the old models into doing it once (dall-e) but it was like a fun exercise in prompting. They definitely didn't want to.

jorvi · on March 25, 2025

The old models were doing it correctly too.

There is no one correct way to interpret 'full'. If you go to a wine bar and ask for a full glass of wine, they'll probably interpret that as a double. But you could also interpret it the way a friend would at home, which is about 2-3cm from the rim.

Personally I would call a glass of wine filled to the brim 'overfilled', not 'full'.

kalleboo · on March 26, 2025

I think you're missing the context everyone else has - this video is where the "AI can't draw a full glass of wine" meme got traction https://www.youtube.com/watch?v=160F8F8mXlo

The prompts (some generated by ChatGPT itself, since it's instructing DALL-E behind the scenes) include phrases like "full to the brim" and "almost spilling over" that are not up to interpretation at all.

drdeca · on March 25, 2025

People were telling the models explicitly to fill it to the brim, and the models were still producing images where it was filled to approximately the half-way point.

yusufozkan · on March 25, 2025

Generating an image of a completely full glass of wine has been one of the popular limitations of image generators, the reason being neural networks struggling to generalise outside of their training data (there are almost no pictures on the internet of a glass "full" of wine). It seems they implemented some reasoning over images to overcome that.

kube-system · on March 25, 2025

I wonder if that has changed recently since this has become a litmus test.

Searching in my favorite search engine for "full glass of wine", without even scrolling, three of the images are of wine glasses filled to the brim.

numpad0 · on March 25, 2025

Except this is correct in this context. None of the existing diffusion models could, apparently.

yusufozkan · on March 25, 2025

This is another cool example from their blog

https://imgur.com/a/Svfuuf5

Imustaskforhelp · on March 25, 2025

Looks amazing, can you please also create an unconventional image like a clock at 2:35? I tried something like this with Gemini when some redditor asked for it and it failed, so I'm wondering if 4o can do it

CSMastermind · on March 25, 2025

I tried and it failed repeatedly (like actual error messages):

> It looks like there was an error when trying to generate the updated image of the clock showing 5:03. I wasn’t able to create it. If you’d like, you can try again by rephrasing or repeating the request.

A few times it did generate an image but it never showed the right time. It would frequently show 10:10 for instance.

coder543 · on March 25, 2025

If it tried and failed repeatedly, then it was prompting DALL-E, looking at the results, then prompting DALL-E again, not doing direct image generation.

Imustaskforhelp · on March 26, 2025

So it's not doing what they are saying/advertising. I think you are onto something big then

coder543 · on March 26, 2025

No... OpenAI said it was "rolling out". Not that it was "already rolled out to all users and all servers". Some people have access already, some people don't. Even people who have access don't have it consistently, since it seems to depend on which server processes your request.

Workaccount2 · on March 25, 2025

I tried and while the clock it generated was very well done and high quality, it showed the time as the analog clock default of 10:10.

lyu07282 · on March 25, 2025

The problem now is we don't know if people mistake dall-e for the new multimodal gpt4o output, they really should've made that clearer.

cmorgan31 · on March 25, 2025

I’m using 4o and it gets the time wrong a decent chunk of the time but doesn’t get anything else in the prompt incorrect. I asked for the clock to be 4:30 but got 10:10. OpenAI pro account.

Imustaskforhelp · on March 26, 2025

Shouldn't reasoning make the clock work though?

Why does it sound like this isn't reasoning on images directly but rather just DALL-E, as another commenter (coder543) said?

stevesearer · on March 25, 2025

Can you do this with the prompt of a cow jumping over the moon?

I can’t ever seem to get it to make the cow appear to be above the moon. Always literally covering it or to the side etc.

michaelt · on March 25, 2025

https://chatgpt.com/share/67e31a31-3d44-8011-994e-b7f8af7694... got it on the second try.

coder543 · on March 25, 2025

To be clear, that is DALL-E, not 4o image generation. (You can see the prompt that 4o generated to give to DALL-E.)

spuz · on March 26, 2025

How can you see this? I don't see it.

coder543 · on March 26, 2025

On the web version, click on the image to make it larger. In the upper right corner, there is an (i) icon, which you can click to reveal the DALL-E prompt that GPT-4o generated.

dimitri-vs · on March 26, 2025

Here you go: https://imgur.com/a/QJlj4I9