
VRAM vs UM is not exactly an apples-to-apples comparison.

I’m not very well versed in this domain, but I think it’s not going to be “VRAM” (GDDR) memory, but rather “unified memory”, which is essentially RAM (some flavour of DDR5, I assume). These two types of memory have vastly different bandwidths.

I’m pretty curious to see any benchmarks on inference on VRAM vs UM.


A quick benchmark of float32 torch cuda->cuda copies, comparing some random machines:

    Raptor Lake + 5080: 380.63 GB/s
    Raptor Lake (CPU for reference): 20.41 GB/s
    GB10 (DGX Spark): 116.14 GB/s
    GH200: 1697.39 GB/s
This is an "eh, it works" benchmark, but it should give you a feel for the relative performance of the different systems.
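A minimal sketch of the same style of measurement. The original used torch cuda->cuda copies; this version uses a NumPy host-to-host copy so it runs anywhere, and the buffer size and repeat count are illustrative choices:

```python
import time
import numpy as np

def copy_bandwidth_gbps(n_bytes: int = 1 << 28, repeats: int = 10) -> float:
    """Measure float32 copy bandwidth in GB/s via repeated array copies."""
    src = np.ones(n_bytes // 4, dtype=np.float32)  # float32 = 4 bytes
    dst = np.empty_like(src)
    start = time.perf_counter()
    for _ in range(repeats):
        np.copyto(dst, src)
    elapsed = time.perf_counter() - start
    # Count each copy as moving n_bytes once, the usual "copy bandwidth" convention.
    return n_bytes * repeats / elapsed / 1e9

print(f"{copy_bandwidth_gbps():.2f} GB/s")
```

For the GPU numbers above you'd allocate the tensors on the CUDA device and call torch.cuda.synchronize() before stopping the timer, since CUDA copies are asynchronous.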

In practice, this means I can get something like 55 tokens a sec running a larger model like gpt-oss-120b-Q8_0 on the DGX Spark.
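That figure is roughly consistent with a bandwidth-bound estimate: decode speed is capped by memory bandwidth divided by the bytes streamed per token. A back-of-envelope sketch (the ~273 GB/s Spark bandwidth and ~5.1B active parameters for gpt-oss-120b are assumptions for illustration, not measured values):

```python
def max_decode_tps(bandwidth_gbs: float, active_params_b: float,
                   bytes_per_param: float) -> float:
    """Rough upper bound on decode tokens/s: each token streams the active weights once."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gbs * 1e9 / bytes_per_token

# Assumed: ~273 GB/s bandwidth, ~5.1B active params (MoE), 1 byte/param at Q8.
print(f"{max_decode_tps(273, 5.1, 1.0):.0f} tokens/s")  # prints "54 tokens/s"
```

That the estimate lands near the observed 55 t/s suggests decode on the Spark is indeed memory-bandwidth bound rather than compute bound.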


Nice! Thanks for that.

55 t/s is much better than I expected.


I’m using VRAM as shorthand for “memory which the AI chip can use”, which I think is fairly common shorthand these days. For the Spark it is unified, and it has lower bandwidth than almost any modern GPU. (About 300 GB/s, which is comparable to an RTX 3060.)

So LLM inference is relatively slow because of that bandwidth, but you can load much bigger, smarter models than you could on any consumer GPU.


IDK, I feel it’s quite overpriced, even with the current component prices.

I’m almost sure it’s possible to custom-build a machine as powerful as their red v2 within a 9k budget. And have a lot of fun along the way.


AMD now has a 32 GiB Radeon AI Pro 9700. Four of these (just under 2k each) would put you at 128 GiB of VRAM.

VRAM is not everything - GPU cores also matter (a lot) for inference.

4x Radeon will have significantly more GPU power than, say, a Mac Studio or DGX Spark.

Inference speed is like monitor Hz; sure, you go from 60 to 120 Hz and that's noticeable, but unless your model is AGI, at some point you're just generating more code than you'll ever realistically be able to control, audit, and rely on.

So, context is probably a better use of your money for programming than inference speed.


How do you know that? Scientists tried to measure Chuck Norris’ age. The number refused to exist.

Clickbait. He is not dead, he just decided to retire from the world of mortals.

Gemini has a similar bug, https://github.com/google-gemini/gemini-cli/issues/1028, which essentially made this tool absolutely unusable for me.

Never had this problem with Claude though. Must be something environment-specific.


So basically tmuxinator?

IDK if it can be applied in all situations.

Sometimes, especially when it comes to distributed systems, going from a working solution to a fast working solution requires a full-blown redesign from scratch.


Let me guess - another article about how CLIs are superior to MCP?

I know

The first link looks very suspicious

Appears to be where the actual link, http://partnerportal.anthropic.com/s/partner-registration, redirects. Site.com is some Salesforce-related domain.

Huh, so you got http; I'm now getting linked to: https://partnerportal.anthropic.com/s/partner-registration

Which Firefox warns me has an untrusted cert.


Classic vibe coding; everyone involved in AI has blinders when it comes to their own dogfood.

Yes, that’s why I linked where I found it. Anyone suspicious can click through to it from the anthropic.com page. It’s the correct link though.
