It's not that simple; there are cases where your asset has value for a user, but the user _cannot_ (and I really mean cannot here) pay for it.
It's not a matter of convenience, and it's not just because it's cheaper either -- piracy is the only way to actually get access to it for some people. Otherwise, it's just prohibitively expensive for them.
Sure, you could argue that that still falls under "because it's cheaper", but there's a difference between someone pirating because they just want to save money and someone pirating because they literally can't afford to get it legally no matter what.
And this doesn't cover stuff like software just being plain unavailable in some regions; I literally cannot get some software I use in university legally.
Thanks a lot, always enjoy your posts on here and r/hardware! Do you have a hardcore introduction with even more detail, perhaps even with example implementations? :)
I find white papers quite good (although I admit there are many things I don't understand yet and constantly have to look up), but even these sometimes feel a bit general.
In particular, "Section 5: Performance Guidelines" (https://docs.nvidia.com/cuda/cuda-c-programming-guide/index....) gives a lot of micro-architectural details, including those nasty "bank conflicts" that people keep talking about. Honestly, the original documentation says it best, with just a few paragraphs on that particular matter:
> To achieve high bandwidth, shared memory is divided into equally-sized memory modules, called banks, which can be accessed simultaneously. Any memory read or write request made of n addresses that fall in n distinct memory banks can therefore be serviced simultaneously, yielding an overall bandwidth that is n times as high as the bandwidth of a single module.
>
> However, if two addresses of a memory request fall in the same memory bank, there is a bank conflict and the access has to be serialized. The hardware splits a memory request with bank conflicts into as many separate conflict-free requests as necessary, decreasing throughput by a factor equal to the number of separate memory requests. If the number of separate memory requests is n, the initial memory request is said to cause n-way bank conflicts.
See? It's really not that hard or unapproachable. Just read the original docs; it's all there.
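To make the bank mapping concrete, here's a minimal Python sketch, assuming the common layout on current NVIDIA GPUs (32 banks, each 4 bytes wide, with successive 32-bit words mapped to successive banks):

    # Which bank does a byte address fall into? Assumes 32 banks,
    # 4-byte bank width (successive 32-bit words -> successive banks).
    NUM_BANKS = 32
    WORD_SIZE = 4

    def bank_of(address):
        return (address // WORD_SIZE) % NUM_BANKS

    def conflict_degree(addresses):
        # n-way conflict: the max number of *distinct* words hitting one
        # bank (threads reading the same word are broadcast, not serialized).
        per_bank = {}
        for addr in addresses:
            per_bank.setdefault(bank_of(addr), set()).add(addr // WORD_SIZE)
        return max(len(words) for words in per_bank.values())

    # A warp reading 32 consecutive floats: one word per bank, conflict-free.
    print(conflict_degree([4 * i for i in range(32)]))       # -> 1
    # A warp reading a column of a float[32][32] array: all 32 hit bank 0.
    print(conflict_degree([4 * 32 * i for i in range(32)]))  # -> 32

The classic fix for the strided second case is padding the array to float[32][33], so that consecutive rows start in different banks.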
AMD's documentation is scattered to the winds, but the same information is around. I'd say your #1 performance guideline is still the ancient optimization guide from 2015. It's a bit dated, but it's fine: http://developer.amd.com/wordpress/media/2013/12/AMD_OpenCL_...
Chapter 1 and Chapter 2 are relevant to today's architectures (even RDNA, though some details have changed). Chapter 2 (GCN) applies to all AMD GPUs from the 7xxx series through the Rx 2xx, 3xx, 4xx, and 5xx series, Vega, and CDNA (aka MI100) architectures.
RDNA does not have as good an architectural guide. Start with the OpenCL optimization guide, then "update" your knowledge with the rather short RDNA guide: https://gpuopen.com/performance/
* PTX is a portable assembly language: NVidia continuously updates their GPUs, so the underlying machine ISA changes from generation to generation. Volta in particular was studied in depth in this paper: https://arxiv.org/abs/1804.06826. I'd suggest reading it AFTER you learn the basics of PTX.
The problem with the argument that this is crippling developing countries is that top talent often can't really shine in their own country. They can't realize their potential, whether due to lack of means, corruption, envy, or whatever. I am from a third world country, and many of our scientists did great things only because they emigrated; at the same time, there are many great minds here who have to fight an endless uphill battle to do anything at all.

Like, I'd totally love for our top engineers and scientists to come back from the West and do some amazing things here (they 100% could make a great change), but I feel that they actually just can't.
I get your point and have considered it myself. It is a big problem, but it is also a function of the rapaciousness of the US economic/immigration system and model itself. NOTHING would stop the US government from funding opportunities in whatever country is in question. There is also no reason why "immigration" could not be a kind of skill-building/sharing and development program, where you get the opportunity to go to the USA for a set number of years and/or cycles, knowing you will then transition back to your home country and use those skills and knowledge to help build it.

The way the rapacious American system in particular works (but also Canada's and, increasingly, the EU's) is actually extremely inefficient. There is far more utility to be gained from the exponential strides the "human resources", as they are even called, could make by uplifting and developing their home countries than from having them serve the US ruling class in its ever more desperate search for "growth".

And then of course there is the fact that immigration is short-sighted and a pure measure of the incompetence and failure of government. If you are importing people, you have failed to govern adequately to meet your needs, and it also makes you reliant on that immigration while you neglect building up any kind of domestic capacity.

Then there is the consequence of immigration dependence coupled with international development: the source of "immigrants" dries up, because people would rather live in their home countries, among their own, once development has narrowed the gap in comfort levels with the USA. We will see this effect increasingly in the near future as, e.g., Indians see no reason anymore to move to the USA: a US university is not even as good as a local one, and American society is crumbling and cracking at the seams. The effect will compound. I already see it happening.
Also, it ignores, as I've observed with Filipinos in Canada, that they often send something like half of their paycheck back "home". Some of them live in developed-world "squalor" (many people to a house, taking the bus) so their families can live like royalty in a low-CoL area.
Ben Eater has some interesting video series, including building a CPU from logic chips, building a 6502 computer on a breadboard, building a VGA card, and deep dives on USB, PS/2, Ethernet, &c.
Yeah, he's really great. I have been searching for more channels like his, but sadly couldn't find any. The closest I found was nandland, but he's more focused on digital design.
N = 1 here, but as someone from a third world country, this definitely matches my experience. Something like 90% of students here can't pay US prices for textbooks at all, so they resort to piracy, to imported used books (which are somehow significantly cheaper, though sometimes badly worn or old editions), or to buying printed pirated copies.

My university department even had someone who would pirate textbooks, print them, and sell them to students at cost, i.e., what it cost to print them. He was very popular, and people to this day remember him fondly; he was arguably doing a major service, since the students wouldn't have read the books otherwise.

It may be legally wrong in the West, but morally? The author loses nothing (like I said, the books wouldn't have been bought anyway), and the students get access to knowledge they couldn't have reached otherwise.
This is actually better for most consumers. The SLC cache was increased nearly threefold, and the controller is a superior one (it is now the same controller as in the 980 PRO). TechPowerUp[0] has a much better post on this, and you can clearly see there that the new version beats the old one in most cases.

The only ones disadvantaged by this change are people who constantly write more than 42 GB in one go, which I would guess mostly means video editors. (The old version dropped to 1500 MB/s after overflowing its 42 GB SLC cache; the new one drops to 800 MB/s after overflowing its 115 GB cache.)
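Back-of-the-envelope, using the numbers above; the ~3300 MB/s in-cache speed is my assumption (roughly the rated sequential write), while the cache sizes and post-overflow speeds are the ones quoted:

    def write_seconds(total_gb, cache_gb, fast_mbps, slow_mbps):
        # Time to write total_gb sequentially: full speed until the SLC
        # cache is exhausted, then the post-overflow speed for the rest.
        in_cache = min(total_gb, cache_gb)
        overflow = max(total_gb - cache_gb, 0)
        return in_cache * 1000 / fast_mbps + overflow * 1000 / slow_mbps

    for total in (40, 100, 200):
        old = write_seconds(total, 42, 3300, 1500)   # original 970 Evo Plus
        new = write_seconds(total, 115, 3300, 800)   # revised version
        print(f"{total} GB: old {old:.0f} s, new {new:.0f} s")
    # 40 GB: old 12 s, new 12 s
    # 100 GB: old 51 s, new 30 s
    # 200 GB: old 118 s, new 141 s

With these assumed numbers, the crossover where the old version starts winning sits around 160 GB of continuous writes.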
P.S.: Not defending this, just clarifying, because most posters here seem to believe it's a straight-up downgrade. It's also worth noting that Samsung changed the product box, product number, firmware version, and spec sheet for this change, so they're doing significantly better than others who have pulled similar moves. That said, I still believe they should have called it the 971 Evo+ or something, as it's a genuinely different product.
> Should also be worth noting that Samsung changed the product box, product number, firmware version and the spec sheet for this change
This is yugely better than what some other SSD makers have done. ADATA, for example, has massively downgraded some drives, selling them under the same name and part numbers as a popular, good-selling drive, and done so completely silently. ADATA isn't the only one; the screaming about this situation is endless on several PC-part-enthusiast subreddits.
What Samsung should have done is change the name and call it the "971 Evo Plus SSD" or "970 Evo Gold SSD", or make some such change that distinguishes this new product, with its different performance characteristics, from the actual "970 Evo Plus SSD".
But no, they want to benefit from the good name and customer perception of the "970 Evo Plus SSD" while selling a substantially different product under that name. That is fraudulent behavior!
How can they do that without reducing the overall capacity? My understanding is that part of the MLC storage in SSDs is used as an SLC cache so that it's faster, but that portion can then store only a half, a third, or a quarter of the data it otherwise would.
In general, there are two separate components to the SLC cache strategy (which, as you said, means writing only one bit per cell instead of the three of TLC, because that is much faster). First, you have some overprovisioned NAND, whose size varies by model; I believe it is 6 GB on this one.

Then you have what they call "Intelligent TurboWrite", a dynamically allocated/reclaimed SLC cache (about 108 GB).

For both, the concept is broadly the same: your writes go into the overprovisioned "SLC cache" first, then into the dynamic one.

When the drive is idle, it consolidates the writes from both caches into 3-bit TLC writes, freeing the NAND for "SLC cache" use again. This can take a few minutes of idle time.

As you fill up the disk, things get more complicated: the drive needs to keep some free space available to consolidate writes. Exactly how this controller behaves in that case I don't know, but this is an issue for every SSD that isn't pure SLC, and modern controllers usually handle it much better than old ones did.
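Here's a toy Python model of that two-tier scheme. The sizes follow the figures quoted above (6 GB fixed plus ~108 GB dynamic); the class name, drain rate, and everything else are made up purely for illustration:

    class TurboWriteModel:
        # Toy model only: real controllers also shrink the dynamic cache
        # as the drive fills up, which is ignored here.
        def __init__(self, fixed_gb=6, dynamic_gb=108, drain_gb_per_s=2):
            self.capacity = fixed_gb + dynamic_gb  # total SLC-mode space
            self.used = 0.0                        # GB currently parked as SLC
            self.drain_rate = drain_gb_per_s       # consolidation rate when idle

        def write(self, gb):
            # Returns (absorbed_fast, spilled_slow): how much of the burst
            # lands in SLC vs. going straight to TLC at the slower speed.
            absorbed = min(gb, self.capacity - self.used)
            self.used += absorbed
            return absorbed, gb - absorbed

        def idle(self, seconds):
            # Idle time lets the controller fold SLC pages into TLC,
            # freeing the cache for the next burst.
            self.used = max(0.0, self.used - self.drain_rate * seconds)

    m = TurboWriteModel()
    print(m.write(120))  # (114.0, 6.0): 6 GB spills past the cache
    m.idle(30)           # 30 s of idle -> 60 GB consolidated into TLC
    print(m.write(50))   # (50, 0): fits again after consolidation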
Generally, the SLC cache has almost no connection to the overall size of the drive. After a huge write finishes, the controller starts moving the written data out of the SLC section and converts those cells back to normal TLC mode, releasing the SLC space for the next round of writes. When drive usage gets high, some drives (apparently Samsung's do) have a dynamic SLC capacity policy that reduces the available SLC space, so the disk keeps enough room to store normal TLC data.
Consumer SSDs don't have a lot of overprovisioning. For example, a 1 TB SSD will never have more than 1 TiB of flash. Server SSDs are a different story.
> a 1 TB SSD will never have more than 1 TiB of flash
It's a bit more complicated than that. None of the quantities precisely correspond to the definitions of 1 TB = 1000^4 bytes or 1 TiB = 1024^4 bytes. A "1TB" drive will have a host-accessible capacity of 1,024,209,543,168 bytes.
The NAND chips on a consumer 1TB drive will collectively have a nominal capacity of 1TiB (1,099,511,627,776), but that's more of a lower bound; the actual capacities those chips add up to will be higher. If we assume defect-free flash and count the bits used for ECC in order to get an idea of how many memory cells are physically present, then we get numbers as high as 1,335,416,061,952 bytes for our 1TB drive. If we don't count the space reserved for ECC, then we're down to about 1,182,592,401,408 bytes on defect-free flash, and 1,172,551,237,632 after initial defects (taken from a random consumer TLC drive in my collection).
So the SSD starts out with about 14.48% more capacity to work with than it provides to the host system, considerably more than the 9.95% discrepancy between the official definitions of 1 TB and 1 TiB. Of course, that 14.48% shrinks as the drive wears out, and the low-grade flash used in thumb drives and bargain-barrel SSDs from non-reputable brands tends to have more initial defects.
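Recomputing those percentages from the byte counts quoted above:

    host     = 1_024_209_543_168  # host-visible bytes of a "1TB" drive
    physical = 1_172_551_237_632  # usable flash after initial defects, ex-ECC
    print(f"spare area: {physical / host - 1:.2%}")    # -> 14.48%
    print(f"TiB vs TB:  {1024**4 / 1000**4 - 1:.2%}")  # -> 9.95%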