The Sneaky Standard – Intel PCI Standardization History

rasz · on Feb 14, 2024

>(VL-Bus) ... It wasn’t a massive leap, more like a stopgap improvement on the way to better graphics.

>VESA came up with a slightly faster bus standard for the next generation of graphics cards, one just fast enough to meet the needs of 486 users.

from 5MB/s to 133MB/s not a massive leap, just slightly faster, ok :)

>Intel didn’t exactly make that point clear to the companies supporting the VESA standards body until it was too late for them to react.

Those companies didnt care. Adding VLB to a 486 motherboard cost below 1$ by reusing excess stock of failed standard MCA connectors. Adding PCI meant big expensive PCI bridge chip or developing next generation chipset and impedance matched PCB to satisfy PCI reflected wave switching requirements. VLB was a quick hack to give users what was needed at the time at the price they were willing to pay. Market wasnt ready to pay for PCI in 1992, two years later cost came down to earth.

hyperman1 · on Feb 14, 2024

5mb/s is on the low side. Isa ran on the same clock as the cpu. We started there, but things got better quickly.

I think the hard limit was a bit below 40MHz, as people still used 74HC chips for glue logic. Combined with EISA 16 bits = 2 bytes per cycle this might give you a 80MB/s theoretical EISA peak. I had an EISA based 386 running 40MHz at the time.

Interestingly, VESA's 133MHz is double this, so maybe the majority of the speed gain was doubling bus width to 32bit?

rasz · on Feb 14, 2024

We started at 8 bit 4.77MHz with 4 clocks minimum so 1MB/s. We ended at around 12MHz in some crazy Turbo XT/AT clones, anything faster was back to ~8MHz with fancier BIOSes giving you option to overclock.

You are thinking cpu fsb, ISA never ran at 40MHz. Even EISA ran at a fixed ~8MHz. Btw EISA is not those longer black ISA slots on normal boards, EISA is brown slots with too many pins between normal pins because its secretly a 32bit bus, found pretty much only on workstation/server boards from 1989-1994.

hyperman1 · on Feb 14, 2024

You made me curious, so I looked it up:

https://en.wikipedia.org/wiki/List_of_interface_bit_rates

It lists: The original case: ISA 8-Bit/4.77 MHz - 0 W/S: every 4 clocks 1 byte , so basically what you say.

The best case; EISA 32-bit/8.33 MHz : 33.32 MB/s So a lot worse than VESA, but still better than 5Mb/s

hulitu · on Feb 17, 2024

Wasn't VESA and VLB the same thing ? VLB stands AFAIK for VESA Local Bus.

sponaugle · on Feb 14, 2024

I was an engineer at Intel in the 90s and worked on a PCI card for audio codec support. I remember going into one of the PCI test labs the first time and seeing this fantastic hardware setup where all of the PCI signals were mapped into a logic analyzer that could do full rate recording. The processor on the board was also mapped into a LA so you can see CPU side transactions. No doubt in the 90s this equipment was well in the hundreds of thousands of dollars.

Like any good setup of the time the development machine had a VGA monitor on a Mach 64 as well as a monochrome monitor for SoftICE. It was running in Windows NT 3.51. Ah such good times for a young engineer!

Perhaps best of all, if I had a question the people who designed PCI were on the floor above us.

rasz · on Feb 14, 2024

You might enjoy hour long red team R&D lab tour, includes similar PCIE snooping setups :)

"Secrets of a $182 Billion Chip Maker: AMD's Labs | Full Documentary" - Gamers Nexus https://www.youtube.com/watch?v=7H4eg2jOvVw

wmf · on Feb 14, 2024

PCI was so far ahead of its time it was crazy. When it came out, PCI was 10x faster than SCSI and 100x faster than Ethernet.

RiverCrochet · on Feb 14, 2024

Well, you couldn't plug a cable from a peripheral directly into a PCI port in 1993. It wasn't an external interface at all.

PCI was an internal "close to the CPU" communications method mostly routed through a motherboard, superseding ISA which was the CPU bus physically extended into slots. ISA ran at 8Mhz (maybe 10Mhz or 16Mhz, memory is funny) which was also pretty fast in 1993.

It wasn't until PCIe introduced switching and Thunderbolt that PCI-anything became more like switched Ethernet and something that could be funneled through cables.

mjg59 · on Feb 14, 2024

> ISA ran at 8Mhz (maybe 10Mhz or 16Mhz, memory is funny)

The answer is "It depends", but it also depends on what you mean by ISA? The slots in the PC, XT, and AT (and clones of the appropriate vintage) tended to run synchronously with the CPU, which meant the performance of a card would depend on how fast your CPU was - well, up until the point where it was excessively overclocked and stopped working, of course. This meant that old cards wouldn't necessarily work in newer machines, given they may have been designed to work with a 4.77MHz PC. Later systems supported dividing the CPU clock to get back down to around the 8MHz range, and eventually we get to the point where the ISA slots aren't connected to the CPU at all but are bridged to PCI. But the term ISA wasn't really a thing until vendors introduced EISA as a royalty-free competitor to IBM's MCA slot architecture, and retconned the AT slot into ISA, and at that point things had pretty much settled on 8MHz. So, in that sense, ISA has always been 8MHz, even if the thing that was identical to ISA wasn't.

NikkiA · on Feb 14, 2024

There were actually PCI and PCI-X external chassis boxes (in all 4 forms), they were expensive as hell, but they were a critical tool for developing PCI/PCI-X devices.

They usually used 1 or 2 scsi-3 (80-pin) connector to connect from the host to the box (I think you could get away with 1 connector on either 32bit standard if you dropped some of the non-vital signals).

aidenn0 · on Feb 14, 2024

16 bit ISA@8.33MHz was about 15x slower than PCI. A 10BASE2 card used a significant fraction of the bus.

rasz · on Feb 14, 2024

To expand on the above, absolute theoretical 16bit ISA limit is 2 cycles per transaction (0WS), (2 Bytes x clock MHz)/2 cycles = ~8MB/s at 8MHz. In practice best period correct VGA cards reach ~4-6MB/s write speed. Slowest VLB cards do ~10MB/s fastest ~30MB/s, slowest PCI VGA ~15MB/s fastest again ~30MB/s.

rasz · on Feb 14, 2024

It was 1x speed of 2 years younger VLB and did Autoconfig like 1986 Amiga :)

vrinsd · on Feb 14, 2024

Intel also had quite a few issues (as usual) getting their own PCI standard working.

Almost all of the early Intel PCI chipsets (i.e. Mercury, Neptune ) had serious bugs. Intel actually used DEC PCI chipsets or licensed their designs to actually get a more robust PCI implementation (IIRC Triton). Until then VLB was still a better choice.

PCIe retains a lot of "weirdness" that PCI introduced which includes a lack of a unified or "properly architected" device-to-host and host-to-device centralized DMA agent or broker.

Even today it's still a challenge to extract more than 50-60% of the theoretical bandwidth PCIe offers without a lot of tuning and mental churn.

wtallis · on Feb 14, 2024

> it's still a challenge to extract more than 50-60% of the theoretical bandwidth PCIe offers without a lot of tuning and mental churn.

Are you referring to using all the bandwidth across all of a CPU's PCIe lanes? Because it's clearly not a challenge to do way better than 50-60% utilization for transfers in one direction at a time to one device. All kinds of cheap crappy products have passed that threshold on their first try.

vrinsd · on Feb 15, 2024

The issue is not lack of raw bandwidth, it's getting the hardware, software, drivers and OS "to do the right thing".

PCIe gives you the building blocks of posted transactions and non-posted transactions but doesn't help you use them effectively. There is no coordinated or designated DMA subsystem to help move data between the root-complex("host") and end-point("device)".

So, if you have to design a new PCIe end-point (target in original PCI terms) using an FPGA or ASIC then trying to actually sustain PCIe throughput in either "direction" isn't trivial.

Posted transactions ("writes") are 'fire and forget' and non-posted transactions ("reads") have a request/acknowledgement system, flow-control, etc.

If you can get your "system" to use ONLY posted writes (fire and forget) with a large enough MPS (payload size), usually >128 Bytes, then you can get to 80%-95% of theoretical throughput (1).

The real difficulty is if you need to do a PCIe 'read' this breaks down into a read-request (MRd) and a Completion with Data (CplD). The 'read' results in a lot of back and forth traffic and tracking the MRds/CplDs becomes a challenge (2).

Often an end-point can use 'posted writes' to blast data to the PCIe root-complex (usually the CPU/host) maximizing throughput since a host usually has hundreds of MegaBytes of RAM to make use of for buffers. Unfortunately to transfer data from the root-complex(host) to the end-point(device), the host usually will have the device's DMA controller initiate a 'read' from the host's memory which results in these split transactions since end-points don't often carry hundreds of MB of RAM. This also means bespoke drivers, tying into the OS PCIe subsystems and hopefully not loosing any MSI-X interrupts.

To re-iterate in the modern "Intel way" the CPU houses the PCIe root complex but does not house ANY DMA controller. So to get "DMA" working means each PCIe end-point's implementation has some kind of DMA "controller" which is different than the DMA controller of all other end-points, rather than Intel having spec'd out an "optional" centralized a DMA controller in the root complex.

1: https://cdrdv2-public.intel.com/666650/an456-683541-666650.p...

2: https://www.intel.com/content/www/us/en/docs/programmable/68...

peter_d_sherman · on Feb 14, 2024

>"Key to PCIe’s approach is the use of “lanes” of data transfer speed, allowing high-speed cards like graphics adapters more bandwidth (up to 16 lanes) and slower technologies like network adapters or audio adapters less. This has given PCIe unparalleled backwards compatibility — it’s technically possible to run a modern card on a first-gen PCIe port in exchange for lower speed — while allowing the standard to continue improving."

Backwards compatibility -- seems to have been the key to Microsoft Windows OS's popularity as well...

Anyway, an excellent computer history article!

userbinator · on Feb 14, 2024

Ironically, VLB was based on the 486's local bus.

Many people may know Intel also contributed to the initial USB standard, but they also created AC97, HD Audio, and AMR/CNR.

jauntywundrkind · on Feb 14, 2024

The article goes into that some.

> To put that all another way, VESA came up with a slightly faster bus standard for the next generation of graphics cards, one just fast enough to meet the needs of 486 users. Intel came up with an interface designed to reshape the next decade of computing, one that it would even let its competitors use.

It's crazy how useful PCI was. Intel kept being the company that made standards, that created places for cooperation. In an industry where historically it felt like every company wanted to take on the world solo. Its hard to think of a company that has done more to grow the pot in an industry than Intel.

(Intel also created nvme!)

crotchfire · on Feb 14, 2024

> (Intel also created nvme!)

To be fair they did that because there were already several vendor-specific proprietary PCIe-attached-SSD products (notably in Apple's macbooks) and the lack of a standard way of doing this was a major fragmentation risk.

PCI and PCIe were generous gifts from Intel to the rest of the industry and involved giving away a fair amount of hard-won development results (like the pcie serdes electrical parameters). NVMe was more a case of Intel being the referee because we needed somebody to make an arbitrary decision.

wtallis · on Feb 14, 2024

> there were already several vendor-specific proprietary PCIe-attached-SSD products (notably in Apple's macbooks) and the lack of a standard way of doing this was a major fragmentation risk.

Are you confusing the protocol for the mechanical connector and form factor? Prior to using NVMe, Apple was using PCIe SSDs that spoke AHCI, the Intel-supported industry standard for storage controllers used by almost all SATA controllers.

The M.2 card connector and form factor were standardized by PCI-SIG and are unrelated to NVMe.

alexdbird · on Feb 14, 2024

As I understand it, speaking AHCI doesn't mean the higher level drivers exist in the BIOS to allow booting, so things were a long way from standardisation in those days. You couldn't just put an AHCI drive in an M.2 slot and expect to get on with your day.

wtallis · on Feb 14, 2024

A PCIe SSD speaking AHCI means the boot firmware and OS will see it exactly the same way they would see an add-in card providing one SATA port populated with an SSD that is mysteriously faster than a 6Gbps SATA connection should allow. Those PCIe SSDs used AHCI exactly because that's what everything already had drivers for.

alexdbird · on Feb 17, 2024

That's the thing, not everything had the drivers. Gigabyte in particular omitted the important bits for a while. I'm not going to claim I'm an expert, but I ended up with a very fast SSD which could only be made bootable by extracting parts of another vendors BIOS and creating a custom UEFI image with them added, and life is far too short for that.

wtallis · on Feb 17, 2024

What you're describing sounds like the process for adding NVMe support to a board that shipped with a PCIe M.2 slot but no NVMe driver in the firmware. There were quite a few of those circa Intel's Z97 chipset, and motherboard vendors were inconsistent about when or whether they shipped firmware updates to add NVMe support. A few early NVMe SSDs included a PCI Option ROM to provide NVMe capability on systems that didn't already have it. For any other NVMe SSD, the firmware needed to be modified to add a UEFI DXE driver for NVMe to enable booting.

There were very few consumer PCIe SSDs sold before the adoption of NVMe, and they were almost exclusively sold to OEMs rather than as retail products. A few like the Samsung XP941 and SM951 were available through grey-market retailers for about a year before Samsung launched the 950 PRO as an official retail product, using NVMe (with an Option ROM). (Note: the SM951 existed in both AHCI and NVMe variants, so some of those grey-market retail sales were of drives that were essentially 950 PROs with slightly older flash).

It's not entirely impossible that Gigabyte shipped boards that couldn't boot from AHCI PCIe SSDs. But it would be difficult for that to be a driver issue. I'd expect those boards to also have difficulty booting from an add-in SATA controller card, because the difficulty booting from AHCI SSDs would have to come not from a lack of an AHCI driver but from a misconfiguration preventing that driver from being used with anything other than the SATA controller built-in to the motherboard.

How sure are you that you didn't miss the brief era of AHCI SSDs entirely and are simply remembering early NVMe teething issues?

alexdbird · on Feb 20, 2024

I am 100% sure. It is indeed an SM951/M2/ACHI[sic, that's what's printed on it], I have it in front of me. It went in the only system I've built in the last 15 years, so there's no room for confusion.

It had to be AHCI because macOS didn't support NVMe in this era, and the motherboard was also chosen for Hackintosh compatibility, but I only ever used it as a data drive because of the boot issues. That was probably the best option for its use as a fast compile machine anyway, but it did become a PITA when I wanted to sell the machine.

I'm afraid I am not familiar with this technology - as I said I am not an expert - so perhaps driver is the wrong word. Perhaps I did end up reading advice relating to the NVMe variant, information was very hard to come by. All a very small historical footnote now anyway!

jauntywundrkind · on Feb 14, 2024

It's still grand that NVMe massively improved the world beyond "can it boot". Having 65536 different submission queues, having read/write opa take 1 rather than 6 round trips, and a variety of other improvements.

I think we still would have been stuck with a bunch of vendors making exotic expensive boutique flash storage systems for a while had Intel not come by & made a fairly sizable overhaul of the existing protocols.

wtallis · on Feb 14, 2024

FYI, nobody supports 65k queues. The protocol allows it, but the hardware has lower limits. Most SSD controller vendors have had trouble supporting enough queues to assign one per CPU core on high-end systems contemporaneous with the SSDs.

lakkal · on Feb 14, 2024

I briefly tried to use a motherboard with both PCI and VLB slots in the mid 90s. Didn't work well, though - constant machine crashes.

hakfoo · on Feb 15, 2024

I wonder if the idea was bad, or just that the PCI+VLB boards tended to be last dregs of the 486 market, where the selling point was "The system integrator can clear out his cabinets of rapidly obsolescing VLB cards" or "the person in a weirdly budget-constrained upgrade can avoid replacing the VLB video card". They weren't competing for the premium market and probably cut other corners; this was after all the era of fake cache.

vrinsd · on Feb 15, 2024

The PCI chipsets of this time were really really buggy, that's why.

They got better as time went on but it really took a number of years before people could get reasonably high-performance, reliable PCI implementations. For x86, aside from AMD's Irongate (750/760) chipsets (K7-era) and nVidia (nForce), pretty much only Intel had PCI working reasonably. ALI, VIA and SiS PCI implementations always had weird issues and quirks.

PCI-IDE adapters are another good example -- VIA's PCI IDE had all sorts of issues, if you wanted high performance PCI IDE, it worked best with Intel.

It's not that different with modern PCIe-SATA either (history repeats), Marvell PCIe/SATA adapters have lingering oddities.

rasz · on Feb 16, 2024

Wasnt Irongate build on licensed VIA chipset IP, and same deal with nForce being Acer/ALI/ULI?

vrinsd · on Feb 16, 2024

No, Irongate was very much internally developed by AMD. AMD was well aware of the "sketchy" nature of ALI/VLI/ULI/SiS and knew it was giving them a really bad rep so they undertook their own designs.

AMD chipsets were not as successful in the marketplace because unsurprisingly the AMD-chipsets cost more than the Taiwanese ones, and motherboard vendors (who are almost all based in Taiwan) stuck with their existing vendors. The dual-socket AMD machines (K7-based) pretty much all had the 760MPX on it because I think only AMD had a multi-socket chipset that was reliable.

AMD also inhereted a lot of DEC engineers, so it was no surprise that K8-era their 'HyperTransport' was really 'Lighting Transport', developed at DEC.

Interestingly, Micron had also developed a chipset for AMD in this era but never released it.

nForce was also not re-licensed from ALI/ULI that I recall but I have far less insight here. nVidia at the time had a 'total system play' in mind so they were attempting to do GPU/audio/network/chipsets for AMD & Intel and got designed into original Xbox (x86-based). Jensen I think used NRE money from MSFT to fund a lot of the nVidia chipset work.

rasz · on Feb 17, 2024

>No, Irongate was very much internally developed by AMD.

My confusion comes from wiki entry for AMD-640 https://en.wikipedia.org/wiki/List_of_AMD_chipsets#AMD-xxx I assumed 741 was further development based on 640.

> AMD was well aware of the "sketchy" nature of ALI/VLI/ULI/SiS

From my limited experience in practice AMD-751 was more problematic than KT133/133A. Those were still the times when hardware reviews had sections dedicated to "Stability". For example https://www.anandtech.com/show/718/5

"Even when running the DDR SDRAM at CAS 2 settings, the system did not crash once within 24 hours of our stress tests. We continued to run the stability tests and finally the first crash occurred after 34 hours of operation. Considering that this is FIC's first try at a DDR board we were very impressed with the stability of the AD11."

Personally I dont remember many vendors sticking with AMD chipsets after switch to Socket A, at least in Europe it was all VIA with some ALI/SIS. As for price its all on AMD, nobody forced them to manufacture at Dresden. If chipset was so strategic AMD should have sold it with minimal/no margin. The way I see it AMD was just seeding the market making sure to avoid chicken and egg problem.

hakfoo · on Feb 17, 2024

I wonder if there was also some business sabotage in effect.

It would have been extremely tempting for Intel to pressure motherboard manufacturers with some subtle messages like "Nice AMD750 board, too bad we don't have any more 440BX chips for you."

The Athlon debuted with only like three compatible mainboards, and two of them were minimally rebadged versions of the AMD reference design. Of course, it was still too compelling of a platform to ignore, and everyone got on board soon enough. But if R&D had stayed away from the Athlon market for an extra 6 months or a year, that could manifest in worse board design and optimization for quite a while.

hakfoo · on Feb 17, 2024

ULi got subsumed by nVidia shortly after PCI-Express became the norm.

They made a chipset which offered a fairly compatible AGP-like slot alongside PCI-e, and one with two full x16 slots when this otherwise required a very expensive nForce board. So of course, nVidia immediately blocked SLI support on it.

chem83 · on Feb 14, 2024

I'd have included a 6th entry into the list of offshoots: OCulink.

chem83 · on Feb 15, 2024

VLB felt like a stopgap much in the same way that AGP lingered just enough for the PCI-X v PCIe mess to be sorted out.