Dang. I bought an i7-4770 rather than an i7-4770K specifically because of TSX.

reitzensteinm · on Aug 12, 2014

I also did this, missing out on 100 MHz of base clock (3.4 GHz vs 3.5 GHz for the K).

TSX seemed like a once in a decade step forward, though as I understand it the restrictions with the cache size (and thus the amount of memory you can write to before the transaction gets too big) meant it wasn't very practical for much beyond optimistic lock acquisition.

For example, PyPy isn't planning on doing a TSX port even with their enthusiasm for transactional memory.

Also influencing my decision, I bought a 2600k a few years back, but never bothered overclocking it, which I guess was an admission that the excitement I found for hardware when I was a child was dead. I guess you either have the money, or the time, but rarely both.

It's disappointing that this microcode update isn't being done in such a way that you can re-enable it after agreeing to a disclaimer that it's not for production use. I'm not sure what the mechanism for this would look like, but given that Intel sold cards that unlocked Hyperthreading, I'm sure it's possible.

http://www.engadget.com/2010/09/18/intel-wants-to-charge-50-...

Edit: The article has been updated saying that it will be possible to enable TSX for development purposes on Haswell-EP at least.
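To make "optimistic lock acquisition" concrete, here is a rough single-threaded sketch of the control flow in Python. Python cannot issue TSX instructions, so `ToyTx`, `DirectAccess`, `run_elided` and the `capacity` limit are all made-up stand-ins: the toy transaction plays the role of an `_xbegin`/`_xend` region, and the capacity check stands in for a hardware abort. Real lock elision also reads the lock word inside the transaction so that an actual lock holder aborts it, which this sketch leaves out.

```python
import threading

class Abort(Exception):
    """Raised when the toy transaction exceeds its (assumed) write capacity."""

class ToyTx:
    """Buffers writes; nothing reaches `memory` unless commit() is called."""
    def __init__(self, memory, capacity):
        self.memory = memory
        self.capacity = capacity          # hypothetical limit on distinct keys written
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.memory.get(key, 0))

    def write(self, key, value):
        self.writes[key] = value
        if len(self.writes) > self.capacity:
            raise Abort()

    def commit(self):
        self.memory.update(self.writes)

class DirectAccess:
    """Same interface, but writes go straight to memory (used under the lock)."""
    def __init__(self, memory):
        self.memory = memory

    def read(self, key):
        return self.memory.get(key, 0)

    def write(self, key, value):
        self.memory[key] = value

def run_elided(memory, lock, body, capacity, max_retries=3):
    """Try `body` transactionally; after repeated aborts, take the real lock."""
    for _ in range(max_retries):
        tx = ToyTx(memory, capacity)
        try:
            body(tx)
        except Abort:
            continue                      # buffered writes are simply dropped
        tx.commit()
        return "elided"
    with lock:                            # fallback path: ordinary locking
        body(DirectAccess(memory))
    return "locked"

def small(tx):
    tx.write("a", tx.read("a") + 1)

def large(tx):
    for i in range(10):
        tx.write(i, 1)

memory = {}
lock = threading.Lock()
print(run_elided(memory, lock, small, capacity=4))   # prints "elided"
print(run_elided(memory, lock, large, capacity=4))   # prints "locked"
```

The small critical section never touches the lock, while the large one aborts every time and ends up serialized anyway. The sketch retries blindly; real code would look at the abort status and skip the retries when the abort was a capacity one.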

ak217 · on Aug 12, 2014

> For example, PyPy isn't planning on doing a TSX port even with their enthusiasm for transactional memory.

Do you have more information about their reasoning behind this? From my point of view this is the highest profile software project to potentially make use of HTM, and I recall reading that the plan was to eventually introduce hardware acceleration.

reitzensteinm · on Aug 12, 2014

Here are a few links:

http://pypy.org/tmdonate.html (Search for "haswell")

http://grokbase.com/t/python/pypy-dev/13bvt3kg70/pluggable-h...

It seems to boil down to:

* The cache size (which determines the amount of memory you can write to in a transaction before having to commit back) is insufficient, causing excessive transaction aborts.

* There is no mechanism to bypass the HTM, writing to memory within a transaction that is not rolled back. This exacerbates the small cache size, since all memory writes have a cost, not just the ones you want rolled back in the case of a transaction abort.
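A back-of-the-envelope model of the two points above, assuming the write set has to stay resident in a set-associative L1 data cache. The defaults (32 KiB, 8-way, 64-byte lines) match Haswell's L1D as I understand it; the function name `first_overflow` is made up for illustration, and a real core aborts for plenty of other reasons (interrupts, conflicts, eviction by the sibling hyperthread) that this ignores. Every write inside the transaction counts against the limit, whether or not you care about rolling it back.

```python
def first_overflow(addresses, cache_bytes=32 * 1024, ways=8, line_bytes=64):
    """Return the index of the first write that overflows a cache set, or None.

    Toy model: a transaction aborts as soon as one set would need to hold
    more distinct written lines than it has ways.
    """
    num_sets = cache_bytes // (ways * line_bytes)
    sets = {}                             # set index -> distinct lines written
    for i, addr in enumerate(addresses):
        line = addr // line_bytes
        lines = sets.setdefault(line % num_sets, set())
        lines.add(line)
        if len(lines) > ways:
            return i
    return None

# Sequential writes: 512 lines (32 KiB) fit, the 513th overflows.
print(first_overflow(range(0, 64 * 600, 64)))                     # prints 512

# Writes 4 KiB apart all land in the same set: the 9th one overflows.
print(first_overflow(range(0, 4096 * 20, 4096)))                  # prints 8

# A hypothetical small core with an 8 KiB, 4-way L1.
print(first_overflow(range(0, 64 * 600, 64), cache_bytes=8 * 1024, ways=4))  # prints 128
```

The strided case is the nasty one: the transaction dies after touching only 9 cache lines, far below the nominal 32 KiB.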

Interestingly, this does not bode well for HTM on a platform with many smaller cores, say a hypothetical 64 core ARM. Each core will have a tiny amount of L1 cache, severely limiting transaction size.

And many smaller cores is exactly where you'd want the benefits of HTM, since the overhead of synchronization is higher in proportion to the work each core can do.

sounds · on Aug 12, 2014

In reference to the sibling post that gives more details about PyPy, I'd like to call to mind the history of the vector extensions for x86.

First revision: MMX. It reused the same registers as the older x87 floating point coprocessor (even though the x87 transistors lived on the same die). As a result, legacy x87 code and MMX code had to transition using an expensive EMMS instruction.

Second revision: (well, ignoring some small changes to MMX) ... SSE. Finally got its own registers, but lacked a lot of real-world capability.

Third revision: SSE2, finally got to a level of parity with competing vector extensions (see, for example, PowerPC's Altivec).

And so forth.

I guess the take-home lesson for me is that these new TSX instructions are indeed fascinating to play around with, but I wouldn't expect it to blow the doors off. Intel will incrementally refine it.

(The incremental approach also gives Intel a chance to study how it's being used and keeps AMD playing catch-up.)

MBCook · on Aug 13, 2014

The other big problem with MMX was that it was integer only. While that might have been OK for some applications, 3D games and other software that could really use the boost needed floating point. Not only could they not benefit (since it was integer only), MMX actually interfered (since, as you said, it reused the registers).

AMD's 3DNow had single precision floating point support, so it was actually somewhat useful. SSE followed 3DNow and added single precision support (as well as fixing the register stuff). SSE2 added double precision support.

sounds · on Aug 14, 2014

Right, thanks for those additional details.

Today, no one would use MMX instructions (since SSE is vastly superior). I expect Intel will continue to add TSX capabilities which will eventually produce some nice results for parallel code.

binarycrusader · on Aug 12, 2014

The 4790K (aka "Devil's Canyon") supports TSX. It's one of the only "unlocked" CPUs that Intel produces that does.

Yes, really:

http://ark.intel.com/products/80807/Intel-Core-i7-4790K-Proc...

...which is why I specifically bought it. So yes, I too am a little annoyed by this since I bought it specifically to develop TSX applications.

sounds · on Aug 14, 2014

Just curious if you'll forgo updating your BIOS so you don't lose the TSX instructions, despite the errata?

Hopefully you can upgrade to a Broadwell or later once Intel starts shipping fixed silicon. Haswells will be updated to the new microcode once a replacement is available. (At least, that's our plan.)

binarycrusader · on Aug 14, 2014

Since I'm having no problems, I'll likely forgo updating the BIOS. However, that might not mean much, since I think Microsoft distributes Intel microcode updates as part of OS updates. But usually you can remove the specific KB patch if needed.

But yes, I'll be looking forward to the replacement Broadwell. My previous workstation was a Core 2 Duo E8400 which I just replaced with the 4790K.
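On the question of whether a BIOS or OS update has quietly taken TSX away: on Linux, one quick check is whether the `hle` and `rtm` flags still show up in `/proc/cpuinfo`. This is a minimal sketch that assumes the microcode update clears the CPUID feature bits, so the flags disappear; the helper name `tsx_flags` is made up here. On Windows you would need a CPUID tool instead.

```python
def tsx_flags(cpuinfo_text):
    """Return which of the TSX feature flags ('hle', 'rtm') the text advertises."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags = set(line.split(":", 1)[1].split())
            return flags & {"hle", "rtm"}
    return set()

try:
    with open("/proc/cpuinfo") as f:
        found = tsx_flags(f.read())
    print("TSX flags present:", sorted(found) if found else "none")
except OSError:
    print("no /proc/cpuinfo on this system")
```

Running it before and after applying an update shows whether the instructions are still advertised. Only the first `flags` line is inspected, on the assumption that all cores report the same features.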