Couldn't you do something like add a bidirectional encoder after your embedding look up table to compress your text into some smaller token-count semantic space before feeding your transformer blocks to get a similar effect, then?
Yes, you can get good compression of a long sequence of "base" text tokens into a shorter sequence of "meta" text tokens, where each meta token represents the information from multiple base tokens. But, grouping a fixed number of base tokens into each meta token isn't ideal, since that won't align neatly with sensible semantic boundaries, like words, phrases, sentences, etc. So, the trick is how to decide which base tokens should be grouped into each meta token.
This sort of "dynamic chunking" of low-level information, perhaps down to the level of raw bytes, into shorter sequences of meta tokens for input to some big sequence processing model is an active area of research. Eg, one neat paper exploring this direction is: "Dynamic Chunking for End-to-End Hierarchical Sequence Modeling" [1], from one of the main guys behind Mamba and other major advances in state-space models.
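To make the fixed-vs-dynamic grouping point concrete, here's a toy sketch in plain Python. It is not the method from [1]; the function names and the "start a new chunk at a space" rule are made up purely for illustration, and a real model would learn where the boundaries go and pool each chunk into a vector rather than keep the raw characters.

```python
def fixed_chunks(tokens, size):
    """Group every `size` base tokens together, ignoring their content."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def boundary_chunks(tokens, is_boundary):
    """Start a new chunk whenever is_boundary(token) is true.

    `is_boundary` is a hypothetical stand-in for a learned boundary predictor.
    """
    chunks, current = [], []
    for tok in tokens:
        if is_boundary(tok) and current:
            chunks.append(current)
            current = []
        current.append(tok)
    if current:
        chunks.append(current)
    return chunks


# Base tokens here are single characters (a stand-in for raw bytes).
base = list("the cat sat")

fixed = ["".join(c) for c in fixed_chunks(base, 3)]
dynamic = ["".join(c) for c in boundary_chunks(base, lambda ch: ch == " ")]

print(fixed)    # fixed-size groups cut across word boundaries
print(dynamic)  # boundary-based groups line up with words
```

With a fixed size of 3, the groups split "cat" across two meta tokens, while the boundary rule keeps each word in one chunk. The hard part is replacing that hand-written rule with something learned end to end.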
Funny, Normal People comes out of Ireland, historically a very sexually repressed nation under the thumb of the Catholic Church. Times are changing rapidly in Ireland though.
Exactly. Capital gains tax is equivalent to a wealth tax on appreciating assets only, which is the only kind of assets you should be targeting with a wealth tax. So just implement a sensible capital gains tax, and you're done.
Considering that the people who hold the largest pools of affected assets are the ones who can buy changes to the tax code and build loopholes to exempt themselves, that seems about impossible.
Admittedly I'm not too familiar with the hardware R&D work going on here. I know Intel is largely manufacturing, with a bit of development from things like the purchase of Movidius.
For the software side, what are those R&D departments actually working on? Would you really say it's R&D and product development? From what I can see, both from my own experience and from job posts, most of the engineering jobs are themselves operations related (SRE, infrastructure, customer support). I wasn't saying we don't have engineering roles in Ireland, but what we do have are not prime roles in terms of the companies' products and services, and we should be looking to grow beyond facilitating company operations.
I can only speak for myself, but I’m in a group of ≈50 working on prime software R&D (as you term it) in a very trendy space in Dublin for a large multinational. I don’t want to say too much beyond that - and you’re probably right that it’s the exception rather than the rule - but it does exist.
If you work where I think you work (Irish founded company, office is about as central as you can get in Dublin) it's a great example of the type of thing we need more of! I really do think it's the exception unfortunately, even within the other parts of the multinational here.
This is it - you can't just compare FP32 TFLOPS. The newer cards have a ton of extra precisions, custom cores for certain workloads, and on-chip memory, all of which use silicon area and transistors, but none of which boost the FP32 TFLOPS metric.
I could design you a chip that is nothing but FP32 multipliers and adders that has, theoretically, a ridiculous TFLOPS per mm^2, but it would be next to useless in any real workload.
On CPU, assuming inference is compute bound rather than bandwidth bound, the compute time will scale quadratically with the size of the FC layers (which account for almost all compute time in these networks). So if the hidden size was 768 in BERT-Base, and 4096 in ALBERT, inference will approximately be 28.4x slower... yikes.
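For anyone who wants to check the arithmetic, here's a quick sketch. It assumes FC-layer cost is proportional to hidden_size squared (a hidden-to-hidden matmul has hidden_size^2 weights) and ignores attention, embeddings, and memory bandwidth; `fc_slowdown` is just a name made up for this example.

```python
def fc_slowdown(hidden_small, hidden_large):
    """Ratio of FC-layer compute, assuming cost scales with hidden_size ** 2."""
    return (hidden_large / hidden_small) ** 2


# Hidden sizes quoted above: 768 (BERT-Base) vs 4096 (ALBERT).
ratio = fc_slowdown(768, 4096)
print(f"{ratio:.1f}x")  # prints "28.4x"
```

So the 28.4x figure is just (4096 / 768)^2, which is why widening the hidden size is so much more expensive than adding layers.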
Woah there, the person you're replying to didn't once use the word "weird" or any synonyms thereof.
They said that men and women tend to have different broad interests. But that doesn't preclude women from having an interest in tech. They're making a statement on statistics, not making a value judgement about anybody.