Sama and OpenAI, I am waiting for my data bundle to become available so I can delete my account. This has taken more than 48 hours - either you are getting hammered with deletion requests, or as usual you are playing games hoping I forget. I won't. People won't.
Between vi(m) and VSCode, Sublime has grown increasingly less useful in my day-to-day work. I used to use it exclusively, but lately it has been squeezed on both ends by my increasing server-side development needs.
The grapes are sour because their moat is crumbling.
What was supposed to be a model, training, and data moat is now reduced to operational cost, at which they are not terribly efficient.
OpenAI has been on a journey to burn as much money as possible to get as far ahead as it can on those three moats, to the point where decreasing their TCO on inference was not even relevant - "who cares if you save me 20% on costs when I can raise at a $150B pre-money valuation?".
Well, with their moats disappearing, they will have no choice but to compete on inference cost like everyone else.
Log space is nice: multiplication can be replaced by addition.
That part is easy, and anyone can implement hardware to do it. The tricky bit is always staying in log space while doing accumulations, especially ones across a large dynamic range.
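To make the distinction concrete, here is a minimal NumPy sketch, assuming plain floating-point log space rather than whatever quantized format a given accelerator actually uses. Multiplication really does become addition; accumulation forces you out of log space, and the usual stable trick for getting back is log-sum-exp:

```python
import numpy as np

def to_log(x):
    # Split into sign and log-magnitude; multiplication then becomes addition.
    return np.sign(x), np.log(np.abs(x))

sa, la = to_log(np.array([2.0, -3.0]))
sb, lb = to_log(np.array([4.0, 5.0]))
prod_sign, prod_log = sa * sb, la + lb   # log|a*b| = log|a| + log|b|
prod = prod_sign * np.exp(prod_log)      # back to linear: [8.0, -15.0]

# Accumulation is the hard part: a sum of terms does not stay in log space.
# log-sum-exp subtracts the max first, so a large dynamic range of inputs
# does not overflow the exponentials.
def logsumexp(logs):
    m = logs.max()
    return m + np.log(np.exp(logs - m).sum())
```

Note that `logsumexp(np.array([1000.0, 1000.0]))` returns `1000 + log(2)` cleanly, where naive `log(sum(exp(...)))` would overflow - which is exactly the wide-range accumulation problem described above.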
We designed an inference accelerator that more or less accomplishes this by quantizing input tensors into logarithmic space. This allows the multiplications (in convolutions especially) to be optimized into very simple adders. This (and a few other tricks) has a very dramatic impact on the compute density we achieve while keeping power very low. We keep the tensors in our quantized space throughout the layers of the network and convert the outputs as required on the way out of the ASIC.
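A toy version of the idea, hedged heavily: this quantizes magnitudes to the nearest power of two and keeps the sign separately, which is not any particular vendor's format (real designs use more bits and much more careful rounding), but it shows why a multiply collapses into an integer add on the exponents:

```python
import numpy as np

def log_quantize(x, eps=1e-12):
    # Nearest power-of-two magnitude plus a separate sign bit.
    sign = np.sign(x)
    exp = np.round(np.log2(np.abs(x) + eps)).astype(int)
    return sign, exp

def log_mul(sa, ea, sb, eb):
    # A multiply in this representation is a sign flip plus an integer
    # add on the exponents - exactly what a tiny adder implements cheaply.
    return sa * sb, ea + eb

def dequantize(sign, exp):
    return sign * np.exp2(exp.astype(float))

a = np.array([0.5, -2.0, 8.0])
b = np.array([4.0, 4.0, 0.25])
s, e = log_mul(*log_quantize(a), *log_quantize(b))
# dequantize(s, e) -> [2., -8., 2.]  (exact here: inputs are powers of two)
```

Staying in this representation across layers, as described above, is what avoids the convert-back-and-forth cost; only the final outputs leave the quantized space.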
We achieve impressive task level performance, but this requires some specialized training and model optimizations.
Very cool to see ideas like this propagate more into the mainstream.
Isn't matrix multiplication already a convolution? You are rotating the right-hand-side matrix 90 degrees anti-clockwise and then convolving it over the LHS matrix from top to bottom.
The point above regarding convolution had to do specifically with accelerating 3x3-and-above convolutional operations, where the products and the accumulation can be done in a few clock cycles if set up with care and love.
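For reference, the operation being accelerated is just this inner loop - nine multiply-accumulates per output pixel, which is the part a hardware fabric unrolls into parallel units (a reference sketch, not the accelerator's actual dataflow):

```python
import numpy as np

def conv3x3(img, k):
    # Direct "valid" 3x3 convolution, correlation form (no kernel flip),
    # as most deep-learning frameworks define it.
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            # Nine multiplies and eight adds per output element; with
            # log-quantized operands the multiplies become adds too.
            out[i, j] = np.sum(img[i:i+3, j:j+3] * k)
    return out
```

With the window setup fixed, all nine products are independent, which is why the product-plus-accumulate can finish in a few cycles when laid out in hardware.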
There is no way in which the indices into the input matrices in a matrix multiplication are formed from sums or differences of indices and dummy variables.
However, convolution is a matrix multiplication - specifically, multiplication by the circulant matrix of the convolution kernel.
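In one dimension this is literal: circular convolution of a signal x with a kernel k is exactly multiplication by the circulant matrix built from cyclic shifts of k. A minimal sketch (the FFT line just double-checks via the convolution theorem):

```python
import numpy as np

def circulant(k, n):
    # n x n circulant matrix of k (zero-padded): C[i, j] = k[(i - j) mod n]
    kp = np.zeros(n)
    kp[:len(k)] = k
    return np.array([[kp[(i - j) % n] for j in range(n)] for i in range(n)])

x = np.array([1.0, 2.0, 3.0, 4.0])
k = np.array([1.0, -1.0])
C = circulant(k, len(x))
y = C @ x  # circular convolution of x with k: [-3., 1., 1., 1.]

# Convolution theorem: the same result via pointwise products of DFTs.
y_fft = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(np.concatenate([k, np.zeros(2)]))))
```

So the claim holds for circular convolution; linear convolution is the same story with a Toeplitz rather than circulant matrix.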
But I should also mention: we only use the RISC-V cores as a pre/post-processor, scheduling engine, and service processor. We have custom hardware that does the bulk of the inference math (we are a convolutional accelerator with a number of constraints traded off for speed and power). The fabric itself is driven by engines that are programmed by the scheduling engine (the RISC-V cores).