Hacker News | sabhiram's comments

Sama and OpenAI, I am waiting on my data bundle to become available so I can delete my account. This has taken more than 48 hours - either you are getting hammered on deletion requests, or as usual you are playing games hoping I forget. I won't. People won't.

Yep, there is absolutely no problem with that at all.

Never imagined politics so obviously manipulating the talking heads with nary a care about perception.


Between vi(m) and VSCode, Sublime has grown increasingly less useful in my day-to-day life. I used to use it exclusively, but lately it has been squeezed on both ends by increasing server-side development needs.


The grapes are sour because their moat is crumbling.

What was supposed to be a model, training, and data moat is now reduced to an operational-cost advantage, and they are not terribly efficient on that front.

OpenAI has been on a journey to burn as much money as possible to get as far ahead as it can on those three moats, to the point where decreasing their TCO on inference was not even relevant: "who cares if you save me 20% of costs when I can raise at a $150B pre-money valuation?"

Well, with their moats disappearing, they will have no choice but to compete on inference cost like everyone else.


Log space is nice, multiplication can be replaced by addition.

This part is easy, and anyone can implement hardware to do it. The tricky bit is staying in log space while doing accumulations, especially ones across a large dynamic range.
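A minimal sketch of both halves of that point: multiplication in log space is just addition, while accumulation needs something like the log-sum-exp trick to stay stable across a large range. (This is a generic illustration, not any particular hardware's scheme.)

```python
import math

def log_mul(la, lb):
    # multiplication in log space is just addition of logs
    return la + lb

def log_add(la, lb):
    # accumulation is the hard part: exp() back out naively and you
    # overflow; the log-sum-exp trick factors out the max first
    m, n = max(la, lb), min(la, lb)
    return m + math.log1p(math.exp(n - m))

# multiply 1000 * 0.002 entirely in log space
la, lb = math.log(1000.0), math.log(0.002)
prod = math.exp(log_mul(la, lb))  # ~2.0

# accumulate two values of ~1e300 each without ever leaving log space;
# exp(la) + exp(lb) would overflow a float64, but this stays finite
acc = log_add(math.log(1e300), math.log(1e300))
```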


We are talking about it aren't we?


Muni style. People crapping all over it from time to time.


Fascinating paper.

We design an inference accelerator which more or less accomplishes this by quantizing input tensors into logarithmic space. This allows the multiplication (in convolution especially), to be optimized into very simple adders. This (and a few other tricks) has a very dramatic impact on how much compute density we achieve while keeping power very low. We keep the tensors in our quantized space throughout the layers of the network and convert the outputs as required on the way out of the ASIC.

We achieve impressive task level performance, but this requires some specialized training and model optimizations.
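To make the idea above concrete, here is a toy sketch of logarithmic quantization: snap magnitudes to the nearest power of two, keep only a sign bit and a small signed exponent, and a multiply collapses into a sign XOR plus an exponent add, i.e. a tiny adder instead of a full multiplier. This is a generic illustration, not the actual quantization scheme described in the comment.

```python
import numpy as np

def quantize_log2(x, bits=4):
    # snap each magnitude to the nearest power of two; keep sign + exponent
    sign = np.sign(x)
    exp = np.round(np.log2(np.abs(x) + 1e-30)).astype(int)
    exp = np.clip(exp, -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return sign, exp

def log_domain_mul(sa, ea, sb, eb):
    # in the quantized domain a multiply is a sign flip and an exponent ADD
    return sa * sb, ea + eb

a = np.array([0.5, 2.0, -4.0])
b = np.array([8.0, 0.25, 2.0])
sa, ea = quantize_log2(a)
sb, eb = quantize_log2(b)
s, e = log_domain_mul(sa, ea, sb, eb)
approx = s * (2.0 ** e)  # dequantize; exact here since inputs are powers of two
```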

Very cool to see ideas like this propagate more into the mainstream.


Isn't matrix multiplication already a convolution? You are rotating the right-hand-side matrix anticlockwise 90 degrees and then convolving it over the LHS matrix from top to bottom.


The point above regarding convolution had to do specifically with accelerating 3x3-and-above convolutional operations, as the product and the accumulation can be done in a few clock cycles if set up with care and love.


no, it is not, and i am not

discrete convolution is cₙ = Σᵢ aᵢbₙ₋ᵢ

there is no way in which the indexes into the input matrices in a matrix multiplication are formed from sums or differences of indices and dummy variables

however, convolution is a matrix multiplication, specifically multiplication by the circulant matrix of the convolution kernel
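That last claim is easy to check numerically: build the circulant matrix of the kernel and the matrix-vector product reproduces circular convolution. A small sketch (my own illustration, assuming circular boundary conditions):

```python
import numpy as np

def circulant(kernel, n):
    # C[i, j] = kernel[(i - j) mod n], with the kernel zero-padded to length n
    k = np.zeros(n)
    k[:len(kernel)] = kernel
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return k[idx]

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, -1.0, 0.5])

# circular convolution two ways: circulant matrix-vector product
# vs. the direct sum y[n] = sum_i h[i] * x[(n - i) mod N]
y_matmul = circulant(h, len(x)) @ x
y_direct = np.array([sum(h[i] * x[(n - i) % len(x)] for i in range(len(h)))
                     for n in range(len(x))])
```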

hth, hand


Sure, it doesn't sum the whole matrix, but it does sum row by row. Also, how did you type out LaTeX on HN? Or is that a font?


it sums products, but convolution is summing products in a particular way that is not general matrix multiplication

i typed special characters with the compose key; cf. https://github.com/kragen/xcompose

not as easy as latex but more compatible


No moat like a hardware moat.

Depending on the application, you could literally build software in the open and still maintain, if not expand, your exposure.


Ah, the Huawei strategy.


We run a cluster of RISC-V CPUs based on open-source designs to schedule and post-process workloads for our edge accelerator ASIC.

10/10 would do it again, except this time we may pay SiFive or someone like that for something requiring less "customization".


Interesting, can you say anything about what sort of chip to chip communication is used? AXI, Wishbone, plain old serial?


AXI primarily.

But, I should also mention, we only use the RISC-V cores as a pre/post-processor, scheduling engine, service processor. We have custom hardware that does the bulk of the inference math (we are a convolutional accelerator with a number of constraints traded off for speed and power). The fabric itself is driven by engines that are programmed by the scheduling engine (RISC-V).

Happy to answer more specific DMs.


Can you DM on this site?


Yikes, no! But hit me up here: shaba@recogni.com.


Do you have a website / product page?


