Sama and OpenAI, I am waiting for my data bundle to become available so I can delete my account. This has taken more than 48 hours - either you are getting hammered with deletion requests, or as usual you are playing games hoping I forget. I won't. People won't.
Between vi(m) and VSCode, Sublime has grown increasingly less useful in my day-to-day work. I used to use it exclusively, but lately it has been squeezed on both ends by my increasing server-side development needs.
The grapes are sour because their moat is crumbling.
What was supposed to be a model, training, and data moat is now reduced to operational cost, at which they are not terribly efficient.
OpenAI has been on a journey to burn as much money as possible to get as far ahead as it can on those three moats, to the point where decreasing their TCO on inference was not even relevant - "who cares if you save me 20% on costs when I can raise at a $150B pre-money valuation?".
Well, with their moats disappearing, they will have no choice but to compete on inference cost like everyone else.
Log space is nice: multiplication can be replaced by addition.
That part is easy, and anyone can implement hardware to do it. The tricky bit is always staying in log space while doing accumulations, especially ones across a large dynamic range.
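To make the distinction concrete, here is a minimal NumPy sketch, assuming plain floating-point log space rather than whatever quantized format a given accelerator actually uses. Multiplication really does become addition; accumulation forces you out of log space, and the usual stable trick for getting back is log-sum-exp:

```python
import numpy as np

def to_log(x):
    # Split into sign and log-magnitude; multiplication then becomes addition.
    return np.sign(x), np.log(np.abs(x))

sa, la = to_log(np.array([2.0, -3.0]))
sb, lb = to_log(np.array([4.0, 5.0]))
prod_sign, prod_log = sa * sb, la + lb   # log|a*b| = log|a| + log|b|
prod = prod_sign * np.exp(prod_log)      # back to linear: [8.0, -15.0]

# Accumulation is the hard part: a sum of terms does not stay in log space.
# log-sum-exp subtracts the max first, so a large dynamic range of inputs
# does not overflow the exponentials.
def logsumexp(logs):
    m = logs.max()
    return m + np.log(np.exp(logs - m).sum())
```

Note that `logsumexp(np.array([1000.0, 1000.0]))` returns `1000 + log(2)` cleanly, where naive `log(sum(exp(...)))` would overflow - which is exactly the wide-range accumulation problem described above.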
We designed an inference accelerator that more or less accomplishes this by quantizing input tensors into logarithmic space. This allows the multiplications (in convolutions especially) to be optimized into very simple adders. This (and a few other tricks) has a very dramatic impact on the compute density we achieve while keeping power very low. We keep the tensors in our quantized space throughout the layers of the network and convert the outputs as required on the way out of the ASIC.
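A toy version of the idea, hedged heavily: this quantizes magnitudes to the nearest power of two and keeps the sign separately, which is not any particular vendor's format (real designs use more bits and much more careful rounding), but it shows why a multiply collapses into an integer add on the exponents:

```python
import numpy as np

def log_quantize(x, eps=1e-12):
    # Nearest power-of-two magnitude plus a separate sign bit.
    sign = np.sign(x)
    exp = np.round(np.log2(np.abs(x) + eps)).astype(int)
    return sign, exp

def log_mul(sa, ea, sb, eb):
    # A multiply in this representation is a sign flip plus an integer
    # add on the exponents - exactly what a tiny adder implements cheaply.
    return sa * sb, ea + eb

def dequantize(sign, exp):
    return sign * np.exp2(exp.astype(float))

a = np.array([0.5, -2.0, 8.0])
b = np.array([4.0, 4.0, 0.25])
s, e = log_mul(*log_quantize(a), *log_quantize(b))
# dequantize(s, e) -> [2., -8., 2.]  (exact here: inputs are powers of two)
```

Staying in this representation across layers, as described above, is what avoids the convert-back-and-forth cost; only the final outputs leave the quantized space.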
We achieve impressive task level performance, but this requires some specialized training and model optimizations.
Very cool to see ideas like this propagate more into the mainstream.
Isn't matrix multiplication already a convolution? You are rotating the right-hand-side matrix 90 degrees anti-clockwise and then convolving it over the LHS matrix from top to bottom.
The point above regarding convolution had to do specifically with accelerating 3x3-and-above convolutional operations, where the products and the accumulation can be done in a few clock cycles if set up with care and love.
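For reference, the operation being accelerated is just this inner loop - nine multiply-accumulates per output pixel, which is the part a hardware fabric unrolls into parallel units (a reference sketch, not the accelerator's actual dataflow):

```python
import numpy as np

def conv3x3(img, k):
    # Direct "valid" 3x3 convolution, correlation form (no kernel flip),
    # as most deep-learning frameworks define it.
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            # Nine multiplies and eight adds per output element; with
            # log-quantized operands the multiplies become adds too.
            out[i, j] = np.sum(img[i:i+3, j:j+3] * k)
    return out
```

With the window setup fixed, all nine products are independent, which is why the product-plus-accumulate can finish in a few cycles when laid out in hardware.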
There is no way in which the indices into the input matrices in a matrix multiplication are formed from sums or differences of indices and dummy variables.
However, convolution is a matrix multiplication - specifically, multiplication by the circulant matrix of the convolution kernel.
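In one dimension this is literal: circular convolution of a signal x with a kernel k is exactly multiplication by the circulant matrix built from cyclic shifts of k. A minimal sketch (the FFT line just double-checks via the convolution theorem):

```python
import numpy as np

def circulant(k, n):
    # n x n circulant matrix of k (zero-padded): C[i, j] = k[(i - j) mod n]
    kp = np.zeros(n)
    kp[:len(k)] = k
    return np.array([[kp[(i - j) % n] for j in range(n)] for i in range(n)])

x = np.array([1.0, 2.0, 3.0, 4.0])
k = np.array([1.0, -1.0])
C = circulant(k, len(x))
y = C @ x  # circular convolution of x with k: [-3., 1., 1., 1.]

# Convolution theorem: the same result via pointwise products of DFTs.
y_fft = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(np.concatenate([k, np.zeros(2)]))))
```

So the claim holds for circular convolution; linear convolution is the same story with a Toeplitz rather than circulant matrix.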
But I should also mention: we only use the RISC-V cores as a pre/post-processor, scheduling engine, and service processor. We have custom hardware that does the bulk of the inference math (we are a convolutional accelerator with a number of constraints traded off for speed and power). The fabric itself is driven by engines that are programmed by the scheduling engine (the RISC-V cores).