What other factors played a role in your decision to use rust over other languages when 128bit arithmetic played such an important role you wrote inline assembly for it?
Higher-precision arithmetic is important but is not our primary product. The existing code was performant enough for our use case (although the increased speed is helpful). Rust has many benefits, and I find it hard to imagine how anyone manages to get any performance-sensitive work done in the cryptocurrency industry with anything else.
From just reading the discussion so far, I'd like to see how a naive implementation in Julia would do (Julia does type inference, and some quick duck-duck-ing indicates that an unsigned 128-bit int is the biggest regular number type before having to go to bigint).
[ed: ok, from a skim, I see this is actually about 256-bit multiplication, which makes me curious how just using bigints and `*` (the mul operator) would work in Julia.]
Also, I don't get this:
> u256_mul multiplies two 256-bit numbers to get a 256-bit result (in Rust, we just create a 512-bit result and then throw away the top half but in assembly we have a separate implementation)
How do they know the result will fit in 256 bits? It sounds like they know more about the arguments than just that both are 256 bits long?
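One possible reading of "throw away the top half": the multiplication is simply performed modulo 2^256, so the truncated result is well-defined whether or not the full product fits. A small-width sketch of that idea, using u64/u128 as stand-ins for u256/u512 (this is my illustration, not the project's actual code):

```rust
fn main() {
    // Widening multiply: the full product of two u64s always fits in a u128.
    let a: u64 = 0xDEAD_BEEF_1234_5678;
    let b: u64 = 0xFEDC_BA98_7654_3210;
    let wide = (a as u128) * (b as u128);

    // "Throw away the top half": keep only the low 64 bits of the product.
    let truncated = wide as u64;

    // Truncation of the double-wide product is exactly multiplication
    // modulo 2^64, i.e. the same thing wrapping_mul computes directly.
    assert_eq!(truncated, a.wrapping_mul(b));
    println!("{truncated:#x}");
}
```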
Julia has `widemul`, which takes two `Int64` and produces an `Int128`. It is also not too hard to actually add your own primitive type for `Int256` (with the caveat that it needs support from LLVM, which I haven't checked).
C has had strong support for inline assembly for a long time, and in fact, for a short period, we called out to C in order to use inline assembly on stable Rust (the function-call overhead was worthwhile compared to how much faster the assembly implementation of the function was).