My grandfather used to work on telephone field network boxes (e.g., the big boxes full of messes of wires in residential areas).
One job was to look for outliers in the network, and they spent time studying areas with an unusually large number of issues and ones with an unusually low number of issues.
There was a part of the phone network that was decades overdue for an overhaul but had no issues, so they inspected it. (This was decades ago, so even the replacements for this antique could be modern-day museum pieces.)
When they took a look, everything was corroded beyond reason, as expected. However, the connections were still low noise / low resistance.
The old boxes used some sort of post connector and a crimp. It had something like two or four points of contact for redundancy (all contact points would need to corrode before it failed).
In the boxes with no failures, the (long gone) technician simply stuck the end of the wire into the post crimp hole, then wrapped slack wire around it a dozen times.
This gave it hundreds of contact points, and (after reverse engineering the technique) it took something like 1/10th as long per connection.
Sadly, we’ll never know if the installer was a genius, lazy, or both.
Anyway, the crimp connectors in figures 19-25 and 19-28 of the NASA article look like the same concept, but turned inside out.
I'm not sure why more of those didn't use wire-wrap; it's super fast and really reliable. I've seen some Bell terminals that were wire-wrapped, but punchdown was more common in my experience.
This looks like the sort of bug I'd write back when I used mutexes to write I/O routines. These days, I'd use a lock-free state machine to encode something like this:
NOT_IN_CACHE -> READING -> IN_CACHE
(the real system would need states for cache eviction, and possibly page mutation).
Readers that encounter the READING state would insert a completion handler into a queue, and the reader transitioning out of the READING state would wake up all the completion handlers in the queue.
I've been working on an open source library and simple (manual) proof system that makes it easy to verify that the queue manipulation and the state machine manipulation are atomic with respect to each other.
The higher level invariants are fairly obvious once you show that the interlocks are correct, and showing the interlocks are correct is just a matter of a quick skim of the function bodies that implement the interlocks for a given data type.
I've been looking for good write ups of these techniques, but haven't found any.
The existing btrfs code does use a lock-free state machine for this, `eb->bflags`, that sort of mirrors regular page flags (hence `UPTODATE`, `DIRTY`, etc.).
But Linux kernel APIs like test_bit(), set_bit(), clear_bit(), test_and_set_bit() etc. only work on one bit at a time. The advantage is they can avoid a CAS loop on many platforms. The disadvantage is you only get atomic transitions for one bit at a time. So the `READING -> UPTODATE` transition is more like
READING -> (READING | UPTODATE) -> UPTODATE
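That two-step transition can be sketched with C11 `atomic_fetch_or`/`atomic_fetch_and` standing in for the kernel's `set_bit()`/`clear_bit()` (each step is a single-bit RMW with no CAS loop, but the intermediate state is visible to concurrent readers):

```c
#include <assert.h>
#include <stdatomic.h>

#define READING  (1u << 0)
#define UPTODATE (1u << 1)

static _Atomic unsigned bflags = READING;

static void reading_to_uptodate(void)
{
    /* Each step is atomic, but a concurrent reader can observe
     * the intermediate READING|UPTODATE state. */
    atomic_fetch_or(&bflags, UPTODATE);   /* READING -> READING|UPTODATE */
    atomic_fetch_and(&bflags, ~READING);  /* READING|UPTODATE -> UPTODATE */
}
```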
And the `NOT_IN_CACHE -> READING` transition is not fully atomic at all:
if (!(bflags & UPTODATE))  // atomic
    // race could happen here
    bflags |= READING;     // atomic
The whole state machine could be made atomic with CAS, but that would be (slightly) more expensive.
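For comparison, here is a rough sketch (not the btrfs code) of what the fully atomic version would look like: a CAS loop over the whole flags word, paying for a possible retry under contention.

```c
#include <assert.h>
#include <stdatomic.h>

#define READING  (1u << 0)
#define UPTODATE (1u << 1)

/* Returns 1 if this caller claimed the read; 0 if the page was already
 * up to date or another reader got there first. */
static int try_start_read(_Atomic unsigned *flags)
{
    unsigned old = atomic_load(flags);
    do {
        if (old & (UPTODATE | READING))
            return 0;                  /* someone else handled it */
    } while (!atomic_compare_exchange_weak(flags, &old, old | READING));
    return 1;
}
```

On failure the CAS refreshes `old` with the current value, so the check and the transition form one atomic step.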
You can guarantee termination by limiting bit precision, but it is more interesting to limit the set of operators your program uses instead.
For instance, you might have an operator that translates from hostname to IPv4 address.
That function can return billions of different numbers, but if your program’s input only has 10 hostnames in it (and you don’t use string manipulation or other tricks to fabricate hostnames) then you can infer that the IP lookup function only adds 10 new symbols to the sets that your program maintains.
Searching for “Herbrand universe” will produce formal write-ups of these ideas.
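A toy illustration of the bound (all names hypothetical, with an FNV-1a hash standing in for a real DNS lookup): even though `resolve()` can return billions of distinct values, the set of addresses the program can ever hold is bounded by the number of hostnames in the input.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

static uint32_t resolve(const char *host)   /* stand-in for DNS lookup */
{
    uint32_t h = 2166136261u;               /* FNV-1a hash as a fake IP */
    while (*host)
        h = (h ^ (uint8_t)*host++) * 16777619u;
    return h;
}

/* Collect the distinct addresses reachable from a fixed input set. */
static size_t reachable(const char *const *hosts, size_t n, uint32_t *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t ip = resolve(hosts[i]);
        size_t j;
        for (j = 0; j < count && out[j] != ip; j++)
            ;                               /* linear-scan dedup */
        if (j == count)
            out[count++] = ip;
    }
    return count;   /* <= n, no matter what resolve() returns */
}
```

With 10 hostnames in the input and no operators that fabricate new hostnames, `reachable` can never return more than 10 symbols.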
We really need a global carbon credit system (with stiff tariffs for countries that do not participate). Cut net greenhouse emissions by a few percent a year until net emissions are negative and atmospheric CO2 and methane are back to pre-industrial levels. This will help renewables (and nuclear, for better or worse) in the short term. In the medium term, the invisible hand of the market will make carbon capture profitable, which means smart VCs will start dumping cash into it now.
I think this sounds more plausible than terraforming, but is really the same thing.
I read this explanation a few times, and I'm afraid I still don't understand why Tor is not vulnerable to this attack. Let's say I'm an attacker who provides enough bandwidth to compromise N% of client circuits, and I can disrupt non-compromised circuits at will. Causing clients to repeatedly renegotiate non-compromised circuits lets me compromise >>N% of circuits, right? For small N, the chance of getting a compromised circuit is roughly linear in the number of circuit disruptions, right?
Unless I'm missing something, this vulnerability provides a significant multiplier to existing Sybil attacks against Tor. A quick search suggests there have been Sybil attacks in the past.
I still don't understand how you come to the conclusion that the attack won't work.
The bolded text in the blog post reads "the middle relay will pick a different exit relay", and should read "the client will pick a new circuit at random using the same bandwidth-proportional weights."
The security properties of the corrected description aren't very comforting: the attack is no longer guaranteed to work, just like there is no guarantee I will ever roll a one with a fair die.
You'll get an exit with a DDoS attack against it, not an exit controlled by the attacker. The attacker will see the same fraction of Tor traffic with or without the attack, and the Tor user base will notice a massively degraded experience.
You can't affect the proportion of Tor traffic that flows through a given exit with a DDoS attack.
I think the parent comment is saying that given an attack which can reliably abort connections between two arbitrary endpoints, one can reliably break other users' Tor circuits, which causes their Tor clients to renegotiate new ones. If a circuit is negotiated such that all nodes are owned by the attacker (in which case the attacker can unmask the affected user), the attacker can then choose not to disrupt that particular circuit. This, in turn, enables the attacker to penalize non-attacker-controlled circuits, and cause Tor clients to settle on compromised ones (which won't be penalized). This gives the attacker more power than simply waiting for an unlucky client to build a circuit with all nodes belonging to the attacker.
If this is indeed what the parent is saying, and if I'm understanding the attack and the Tor protocol correctly, I too don't see why such an attack isn't feasible. I'm perhaps in the wrong and missing something, but would very much appreciate if someone could clear this up for me.
> enables the attacker to penalize non-attacker-controlled circuits, and cause Tor clients to settle on compromised ones
That's the key question. Will clients "settle" on compromised circuits, or just continue trying to access the penalized (dead) ones?
A few years ago, there was a bug which was used to rapidly force users to create new paths. The Tor Project fixed that bug, but they also added an extra precaution against future bugs by limiting how new paths are selected with regard to guard nodes. To my understanding, when a client first connects, it generates a random (but consensus-weighted) subset of all guard nodes. It then randomly picks one from that subset when creating new paths, which means that only guard nodes in the subset can ever be picked by the client. A client attacked as described in the article will just cycle through its list indefinitely and never choose a compromised node, unless it already picked a compromised node initially.
There are two effects at play. If you're reading a multiple of a flash page at a time, then you can get random bandwidth to match sequential. Under the hood, decent SSDs actually round robin sequential reads across all of the underlying hardware, so they can run as fast as whatever their bottleneck resource is. The bottleneck could be error correction, the SATA link, or the channels (copper traces) between the NAND and the controller, for example. Random reads with enough parallelism give the SSD enough scheduling freedom to have the same throughput as sequential.
The second effect comes from the fact that application data is rarely exactly the same size as a flash page. If you group all the related data on one page, then you don't waste time transferring bits you won't look at. B-Trees sort of do this, but they leave holes in pages so they can support efficient inserts (a good SSD or database will compress the holes, but that doesn't help random reads much, since it's still going to transfer entire flash pages to the SSD controller).
LSM-Trees push things a bit further, since new data ends up being grouped on higher levels of the tree. When I was benchmarking bLSM's predecessor, I worked out the expected throughput on the back of an envelope while I was waiting for results. It did much better than expected, since the benchmark heavily favored recently created data! YMMV, of course.
Hi. First author of the bLSM (Yahoo) paper here. You've pretty much nailed it: The trick is getting latency and volume at the same time. If you're willing to wait an hour to look at event logs, then Hadoop is good enough, but where's the fun in that?
At the time, we were looking for the best of both worlds: Hadoop and log processing give high throughput writes, but latencies are much too high to act on user intent within a single session. We had a lot of applications in mind that needed low latency access to event log data, but didn't have a suitable storage engine to back them.
Existing LSM trees suffered from write stalls and excessive read amplification. We were targeting hard disks back then, and didn't want to pay for the extra seeks. We got random read access down to about one seek per read, and figured out how to eliminate write stalls. Surprisingly, we still ended up with good write throughput.
More importantly, from an application perspective, we provide efficient range scans, which lets you reason about groups of data with matching or contiguous keys. If you squint hard enough, this is all MapReduce really does, except it precomputes all the answers up front, and we do it in real time. On the one hand we have lower throughput, since we perform more random reads / sequential writes, and also do more thread synchronization. On the other, we only perform the computations for data that's actually being used for something, which means we do a lot less work for the right applications.