I think the Jepsen comment might be confusing. The point is that Dgraph can ingest data in either a triple format or a JSON format. Internally, Dgraph stores it in its own binary format. See [1] for more details.

> Naively, I'd think that one would cluster by entity and what an entity is connected to

In a distributed system, that approach leads to high fan-out and network broadcasts, which kill query latency. Ideally, you want each traversal / join to take at most one network call. Because of this design, Dgraph can execute arbitrary-depth queries much faster. More details are in [1].
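
To make the fan-out point concrete, here is a toy sketch, not Dgraph's actual code. It assumes the design in question is placing data by predicate (relationship) rather than by entity, and it counts how many distinct shards a traversal has to contact at each hop. The graph, the `shard` hash, and all function names are made up for illustration.

```python
# Toy graph as (subject, predicate, object) triples. All data is hypothetical.
TRIPLES = [
    ("alice", "friend", "bob"),
    ("alice", "friend", "carol"),
    ("alice", "friend", "dave"),
    ("bob", "friend", "erin"),
    ("carol", "friend", "frank"),
    ("dave", "friend", "grace"),
]

NUM_SHARDS = 4


def shard(key):
    # Deterministic toy hash: sum of character codes modulo the shard count.
    return sum(map(ord, key)) % NUM_SHARDS


def by_entity(subject, predicate):
    # Entity-based placement: an entity's edges live on the entity's shard.
    return shard(subject)


def by_predicate(subject, predicate):
    # Predicate-based placement: all edges of one predicate share a shard.
    return shard(predicate)


def traverse(start, predicate, depth, place):
    """Follow `predicate` for `depth` hops from `start`.

    Returns the final frontier and, per hop, the number of distinct
    shards contacted (one network call per shard in this toy model).
    """
    frontier = {start}
    calls_per_hop = []
    for _ in range(depth):
        shards = {place(s, predicate) for s in frontier}
        calls_per_hop.append(len(shards))
        frontier = {
            o for (s, p, o) in TRIPLES if p == predicate and s in frontier
        }
    return frontier, calls_per_hop


if __name__ == "__main__":
    print("entity:   ", traverse("alice", "friend", 2, by_entity)[1])
    print("predicate:", traverse("alice", "friend", 2, by_predicate)[1])
```

With this toy data, entity placement contacts 1 shard on the first hop and 3 on the second, because alice's friends hash to different shards. Predicate placement contacts 1 shard on every hop, no matter how wide the frontier gets.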

[1]: https://dgraph.io/paper