I think the Jepsen comment might be confusing. The point is that Dgraph can ingest data in either a triple format or a JSON format. Internally, Dgraph stores it in its own binary format. See [1] for more details.
> Naively, I'd think that one would cluster by entity and what an entity is connected to
In a distributed system, that approach leads to high fan-out and network broadcasts, which kill query latency. Ideally, you want each traversal / join to take at most one network call. Because of this design, Dgraph can execute arbitrary-depth queries much faster. More details are in [1].
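
To make the fan-out point concrete, here is a rough toy model, not Dgraph's actual code. It assumes the alternative to clustering by entity is sharding by predicate (my reading of [1]), and the graph, the `entity_shard` / `predicate_shard` helpers, and the shard count are all made up for illustration. It counts how many shards each hop of a traversal has to contact under the two layouts.

```python
# Toy graph as (subject, predicate, object) triples; all names are invented.
TRIPLES = [
    ("alice", "friend", "bob"),
    ("alice", "friend", "carol"),
    ("alice", "friend", "dave"),
    ("bob", "friend", "erin"),
    ("carol", "friend", "frank"),
    ("dave", "friend", "grace"),
]

NUM_SHARDS = 4  # hypothetical cluster size


def entity_shard(entity):
    # Deterministic stand-in for a hash partitioner keyed on the entity.
    return sum(entity.encode()) % NUM_SHARDS


def predicate_shard(predicate):
    # Same stand-in, but keyed on the predicate (edge type).
    return sum(predicate.encode()) % NUM_SHARDS


def traverse(start, predicate, depth, shard_of_edge):
    """Follow `predicate` for `depth` hops.

    Returns the final frontier and the number of distinct shards
    that had to be contacted at each hop.
    """
    frontier = {start}
    shards_per_hop = []
    for _ in range(depth):
        shards = set()
        next_frontier = set()
        for s, p, o in TRIPLES:
            if p == predicate and s in frontier:
                shards.add(shard_of_edge(s, p))
                next_frontier.add(o)
        shards_per_hop.append(len(shards))
        frontier = next_frontier
    return frontier, shards_per_hop


# Clustered by entity: each edge lives with its subject.
by_entity = traverse("alice", "friend", 2, lambda s, p: entity_shard(s))
# Sharded by predicate: every "friend" edge lives on the same shard.
by_predicate = traverse("alice", "friend", 2, lambda s, p: predicate_shard(p))

print(by_entity[1])     # [1, 3]
print(by_predicate[1])  # [1, 1]
```

With entity clustering, the second hop already touches three shards for three friends, and that number grows with the frontier. With predicate sharding in this model, each hop stays at one shard no matter how wide the frontier gets.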
[1]: https://dgraph.io/paper