Jepsen: Dgraph 1.1.1 (jepsen.io)
203 points by aphyr on May 1, 2020 | 64 comments


Hey folks, author of Dgraph here. If you're interested in the design details of Dgraph, and have the appetite for a very technical research paper, please check this out:

https://dgraph.io/paper

I'd like to thank Kyle for doing another round of testing. Some of the bugs we fixed (in 2018 and 2019) were very tricky edge cases -- it's incredible to see Dgraph running so much more stably now, under all these varied failure scenarios.

Let me know if you have any questions, I'm around to answer.


I've been using Dgraph in building some PoCs and it's been great. My current confusion is around the push for a plain GraphQL-compliant query endpoint, as I'm not sure what the intended purpose is. Presumably it's not for direct connections between a front-end web client and the DB using something like Apollo Client. Should front ends still be going through some sort of API server that ingests GraphQL or REST-style requests and queries Dgraph to build results? That's what I'm currently doing to implement things such as custom logic, side effects, authentication, etc. I'm not seeing the benefit of the plain GraphQL endpoint in that case, since I would be managing a separate GraphQL schema in the DB that is likely a subset of, or similar to, the API server's GraphQL schema. I've been sticking with GraphQL+- since it works great and I need an API server in between the front end and DB anyway. I've built some helper functions to map plain GraphQL request trees into expanded GraphQL+- queries for expanding edges, since my API schema closely matches my Dgraph schema.

Would love some insight into the plans or use case for the plain GraphQL endpoint.
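For concreteness, a hypothetical illustration of the kind of mapping described above -- the type and predicate names are invented, and the real rewrite depends on both schemas:

    package main

    import "fmt"

    // What the front end sends to the API server (plain GraphQL).
    const graphqlRequest = `
    {
      post(id: "0x2a") {
        title
        author {
          name
        }
      }
    }`

    // What the API server forwards to Dgraph after rewriting (GraphQL+-).
    // The tree shape is preserved; mostly the root function and argument
    // syntax change when the API schema mirrors the Dgraph schema.
    const dgraphQuery = `
    {
      post(func: uid(0x2a)) {
        title
        author {
          name
        }
      }
    }`

    func main() {
        fmt.Println("GraphQL in:", graphqlRequest)
        fmt.Println("GraphQL+- out:", dgraphQuery)
    }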


The latest version of Dgraph, 20.03.1, contains a GraphQL endpoint. We're releasing a few more features very soon, which will add object-level auth, the ability to trigger custom logic, calling remote GraphQL endpoints to execute part of the query, and so on. So, you could put Dgraph directly in front of a front end, without worry.

Dgraph can also be called via Apollo Gateway, as part of their federation.

We also have other very interesting stuff in the pipeline, which will cater a lot to front-end as well as back-end development. So, stay tuned.


Does Dgraph work in geo-distributed applications? It's great to be "distributed" and all, but if your cluster is in one data center and it loses connection, you're hosed. I'd assume for something using Raft (and whatever other custom protocol you're also using) there must be a way to configure election timeouts, but I can't find any docs on it.


Depends upon how far apart the servers are. It is on our roadmap to experiment with Dgraph servers as part of one cluster across US-based datacenters. Once we establish that, we can try it out across US / Europe, etc.

We use 100ms heartbeats, with 20x that for election timeouts, in line with etcd. I don't think that's exposed to the end user -- nobody has needed to tweak that.
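For readers curious how that ratio is typically expressed with the etcd/raft library (which Dgraph builds on), a minimal sketch -- this illustrates the tick-based config only, and is not a claim about Dgraph's exact internal setup:

    package main

    import (
        "log"
        "time"

        "go.etcd.io/etcd/raft"
    )

    func main() {
        storage := raft.NewMemoryStorage()

        cfg := &raft.Config{
            ID:      0x01,
            Storage: storage,
            // With a 100ms tick (driven below), these settings give a 100ms
            // heartbeat and a 20 * 100ms = 2s election timeout.
            HeartbeatTick:   1,
            ElectionTick:    20,
            MaxSizePerMsg:   1 << 20,
            MaxInflightMsgs: 256,
        }

        // Single-node cluster, purely for illustration.
        n := raft.StartNode(cfg, []raft.Peer{{ID: 0x01}})
        defer n.Stop()

        ticker := time.NewTicker(100 * time.Millisecond)
        defer ticker.Stop()

        for {
            select {
            case <-ticker.C:
                n.Tick() // advance raft's logical clock
            case rd := <-n.Ready():
                // A real node would persist rd.HardState and rd.Entries, send
                // rd.Messages to peers, and apply rd.CommittedEntries here.
                if err := storage.Append(rd.Entries); err != nil {
                    log.Fatal(err)
                }
                n.Advance()
            }
        }
    }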


Hi, I was nosing around your website and found it very appealing. I'm currently for hire, based in Germany. Whom should I contact? Thanks! I'm the author of https://github.com/mro/librdf.sqlite



Did you have to pay Kyle for him to test your database?


Yep! Testing databases is my full time job. Vendors pay me for the work, and that means the Jepsen library, test harness for each database, and all the reports are free for everyone. :)


Loved the precise, concise, and complete technical expression in the Background section.

(Thank you for sharing your great work.)


Does the Jepsen test require you to host the databases or can you test a system completely remotely?


Sure, I could run Jepsen against anything I can talk to over the network, but a.) it might be painful or impossible to get a fresh, healthy cluster for each test run (It's... fairly common that databases stop being databases during a test) and b.) doing fault injection means I need a way to, well, inject faults. That's why Jepsen tests almost always involve Jepsen controlling the cluster.


Ok, what are the API calls you need now, and what would you need to remotely inject faults? Can you make an HTTP server API with a client implementation for it, so that we can just implement the server API and point your test cluster to the nodes for automatic testing? I understand this would be counter to you making your living out of this, but think of it as a "light" pro-bono version!


It sounds like you're asking for contract work; you're welcome to email me at aphyr@jepsen.io.


The report states the work was funded by Dgraph and conducted in line with the Jepsen ethics policy.


You may find the Ethics [1] section useful.

Also, are you an acquaintance of Kyle Kingsbury? Just curious as you refer to him by his first name.

[1] https://jepsen.io/ethics


It's an internet convention to refer to people by their first names.


I find Dgraph to be one of the most interesting of the current batch of non-relational data stores.

I've wanted a robust, easy-to-use graph database for years (ever since reading about the crazy brilliant graph database stuff that goes on inside Facebook https://www.facebook.com/notes/facebook-engineering/tao-the-... ) and Neo4J never really cut it for me.

Watching Dgraph mature - and survive two rounds of Jepsen with relatively decent marks - is intriguing. I don't have a project that needs it yet but maybe something will come up soon.


Neo4j is... conditionally alright. The main problem is that their marketing has a bad case of MongoDB syndrome.

N4J: "We sell cars. Our cars can be anything you want, and are at least pretty good at everything, and often great and even better than any other possible car you can get!"

Client: "Cool, I need a spacious four-door sedan with good gas mileage that's also a race car"

N4J: "Oh yeah, definitely, no problem, this is the product for you, totally fits your use case!"

Client orders car, it arrives, is in fact a delivery van that likes to stall at traffic lights

Client: "Uh. This would be useful if I needed a delivery van for a route with no stops, I suppose."


> Neo4J never really cut it for me.

What did you find lacking in Neo4j?


In my experience, from ~1.5 years ago, performance became an extreme challenge to overcome if you wanted to service user facing requests in <100ms (for a non-trivial sized graph). I really did enjoy Cypher though and the tooling around it was very polished. Tempted to try it again now that they have a new version.


Last time I looked at it it seemed to really want me to use Java.

Turns out the official Python client library is five years old now so I clearly need to update my mental model of what it can do!


I've been using Dgraph for over a year (on/off, it's a side project). I first saw Dgraph on HN.

I thought Dgraph was going to be the "secret sauce" for my app after reading the list of features (maybe I was mesmerized by the cute mascot). A few months down the line, though, I sometimes question my decision and whether I should have used good old PgSQL. Let me explain.

1. Coming from SQL and key-value NoSQL, Dgraph was very foreign to me. It's like an OOP guy learning functional programming. Now I'm quite comfortable with it, but it took me months to be productive. To give an analogy, learning Dgraph is like learning Scala instead of Python.

2. Actually it's worse: the resources are mostly in the forum (micheldiz is my hero) and on the official website. Until recently, the navigation for the docs was horrible: everything in one big HTML file, difficult to jump into from a Google search result.

3. Rudimentary tooling for the development phase. When you're working on a new idea for a product, you will experiment A LOT with the schema and data. When you write wrong data, with most RDBMSes you can use a GUI to right-click and delete or edit. In Dgraph, you must write a mutation query (assuming you remember the syntax, as it is a bespoke language; there's a sketch of what that looks like at the end of this comment). The Dgraph GUI is very minimal.

4. About mutations... in Dgraph (as far as I know) there's no referential integrity in the DB sense. You can make an FK to a non-existent object, or insert something invalid without getting an error back (but it's not stored, since it's invalid). The "integrity check" lives in your app.

5. Because of #3 and #4, I find it easier to just drop the whole database and recreate it along with seed data, every. time.

6. The documentation for the Java client library is very minimal. So there you go: unfamiliarity with the query language ("GraphQL+-"), with Dgraph itself, and with the client library.

I still use Dgraph, and it's a good fit for my app, but if you're starting on a new business idea, maybe don't use anything fancy. My mistake, as a developer, was mixing research (new tech!) with bootstrapping.

(In case you're wondering, my app is http://s.id/axtiva-android-test -- still version 0.0.x, but recently I've been releasing weekly.)
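For readers who haven't seen GraphQL+- mutations, the sketch below is roughly the kind of thing point 3 is talking about -- deleting one bad value and one bad edge -- shown through the Go client (dgo v2) with made-up UIDs and predicate names:

    package main

    import (
        "context"
        "log"

        "github.com/dgraph-io/dgo/v2"
        "github.com/dgraph-io/dgo/v2/protos/api"
        "google.golang.org/grpc"
    )

    func main() {
        conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()
        dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

        // Delete one wrong value and one wrong edge from node 0x4e8c.
        // In a GUI-centric RDBMS this would be a right-click; here it is N-Quads.
        mu := &api.Mutation{
            DelNquads: []byte(`
                <0x4e8c> <email> * .
                <0x4e8c> <author_of> <0x4e91> .
            `),
            CommitNow: true,
        }
        if _, err := dg.NewTxn().Mutate(context.Background(), mu); err != nil {
            log.Fatal(err)
        }
    }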


Hmm... That's probably not the testimonial I was hoping to get from a Dgraph user.

Though I can see that part of your pain comes from the custom query language, GraphQL+-. We now offer official support for standard GraphQL as well, which tackles a lot of the issues you ran into.

1 and 2. GraphQL is becoming very common, so plenty of resources.

3. GraphQL has many amazing editors.

4. GraphQL lets you cope with the lack of referential integrity by marking certain fields in the object as non-nullable, which can remove those objects from the results and so on -- which is, frankly, the thesis we have around a distributed graph database sharded by predicates (not nodes).

5. arrr... sad.

6. GraphQL has many client libraries and so on.

Hope we can change your opinion by switching you to standard GraphQL -- particularly if you don't need the advanced features provided by plus-minus.


TBH I think most of those complaints are true of the graph database ecosystem in general. I haven't used Dgraph but I echo the same concerns from my experience.


Sentiments are changing now, particularly with GraphQL. Dgraph bet on GraphQL early on, and it's really catching on as a replacement for REST.


You might consider using cayley [1] instead. If you find it easier to drop and recreate the database from scratch, then you have a source-of-record (to recreate the database from) and so it might make sense to treat your graph database as the place for doing graph analytics rather than as the place for storing the source-of-record data AND doing graph analytics.

Thinking of the graph database the way a graphics programmer might think of the GPU might be a helpful perspective.

[1] https://cayley.io/ [2] https://oss.redislabs.com/redisgraph/

EDIT: I love dgraph, and reevaluate it anew frequently. However, I've shipped cayley and this approach works. Continue down-voting this throwaway account though.


Haven't used Dgraph itself, but I've used the storage engine they built for it - Badger, an alternative to RocksDB better optimized for SSDs - in two projects.

One was for event saving and retrieval, which was able to sustain a stable 60k writes/s with simultaneous 10k reads/s. It worked great overall; the things that needed nontrivial tuning were 1. RAM usage and 2. write stalls: if you overwhelm it with writes, it'll stall to keep up with level 0/level 1 compactions.

Another one is OctoSQL[1], where we're building exactly-once, event-time based stream processing all around Badger. So far it has been a breeze, and I don't think we'd be building it if not for Badger.

Overall, at least the storage engine they're using is awesome, and I can definitely recommend it!

[1]:https://github.com/cube2222/octosql
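For anyone curious what using Badger looks like, a minimal read/write sketch against the v2 API (directory and keys are placeholders; the RAM and compaction tuning mentioned above lives in the Options):

    package main

    import (
        "fmt"
        "log"

        badger "github.com/dgraph-io/badger/v2"
    )

    func main() {
        // Badger is embedded: it opens a directory, not a server connection.
        db, err := badger.Open(badger.DefaultOptions("/tmp/badger-demo"))
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // Writes go through transactions.
        err = db.Update(func(txn *badger.Txn) error {
            return txn.Set([]byte("event:42"), []byte(`{"kind":"login"}`))
        })
        if err != nil {
            log.Fatal(err)
        }

        // Reads use read-only transactions.
        err = db.View(func(txn *badger.Txn) error {
            item, err := txn.Get([]byte("event:42"))
            if err != nil {
                return err
            }
            val, err := item.ValueCopy(nil)
            if err != nil {
                return err
            }
            fmt.Printf("event:42 = %s\n", val)
            return nil
        })
        if err != nil {
            log.Fatal(err)
        }
    }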


Love it! You should send a PR to add OctoSQL to the list of projects using Badger (GitHub README).


I certainly will, when we actually release the version based on Badger!


Can anyone share their production experience with Dgraph?


It wasn't at a huge scale, but for one project another intern and I took a proof of concept that another engineer had done with Gremlin [1], turned it into a full tool, and ended up using Dgraph. The Python bindings were easy to work with, and Ratel (the UI/web frontend) made quick searches and tests easy.

I liked working with it so now I'm the package maintainer for it on AUR [2]. At some point I'd like to make a repo showing how to implement common graph algorithms with the python bindings, since GraphQL+- currently only supports k-shortest path at the query level [3].

[1] https://tinkerpop.apache.org/gremlin.html [2] https://aur.archlinux.org/packages/dgraph-bin/ , https://aur.archlinux.org/packages/dgraph-git/ [3] https://discuss.dgraph.io/t/how-about-doing-some-graph-compu...


Thanks for maintaining Dgraph on AUR. I'm a fan of Arch Linux.

I think the latest release is 20.03.1, perhaps time to update?


Sure thing!

Will do, I've been a little lax the past few weeks but my school semester just finished so now I should have more time again.


I'm building a product that does graph analytics on top of DGraph.

Some constraints we have:

* We ingest what some may consider a lot of data - on the order of terabytes a day. This can be tens of thousands of writes per second.

* We need transactional logic.

* We want to analyze that data as it comes in, so think 10x reads for every write.

GraphDBs out there didn't seem like they would cut it. I eliminated almost every database due to:

* Bad licensing

* Incapable of scaling writes horizontally, or generally handling tons of writes

* No ACID transactions

Most graphdbs out there had at least two of these issues.

DGraph so far has worked really well. We aren't sending it the full load of data yet, so there's still a question around that write load, but at least it's designed for that, and initial numbers have been promising.

The fact that it's liberally licensed, has a really good pricing model, has strong community support, good docs, etc, has made me glad I chose it.

The roughest part is probably the query language, because it's bespoke and therefore ends up having weird, unexpected behavior sometimes. Now that it supports GraphQL, that should be less of an issue.


I have been developing a POC using dgraph for the last several months. I can't really comment on its robustness at production load since it was only used for local development.

But I can comment that getting the "right syntax" was at times extremely frustrating. It has a lot of "there is just one way to do it, and you have to spend a month reading our code to find it" kind of thing going on. It is definitely "beta" software in that regard, and the ease of use of its query language (languages) is abysmal. The documentation is also extremely confusing and incomplete, to say the least. Needs a lot more examples and a lot more "ways to skin a cat" than currently documented.

On a good note, the support provided via discuss.dgraph.io is really good, even though there are so many people struggling to make it do simple things - that support forum will likely be a place where the answer can be found, or someone can answer (rather quickly) with some help.


We now have a dedicated technical writer on the team, whose sole job is to improve our technical documentation. If you have suggestions, it would be great if you could jot them down at [1] so we can act on them.

[1]: https://discuss.dgraph.io


Somebody in my organization got very interested in Dgraph in its 0.7 or 0.8 days. The version was marked as "production ready", but it was an absolute trainwreck.

We were modelling individuals and contacts between them, and the cluster would constantly break with dataset sizes that should have been easily managed. There was clearly something wrong in the storage engine, because we saw insane disk space usage. Dgraph consumed 10s of TB for something that should have taken < 100 GB.

We were one of the largest installations at the time, and were working with the core development team, but they were never able to resolve the issues.

We eventually had to tell management that there was no way we'd be able to operate the thing given its disk space consumption rate, so we had to delay project delivery to rip out Dgraph and replace it with Postgres.

Surely it's better today, but I'll never use it again by choice.


I don't agree that such a large database size gap is inherent to Dgraph.

Dgraph is built for performance, and with one of our apps we faced similar challenges. After reading some documentation and watching some of their videos, I learned that they trade space for performance when you have lots of indexes. We reduced our indexes from 35 to 8 and the DB size dropped drastically.

I believe you should investigate that as well, to check whether something was off in your database architecture design.

As for your gap of <100 GB against 10,000 GB: I assume you probably created lots of indexes. With a good database design and fewer indexes, you will have a small, high-performance app.


v0.7? That was Dec 2016. A lot has changed in 3.5 years.


I love working with Dgraph but I hate DBA-related work. As a result I only use it for local projects. I once experienced the data becoming inaccessible and being unable to restart the Docker containers (which sadly happened before I began to regularly export data, but luckily with data that wasn't too important), but I've otherwise only had positive experiences (micheldiz is great).

I’d probably use it for pet projects if they made it easier to automate backups to the cloud, and I’d use it for all projects if they offered a hosted solution.


Something is in the works around a managed service!


My experience is that it's a memory hog. There are a ton of tiny caveats that are easy to overlook until you spend a good amount of time debugging an issue. The team and community are active and helpful. The overall experience is positive and the technology nifty.


We are doing PoCs around it -- however, the text search is not ready for prime time. https://github.com/dgraph-io/dgraph/issues/5102


FWIW, we do a lot of 'GPU visual graph analytics and investigation for X' work at Graphistry, where X is either hooking into graph DBs (neo4j, ...) or acting as a virtual / on-the-fly layer over other data systems (Splunk, jupyter notebooks, ...). Almost all of our users' graph projects have ended up involving text search, and as part of that, search indexes. Think security, fraud, genetics, etc. I can only think of a few exceptions that did not need text, such as blockchain viz. I just sort of assume text fields as part of linking data nowadays. In fact, a lot of our recent work is going to the next level, where we use ML algs to compute over text to infer even fuzzier connections, vs simple ID/string/regex matching from the older days of graph tech.

So at least for domains where people want to make correlations over data such as logs, events, transactions, CSVs, etc., I encourage the Dgraph folks to watch discussions of text closely.

Fun recent example that illustrates this: For ProjectDomino.org (COVID anti-misinfo), we started by ingesting the covid twitter firehose into a graphdb for easy and fast pivoting by tweet/account/etc. However, our analysts need to search by text, and a lot of our current work is now doing ML/graph algorithms to mine the text to infer fuzzy edges: GPU BERT, GPU UMAP, ... . Neo4j supports setting up various text indexes which helps search, but for analytics, we end up having to extract the data out of the DB, infer relationships & scores, and put them back in.


(author of Dgraph) We want to improve full-text search, to bring it in line with Elasticsearch. A lot of people compare Dgraph against Elastic, because they'd rather just have one solution (Dgraph) instead of two.

It's in our backlog to improve FTS drastically from where it stands today.


do you have any way to reuse the infrastructure for indexing edge properties in FTS?


I'd similarly been evaluating a Python client implementation a while back and found the developer experience a little rough around the edges[1].

It's reassuring to see Dgraph undergoing the full Jepsen treatment, even if it highlights that there's still a bit of work to do, and further stability to prove.

[1] - https://github.com/dgraph-io/pydgraph/issues/94


one thing i'm not a huge fan of is dgraph's UID model, which is effectively an auto-incrementing uint across the entire cluster. because it auto-increments server side, it's non-deterministic; ingesting 10 nodes before 10 others means that the UIDs will change across runs despite the XIDs being the same. there is a way to use "blank nodes" to link nodes and edges with non-int UIDs, but that is only per-mutation, not per-commit or per-transaction. there is no way to tell dgraph what the UID should be.

that means that if you have externally unique IDs that you have infrastructure around, you are either caching that node's UID externally or doing an XID->UID lookup in order to create edges.

there is a bulk loader but that's only available in HA mode, and the UID:XID map it generates is obviously for data you already had in flat files (or whatever). so it's ok for static data sets, but not ideal for live updating data.

the gRPC API also has strange undocumented (AFAICT) behavior where even smallish batches of 100 hit some unspecified gRPC limit, so you need smaller batches ergo more commits ergo more wasted compute.


> there is no way to tell dgraph what the UID should be.

There is. You can lease UIDs from Zero, and do your own assignment. Look at /assign endpoint [1]

> doing an XID->UID lookup in order to create edges.

Also, you can use upserts to do an XID lookup before creating a new node, which is practically what other DBs do too.

> there is a bulk loader but that's only available in HA mode

Don't know what that means. Bulk Loader is a single process (not distributed), and can be used to bootstrap a Dgraph cluster. The cluster can be a 2-node cluster, or an HA cluster, that doesn't matter.

> where even smallish batches of 100 hit some unspecified gRPC limit

Never heard of that. gRPC does have a 4GB per-message limit, but I doubt you'd hit that with 100 records.

[1]: https://dgraph.io/docs/deploy/#more-about-dgraph-zero
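A rough sketch of that upsert approach through the Go client (dgo v2). The xid predicate here is hypothetical and assumed to have an exact index; check the upsert-block docs for your version before relying on the create-if-missing behavior:

    package main

    import (
        "context"
        "log"

        "github.com/dgraph-io/dgo/v2"
        "github.com/dgraph-io/dgo/v2/protos/api"
        "google.golang.org/grpc"
    )

    func main() {
        conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()
        dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

        // Look up the node by its external ID; the mutation then addresses
        // whatever uid(u) resolved to (or a new blank node if nothing matched).
        req := &api.Request{
            Query: `
                query {
                    q(func: eq(xid, "user-123")) {
                        u as uid
                    }
                }`,
            Mutations: []*api.Mutation{{
                SetNquads: []byte(`
                    uid(u) <xid>  "user-123" .
                    uid(u) <name> "Alice"    .
                `),
            }},
            CommitNow: true,
        }
        resp, err := dg.NewTxn().Do(context.Background(), req)
        if err != nil {
            log.Fatal(err)
        }
        log.Printf("uids touched: %v", resp.Uids)
    }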


thanks for the reply!

i hope this comes across as non-critical feedback, but it'd be really, really nice to put that assign endpoint in some form or fashion in the Mutation documentation. it is completely absent from there, and i don't recall seeing it in the tour of dgraph either.

furthermore, it's absent from the golang client. the documentation states:

> It's possible to interface with Dgraph directly via gRPC or HTTP. However, if a client library exists for your language, this will be an easier option.

however it looks like i'll need an additional HTTP layer to interface with the /assign endpoint. not a huge deal, but that seems like a big functionality gap in the golang client - would definitely like to see that added in there.

lastly, the /assign endpoint and the bulk loader can only be run with a Dgraph Zero instance, which, as far as i can tell, doesn't run by default with the provided docker image. that's an important detail that's not super duper obvious from the docs, until you start seeing parameters like dgraph-zero and then realize that it doesn't come with the quick start docker image.

again, hope this isn't taken personally. thanks for your work on the project!


No worries at all, I like to hear feedback from users, whether it's positive or negative. Though I also like to separate the wheat from the chaff, which is why I have suggestions, follow-up questions, etc.

The assign endpoint is something you can call just once. You could say, give me a million UIDs, and then use them however you want. You don't need to call it repeatedly.

Also, it's an endpoint on Zero, not on Alpha. Zeros are not supposed to be talked to directly in a running cluster. We're now doing work around exposing some of Zero's endpoints via Alphas, in our GraphQL rewrite of the /admin endpoint. So that might make it easier.

I think the consistent theme I'm hearing here is that our documentation isn't clear -- we aim to improve that. But we could use more critical, logical feedback and suggestions on our forum -- so please feel free to pitch in there.
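For completeness, the one-time lease call mentioned above goes to Zero's HTTP port (6080 by default; adjust host, port, and count for your deployment). A minimal sketch that just prints the raw response rather than assuming its exact shape:

    package main

    import (
        "fmt"
        "io/ioutil"
        "log"
        "net/http"
    )

    func main() {
        // Ask Zero to lease a block of UIDs for this caller to assign itself.
        resp, err := http.Get("http://localhost:6080/assign?what=uids&num=1000000")
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()

        // The response describes the leased UID range.
        body, err := ioutil.ReadAll(resp.Body)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(string(body))
    }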


There's a part I don't get here: "To store large datasets Dgraph shards the set of triples by attribute, breaks attributes into one or more tablets, and assigns each tablet to a group of nodes." But earlier, it says, "For convenience, Dgraph can also represent all triples associated with a given entity as a JSON object mapping attributes to values—where values are other entities, that entity’s attributes and values are embedded as an object, recursively."

I know almost nothing about graph databases, so presumably this is just my ignorance. But if entity-focused retrieval is an important use case, isn't clustering by attribute going to kill performance? Naively, I'd think that one would cluster by entity and what an entity is connected to.


The first sentence is describing how Dgraph shards data for storage; the second sentence discusses how data can be represented in the query API. You're right that this could lead to broad fanout, if users typically retrieved all attributes for a given UID. It also impacts the performance of joins: graph traversal across a single attribute is much faster when all those edges are on the same node, but graph traversal across different attributes might pay a higher latency cost. This is a classic dilemma in distributed graph storage -- does one shard by attribute? By entity? Each leads to distinct performance tradeoffs, and Dgraph happened to choose attribute sharding.

Also keep in mind that typical Dgraph workloads request specific attributes (think "SELECT NAME, AGE") rather than everything ("SELECT *"), which reduces the impact of fanout. :)


> graph traversal across different attributes might pay a higher latency cost

Those can be done concurrently if at the same query level, so not necessarily any slower. In other terms, the number of network calls required (in a sufficiently distributed cluster, where each predicate/attribute is on a different server) is proportional to the number of attributes asked for in the query, not the number of results (at any step in graph traversal).

And that's the big part of the design. By constraining the number of network calls to very few machines while doing traversals -- even traversals that produce millions of results in the intermediate steps -- Dgraph can deal with high fan-out queries (with lots of node results) much better.

The alternative would be to shard by nodes (entities) -- in which case, if the intermediate steps have millions of results, you could end up broadcasting to the entire cluster to execute a single query. That'd kill latency.

So, the problem is not how many attributes a query is asking for -- that's generally bounded. The problem is how many nodes you end up with as you traverse the graph; those could be in the millions.

That's why many graph-layer systems suck at doing anything deeper than 1- or 2-level traversals / joins.
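A hypothetical traversal (invented predicate names) to make the "calls proportional to attributes, not results" point concrete:

    package main

    import "fmt"

    // With predicate sharding, answering this query touches at most the groups
    // serving the name, follows, and age predicates -- roughly one network hop
    // per attribute at each level -- whether `follows` fans out to ten results
    // or ten million. Sharding by entity would instead scatter those
    // intermediate nodes across the whole cluster.
    const query = `
    {
      q(func: eq(name, "Alice")) {
        name
        follows {
          name
          age
        }
      }
    }`

    func main() {
        fmt.Println(query) // run with any client, e.g. dgo's txn.Query
    }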


>> graph traversal across different attributes might pay a higher latency cost

> Those can be done concurrently if at the same query level, so not necessarily any slower.

An important clarification, yes! I should have made that more explicit. :)


Thanks! That's exactly what I was looking to know. And it couldn't come from a more trusted source.


I think of it the other way around. Dgraph was designed to be distributed. It cannot be "atomically" distributed based on entities, but rather on smaller parts of an entity. If each instance of Dgraph takes care of a small part of that entity, it effectively becomes more performant.

Imagine, in an abstract way, that it is a hashing process, where the attributes are spread over several instances in the cluster and Dgraph just "decodes" them for you quickly because it already has the key.

Could it be based on entities? Yes, but I don't think it would bring any benefit. Imagine that you have 10 instances and 10 entities: 3 entities with many edges (values and outgoing edges) and 7 "light" entities without much information. Well, you would leave 3 instances under heavy resource usage while the other 7 sit idle, because their entities don't have much information to deliver for all the query and mutation requests in the cluster.

A system distributed based on attributes is much more performant, though.


I think the wording in the Jepsen report might be confusing here. The point is that Dgraph can ingest data in either a triple format or a JSON format. Internally, Dgraph stores it in its own binary format. See [1] for more details.

> Naively, I'd think that one would cluster by entity and what an entity is connected to

In a distributed system, that approach leads to high fan-out and network broadcasts, which kill query latency. Ideally, you want to do a traversal / join in one network call (max), not more. Because of this design, Dgraph can execute arbitrary-depth queries much faster. More details are in [1].

[1]: https://dgraph.io/paper


Does Jepsen ever open source any of their tooling/test harnesses? I teach a distributed systems class and it would be great to have an automated test framework/tools for distributed systems issues.


Yes: the library is prominently linked on the home page, and there are deep links to the Dgraph test suite code throughout the report. Pretty much all of my work is OSS, and public release of test harness for each report is explicitly part of the Jepsen ethics policy. :)

https://github.com/jepsen-io https://jepsen.io/ https://jepsen.io/ethics


Awesome! Quick question - how much of the test harness you use for each report is generic/reusable, and how much is system-specific? I have my students implement various algorithms/systems in Elixir, e.g. Raft/Paxos, various broadcast algorithms, etc. It would be nice to have something both they and I could use to simulate network partitions and so on.


Kinda depends on what you're doing, how much of Jepsen you're using, and how complex the system-specific code is. You can write a minimal Jepsen test in ~100 lines of code, if that's helpful. Jepsen and its main supporting libraries (Elle and Knossos) clock in at about 19K lines of code; a little over six years of full-time work.

For simulation testing, I'd suggest looking at Maelstrom, which uses Jepsen to provide a sort of workbench for writing toy Raft implementations in any language. You give it a binary which takes messages as JSON on STDIN and emits messages to STDOUT; it spawns a bunch of "nodes" (local processes) of that binary, connects them via a simulated network, generates pathological network behavior, simulates client requests, and verifies the resulting histories with Jepsen.

https://github.com/jepsen-io/maelstrom
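To give a sense of how small the node side of that contract is, a rough Go sketch of an echo-only node; the message field names follow my reading of the Maelstrom protocol, so treat the repo's docs as authoritative:

    package main

    import (
        "bufio"
        "encoding/json"
        "fmt"
        "os"
    )

    // Msg is the envelope Maelstrom speaks over stdin/stdout: a source node,
    // a destination node, and a free-form body keyed by "type".
    type Msg struct {
        Src  string                 `json:"src"`
        Dest string                 `json:"dest"`
        Body map[string]interface{} `json:"body"`
    }

    func main() {
        in := bufio.NewScanner(os.Stdin)
        out := json.NewEncoder(os.Stdout)
        msgID := 0

        for in.Scan() {
            var m Msg
            if err := json.Unmarshal(in.Bytes(), &m); err != nil {
                fmt.Fprintln(os.Stderr, "bad message:", err)
                continue
            }

            msgID++
            reply := Msg{Src: m.Dest, Dest: m.Src, Body: map[string]interface{}{
                "msg_id":      msgID,
                "in_reply_to": m.Body["msg_id"],
            }}

            switch m.Body["type"] {
            case "init":
                // Maelstrom sends init first, telling the node its ID and peers.
                reply.Body["type"] = "init_ok"
            case "echo":
                // The echo workload just wants its payload back.
                reply.Body["type"] = "echo_ok"
                reply.Body["echo"] = m.Body["echo"]
            default:
                continue // ignore workloads this sketch doesn't implement
            }

            if err := out.Encode(reply); err != nil {
                fmt.Fprintln(os.Stderr, "write failed:", err)
            }
        }
    }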


This is missing the hand-drawn stuff and memes. This isn't the Jepsen I remember :P




