JSON winning over XML is like saying CSV won over MySQL. They aren't equivalent.
Much like CSV, JSON isn't particularly standardised and different parsers and writers will do different things in some situations. Usually it doesn't matter, but when it does you're probably in for a lot of pain.
If you handle structured data and the structures might change over time, JSON isn't a good fit. Maybe you'll opt for JSON Schema, maybe that'll work for your use case, but with XML you can be quite sure it'll be reliable and well understood by generations of developers.
The tooling is generally very good: commonly you can just point your programming language at the XSD and suddenly you have statically typed classes to program against. Perhaps you'd like to store the data in an RDBMS? You can probably generate the DB schema from the XSD. If you want you can just throw JSON into MongoDB instead, but there will be very important tradeoffs. Same goes for UI: you can write some XSLT based on the XML schema and suddenly you get web views directly from API responses. Or you could use those classes you generated and have your GUI code consume such objects.
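For illustration, the XSLT-to-web-view step looks roughly like this; the orders/order document shape here is made up for the example, not any particular API:

    <!-- Hypothetical stylesheet: render an <orders> API response as an HTML table -->
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/orders">
        <table>
          <xsl:for-each select="order">
            <tr>
              <td><xsl:value-of select="id"/></td>
              <td><xsl:value-of select="total"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </xsl:template>
    </xsl:stylesheet>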
None of this is as easy with JSON as it is with XML, similar to how many things aren't as easy with CSV as with an RDBMS.
What's missing in ECMA-404? Never had a problem with JSON parsers or writers, using it all day every day for decades. It's crappy in some ways, sure, like lack of full floating point support, but standardization is not an issue.
XML is mostly already lost on the current generation of developers, though, let alone future ones. Protobuf and its cousins generally do typed interchange more efficiently and with less complexity.
It's focused almost entirely on syntax and ignores semantics. For example, for numbers, all it says is that they are base-10 decimal floating point, but says nothing about permissible ranges or precision. It does not tell you that, for example, passing 64-bit numbers in that manner is generally a bad idea because most parsers will treat them as IEEE doubles, so large values will lose precision. Ditto for any situation where you need the decimal fraction part to be precise (e.g. money).
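A quick illustration of that precision loss, in Node (assuming a stock JSON.parse, which stores numbers as IEEE doubles):

    // 2^53 + 1 is not representable as an IEEE double, so it silently rounds.
    const parsed = JSON.parse('{"id": 9007199254740993}');
    console.log(parsed.id);                             // 9007199254740992 -- off by one

    // Decimal fractions aren't exact either, which is why money in doubles hurts:
    console.log(JSON.parse("0.1") + JSON.parse("0.2")); // 0.30000000000000004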
RFC 8259 is marginally better in that it at least acknowledges these problems:
> This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision. A JSON number such as 1E400 or 3.141592653589793238462643383279 may indicate potential interoperability problems, since it suggests that the software that created it expects receiving software to have greater capabilities for numeric magnitude and precision than is widely available.

> Note that when such software is used, numbers that are integers and are in the range [-(2**53)+1, (2**53)-1] are interoperable in the sense that implementations will agree exactly on their numeric values.
But note how this is still not actually guaranteeing anything. What it says is that implementations can set arbitrary limits on range and precision, and then points out that de facto this often means 64-bit floating point, so you should, at the very least, not assume anything better. But even if you only assume that, the spec doesn't promise interoperability.
In practice the only reliable way to handle any numbers in JSON is to use strings for them, because that way the parser will deliver them unchanged to the API client, which can then make informed (hopefully...) choices on how to parse them based on schema and other docs.
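A minimal sketch of that strings-for-numbers convention, assuming a JS client and a made-up payload:

    // The parser delivers the strings untouched; the client decides how to parse.
    const obj = JSON.parse('{"accountId": "9223372036854775807", "balance": "19.99"}');

    const accountId = BigInt(obj.accountId); // exact, even beyond 2^53
    console.log(accountId + 1n);             // 9223372036854775808n

    // For balance you'd hand the string to a decimal library rather than
    // Number(), which would be just as lossy as letting the parser do it.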
OTOH in XML without a schema everything is a string already, and in XML with a schema (which can be inline via xsi:type) you can describe valid numbers with considerable precision, e.g.: https://www.w3.org/TR/xmlschema-2/#decimal
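For example, a schema can pin numbers down far more tightly than "whatever fits in a double"; this fragment (the Money type name is made up for the example) restricts xs:decimal to 18 total digits with 2 after the point:

    <xs:simpleType name="Money">
      <xs:restriction base="xs:decimal">
        <xs:totalDigits value="18"/>
        <xs:fractionDigits value="2"/>
      </xs:restriction>
    </xs:simpleType>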
GraphQL does define the size of its numeric types: Ints are 32 bits, Floats are 64, so if you have a bigint type in your db, you'd best be passing it around as a string. Any decent GQL implementation does at least check for 32-bit Int overflow. Several people have independently come up with Int53 types for GQL to use the full integer-safe range in JS, but the dance to make custom scalars usable on any given stack can be tricky.
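One of those Int53 scalars might look roughly like this with graphql-js; the name, description, and error messages are illustrative, not anything standardized:

    import { GraphQLScalarType, Kind } from "graphql";

    // Accept only integers JS can represent exactly: [-(2^53 - 1), 2^53 - 1].
    function checkInt53(value: unknown): number {
      if (typeof value !== "number" || !Number.isSafeInteger(value)) {
        throw new TypeError(`Int53 cannot represent value: ${value}`);
      }
      return value;
    }

    export const Int53 = new GraphQLScalarType({
      name: "Int53",
      description: "Integer within JavaScript's exactly-representable range",
      serialize: checkInt53,   // outgoing result values
      parseValue: checkInt53,  // values supplied via variables
      parseLiteral(ast) {      // values inlined in the query document
        if (ast.kind !== Kind.INT) {
          throw new TypeError("Int53 must be an integer literal");
        }
        return checkInt53(parseInt(ast.value, 10));
      },
    });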
There are a lot of proponents arguing that some or all of the "JSON5" [1] improvements should be standardized by ECMA as well, especially because support for such things is a mish-mash: present in some parsers but not others. (Writers are a different matter.) Comments and trailing commas in particular are huge wish-list items, and the biggest reasons for all of the other "many" variant parsers (JSONC, etc).
This is more of a concern for JSON configs and other such uses that are directly exposed to humans, but not really for machine-generated and machine-consumed data.
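For the record, this is the kind of thing those variant parsers accept; a minimal example using the json5 npm package:

    import JSON5 from "json5";

    const config = JSON5.parse(`{
      // comments survive review much better than out-of-band docs
      retries: 3,              // unquoted keys are also allowed
      hosts: [
        "a.example.com",
        "b.example.com",       // ...and trailing commas
      ],
    }`);

    console.log(config.retries); // 3
    // Plain JSON.parse rejects every one of those extensions.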
It also differs from the RFCs: notably, a bare string like "text" is valid JSON according to ECMA-404, but was not valid under the older RFC 4627, which required a top-level object or array (RFC 8259 now allows any value). I've come across JSON parsers stumbling on the bottom of the ASCII range (control characters), for example. And JSON -> internal representation -> JSON commonly leads to loss of information.
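That round-trip loss is easy to demonstrate in Node:

    // Parse-then-serialize is not the identity function:
    const input = '{"big": 9007199254740993, "price": 1.10, "dup": 1, "dup": 2}';
    console.log(JSON.stringify(JSON.parse(input)));
    // {"big":9007199254740992,"price":1.1,"dup":2}
    // The integer rounded, the trailing zero vanished, duplicate keys collapsed.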
Sure, protobuf is nice, but more limited in scope and closer to a JSON alternative than an XML alternative.
I use JSON every other day and have been doing so for decades.
Frankly, if a developer can't figure out XML then they aren't worth their salary. Age is no excuse here; as a developer your job involves figuring out how to work with technology you haven't used before.
Those two are kinda orthogonal, and while there was some overlap for adoption, it was fairly common to serve XML over REST early on (because more languages and frameworks had proven-quality XML parsers out of the box, so it was easier for the clients to handle).
JSON won in the end mostly because it was easier to handle in JS specifically, which is what mattered for the frontend. Then other languages caught up with their own implementations, although in some cases it took a while - e.g. for .NET you had to use third-party libraries until 2019.
> JSON won in the end mostly because it was easier to handle in JS specifically, which is what mattered for the frontend
Browsers had XML parsers before they could handle JSON directly, and at the beginning there were complaints that JSON was harder to use for that reason. The reason JSON won rapidly even for backend apps which never loaded it in a browser was ergonomics: every part of the XML world, from the parsers to XPath/XSLT/XQuery to the rat's nest of standards, was plagued by the hair-shirt "this is hard and should feel hard" attitude that has thankfully become less common. I saw so many people just burn out on the entire ecosystem because they got tired of unhelpful errors, pointless usability bugs around namespaces, low-quality or missing examples, and especially how common tools just stopped getting improved.
I maintain that the format would have been far more popular if all of the effort spent on standards work after the turn of the century had been suspended and the time directed to fixing things like the usability of namespaces in almost every parser, and to hiring at least one person to work on libxml2 so developers could actually use features which shipped after 1999. Unfortunately it seemed like there were a ton of architects who really wanted to spend time building castles in the air, and they just seemed to assume that someone else would do the boring parts of implementing it; those people all jumped on JSON pretty quickly. I worked with a bunch of people who weren't developers, and the cycle of initial enthusiasm fading into "doesn't this kind of suck?" with XML was depressing to watch, having seen so much initial promise.
To claim that pretty much every .NET project adding Newtonsoft.Json as a first step was somehow a problem is just strange; no adequate team would see it that way.
It was made so that the ecosystem could continue to evolve, particularly in terms of performance and security hardening. But okay, why do you think System.Text.Json was introduced? What were the egregious problems with Newtonsoft.Json?
Sure, you can disable it, but the fact that it is opt-out to begin with - i.e. that by default the parser will try to creatively interpret any string it sees in JSON input and convert it in a locale-specific manner that also quietly loses data - is, frankly, insane through and through. I've personally run into this issue many times in existing code. It usually happens when people first start using the library and just never hit any inputs that would trigger this behavior while testing. Then, once that code has shipped, someone somewhere just happens to have the data that triggers it.
And if you look at the comments on that issue, there are numerous backlinks from other GitHub repos where it caused bugs for them, including some Microsoft projects.
The cherry on that cake was the author's response, indicating that he doesn't even understand why this design is problematic in the first place: "I like what it does, I have no plans to change it, and I would do it again if given the chance." I wouldn't trust any parser written with this kind of attitude.