The implicit typing is just the worst. It works 95% of the time and looks "clean" in all examples, but it makes for so many edge cases.
I have one where I need to store zip codes and telephone numbers. Sometimes these begin with a 0. Apparently in YAML 1.1 such a value is treated as an octal number and silently converted, meaning I don't get an error but a slightly different zip code. Just great.
I defaulted to quoting everything, but once I get a chance to rewrite that script I'm gonna ditch YAML.
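A rough sketch of the silent conversion described above, in plain Python rather than a real YAML library. It assumes the YAML 1.1 rule that an unquoted scalar with a leading 0 followed only by the digits 0-7 is an octal integer; `resolve_yaml11_int` and the simplified patterns are made up for illustration and are not any parser's API.

```python
import re

# Assumption: simplified versions of the YAML 1.1 integer patterns.
OCTAL = re.compile(r"^[-+]?0[0-7_]+$")
DECIMAL = re.compile(r"^[-+]?(0|[1-9][0-9_]*)$")

def resolve_yaml11_int(scalar):
    """Return what a YAML 1.1-style resolver would make of an unquoted scalar."""
    if OCTAL.match(scalar):
        return int(scalar.replace("_", ""), 8)   # silently reinterpreted as base 8
    if DECIMAL.match(scalar):
        return int(scalar.replace("_", ""))
    return scalar                                 # anything else stays a string

print(resolve_yaml11_int("02134"))  # 1116, no longer the zip code
print(resolve_yaml11_int("08540"))  # stays the string 08540: 8 is not an octal digit
print(resolve_yaml11_int("90210"))  # 90210
```

Note the inconsistency: whether a zip code survives depends on which digits it happens to contain, which is why quoting everything is the usual workaround.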
Also, just as a general word of caution, zip codes / post codes should never be treated as numbers. They are codes, and should be treated as opaque sequences of characters.
Yeah, I know. But apart from a proper database I haven't found a config file format where I can easily define datatypes (as in "this key is always a string, that one always a positive int", etc.).
Have you considered a corresponding JSON schema? I previously used that for a unified test format, which was basically functional tests for client libraries in various languages expressed in YAML and validated against a schema. We also converted the YAML to JSON for easier parsing, but there was no issue validating the YAML directly.
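The "this key is always a string, that one always a positive int" idea can be sketched without any particular schema library. This is a hand-rolled stand-in, not JSON Schema itself; the `SCHEMA` table, the key names and the `validate` helper are all invented for illustration.

```python
# Hypothetical per-key rules: each key maps to a check that returns True when valid.
SCHEMA = {
    "zip_code": lambda v: isinstance(v, str) and v.isdigit(),
    "phone": lambda v: isinstance(v, str),
    "retries": lambda v: isinstance(v, int) and not isinstance(v, bool) and v > 0,
}

def validate(record):
    """Return a list of human-readable problems; an empty list means the record is valid."""
    problems = []
    for key, check in SCHEMA.items():
        if key not in record:
            problems.append(f"missing key: {key}")
        elif not check(record[key]):
            problems.append(f"bad value for {key}: {record[key]!r}")
    return problems

# A zip code that was parsed as a number is reported instead of silently accepted.
print(validate({"zip_code": 1116, "phone": "0123 456", "retries": 3}))
print(validate({"zip_code": "02134", "phone": "0123 456", "retries": 3}))
```

The same check works on whatever the YAML or JSON loader returned, since it only looks at the resulting Python values.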
I know about JSON schema(ta) but haven't had the chance to play around with it. I wanted to avoid JSON because the files were meant to be human writable (and humans make mistakes, hence the need for strong types and validators).
But the project sprawled out since my first (very naive) implementation, so I think the real solution would actually be a proper database backend. But thanks for the suggestion.
As a solid YAML hater: This gets posted every few years, and it's great every time.
But also: This person got it right many years ago. This isn't the Norway problem, it's a lack of foresight and thinking on YAML's part. This is why standards are hard: in an attempt to have syntax sugar (yes/no for true/false) we end up overriding countries.
Tbh JSON is perfectly fine if you're using it for what it was intended for: serializing data over the wire.
JSON only sucks when people try to use it as a configuration format. It was never meant for configuration. It didn't need comments because it was only ever supposed to encode data that would last as long as a single TCP session. Then along came Sensu and LSP, taking "JS object notation" way too literally, and now we're all fucked with config files that don't parse if you put a comment in them and a syntax only slightly less painful to write than XML.
It's not really JSON's fault that people have abused it for things it wasn't meant to do. But yes, the limitations of JSON as a config format probably are a proximal cause for YAML existing in the first place.
Turns out, people like tree-shaped data expressed parsimoniously, and YAML is great at that. Arguably, it's even better than TOML for expressing trees, though I'd be among the first to say that TOML is better in many respects.
If you're using JSON or XML for config, you're indenting your data to visually show the structure, anyway. Why let whitespace live in your config without paying rent?
Hey, you can still stop me from using JSON for the config file of my current project. Which file format would you suggest that is human-readable, that I can effortlessly read/save as a Python dict, and where I can make comments?
Honestly, TOML, so long as you don't have a lot of dicts-of-lists. TOML becomes cumbersome with nested structures, where YAML remains at exactly the same pain level regardless of nesting. But, if you're nesting your config to that degree, you're probably doing config wrong.
If you do have a lot of nesting and you absolutely can't get rid of it, YAML is easier to write, easier to read, and easier to maintain.
100% disagree, YAML is a much worse experience to write than JSON. I always have to Google how to do even simple things like a list of dictionaries and how the dashes are supposed to be indented. The result is around 10 different options, ensuring I won't remember the "right way" for next time.
There are two ways to write a list of dictionaries:
way_1:
- key: val
way_2: [{key: val}]
If you want ten ways to write something, you want "scalars" (strings), which come in several flavors: literals, blocks without folding or chomps, blocks with chomps but no folding, blocks without chomps but with folding, and blocks with folding and chomps, and all the various directives to control chomping.
If your google-fu doesn't get you to the right answer, that's a skill issue, not a YAML issue. YAML absolutely has flaws, but your not finding the right answer isn't one of them.
I think people dislike writing JSON just because there's more structure in the form of syntactical characters like quotes and brackets, but that's only a problem if you've not invested in learning structural editing (such as vim motions or similar); if you have, it's great!
Inline tables (json style objects / map) can be written over multiple lines. In TOML 1.0, they were limited to a single line.
This made deeper structures interspersing tables and arrays horrible, as toplevel tables are not the clearest when you start mixing arrays and tables in a non-trivial manner.
Now you can essentially embed json(5) in your toml (the syntax is not identical and it doesn’t have nulls, but it’s pretty close and a bit less verbose since names are unquoted, and it allows trailing commas).
If you do have a lot of nesting and you absolutely can't get rid of it, YAML is easier to write, easier to read, and easier to maintain.
I know this is anathema, but I frankly recommend XML. It's wordy, overengineered, and kind of nasty. But it does avoid a lot of issues that other markup languages run straight into. It doesn't do weird typing stuff (it's not even type-aware), it handles nesting just fine, it's got comments.
To the last point, nickel is also nice to work with in my experience, and embeds its std, doesn't require network connection to load std from a server.... (I couldn't find an option to just embed it in the crate)
If I understood it correctly comments were deliberately not included in the spec to make people not use it as a config language. So I guess there must have been a reason before that?
Also iirc I think both features are supported by JSON5.
For small or highly personal configs (like your own code editor) it's... fine. I find it kinda tedious to edit, personally. For something like a webserver or other complex application, the lack of comments is a pretty big deal. Open the default config for nearly any server application and it will have dozens or hundreds of commented lines explaining the options or showing their default values, which is incredibly helpful but completely not possible with JSON. The lack of comments also means it's not possible to communicate to others (including your future self) why some setting is what it is inside the file itself, which, though not insurmountable, is annoying.
JSON specifically doesn't have comments so that it wouldn't be used as a configuration file (and end up like the XML config atrocity). The result? JSON is used as a configuration file. Why? Because people are idiots.
JSON specifically doesn't have comments so that it wouldn't be used as a configuration file
Wrong. JSON doesn’t have comments to avoid the use of comments as parsing directives. Crockford’s literally on record stating
Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.
Comments were omitted from JSON to try to stop people from using it for things like human editable config. It did not stop them though, it just made things worse. Json5 seeks to remedy that.
Neither json nor yaml is remotely as robust or powerful as xml for things like configuration and general serialization. At least json has the good grace to look simple, because it is simple, and thus has a simple spec. Yaml looks simple but is as complex as XML typically is to parse properly.
You do, just not Turing-completeness with arbitrary side-effects. Look at Dhall, it's a decent mix of power that's safe to wield. It cuts down a lot on repetition.
I don't know whether you are asserting that I want "power" in a configuration format, or that I already write my config files as programs in a Turing-complete language, but I assure you, both are false :)
In fact, I want as little power and expressiveness in the configuration format as possible. I want it to be just expressive enough to describe ways that users can configure my programs, and no more. Usually, the ability to describe a mapping from strings to strings is more than enough.
Mostly, the config files are .ini files, which just describe data, and certainly aren't Turing complete.
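For what it's worth, that "mapping from strings to strings" is what Python's standard `configparser` hands back. A small sketch; the section and key names are made up.

```python
import configparser

INI_TEXT = """
[server]
host = example.org
port = 8080
country = NO
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)
server = dict(parser["server"])

# Every value is a plain string; nothing is guessed to be a bool or a number.
print(server)

# Typing is an explicit decision made by the reading code.
port = parser.getint("server", "port")
print(port + 1)
```

Here `NO` stays `"NO"` because the format never tries to interpret it.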
I very much agree you generally don't want code as configuration. However defining and reusing constants, perhaps with a limited ability to compute simple functions or reference other definitions is quite benign. This does not require Turing-completeness and should disallow unrestricted recursion and side-effects. Definitely take a look at Dhall because they have considered these things in detail.
As to why you might want this, for one thing ad-hoc file inclusion mechanisms are already commonplace. Config generators and arbitrary syntax are also somewhat common once people try to shoehorn complex configuration into stuff like INI files that lack enough structure. And at that point it's hard to make illegal states unrepresentable, statically check your config or even read it properly.
However defining and reusing constants, perhaps with a limited ability to compute simple functions or reference other definitions is quite benign
I think the different concerns should not be mixed together. A config should be a config, and nothing more. The ability to compute simple values, define constants and reference them should live in a preprocessing language that the user chooses to use, rather than being the program author's choice. E.g., they can use a templating language and build the config that they want, if they desire such features as constants etc.
Thanks - I have looked at Dhall previously, but it seems like overkill for my needs. It also is more difficult to explain the syntax of Dhall files to (technical, but not necessarily developer) end-users, whereas they are fairly comfortable with .ini-style files.
Azure Resource Manager templates are probably the worst. Pretending to be json, but you can (and must) script inside the template, referring to other templates and resources etc. And script language is neither JS nor anything familiar from before.
(I never learned those properly, only did a couple of deployments, so I might be unfair, but I have never heard any praise for them from anyone.)
XML also has problems. There's no clear distinction between the use case for attributes and child tags, which causes a lot of common cases to have two obvious implementations.
XML is easier to parse. Even with the horrible DTD feature they adopted from SGML.
From a specification perspective, XML is smaller than YAML. Most of XML's specification complexity lies in the DTD part.
Security wise they have the same problems.
When you look at parsing performance, XML has the advantage. But this shouldn't matter much, as you really do not want to have to deal with huge YAML files.
XML only deals in strings though. With YAML, JSON, TOML and all the other popular formats, you have most of the primitive types you need: strings, bools, numbers. With XML, you need to layer another spec on top to describe how the string value contained in a node is parsed as a number...
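A quick illustration of the "only strings" point using Python's standard `xml.etree.ElementTree`; the element names are invented for the example.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<server><port>8080</port><enabled>true</enabled></server>")

port_text = doc.findtext("port")
enabled_text = doc.findtext("enabled")
print(type(port_text).__name__, type(enabled_text).__name__)  # str str

# The conversion rules live in the application (or a schema layer), not in XML itself.
port = int(port_text)
enabled = enabled_text == "true"
print(port, enabled)
```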
No, YAML has the bool NO as a bool. The string "NO" is a string. I hate YAML, but YAML has clear (if bad) rules about what's a string and what's a bool and what's a number.
I don't really agree. While they do provide values of certain (ill-defined) types, these are meaningless without a schema. Effectively they are all just string data for the consuming application, especially because booleans and numbers are not primitive, as they can also be null.
json did right to not include comments, to try to deter the brainrot of some people, "hey, what if we put logic into comments! Yees Awsome idea!".
but maybe we should be pleased that yaml exists, it is the perfect place for the brainrot people who want to put logic into configuration, it will keep these people contained in yaml files.
${{ each para in parameters.param }}:
${{ if and(eq(para.type, 'zip'), eq(para.b, 'll')) }}:
It was originally intended to be a standard for messages sent between systems that were also human readable. The creator wanted it to be named Javascript Message Language, but JSML was already a thing so they pivoted to Javascript Object Notation.
The original name conveys its intended purpose much better IMO.
It does. It's just one of those things of its era that were well thought out from a capabilities and ramifications standpoint but missed the mark on usability.
XML is a markup language, not a config language, and forcing it to be a config language is wrong in exactly the same ways as forcing JSON to be a config language is wrong.
JSON was not designed for it, but it has become exceedingly useful as a data struct, having actual structures and arrays that environment files don't have. There is a problem there.
But blaming JSON for YAMLs quirks is not it. IMHO.
Nicely structured blog and interesting blogpost, perhaps better suited for r/python. Also - what's the doubt with YAML (not) being superset of JSON?
NB For all my programmatic inputs, I use JSON. If it's created and maintained by people, I would pre-convert to JSON (yq). Golang supports JSON in the standard library, C provides some very lightweight parsers. Something much harder to achieve with YAML.
Semantically, the YAML data model is a superset of the JSON data model. YAML supports all of the JSON data types, plus additional stuff like references.
Syntactically, YAML 1.2 can parse all valid JSON into the correct structures. Before version 1.2, there were a few edge cases in JSON that didn't parse with YAML, mostly involving floats and string escapes. But YAML 1.2 fixes that.
So YAML 1.2 is a superset of JSON, both in syntax and semantics.
Whether or not your YAML parser supports 1.2 is a different story. Even today, 1.1 is the more commonly supported spec.
The Norway problem has no conflict with the superset relationship. Nor does the !! sigil. These syntaxes are not recognized by JSON at all. All valid JSON is valid YAML 1.2, but there is valid YAML 1.2 that a JSON parser fails to parse. That's what a language superset is.
So you didn't read the article (like probably none of those who downvoted me):
The actual reason might be that yaml requires maps to have unique keys [34], while json only recommends it [35]. So perhaps most json (i.e. json where objects have unique keys) is a subset of yaml. Some ambiguity remains.
I was genuinely curious if you had a take on this statement?
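On the duplicate-key point, the behaviour of at least one JSON implementation is easy to check. This sketch uses only Python's standard `json` module; `reject_duplicates` is a hypothetical helper, and other parsers may well behave differently.

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook that raises instead of silently dropping repeated keys."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen

text = '{"country": "NO", "country": "SE"}'

# Python's default decoder accepts the document and keeps the last value.
print(json.loads(text))

# A stricter reading, closer to what the unique-key requirement asks for.
try:
    json.loads(text, object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(err)
```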
YAML is not a strict superset of JSON. Here's a valid JSON string that is not valid YAML:
"\uD834\uDD1E"
This is an escaped UTF-16 surrogate pair. JSON spec allows it, YAML doesn't. Just test it with different YAML implementations, results are wild (it should be a treble clef).
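The JSON side of this is easy to verify with Python's standard `json` module (it says nothing about what any YAML parser does). The arithmetic is the usual UTF-16 surrogate decoding.

```python
import json

decoded = json.loads('"\\uD834\\uDD1E"')   # the JSON text "\uD834\uDD1E"

# Combine the two 16-bit halves by hand to see which code point they encode.
high, low = 0xD834, 0xDD1E
code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

print(hex(code_point))              # 0x1d11e
print(len(decoded))                 # 1
print(decoded == chr(code_point))   # True
```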
I was curious about this, so I dug into the specs.
JSON doesn't support \U for 32 bit Unicode code points. So to input these in JSON you must use two \u 16 bit sequences to encode a surrogate pair.
YAML 1.2 supports both \u and \U.
The YAML spec says:
Each escape sequence must be parsed into the appropriate Unicode character.
The use of the word "character" seems to support the idea that YAML does not allow surrogate pairs. In Unicode terminology, every encoded character has a code point, but not every code point encodes a character. In particular, the surrogates are code points that do not individually encode characters.
This is the only line in the spec that I can find that deals with this topic.
This also technically means that you can't use any code point that doesn't encode a Unicode character. So under this interpretation, any unassigned code point is also illegal. This smells like a bug in the spec, since strict parsing would technically be dependent on a specific Unicode version.
IMO they should change "character" to "code point" and add a clarifying line about handling surrogates.
But yeah, I think there is a good argument that YAML doesn't support surrogate escape sequences, and that argument boils down to a single word in the spec.
(I'm only concerned about the spec here, since YAML is defined by spec not by implementation.)
You mentioned all the relevant points. My emphasis would be more on the semantics of escaped surrogates, since implementations today do not reject them, so changing that one word would just be adapting to reality. The „clarifying line about handling surrogates“ is the important thing, because if the spec just allowed any „code point“, the JSON superset proclamation still does not hold semantically.
I've seen this kind of thing before, and although it's definitely a real problem with YAML, it also seems a bit artificial to me. Like, in the example given here they input a YAML file, which is then parsed without any context. They then output a similar file to what they started with. Is that how people actually use YAML?
I've used YAML myself - because I like that it is so easy to read and write manually. This problem with ambiguous types is a non-issue for me, because the code that reads the yaml data into the program's variables knows what type the variables are. NO cannot be mistaken as false, because it's getting read into a string, not a bool.
I guess maybe other use cases may involve reading YAML without knowing what kind of data to expect, and so then these problems are real. But I'm just not sure why someone would want to use YAML like that - and so the problem seems artificial to me. (But obviously, since these criticisms keep popping up, a lot of other people do use YAML like that. I suppose they must have their reasons.)
I've seen this kind of thing before, and although it's definitely a real problem with YAML, it also seems a bit artificial to me. Like, in the example given here they input a YAML file, which is then parsed without any context. They then output a similar file to what they started with. Is that how people actually use YAML?
So I actually ran into this general class of problem with live code just a few weeks ago. For reasons that frankly rhyme with "questionable design", I had a program outputting a YAML file that was then being read as the input of another program. And this worked fine for a while. Then I added another variable and the whole thing broke.
Turned out the problem is that Program 1 was writing the file with ruamel, and Program 2 was reading it with pyyaml. And the file contained the string "1:4:0", which ruamel had dutifully serialized without quotes because why the fuck would you need quotes for that.
And then pyyaml parsed it as the integer 3840.
Because it turns out YAML 1.1 includes sexagesimal (base-60) number literals for some godforsaken reason, so if you ever write a string consisting of numbers separated by colons you need to put it in quotes so that pyyaml doesn't turn it into an insane integer.
And ruamel writes YAML 1.2, so it hadn't bothered doing that; sexagesimal number literals were removed from 1.2.
YAML sucks, and it's just a matter of time until it bites you too.
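Where the 3840 comes from, as a small sketch: the colon-separated groups are read as base-60 digits. `sexagesimal_to_int` is a made-up helper that mirrors only the arithmetic, not a real parser's validation rules.

```python
def sexagesimal_to_int(scalar):
    """Read colon-separated groups as base-60 digits, most significant first."""
    value = 0
    for group in scalar.split(":"):
        value = value * 60 + int(group)
    return value

print(sexagesimal_to_int("1:4:0"))      # 1*3600 + 4*60 + 0 = 3840
print(sexagesimal_to_int("190:20:30"))  # 190*3600 + 20*60 + 30 = 685230
```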
because the code that reads the yaml data into the program's variables knows what type the variables are. NO cannot be mistaken as false, because it's getting read into a string, not a bool.
So,
title: Nonoverse
description: Beautiful puzzle game about nonograms.
countries:
- DE
- FR
- NO
- PL
- RO
Say you have a model
class configData
{
string title;
string description;
List<string> countries;
}
then doing a
Yaml.Parse<configData>(theYamlFromAbove)
Will return an instance of the configData class with the countries list containing the word "False" as a "country"
(Assuming the yaml parsing library is using the old spec)
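A sketch of that failure mode in plain Python, with no YAML library involved. It assumes the parser resolves unquoted scalars by the YAML 1.1 bool type before the typed model is filled in; `resolve_plain_scalar` is a hypothetical stand-in, and real parsers often implement only part of this word list.

```python
# The 22 spellings the YAML 1.1 bool type lists; some parsers implement only a subset.
TRUE_WORDS = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
FALSE_WORDS = {"n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}

def resolve_plain_scalar(scalar):
    """Mimic only the bool part of YAML 1.1 implicit typing for an unquoted scalar."""
    if scalar in TRUE_WORDS:
        return True
    if scalar in FALSE_WORDS:
        return False
    return scalar

countries = [resolve_plain_scalar(c) for c in ["DE", "FR", "NO", "PL", "RO"]]
print(countries)                       # ['DE', 'FR', False, 'PL', 'RO']

# A string-typed model then stringifies the bool rather than recovering "NO".
print([str(c) for c in countries])     # ['DE', 'FR', 'False', 'PL', 'RO']
```

The typed target does not help here because the original spelling is already gone by the time the value reaches it.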
So unless you're always writing your own parsing code, like doing some sort of
the code that reads the yaml data into the program's variables knows what type the variables are. NO cannot be mistaken as false, because it's getting read into a string, not a bool.
So, then you get "false". Congrats? /s
I mean.. this is the issue with dynamic typing and type coercion; not just YAML. YAML is just another example of this kind of issue because normally folks have a YOLO WCGW attitude and don't bother with schemas or other static validation.
And then we get what we "paid" for.... Not too surprising, very common, and although this example may seem contrived it's hardly artificial in the wild. This kind of thing happens a lot.
the norway problem is the funny one, but the variant that actually bit me in production was 'on' being parsed as boolean true in a kubernetes selector key. nobody had heard of yaml 1.1's 22 truthy values until that morning.
Merge keys were never part of the spec. They were in the type registry for YAML 1.1, which did not get updated for YAML 1.2. The spec doesn't require supporting the definitions in the type registry.
Also, 1.2 was released July 2009. The first commit to the semver.org repository was made in December 2009. Obviously the idea of semantic version is older than the website, but it was definitely not well-defined back then.
why did they remove merge keys of all things? Those tended to be useful for complicated configuration to reduce duplication without needing some special per-application handling.
While pyyaml is indeed stuck on 1.1, it has had commits (granted, not releases) within the last year, and the C library it wraps has had commits within the last couple of weeks. "Unmaintained" may be overstating things.
Yeah, so this article is just wrong. On multiple counts.
I've personally been meaning to write an in-depth blog post about YAML's spec and the implicit typing rules, and I've been digging through the actual old mailing list. Fact is, this topic is far more nuanced and interesting than this article gives it credit for. Maybe I'll finish that blog post someday...
The extent of research done here is linking to whatever archive.org snapshots they could find, and using them as a source of truth. As an example, the article clearly asserts that YAML 1.0 allowed + and - as boolean values. The source? was invalidated less than 2 weeks later.
jletourneau@reddit
Ontario is another one that hits this problem. The truthiest province.
plg94@reddit
The implicit typing is just the worst. It works 95% of the time and looks "clean" in all examples, but it makes for so many edge cases.
I have one where I need to store zip codes and telephone numbers. Sometimes these begin with a 0. Apparently in YAML 1.1 such a value is treated as an octal number and silently converted, meaning I don't get an error but a slightly different zip code. Just great.
I defaulted to quoting everything, but once I get a chance to rewrite that script I'm gonna ditch YAML.
simonask_@reddit
Also, just as a general word of caution, zip codes / post codes should never be treated as numbers. They are codes, and should be treated as opaque sequences of characters.
plg94@reddit
Yeah, I know. But apart from a proper database I haven't found a config file format where I can easily define datatypes (as in "this key is always a string, that one always a positive int", etc.).
jmikola@reddit
Have you considered a corresponding JSON schema? I previously used that for a unified test format, which was basically functional tests for client libraries in various languages expressed in YAML and validated against a schema. We also converted the YAML to JSON for easier parsing, but there was no issue validating the YAML directly.
plg94@reddit
I know about JSON schema(ta) but haven't had the chance to play around with it. I wanted to avoid JSON because the files were meant to be human writable (and humans make mistakes, hence the need for strong types and validators).
But the project sprawled out since my first (very naive) implementation, so I think the real solution would actually be a proper database backend. But thanks for the suggestion.
garfieldevans@reddit
doug snickers
Goodie__@reddit
As a solid YAML hater: This gets posted every few years, and it's great every time.
But also: This person got it right many years ago. This isn't the Norway problem, it's a lack of foresight and thinking on YAML's part. This is why standards are hard: in an attempt to have syntax sugar (yes/no for true/false) we end up overriding countries.
Successful-Money4995@reddit
Is it somewhat json's fault? If json had comments, maybe no one would have invented yaml?
Delta-9-@reddit
Tbh JSON is perfectly fine if you're using it for what it was intended for: serializing data over the wire.
JSON only sucks when people try to use it as a configuration format. It was never meant for configuration. It didn't need comments because it was only ever supposed to encode data that would last as long as a single TCP session. Then along came Sensu and LSP, taking "JS object notation" way too literally, and now we're all fucked with config files that don't parse if you put a comment in them and a syntax only slightly less painful to write than XML.
It's not really JSON's fault that people have abused it for things it wasn't meant to do. But yes, the limitations of JSON as a config format probably are a proximal cause for YAML existing in the first place.
Turns out, people like tree-shaped data expressed parsimoniously, and YAML is great at that. Arguably, it's even better than TOML for expressing trees, though I'd be among the first to say that TOML is better in many respects.
OrcaFlux@reddit
I wouldn't say great. It's mediocre at best. It would be great if the tree structure wasn't based on whitespace.
Delta-9-@reddit
If you're using JSON or XML for config, you're indenting your data to visually show the structure, anyway. Why let whitespace live in your config without paying rent?
OrcaFlux@reddit
What I said has nothing to do with visualization and everything to do with parsing.
Delta-9-@reddit
Unless you're writing the parser, does it matter? If you are writing the parser... why, when there are numerous open source parsers out there already?
OrcaFlux@reddit
You're still missing my point entirely.
HansDieterVonSiemens@reddit
Hey, you can still stop me from using JSON for the config file of my current project. Which file format would you suggest that is human-readable, that I can effortlessly read/save as a Python dict, and where I can make comments?
Delta-9-@reddit
Honestly, TOML, so long as you don't have a lot of dicts-of-lists. TOML becomes cumbersome with nested structures, where YAML remains at exactly the same pain level regardless of nesting. But, if you're nesting your config to that degree, you're probably doing config wrong.
If you do have a lot of nesting and you absolutely can't get rid of it, YAML is easier to write, easier to read, and easier to maintain.
mort96@reddit
100% disagree, YAML is a much worse experience to write than JSON. I always have to Google how to do even simple things like a list of dictionaries and how the dashes are supposed to be indented. The result is around 10 different options, ensuring I won't remember the "right way" for next time.
Delta-9-@reddit
There are two ways to write a list of dictionaries:
If you want ten ways to write something, you want "scalars" (strings), which come in several flavors: literals, blocks without folding or chomps, blocks with chomps but no folding, blocks without chomps but with folding, and blocks with folding and chomps, and all the various directives to control chomping.
If your google-fu doesn't get you to the right answer, that's a skill issue, not a YAML issue. YAML absolutely has flaws, but your not finding the right answer isn't one of them.
mort96@reddit
You forgot another way to write a list of dictionaries:
Delta-9-@reddit
Three is still less than ten.
_tskj_@reddit
I think people dislike writing JSON just because there's more structure in the form of syntactical characters like quotes and brackets, but that's only a problem if you've not invested in learning structural editing (such as vim motions or similar); if you have, it's great!
UltraPoci@reddit
With TOML 1.1, it's easier to deal with nested structures.
Delta-9-@reddit
I wasn't aware it had an update recently, but I see it's still using dynamic scope for nested structures... How did it get easier?
masklinn@reddit
Inline tables (json style objects / map) can be written over multiple lines. In TOML 1.0, they were limited to a single line.
This made deeper structures interspersing tables and arrays horrible, as toplevel tables are not the clearest when you start mixing arrays and tables in a non-trivial manner.
Now you can essentially embed json(5) in your toml (the syntax is not identical and it doesn’t have nulls, but it’s pretty close and a bit less verbose since names are unquoted, and it allows trailing commas).
ZorbaTHut@reddit
I know this is anathema, but I frankly recommend XML. It's wordy, overengineered, and kind of nasty. But it does avoid a lot of issues that other markup languages run straight into. It doesn't do weird typing stuff (it's not even type-aware), it handles nesting just fine, it's got comments.
tukanoid@reddit
To the last point, nickel is also nice to work with in my experience, and embeds its std, doesn't require network connection to load std from a server.... (I couldn't find an option to just embed it in the crate)
simonask_@reddit
Somebody suggested TOML, which is great, but I'm also personally a big fan of KDL. It's very, very readable.
ryncewynd@reddit
isnt there JSONC?
edgmnt_net@reddit
Dhall is far more powerful and reasonable, although I'm not sure how widely supported it is.
DonRobo@reddit
What's wrong with using json as a config format? I use it for a lot of my personal tools and I've always enjoyed working with it.
It's super easy to read, easy to edit, easy to parse, easy to understand.
Lonsdale1086@reddit
It doesn't officially support comments, which makes it annoying to have like:
To just be able to comment out and switch between them.
Or leave notes like
And also it's slightly mixed in how you can wrap strings/format data etc, some rules you've got to remember.
I still use it just fine though, it's my go-to for config files.
DonRobo@reddit
If I understood it correctly comments were deliberately not included in the spec to make people not use it as a config language. So I guess there must have been a reason before that?
Also iirc I think both features are supported by JSON5.
Delta-9-@reddit
For small or highly personal configs (like your own code editor) it's... fine. I find it kinda tedious to edit, personally. For something like a webserver or other complex application, the lack of comments is a pretty big deal. Open the default config for nearly any server application and it will have dozens or hundreds of commented lines explaining the options or showing their default values, which is incredibly helpful but completely not possible with JSON. The lack of comments also means it's not possible to communicate to others (including your future self) why some setting is what it is inside the file itself, which, though not insurmountable, is annoying.
josefx@reddit
As long as you remember things like storing larger numbers as strings.
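The reason for that habit, sketched in Python: many JSON consumers hold numbers as IEEE-754 doubles, which cannot represent every integer above 2**53. Python's own `json` module keeps integers exact, so the loss is simulated here with `float`.

```python
import json

big = 2**53 + 1                      # 9007199254740993

# Python keeps integers exact, so the round trip is lossless here.
print(json.loads(json.dumps(big)) == big)            # True

# A consumer that stores numbers as doubles cannot tell these two apart.
print(float(big) == float(2**53))                    # True

# Shipping the value as a string sidesteps the problem.
print(int(json.loads(json.dumps(str(big)))) == big)  # True
```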
PaintItPurple@reddit
TOML has fewer misfeatures, but YAML is generally easier to understand the structure of at a glance.
iamapizza@reddit
For some structures, yes.
For where it gets used the most, the k8s world, it's hell.
florinp@reddit
JSON specifically doesn't have comments so that it wouldn't be used as a configuration file (and end up like the XML config atrocity). The result? JSON is used as a configuration file. Why? Because people are idiots.
Absolute_Enema@reddit
The alternatives were enterprise quality sexpr.
masklinn@reddit
Wrong. JSON doesn’t have comments to avoid the use of comments as parsing directives. Crockford’s literally on record stating
Magneon@reddit
Comments were omitted from JSON to try to stop people from using it for things like human editable config. It did not stop them though, it just made things worse. Json5 seeks to remedy that.
Neither json nor yaml is remotely as robust or powerful as xml for things like configuration and general serialization. At least json has the good grace to look simple, because it is simple, and thus has a simple spec. Yaml looks simple but is as complex as XML typically is to parse properly.
phlummox@reddit
But I don't want "power" in a configuration format, else I'd write all my config files as programs in a Turing-complete language.
edgmnt_net@reddit
You do, just not Turing-completeness with arbitrary side-effects. Look at Dhall, it's a decent mix of power that's safe to wield. It cuts down a lot on repetition.
phlummox@reddit
I don't know whether you are asserting that I want "power" in a configuration format, or that I already write my config files as programs in a Turing-complete language, but I assure you, both are false :)
In fact, I want as little power and expressiveness in the configuration format as possible. I want it to be just expressive enough to describe ways that users can configure my programs, and no more. Usually, the ability to describe a mapping from strings to strings is more than enough.
Mostly, the config files are .ini files, which just describe data, and certainly aren't Turing complete.
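As a minimal sketch of the "mapping from strings to strings" idea, here is how Python's standard `configparser` treats an .ini file. The section and key names are invented for the example:

```python
import configparser

# Hypothetical .ini content; the section and key names are made up.
INI_TEXT = """
[address]
zip = 01234
country = NO
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

zip_code = parser["address"]["zip"]        # stays the string "01234"
country = parser["address"]["country"]     # stays the string "NO"

# Conversion only happens when the program asks for it explicitly.
as_bool = parser["address"].getboolean("country")
```

Nothing is reinterpreted unless the reading code requests it, so the leading zero and the country code both survive.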
edgmnt_net@reddit
I very much agree you generally don't want code as configuration. However defining and reusing constants, perhaps with a limited ability to compute simple functions or reference other definitions is quite benign. This does not require Turing-completeness and should disallow unrestricted recursion and side-effects. Definitely take a look at Dhall because they have considered these things in detail.
As to why you might want this, for one thing ad-hoc file inclusion mechanisms are already commonplace. Config generators and arbitrary syntax are also somewhat common once people try to shoehorn complex configuration into stuff like INI files that lack enough structure. And at that point it's hard to make illegal states unrepresentable, statically check your config or even read it properly.
Chii@reddit
I think the different concerns should not be mixed together. A config should be a config, and nothing more. The ability to compute simple values, constants and referencing should be a preprocessing language that the user chooses to use, rather than the program author's choice. E.g., they can use a templating language and build the config that they want, if they desire such features as constants etc.
phlummox@reddit
Thanks - I have looked at Dhall previously, but it seems like overkill for my needs. It also is more difficult to explain the syntax of Dhall files to (technical, but not necessarily developer) end-users, whereas they are fairly comfortable with .ini-style files.
simonask_@reddit
If you like .ini files, you will absolutely love TOML.
didzisk@reddit
Azure Resource Manager templates are probably the worst. Pretending to be json, but you can (and must) script inside the template, referring to other templates and resources etc. And script language is neither JS nor anything familiar from before.
(I never learned those properly, only did a couple of deployments, so I might be unfair, but I have never heard any praise for them from anyone.)
tukanoid@reddit
Coughs in nix and nickel
levir@reddit
XML also has problems. There's no clear distinction between the use case for attributes and child tags, which causes a lot of common cases to have two obvious implementations.
elmuerte@reddit
XML is easier to parse. Even with the horrible DTD feature they adopted from SGML.
From a specification perspective, XML is smaller than YAML. Most of XML's specification complexity lies in the DTD part.
Security wise they have the same problems.
When you look at parsing performance, XML has the advantage. But this shouldn't matter much, as you really do not want to have to deal with huge YAML files.
mort96@reddit
XML only deals in strings though. With YAML, JSON, TOML and all the other popular formats, you have most of the primitive types you need: strings, bools, numbers. With XML, you need to layer another spec on top to describe how the string value contained in a node is parsed as a number...
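A quick sketch of that difference with Python's standard library. The `server`, `port` and `debug` names are made up, and the same settings are written once as XML and once as JSON:

```python
import json
import xml.etree.ElementTree as ET

# The same made-up settings in both formats.
XML_TEXT = "<server><port>8080</port><debug>true</debug></server>"
JSON_TEXT = '{"port": 8080, "debug": true}'

root = ET.fromstring(XML_TEXT)
xml_port = root.findtext("port")     # "8080": XML hands back text only
xml_debug = root.findtext("debug")   # "true": still text

data = json.loads(JSON_TEXT)
json_port = data["port"]             # 8080 as an int
json_debug = data["debug"]           # True as a bool

# With XML the application (or a schema layer such as XSD) does the conversion.
converted_port = int(xml_port)
```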
tobiasvl@reddit
Except that the string "NO" is a bool
mort96@reddit
No, YAML has the bool `NO` as a bool. The string `"NO"` is a string. I hate YAML, but YAML has clear (if bad) rules about what's a string and what's a bool and what's a number.
elmuerte@reddit
I don't really agree. While they do provide values of certain (ill-defined) types, they are meaningless without a schema. Effectively they are all just string data for the consuming application. Especially because booleans and numbers are not primitive, as they can also be null.
Valid JSON/YAML. But not a lot of fun for the consuming application.
At least JSON makes it rather explicit when something is a string. In YAML, however.
oldsecondhand@reddit
Whats wrong with DTD (besides having fewer features than XSD)? It's so much nicer to read than XSD.
Mognakor@reddit
Never worked with DTD, but i like XSD for the simple code generation i can get with maven plugins.
masklinn@reddit
Comments were omitted from JSON to avoid their use as directives for parsing / interpretation.
JSON was never intended to be edited by hand in the first place, it was discovered as a data interchange format between computers.
tilitatti@reddit
json did right to not include comments, to try to deter the brainrot of some people, "hey, what if we put logic into comments! Yees Awsome idea!".
but maybe we should be pleased that yaml exists, it is the perfect place for the brainrot people who want to put logic into configuration, it will keep these people contained in yaml files.
o.o
Successful-Money4995@reddit
But editing xml sucks. People don't want that!
If json was not meant for human eyes then why not just keep using xml? What purpose was it supposed to solve?
rwinger3@reddit
It was originally intended to be a standard for messages sent between systems that were also human readable. The creator wanted it to be named Javascript Message Language, but JSML was already a thing so they pivoted to Javascript Object Notation. The original name conveys its intended purpose much better IMO.
Magneon@reddit
It does. It's just one of those things of its era that were well thought out from a capabilities and ramifications standpoint but missed the mark on usability.
Delta-9-@reddit
XML is a markup language, not a config language, and forcing it to be a config language is wrong in exactly the same ways as forcing JSON to be a config language is wrong.
Absolute_Enema@reddit
Truly a story as old as time: making things unwieldy on purpose only ends up creating unnecessary pain in times of need.
Goodie__@reddit
JSON was not designed for it, but it has become exceedingly useful as a data struct, having actual structures and arrays that environment files don't have. There is a problem there.
But blaming JSON for YAMLs quirks is not it. IMHO.
Jhuyt@reddit
Yaml is a fair bit older than JSON, and was mostly inspired as an alternative to xml IIRC
newpua_bie@reddit
It used to be such a big problem for Yesmen that they actually renamed the country and dropped the s
robidaan@reddit
Iso 3166 alpha 3
esiy0676@reddit
Nicely structured blog and interesting blogpost, perhaps better suited for r/python. Also - what's the doubt with YAML (not) being a superset of JSON?
NB For all my programmatic inputs, I use JSON. If it's created and maintained by people, I would pre-convert to JSON (yq). Golang supports JSON in the standard library, C provides some very lightweight parsers. Something much harder to achieve with YAML.
cbarrick@reddit
YAML 1.2 is a strict superset of JSON.
Semantically, the YAML data model is a superset of the JSON data model. YAML supports all of the JSON data types, plus additional stuff like references.
Syntactically, YAML 1.2 can parse all valid JSON into the correct structures. Before version 1.2, there were a few edge cases in JSON that didn't parse with YAML, mostly involving floats and string escapes. But YAML 1.2 fixes that.
So YAML 1.2 is a superset of JSON, both in syntax and semantics.
Whether or not your YAML parser supports 1.2 is a different story. Even today, 1.1 is the more commonly supported spec.
Tubbles_@reddit
Did you read the article? It alludes to why yaml might not be a superset of json after all
cbarrick@reddit
The Norway problem has no conflict with the superset relationship. Nor does the `!!` sigil. These syntaxes are not recognized by JSON at all. All valid JSON is valid YAML 1.2, but there is valid YAML 1.2 that fails to parse by a JSON parser. That's what a language superset is.
Tubbles_@reddit
So you didn't read the article (like probably none of those who downvoted me):
I was genuinely curious if you had a take on this statement?
flyx86@reddit
YAML is not a strict superset of JSON. Here's a valid JSON string that is not valid YAML:
This is an escaped UTF-16 surrogate pair. JSON spec allows it, YAML doesn't. Just test it with different YAML implementations, results are wild (it should be a treble clef).
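The example string itself is not reproduced here. As an assumed stand-in, the sketch below uses the escaped pair for U+1D11E (the G clef character that the JSON RFC also uses as its example) to show what a JSON parser does with such a pair. It says nothing about how any particular YAML library behaves:

```python
import json

# Assumed stand-in for the example: U+1D11E (G clef) written as a
# JSON-escaped UTF-16 surrogate pair.
JSON_TEXT = r'"\ud834\udd1e"'

decoded = json.loads(JSON_TEXT)

# A JSON parser combines the two 16-bit escapes into one code point.
code_points = [hex(ord(ch)) for ch in decoded]
```

The result is a one-character string, which is the behaviour a YAML parser would have to match for the superset claim to hold for this input.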
cbarrick@reddit
I was curious about this, so I dug into the specs.
JSON doesn't support `\U` for 32 bit Unicode code points. So to input these in JSON you must use two `\u` 16 bit sequences to encode a surrogate pair.
YAML 1.2 supports both `\u` and `\U`.
The YAML spec says:
The use of the word "character" seems to support the idea that YAML does not allow surrogate pairs. In Unicode terminology, every encoded character has a code point, but not every code point encodes a character. In particular, the surrogates are code points that do not individually encode characters.
This is the only line in the spec that I can find that deals with this topic.
This also technically means that you can't use any code point that doesn't encode a Unicode character. So under this interpretation, any unassigned code point is also illegal. This smells like a bug in the spec, since strict parsing would technically be dependent on a specific Unicode version.
IMO they should change "character" to "code point" and add a clarifying line about handling surrogates.
But yeah, I think there is a good argument that YAML doesn't support surrogate escape sequences, and that argument boils down to a single word in the spec.
(I'm only concerned about the spec here, since YAML is defined by spec not by implementation.)
flyx86@reddit
You mentioned all the relevant points. My emphasis would be more on the semantics of escaped surrogates, since implementations today do not reject them, so changing that one word would just be adapting to reality. The „clarifying line about handling surrogates“ is the important thing, because if the spec just allowed any „code point“, the JSON superset proclamation still does not hold semantically.
cbarrick@reddit
YAML 1.2 supports this.
https://yaml.org/spec/1.2.2/#57-escaped-characters
blind3rdeye@reddit
I've seen this kind of thing before, and although it's definitely a real problem with YAML, it also seems a bit artificial to me. Like, in the example given here they input a YAML file, which is then parsed without any context. They then output a similar file to what they started with. Is that how people actually use YAML?
I've used YAML myself - because I like that it is so easy to read and write manually. This problem with ambiguous types is a non-issue for me, because the code that reads the yaml data into the program's variables knows what type the variables are. `NO` cannot be mistaken as `false`, because it's getting read into a string, not a bool.
I guess maybe other use cases may involve reading YAML without knowing what kind of data to expect, and so then these problems are real. But I'm just not sure why someone would want to use YAML like that - and so the problem seems artificial to me. (But obviously, since these criticisms keep popping up, a lot of other people do use YAML like that. I suppose they must have their reasons.)
ZorbaTHut@reddit
So I actually ran into this general class of problem with live code just a few weeks ago. For reasons that frankly rhyme with "questionable design", I had a program outputting a YAML file that was then being read as the input of another program. And this worked fine for a while. Then I added another variable and the whole thing broke.
Turned out the problem is that Program 1 was writing the file with ruamel, and Program 2 was reading it with pyyaml. And the file contained the string "1:4:0", which ruamel had dutifully serialized without quotes because why the fuck would you need quotes for that.
And then pyyaml parsed it as the integer 3840.
Because it turns out YAML 1.1 includes sexagesimal base-60 number literals for some godforsaken reason and so if you ever write a string consisting of numbers separated by colons you need to put it in quotes so that pyyaml doesn't turn it into an insane integer.
And ruamel writes YAML 1.2, so it hadn't bothered doing that; sexagesimal number literals were removed from 1.2.
YAML sucks, and it's just a matter of time until it bites you too.
Not in a duck-typed language!
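For the curious, the 3840 can be reproduced without any YAML library. This is a minimal sketch that assumes the base-60 pattern from the YAML 1.1 `int` type description as I remember it. `resolve_yaml11_sexagesimal` is a hypothetical helper, not something pyyaml or ruamel exposes:

```python
import re

# Base-60 integer pattern, as recalled from the YAML 1.1 "int" type description.
SEXAGESIMAL_INT = re.compile(r"[-+]?[1-9][0-9_]*(:[0-5]?[0-9])+")

def resolve_yaml11_sexagesimal(scalar: str):
    """Return the int a YAML 1.1 resolver would produce, or None if the
    scalar is not a base-60 integer. Hypothetical helper, not a library API."""
    if not SEXAGESIMAL_INT.fullmatch(scalar):
        return None
    sign = -1 if scalar.startswith("-") else 1
    digits = scalar.lstrip("+-").replace("_", "")
    value = 0
    for part in digits.split(":"):
        value = value * 60 + int(part)   # each colon shifts by a factor of 60
    return sign * value

result = resolve_yaml11_sexagesimal("1:4:0")   # 1*3600 + 4*60 + 0
```

So `1:4:0` is read like a duration of 1 hour, 4 minutes and 0 seconds, expressed in seconds.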
blind3rdeye@reddit
That's pretty funny I reckon. Probably annoying and frustrating too - but also funny.
I suppose another advantage I have is that I'm not doing anything important to really care if something goes wrong.
Lonsdale1086@reddit
So,
Say you have a model
then doing a
Will return an instance of the configData class with the countries list containing the word "False" as a "country"
(Assuming the yaml parsing library is using the old spec)
So unless you're always writing your own parsing code, like doing some sort of
Then this issue can't be avoided for the flawed version of the library.
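One hedged way to picture the "own parsing code" option, sketched in Python rather than whatever language the model above was written in. `ConfigData` mirrors the class name from the comment, `build_config` is a hypothetical helper, and the parsed dict is hard-coded to what an old-spec parser is assumed to return:

```python
from dataclasses import dataclass

@dataclass
class ConfigData:
    countries: list

def build_config(parsed: dict) -> ConfigData:
    """Build ConfigData, refusing anything that is not already a string
    instead of coercing it (so a stray bool cannot become "False")."""
    countries = parsed.get("countries", [])
    bad = [item for item in countries if not isinstance(item, str)]
    if bad:
        raise TypeError(f"non-string country entries: {bad!r}")
    return ConfigData(countries=list(countries))

# What a YAML 1.1 parser is assumed to hand back for: countries: [SE, NO, DK]
parsed_by_old_parser = {"countries": ["SE", False, "DK"]}

try:
    build_config(parsed_by_old_parser)
    error = None
except TypeError as exc:
    error = str(exc)
```

This does not recover the lost `NO`, it only turns the silent coercion into a loud failure.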
vplatt@reddit
So, then you get "false". Congrats? /s
I mean.. this is the issue with dynamic typing and type coercion; not just YAML. YAML is just another example of this kind of issue because normally folks have a YOLO WCGW attitude and don't bother with schemas or other static validation.
And then we get what we "paid" for.... Not too surprising, very common, and although this example may seem contrived it's hardly artificial in the wild. This kind of thing happens a lot.
Absolute_Enema@reddit
If you're going to make a validation problem into a typing problem, this is a problem of weak typing, not dynamic typing.
simonask_@reddit
More people need to know about KDL. It's awesome and cute.
boiledbarnacle@reddit
no
dml997@reddit
false
drcforbin@reddit
Norway
Optimal-Savings-4505@reddit
no way
chucker23n@reddit
We made it up
boiledbarnacle@reddit
Not this time
amroamroamro@reddit
not yes
KandevDev@reddit
the norway problem is the funny one, but the variant that actually bit me in production was 'on' being parsed as boolean true in a kubernetes selector key. nobody had heard of yaml 1.1's 22 boolean literals (11 of them truthy) until that morning.
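The number 22 matches the count of literals in YAML 1.1's `bool` type, half of them true and half false. A small sketch that lists them, assuming the type description as I recall it. Real parsers may implement only a subset, and `resolve_yaml11_bool` is a hypothetical helper:

```python
# Boolean literals listed in the YAML 1.1 "bool" type description.
YAML11_TRUE = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
YAML11_FALSE = {"n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}

def resolve_yaml11_bool(scalar: str):
    """Return True/False if an unquoted scalar is a YAML 1.1 boolean,
    else None. Hypothetical helper, not a library API."""
    if scalar in YAML11_TRUE:
        return True
    if scalar in YAML11_FALSE:
        return False
    return None

total_literals = len(YAML11_TRUE | YAML11_FALSE)
```

Note that only the listed spellings count, so `on` and `NO` are booleans while `Norway` and `oN` stay strings.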
kairos@reddit
IIRC, one that I ran into a couple of times with k8s was when a random string (like a short hash) turned out to be all numbers.
We quickly learned that even though it's not required, it's a good idea to put quotes around strings in YAML.
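For hand-built or templated YAML, the quoting habit can be sketched as a tiny helper. `yaml_quote_if_numeric` is hypothetical and deliberately simplified: it only catches digit-only strings, not hex, floats, booleans or other implicit types. The hashes are made up:

```python
import re

# Plain scalars made only of digits are read as integers by YAML parsers
# (simplified: ignores hex, octal prefixes, floats and other forms).
ALL_DIGITS = re.compile(r"[0-9]+")

def yaml_quote_if_numeric(value: str) -> str:
    """Wrap digit-only strings in double quotes so they stay strings.
    Hypothetical helper for hand-built YAML, not a library API."""
    if ALL_DIGITS.fullmatch(value):
        return f'"{value}"'
    return value

short_hashes = ["3f9a1c2", "1234567"]   # made-up short hashes
emitted = [yaml_quote_if_numeric(h) for h in short_hashes]
```

Given how many other implicit types exist, quoting every string unconditionally is probably the simpler rule.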
JonathanTheZero@reddit
And I thought the Norway problem was that you had two different standards of the same language that both get maintained lmao (like Nynorsk and Bokmål)
Trang0ul@reddit
A blatant repost: https://www.reddit.com/r/programming/comments/1qaroyn/yaml_thats_norway_problem/
chucker23n@reddit
Reposts are explicitly allowed.
arj-co@reddit
Very Interesting!
TheBrokenRail-Dev@reddit
IMO one big issue is Merge Keys. They are an extremely powerful tool for reducing duplicated code (and are therefore great for configurations).
They were also removed in YAML 1.2. IMO this is probably one of the reasons behind 1.2's lack of momentum.
max123246@reddit
Why is it called 1.2 if it removes a feature? That's a breaking change, is it not? I guess they don't use semver?
flyx86@reddit
Merge keys were never part of the spec. They were in the type registry for YAML 1.1, which did not get updated for YAML 1.2. The spec doesn't require supporting the definitions in the type registry.
Also, 1.2 was released July 2009. The first commit to the semver.org repository was made in December 2009. Obviously the idea of semantic versioning is older than the website, but it was definitely not well-defined back then.
max123246@reddit
Ah right, I forgot how old YAML is at this point
uasi@reddit
No they don't. Between 1.1 and 1.2 there're breaking changes here and there, as well as between 1.0 and 1.1
shinyfootwork@reddit
why did they remove merge keys of all things? Those tended to be useful for complicated configuration to reduce duplication without needing some special per-application handling.
transfire@reddit
Have a look at https://github.com/trans/yam
quetzalcoatl-pl@reddit
- is it a norway problem?
- NO
CrackerJackKittyCat@reddit
Ah, this generation's New England ZIP codes in CSV vs Excel.
gimpwiz@reddit
I'm going to store phone numbers as an integer! Probably int(11).
somethingworthwhile@reddit
USGS streamgage numbers….. UGH.
CalBearFan@reddit
Hey now, Puerto Rico has the 00 postal code issue as well!
alenym@reddit
LOL.
rminsk@reddit
Don't use `PyYAML`. It is no longer maintained and only supports YAML 1.1. Try a different library like `ruamel.yaml` that supports YAML 1.2.
Delta-9-@reddit
While pyyaml is indeed stuck on 1.1, it has had commits (granted, not releases) within the last year, and the C library it wraps has had commits within the last couple of weeks. "Unmaintained" may be overstating things.
mort96@reddit
Commits don't matter, releases do.
Delta-9-@reddit
Unmaintained projects don't get releases or commits.
RiotBoppenheimer@reddit
It gets even better when you realize that ruamel.yaml does not support Python 3.14 (yet).
Pjb3005@reddit
Yeah so this article is just wrong. On multiple counts.
I've personally been meaning to write an in-depth blog post about YAML's spec and the implicit typing rules, and I've been digging through the actual old mailing list. Fact is, this topic is far more nuanced and interesting than this article gives it credit for. Maybe I'll finish that blog post someday...
The extent of research done here is linking to whatever archive.org snapshots they could find, and using them as a source of truth. As an example, the article clearly asserts that YAML 1.0 allowed `+` and `-` as boolean values. The source? was invalidated less than 2 weeks later.
starm4nn@reddit
I'd be happy to read it, but I feel like the very problem with YAML is that it needs "nuance".
PatagonianCowboy@reddit
>pip install and not uv install
ok bro
its_a_gibibyte@reddit
pip is the default package manager, so it's a reasonable default to use.
gmes78@reddit
`pip install` is nothing more than a noob trap. It just causes issues with dependency tracking.
Delta-9-@reddit
Guess I'm a noob for using `pip install` in production, without issue, for going on 10 years.
Suspicious-Basis-885@reddit
Every time I touch YAML I gain a new appreciation for boring explicit JSON.
The fact that a country can accidentally become a boolean feels like a prank that escaped containment.
mektel@reddit
A few years ago I started using toml whenever I can.
Nixinova@reddit
Tldr YAML already fixed this ages ago in v1.2... but lots of tooling doesn't want to support 1.2. So it is our problem, not yaml's.
tumes@reddit
100% my metric for someone’s credible seniority with (generally pre-node) frameworks. It’s an experience everyone should have to deal with.
stanley604@reddit
I think the Scunthorpe problem is worse.