scratchisthebest@reddit
that thing the strictyaml dev named "syntax typing" claims another victim
ZorbaTHut@reddit
Yeah, YAML is on my permanent blacklist for this. If I can avoid using it, I avoid using it. It's just waiting to blow up on you.
I'd tolerate strictyaml if it had better support in multiple languages, but it doesn't.
It's kind of depressing that I still use XML because it's less awful than the alternatives.
jherico@reddit
Remember, every valid JSON document is also valid YAML.
bobbsec@reddit
What's wrong with XML? It's verbose but clear.
Worth_Trust_3825@reddit
Decade old defaults in parsers when people still thought remote loading schemas was a good idea. Meanwhile json schemas are repeating that same mistake.
ZorbaTHut@reddit
It's really verbose. Also, it's painfully complicated, and a huge amount of its functionality is rarely-to-never used, and yet still provides security holes; see stuff like XXE attacks.
scratchisthebest@reddit
yeah strictyaml itself is kinda weird, at first glance it looks like a validator taped to an off-the-shelf yaml parser, instead of its own data format. interesting way to make triply sure you accept a strict subset of yaml, at least? maybe that way this library contributes less to data format proliferation?
regardless i like the articles though :]
pilif@reddit
aside from the usual YAML hilarity, IIRC, 556474e378 is 556474e378, not Infinity. As far as I know, there is an infinite amount of numbers larger than 556474e378 before Infinity.
Silently changing data like this, YAML mess notwithstanding, is horrifying to me.
rdtsc@reddit
IEEE 754 says no. You don't get numbers with arbitrary precision.
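A quick way to see the overflow described above, as a minimal sketch: it assumes the parser hands the unquoted scalar to a 64-bit IEEE 754 double, which is what Python's built-in `float` is. The variable names are just for illustration.

```python
import math
import sys

# The unquoted scalar from the thread, read the way a float-resolving parser would read it.
value = float("556474e378")

print(value)               # inf
print(math.isinf(value))   # True
print(sys.float_info.max)  # largest finite double, roughly 1.797e308

# Kept as a string (i.e. quoted in the YAML), every character survives.
as_string = "556474e378"
print(as_string)           # 556474e378
```

5.56474e383 is far past the largest finite double, so the only value left to round to is Infinity.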
tanorbuf@reddit
Seeing that they're using short hashes, there is a funny case of birthday problem in this. Using the exponential approximation from Wikipedia, it can be seen that with 5 bytes of hash (10 hex chars) and a hypothetical 1.2 million commits, there is as much as a 48 % chance of collision between any two commits. Of course, most projects never go that far, but it happens to be how many commits are in the Linux kernel (judging by gh), so it's not impossible. So I guess I'm encouraging using the full hash, not least when you're not typing it anyway but using some template system (as in OP).
jdmetz@reddit
This is for comparing their production client version to their production server version, though. So even if they reach 1.2 million commits at some point, I highly doubt they get anywhere near that many production deployments - especially if they are forcing clients to upgrade for each one. Even if they release weekly for 5 years, their chance of collision only reaches ~0.0001%. And even if that happens, it is only a problem if someone still has the client from the first release in the hash collision, hasn't updated between then and the current release, and now tries to use it.
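The estimates in the two comments above can be reproduced with the usual birthday-problem approximation, 1 - exp(-n(n-1) / 2N). This is a rough sketch that assumes 10 hex characters (40 bits) of uniformly distributed hash; `collision_probability` is a made-up helper name, and the weekly-release count is simply 52 * 5.

```python
import math

def collision_probability(items: int, hex_chars: int) -> float:
    """Approximate chance that at least two of `items` short hashes collide."""
    space = 16 ** hex_chars  # number of distinct short hashes
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-items * (items - 1) / (2 * space))

commits = collision_probability(1_200_000, 10)   # every commit in a kernel-sized repo
releases = collision_probability(52 * 5, 10)     # weekly releases for five years

print(f"all commits:     {commits:.1%}")
print(f"weekly releases: {releases:.2e}")
```

With 1.2 million commits the result lands close to the 48 % quoted above. For a few hundred releases it is well under one in a million, so the order of magnitude matters more than the exact figure.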
rdtsc@reddit
But still, what do you gain by saving a few bytes here? And if you say bandwidth, make the key shorter.
gredr@reddit
That's not a Git bug, or a hash bug, or even really a bug at all. That's a YAML feature. Yay YAML!
hikemhigh@reddit (OP)
the bug was that the TeamCity job didn't have quotes surrounding the value injected into the YAML
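For anyone templating values into YAML, one hedged way to get the quotes right is to let a JSON encoder produce the scalar, since a JSON string is also a valid YAML double-quoted scalar. This is only a sketch: `yaml_line` is a hypothetical helper, not something from TeamCity or any YAML library, and it assumes the key itself is a plain identifier.

```python
import json

def yaml_line(key: str, value: str) -> str:
    """Render `key: "value"` with the value always quoted and escaped."""
    # json.dumps wraps the value in double quotes and escapes quotes and backslashes.
    return f"{key}: {json.dumps(value)}"

print(yaml_line("gameServerVersion", "556474e378"))
# gameServerVersion: "556474e378"
```

Quoted this way, the hash stays a string no matter which digits and letters it happens to contain.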
gredr@reddit
It's not a bug, it's a feature that YAML allows you to have unquoted strings. That's how you know YAML is so much better than JSON.
Venthe@reddit
Laugh all you want, the readability and the time yaml has saved me over the years have paid off in spades. It's great at what it does
gredr@reddit
I disagree. It's a lousy format, very easy to get wrong, and very easy to be wrong but look correct.
XML is verbose, but at least it's easy to verify correctness.
jherico@reddit
Xml has its own set of foibles related to schema namespaces and escaping characters. Json is the ideal data format for most small config files
gredr@reddit
I would agree if you said jsonc, or "json with comments". Or, and bear with me here, Microsoft's bicep might be nearly the right middle ground between json and yaml.
jaskij@reddit
YAML is easy to read, XML is easy to write, TOML does both but doesn't handle nesting well. Choose your poison, as usual.
My favorite bit of XML is curves in SVG. It's basically turtle programming stuffed into an XML attribute.
TarMil@reddit
My main beef with TOML is that it has the worst array syntax I've ever seen.
sparr@reddit
Yeah? So you'd have no problem recognizing that
gameServerVersion: 556474e378
is a very large number?
gredr@reddit
Yeah, I kinda like TOML, but only if your configuration isn't nested much, or at all. I think there's an argument to be made here for keeping your configuration as simple as possible (but no simpler).
Worth_Trust_3825@reddit
in what world does yaml save time and stay readable?
Venthe@reddit
In a real world.
Yaml, except for indentation, is as minimal as possible. No braces, so you deal with data only; no extraneous quotes; just pure names and data. Not to mention any time you deal with a multi-line text.
You lose precision, which is a fair argument. There are quirks (on vs "on"). But XML offers so much noise that it's hardly readable. TOML requires typing-full-paths-non-stop. JSON is verbose and fails at multiline; or arrays.
I'll stay with YAML, thank you very much.
bzbub2@reddit
you dropped this
/s
mr_birkenblatt@reddit
only with JSON you need an explicit /s. YAML allows you to use sarcasm without indication. That's how you know YAML is so much better than JSON.
sparr@reddit
No, with YAML it's implicit!
gredr@reddit
Yeah, I'd hope that was obvious?
hikemhigh@reddit (OP)
right! but our bug was that we didn't quote a value that could potentially, even only through a very low chance, be interpreted as a number by a YAML parser
KittensInc@reddit
See also: the Norway problem.
SpaceMonkeyAttack@reddit
I mean, a git hash is a number, a long hexadecimal number. Probably YAML isn't treating "deadbeef" or "f00d" as numbers, but if you had just decimal numbers? Probably that's happened before, but you got away with it because "123456" compares the same whether it's a string or a number.
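Which short hashes get read as numbers can be checked against the number patterns in the YAML 1.2 core schema. The sketch below transcribes those patterns as I understand them and ignores null, booleans, `.inf` and `.nan`. Real parsers differ (YAML 1.1 parsers in particular), so treat `resolve` as an illustration, not as any library's behavior.

```python
import re

# Number patterns from the YAML 1.2 core schema tag resolution (decimal, octal and hex ints, floats).
INT = re.compile(r"[-+]?[0-9]+|0o[0-7]+|0x[0-9a-fA-F]+")
FLOAT = re.compile(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?")

def resolve(plain: str) -> str:
    """Return the type an unquoted scalar would resolve to under these patterns."""
    if INT.fullmatch(plain):
        return "int"
    if FLOAT.fullmatch(plain):
        return "float"
    return "str"

for scalar in ["556474e378", "deadbeef", "f00d", "123456", "0x556474e378", "1.2.3"]:
    print(f"{scalar:>14} -> {resolve(scalar)}")
```

Under these rules a hash only becomes a float when it is all decimal digits apart from a single `e`, which is rare but clearly not impossible.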
theeth@reddit
Should have prefixed it with 0x and not quoted it then!
SpaceMonkeyAttack@reddit
... it'd probably work
gredr@reddit
Don't feel bad, YAML is definitely NOT a "pit of success".
jaskij@reddit
While treating it as a string works well enough, a git hash is not a string. It's a hexadecimal number. Stuffing 0x in front would have worked, until you got to 14/16 digit short hashes, at which point you would have probably hit a range bug. But that's unlikely - last I checked, even the Linux kernel used 12 digit short hashes.
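The range concern can be made concrete with a small sketch. It assumes the parser stores a 0x-prefixed integer either in a double (exact only up to 2**53) or in a signed 64-bit integer; which one applies depends on the parser and language. `fits` is a made-up helper.

```python
INT64_MAX = 2**63 - 1
DOUBLE_EXACT_MAX = 2**53  # above this, not every integer is exactly representable in a double

def fits(hex_digits: int) -> tuple[bool, bool]:
    """Check whether the largest hash of this length fits (exactly in a double, in a signed int64)."""
    largest = int("f" * hex_digits, 16)
    return largest <= DOUBLE_EXACT_MAX, largest <= INT64_MAX

for digits in (10, 12, 14, 16):
    print(digits, fits(digits))
# 10 (True, True)
# 12 (True, True)
# 14 (False, True)
# 16 (False, False)
```

So 12-digit hashes are safe either way, 14 digits can lose precision in a double-backed parser, and 16 digits can overflow a signed 64-bit integer.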
OMG_A_CUPCAKE@reddit
Yeah, that headline made it sound like they got hit by a git hash collision
ruuda@reddit
How can you tell a senior yaml engineer apart from a junior yaml engineer? The senior yaml engineer will defensively quote all the strings.
(I have a blog where I try to write interesting in-depth posts, and then one time I wrote a rant about yaml, and it gets more views than any of my other posts combined.)
tanorbuf@reddit
Yaml is mostly used for configuration, where values are typed manually and the user has syntax highlighting on. Most of these problems disappear when viewed in that context. However it's true that it's not well suited for templating and that sadly the ecosystem never fully adopted Yaml 1.2, even though it's super old at this point. I feel like it should be easy enough to migrate though, since i assume nobody actually uses sexagesimal numbers (and I think the norway problem hasn't been real for some time).
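For context on the sexagesimal point: YAML 1.1 resolves colon-separated digit groups as base-60 integers, and YAML 1.2 dropped that. A minimal sketch of the 1.1 rule follows, with the pattern taken from the YAML 1.1 int type description as I recall it. `sexagesimal_to_int` is a hypothetical helper, not a library function.

```python
import re

# YAML 1.1 base-60 integer form, e.g. 190:20:30
SEXAGESIMAL_INT = re.compile(r"[-+]?[1-9][0-9_]*(:[0-5]?[0-9])+")

def sexagesimal_to_int(plain: str):
    """Return the base-60 value of an unquoted scalar, or None if it is not sexagesimal."""
    if not SEXAGESIMAL_INT.fullmatch(plain):
        return None
    sign = -1 if plain.startswith("-") else 1
    total = 0
    for part in plain.lstrip("+-").replace("_", "").split(":"):
        total = total * 60 + int(part)
    return sign * total

print(sexagesimal_to_int("190:20:30"))  # 685230
print(sexagesimal_to_int("22:22"))      # 1342
print(sexagesimal_to_int("1.2.3"))      # None
```

An unquoted value like 22:22 is the usual way people run into this by accident.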
Worth_Trust_3825@reddit
Cool. What type of value is 1.2.3?
HugoNikanor@reddit
String
Sillocan@reddit
I share your rant monthly! It's basically religious text when living in GitLab
hpxvzhjfgb@reddit
daily reminder that weak typing is a sin
ShinyHappyREM@reddit
all keys must be pressed with a force of >= 5kg
seanmorris@reddit
Just because it's valid YAML to leave something unquoted, doesn't mean you should.
Hangman4358@reddit
I have been arguing against unquoted strings forever. We have had multiple bugs like this.
"But the quotes look UGLY, we will only quote when it causes bugs not to" 🤦‍♂️
ShinyHappyREM@reddit
Redirect all bug reports of that nature to them.
cedric005@reddit
once again yaml
jherico@reddit
Json has a spec that's like 5 pages and yaml has one that's excessively long and has multiple versions and is loaded with little landmines like this one.
Fuck yaml.
brasetvik@reddit
country: false # NO for Norway
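The line above is the Norway problem in miniature: under the YAML 1.1 boolean type, a whole family of unquoted words resolves to true or false. A rough sketch of that lookup follows. The word lists come from the YAML 1.1 bool type, individual parsers may accept fewer of them, and `resolve_bool` is an illustrative name only.

```python
# Unquoted words that the YAML 1.1 bool type treats as booleans.
YAML11_TRUE = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
YAML11_FALSE = {"n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}

def resolve_bool(plain: str):
    """Return True or False if the unquoted scalar is a YAML 1.1 boolean, else None."""
    if plain in YAML11_TRUE:
        return True
    if plain in YAML11_FALSE:
        return False
    return None

for country in ["SE", "DK", "NO"]:
    print(country, resolve_bool(country))
# SE None
# DK None
# NO False
```

Quoting the value ("NO") sidesteps the lookup, because these rules only apply to plain, unquoted scalars.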
shim__@reddit
TL;DR: use a yaml lib
scottrycroft@reddit
I'd have some hesitancy at making a yaml library a production deployment dependency...
xeio87@reddit
TFW you have a few updates to download to match the server.