Parse, Don't Validate — In a Language That Doesn't Want You To · cekrem.github.io

[-]

TheRealPomax@reddit

This is why I wrote https://pomax.github.io/use-models-for-data at some point. Either your data fits your schema, with whatever rules need to be applied to the values to determine that they're right, or your data is bad and you'll have to deal with an exception.

[-]

Blue_Moon_Lake@reddit

Good ol' PHP type array which can be anything on Earth is my nemesis.

I have an absolute rule that every legacy project must be purged of that array type as the first step otherwise I refuse the work.

[-]

TheRealPomax@reddit

lol PHP. Let me introduce you to my good friend Perl.

[-]

jf908@reddit

Hey it's you! I was admiring earlier today the impact you've had on my knowledge of bezier curves and Japanese grammar :)

[-]

TheRealPomax@reddit

Always happy to hear someone enjoyed those works =)

[-]

lelanthran@reddit

Your writing style is (to my horror/delight) very much like mine (excessive use of asides in parentheses).

Of course, I used to be a Lisp program 20 years ago... (Not sure what you excuse is :-))

[-]

cekrem@reddit (OP)

HAHA, well, I don't know what to say to that. I've used Lisp as well. But mostly Elm, these days :D

[-]

lelanthran@reddit

HAHA, well, I don't know what to say to that. I've used Lisp as well. But mostly Elm, these days :D

You're the original blogger, right? Have you seen my explanation of PdV? I had the same idea as you did - explain it in a way that programmers used to mainstream languages can understand.

As a side-effect, it's also to demonstrate that PdV can be done in almost any strongly-typed language, such as C.

[-]

cekrem@reddit (OP)

Oh, nice! I haven't seen that (and I'm sadly not doing a lot of C either to be honest). Cool, thanks for sharing!

[-]

rsclient@reddit

You mean, just the right number of (parenthetical) asides. How other people can think without having branches in their conversation and writing has always been a puzzle to me.

[-]

femio@reddit

wow i've found my people (finally)

[-]

rsclient@reddit

I liked the takeaway: "make the type system carry the proof, not your memory"

[-]

ggchappell@reddit

Yup. And I don't have to read a whole post to figure out what that means.

[-]

davidalayachew@reddit

Yup. And I don't have to read a whole post to figure out what that means.

If you are referring to Alexis King's article title (the one this post is referring to), her title was almost perfect, she just needed to name it "Parse, don't just validate". That would have been much more clear.

Of course, her article is easy reading and jumps straight to the point. So it's not painful to RTFM in this case.

[-]

dlsspy@reddit

Adding words takes you further from the truth. The title and the phrasing is fine.

[-]

davidalayachew@reddit

Adding words takes you further from the truth. The title and the phrasing is fine.

I disagree, in that I think the title isn't bad, but could certainly be improved.

At the end of the day, validation is often considered to be part of parsing. So, to say don't validate kind of sends the wrong message.

Really, what her article is saying is don't just validate, take it further and encode those validations into the type system. Hence why I think Parse, don't just validate is a much better title -- it retains almost all of the simplicity, while removing the most common confusion point for readers.

And I know it's the most common confusion point because I paste her article often, and nearly every single time is the same confusion.

And again, I consider this to be a minor, unfortunate wart on an otherwise amazing essay. Her essay is #2 on my top 5 Programming Articles of All Time. It really is that good, and I think she is a great writer in general.

I just think she got the title wrong, is all.

[-]

dlsspy@reddit

I understand that you disagree, but you're adding complexity and introducing confusion around the distinction between parsing and validating that she's describing in her article.

> Consider: what is a parser? Really, a parser is just a function that consumes less-structured input and produces more-structured output. By its very nature, a parser is a partial function—some values in the domain do not correspond to any value in the range—so all parsers must have some notion of failure.

A failing parser could be used as a validator, but adding the word "just" in there strongly implies that you could build a validator and a parser where she's drawing a pretty clear distinction between the two concepts. They both might need to do some of the same work at a high level, but they're not the same.

Adding "just" makes sense to people who have a specific concept of what a parser is in mind and what they believe a validator is and don't want to understand what the point of separation is. I typically point people to this in code review when they're writing validation functions pointing out that they should not be writing validation functions at all.

Perhaps this is a limitation of languages you're working in or something, but I found it very valuable to eradicate the notion of "validation" from programmer's brains as being a good idea in the first place. If a number must be positive, we have a data type that cannot represent numbers that are non positive and we use that everywhere. The point where you construct the value (i.e., the parser) can fail if you try to construct it from a wider value that the type can't represent. We don't "validate" the number, we just don't have a way to represent it, so the parser fails.

[-]

davidalayachew@reddit

A failing parser could be used as a validator, but adding the word "just" in there strongly implies that you could build a validator and a parser where she's drawing a pretty clear distinction between the two concepts. They both might need to do some of the same work at a high level, but they're not the same.

Then let me be explicit in my language -- I view validators as a pure subset of parsers, with the only caveat being that validators don't return a stronger type, just the same type (and usually, the same value).

So, I agree that she is drawing a line, but instead of it being a line in the sand, where the left is validators and the right is parsers, I think she is drawing a circle in a Venn Diagram, where everything a validator can do, a parser can do, and more. As in, the validator circle is completely encircled by the outer circle for parsers.

Therefore, the phrase "don't just validate" is telling the programmer to go beyond validating, pointing specifically to parsing.

I'll point to an example in Java.

Java is in the middle of adding null-awareness to its type system. One of the JDK library methods that will change once it is added is the basic library method Objects.requireNonNull(T input).

Right now, this method accepts T and returns T -- your stereotypical validator.

But once Java adds null-awareness to the type system, this method's signature is going to change to accepting T? and returning T!, where ? means null-possible and ! means null-impossible.

This is what I mean by saying that all validators are parsers. And even the validators that don't explicitly return the value (returning void for example) are still parsers because, after validating input, you will use it somewhere else after validating. So, in effect, you very much are "returning" input, even if the method signature does not reflect that.

Perhaps this is a limitation of languages you're working in or something,

I spent about a year programming Lisp and 2 for Haskell, so I doubt it.

I found it very valuable to eradicate the notion of "validation" from programmer's brains as being a good idea in the first place.

I somewhat agree, but imo, I feel like validation is more of an incomplete half-measure. Parsing is validation + transforming in my eyes. So, to just validate only makes sense for the most primitive of settings -- quick scripts or a barebones microservice that does nothing important. It certainly has its uses, but those are the type of uses I mean.

For all other cases, parsing is strictly superior.

[-]

dlsspy@reddit

> I view validators as a pure subset of parsers, with the only caveat being that validators don't return a stronger type, just the same type (and usually, the same value).

I think I can see that, but it still seems to be in conflict with the actual blog post we're discussing. I've not programmed in java in probably a couple decades, but the requireNonNull thing does seem to be a blurry validator.

The first result I found looking for it was using it strictly for its side effect of throwing an exception on null. This is how a typical validator is expressed and what the article is discussing. It either stops execution or doesn't, but it carries no information forward, so you have to assume that the validation didn't occur and must occur again.

I don't think that's a point of disagreement.

The disagreement seems to be around the idea that a parser has an implicit validation component which is where I think the mental model starts falling apart (as well as diverging from the article).

e.g., when I'm writing a parser for String -> Maybe Int, I don't "validate" the String nor do I "validate" each Char as I parse the individual digits. A naive implementation might just `traverse` the String with a `Char -> Maybe Int` function and then fmap a fold of that into the `Maybe Int`. At no point does one need to think about "validation" here. I can do a naive pattern match with a bunch of '0' -> Just 0; '1' -> Just 1; ...; _ -> Nothing matches and either the "less structured" input can map to the "more structured input" and we complete the parse, or we there's no match and we don't get a result.

That is pure and simple parsing and is not a superset of validation. The concept of validation isn't considered. Validation, to me at least, is an attempt to prove some input is incorrect. Besides not carrying information, it's typically incomplete. A parser finds the valid value that its input is hoping to represent if there is such a thing.

I don't think you're completely incorrect in your mental model, but I think a simpler mental model is a lot easier to manage and also happens to match the blog post.

I also understand that not all people have the same mental models with concepts. e.g., product people have tried to get me to view bugs and feature requests as fundamentally different concepts to the point of wanting me to use different systems with different priority mechanisms to track them, and seemingly get frustrated when I say "so, the software doesn't work as desired?"

The main difference I care here, though, is that while we both agree the article is great and everybody who works in software should read it and they'll all be better, I think it's slightly more correct than you do.

[-]

davidalayachew@reddit

Validation, to me at least, is an attempt to prove some input is incorrect. Besides not carrying information, it's typically incomplete.

Then this is the source of our disagreement. I feel the complete opposite.

Let's start with a dictionary definition.

Validate

To establish the soundness, accuracy, or legitimacy of: synonym: confirm.

Prove valid; show or confirm the validity of something.

Valid

Well grounded; just.

"a valid objection."

Producing the desired results; efficacious.

"valid methods."

All of these definitions, to me, spell out "prove correct", as opposed to "prove incorrect". One might even say "prove valid". The fact that "confirm" is a synonym reinforces that point to me.

That's an important distinction to make because of the next part of your quote.

A parser finds the valid value that its input is hoping to represent if there is such a thing.

[emphasis mine]

Finding the valid value sounds very much like confirming that the value is valid. Which is what the above definition is saying.

With this new definition of validation, let's rebase it against your number parsing example.

A naive implementation might just traverse the String with a Char -> Maybe Int function and then fmap a fold of that into the Maybe Int. At no point does one need to think about "validation" here. I can do a naive pattern match with a bunch of '0' -> Just 0; '1' -> Just 1; ...; _ -> Nothing matches and either the "less structured" input can map to the "more structured input" and we complete the parse, or we there's no match and we don't get a result.

Your "naive pattern matching" example does 2 things in my eyes.

Confirms that the result is a valid value.
Transforms the valid value into a new value.

That first step is the validation -- confirming that our input data is correct/usable/valid. Hence my point -- this is validation under the hood of pattern matching.

And maybe to help disambiguate, lets look at transformation separately.

Imo, Transformation is a complete function where every value in the input is known at compile time to have a matching output value. Basically a T -> T, no Maybe needed.

A good example of this is your typical grade school cypher, where kids have to translate a message by using a character transformer.

char transformer(final char c)
{
    return c + 42;
}

Due to integer overflow/wraparound, this function is (provably) guaranteed to have a matching output value for every possible input value it could receive.

I highlight this because I believe this is what it means to actually have no validation. Aside from the initial type test/overload-method-selection, there is no validation at all done here -- simply a basic transformation.

That's different than parsing, which requires an upfront validation check that your data is valid in the first place. If no check was needed, then it wouldn't be parsing, it would just be transformation alone.

Lastly, I think there's value in zooming out, and looking at the intent of these concepts, not just their semantics.

Like I mentioned, Transformation is a complete function that is guaranteed to map all inputs to an output. That completeness is a powerful trait to have. But knowing when and where it is safe to apply that transformation function is the difficulty. Not all values should be transformed, whether or not there is a representable output for them.

And that's why parsing is effective -- it combines Validation with Transformation to get the best of both worlds. We filter down our input domain with validation to only the valid input values, then unconditionally transform them using a transformer. Parsing.

You mentioned pattern-matching, which is great, because that's another example of this "best of both worlds" design.

If parsing is "validate, then transform if valid", then pattern-matching is "test, then extract if test passes". That test could be of many different forms -- a type test, a value test, etc. But this same trick of combining conditionality and totality is how we get the best of both worlds -- we use conditionality to cover the weaknesses of the other half.

The main difference I care here, though, is that while we both agree the article is great and everybody who works in software should read it and they'll all be better, I think it's slightly more correct than you do.

Hopefully you see now why I feel it is less correct than it could be? I feel like the title throws away useful information that could be retained (lol) to enhance the essay.

Now, you could argue that what I am saying doesn't align with the essay, but imo, the essay just doesn't make the connection, as opposed to disagreeing with me. Nothing I see in the article seems to contradict with what I am saying. If anything, I kind of feel like she sort of agrees with what I am saying, and just doesn't explicitly highlight it (or notice it).

[-]

dlsspy@reddit

Validation, to me at least, is an attempt to prove some input is incorrect. Besides not carrying information, it's typically incomplete.

Then this is the source of our disagreement. I feel the complete opposite.

I don't think that's very fundamental. it's generally more straightforward to prove something wrong than prove it right. An email address "validator" can easily find things that can't possibly be an email address, but it can't (in isolation) validate that an email address is correct.

You seem to believe that parsing is a superset of validation in which case it's still redundant to say tell people to validate /and/ parse. If parsing requires verifying things are valid, then it's at least confusing to tell people they need to do validation in addition to parsing.

From that perspective, I can see how you might call my example number parser a "validator", but only if you really want to. I just see it as the easiest way to find which digit a character represents allowing for a "not applicable" case. My mental model didn't include validation, just fallible transformation.

Hopefully you see now why I feel it is less correct than it could be?

I think I see why you feel that, but I still think you shouldn't and it's more productive if you understand why the article is suggesting that you don't look at things this way. Your mental model is in disagreement with the article and you're telling people that that it needs to be corrected.

I still agree with the article and want more people to think about using parsers instead of validators.

[-]

davidalayachew@reddit

An email address "validator" can easily find things that can't possibly be an email address, but it can't (in isolation) validate that an email address is correct.

This is conflating the business definition of correct (is the email the "correct" email) with the programmatic definition (is this a structurally valid email address?).

At the end of the day, if you want to prove that an email is valid in the business sense, the only fool-proof way is to send a confirmation email. But, in the name of saving bandwidth (and catching mistakes early on), we created this programmatic domain object of email, which we can assert various claims about. One of which is confirming that the email address follows at least a minimum level of quality (contains the @ symbol).

So yes, we can validate that an email address is "correct" to the level of quality that we ask of it in the code.

You seem to believe that parsing is a superset of validation in which case it's still redundant to say tell people to validate and parse. If parsing requires verifying things are valid, then it's at least confusing to tell people they need to do validation in addition to parsing.

Oh, I'm not saying it is perfect, I am saying it is better. At the end of the day, no matter how you slice it, MANY people thought her article had a very confusing and unclear title because they thought that it was saying that we should not validate at all.

By all means, maybe my title could be improved, but at the very least, it would prevent the confusion that so many others had with her existing title, and thus, would be an improvement.

Plus, worst case scenario, someone thinks "wait, parsing involves validating!", which is a better state of confusion to be in than "wait, we shouldn't validate data?!", which many people unironically read the title as. I consider mine as an improvement from hers.

From that perspective, I can see how you might call my example number parser a "validator"

I don't think your example is a validator -- I think your example includes validation. But it also includes transformation, thus making it a parser.

I just see it as the easiest way to find which digit a character represents allowing for a "not applicable" case. My mental model didn't include validation, just fallible transformation.

Well, back to the definitions I pointed out -- you are very much doing validation, according to the dictionary. It's just that you are also doing transformation, thus making your example a parser.

Parsing is, in effect, "fallible transformation". In the same way that Pattern-Matching is "conditional extraction".

the article is suggesting that you don't look at things this way. Your mental model is in disagreement with the article and you're telling people that that it needs to be corrected.

Can you point out where? I don't see it.

[-]

dlsspy@reddit

I think somewhere between

"the difference between validation and parsing lies almost entirely in how information is preserved"

and

"the precise definition of what it means to parse or validate something is debatable"

So I guess we're at the debate portion.

I would still argue that if you're in a situation where "MANY people thought her article had a very confusing and unclear title" then those people are the ones who have the most to learn. Just because one doesn't understand something, that doesn't mean the thing is wrong. This article would have much less value if everyone already thought this way.

I'm probably also a bit sensitive to this because I've heard the same argument against things like "Black Lives Matter" where people will try to argue that the three words that it's easy to write on things and chant and stuff don't convey the detailed nuance of the meaning and it allows other people to willfully misinterpret it (e.g., "wait, are you saying only black lives matter?").

People are going to find a way to misinterpret anything. If someone reads this article and comes away from it with "you should allow invalid input into your application" then that's kind of a choice, but the example validators and classic validators I've seen and usually point people in code review to this article when I see them are in that same shape: a side-effect only exception source when something isn't good that will have to be placed all throughout your codebase. If that's the conclusion someone gets from reading just the title of an article without engaging with the article itself, then I'd kind of rather not work with those people.

I'd still encourage you (and anyone else) to internalize "parse, don't validate." If you catch yourself writing a validation function, you've got an opportunity to do better.

[-]

davidalayachew@reddit

Just because one doesn't understand something, that doesn't mean the thing is wrong.

The purpose of the title is to communicate about the content in a useful way. If it is adding confusion, to the point where people find that it hurts their comprehension of the article, then that is a problem. Hence my point -- the title could be improved, and I think mine was an improvement, if not perfect.

I've heard the same argument against things like "Black Lives Matter"

I'd assert 2 differences.

BLM was to declare that Black Lives Matter, but they are currently being treated like they don't. The confusion, frankly, helped it have more mainstream exposure. And if your intent is to draw more attention to something, one could argue that keeping things simple was the right choice, even at the expense of clarity.
BLM was not meant to be a vehicle for precise, political discussion. It was to draw attention to the flaming dumpster fire that is treatment of black people in america (and other countries). Saying the "dumpster is on fire!" is a little different than trying to invite a deeply technical, nuanced discussion.

Regardless, I see your point. I just think that the modes of communication are different. One is a siren to an extremely urgent problem. The other is trying to convince others of a fairly novel discovery (for non-FP folks).

People are going to find a way to misinterpret anything. If someone reads this article and comes away from it with "you should allow invalid input into your application" then that's kind of a choice

This, I agree with. I think there were comparatively few people who thought that after reading the article. My only point is that, for folks with limited time and want to skim, the title hurts the comprehensibility for at least a decent chunk of programmers.

I'd still encourage you (and anyone else) to internalize "parse, don't validate." If you catch yourself writing a validation function, you've got an opportunity to do better.

Well, the reason why I like the "don't (just) validate" is because I am not just thinking about parsing.

Like I mentioned, this idea of combining 2 ideas together to cover each others weaknesses is a powerful concept in programming. And there are a few avenues where that has been explored, like parsing and pattern-matching. But there are others where it is (comparatively) uncharted, like typeclasses and functors.

For me, I don't even think about the "Parse" as much as I retain the "don't (just) validate" because then it allows me to think about what is the best fit for my situation at hand. Maybe it's parsing? Or maybe I should try something novel?

Me personally, I think the real spirit of her article is to see your problems for what they are, and adapt your design strategy accordingly. And yes, parsing is often an excellent choice. But parsing was discovered because it is an excellent choice, and not because it is the de-facto answer.

I think that we, as programmers, should focus on finding the best choice for the problems at hand, and getting into the spirit of looking past our strategies and seeing things from a zoomed out view is the best way to get their.

That is why I prefer internalizing "don't (just) validate", even though I have enough context to understand either way. My version encourages a more creative mindset from me, which helps me dig for and find some pretty cool solutions.

[-]

dlsspy@reddit

If I disagreed with you more, this would be a shorter and less pleasant conversation. I'm glad people have different perspectives at least. Would be boring if everyone agreed with me on everything.

[-]

davidalayachew@reddit

If I disagreed with you more, this would be a shorter and less pleasant conversation. I'm glad people have different perspectives at least. Would be boring if everyone agreed with me on everything.

I can agree to that.

Ty for your time.

[-]

max123246@reddit

Gotta say most programmers do not understand that typically reach for implicit assumptions rather than codifying invariants into the type system

[-]

Chii@reddit

It does take a lot more effort to program if you need to codify those invariants. If you don't care about the craft and is just looking to shit something out that mostly works...

[-]

max123246@reddit

In the words of Richard Gabriel:

(Worse is Better)[https://en.wikipedia.org/wiki/Worse_is_better]

And of course in the words of Richard Gabriel:

(Worse is Better is Worse)[https://www.dreamsongs.com/Files/worse-is-worse.pdf]

[-]

glenrhodes@reddit

The TypeScript angle here is interesting because the language actively works against you. You can do parse-don't-validate with Zod or io-ts, but now you're fighting two type systems simultaneously. Haskell makes this basically free with newtype and smart constructors; TypeScript makes you earn it.

[-]

BenchEmbarrassed7316@reddit

Good article. For me, the as operator in TypeScript is the equivalent of unsafe in Rust (with a subsequent call to transmute).

[-]

hasparus@reddit

Nice article, but I think it's missing an arktype shout-out. I feel it's the most typescript-y and one of the best performing alternatives.

[-]

george_____t@reddit

Worth noting that Alexis' follow-up post points out that this sort of nominal ("extrinsic") type safety is a lot weaker than the structural ("intrinsic") version that she mostly had in mind.

[-]

george_____t@reddit

In Elm I’d reach for an opaque type and a smart constructor and be done in about four lines.

And then deal with the fact that I can no longer use that type as a key to a dictionary, etc... I don't know how people can bear that language. At least now that Haskell can be compiled to WebAssembly, there's a serious alternative.

[-]

elperroborrachotoo@reddit

I'm not so much against the principle as I'm irrationally pissed off by the examples.

This lists various incomplete attempts at validating an e-mail through an regexp. We've long agreed that the only sane way to verify an e-mail is to request information sent to it. Even if that's not possible, verifying that it contains an @ is at best a UI hint in data entry.

(Oh, and mail servers may treat the local part case-sensitive, FWIW.)

What's the worth of a "validated" e-mail address that's not really valdiated?

Storing an age? Admittedly, some software has become very short-lived, but it's not that bad yet, isn't it?

An arbitrary upper limit, while unlikely to be reached at least in the near future, still recalls all the problems of storing two-digit birth years. To complicate matters, in some cases a valid lower age may depend on region or regional legalities, somethign that cannot be reasonably expressed in a parsed type.

My gripe is:
What does type Email express? Something that looks like an email to the famous moron in a hurry? Ad-hoc validation examples make it look like it's okay to pass on invalid addresses as valid, or - worse - reject valid addresses as invalid. Are all the "Falsehoods programmers believe..." in vain?

Disclosure: I dont have a better simple, inutitive example handy.

[-]

evincarofautumn@reddit

I guess the reason email addresses are appealing as an example is that they’re both widespread and more complicated than you might think.

But as far as I’ve seen, usually in these types of articles, the end result remains a string internally, which is still discarding information. Merely wrapping something in a newtype does add some type safety, but if all you do is pull it apart again and do string stuff to it, it’s just ceremonial.

What I’d like to see instead is an AST. The email address string is just a compact serialisation format for that data structure.

Now, emails are still not a great example, because there’s rarely an actual reason to parse the structure of the address in that way. But at least this makes it plain what the point of “parse, don’t validate” is: to transform the input into a format that can only represent valid values.

[-]

Nwallins@reddit

the end result remains a string internally, which is still discarding information. Merely wrapping something in a newtype does add some type safety, but if all you do is pull it apart again and do string stuff to it, it’s just ceremonial.

Not to my reading, as the only way to have the newtype is having gone through the parse/validation function. It may be a string, but it is guaranteed to no longer be an arbitrary string.

[-]

evincarofautumn@reddit

That’s true, as long as it’s encapsulated. What I mean is that you discover internal structure of a value through parsing, and if you discard that and only keep the Boolean “yep, it’s valid” encoded by the newtype constructor, then the value needs to be reparsed when you want to actually use any of the structure. Sometimes yes/no is all you need, but not for most of the things I parse.

[-]

Nwallins@reddit

Let's take a simplified email address. If you are saying that that ValidEmailAddress should have a name_portion and host_portion, then yes, I and parse-dont-validate completely agree, to the extent that the program needs to operate on either portion. But name_portion and host_portion remain strings. And if the system doesn't use the portions and only entire addresses, then splitting is unnecessary.

[-]

umtala@reddit

Here's a good example. Let's say you have JSON where the value is stored as a digit string:

{
  "amount": "123456789012345678901234567890"
}

JSON has no bigint type so you have to use a string. What you can do is make a Zod type that parses this digit string and turns it into a bigint:

z.object({
  amount: z.regex(/^-?[0-9]+$/).transform(s => BigInt(s))
})

The input type is { amount: string }, the output type is { amount: bigint }.

A validation approach would require first validating the shape of the JSON, then transforming the amount into the type you want. In practice this tends to be error-prone especially if you have to do it more than once.

Parsing skips the intermediate validated-but-wrong-type step and lets you go directly to the type that you need.

[-]

lelanthran@reddit

What's the worth of a "validated" e-mail address that's not really valdiated?

As a value? None (Other than to warn the user that the "email" they typed in is invalid).

As a type? All the value that every other type has.

Compare:

void foo (const char *email, const char *password) { ... }

with

void foo (email_t *email, password_t *password) { ... }

Can you not see the value in preventing the caller of foo from accidentally swapping the email and password when calling foo?

You're thinking of "validation" only in terms of "Validate this value" (which is, to be fair, what 'Parse, don't validate' calls validation), but there is value in storing types distinct from each other, even if they use the same underlying representation.

In the latter case, you're leaning on the languages strong typing rules (like in the C examples above) to ensure that emails, once they get into the system, are never going to be accidentally treated as any other string.

[-]

umtala@reddit

Can you not see the value in preventing the caller of foo from accidentally swapping the email and password when calling foo?

No. The solution to naming mishaps is object property shorthand and consistent naming across your codebase. e.g. in JS:

const email = "alice@example.com"
const password = "hunter2"
foo({ email, password })

Look ma, no mixing possible!

Types are good at ensuring that data is of the right shape. Types are not good at distinguishing one string from another string, and every attempt to use types for this tends to lead to excessive boilerplate, boxing and unboxing of values.

This is one of the things that JS and Rust get right and other languages are yet to catch up with.

[-]

lelanthran@reddit

The solution to naming mishaps is object property shorthand and consistent naming across your codebase. e.g. in JS:

Your solution requires that all the devs practice discipline all the time.

The PdV approach requires that the single dev responsible for data ingress practice discipline.

Types are good at ensuring that data is of the right shape.

It seems to me that you are arguing that types should not carry semantic information, but that information should be carried by the variable names, correct?

[-]

elperroborrachotoo@reddit

Can you not see the value in preventing the caller of foo from accidentally swapping the email and password when calling foo?

That's strong typng alright, but has nothng to do with validaiton vs. aprsing.

As I said, I am not arguing against the principle, I'm just irrationally angered by the quality of examples.

[-]

lelanthran@reddit

That's strong typng alright, but has nothng to do with validaiton vs. aprsing.

I'm saying there is a distinction between validating the value and validating the type.

Of course it is; your compiler is validating the type, so that you cannot accidentally use one string type when you meant to use another string type.

(Also, the complaint I always see about "Email is not validated unless you receive a reply when you send the activation link", is a trite and thoughtless one. A moment's reflection would reveal that that is true for almost all contact information, and yet throughout the decades, we still stored it, didn't we?)

[-]

elperroborrachotoo@reddit

let me rephrase: what does "parse don't type" add over "use strong types"?

[-]

lelanthran@reddit

Assuming you mean 'Parse, Don't Validate'...

Using a strongly-typed system does not mean that you are using types aligned to values entering the system from the outside.

"PdV" adds correctness guarantees within your system; it's effectively saying "if a foo_t is ever seen within the system, the validation for it was already run and it is safe to treat as a foo_t".

[-]

bannable@reddit

I'm going to assume that your misquoting of "parse, don't validate" was an honest error.

For whatever definition you want to use of the term, "strong" is not a trait that applies to a type. It applies to a type system.

const foo: any = ... is, by some definitions of "strong", strongly typed. It's not a useful type, but the type is there.

So that's the difference: Parse your data into structured types, and don't confuse deser for parsing - using a typed language alone will not save you.

[-]

RecursiveServitor@reddit

Typed ids is a good example of this, where there may be no validation of the value, but we wrap it in a type so the compiler can help with correctness.

[-]

rsclient@reddit

I liked the examples because they were short, and agree that different examples might have been better.

But my disagreement was how the age had to be more than zero. In this day and age, don't parents pre-register an email for their kids? Even before birth?

We can certainly all agree that everything having to do with people is painful. What if a person doesn't have a nationality? Or a known birthday? Or an address?

[-]

anon_cowherd@reddit

There's one good reason: in the UI, making sure the user didn't accidentally type something like their first name instead of an email address.

There's "this must be a real world email address" and there is "this string must match the format of an email address".

The send email function is a bad example to use, because yes a user shouldn't be validated by presence of email alone, but it is at least an easily comprehensible example.

[-]

elperroborrachotoo@reddit

you night as well ask what the value of branded / new types is

I'm not asking that, really. I'm wonderng what "parse don't validate" adds to strong semantic types.

What's the actual guarantees type Email should make?

can't be assigned from string
has passed through EmailFromString function
not empty, contains an @
matches RFC 5322 spec
was verified at least once
was verified "recently"
...

Isn't that a very central question?

[-]

anon_cowherd@reddit

That depends entirely on your domain language. This level of typing gets into the whole DDD paradigm where the whole business has an agreed upon vocabulary.

At the very least:

Bullets one is true two is sufficient though not strictly necessary (there could be multiple parsers that produce an Email) Three and four are implementation details Five and six are states of a combination of a user and an address.

Consider the types as categories: what distinguishes the category of email addresses from the broader category of strings? The verification state is relevant to a specific user at a point in time (users can change email addresses, and they can be recycled among many users) but isn't relevant to the quiddity of an email address itself.

[-]

Tubthumper8@reddit

Maybe a better example would be a phone number?

A PhoneNumber type would carry both the country code, the rest of the digits, and possibly an extension code - as well as the fact that the country code exists and the digits follow a valid format for thar country code.

Unless of course you'd say that PhoneNumber isn't actually valid until the system has called that number and someone answered, so that example might have the same flaw as Email

[-]

T_D_K@reddit

There's value in all of the following:

Better information in apis and function signatures
Eliminating copious amounts of null and empty string boilerplate
Encoding and enforcing business rules with the compiler
Making sure that your email address is plausible before you spend vendor credits on something that will obviously fail

Validating that email is never going to be perfect, but doing as best you can is a lot better than giving up.

[-]

nculwell@reddit

The link to the original article is dead, here's a working link:

https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/

[-]

cekrem@reddit (OP)

Thanks! I'll update right away!

[-]

ggchappell@reddit

Thanks for that.

[-]

max123246@reddit

I'd love to see a similar post for Python. They've added some really nice stuff like with the match statement and dataclasses but there's so many options that I don't know what to reach for when building a library. Like there's protocols, ABCs, just declaring a type as a union of other types, creating an enum, dataclasses...

[-]

Deep-Thought@reddit

I get the point of the post, and agree with it in most cases. But validation still has a place when a type system is not robust enough to model all the requirements of the type. Think of cases where validation rules span several fields of the request.

[-]

davidalayachew@reddit

I get the point of the post, and agree with it in most cases. But it is never good to be too dogmatic. Validation still has a place when a type system is not robust enough to model all the requirements of the type. Think of cases where validation rules span several fields of the request.

I agree with your point, but the example isn't great.

Parsers compose. Meaning, you can put a parser into a parser into a parser into a parser. And if the inner parser fails, then the outer parser fails. Conversely, if the inner parser succeeds, but the outer parser fails, then the value of the inner parser is just "thrown away".

In your example, even if the individual fields parse correctly, but the overall request does not, well, no problem -- you just throw an exception instead of returning your parsed object.

But like I said, your point is still correct. Java just recently added Pattern-Matching, but are still working on adding some of the nice-to-haves that usually come with it. As a result, asserting certain validations about your type are either awkward or prohibitively verbose to do. In those cases, simple validation would probably still be the better net tradeoff, until the nice-to-haves get released later.

[-]

sailing67@reddit

ive been burned by this so many times in typescript. you add a zod schema to validate something and think youre done, but the type is still string | undefined downstream and you're basically validating in one place and asserting in another. switching to parsing-first made my code so much easier to reason about tbh. less defensive checks everywhere.

[-]

Nephophobic@reddit

While I agree with the post, two things:

Use type guards in Typescript and not validators (i.e. boolean-returning functions)
If you're using discriminating unions in Typescript but not ts-pattern, you're missing half of the solution!

[-]