Plain text has been around for decades and it’s here to stay.
Posted by Successful_Bowl2564@reddit | programming | View on Reddit | 55 comments
Valmar33@reddit
"Plain text" is not some "simple" thing. There's more to the world than ASCII. There's other languages to deal with, requiring the unfortunately necessary monstrosity of complexity that is the UTF-8 standard. Besides that, "plain text" is just another form of binary. For a processor, "plain text" is just more bundles of bits.
Smallpaul@reddit
How does this comment respond to the article?
Humble-Vegetable9691@reddit
There is no plain text. There is always a thing called encoding. It is how my CP 852 lost the double box drawing characters.
Smallpaul@reddit
Plain text is an abstraction above text encoding and below markup languages, programming languages, text based protocols, etc.
It is an abstraction with multiple instantiations due to character set and encoding.
Humble-Vegetable9691@reddit
You are still thinking in EN-US.
Smallpaul@reddit
Name a country where what I said is not true and explain how, with concrete examples.
Humble-Vegetable9691@reddit
Open https://www.ascii-codes.com/ in one tab
Open https://www.ascii-codes.com/cp852.html in another tab
go to the bottom of both pages
Check that your """plain""" 185 looks like my 185? 189? 198?
Trust me, your """plain""" text boxes with combined single and double boxes are looking quite disgusting on my screen.
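For anyone who would rather check this in code than in two browser tabs, here is a minimal sketch. It assumes Python's built-in cp437 and cp852 codecs match the tables on those pages; the helper name compare_code_pages is made up for illustration.

```python
def compare_code_pages(byte_values, first="cp437", second="cp852"):
    """Decode each single byte under two code pages and return the results.

    Hypothetical helper: returns a list of (byte value, char in first, char in second).
    """
    rows = []
    for value in byte_values:
        raw = bytes([value])
        rows.append((value, raw.decode(first), raw.decode(second)))
    return rows


if __name__ == "__main__":
    # The byte values mentioned in the comment above.
    for value, in_437, in_852 in compare_code_pages([185, 189, 198, 235]):
        marker = "same" if in_437 == in_852 else "DIFFERENT"
        print(f"{value:3d}  cp437={in_437!r}  cp852={in_852!r}  {marker}")
```

The same bytes on disk produce different characters depending on which code page the reader assumes, which is the whole complaint: the file does not say which one was meant.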
Also, "E8329BFD4697D9EC37" is hellohello but in classic SMS PDU (7-bit alphabet). It is quite plaintext regarding the long accented letters of my alphabet ;) (look at 235 on the CP 852 linked above)
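A rough sketch of how that hex string unpacks, assuming the usual SMS convention of packing 7-bit septets least-significant-bit first. The function name unpack_gsm7 is hypothetical, and mapping septets with chr() is a shortcut that only works where the GSM default alphabet coincides with ASCII (it does for lowercase a-z); a real decoder needs the full GSM alphabet table.

```python
def unpack_gsm7(hex_pdu, septet_count):
    """Unpack packed 7-bit septets from a hex string (illustrative only)."""
    data = bytes.fromhex(hex_pdu)
    # Treat the bytes as one little-endian bit stream, then slice 7 bits at a time.
    bits = int.from_bytes(data, "little")
    septets = [(bits >> (7 * i)) & 0x7F for i in range(septet_count)]
    # Assumption: every septet is in the range where GSM 7-bit equals ASCII.
    return "".join(chr(s) for s in septets)


if __name__ == "__main__":
    print(unpack_gsm7("E8329BFD4697D9EC37", 10))
```

Ten characters fit in nine bytes, so even this "plain text" cannot be read without knowing both the packing and the alphabet.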
Valmar33@reddit
This got to me a little
Potent? Eh...
tav_stuff@reddit
UTF-8 is not a ”monstrosity of complexity” – it’s actually very simple. The complexity comes from Unicode
claytonkb@reddit
I adapted a UTF-8 library for use in my own C string processing library... it was a breeze. UTF-8 is refreshingly straightforward in an ocean of character "standards" that seem to be ever-shifting...
tav_stuff@reddit
Yeah it’s really great. I remember writing UTF-8 code for the first time in C and being amazed at how easy it was.
Then I wrote a Unicode library for C… and oh boy was that a rabbit hole. Took me 2 months to go through everything, and it was crazy complicated
sacules@reddit
Yeah UTF-8 is so simple it fit on a napkin :) thanks to Ken Thompson and Rob Pike for that implementation.
Ameisen@reddit
I mean, a lot of very complex stuff fits on a napkin.
Most people don't really understand quaternions, yet the quaternion formulas fit on a small napkin.
levir@reddit
You could write Maxwell's equations on a napkin, describing the electromagnetic force in its entirety.
sacules@reddit
Yeah they were iterating on how to implement this and the solution seemed to fit on a napkin they forgot at a restaurant.
squigs@reddit
I do wonder whether we'd have come up with something a little tidier than UTF-8 if we'd planned variable length encodings earlier.
Enerbane@reddit
HUH??
I'm having a hard time with this. I'm sure there's probably a point in here that can be defended but...
What exactly do you think the "U" in UTF stands for??
Unicode Transformation Format is very simple but Unicode is complex?
Tau-is-2Pi@reddit
UTF-8 only cares about encoding the Unicode codepoints into 1 to 4 bytes. It's basically just a variable-width integer format. It's very straightforward.
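To make the "variable-width integer format" point concrete, here is a minimal sketch of an encoder for a single code point. The function name utf8_encode is made up for illustration; in practice you would just call str.encode("utf-8").

```python
def utf8_encode(code_point):
    """Encode one Unicode code point as 1 to 4 UTF-8 bytes (illustrative sketch)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if code_point < 0x80:
        # 0xxxxxxx: plain ASCII, one byte
        return bytes([code_point])
    if code_point < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


if __name__ == "__main__":
    for ch in "Aé€😀":
        print(ch, utf8_encode(ord(ch)).hex())
```

That is more or less the entire encoding side. Everything hard (normalization, grapheme clusters, collation, bidirectional text) lives in Unicode itself, not in this bit-shuffling.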
boa13@reddit
Exactly. The Transformation Format is well thought-out and rather simple. What it transforms, Unicode entities, is very complex.
locoluis@reddit
Yes. UTF-8 is just an algorithm used to transform Unicode characters to sequences of 8-bit bytes.
Valmar33@reddit
Ah, I might have mixed them up. Cheers. :)
FullPoet@reddit
How to spot OP is from USA :)
danielcw189@reddit
They clarified what they mean with "ASCII" at the end of the article
meganeyangire@reddit
I bet OP didn't even have to switch a keyboard layout even once in their entire life.
neuralbeans@reddit
If only it were simple! With all the different encodings, line endings, and byte orders.
Valmar33@reddit
Yeah, computers are not "simple", once you look past the layers of abstractions.
It's a miracle anything works even half-decently.
neuralbeans@reddit
A classical example of multiple competing standards surviving longer than necessary.
programming-ModTeam@reddit
Your post or comment was removed for the following reason or reasons:
Your posting was removed for being off topic for the /r/programming community.
Listicles are not allowed on /r/programming.
mmaldacker@reddit
the idea of plain text is to be easily pipeable in Unix; this is just stylised UI
daltorak@reddit
That's not "ASCII". Box drawing symbols are not ASCII characters. They were part of code page 437, and today are Unicode codepoints.
And a dirty little secret of modern terminal programs is that these symbols have a distinct drawing path. They don't use the symbols built into the font, otherwise you'd get gaps between lines because of line-height & zoom settings.
Example: Microsoft Windows Terminal Gets Glyph Rendering Improvements | Extremetech
xitiomet@reddit
Code page 437 was often branded as "Extended ASCII", which, while incorrect, was the most common use for 128-255. I'd say it's forgivable to call the original box drawing symbols ASCII.
otac0n@reddit
You have to pick something for the top half. This is pretty standard. More than excusable, this is common usage.
lerliplatu@reddit
Taking that argument to its extreme, UTF-8 is ASCII.
KevinCarbonara@reddit
Except not at all. That's not the same argument. That's an entirely different argument you've tried to shoehorn in.
KevinCarbonara@reddit
Previously known as "high ASCII".
It's completely fine to call it ASCII
Dwedit@reddit
Even the original text mode hardware had a distinct drawing path for box drawing characters.
For 80 column text, you either had the screen resolution at 640 pixels wide, or 720 pixels wide. For 720 pixels wide, you used a 9 pixel wide character cell. 8 pixels is the natural width for a font (1 bit = 1 pixel), but now you need 9 pixels instead. So you make the last column blank, except for one specific case: Characters C0-DF. Those instead repeat the 8th pixel.
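A small sketch of the rule described above, going only by this comment's description: each font row is 8 bits, the 9th pixel is blank, except for character codes C0-DF, where the 8th pixel is repeated. The function name expand_row_to_9_pixels is hypothetical, and real hardware made this behaviour configurable in ways not modelled here.

```python
def expand_row_to_9_pixels(char_code, row_bits):
    """Turn one 8-bit font row into 9 pixels for a 9-pixel-wide character cell."""
    # Most significant bit is the leftmost pixel.
    pixels = [(row_bits >> (7 - i)) & 1 for i in range(8)]
    if 0xC0 <= char_code <= 0xDF:
        # Line-drawing range: copy the 8th pixel so horizontal lines join up.
        ninth = pixels[7]
    else:
        # Everything else: the 9th column is blank spacing.
        ninth = 0
    return pixels + [ninth]


if __name__ == "__main__":
    # A full-width horizontal stroke in a box drawing character vs. a letter.
    print(expand_row_to_9_pixels(0xC4, 0b11111111))
    print(expand_row_to_9_pixels(0x41, 0b11111111))
```

Without the repeat, every horizontal box line would have a one-pixel gap between adjacent cells.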
daltorak@reddit
Yep, and just to keep things fun and exciting for global-minded developers, the IBM PS/55 (a mid-80s to early-90s Japanese-market version of the PS/2) had a 12x24 font mapped into a 12x29 space. The font couldn't even fit into the ROM (much larger number of glyphs than Latin languages), so IBM-compatible machines of the time had to use software rendering of text.
Dwedit@reddit
Japan had DOS/V, which provided software-rendered text modes that were really the 640x480 VGA graphics mode. Confusingly, it released around the same time as DOS 5.0, but the V is for VGA and not 5.
JEEZUS-CRIPES@reddit
Colloquially, many people understand ASCII through this context:
https://en.wikipedia.org/wiki/ASCII_art
mr_birkenblatt@reddit
They were originally "extended ASCII" before Unicode was a thing
SkoomaDentist@reddit
I'm reminded of people who complain that Windows NT used 16-bit characters instead of UTF-8 without realizing that the first Unicode standard is only from 1991 and UTF-8 from 1993.
There is a lot of computing that predates unicode.
danielcw189@reddit
to be fair:
they wrote this at the end:
jesus_was_rasta@reddit
Text is the universal interface since Unix, and always will be :D
Valmar33@reddit
Except when it isn't ~ always the caveat. Excellent for human-readable config, not so great for structured logging where you want to store all of the related metadata for debugging.
Honestly, files are the universal interface for Unix / Linux ~ nearly everything is abstracted to a file. But not necessarily textually-based.
ebkalderon@reddit
Yup. Just look at how Unix also brought us the kitchen-sink ioctl() API, for all those times where text and file manipulation isn't sufficient to accomplish a task.
valarauca14@reddit
It isn't that text is insufficient as plan9/inferno and arguably even powershell + com has demonstrated. You can throw yacc/bison/peg/etc. at just about anything and get a reasonable AST to validate. Sure, it is overhead, but if we want a hackable system it isn't "a lot" of overhead.
It is more-so different problem usecases. Plain text for the shell works, as you can just throw | sort -u k1 on the end of a command's output to de-duplicate information.
While on the other end, how does the "hackability" of plain text help me when configuring kTLS on a 400gbs card or de-duplicating identical extents on a BTRFS file system?
mechanicalpulse@reddit
Doubly so with Plan 9 from Bell Labs, where even the networking stack was file-based. It's nice to be able to use a single interface to get at everything, but the absence of typing can be a footgun.
One of the things I find interesting about PowerShell is typed objects in I/O. The Where-Object commandlet is akin to a structured grep. There's a similar pattern now in the *nix world that can be attributed to the growing availability of JSON output from tooling and the ubiquity of jq as a structured analogue of line-oriented unstructured text processing tools like awk and sed.
thats_a_nice_toast@reddit
I think a more interesting use case is taking notes. Writing things down in plain text feels liberating.
bzbub2@reddit
this topic could be explored much deeper than what was done here (just refers to ....diagramming, and links to web and gui tools for that), but, not a bad reminder
pydry@reddit
While true it's not a very interesting thing to point out.
g00berc0des@reddit
By extension neither was this comment pointing out that it wasn't very interesting, yet here we are. Adding a bit of entropy to the universe if only just to share opinions.
VictoryMotel@reddit
So controversial, so brave
holyknight00@reddit
It was always great to read and use, but it was a pain in the ass to create and maintain. Now we got it easier and made sense again.
RammRras@reddit
Saving this post. Both the article and the comments here are very helpful to me for my next personal project to have fun (and let it die after 2 weeks) ☺️
Ok_Issue_6675@reddit
Agreed :) - but plain text is not that plain. Actually very complex for many folks, especially when it comes to multi language and different symbols. In my opinion usually smart people use a lot of terminal work and plain text.