Plain text has been around for decades and it’s here to stay.

[-]

Valmar33@reddit

"Plain text" is not some "simple" thing. There's more to the world than ASCII. There's other languages to deal with, requiring the unfortunately necessary monstrosity of complexity that is the UTF-8 standard. Besides that, "plain text" is just another form of binary. For a processor, "plain text" is just more bundles of bits.

[-]

Smallpaul@reddit

How does this comment respond to the article?

[-]

Humble-Vegetable9691@reddit

There is no plain text. There is always a thing called encoding. It is how my CP 852 lost the double box drawing characters.

[-]

Smallpaul@reddit

Plain text is an abstraction above text encoding and below markup languages, programming languages, text based protocols, etc.

It is an abstraction with multiple instantiations due to character set and encoding.
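To make "multiple instantiations" concrete, here is a minimal Python sketch that encodes one abstract character under three encodings. The helper name `encodings_of` and the choice of the character "ő" are illustrative assumptions, not something taken from the thread.

```python
# One abstract piece of text, several byte-level instantiations.
# `encodings_of` is a hypothetical helper used only for illustration.
def encodings_of(text: str, codecs=("utf-8", "utf-16-le", "cp852")) -> dict:
    """Return the bytes produced for `text` by each named codec."""
    return {name: text.encode(name) for name in codecs}


if __name__ == "__main__":
    for name, raw in encodings_of("ő").items():
        # Every codec round-trips to the same text, but the bytes differ.
        print(f"{name:10} {raw.hex(' ')}  ->  {raw.decode(name)}")
```

The text is the same in every case; only the bytes on disk differ, which is why a file without a known encoding is ambiguous.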

[-]

Humble-Vegetable9691@reddit

You are still thinking in EN-US.

[-]

Smallpaul@reddit

Name a country where what I said is not true and explain how, with concrete examples.

[-]

Humble-Vegetable9691@reddit

Open https://www.ascii-codes.com/ in one tab

Open https://www.ascii-codes.com/cp852.html in another tab

go to the bottom of both pages

Check whether your """plain""" 185 looks like my 185. What about 189? 198?

Trust me, your """plain""" text boxes with combined single and double boxes are looking quite disgusting on my screen.
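The comparison can also be run locally instead of reading the two tables. This is a small sketch using Python's built-in `cp437` and `cp852` codecs; the helper name `decode_both` is an assumption made for illustration. With these codecs, 185 comes out as the same box-drawing character in both code pages, while 189 and 198 are box-drawing pieces in CP 437 but accented letters in CP 852.

```python
# Decode the same byte value under two DOS code pages.
# `decode_both` is a hypothetical helper, not part of any library.
def decode_both(value: int) -> tuple:
    """Return (cp437_char, cp852_char) for a single byte value 0..255."""
    raw = bytes([value])
    return raw.decode("cp437"), raw.decode("cp852")


if __name__ == "__main__":
    for value in (185, 189, 198, 235):
        us, ce = decode_both(value)
        marker = "same" if us == ce else "DIFFERENT"
        print(f"{value}: cp437={us!r} cp852={ce!r} ({marker})")
```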

Also, "E8329BFD4697D9EC37" is hellohello but in classic SMS PDU (7-bit alphabet). It is quite plaintext regarding the long accented letters of my alphabet ;) (look at 235 on the CP 852 linked above)
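For anyone who wants to check the SMS example, here is a minimal sketch of unpacking 7-bit septets from the hex string. It assumes the text uses only characters whose GSM 7-bit default alphabet codes coincide with ASCII (true for lowercase Latin letters), so it maps septets with `chr` rather than a full GSM table; the function name `unpack_gsm7` is made up for this illustration.

```python
# Unpack GSM 7-bit packed user data (as used in classic SMS PDUs).
# Assumption: every septet maps to the same code as ASCII, which holds for
# a-z but NOT for the whole GSM default alphabet.
def unpack_gsm7(hex_pdu: str, septet_count: int) -> str:
    """Decode `septet_count` 7-bit characters from packed hex octets."""
    data = bytes.fromhex(hex_pdu)
    # Septets are packed least-significant-bit first, so reading the octets
    # as one little-endian integer puts septet i at bits 7*i .. 7*i+6.
    bits = int.from_bytes(data, "little")
    septets = [(bits >> (7 * i)) & 0x7F for i in range(septet_count)]
    return "".join(chr(s) for s in septets)


if __name__ == "__main__":
    print(unpack_gsm7("E8329BFD4697D9EC37", 10))
```

Ten characters fit in nine octets here because each character only needs seven bits.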

[-]

Valmar33@reddit

This got to me a little

There is a certain power and longevity of monospace plain text that’s worth celebrating – not just because the file format is portable, but because text editing as interface is so well-known and potent.

Potent? Eh...

[-]

tav_stuff@reddit

UTF-8 is not a ”monstrosity of complexity” – it’s actually very simple. The complexity comes from Unicode

[-]

claytonkb@reddit

I adapted a UTF-8 library for use in my own C string processing library... it was a breeze. UTF-8 is refreshingly straightforward in an ocean of character "standards" that seem to be ever-shifting...

[-]

tav_stuff@reddit

Yeah it’s really great. I remember writing UTF-8 code for the first time in C and being amazed at how easy it was.

Then I wrote a Unicode library for C… and oh boy was that a rabbit hole. Took me 2 months to go through everything, and it was crazy complicated

[-]

sacules@reddit

Yeah, UTF-8 is so simple it fit on a napkin :) Thanks to Ken Thompson and Rob Pike for that implementation.

[-]

Ameisen@reddit

I mean, a lot of very complex stuff fits on a napkin.

Most people don't really understand quaternions, yet the quaternion formulas fit on a small napkin.

[-]

levir@reddit

You could write Maxwell's equations on a napkin, describing the electromagnetic force in its entirety.

[-]

sacules@reddit

Yeah, they were iterating on how to implement this, and the solution seemed to fit on a napkin they forgot at a restaurant.

[-]

squigs@reddit

I do wonder whether we'd have come up with something a little tidier than UTF-8 if we'd planned variable length encodings earlier.

[-]

Enerbane@reddit

HUH??

I'm having a hard time with this. I'm sure there's probably a point in here that can be defended but...

What exactly do you think the "U" in UTF stands for??

Unicode Transformation Format is very simple but Unicode is complex?

[-]

Tau-is-2Pi@reddit

UTF-8 only cares about encoding the Unicode codepoints into 1 to 4 bytes. It's basically just a variable-width integer format. It's very straightforward.
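As a rough illustration of the "variable-width integer format" point, here is a hand-written encoder for a single code point. It is a sketch for reading, not a replacement for `str.encode`; the function name `encode_utf8` is an assumption of this example.

```python
# Encode one Unicode code point as UTF-8 by hand (1 to 4 bytes).
# `encode_utf8` is a hypothetical name; real code should use str.encode().
def encode_utf8(cp: int) -> bytes:
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
        print(f"U+{cp:04X} -> {encode_utf8(cp).hex(' ')}")
```

The whole format is four bit-shuffling branches; nothing in it knows what the code points mean.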

[-]

boa13@reddit

Exactly. The Transformation Format is well thought-out and rather simple. What it transforms, Unicode entities, is very complex.
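One small taste of where the complexity lives, sketched with Python's standard `unicodedata` module: the same visible "é" can be one code point or two, and the byte encoding faithfully preserves that difference. The helper name `same_text` and the choice of NFC as the comparison form are assumptions of this example.

```python
import unicodedata

# Two different code point sequences that render as the same "é".
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(composed == decomposed)            # raw comparison
    print(same_text(composed, decomposed))   # comparison after normalization
    print(len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))
```

UTF-8 encodes both sequences without complaint; deciding that they are "the same text" is a Unicode-level question.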

[-]

locoluis@reddit

Yes. UTF-8 is just an algorithm used to transform Unicode characters to sequences of 8-bit bytes.

[-]

Valmar33@reddit

UTF-8 is not a ”monstrosity of complexity” – it’s actually very simple. The complexity comes from Unicode

Ah, I might have mixed them up. Cheers. :)

[-]

FullPoet@reddit

There's more to the world than ASCII.

How to spot OP is from USA :)

[-]

danielcw189@reddit

They clarified what they meant by "ASCII" at the end of the article.

[-]

meganeyangire@reddit

I bet OP has never had to switch keyboard layouts even once in their entire life.

[-]

neuralbeans@reddit

If only it were simple! With all the different encodings, line endings, and byte orders.
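A short sketch of two of those pitfalls, byte order and line endings, using only Python built-ins. The sample strings and variable names are arbitrary choices for illustration.

```python
# Byte order: the same two characters, two different byte layouts.
text = "A\n"
little_endian = text.encode("utf-16-le")
big_endian = text.encode("utf-16-be")

# Line endings: splitting on "\n" leaves a stray "\r" on DOS-style input.
unix_bytes = b"one\ntwo\n"
dos_bytes = b"one\r\ntwo\r\n"
naive_first_line = dos_bytes.split(b"\n")[0]
# str.splitlines() recognizes "\n", "\r\n" and "\r" as line boundaries.
tolerant_lines = dos_bytes.decode("ascii").splitlines()

if __name__ == "__main__":
    print(little_endian, big_endian)
    print(naive_first_line, tolerant_lines)
```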

[-]

Valmar33@reddit

If only it were simple! With all the different encodings, line endings, and byte orders.

Yeah, computers are not "simple", once you look past the layers of abstractions.

It's a miracle anything works even half-decently.

[-]

neuralbeans@reddit

A classic example of multiple competing standards surviving longer than necessary.

[-]

programming-ModTeam@reddit

Your post or comment was removed for the following reason or reasons:

Your posting was removed for being off topic for the /r/programming community.

Listicles are not allowed on /r/programming.

[-]

mmaldacker@reddit

The idea of plain text is to be easily pipeable in Unix; this is just stylised UI.

[-]

daltorak@reddit

That's not "ASCII". Box drawing symbols are not ASCII characters. They were part of code page 437, and today are Unicode codepoints.

And a dirty little secret of modern terminal programs is that these symbols have a distinct drawing path. They don't use the symbols built into the font, otherwise you'd get gaps between lines because of line-height & zoom settings.

Example: Microsoft Windows Terminal Gets Glyph Rendering Improvements | Extremetech

[-]

xitiomet@reddit

Code page 437 was often branded as "Extended ASCII", which, while incorrect, was the most common use for 128-255. I'd say it's forgivable to call the original box drawing symbols ASCII.

[-]

otac0n@reddit

You have to pick something for the top half. This is pretty standard. More than excusable, this is common usage.

[-]

lerliplatu@reddit

Taking that argument to its extreme, UTF-8 is ASCII.

[-]

KevinCarbonara@reddit

Except not at all. That's not the same argument. That's an entirely different argument you've tried to shoehorn in.

[-]

KevinCarbonara@reddit

That's not "ASCII". Box drawing symbols are not ASCII characters.

Previously known as "high ASCII".

It's completely fine to call it ASCII

[-]

Dwedit@reddit

Even the original text mode hardware had a distinct drawing path for box drawing characters.

For 80 column text, you either had the screen resolution at 640 pixels wide, or 720 pixels wide. For 720 pixels wide, you used a 9 pixel wide character cell. 8 pixels is the natural width for a font (1 bit = 1 pixel), but now you need 9 pixels instead. So you make the last column blank, except for one specific case: Characters C0-DF. Those instead repeat the 8th pixel.
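The ninth-column rule described above can be sketched in a few lines. This models only the behaviour as stated in the comment (blank ninth pixel, except that character codes C0-DF repeat the eighth); the function name `expand_row` and the bit layout, with the leftmost pixel as the most significant bit, are assumptions of this example.

```python
# Expand one 8-pixel glyph row to a 9-pixel character cell.
# Assumption: bit 7 is the leftmost pixel and bit 0 is the 8th (rightmost).
def expand_row(char_code: int, row_bits: int) -> int:
    """Return a 9-bit row; the lowest bit is the added ninth column."""
    if 0xC0 <= char_code <= 0xDF:
        ninth = row_bits & 1   # repeat the 8th pixel so lines join up
    else:
        ninth = 0              # every other character gets a blank column
    return (row_bits << 1) | ninth


if __name__ == "__main__":
    # 0xC4 is the horizontal line character in code page 437.
    print(f"{expand_row(0xC4, 0xFF):09b}")
    print(f"{expand_row(0x41, 0xFF):09b}")
```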

[-]

daltorak@reddit

Yep, and just to keep things fun and exciting for global-minded developers, the IBM PS/55 (a mid-80s to early-90s Japanese-market version of the PS/2) had a 12x24 font mapped into a 12x29 space. The font couldn't even fit into the ROM (much larger number of glyphs than Latin languages), so IBM-compatible machines of the time had to use software rendering of text.

[-]

Dwedit@reddit

Japan had DOS/V, which provided software-rendered text modes that were really the 640x480 VGA graphics mode. Confusingly, it released around the same time as DOS 5.0, but the V is for VGA and not 5.

[-]

JEEZUS-CRIPES@reddit

Colloquially, many people understand ASCII through this context:

ASCII art is a graphic design technique that uses computers for presentation and consists of pictures pieced together from the 95 printable (from a total of 128) characters defined by the ASCII Standard from 1963 and ASCII compliant character sets with proprietary extended characters (beyond the 128 characters of standard 7-bit ASCII).

https://en.wikipedia.org/wiki/ASCII_art

[-]

mr_birkenblatt@reddit

They were originally "extended ASCII" before Unicode was a thing

[-]

SkoomaDentist@reddit

I'm reminded of people who complain that Windows NT used 16-bit characters instead of UTF-8 without realizing that the first Unicode standard is only from 1991 and UTF-8 from 1993.

There is a lot of computing that predates Unicode.

[-]

danielcw189@reddit

to be fair:

they wrote this at the end:

(Caveat: These tools are “ASCII” in a colloquial sense, the same way people use “GIFs” to refer to a certain category of looping animations.)

[-]

jesus_was_rasta@reddit

Text is the universal interface since Unix, and always will be :D

[-]

Valmar33@reddit

Text is the universal interface since Unix, and always will be :D

Except when it isn't ~ always the caveat. Excellent for human-readable config, not so great for structured logging where you want to store all of the related metadata for debugging.

Honestly, files are the universal interface for Unix / Linux ~ nearly everything is abstracted to a file. But not necessarily textually-based.

[-]

ebkalderon@reddit

Yup. Just look at how Unix also brought us the kitchen-sink ioctl() API, for all those times where text and file manipulation isn't sufficient to accomplish a task.

[-]

valarauca14@reddit

It isn't that text is insufficient, as Plan 9/Inferno and arguably even PowerShell + COM have demonstrated. You can throw yacc/bison/peg/etc. at just about anything and get a reasonable AST to validate. Sure, it is overhead, but if we want a hackable system it isn't "a lot" of overhead.

It is more that these are different use cases. Plain text for the shell works, as you can just throw | sort -u -k1 on the end of a command's output to de-duplicate information.

On the other end, how does the "hackability" of plain text help me when configuring kTLS on a 400 Gb/s card or de-duplicating identical extents on a Btrfs file system?

[-]

mechanicalpulse@reddit

Honestly, files are the universal interface for Unix / Linux ~ nearly everything is abstracted to a file. But not necessarily textually-based.

Doubly so with Plan 9 from Bell Labs, where even the networking stack was file-based. It's nice to be able to use a single interface to get at everything, but the absence of typing can be a footgun.

One of the things I find interesting about PowerShell is typed objects in I/O. The Where-Object cmdlet is akin to a structured grep. There's a similar pattern now in the *nix world that can be attributed to the growing availability of JSON output from tooling and the ubiquity of jq as a structured analogue of line-oriented unstructured text processing tools like awk and sed.
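The difference between a line-oriented match and a structured filter can be sketched in Python. The sample records and the helper names `text_filter` and `structured_filter` are invented for this illustration; they are not PowerShell or jq APIs.

```python
import json

# Hypothetical tool output, one JSON object per line.
RECORDS = """{"name": "sshd", "cpu": 0.4}
{"name": "postgres", "cpu": 12.5}
{"name": "cpu-hog", "cpu": 97.0}"""


def text_filter(text: str, needle: str) -> list:
    """grep-style: keep lines containing the substring, wherever it appears."""
    return [line for line in text.splitlines() if needle in line]


def structured_filter(text: str, field: str, predicate) -> list:
    """Where-Object/jq-style: parse each record, then test one typed field."""
    records = (json.loads(line) for line in text.splitlines())
    return [rec for rec in records if predicate(rec[field])]


if __name__ == "__main__":
    # The substring "cpu" matches every line, because it is also a key name.
    print(len(text_filter(RECORDS, "cpu")))
    busy = structured_filter(RECORDS, "cpu", lambda value: value > 10)
    print([rec["name"] for rec in busy])
```

The text match cannot tell a field name from a field value, while the structured filter compares a number as a number.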

[-]