What UNIX Pipelines Got Right (And How We Can Do Better)

FlyingRhenquest@reddit

One of the big challenges of "doing better" is presenting data in a way that makes sense. IBM kinda tried to do it with OS/2, which used document object model (DOM), and the various objects in your GUI were actual objects that you could drag and drop and that would interact with each other. I saw a couple of attempts by various developers to make use of that, but they never really worked that well and the idea kind of fizzled out. GNOME also tried to do it, and I guess technically D-Bus does kind of try to implement it, but the only time I ever notice D-Bus is when it's making my Emacs windows hang on startup (which hasn't happened in a while, so either they fixed that or I'm not using Emacs in a way that causes the problem anymore). Funnily, IBM, which put a lot of cutting-edge stuff in OS/2, never really used any of those features in their own applications. A big portion of the company viewed OS/2 and Intel processors as toys that would never multitask as well as the mainframes did. Also funnily, my phone now multitasks better than their mainframes did at the time, so suck it, 1995! HAH!

But I digress. The closest I've come, personally, to "doing better" as you suggest in your article was implementing Python APIs to heavily threaded C++ objects using pybind11 and starting C++ threads in the same address space from a Python interpreter process. The C++ threads would happily run in the background, unaffected by Python's global interpreter lock (as long as they didn't touch Python objects), and I could post requests for work to a queue in Python; the C++ code would pick them up, run them behind the scenes, and make the results available in another queue when the tasks were completed. I think if I wrote some ImGui objects with that functionality, I could launch multiple graphical controls from the Python process. Some of the C++ objects I wrote could run REST services as well, and it was kind of neat to post something to a REST service in one window and see an array in Python get updated in real time.
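
The request/response pattern described above (work posted to one queue, picked up by background workers, results collected from another) can be sketched in pure Python with `threading` and `queue.Queue`. This is only a stand-in: in the setup described, the workers were C++ threads exposed via pybind11, which is what let them run outside the GIL, whereas the Python threads in this sketch still share the GIL for CPU-bound work. The squaring "task" is a hypothetical placeholder.

```python
import queue
import threading
from typing import List

_STOP = object()  # sentinel telling a worker thread to exit

def worker(requests: queue.Queue, results: queue.Queue) -> None:
    """Pull (task_id, payload) requests and push (task_id, result) replies."""
    while True:
        item = requests.get()
        if item is _STOP:
            requests.task_done()
            return
        task_id, payload = item
        # Placeholder work; in the setup described above this would be C++ code.
        results.put((task_id, payload * payload))
        requests.task_done()

def run_jobs(payloads: List[int], n_workers: int = 4) -> List[int]:
    requests: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    threads = [threading.Thread(target=worker, args=(requests, results), daemon=True)
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for task_id, payload in enumerate(payloads):
        requests.put((task_id, payload))  # post work from the "Python side"
    for _ in threads:
        requests.put(_STOP)               # one stop sentinel per worker
    for t in threads:
        t.join()
    # Results can arrive out of order, so reassemble them by task id.
    collected = [results.get() for _ in range(len(payloads))]
    return [value for _, value in sorted(collected)]

print(run_jobs([1, 2, 3, 4]))  # [1, 4, 9, 16]
```

The two queues are the only shared state, which keeps the caller and the workers decoupled in much the same way a pipe decouples two processes.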

Outside of that, there haven't really been any major advances in shells that I'm aware of since before threads made it into Linux, and that's really a shame. Changing environment variables in threaded programs is a pretty good way to crash your program: setenv() isn't safe to call while other threads are reading the environment with getenv(), and fixing that would require some fundamental changes to the process model and the C standard library. It's easier to just implement your own thread-safe data store and stop using environment variables.
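
A minimal sketch of the "own thread-safe data store" alternative: a small key-value store where every read and write goes through one lock, so no thread ever observes a half-updated table. `SettingsStore` is a hypothetical name, and a real version might want change notifications or a readers-writer lock.

```python
import threading
from typing import Dict, Optional

class SettingsStore:
    """Hypothetical thread-safe replacement for ad-hoc environment variables."""

    def __init__(self, initial: Optional[Dict[str, str]] = None) -> None:
        self._lock = threading.Lock()
        self._data: Dict[str, str] = dict(initial or {})

    def get(self, key: str, default: Optional[str] = None) -> Optional[str]:
        with self._lock:
            return self._data.get(key, default)

    def set(self, key: str, value: str) -> None:
        with self._lock:
            self._data[key] = value

    def snapshot(self) -> Dict[str, str]:
        # Copy under the lock so callers can iterate without holding it.
        with self._lock:
            return dict(self._data)

store = SettingsStore({"MODE": "fast"})

def writer(n: int) -> None:
    # Each thread repeatedly updates its own key while the others do the same.
    for i in range(1000):
        store.set(f"KEY_{n}", str(i))

threads = [threading.Thread(target=writer, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.get("KEY_3"))  # 999
```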

I'm not sure there's a great way to represent an execution graph that would mimic the simplicity of pipes in a text-mode environment. It'd probably morph into a model where you start multiple agents to monitor things, and they'd be capable of notifying you with graphical controls or pop-ups when something they were interested in happened. The downside of that model is that if one thing crashes in your address space, the whole thing crashes (the Windows 3.1 problem). So applications written to exist in such an environment would have to be a lot more careful about things like null pointer dereferences, array bounds overflows, and all those fun things.

I think about this every few years and ponder experimenting with writing a shell and environment like that, but bash with UNIX pipes really works well enough for me. For anything more complex, I reach for a programming language. Python might end up being my shell at some point if I keep writing objects in C++ or other languages that it can interact with. Rust has a library similar to pybind11, too, so maybe that will end up being the thing that finally bridges all the gaps and allows multiple languages to be used to develop applications that can actually work together (kinda like CORBA, DOM, COM/DCOM, RPC and REST were supposed to, heh).

jessepence@reddit

IBM's object model was SOM - System Object Model. The Document Object Model is what is used in browsers.

FlyingRhenquest@reddit

Hmm, I'd swear I heard "DOM" bandied about a lot when I was working there but to be fair that was 30 years ago and I'm probably misremembering. There have been a lot of object models over the years!

atheken@reddit

30 years ago would have been early Netscape days, I don’t recall web standards/the DOM at that time, but it’s possible there was just some overlap.

Mainly, this is just a reminder that 30 years ago was early internet… I’m tired, boss.

jessepence@reddit

"Level Zero DOM" or "D0M0" was implemented at the same time as JavaScript by Brendan Eich.

Mysterious-Rent7233@reddit

There was also a thing called OpenDoc that was related to SOM.

phycle@reddit

Take a look at GStreamer's pipeline syntax or FFmpeg's -filter_complex. There you specify not just a pipeline but a graph, with multiple parallel streams of data at once. Of course, it all becomes rather hairy once you want to build a semi-complex graph.
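
To make the pipeline-versus-graph distinction concrete, here is a toy fan-out/fan-in graph in Python: one source is split into two branches (roughly the role a split or tee element plays in FFmpeg or GStreamer), each branch is transformed independently, and the results are merged again. It uses `itertools.tee` and `zip`; the stage functions are made up for illustration and have nothing to do with either tool's actual API.

```python
from itertools import tee
from typing import Iterable, Iterator, Tuple

def source(n: int) -> Iterator[int]:
    """Stand-in for a decoder emitting frames; here just integers."""
    yield from range(n)

def scale(frames: Iterable[int]) -> Iterator[int]:
    # Branch A: hypothetical "scale" filter.
    for f in frames:
        yield f * 10

def flip(frames: Iterable[int]) -> Iterator[int]:
    # Branch B: hypothetical "flip" filter.
    for f in frames:
        yield -f

def graph(n: int) -> Iterator[Tuple[int, int]]:
    # Split one stream into two independent iterators (fan-out)...
    a, b = tee(source(n), 2)
    # ...process each branch, then recombine them frame by frame (fan-in).
    return zip(scale(a), flip(b))

print(list(graph(3)))  # [(0, 0), (10, -1), (20, -2)]
```

`itertools.tee` holds on to any items one branch has consumed but the other has not, so branches that run at very different speeds cost memory: a small-scale version of the buffering problem real media graphs have to manage.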

FlyingRhenquest@reddit

Oh yeah, I'm familiar with ffmpeg's thing. I wrote a little C++ wrapper for ffmpeg that accomplishes something similar with boost::signals2 callbacks. I feel like my graphs need a bit more structure though, since they're kind of a pain in the ass to debug when something goes wrong. I need to clean up some of the objects in there -- didn't like using the boost state machine and signals for the player object, need better support for ownership semantics of ffmpeg frames and packets, that sort of thing. But in that version of the library I also started delving into sending compressed segments over the network, which would allow the execution graphs to make optimum use of CPU in a cloud setup.

I think I need to make the graph and the connections in the graph first-class entities that can be used by a serialization manager to allow the objects to be defined with JSON. Then the graph could easily be displayed and manipulated with JavaScript or ImGui. That's getting more into the front end than I usually do, but I'm having fun with it!
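
One way to read "make the graph and the connections first-class entities" is to model nodes and edges as plain data that a serializer can round-trip through JSON, so a front end only ever has to edit JSON. A minimal sketch with dataclasses; the field names (`id`, `kind`, `params`, `src`, `dst`) are assumptions, not an existing schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List

@dataclass
class Node:
    id: str
    kind: str                                   # e.g. "decoder", "encoder" (hypothetical kinds)
    params: Dict[str, Any] = field(default_factory=dict)

@dataclass
class Edge:
    src: str                                    # id of the producing node
    dst: str                                    # id of the consuming node

@dataclass
class Graph:
    nodes: List[Node] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the nested Node/Edge dataclasses.
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "Graph":
        raw = json.loads(text)
        graph = cls([Node(**n) for n in raw["nodes"]], [Edge(**e) for e in raw["edges"]])
        graph.validate()
        return graph

    def validate(self) -> None:
        # Reject edges that point at nodes that don't exist.
        ids = {n.id for n in self.nodes}
        for e in self.edges:
            if e.src not in ids or e.dst not in ids:
                raise ValueError(f"dangling edge {e.src} -> {e.dst}")

g = Graph([Node("in", "decoder", {"path": "input.mp4"}), Node("out", "encoder")],
          [Edge("in", "out")])
assert Graph.from_json(g.to_json()) == g  # round-trips losslessly
```

Keeping validation in the loader means a hand-edited or UI-generated JSON file fails early with a clear message instead of producing a half-connected graph.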

Broccoli-stem@reddit

Can someone who read it tell us what the article is about, so that I don't have to go to this blog-spam website?

AyrA_ch@reddit

It's about how piping (progA | progB) was a good invention
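
For reference, `progA | progB` is just two processes whose stdout and stdin are joined by a kernel pipe. The same wiring done by hand with Python's `subprocess` module looks like this. To keep it self-contained, both "programs" are small Python one-liners run via `sys.executable` rather than real Unix tools.

```python
import subprocess
import sys

# progA: prints a few lines.
prog_a = [sys.executable, "-c", "for w in ['apple', 'banana', 'cherry']: print(w)"]
# progB: keeps only lines containing 'an' (a tiny grep).
prog_b = [sys.executable, "-c",
          "import sys\nfor line in sys.stdin:\n    if 'an' in line: sys.stdout.write(line)"]

def run_pipeline() -> str:
    a = subprocess.Popen(prog_a, stdout=subprocess.PIPE)
    # progB reads directly from progA's stdout pipe; both processes run concurrently.
    b = subprocess.Popen(prog_b, stdin=a.stdout, stdout=subprocess.PIPE, text=True)
    a.stdout.close()  # drop our copy so progA sees a closed pipe if progB exits early
    out, _ = b.communicate()
    a.wait()
    return out

print(run_pipeline(), end="")  # banana
```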

Mysterious-Rent7233@reddit

And what should come next.

AyrA_ch@reddit

PowerShell already does the next generation of data piping. It sends actual class instances between commands.

blackkettle@reddit

and why it’s not good enough any more because OP can’t natively pipe JSON and spam it to a million child processes without tee (which they apparently also don’t know exists).

Mysterious-Rent7233@reddit

If you are curious about the content, why are you afraid to go to Substack?

throwaway490215@reddit

Lol at the "How we can do better".

The whole "let's type our pipes" spiel has been attempted at least 120 times, so unless you cite at least five of those attempts and explain why this time is different, it's not even worth talking about ideas as vague as this.

"Linear Topology Only"

Skill issue. Writing a tee wrapper script to give you the syntax you desire takes all of 10 seconds for an LLM.

"Heavy Implementation"

Heavy by what standard? Cross-language support requires this. Its costs are irrelevant in 99.999% of cases.

"Constrained Error Handling"

Design a better system that doesn't explode in complexity and can be written in text, and we can talk about it.

Once you've designed the syntax, it again takes only a few lines of python or any other language to impl it.
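
As a rough illustration of the "tee wrapper" point, here is one way to fan a single input stream out to several child processes from Python, which is the job `tee` does in a shell pipeline. The consumer commands are placeholder Python one-liners launched via `sys.executable`, and `fan_out` is a hypothetical helper, not a standard function.

```python
import subprocess
import sys
import threading
from typing import Dict, List

def fan_out(data: bytes, commands: Dict[str, List[str]], chunk_size: int = 4096) -> Dict[str, bytes]:
    """Feed the same byte stream to every command, roughly what tee does for several pipes."""
    procs = {name: subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
             for name, cmd in commands.items()}
    outputs: Dict[str, bytes] = {name: b"" for name in procs}

    def drain(name: str) -> None:
        # Read each child's stdout on its own thread so no child stalls on a full output pipe.
        outputs[name] = procs[name].stdout.read()

    readers = [threading.Thread(target=drain, args=(name,)) for name in procs]
    for t in readers:
        t.start()
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        for proc in procs.values():
            proc.stdin.write(chunk)  # a consumer that stops reading eventually blocks this loop
    for proc in procs.values():
        proc.stdin.close()           # EOF for every consumer
    for t in readers:
        t.join()
    for proc in procs.values():
        proc.wait()
    return outputs

count_lines = [sys.executable, "-c", "import sys; print(sum(1 for _ in sys.stdin))"]
shout = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
result = fan_out(b"a\nb\nc\n", {"count": count_lines, "shout": shout})
print(result["count"], result["shout"])
```

Writing to each child in turn mirrors `tee`: if one consumer stops reading, its pipe fills up and the whole fan-out stalls, which is the same backpressure a linear pipe gives you.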

Downtown_Category163@reddit

PowerShell has typed pipes: you send objects through the pipe, even if that object is just a "line of text".
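
To make the "objects through the pipe" idea concrete without claiming anything about PowerShell's internals, here is a Python sketch in which each stage receives structured records instead of lines of text, loosely analogous to piping `Get-Process` into `Where-Object` and then `Sort-Object`. The `Proc` record and its sample values are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List

@dataclass(frozen=True)
class Proc:
    name: str
    pid: int
    mem_mb: float

def list_procs() -> Iterator[Proc]:
    # Invented sample records standing in for a real process listing.
    yield Proc("shell", 101, 12.5)
    yield Proc("browser", 202, 850.0)
    yield Proc("editor", 303, 240.0)

def where(items: Iterable[Proc], pred: Callable[[Proc], bool]) -> Iterator[Proc]:
    # Filter on a typed field instead of re-parsing text columns.
    return (p for p in items if pred(p))

def sort_by(items: Iterable[Proc], key: Callable[[Proc], float], descending: bool = False) -> List[Proc]:
    return sorted(items, key=key, reverse=descending)

big = sort_by(where(list_procs(), lambda p: p.mem_mb > 100), lambda p: p.mem_mb, descending=True)
print([p.name for p in big])  # ['browser', 'editor']
```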

valarauca14@reddit

PowerShell does not have pipes; it pretends to have pipes.

- It doesn't have processes. Everything usually runs in the same CLR process, and when other threads are created they are controlled by the pwsh.exe process.
- Each stage runs serially, one by one, with the pipe fully buffering the output of one action before the next starts.

As a direct result:

- PowerShell "pipes" cannot block the writer if the reader isn't keeping up; instead, pwsh.exe will just explode with an OOM error.
- PowerShell "pipes" do not allow each job in the pipeline to execute in parallel.

Those two bullet points are literally original Unix pipe features that PowerShell "pipes" do not support. They are not pipes.
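
The distinction drawn here, between a pipeline that streams items through every stage and one that fully materializes each stage's output before the next starts, can be seen in a small Python sketch. Generators model streaming and lists model full buffering; the event log records the order in which work happens. This illustrates the two models in general and makes no claim about how PowerShell is implemented.

```python
from typing import Iterable, Iterator, List

def produce(n: int, log: List[str]) -> Iterator[int]:
    for i in range(n):
        log.append(f"produce {i}")
        yield i

def consume(items: Iterable[int], log: List[str]) -> None:
    for i in items:
        log.append(f"consume {i}")

def streaming(n: int) -> List[str]:
    log: List[str] = []
    consume(produce(n, log), log)       # items flow through one at a time
    return log

def buffered(n: int) -> List[str]:
    log: List[str] = []
    everything = list(produce(n, log))  # stage 1 finishes completely first...
    consume(everything, log)            # ...then stage 2 starts
    return log

print(streaming(2))  # ['produce 0', 'consume 0', 'produce 1', 'consume 1']
print(buffered(2))   # ['produce 0', 'produce 1', 'consume 0', 'consume 1']
```

The buffered version holds all n items in memory at once. Note that a generator chain inside one process only interleaves the stages; real pipes add true parallelism between processes and kernel-enforced backpressure on top of that.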

flying-sheep@reddit

I think nushell has real pipes and objects.

valarauca14@reddit

"nushell"

Amusingly, it has a better approach: it emulates pipes in userland by copying output between processes. That's an ugly hack, since the per-process buffering doesn't work, and it leads to a loss of interactivity in a lot of cases (see the redirection-pipe label on their issue tracker).

Amusingly, this still isn't "pipes" as a first-order OS construct, which Microsoft Windows does offer (via the Named Pipe Stream API), and which systems like Cygwin and Mingw\d+ use to properly emulate Unix.

cat_in_the_wall@reddit

powershell is like Vader saying "if you only knew the power of the dark side". these days, you can run powershell on linux. you can join the dark side too.

once you embrace piping objects, any text based pipeline seems weak and archaic.

ok maybe you don't like .net. but it's a fully fledged environment, unlike the shell languages you're used to.

join us.

montibbalt@reddit

All I want for programming tools on Windows is a version of powershell that uses F# as its built-in scripting language

cat_in_the_wall@reddit

the syntax of powershell is, admittedly, ass. i can sing songs all day about how i think it is better than bash and friends in so many ways. but i can't say shit about syntax.

i personally think f# is hard to read, but it can't be any worse than powershell.

zacker150@reddit

Personally, I like the PowerShell syntax better. Bash syntax was created to save as many characters as possible. PowerShell syntax was written to be read.

throwaway490215@reddit

And it's good proof that even with a few billion and a mass buy-in from the super popular OS developers, it creates a massive headache, a very high interoperability / extensibility price 'fixed' with a bunch of hacks, and no payoff worth mentioning.

flying-sheep@reddit

So far the strongest attempt I've seen for object pipes is https://www.nushell.sh

It's still a bit in flux (deprecations and removals happen every few months), but if you can live with that, it's pretty great.

shevy-java@reddit

"Once you've designed the syntax, it again takes only a few lines of python or any other language to impl it."

The basic syntax is indeed simple, but the question is what you can do with it. I like UNIX pipes but rarely use them; the most I do is e.g. "ls | more". For anything larger I tend to use Ruby, simply because I find many classic UNIX tools to be crap. I use tons of aliases to Ruby scripts, so I kind of have a "meta" environment for data operations here.

tesfabpel@reddit

"Perhaps more importantly, pipelines broke the tyranny of synchronous execution. When cat writes to stdout, it doesn't block waiting for grep to process that data. The sender continues its work independently."

Except that if the pipe's buffer is full, cat will block when writing to stdout until grep reads from stdin...
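
The correction is easy to observe: a pipe has a finite kernel buffer, and once it is full a blocking write() waits for the reader. This sketch is POSIX-oriented (`os.set_blocking` on pipes is not available on Windows before Python 3.12). It puts the write end in non-blocking mode and writes until the kernel refuses, which reveals the buffer's capacity. The exact number is platform-dependent; on Linux with 4 KiB pages the default is 65536 bytes.

```python
import os

def pipe_capacity(chunk: int = 1024) -> int:
    """Fill a pipe nobody reads from and count how many bytes fit before a write would block."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # raise BlockingIOError instead of blocking when full
    written = 0
    try:
        while True:
            try:
                written += os.write(write_fd, b"x" * chunk)
            except BlockingIOError:
                break                 # buffer full: a blocking writer would stall here
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return written

print(pipe_capacity())  # e.g. 65536 on a default Linux setup
```

With the write end left in its default blocking mode, the same loop would simply hang at that point until a reader drained the pipe, which is exactly what happens to cat when grep falls behind.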

shevy-java@reddit

I love the UNIX pipelines concept, but to be fair, it has to be said that a lot of it came about because computers were limited.

Pipelines are a bit like method chaining in OOP: you have component A that is processed, then B, then C, aka object.foobar1.foobar2 and so forth, just with the | in use and possibly different programs written in different languages. I feel that a natural evolution would be an object-oriented pipe system. Microsoft's PowerShell picked up some ideas but failed in other areas. I kind of like the Elixir variant, even though |> is not quite as succinct as |. But Elixir is, IMO, not really object-oriented (that is debatable, but to me it just feels very different from Ruby despite the obvious similarities in code; I think Erlang makes some things really weird).
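
Elixir's |> passes the result of one expression as the first argument of the next call. Python has no such operator, but a small helper gives the same left-to-right reading order. `pipe` here is a hypothetical helper written for this example, not a standard-library function.

```python
from functools import reduce
from typing import Any, Callable

def pipe(value: Any, *funcs: Callable[[Any], Any]) -> Any:
    """Apply funcs left to right: pipe(x, f, g) == g(f(x)), like x |> f() |> g() in Elixir."""
    return reduce(lambda acc, f: f(acc), funcs, value)

words = pipe(
    "the quick brown fox jumps over the lazy dog",
    str.split,
    lambda ws: [w for w in ws if len(w) > 3],
    lambda ws: sorted(set(ws)),
)
print(words)  # ['brown', 'jumps', 'lazy', 'over', 'quick']
```

Unlike |>, this helper only passes a single argument, so stages that need extra arguments have to be wrapped in a lambda or `functools.partial`.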

EnUnLugarDeLaMancha@reddit

It was not pipelines that brought "isolation". Isolation comes from processes, and they already existed before pipelines. What pipelines solved was the problem of "connecting" different processes: before them, programs had to write their output to some file, then another process would read it, and so on. Pipelines were invented simply to avoid the boilerplate of doing all that, nothing else.
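
For contrast with a pipe, this is roughly what connecting two programs through an intermediate file looks like: the first runs to completion and spools its whole output to disk, and only then does the second start and read it back. The sketch uses Python's `tempfile` and two throwaway child processes; the programs themselves are placeholders.

```python
import os
import subprocess
import sys
import tempfile

producer = [sys.executable, "-c", "print('\\n'.join(str(i) for i in range(5)))"]
consumer = [sys.executable, "-c", "import sys; print(sum(int(l) for l in sys.stdin))"]

def via_temp_file() -> str:
    fd, path = tempfile.mkstemp()
    try:
        # Step 1: run the producer to completion, writing everything to the file.
        with os.fdopen(fd, "wb") as out:
            subprocess.run(producer, stdout=out, check=True)
        # Step 2: only now start the consumer, reading the file back in.
        with open(path, "rb") as inp:
            result = subprocess.run(consumer, stdin=inp, stdout=subprocess.PIPE,
                                    text=True, check=True)
    finally:
        os.remove(path)  # the cleanup step a pipe makes unnecessary
    return result.stdout.strip()

print(via_temp_file())  # 10
```

Here the two steps are strictly sequential and the whole intermediate result exists on disk at once; with a pipe, both processes run at the same time and only a bounded buffer sits between them.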

mark_99@reddit

It's also about efficiency. You avoid round trips to the filesystem, and each process can run concurrently.

valarauca14@reddit

Just FYI, before the PDP-11/45 & VAX-11 (circa 1975), when a program wasn't running it was swapped to disk. There was no physical/logical memory address split, so every (userland) program running on a machine occupied the same physical & logical memory space.

Early PDP-11s did have memory protection, so a userland process couldn't stomp all over machine memory. But multitasking was a lot more primitive than you may think.

TrainsareFascinating@reddit

And before that, we had … overlays! Remember where Fortran COMMON came from?