What UNIX Pipelines Got Right (And How We Can Do Better)
Posted by ketralnis@reddit | programming | View on Reddit | 32 comments
FlyingRhenquest@reddit
One of the big challenges of "Doing better" is presenting data in a way that makes sense. IBM kinda tried to do it with OS/2, which used document object model (DOM) and the various objects in your GUI were actual objects that you could drag and drop and would interact with each other. I saw a couple of attempts by various developers to make use of that, but they never really worked that well and the idea kind of fizzled out. Gnome also tried to do it and I guess technically dbus does kind of try to implement it, but the only time I ever notice dbus is when it's making my emacs windows hang on startup (Which hasn't happened in a while, so they either fixed that or I'm not trying to use emacs in a way that causes the problem anymore.) Funnily, IBM, which put a lot of cutting edge stuff in OS/2, never really used any of those features in their own applications. A big portion of the company viewed OS/2 and intel processors as toys that would never multi-task as well as the mainframes do. Also funnily my phone now multitasks better than their mainframes did at the time, so suck it 1995! HAH!
But I digress. The closest I've come, personally, to "doing better" as you suggest in your article, was by implementing Python APIs to heavily-threaded C++ objects using Pybind11 and starting C++ threads in the same memory space from a Python interpreter process. The C++ threads would happily run in the background, unaffected by Python's global interpreter lock, and I could post requests for work to a queue in Python and the C++ code would pick it up and run it behind the scenes, making the results available in another queue for me when the tasks were completed. I think if I wrote some Imgui objects with that functionality, I could launch multiple graphical controls from the Python process. Some of the C++ objects I wrote could run rest services as well, and it was kind of neat to post something to a rest service in one window and see an array in Python get updated in real time.
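A minimal sketch of that request-queue / result-queue pattern, using plain Python threads as a stand-in for the C++ workers. This is not the commenter's code: with Pybind11 the workers would be native threads that do not hold the GIL, and the names `worker` and `run_batch` are hypothetical.

```python
import queue
import threading


def worker(requests: queue.Queue, results: queue.Queue) -> None:
    """Pull (task_id, number) requests until a None sentinel arrives."""
    while True:
        item = requests.get()
        if item is None:  # sentinel: shut this worker down
            break
        task_id, n = item
        results.put((task_id, n * n))  # stand-in for the real background work


def run_batch(numbers, n_workers=2):
    """Post every number to the request queue and collect results in order."""
    requests, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(requests, results))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for task_id, n in enumerate(numbers):
        requests.put((task_id, n))
    for _ in threads:
        requests.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    collected = {}
    while not results.empty():
        task_id, value = results.get()
        collected[task_id] = value
    return [collected[i] for i in range(len(numbers))]
```

Tagging each request with an id lets the caller match results to requests even though the workers finish in arbitrary order.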
Outside of that, there haven't really been any major advances in shells that I'm aware of since before threads made it into Linux, and that's really a shame. Changing environment variables in threaded programs is a pretty good way to crash your program and it would require some fundamental changes to the process model and C standard library to fix that. Easier to just implement your own thread-safe data store and stop using environment variables.
I'm not sure there's a great way to represent an execution graph that would mimic the simplicity of pipes in a text mode environment. It'd probably morph into a model where you start multiple agents to monitor things and they'd be capable of notifying you with graphical controls or pop-ups when something they were interested in happened. The down side of that model is that if one thing crashes in your thread space, the whole thing crashes (The Windows 3.1 problem.) So applications written to exist in such an environment would have to be a lot more careful about things like null pointer dereferences and array bounds overflows and all those fun things.
I think about this every few years and ponder trying to experiment with writing a shell and environment like that, but bash in a process with UNIX pipes really works well enough for me. Anything more complex and I reach for a programming language. Python might end up being my shell at some point, if I keep writing objects that it can interact with in C++ or other languages. Rust has a library similar to Pybind11, too, so maybe that will end up being the thing that finally bridges all the gaps and allows multiple languages to be used to develop applications that can actually work together (Kinda like Corba, DOM, COM/DCOM, RPC and REST were supposed to heh.)
jessepence@reddit
IBM's object model was SOM - System Object Model. The Document Object Model is what is used in browsers.
FlyingRhenquest@reddit
Hmm, I'd swear I heard "DOM" bandied about a lot when I was working there but to be fair that was 30 years ago and I'm probably misremembering. There have been a lot of object models over the years!
atheken@reddit
30 years ago would have been early Netscape days, I don’t recall web standards/the DOM at that time, but it’s possible there was just some overlap.
Mainly, this is just a reminder that 30 years ago was early internet… I’m tired, boss.
jessepence@reddit
"Level Zero DOM" or "D0M0" was implemented at the same time as JavaScript by Brendan Eich.
Mysterious-Rent7233@reddit
There was also a thing called OpenDoc that was related to SOM.
phycle@reddit
Take a look at GStreamer's pipeline syntax or FFMpeg's filter_complex. There you are specifying not just a pipeline, but a graph. There can be multiple parallel streams of data at once. Of course it all becomes rather hairy once you want to build a semi-complex graph.
FlyingRhenquest@reddit
Oh yeah, I'm familiar with ffmpeg's thing. I wrote a little C++ wrapper for ffmpeg that accomplishes something similar with boost::signals2 callbacks. I feel like my graphs need a bit more structure though, since they're kind of a pain in the ass to debug when something goes wrong. I need to clean up some of the objects in there -- didn't like using the boost state machine and signals for the player object, need better support for ownership semantics of ffmpeg frames and packets, that sort of thing. But in that version of the library I also started delving into sending compressed segments over the network, which would allow the execution graphs to make optimum use of CPU in a cloud setup.
I think I need to make the graph and the connections in the graph first class entities that can be used by a serialization manager to allow the objects to be defined with JSON. Then the graph could easily be displayed and manipulated with Javascript or imgui. That's getting more into the front end than I usually do, but I'm having fun with it!
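Roughly what "graph and connections as first class entities" could look like, sketched in Python rather than C++ to keep it short. The `Node`, `Edge` and `Graph` names and the JSON layout are assumptions for illustration, not anything from the actual library.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Node:
    id: str
    kind: str  # e.g. "decoder"; the set of kinds is hypothetical
    params: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str  # id of the node producing data
    target: str  # id of the node consuming it


@dataclass
class Graph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the lists of Node and Edge dataclasses
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "Graph":
        raw = json.loads(text)
        return cls(
            nodes=[Node(**n) for n in raw["nodes"]],
            edges=[Edge(**e) for e in raw["edges"]],
        )
```

Because edges are plain data rather than callbacks wired up at runtime, a front end could load the JSON, draw it, edit it and hand it back without touching the processing objects.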
Broccoli-stem@reddit
Can someone who read it tell us what the article is about, so that I don't have to go to this blog spam website
AyrA_ch@reddit
It's about how piping (progA | progB) was a good invention
Mysterious-Rent7233@reddit
And what should come next.
AyrA_ch@reddit
Powershell already does the next generation of data piping. It sends actual class instances between commands
blackkettle@reddit
and why it’s not good enough any more because OP can’t natively pipe JSON and spam it to a million child processes without tee (which they apparently also don’t know exists).
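For reference, the fan-out that tee gives you (in bash, something like `tee >(cmd1) >(cmd2) >/dev/null`) can be sketched in Python as below. The `fan_out` helper and the two example child commands are made up for illustration and assume a POSIX-like system.

```python
import subprocess
import sys


def fan_out(data: bytes, commands):
    """Send the same bytes to every command's stdin; return each stdout."""
    procs = [
        subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
        for cmd in commands
    ]
    # communicate() writes the input, closes stdin and reads until EOF
    return [p.communicate(data)[0] for p in procs]


# Two hypothetical consumers: one upper-cases its input, one counts lines.
UPPER = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
COUNT = [sys.executable, "-c",
         "import sys; print(len(sys.stdin.read().splitlines()))"]
```

Calling `fan_out(b"a\nb\n", [UPPER, COUNT])` starts both children up front and then feeds each one the same input in turn.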
Mysterious-Rent7233@reddit
If you are curious about the content, why are you afraid to go to Substack?
throwaway490215@reddit
Lol at the "How we can do better".
The whole "let's type our pipes" spiel has been attempted at least 120 times, so unless you cite at least 5 and explain why this time is different, it's not even worth talking about any ideas as vague as this.
Skill issue. Writing a tee wrapper script to give you the syntax you desire takes all of 10 seconds for an LLM.
Heavy by what standard? Cross language requires this. Its costs are irrelevant to 99.999% of cases.
Design a better system that doesn't explode in complexity and can be written in text, and we can talk about it.
Once you've designed the syntax, it again takes only a few lines of python or any other language to impl it.
Downtown_Category163@reddit
PowerShell has typed pipes, you send objects through the pipe, even if that object is "line of text"
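To make the "objects through the pipe" idea concrete, here is a small Python sketch using generators. It is an analogy only, not how PowerShell is implemented, and `FileInfo`, `list_files`, `where_larger` and `select_names` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FileInfo:
    name: str
    size: int


def list_files() -> Iterator[FileInfo]:
    """Stand-in source stage that yields structured records, not text lines."""
    yield FileInfo("a.log", 120)
    yield FileInfo("b.txt", 4096)
    yield FileInfo("c.log", 9000)


def where_larger(items: Iterable[FileInfo], min_size: int) -> Iterator[FileInfo]:
    """Filter on a typed field instead of parsing a column out of text."""
    return (item for item in items if item.size > min_size)


def select_names(items: Iterable[FileInfo]) -> Iterator[str]:
    """Project each record down to one attribute."""
    return (item.name for item in items)
```

`list(select_names(where_larger(list_files(), 1000)))` evaluates lazily, one record at a time, and no stage ever has to re-parse text to find the size column.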
valarauca14@reddit
power shell does not have pipes, it pretends to have pipes.
pwsh.exe process.
As a direct result:
pwsh.exe will just explode with an OOM error.
The two bullet points are literally Unix v1 pipe features that Power Shell "pipes" do not support. They are not pipes.
flying-sheep@reddit
I think nushell has real pipes and objects.
valarauca14@reddit
Amusingly it has a better approach as it emulates pipes in userland, by copying output between processes. Which is an ugly hack as the per-process buffering doesn't work. This leads to a loss of inter-activity in a lot of cases (see: redirection-pipe label on their issue tracker).
This amusingly still isn't "pipes" as a first order OS construct, which Microsoft-Windows does offer (via the Named Pipe Stream API). Which systems like Cygwin & Mingw\d+ do use, to properly emulate Unix.
cat_in_the_wall@reddit
powershell is like vader saying "if you only knew the power of the dark side". these days, you can run powershell on linux. you can join the dark side too.
once you embrace piping objects, any text based pipeline seems weak and archaic.
ok maybe you don't like .net. but it's a fully fledged environment, unlike the shell languages you're used to.
join us.
montibbalt@reddit
All I want for programming tools on Windows is a version of powershell that uses F# as its built-in scripting language
cat_in_the_wall@reddit
the syntax of powershell is, admittedly, ass. i can sing songs all day about how i think it is better than bash and friends in so many ways. but i can't say shit about syntax.
i personally think f# is hard to read, but it can't be any worse than powershell.
zacker150@reddit
Personally, I like the PowerShell syntax better. Bash syntax was created to save as many characters as possible. PowerShell syntax was written to be read.
throwaway490215@reddit
And it's good proof that even with a few billion and a mass buy in from the super popular OS developers, it creates a massive headache, a very high interoperability / extensibility price 'fixed' with a bunch of hacks, and no pay off worth mentioning.
flying-sheep@reddit
So far the strongest attempt I've seen for object pipes is https://www.nushell.sh
It's still a bit in flux (deprecations and removals happen every few months), but if you can live with that, it is pretty great.
shevy-java@reddit
The basic syntax is indeed simple, but the question is what you can do with it. I like UNIX pipes but I rarely use them; most I do is e. g. "ls | more". For anything larger I tend to use ruby, simply because I find many classic UNIX tools to be crap. I use tons of aliases to ruby scripts, so I kind of have a "meta" environment for data operation here.
tesfabpel@reddit
Except if the pipe's buffer is full, then cat should now block when writing to stdout, until grep reads from stdin...
shevy-java@reddit
I love the UNIX pipelines concept, but to be fair, it has to be said that a lot of that came due to the computers being limited.
The pipelines are a bit like method chaining in OOP - you have component A that is processed, then B, then C, aka object.foobar1.foobar2 and so forth, just with the | in use, and possibly different programs written in different languages. I feel that a natural evolution would be to have an object-oriented pipe system. Microsoft's powershell picked up some ideas but failed in other areas. The Elixir variant I kind of like, even though |> is not quite as succinct as |. But Elixir is, IMO, not really object-oriented (that is debatable but to me it just feels very different to ruby, despite the obvious similarities in code; I think Erlang makes some things really weird).
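The method-chaining analogy can be sketched in Python by overloading `|`. This is a toy, not an existing library: the `Stage` class and the three example stages are invented for illustration.

```python
class Stage:
    """Wrap a one-argument function so stages can be joined with |."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other: "Stage") -> "Stage":
        # a | b builds a new stage that runs a, then feeds its result to b
        return Stage(lambda data: other.func(self.func(data)))

    def __call__(self, data):
        return self.func(data)


split_words = Stage(str.split)
keep_long = Stage(lambda words: [w for w in words if len(w) > 3])
count = Stage(len)

# Reads left to right like a shell pipeline, but passes Python objects.
pipeline = split_words | keep_long | count
```

Unlike a UNIX pipe, every stage here runs in the same process and the same language, so the sketch covers the syntax but not the isolation or the cross-language part.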
EnUnLugarDeLaMancha@reddit
It was not pipelines that brought "isolation". Isolation comes from processes, and they already existed before pipelines. What pipelines solved was the problem of "connecting" different processes: before them, programs had to write the output to some file, then another process would read it, etc. Pipelines were invented simply to avoid the boilerplate of having to do all that, nothing else.
mark_99@reddit
It's also about efficiency. You avoid round trips to the filesystem, and each process can run concurrently.
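A short sketch of both points, following the "replacing a shell pipeline" pattern from Python's subprocess documentation: the two processes run at the same time and no intermediate file is written. The `run_pipeline` helper and the example producer and consumer commands are assumptions made up for this illustration.

```python
import subprocess
import sys


def run_pipeline(producer_cmd, consumer_cmd) -> bytes:
    """Connect producer stdout to consumer stdin with an OS pipe."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_cmd, stdin=producer.stdout, stdout=subprocess.PIPE
    )
    # Drop our copy of the read end so the producer sees a broken pipe
    # if the consumer exits early.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output


# Hypothetical stages: print 0..4, then sum whatever arrives on stdin.
PRODUCER = [sys.executable, "-c", "for i in range(5): print(i)"]
CONSUMER = [sys.executable, "-c",
            "import sys; print(sum(int(line) for line in sys.stdin))"]
```

The consumer can start reading as soon as the producer writes its first line, which is the concurrency the temp-file approach cannot give you.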
valarauca14@reddit
Just FYI, before the PDP-11/45 & VAX-11 (circa ~1975) when a program wasn't running it was swapped to disk. There was no physical/logical memory address split, so every (userland) program running on a machine occupied the same physical & logical memory space.
Early PDP-11s did have memory protection, so the userland process couldn't stomp all over machine memory. But multi-tasking was a lot more primitive than you may think.
TrainsareFascinating@reddit
And before that, we had … overlays! Remember where Fortran COMMON came from?