Microsoft open-sources "the earliest DOS source code discovered to date"

[-]

AykutSek@reddit

The OCR failure is the wildest part. Decades of ML progress and recovering this code still came down to humans reading paper printouts line by line.

And Quick and Dirty OS ending up as the foundation of modern Windows is one of those things that sounds made up but isn't.

[-]

SatansLoLHelper@reddit

In the late 90s we were running OCR at 99.5% accuracy. Luckily the software knows when it isn't sure of a word, and a human has to help. Is that a 0 or an O? Logically it is 0rganized.

[-]

etancrazynpoor@reddit

You had some amazing OCR, as it was not my experience.

[-]

SatansLoLHelper@reddit

Over 4 years we went from 95%, which is complete garbage and could barely help index files, to 99.5%. So I understand your pain.

The quality of the scans. We were scanning paper at 300dpi in greyscale. I think we were scanning microfilm at 3000dpi.

This is one of those "I was working graveyard, playing Doom on the production computer for a million-dollar Xerox printer, and my boss asked if I could put a roll of microfilm on CD" stories.

I didn't realize my budget was unlimited. I would have spent so much more.

[-]

GeneratedMonkey@reddit

You didn't scan microfilm in that dpi. It gets enlarged but the final output is going to be 300dpi for OCR. No OCR engines were trained on stuff that high in dpi.

[-]

SatansLoLHelper@reddit

Paper was 300dpi, microfilm was 3000dpi.

A roll of 2k pages on microfilm was 30 minutes at optimal speed to scan.

OCR is post-production.

Scanning paper... There is a world of difference. Staples. Someo

[-]

GooberMcNutly@reddit

Even 99% accuracy is still one mistake per line. Bad with textual content, useless with code.
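
As a rough back-of-the-envelope sketch of that arithmetic, assuming errors are independent per character and a hypothetical 80-character line (neither assumption comes from the thread):

```python
# Assumptions (not from the thread): 80 characters per line,
# and each character is misread independently of the others.

def expected_errors_per_line(accuracy, chars_per_line=80):
    """Average number of misread characters on one line."""
    return (1.0 - accuracy) * chars_per_line

def clean_line_probability(accuracy, chars_per_line=80):
    """Chance that every character on a line is read correctly."""
    return accuracy ** chars_per_line

for acc in (0.95, 0.99, 0.995):
    errors = expected_errors_per_line(acc)
    clean = clean_line_probability(acc)
    print(f"{acc:.1%}: {errors:.2f} errors/line, {clean:.1%} of lines clean")
```

Under those assumptions, 99% works out to a bit under one error per line, and fewer than half the lines come through clean.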

[-]

the only OCR that really bothers me is google books not knowing what a long s was. fomeone fhould really fet them ftraight about it. fimply maddening to read through fome 1800s text and every fingle long s is incorrect. fuch a pain in the afs.

[-]

gex80@reddit

Thank you, I hated reading that.

[-]

Media_Browser@reddit

Like reading Shakespeare’s folio: a little discombobulating.

[-]

SatansLoLHelper@reddit

It took me a few seconds to figure out what you were saying. I am not even close enough to a linguist to properly get my S's right. I muddle my way through anglish.

[-]

cynicalkane@reddit

I'm having flashbacks to my Cambridge copy of John Locke. Why was it important in 2008 to preferve long Ss in a modern textbook? I wanted to read two treatises of government, not two treatifes of government.

[-]

amroamroamro@reddit

ending up as the foundation of modern Windows

im not sure there's much of dos foundations left ever since windows nt

[-]

phire@reddit

As far as I'm aware, NT is a reasonably clean break from DOS.

But to this day, you are not allowed to name a file CON, PRN, AUX, CLOCK$, NUL, COM1-9 or LPT1-9. Or any of those with an extension, like CON.txt.

Why? Because DOS used those files as devices, just like /dev/* in unix. Except DOS 1.0 didn't support folders, so these magic device files ended up implicitly in every single subdirectory.

Windows NT inherited this because it inherited the DOS shell (cmd.exe, the successor to command.com) and support for .bat files, which all use these magic files.

[-]

entreprenr30@reddit

It's wild I cannot name a folder "con" in Windows. Luckily "com" works, and that would have been even worse, since "com" is very common in Java package folder structures.

[-]

Chisignal@reddit

AUX

Wait, really? I have lots of aux directories in my projects, I'm fond of it in addition to stuff like vendor, opt, etc, etc - does that mean I couldn't open my project on Windows?

[-]

andrewpiroli@reddit

Kind of. CON, PRN, AUX, NUL, COM, LPT are all reserved names for Win32 applications. The filesystem and kernel support it just fine, and applications can access them using the file namespace \\?\ but not all APIs accept that and most applications do not do it at all. This notably includes explorer.exe, cmd.exe, and powershell.exe

So you can have an application that uses the correct API that can create and work with these files, but the regular Windows shell applications will totally fail with them. This is also one of the reasons that I still install 7-Zip even though they added .7z support into Windows. The 7-Zip File Manager supports namespaces and filenames or paths longer than 260 characters (MAX_PATH), whereas Explorer still chokes.
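
For anyone who wants to catch these names before they reach a Windows checkout, here is a minimal sketch of a check. The function name is made up for illustration, and the list is limited to CON, PRN, AUX, NUL, COM1-9 and LPT1-9; it is an assumption that this covers what you care about, so check Microsoft's file naming documentation for the full rules.

```python
import re

# Assumed list of reserved device names; not exhaustive.
_RESERVED = re.compile(r"^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])$", re.IGNORECASE)

def is_reserved_windows_name(filename):
    """Return True if a single path component clashes with a device name.

    The part before the first dot is what gets compared, so CON.txt and
    NUL.tar.gz are treated as reserved as well.
    """
    stem = filename.split(".", 1)[0].rstrip(" ")
    return bool(_RESERVED.match(stem))

for name in ("CON.txt", "Prn.java", "aux", "com", "vendor"):
    print(name, is_reserved_windows_name(name))
```

Running something like this over every path component in a repository (for example in a pre-commit hook) would flag a name like Prn.java or an aux directory before it causes trouble on Windows.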

[-]

Mahedros@reddit

I actually got bit by this a while back at my job. A co-worker on a mac named a file Prn.java (PRN is a medical abbreviation) and it made git pull completely fail on my windows laptop until they renamed it

[-]

amroamroamro@reddit

those reserved names were brought along for backward compatibility, but this is mostly enforced in windows shell applications, the underlying win32 api and file system allow you to bypass that parsing with a special prefix:

echo hello > \\?\C:\path\to\CON

https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file#win32-file-namespaces

[-]

fluidtoons@reddit

That’s a good point- maybe replacing “modern Windows” with “early Windows” there would be more accurate

I remember being shocked hearing that VMS influenced NT…

Anyway, I loved DOS (even tried to write a shell for FreeDOS in high school). Shame all that knowledge is nearly useless these days, haha. I ended up getting more into Linux, thankfully

[-]

mallardtheduck@reddit

Shame all that knowledge is nearly useless these days

Those of us active in the "retrocomputing" hobby would respectfully disagree... Sure, it's a hobby rather than a profession, but you wouldn't call the knowledge of someone who, say, works on vintage cars or steam locomotives "useless".

[-]

fluidtoons@reddit

Thank you for reminding me not everything is about making money, Mallard- honestly, I’ve just been really worried/stressed out about income for myself lately (used to be in tech, been trying to do art, hah)

But I apologize- retrocomputing knowledge is certainly not useless

One of my fave YouTube channels is LGR, and I was reading about RISC OS just yesterday. Someday I’d love to have an SGI machine set up… and like I said, I loved DOS (QBasic was my first programming language/tool, followed by Turbo C++)

[-]

dlg@reddit

I remember being shocked hearing that VMS influenced NT…

Add one to each ASCII character of VMS and you get WNT, or Windows NT.
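
The shift is easy to check with a tiny sketch (the function name is made up for illustration):

```python
def shift_letters(text, offset=1):
    """Move every character forward by `offset` code points."""
    return "".join(chr(ord(ch) + offset) for ch in text)

print(shift_letters("VMS"))  # prints WNT
```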

✋ ⃤ 🤚

[-]

mallardtheduck@reddit

In a technical sense, sure, there's no DOS code left in modern (64-bit, NT-based) Windows. Although there are still some "principles" (e.g. drive letters) inherited from DOS (though drive letters were themselves copied from CP/M, but anyway...).

In a business sense, DOS was the "foundation" that led to Microsoft's dominance of the desktop OS market.

[-]

Frolo_NA@reddit

i mean linux was a hobby OS so it isn't that surprising.

[-]

olearyboy@reddit

Kernel

[-]

bionicjoey@reddit

(just a hobby, won’t be big and professional like gnu)

[-]

cynicalkane@reddit

Linux partially embraced and partially fell backwards into the worse-is-better principle, and that's why it won out.

(I personally prefer BSD, though.)

[-]

psinerd@reddit

I have a running joke at work about how to guarantee your project makes it into production: put one of the 4 magic words in the title: sandbox, playground, POC, or experimental.

[-]

ValuableKooky4551@reddit

The word "prototype" just means we use it in production from day 1.

[-]

Clitaurius@reddit

Bill made it to solve a quick and dirty problem (scheduling) to make some quick and dirty money. "And then we iterate right?" meme always has been.

[-]

happyscrappy@reddit

Modern OCR packages just really are not geared toward recognizing 8x8 or 9x9 fonts like were used on line and dot-matrix printers back then.

I was trying it myself for some perfectly formed low-res text (found in old video and screenshots) and the results surprised me.

I know it can be made to be very effective. As you say, we have so much machine power and ML to work with now. But the training and development just hasn't typically been in that direction.

[-]

SpaceCadet87@reddit

But you could so easily spin up a quick and dirty OpenCV script if you wanted to use today's tech, couldn't you?
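
To make the idea concrete, here is a minimal sketch of nearest-template matching for a fixed bitmap font. It is plain Python rather than OpenCV, and the 5x5 glyphs are invented for illustration, not taken from any real printer font. It also assumes the scan has already been cut into clean, aligned, binarized character cells, which is probably the hard part in practice.

```python
# Hypothetical 5x5 glyph bitmaps ("1" = ink). A real printer font would be
# larger (e.g. 8x8 or 9x9) and would need one template per character.
TEMPLATES = {
    "0": ("01110", "10011", "10101", "11001", "01110"),
    "O": ("01110", "10001", "10001", "10001", "01110"),
    "1": ("00100", "01100", "00100", "00100", "01110"),
}

def hamming(glyph_a, glyph_b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(glyph_a, glyph_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )

def classify(glyph):
    """Return the template character with the fewest differing pixels."""
    return min(TEMPLATES, key=lambda ch: hamming(glyph, TEMPLATES[ch]))

# A slashed zero with one pixel dropped from the bottom row.
noisy_zero = ("01110", "10011", "10101", "11001", "01100")
print(classify(noisy_zero))
```

Because every printed character comes from the same small set of bitmaps, a damaged glyph usually still sits closer to its own template than to any other, which is what lets a 0 and an O be told apart here.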

[-]

tnoy@reddit

Some OCR engines will have specific modes for computer printouts.

From experience, the accuracy with scans of dot-matrix prints in Abbyy is significantly higher when you tell it to do so.

Same for if you're trying to OCR specific fonts like MICR E-13B or OCR-A

[-]

Effective_Hope_3071@reddit

I love that they dropped the Q and kept the D in quick and dirty lol

[-]

mallardtheduck@reddit

Back in that era, "DOS" (Disk Operating System) was a generic term for the software that allowed computers to use (floppy) disks (see, for example, Apple DOS, Atari DOS, TRSDOS, etc.), so "QDOS" was a play on the existing term anyway.

[-]

roscoelee@reddit

It's always stayed dirty!

[-]

dlg@reddit

Why OS so messy

[-]

ChocomelP@reddit

There are some creative interpretations of this comment that get dark very quickly.

[-]

Expensive-Example-92@reddit

It's no longer quick, it's just dirty

[-]

Thundechile@reddit

The code is hosted on github, which may or may not be online currently. MS has problems with all "new" tech.

[-]

LittleLui@reddit

Hey, 79.99% has three nines!

[-]

Thundechile@reddit

LOL yeah. Learned yesterday that they in fact don't report the outages correctly either; the monitor may show green even though there were major outages in a service on a given day.

[-]

alxhu@reddit

That's why we use this one: https://mrshu.github.io/github-statuses/

[-]

Synaps4@reddit

FreeDOS developers going wild with excitement

[-]

albertowtf@reddit

Do they?

This seems kinda ultra late to the party. Everything that needed to be redone is probably redone by now

[-]

Cybertools4u@reddit

That’s the part I love about old computing history: the most important artifacts are often held together by paper, patience, and a few stubborn humans. It’s funny that we can train models on half the internet, but recovering foundational code still needed someone to sit there and read printouts like archaeology. Also, “Quick and Dirty OS” becoming part of the DNA of modern Windows is exactly the kind of messy origin story every big technology has. Clean brands usually start from very improvised beginnings.

[-]

ExplorerPrudent4256@reddit

The long-s problem is particularly nasty because it's not just one character - it affects every s in the document. Modern OCR is trained on contemporary fonts, so historical documents with distinct typographical features (long s, æ/oe ligatures, specific spacing) consistently trip it up. If you want to digitize really old texts, you basically need a model trained specifically for that era's typography, which most general-purpose OCR won't do.
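
Short of training a dedicated model, one cheap workaround is dictionary-based post-correction. The sketch below is only an illustration under stated assumptions: the function names and the tiny vocabulary are made up, tokens are lowercase, and the OCR is assumed to have turned every long s into "f". It tries swapping each "f" back to "s" and keeps the first variant found in the vocabulary.

```python
from itertools import product

def long_s_candidates(token):
    """Yield every variant of token with some subset of its f's turned into s."""
    positions = [i for i, ch in enumerate(token) if ch == "f"]
    for mask in product((False, True), repeat=len(positions)):
        chars = list(token)
        for pos, swap in zip(positions, mask):
            if swap:
                chars[pos] = "s"
        yield "".join(chars)

def fix_long_s(token, vocabulary):
    """Return a vocabulary word reachable by f -> s swaps, else the token unchanged."""
    if token in vocabulary:
        return token  # already a real word, e.g. "for"
    for candidate in long_s_candidates(token):
        if candidate in vocabulary:
            return candidate
    return token

# Tiny stand-in vocabulary; a real run would load a period-appropriate word list.
vocabulary = {"someone", "should", "set", "simply", "for", "treatises"}
print(" ".join(fix_long_s(word, vocabulary) for word in "fomeone fhould fet".split()))
```

The number of candidates doubles with every "f" in a token, which is fine for ordinary words but would need a cap on very long ones.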

[-]

idebugthusiexist@reddit

Ah, yeh. That feeling when you discover some code you wrote decades ago. It's useless to anyone now and you are kinda a bit embarrassed by it, but you just can't get yourself to delete it for some reason, so you archive it on GitHub anyways. Because why not

[-]