Microsoft open-sources "the earliest DOS source code discovered to date"
Posted by Choobeen@reddit | programming | 54 comments
Old 86-DOS source code dates back to the time before Microsoft bought it.
April 30, 2026
AykutSek@reddit
The OCR failure is the wildest part. Decades of ML progress and recovering this code still came down to humans reading paper printouts line by line.
And Quick and Dirty OS ending up as the foundation of modern Windows is one of those things that sounds made up but isn't.
SatansLoLHelper@reddit
In the late 90s we were scanning OCR at 99.5% accuracy. Luckily the software knows when it doesn't get the right word, and a human has to help. Is that a 0 or an O? Logically it is 0rganized.
etancrazynpoor@reddit
You had some amazing OCR; that was not my experience.
SatansLoLHelper@reddit
Over 4 years we went from 95%, which is complete garbage and could barely help index files, to 99.5%. So I understand your pain.
A lot of it came down to the quality of the scans. We were scanning paper at 300dpi in greyscale. I think we were scanning microfilm at 3000dpi.
This is one of those "I was working graveyard, playing Doom on the production computer for a million-dollar Xerox printer, when my boss asked if I could put a roll of microfilm on CD" stories.
I didn't realize my budget was unlimited. I would have spent so much more.
GeneratedMonkey@reddit
You didn't scan microfilm at that dpi. It gets enlarged, but the final output is going to be 300dpi for OCR. No OCR engines were trained on images at that high a dpi.
SatansLoLHelper@reddit
Paper was 300dpi, microfilm was 3000dpi.
A roll of 2k pages on microfilm took 30 minutes to scan at optimal speed.
OCR, that is post-production.
Scanning paper... There is a world of difference. Staples. Someo
GooberMcNutly@reddit
Even 99% accuracy is still one mistake per line. Bad with textual content, useless with code.
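A rough check of that arithmetic, assuming 80-character lines and independent per-character errors (both are assumptions, not figures from the thread): at 99% accuracy that is about 0.8 expected errors per line, and fewer than half of all lines come out error-free.

```python
# Rough model: each character is recognized correctly with probability `accuracy`,
# independently of its neighbours (an assumption; real OCR errors tend to cluster).

def expected_errors_per_line(accuracy: float, line_length: int = 80) -> float:
    """Expected number of misread characters on one line."""
    return line_length * (1.0 - accuracy)


def error_free_line_probability(accuracy: float, line_length: int = 80) -> float:
    """Probability that every character on the line is read correctly."""
    return accuracy ** line_length


if __name__ == "__main__":
    # Accuracy figures mentioned in this thread
    for acc in (0.95, 0.99, 0.995):
        print(f"{acc:.1%}: {expected_errors_per_line(acc):.2f} errors/line, "
              f"{error_free_line_probability(acc):.1%} clean lines")
```

For source code, where a single wrong character can break a build, even the 99.5% figure leaves roughly a third of lines needing a fix under this model.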
knome@reddit
The only OCR that really bothers me is Google Books not knowing what a long s was. fomeone fhould really fet them ftraight about it. fimply maddening to read through fome 1800s text and every fingle long s is incorrect. fuch a pain in the afs.
gex80@reddit
Thank you, I hated reading that.
Media_Browser@reddit
Like reading Shakespeare’s folio: a little discombobulating.
SatansLoLHelper@reddit
It took me a few seconds to figure out what you were saying. I am not even close enough to being a linguist to properly get my S's right. I muddle my way through anglish.
cynicalkane@reddit
I'm having flashbacks to my Cambridge copy of John Locke. Why was it important in 2008 to preferve long Ss in a modern textbook? I wanted to read two treatises of government, not two treatifes of government.
amroamroamro@reddit
I'm not sure there's much of the DOS foundation left since Windows NT.
phire@reddit
As far as I'm aware, NT is a reasonably clean break from DOS.
But to this day, you are not allowed to name a file CON, PRN, AUX, CLOCK$, NUL, COM1-9 or LPT1-9. Or any of those with an extension, like CON.txt.
Why? Because DOS used those names as devices, just like /dev/* in Unix. DOS 1.0 didn't support folders, so when DOS 2.0 added subdirectories, these magic device names had to keep working in every single directory.
Windows NT inherited this because it inherited the DOS shell conventions (command.com, reimplemented as cmd.exe) and support for .bat files, which all use these magic names.
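A minimal sketch of the rule described above: a name collides with a DOS device if the part before its first dot matches a reserved name, ignoring case. The name list is an assumption based on the names in this thread (CLOCK$ is left out), and the exact set may differ between Windows versions; the function names are hypothetical.

```python
import re

# Device names discussed in this thread. Assumption: this mirrors the classic
# DOS/Win32 list; exact sets differ slightly between Windows versions.
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_reserved_name(component: str) -> bool:
    """True if a single file or folder name collides with a DOS device name.

    Only the part before the first dot counts, and case is ignored,
    so "CON.txt", "con" and "Prn.java" are all reserved.
    """
    stem = component.split(".", 1)[0]
    return stem.upper() in RESERVED_NAMES


def reserved_components(path: str) -> list[str]:
    """Return every component of a '/' or '\\' separated path that is reserved."""
    return [part for part in re.split(r"[\\/]", path) if part and is_reserved_name(part)]
```

Running reserved_components over a repository's file list before pushing is one cheap way to catch names that a Windows checkout would choke on; note that plain "com" passes, while "COM1" does not.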
entreprenr30@reddit
It's wild I cannot name a folder "con" in Windows. Luckily "com" works, and that would have been even worse, since "com" is very common in Java package folder structures.
Chisignal@reddit
Wait, really? I have lots of "aux" directories in my projects; I'm fond of it, along with stuff like "vendor", "opt", "etc", etc. Does that mean I couldn't open my project on Windows?
andrewpiroli@reddit
Kind of. CON, PRN, AUX, NUL, COM1-9 and LPT1-9 are all reserved names for Win32 applications. The file system and kernel support them just fine, and applications can access them using the \\?\ file namespace prefix, but not all APIs accept that and most applications do not use it at all. This notably includes explorer.exe, cmd.exe, and powershell.exe.
So you can have an application that uses the correct API that can create and work with these files, but the regular Windows shell applications will totally fail with them. This is also one of the reasons that I still install 7-Zip even though they added .7z support into Windows. The 7-Zip File Manager supports namespaces and filenames or paths longer than the classic 260-character MAX_PATH limit, whereas Explorer still chokes.
Mahedros@reddit
I actually got bitten by this a while back at my job. A co-worker on a Mac named a file Prn.java (PRN is a medical abbreviation), and it made git pull completely fail on my Windows laptop until they renamed it.
amroamroamro@reddit
Those reserved names were brought along for backward compatibility, but they are mostly enforced by Windows shell applications and Win32 path parsing; the underlying Win32 API and file system let you bypass that parsing with a special prefix:
https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file#win32-file-namespaces
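A small sketch of building the prefixed form the linked page describes: an absolute drive path gets \\?\ in front, and a UNC path \\server\share\... becomes \\?\UNC\server\share\.... Because the prefix also turns off path normalization, forward slashes are converted to backslashes first. The function name is hypothetical, and it only manipulates strings; whether a given program accepts the result is exactly the problem discussed above.

```python
import re

_DRIVE_ABSOLUTE = re.compile(r"^[A-Za-z]:\\")


def to_win32_file_namespace(path: str) -> str:
    r"""Rewrite an absolute Windows path into the \\?\ form.

    C:\proj\aux\notes.txt   ->  \\?\C:\proj\aux\notes.txt
    \\server\share\con.txt  ->  \\?\UNC\server\share\con.txt
    """
    # The prefix disables Win32 normalization, so '/' would no longer be
    # treated as a separator; normalize it ourselves first.
    path = path.replace("/", "\\")
    if path.startswith("\\\\?\\"):
        return path  # already prefixed
    if path.startswith("\\\\"):
        return "\\\\?\\UNC\\" + path[2:]  # UNC share
    if _DRIVE_ABSOLUTE.match(path):
        return "\\\\?\\" + path  # drive-letter path
    raise ValueError(f"expected an absolute path, got {path!r}")
```

Relative paths are rejected on purpose: the prefixed form skips the step that would resolve them against the current directory.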
fluidtoons@reddit
That’s a good point. Maybe replacing “modern Windows” with “early Windows” there would be more accurate.
I remember being shocked hearing that VMS influenced NT…
Anyway, I loved DOS (I even tried to write a shell for FreeDOS in high school). Shame all that knowledge is nearly useless these days, haha. I ended up getting more into Linux, thankfully.
mallardtheduck@reddit
Those of us active in the "retrocomputing" hobby would respectfully disagree... Sure, it's a hobby rather than a profession, but you wouldn't call the knowledge of someone who, say, works on vintage cars or steam locomotives "useless".
fluidtoons@reddit
Thank you for reminding me that not everything is about making money, Mallard. Honestly, I’ve just been really worried/stressed out about income lately (I used to be in tech, been trying to do art, hah).
But I apologize- retrocomputing knowledge is certainly not useless
One of my fave YouTube channels is LGR, and I was reading about RISC OS just yesterday. Someday I’d love to have an SGI machine set up… and like I said, I loved DOS (QBasic was my first programming language/tool, followed by Turbo C++)
dlg@reddit
Add one to each ASCII character of VMS and you get WNT, or Windows NT.
✋ ⃤ 🤚
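The letter shift is easy to check with a throwaway Python helper (the function name is just for illustration):

```python
def shift_letters(word: str, offset: int = 1) -> str:
    """Shift each character's code point by `offset` (no wrap-around handling)."""
    return "".join(chr(ord(ch) + offset) for ch in word)


print(shift_letters("VMS"))  # WNT
```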
mallardtheduck@reddit
In a technical sense, sure, there's no DOS code left in modern (64-bit, NT-based) Windows. Although there are still some "principles" (e.g. drive letters) inherited from DOS (though DOS itself copied drive letters from CP/M, but anyway...).
In a business sense, DOS was the "foundation" that led to Microsoft's dominance of the desktop OS market.
Frolo_NA@reddit
I mean, Linux was a hobby OS, so it isn't that surprising.
olearyboy@reddit
Kernel
bionicjoey@reddit
cynicalkane@reddit
Linux partially embraced and partially fell backwards into the worse-is-better principle, and that's why it won out.
(I personally prefer BSD, though.)
psinerd@reddit
I have a running joke at work about how to guarantee your project makes it into production: put one of the 4 magic words in the title: sandbox, playground, POC, or experimental.
ValuableKooky4551@reddit
The word "prototype" just means we use it in production from day 1.
Clitaurius@reddit
Bill bought it to solve a quick and dirty problem (scheduling) and make some quick and dirty money. It's always been the "And then we iterate, right?" meme.
happyscrappy@reddit
Modern OCR packages just really are not geared toward recognizing 8x8 or 9x9 fonts like were used on line and dot-matrix printers back then.
I was trying it myself for some perfectly formed low-res text (found in old video and screenshots) and the results surprised me.
I know it can be made to be very effective. As you say, we have so much computing power and ML to work with now. But the training and development just hasn't typically been in that direction.
SpaceCadet87@reddit
But you could so easily spin up a quick and dirty OpenCV script if you wanted to use today's tech, couldn't you?
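For fixed-size printer glyphs, the core of such a script is plain nearest-template matching: cut the line into character cells and pick the known glyph whose bitmap differs in the fewest pixels. OpenCV's cv2.matchTemplate does this kind of work on real images; the sketch below shows only the idea in plain Python, with a made-up 3x3 "font" and the big assumption that the scan is already binarized and aligned to the character grid, which is most of the real work.

```python
from typing import Dict, List

Bitmap = List[str]  # rows of '#' (ink) and '.' (paper)


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Number of pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def classify_cell(cell: Bitmap, font: Dict[str, Bitmap]) -> str:
    """Return the character whose template is closest to the cell."""
    return min(font, key=lambda ch: hamming(cell, font[ch]))


def read_line(image: Bitmap, font: Dict[str, Bitmap], cell_width: int) -> str:
    """Read one text line whose glyphs sit on a fixed-width grid."""
    out = []
    for x in range(0, len(image[0]), cell_width):
        cell = [row[x:x + cell_width] for row in image]
        out.append(classify_cell(cell, font))
    return "".join(out)


# Hypothetical 3x3 glyphs; a real printer font would be 8x8 or 9x9.
FONT = {
    "0": ["###", "#.#", "###"],
    "1": [".#.", ".#.", ".#."],
    "-": ["...", "###", "..."],
}

# "01-" with one noisy pixel in the "1"
LINE = [
    "###.#....",
    "#.#.#.###",
    "###.##...",
]
print(read_line(LINE, FONT, cell_width=3))  # 01-
```

Because every glyph comes from the same small font, there is no need for a general model here; the catch is that visually identical glyphs (like 0 and O in many printer fonts) still need context to tell apart.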
tnoy@reddit
Some OCR engines will have specific modes for computer printouts.
From experience, the accuracy with scans of dot-matrix prints in Abbyy is significantly higher when you tell it to do so.
Same if you're trying to OCR specific fonts like MICR E-13B or OCR-A.
Effective_Hope_3071@reddit
I love that they dropped the Q and kept the D in quick and dirty lol
mallardtheduck@reddit
Back in that era, "DOS" (Disk Operating System) was a generic term for the software that allowed computers to use (floppy) disks (see, for example, Apple DOS, Atari DOS, TRSDOS, etc.), so "QDOS" was a play on the existing term anyway.
roscoelee@reddit
It's always stayed dirty!
dlg@reddit
Why OS so messy
ChocomelP@reddit
There are some creative interpretations of this comment that get dark very quickly.
Expensive-Example-92@reddit
It's no longer quick, it's just dirty
Thundechile@reddit
The code is hosted on GitHub, which may or may not be online currently. MS has problems with all "new" tech.
LittleLui@reddit
Hey, 79.99% has three nines!
Thundechile@reddit
LOL yeah. Learned yesterday that they in fact don't report outages correctly either; their status monitor may show green even though there were major outages in a service on a given day.
alxhu@reddit
That's why we use this one: https://mrshu.github.io/github-statuses/
Synaps4@reddit
FreeDOS developers going wild with excitement
albertowtf@reddit
Do they?
This seems kinda ultra late to the party. Everything that needed to be redone is probably redone by now
Cybertools4u@reddit
That’s the part I love about old computing history: the most important artifacts are often held together by paper, patience, and a few stubborn humans. It’s funny that we can train models on half the internet, but recovering foundational code still needed someone to sit there and read printouts like archaeology. Also, “Quick and Dirty OS” becoming part of the DNA of modern Windows is exactly the kind of messy origin story every big technology has. Clean brands usually start from very improvised beginnings.
ExplorerPrudent4256@reddit
The long-s problem is particularly nasty because it's not just one character; it affects nearly every s in the document. Modern OCR is trained on contemporary fonts, so historical documents with distinct typographical features (long s, æ/œ ligatures, specific spacing) consistently trip it up. If you want to digitize really old texts, you basically need a model trained specifically for that era's typography, which most general-purpose OCR won't do.
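A cheap mitigation for text that has already been OCR'd, sketched here as an alternative to retraining (a swapped-in technique, not what Google Books or any particular OCR engine does): treat every "f" in a word the dictionary doesn't recognize as a possible misread long s, and accept the first f-to-s substitution that yields a known word. The word list is a hypothetical stand-in; a word like "fame" that is valid either way is left alone, which is the limit of any lookup-based fix.

```python
import re
from itertools import product
from typing import Iterator, Set

MAX_AMBIGUOUS = 6  # cap on 'f's per word, to bound the 2**n candidates


def long_s_candidates(word: str) -> Iterator[str]:
    """Yield spellings where each 'f' is kept or read as 's', original first."""
    positions = [i for i, ch in enumerate(word) if ch in "fF"]
    for choice in product("fs", repeat=len(positions)):
        chars = list(word)
        for i, letter in zip(positions, choice):
            chars[i] = letter.upper() if word[i].isupper() else letter
        yield "".join(chars)


def fix_word(word: str, dictionary: Set[str]) -> str:
    """Keep known words; otherwise try long-s readings against the dictionary."""
    if word.lower() in dictionary or word.lower().count("f") > MAX_AMBIGUOUS:
        return word
    for candidate in long_s_candidates(word):
        if candidate.lower() in dictionary:
            return candidate
    return word  # no plausible reading found; leave it untouched


def fix_long_s(text: str, dictionary: Set[str]) -> str:
    """Replace each alphabetic run with its most plausible long-s reading."""
    return re.sub(r"[A-Za-z]+", lambda m: fix_word(m.group(0), dictionary), text)


WORDS = {"someone", "should", "really", "set", "them", "straight", "about", "it", "simply"}
print(fix_long_s("fomeone fhould really fet them ftraight about it", WORDS))
```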
idebugthusiexist@reddit
Ah, yeh. That feeling when you discover some code you wrote decades ago. It's useless to anyone now and you are kinda a bit embarrassed by it, but you just can't get yourself to delete it for some reason, so you archive it on GitHub anyways. Because why not
RumbuncTheRadiant@reddit
So... what was the difference between A(bort), R(etry), I(gnore)?
netuddki303@reddit
Maybe the thrown error codes.
mr_birkenblatt@reddit
the code
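For the record, the prompt came from the DOS critical-error handler (interrupt 24h), and each answer maps to a return code: Ignore carries on as if the failed operation had succeeded, Retry repeats it, Abort terminates the program, and Fail, added in later DOS versions, hands an error back to the program instead. The Python below is only a behavioral model of those choices with hypothetical names, not DOS code.

```python
from enum import IntEnum
from typing import Callable, Optional


class CriticalErrorAction(IntEnum):
    """Codes a DOS INT 24h critical-error handler hands back to DOS."""
    IGNORE = 0  # carry on as if the operation had succeeded
    RETRY = 1   # attempt the same operation again
    ABORT = 2   # terminate the program
    FAIL = 3    # report an error to the program (later DOS versions)


class OperationFailed(Exception):
    """The program itself sees an error and may handle it."""


class ProgramTerminated(Exception):
    """The program is ended instead of continuing."""


def run_disk_operation(op: Callable[[], bytes],
                       ask_user: Callable[[OSError], CriticalErrorAction]) -> Optional[bytes]:
    """Run `op`, consulting `ask_user` ("Abort, Retry, Ignore?") on each failure."""
    while True:
        try:
            return op()
        except OSError as err:
            action = ask_user(err)
            if action == CriticalErrorAction.RETRY:
                continue
            if action == CriticalErrorAction.IGNORE:
                return None  # no real data, but the program keeps going
            if action == CriticalErrorAction.FAIL:
                raise OperationFailed(str(err)) from err
            raise ProgramTerminated(str(err)) from err
```

The practical difference: Ignore lets the program continue with whatever (possibly garbage) data it has, while Fail at least tells the program that something went wrong.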
spline_reticulator@reddit
https://tenor.com/view/mhmm-reading-oh-oh-yeah-gif-14434817
this_knee@reddit
Fun, but also … yawn.