Are Jupyter notebooks gonna become text-based any time soon?
Posted by Consistent_Tutor_597@reddit | Python | 42 comments
Hey guys. I used to work a lot with Jupyter, but had to move on because .ipynb doesn't play well with git, and AI agents don't really work well with it for similar reasons.
The main culprit is not the notebook itself but the .ipynb format. I understand that the notebook world evolved around inline outputs etc., but I think it would be cool if .py-based notebooks with #%% became first-class citizens everywhere. There's a tool I used called jupytext which does that, but it's bolted on rather than natively supported.
The other tool I have heard about is marimo. I have never used it, but it seems like it forces you to not redefine the same variable again, which is unnatural in Python. If Python allows you to update a variable, your notebook should too. But let me know what you guys think, and whether there's potential for the data science world to move there anytime soon. I think most people have to explore in notebooks and then convert to .py.
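For anyone who hasn't seen the #%% style: a rough sketch of how a plain .py file with cell markers can be read as cells. The function name `split_percent_cells` is made up for illustration, and this is not how jupytext or any editor actually parses the format; it only shows the idea (empty cells are dropped for simplicity).

```python
def split_percent_cells(source: str) -> list[dict]:
    """Split a script into cells at '# %%' / '#%%' marker lines (illustrative only)."""
    cells = []
    current = None
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("# %%") or stripped.startswith("#%%"):
            # A marker line closes the previous cell and opens a new one.
            if current is not None:
                cells.append(current)
            kind = "markdown" if "[markdown]" in stripped else "code"
            current = {"type": kind, "lines": []}
        else:
            # Lines before the first marker go into an implicit code cell.
            if current is None:
                current = {"type": "code", "lines": []}
            current["lines"].append(line)
    if current is not None:
        cells.append(current)
    result = []
    for cell in cells:
        text = "\n".join(cell["lines"]).strip("\n")
        if text:  # drop empty cells to keep the sketch simple
            result.append({"type": cell["type"], "source": text})
    return result


example = """# %% [markdown]
# # Load data

# %%
x = 1

# %%
x = x + 1
print(x)
"""

cells = split_percent_cells(example)
for cell in cells:
    print(cell["type"], "->", repr(cell["source"]))
```

Note that the third cell rebinds `x`, which is fine in this format since the file is just an ordinary Python script.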
daffidwilde@reddit
I guess people use notebooks for reasons other than mine, but I think if there were to be a text file standard for Python notebooks then the text should be at the forefront. Something like the Rmd or Quarto format. I use notebooks for scratch work (often not under version control) or for presenting/teaching (with lots of markdown cells).
Marimo attempts what you’re looking for, but it comes with a very different philosophy to notebooks than Jupyter. Not being able to reuse a variable name is a constraint to allow other magic to happen reliably. Give it a go, I’d say!
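To make the "constraint" part concrete, here is a minimal sketch of detecting names that are defined in more than one cell. This is not marimo's implementation; `defined_names` and `find_redefinitions` are hypothetical helpers, and only top-level assignments, imports, functions and classes are considered. The assumed idea is that if every name has exactly one defining cell, a tool can tell unambiguously which cells depend on which.

```python
import ast
from collections import defaultdict


def defined_names(cell_source: str) -> set[str]:
    """Names bound by top-level statements of a cell (simplified on purpose)."""
    names = set()
    for node in ast.parse(cell_source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import a as b" binds "b".
                names.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.Assign, ast.AugAssign, ast.AnnAssign)):
            targets = node.targets if isinstance(node, ast.Assign) else [node.target]
            for target in targets:
                for sub in ast.walk(target):
                    # Only names being stored count; "a.b = 1" does not bind "a".
                    if isinstance(sub, ast.Name) and isinstance(sub.ctx, ast.Store):
                        names.add(sub.id)
    return names


def find_redefinitions(cells: list[str]) -> dict[str, list[int]]:
    """Map each name defined in more than one cell to the indices of those cells."""
    owners = defaultdict(list)
    for index, source in enumerate(cells):
        for name in sorted(defined_names(source)):
            owners[name].append(index)
    return {name: indices for name, indices in owners.items() if len(indices) > 1}


cells = [
    "import math\nradius = 2",
    "area = math.pi * radius ** 2",
    "radius = 3",  # rebinding a name that cell 0 already defines
]
conflicts = find_redefinitions(cells)
print(conflicts)
```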
kaddkaka@reddit
When to choose quarto and when to choose marimo?
daffidwilde@reddit
I used to use Quarto to do all the docs and reports for a postdoc project I worked on (in R). Rendering a .qmd file straight to a formatted Word document made everyone happy. If LLMs were a thing at the time, I probably would’ve automated the whole thing (shame!)
I’ve since used Quarto to do other Python doc sites, and used Jupyter for my tutorials because I needed the plugin ecosystem (metadata tags)
I’ve only had a cursory play around with Marimo, but I could see it being far more useful for deploying lightweight apps/dashboards
123_alex@reddit
Use marimo for a bit. You'll get used to not redefining variables and you'll never look back at Jupyter.
Theprotagonist5@reddit
there's already a path for this - jupytext lets you sync .ipynb files to plain .py or .md files automatically. you get git-friendly diffs without giving up the notebook ui.
marimo is also worth a look. newer notebook format that's natively a python script, reactive cells, plays nice with version control.
the core .ipynb format is json and the jupyter team hasn't really signaled moving away from it. too much tooling depends on it at this point.
i use jupytext on anything i want tracked in git - data pipelines, scraping notebooks, stuff hitting external apis. i do a lot of web data work so i'm also running residential proxies (geonode, around $5/gb) and keeping everything as synced .py files makes the pipeline way easier to actually review in PRs.
Feuermurmel@reddit
Have you tried this solution? https://stackoverflow.com/a/73218382
It adds a Git filter that'll leave your .ipynb files as is, but will omit the output cells from what is checked into the Git repository. You're left with the text-based JSON notebook files in Git.
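Not the filter from the linked answer, just a rough stdlib sketch of what "omit the outputs" amounts to on the notebook JSON. The function name `strip_outputs` is made up; a Git clean filter reads the file on stdin and writes the cleaned version to stdout, so a function like this could sit behind one, but that wiring is an assumption and is not shown.

```python
import json


def strip_outputs(notebook_json: str) -> str:
    """Return notebook JSON with code-cell outputs and execution counts cleared."""
    nb = json.loads(notebook_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    # Stable key order and indentation keep diffs small between commits.
    return json.dumps(nb, indent=1, sort_keys=True) + "\n"


# A tiny hand-written notebook, used only as sample input.
raw = json.dumps({
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 7,
            "metadata": {},
            "source": ["print(1 + 1)"],
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["2\n"]}],
        },
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
})

cleaned = json.loads(strip_outputs(raw))
print(cleaned["cells"][0])
```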
d4njah@reddit
Jupytext mate
CaptainFoyle@reddit
Read the post mate
d4njah@reddit
fair call, rushed to read it - but jupytext can be installed as an add-on for JupyterLab etc. Update the git repo to block all .ipynb notebooks, forcing users to commit only jupytext formats. Seems like there's a need for better CI/CD pipelines for OP's team.
Ok-Management-1760@reddit
Look into jupytext
https://jupytext.readthedocs.io/en/latest/
It will do what you need to do.
CaptainFoyle@reddit
Read the post before responding
flixflexflux@reddit
OP wrote that
Toby_Wan@reddit
I prefer marimo now
funkdefied@reddit
I’m a huge fan of Marimo
Wh00ster@reddit
What are you on about? It is text
CaptainFoyle@reddit
The file obviously. Change a line, and the whole binary file gets re-created in git.
Same as a word file is not really text
py_curious@reddit
nteract 2 is built specifically to work with agents. You share a notebook with an agent like a pair programmer, or just let the agent build the notebook in a headless session while you tell it what to do and it shows you the results as it builds.
I really like it. I built an agent with Anaconda Agent Studio, added the nteract plug-in and watched the agent as it created new cells, edited existing ones, ran things and chatted with me about what was happening.
https://www.nteract.io/ https://github.com/nteract/nteract
For transparency, my colleagues at Anaconda are contributing to this project so yes, I know them and I want the project to succeed because of that but also because it's genuinely solving some problems with how agents work with notebooks.
j_hermann@reddit
There is nbconvert to remove output cells from your notebooks for Git and also to convert them into practically any format you can dream of.
franzperdido@reddit
Mystmd is great! It's by the same team!
IAmASquidInSpace@reddit
It is kinda strange that, with how popular Markdown has become for documentation, no one has made any tool that allows using Markdown with fenced code blocks as cells in Jupyter. Seems like a perfect application, but then again, I guess it's not as easy as I am making it sound.
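A very rough sketch of the idea, stdlib only: treat every fenced python block in a Markdown document as a cell and run the blocks in order in one shared namespace. The names `code_cells` and `FENCE` are made up, and a real tool would need far more than this (outputs, kernels, error handling), which is probably where the "not as easy as it sounds" part comes in.

```python
import re

# Three backticks, built this way so the example contains no literal fence.
FENCE = "`" * 3

# A python block: an opening fence line, the body, then a closing fence line.
CELL_PATTERN = re.compile(
    rf"^{FENCE}python[^\n]*\n(.*?)^{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def code_cells(markdown: str) -> list[str]:
    """Return the bodies of all fenced python blocks, in document order."""
    return [match.group(1).rstrip("\n") for match in CELL_PATTERN.finditer(markdown)]


doc = "\n".join([
    "# Report",
    "",
    "Some prose.",
    "",
    FENCE + "python",
    "total = 1 + 2",
    FENCE,
    "",
    "More prose.",
    "",
    FENCE + "python",
    "print(total)",
    FENCE,
])

# Run the cells top to bottom in a shared namespace, like a notebook kernel would.
namespace = {}
for source in code_cells(doc):
    exec(source, namespace)
```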
drbobb@reddit
This is supported in marimo.
IAmASquidInSpace@reddit
Oh, that I didn't know! Gonna have to have a look.
runawayasfastasucan@reddit
Check out quart, qmd, if I interpret you right.
IAmASquidInSpace@reddit
Yeah, that's pretty much what I mean. Neat! Thanks!
runawayasfastasucan@reddit
I meant Quarto btw, sorry! But yeah those are pure text files with code blocks.
You could also use nbconvert to convert a Jupyter notebook to a .py file; this can be done as a pre-commit hook or something like that.
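To show what such a conversion boils down to, here is a stdlib-only stand-in (not nbconvert itself, and not its output format). It writes code cells under `# %%` markers and markdown cells as comments; `notebook_to_percent` is a hypothetical name, and raw cells are simply skipped.

```python
import json


def notebook_to_percent(notebook_json: str) -> str:
    """Render notebook JSON as a .py script with '# %%' cell markers (sketch)."""
    nb = json.loads(notebook_json)
    chunks = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # The notebook format allows source as one string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "code":
            chunks.append("# %%\n" + text)
        elif cell.get("cell_type") == "markdown":
            commented = "\n".join(("# " + line).rstrip() for line in text.splitlines())
            chunks.append("# %% [markdown]\n" + commented)
        # Other cell types (e.g. raw) are skipped in this sketch.
    return "\n\n".join(chunk.rstrip("\n") for chunk in chunks) + "\n"


# A tiny hand-written notebook, used only as sample input.
raw = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "outputs": [],
            "source": ["x = 1\n", "print(x)"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
})

script = notebook_to_percent(raw)
print(script)
```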
drbobb@reddit
https://docs.marimo.io/guides/exporting/markdown/
drbobb@reddit
See also https://github.com/marimo-team/marimo/blob/main/marimo/_tutorials/markdown_format.md
IAmASquidInSpace@reddit
Both are just framework-specific solutions, not out-of-the-box solutions/support for existing Markdown docs or the general format of Markdown docs (like used by e.g. MkDocs or Zensical). But that's what I was thinking of.
drbobb@reddit
Both? Both docs are about the same thing.
Incidentally, I found that it's easier to teach an LLM to write marimo notebooks in the markdown format than in the .py format; it's basically enough to feed it the above tutorial.
sowenga@reddit
Check out Quarto markdown maybe. You write a .qmd file and then render it to some output format of choice (which could eg be GitHub flavored markdown).
IAmASquidInSpace@reddit
That is indeed what I thought of, pretty cool! Thanks for the link!
justneurostuff@reddit
there are tons of tools allowing markdown with fenced code blocks. This is even possible in Jupyter with Jupytext or other extensions.
IAmASquidInSpace@reddit
Tons? People have suggested three so far, and one of them doesn't even do what I described. I wouldn't call that "tons"...
justneurostuff@reddit
Okay...more than 5? I guess I say tons because that feels like a lot.
liimonadaa@reddit
I've mostly switched to quarto notebooks
pandongski@reddit
This. And with vs code, it even runs cells in the Jupyter interactive repl
_redmist@reddit
Marimo is pretty great.
I will say, the not reusing variables is at times a mild annoyance but nevertheless i would recommend it.
HungrySecurity@reddit
Actually, .ipynb is a text format. It's essentially just JSON under the hood. But if you're looking for something closer to standard Markdown, you might want to check out RMarkdown or Quarto.
morganpartee@reddit
This, and agents work fine with Json
justneurostuff@reddit
I feel like it defeats the point of the format. The cool thing about jupyter notebooks is that outputs are embedded inside. There are already many, many existing formats without this feature.
AverageComet250@reddit
Gonna be honest, never had a problem with notebooks in git. And 95% of the time you're not actually after version control w/ them, so just add the file and write a nonsense commit msg - you can automate using a hash of the datetime - or use syncthing or a netshare instead.
I have 0 experience using them w/ AI tho. I don't see why the format would be bad other than needing lots of tokens, but I'm sure there's a reason for your problems.
Consistent_Tutor_597@reddit (OP)
It's not easy when multiple people edit the same notebook. For some time we were even running notebooks with papermill for jobs, because it was much easier to maintain for analytical stuff.