Is openpyxl still relevant?
Posted by petekindahot@reddit | Python | 47 comments
I'm a college student, I've just learned pandas and I was planning to start freelancing with openpyxl, pandas and numpy. Wanted to try gigs like data cleaning or automation services. But as I searched about openpyxl, I read that it's used to work with 2010 excel sheets. And that's all.
So my question was is this module/library still relevant?
JackieChanX95@reddit
It unfortunately doesn’t support the full suite of XLSX/XLSM features but for simple tables should be fine. If u want to support the full feature set in Python world there is only xlwings
AlSweigart@reddit
Absolutely, for all the reasons others in these comments have given. But also keep in mind that Google Sheets and other spreadsheet apps almost certainly have a way to import and export to Excel's .xlsx format, which openpyxl works on.
sethclaw10@reddit
Still relevant. Pandas still uses openpyxl to read in modern Excel files.
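A minimal sketch of that dependency, assuming pandas and openpyxl are both installed. The sample data is made up for illustration; passing engine="openpyxl" just makes explicit which library handles the .xlsx file.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a client's spreadsheet.
frame = pd.DataFrame({"name": ["Ada", "Grace"], "hours": [12, 7]})

# Write an in-memory .xlsx file through openpyxl.
buffer = io.BytesIO()
frame.to_excel(buffer, index=False, engine="openpyxl")

# Read the same bytes back through the same engine.
buffer.seek(0)
round_trip = pd.read_excel(buffer, engine="openpyxl")
print(round_trip)
```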
mrbartuss@reddit
Now that we have Polars and DuckDB, the question is whether Pandas is still relevant
madness_of_the_order@reddit
Polars also uses openpyxl
sylfy@reddit
Polars is compatible with openpyxl, but it doesn’t use it as a default. Polars switched from xlsx2csv to calamine as the default engine. Openpyxl was the last choice fallback.
MRanse@reddit
What does polars use for Excel IO?
sylfy@reddit
Calamine through fastexcel as a default backend IIRC. Way faster than openpyxl.
EmbedStats@reddit
fastexcel for reading and xlsxwriter for writing
boat-la-fds@reddit
I don't know exactly but the excel extra includes fastexcel and openpyxl.
xiviajikx@reddit
Damn how far out of the loop am I?
throwaway19293883@reddit
They are great, worth learning. Takes a little bit to learn polars syntax coming from pandas but it’s worth learning. I have some scripts that used to take several minutes to run that now finish in seconds.
Zouden@reddit
Pandas works perfectly fine for applications that don't need the performance of Polars, like what OP wants to do. So yes it's still relevant.
Kronologics@reddit
Oh yeah, everyone just dropped Pandas on release of Polars. There’s absolutely zero legacy code for that industry standard of decades. Completely irrelevant … /s
petekindahot@reddit (OP)
Alright thanks!!
Hotel_Arrakis@reddit
The world runs on Excel. Openpyxl is easy to learn and powerful. It's my go-to for dealing with Excel files. "It works on 2010 Excel files" is technically correct, but misses the point completely. Microsoft introduced the XLSX format with Excel 2007, and the core format has not changed since.
petekindahot@reddit (OP)
Ohh thanks, "The world runs on Excel" I've heard it a lot.
ninhaomah@reddit
And yet you ask this ?
petekindahot@reddit (OP)
Well I guess that was a dumb question, lol. How did you learn openpyxl? Like resources or yt videos?
ninhaomah@reddit
I am a working adult that has been in IT for 20 years before USB cables.
I just learn on the job as I need.
I learnt cloud with certs because there wasn't cloud when I was in school.
I am learning AI and Agents.
So I don't "learn".
My manager says here is excel file with list of users , compare with this list from this API and get me the users that exists in both or in one of the files.
I Google , get the code , update some parts , test , run , get the result and give it to my manager.
Then I forget about it and move on to the next project.
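The task described above (an Excel list of users compared against a list from an API) could look roughly like this with openpyxl. This is only a sketch: the workbook is built in memory for the demo, the API response is a hard-coded placeholder set, and the helper name read_users is made up.

```python
import io

from openpyxl import Workbook, load_workbook


def read_users(source, column=1):
    """Return the non-empty values of one column, skipping the header row."""
    ws = load_workbook(source).active
    users = set()
    for (value,) in ws.iter_rows(min_row=2, min_col=column, max_col=column,
                                 values_only=True):
        if value is not None:
            # Normalise case and whitespace so "Bob" and "bob " match.
            users.add(str(value).strip().lower())
    return users


# Build a small demo workbook in memory instead of using a real file.
demo = Workbook()
sheet = demo.active
sheet.append(["username"])
for name in ["alice", "Bob", "carol"]:
    sheet.append([name])
buffer = io.BytesIO()
demo.save(buffer)
buffer.seek(0)

excel_users = read_users(buffer)
api_users = {"bob", "carol", "dave"}  # placeholder for the API response

in_both = excel_users & api_users      # users present in both lists
only_in_one = excel_users ^ api_users  # users present in exactly one list
print(sorted(in_both), sorted(only_in_one))
```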
petekindahot@reddit (OP)
😶 okay. I'll look it up myself. Good luck for the future!
ninhaomah@reddit
No. It's due to experience and exposure.
For example , you said you heard excel runs the world many times. Yet you ask this. Why ?
I bet because you don't believe that in age of AI and Agents and Databases , people still use excel.
Did I guess correctly ?
petekindahot@reddit (OP)
You're right
ninhaomah@reddit
But that knowledge comes with experience , which you have none as of now.
Nothing wrong since you are still a student.
dr3aminc0de@reddit
Why are you laying into this kid?
ninhaomah@reddit
Laying ?
It's a fact. It's not knowledge issue. He is just a student so no experience.
Whats the issue here ?
I also had no exp and did stupid things and got into trouble when I started.
Facts are facts.
CaptainFoyle@reddit
What's your point? What are you contributing to the question OP has? A simple "yes" would have been ok, without your biography and life philosophy.
CaptainFoyle@reddit
Figuring out how to make something work is learning. Not everything "learning" means sitting your ass down with a book or documentation.
adamrees89@reddit
The world had usb 20 years ago, in fact it was implemented 30 years ago! Not to ignore the rest of your points, but you must have been exposed to USB if you’ve only been working for 20 years…
CaptainFoyle@reddit
Yeah
cgoldberg@reddit
I still use it
CodePalAI@reddit
relevant, yeah, pandas leans on it under the hood. for freelancing the honest bit: clients dont pay for openpyxl, they pay because their excel process is held together with manual copy-paste. learn it well enough to read messy real-world sheets (merged cells, weird headers) and you'll have work. if you're only reading, polars + fastexcel is faster, but openpyxl when you need to write formatting back.
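As one illustration of the "messy sheet" problem, here is a sketch that flattens merged cells with openpyxl so every row carries its own value. The helper name fill_merged_cells and the sample sheet are made up; real files will usually need more handling (multi-row headers, blank rows, and so on).

```python
from openpyxl import Workbook


def fill_merged_cells(ws):
    """Unmerge every merged range and copy its top-left value into each cell."""
    # Copy the ranges first: unmerging changes ws.merged_cells while we loop.
    for merged in list(ws.merged_cells.ranges):
        value = ws.cell(row=merged.min_row, column=merged.min_col).value
        ws.unmerge_cells(merged.coord)
        for row in ws.iter_rows(min_row=merged.min_row, max_row=merged.max_row,
                                min_col=merged.min_col, max_col=merged.max_col):
            for cell in row:
                cell.value = value


# Hypothetical sheet where "North" is merged across two data rows.
wb = Workbook()
ws = wb.active
ws.append(["Region", "Sales"])
ws.append(["North", 10])
ws.append([None, 20])
ws.merge_cells("A2:A3")

fill_merged_cells(ws)
rows = list(ws.iter_rows(values_only=True))
print(rows)
```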
big_data_mike@reddit
Yes. We still have an excel ingestion pipeline running on Python 3.8 and pandas 1.0. The files are small and execution time is <1 second so there is no need to move to something else.
Business runs on excel and it’s so ingrained it likely won’t go anywhere for a long time.
Vivid_TV@reddit
I used it twice in just the last few days in an enterprise.
Pandas and openpyxl, it just works!
uniqueusername42O@reddit
Have you.. tried?
Oddly_Energy@reddit
If you run your code on a Windows computer with Excel installed, you may also want to take a look at the xlwings package. It uses your Excel as an "engine" for reading and writing Excel files. On large files, it is faster than openpyxl despite the extra overhead of running an Excel instance.
And it works in both directions, so you can call functions in your local Python code through user-defined functions in Excel. You can for example have an input table in an Excel workbook and have your Python code generate new output in an output table as soon as you make a change in the input table.
The downside is that it needs Windows and Excel, and that heuristic malware scanners sometimes flag it. I have had our IT department contact me once because they got an alert. So if you use xlwings for file reading and writing, you may need a fallback to openpyxl in your code.
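A rough sketch of such a fallback, assuming openpyxl is always installed and xlwings is optional. The function name read_rows is made up, and the broad except clause is a simplification for illustration: it treats any xlwings failure (package missing, Excel missing, process blocked) as a reason to fall back.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def read_rows(path):
    """Read the first sheet as a list of rows, preferring xlwings when it works."""
    try:
        import xlwings as xw  # optional; needs a local Excel installation

        app = xw.App(visible=False)
        try:
            book = app.books.open(path)
            # ndim=2 forces a list of lists even for a single row or cell.
            rows = book.sheets[0].used_range.options(ndim=2).value
            book.close()
        finally:
            app.quit()
        return [list(row) for row in rows]
    except Exception:
        # xlwings or Excel unavailable: fall back to openpyxl.
        wb = load_workbook(path, data_only=True)
        return [list(row) for row in wb.worksheets[0].iter_rows(values_only=True)]


# Demo with a throwaway file.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.xlsx")
    wb = Workbook()
    wb.active.append(["item", "qty"])
    wb.active.append(["widget", 10])
    wb.save(path)
    rows = read_rows(path)

print(rows)
```

One thing to watch for: the two paths may not return identical types. xlwings typically hands numbers back as floats, while openpyxl keeps integers as integers.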
Icy_Peanut_7426@reddit
Fastexcel is better if you’re just reading Excel data.
Try polars instead of Pandas.
petekindahot@reddit (OP)
I'm not just gonna be reading excel data, if I start doing data cleaning and similar things I'll probably be with json and CSV files as well
GrainTamale@reddit
I've only ever done "Excel" stuff in Python like twice. At least in my data world, all cleaning and ETL is done in polars.
runawayasfastasucan@reddit
Explore Polars and duckdb and be ready to get your mind blown.
oliver_extracts@reddit
openpyxl is fine. the xlsx format hasnt really changed in 15 years and theres no sign it will. pandas uses it under the hood anyway so youre already depending on it whether you know it or not. for freelance data cleaning and automation work you dont need fastexcel or polars, those are optimization tools and youre not going to hit the limits openpyxl + pandas have on any gig-scale dataset.
petekindahot@reddit (OP)
Ohh thanks this helps a lot
Intelligent-Cow341@reddit
I used it in a new app just this past week. The app is a personal productivity app for a piece of consulting work I am doing. I have the Excel export to give me a way of sharing the information captured and also as a backup.
AlexMTBDude@reddit
The way I would check is to look at the release history on PyPI, to see if it's still being updated: https://pypi.org/project/openpyxl/#history
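The same check can be scripted against PyPI's JSON API. This is a sketch: the helper names are made up, fetch_release_data needs network access, and the sample payload below is invented test data that only mimics the shape of the real response.

```python
import json
from datetime import datetime
from urllib.request import urlopen


def fetch_release_data(package):
    """Download release metadata from PyPI's JSON API (needs network access)."""
    with urlopen(f"https://pypi.org/pypi/{package}/json") as response:
        return json.load(response)


def latest_upload(payload):
    """Return (version, upload datetime) of the most recently uploaded file."""
    uploads = []
    for version, files in payload["releases"].items():
        for file_info in files:
            # Older Python versions cannot parse a trailing "Z", so swap it out.
            stamp = file_info["upload_time_iso_8601"].replace("Z", "+00:00")
            uploads.append((datetime.fromisoformat(stamp), version))
    when, version = max(uploads)
    return version, when


# Invented payload in the same shape as the real response, for an offline demo.
sample = {
    "releases": {
        "0.9": [],
        "1.0": [{"upload_time_iso_8601": "2020-01-01T00:00:00.000000Z"}],
        "1.1": [{"upload_time_iso_8601": "2021-06-01T12:00:00.000000Z"}],
    }
}
version, when = latest_upload(sample)
print(version, when.date())

# Real usage (requires network): latest_upload(fetch_release_data("openpyxl"))
```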
Double_Cost4865@reddit
I think it’s also the least bloated package that you can use to read named ranges in Excel, so worth learning. However, I work with projects that use over a hundred input tables and find that reading actual Table objects is more reliable, easier and faster with fastexcel.
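For reference, reading an Excel Table object with openpyxl itself can be sketched like this. The helper name read_table and the "Inventory" table are made up for the demo; the fastexcel route mentioned above is not shown here.

```python
from openpyxl import Workbook
from openpyxl.worksheet.table import Table


def read_table(ws, name):
    """Return an Excel Table as a list of dicts keyed by its header row."""
    table = ws.tables[name]
    # table.ref is the cell range the Table covers, e.g. "A1:B3".
    rows = [[cell.value for cell in row] for row in ws[table.ref]]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, values)) for values in body]


# Build a small workbook containing one Table.
wb = Workbook()
ws = wb.active
ws.append(["sku", "qty"])
ws.append(["A-1", 3])
ws.append(["B-2", 5])
ws.add_table(Table(displayName="Inventory", ref="A1:B3"))

records = read_table(ws, "Inventory")
print(records)
```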
dayeye2006@reddit
The underlying data format, xlsx, is pretty well defined. So I assume the new Excel UI features are irrelevant.
TallowWallow@reddit
Not a data engineer, but I've used it in Pandas to generate Excel data.