I’m looking for advice from people who have handled very large Excel/CSV imports in production systems.

Posted by just_human_right@reddit | learnprogramming | View on Reddit | 13 comments

Current requirement from my client:

Upload 3 Excel sheets

One sheet contains 150k+ rows

Another contains 40k+ rows

Data needs to be inserted into multiple relational tables based on hierarchical categorization

Each text field may also need translation into multiple languages (up to 15 languages, though realistically 3–5 most of the time)

The biggest challenge is making the import:

Reliable

Scalable

Consistent

Fault tolerant

User friendly with proper progress/error handling

Current approach:

Process/import data in chunks instead of loading the full file into memory

Import only the source language first (English)

Save core data into DB

Push translation processing into background jobs/queues (npm BullMq Package)

This works better than translating during import, but once the dataset becomes huge, even the background translation jobs become extremely heavy and time consuming.

I’d really like to hear how experienced engineers handle cases like this in real-world production systems.

Tech stack is mainly Node.js, Next js and MySQL for databse