How to optimize a Python function that sorts large datasets?
Posted by Secret-Bat-1683@reddit | learnprogramming | View on Reddit | 7 comments
So lets say I am trying to need to optimize a Python function that sorts large datasets. Currently my approach is too slow for datasets with hundreds of elements. How would you improve the function using a more efficient algorithm, like QuickSort or MergeSort, and why would it perform better?
sudomeacat@reddit
Another reason I haven’t seen is your sort comparator is expensive
The sort algorithms have their worst case complexity based on a O(1) comparator. However if the elements you’re sorting are annoying and need a custom comparator that runs in say O(M) time (i.e. based on the largest element of an unsorted list), then the complexity of say quick sort goes to O(NM*log(N)).
So if you are currently using your own BubbleSort, this would run in O((NM)^2) time.
So I would look at your comparator and try to optimize that if you aren’t allowed to change the sorting algorithm
numeralbug@reddit
Your approach is too slow for hundreds of elements? Is your computer powered by a hand crank?
teraflop@reddit
Just to be clear, when you say "sort", do you mean putting values into ascending/descending order (which is what programmers normally mean by "sorting") or something else?
If you are actually sorting, then there is rarely any reason to use anything except the built-in
sort()
andsorted()
functions, which as others have said should be instantaneous for small lists.hellbound171_2@reddit
I would start by giving us the full homework question, instead of pretending like this is part of a personal project
dmazzoni@reddit
Sorting "hundreds" of things should happen in the blink of an eye. It doesn't normally get slow until you hit millions.
The standard solution is to call Python's built-in sort method (or the sorted() function) and pass in your own comparator - e.g. you provide the method that determines whether one record is less than or greater than or equal, and Python uses that to do the sort, quickly.
Are you doing that, or did you write your own sorting function?
Whatever801@reddit
Can you share your code? Rolling your own sorting algorithm isn't gonna be the answer as built in python sorting is already very optimized. Most likely:
your code is doing something bad, which I can help identify if you share it. This is what I suspect to be the case given that you're only dealing with hundreds of datapoints
your code is already optimized, in which case multi-threading is the answer.
romple@reddit
2 may be the answer. It may also make things slower which is always a fun result.