Jupyter Notebook showing how to optimize Pandas operations with Index Optimization, Memory Optimization, and Vectorization.
Use timeit and line_profiler to measure performance.
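As a rough sketch of how such a measurement might look, the snippet below times a row-wise `apply` against the equivalent vectorized expression using the standard-library `timeit` module. The dataframe, the `price` column, and the two helper functions are hypothetical and are not taken from the notebook.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical sample data; the notebook's real dataset is not reproduced here.
df = pd.DataFrame({"price": np.arange(10_000, dtype="float64")})


def with_apply():
    # Calls a Python function once per element.
    return df["price"].apply(lambda p: p * 1.2)


def vectorized():
    # Performs the multiplication on the whole column at once.
    return df["price"] * 1.2


# Run each function 20 times and report the total elapsed seconds.
apply_seconds = timeit.timeit(with_apply, number=20)
vectorized_seconds = timeit.timeit(vectorized, number=20)
print(f"apply: {apply_seconds:.4f}s, vectorized: {vectorized_seconds:.4f}s")
```

Inside a notebook, the `%timeit` magic gives the same kind of measurement for a single statement. With line_profiler installed, `%load_ext line_profiler` followed by `%lprun -f with_apply with_apply()` should report time per line of the function.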
Techniques to optimize the performance of your pandas dataframe operations:
- Use `groupby` instead of `filter` for categorical columns
- Prefer `join` instead of `merge` for joining dataframes
- Filter dataframes before joining them
- Use `inplace` option to optimize memory usage
- Use `vectorization` to improve speed of transformation operations
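A few of the techniques above can be sketched on toy data. The tables, column names, and the price threshold below are hypothetical, chosen only for illustration. The sketch shows what each pattern looks like rather than proving it is faster, so any speedup should be measured on your own data.

```python
import pandas as pd

# Hypothetical toy tables; names and values are illustrative only.
games = pd.DataFrame({
    "app_id": [1, 2, 3, 4],
    "genre": pd.Categorical(["rpg", "fps", "rpg", "puzzle"]),
    "price": [30.0, 20.0, 10.0, 5.0],
})
reviews = pd.DataFrame({
    "app_id": [1, 1, 2, 3, 4, 4],
    "hours": [12.0, 3.0, 40.0, 7.0, 1.0, 2.0],
})

# One boolean filter per category: scans the column once for every genre.
mean_by_filter = {
    genre: games.loc[games["genre"] == genre, "price"].mean()
    for genre in games["genre"].cat.categories
}

# groupby computes the same per-genre means in a single call.
mean_by_groupby = games.groupby("genre", observed=True)["price"].mean()

# Join first, filter afterwards: the join processes every row.
merged = games.merge(reviews, on="app_id")
filtered_late = merged[merged["price"] < 25.0]

# Filter first, then join on the index: the join sees fewer rows.
cheap = games[games["price"] < 25.0]
filtered_early = cheap.set_index("app_id").join(
    reviews.set_index("app_id"), how="inner"
)

# Vectorized transformation: one column-wide expression, no Python loop.
discounted = games["price"] * 0.9
```

Both join variants return the same rows here. The difference is only in how much data the join has to process, which matters more as the tables grow.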
- https://www.youtube.com/watch?v=HN5d490_KKk
- https://www.youtube.com/watch?v=nxWginnBklU
- https://medium.com/analytics-vidhya/understanding-vectorization-in-numpy-and-pandas-188b6ebc5398
- https://towardsdatascience.com/five-killer-optimization-techniques-every-pandas-user-should-know-266662bd1163
- https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/01.07-Timing-and-Profiling.ipynb
- https://www.kaggle.com/datasets/antonkozyriev/game-recommendations-on-steam
Made with ❤️ by Data Max