High Performance Python



Data scientists often have large datasets and powerful hardware at their disposal. However, the promise of fast computation in Python often runs into a steep learning curve. This talk will build your confidence and intuition around high-performance computing with Python. We step through a complete example while also covering the core concepts so you can generalize to your own work.

Description
An example data science pipeline with NumPy and pandas
Common heuristics for when to accelerate your code
Quick survey of common approaches
An example data processing pipeline with NumPy
How to accelerate on a single machine with Numba
Brief introduction to Numba
Quick comparison to Cython
Accelerating our example pipeline with Numba
How to distribute on a cluster with Numba and Dask
Brief introduction to Dask
Quick comparison to PySpark, Ray
Accelerating our example pipeline with Numba and Dask
How to accelerate and distribute with Numba, Dask, and RAPIDS
Brief introduction to RAPIDS & GPUs
Quick comparison to other GPU computing methods
Accelerating our example pipeline with Numba, Dask, and RAPIDS
Conclusion
Review of performance gains
Summary of when to apply each to your project
Where to find hardware and example costs for various pipelines and data volumes

PUBLICATION PERMISSIONS:
PyData provided Coding Tech with the permission to republish PyData tech talks.

CREDITS:
PyData YouTube channel: https://www.youtube.com/c/PyDataTV/videos
https://www.youtube.com/watch?v=HhV6yzKXqFU
