{"id":118942,"date":"2023-09-14T13:51:28","date_gmt":"2023-09-14T13:51:28","guid":{"rendered":"https:\/\/livablesoftware.com\/?p=118942"},"modified":"2023-09-14T13:51:28","modified_gmt":"2023-09-14T13:51:28","slug":"landscape-of-the-high-performance-python-ecosystem","status":"publish","type":"post","link":"https:\/\/livablesoftware.com\/landscape-of-the-high-performance-python-ecosystem\/","title":{"rendered":"Landscape of the High-Performance Python Ecosystem"},"content":{"rendered":"
Today, Data Science (DS) and Machine Learning (ML) have grown drastically in importance. In the Python ecosystem, the popularity of libraries and frameworks such as NumPy, Pandas, TensorFlow and SciPy reflects this growing interest.<\/p>\n But while it is becoming easier to quickly prototype<\/b> DS and ML applications, it\u2019s an entirely different challenge to scale<\/b> them up. Scaling requires deep skills to make the best use of high-performance hardware, such as multicore CPUs or fast GPUs. Since data scientists are not necessarily experienced software developers, choosing and assessing the tools and techniques that enable such performance gains can be very complex.<\/p>\n To fill this knowledge gap, we have written a survey intended as a practical reference for practitioners. We focused on the Python language because of its market share. In particular, our study covers performance enhancement approaches based on the CPython interpreter, but we also discuss alternative interpreters designed for high-performance Python, such as Pyston<\/a>.<\/p>\n The full details of the work are available in our published article: Landscape of High-performance Python to Develop Data Science and Machine Learning Applications<\/a>, by Oscar Castro<\/i>, Pierrick Bruneau<\/i>, Jean-S\u00e9bastien Sottet<\/i> and Dario Torregrossa<\/i>, published in ACM Computing Surveys<\/a>. 
Keep reading for its main takeaways!<\/p>\n First, we identified three prototypical usage scenarios:<\/p>\n Then we tried to evaluate the best acceleration approaches for each scenario.<\/p>\n A fourth option, when reimplementing the full project is possible, is to rewrite it on top of specific frameworks (again, this often forces us to change the way we think about the problem and program our way through it).<\/p>\n We evaluated each tool, library and framework belonging to those categories according to:<\/p>\n The full results are available in the ACM survey article, at https:\/\/dl.acm.org\/doi\/10.1145\/3617588<\/a><\/p>\n Some promising options are CuPy<\/a>, a GPU-accelerated drop-in replacement for NumPy; Numba<\/a>, a JIT compiler that accelerates the execution of (parts of) the code by simply adding annotations; and finally Nuitka<\/a>, a transpiler that generates highly optimized C code.<\/p>\nThe survey<\/h2>\n
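To give a feel for what "just providing annotation" means in the Numba case, here is a minimal sketch of our own. It is not an example taken from the survey. It assumes Numba is installed (`pip install numba`). The `sum_of_squares` function and the no-op fallback decorator are hypothetical. The fallback exists only so that the snippet still runs, without any speed-up, where Numba is missing.

```python
try:
    # Numba's real decorator: compiles the function to machine code (nopython mode).
    from numba import njit
except ImportError:
    # Hypothetical no-op stand-in, used only if Numba is not installed:
    # it returns the function unchanged, so the code runs as plain Python.
    def njit(func=None, **options):
        if func is None:
            return lambda f: f
        return func


@njit
def sum_of_squares(n):
    # A plain Python loop: the only change from ordinary code is the decorator above.
    total = 0
    for i in range(n):
        total += i * i
    return total


print(sum_of_squares(10))  # 285
```

The original function body is untouched; only the `@njit` line is added. With Numba available, the loop is compiled the first time the function is called with a given argument type. This low-effort, per-function approach is what makes JIT tools attractive when only a few hot spots need acceleration.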
\n
\n
\n