{"id":118942,"date":"2023-09-14T13:51:28","date_gmt":"2023-09-14T13:51:28","guid":{"rendered":"https:\/\/livablesoftware.com\/?p=118942"},"modified":"2023-09-14T13:51:28","modified_gmt":"2023-09-14T13:51:28","slug":"landscape-of-the-high-performance-python-ecosystem","status":"publish","type":"post","link":"https:\/\/livablesoftware.com\/landscape-of-the-high-performance-python-ecosystem\/","title":{"rendered":"Landscape of the High-Performance Python Ecosystem"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">Data Science (DS) and Machine Learning (ML) have grown dramatically in importance in recent years. In the Python ecosystem, the popularity of libraries and frameworks such as NumPy, Pandas, SciPy and TensorFlow reflects this growing interest.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">But while it is becoming <\/span><b>easier to quickly prototype<\/b><span style=\"font-weight: 400;\"> DS and ML applications, it\u2019s an entirely different <\/span><b>challenge to scale<\/b><span style=\"font-weight: 400;\"> them up. Scaling requires deep expertise to make the best use of high-performance hardware such as multicore CPUs and GPUs. Since data scientists are not necessarily experienced software developers, choosing and assessing the tools and techniques that deliver such performance gains can be very difficult.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To fill this knowledge gap, we conducted a survey intended as a practical reference for practitioners. We focused on Python because of its large market share in these domains. 
In particular, our study focuses on performance enhancement approaches based on the standard CPython interpreter, although we also discuss alternative interpreters designed for high-performance Python, such as <a href=\"https:\/\/www.pyston.org\/\" target=\"_blank\" rel=\"noopener\">Pyston<\/a>.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Full details of the work are available in our article <a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3617588\" target=\"_blank\" rel=\"noopener\">Landscape of High-performance Python to Develop Data Science and Machine Learning Applications<\/a>, by <\/span><i><span style=\"font-weight: 400;\">Oscar Castro<\/span><\/i><span style=\"font-weight: 400;\">, <\/span><i><span style=\"font-weight: 400;\">Pierrick Bruneau<\/span><\/i><span style=\"font-weight: 400;\">, <\/span><i><span style=\"font-weight: 400;\">Jean-S\u00e9bastien Sottet<\/span><\/i><span style=\"font-weight: 400;\"> and <\/span><i><span style=\"font-weight: 400;\">Dario Torregrossa<\/span><\/i><span style=\"font-weight: 400;\">, published in <a href=\"https:\/\/dl.acm.org\/journal\/csur\" target=\"_blank\" rel=\"noopener\">ACM Computing Surveys<\/a>. 
Keep reading for its main takeaways!<\/span><\/p>\n<h2>The survey<\/h2>\n<p><span style=\"font-weight: 400;\">First, we identified three prototypical usage scenarios:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Vanilla Python development<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Projects integrating with ML frameworks (e.g., TensorFlow)<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Low-level, computationally intensive Python programs (i.e., programs relying on popular numerical libraries such as NumPy to solve canonical problems)<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">We then evaluated the most suitable acceleration approaches for each scenario:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">For vanilla scenarios, or when dealing with legacy code, we suggest distributed-memory and parallelization approaches such as MPI, OpenMP and task-based parallelization, as well as program transformation and compilation (either semi- or fully automatic).<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">For projects integrating with ML frameworks, the main options are parallelism and GPU exploitation.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">For optimizing low-level canonical programs, we explored drop-in replacements, decorator-based approaches and alternative implementations of the NumPy, Pandas and scikit-learn libraries.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">A fourth option, when reimplementing the full project is feasible, is a reimplementation built on specific frameworks (again, this often forces us to change the way we think about and program our way into the problem).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">For each tool, library and framework in these categories, we evaluated:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">its level of maturity;<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">maintenance: period of development activity (releases on PyPI, activity on code forges) and type of maintainers (enterprise, academic institution, individuals);<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">targeted hardware (CPU, GPU or both);<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">usage complexity: how intrusive it is in the code, whether it requires tweaking, and whether it has a steep learning curve;<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">whether it is open source;<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">its <a href=\"https:\/\/livablesoftware.com\/popularity-will-not-bring-contributions-oss-project\/\" target=\"_blank\" rel=\"noopener\">level of popularity<\/a>, according to GitHub stars and\/or PyPI downloads.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">The full results are reported in the ACM survey article, available at <a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3617588\" target=\"_blank\" rel=\"noopener\">https:\/\/dl.acm.org\/doi\/10.1145\/3617588<\/a>.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Some promising options are <a href=\"https:\/\/cupy.dev\/\" target=\"_blank\" rel=\"noopener\">CuPy<\/a>, a GPU-accelerated drop-in replacement for NumPy; <a href=\"https:\/\/numba.pydata.org\/\" target=\"_blank\" rel=\"noopener\">Numba<\/a>, a JIT compiler that accelerates the execution of (parts of) the code through simple decorator annotations; and finally <a href=\"https:\/\/nuitka.net\/\" target=\"_blank\" rel=\"noopener\">Nuitka<\/a>, a transpiler that compiles Python into optimized C code.<\/span><\/p>\n<p><span style=\"font-weight: 
400;\">Note that we did not explore in depth alternatives beyond the standard Python language and interpreter, given their current popularity (and the libraries available) in the addressed domains. This popularity may, however, have eclipsed other environments with more mature dependency management or promising high-performance approaches, such as the <a href=\"https:\/\/www.pyston.org\/\" target=\"_blank\" rel=\"noopener\">Pyston<\/a> interpreter or alternative languages like Julia.<\/span><\/p>\n<h2>Acceleration of legacy code<\/h2>\n<p><span style=\"font-weight: 400;\">In some cases, the best way forward for practitioners is to start from scratch with a complete framework that enforces a development style or code design that makes acceleration easier.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">However, in my opinion, <\/span><b>the acceleration of legacy code is also crucial<\/b><span style=\"font-weight: 400;\"> (given the huge amount of Python code already written) and certainly the most complicated task, since manually rewriting everything from scratch is not really an option. So what are our options?<\/span><\/p>\n<p><span style=\"font-weight: 400;\">On the one hand, most popular approaches to accelerating legacy code are either replacements of existing libraries (called drop-ins) or annotation-based acceleration (e.g., Numba). While they offer <\/span><b>quick local wins<\/b><span style=\"font-weight: 400;\"> in terms of performance, practitioners rapidly face a <\/span><b>significant number of corner cases<\/b><span style=\"font-weight: 400;\"> where this \u201csmooth\u201d drop-in is not that straightforward. Even drop-ins require adding or refactoring code: for instance, memory transfers between main (host) memory and the GPU still have to be managed by hand. 
Sometimes it is even worse: the refactoring requires specific tricks, not all functions are ported, some data types are not supported, etc.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">On the other hand, transformations and compilers that target performance acceleration are complex pieces of engineering that require rigor and technical expertise, but in their ultimate form they would completely remove the need to manually refactor any code, automatically transforming legacy code into a better, accelerated version. Well-known examples include transformations that reduce the complexity of nested loops or function chains. In practice, however, it is difficult to achieve complete coverage of language constructs, libraries, etc.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This is why, with this survey, we wanted to target not only practitioners but also tool designers, highlighting the <strong>gaps and open challenges in accelerating (Python) programs<\/strong>, including the potential of AI-based techniques for this task. 
We look forward to seeing new developments in this field!<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>A survey of the libraries, frameworks and options to speed up your Python programs.<\/p>\n","protected":false},"author":14,"featured_media":118943,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"categories":[47,15,117],"tags":[168,166,45,167],"_links":{"self":[{"href":"https:\/\/livablesoftware.com\/wp-json\/wp\/v2\/posts\/118942"}],"collection":[{"href":"https:\/\/livablesoftware.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/livablesoftware.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/livablesoftware.com\/wp-json\/wp\/v2\/users\/14"}],"replies":[{"embeddable":true,"href":"https:\/\/livablesoftware.com\/wp-json\/wp\/v2\/comments?post=118942"}],"version-history":[{"count":4,"href":"https:\/\/livablesoftware.com\/wp-json\/wp\/v2\/posts\/118942\/revisions"}],"predecessor-version":[{"id":118950,"href":"https:\/\/livablesoftware.com\/wp-json\/wp\/v2\/posts\/118942\/revisions\/118950"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/livablesoftware.com\/wp-json\/wp\/v2\/media\/118943"}],"wp:attachment":[{"href":"https:\/\/livablesoftware.com\/wp-json\/wp\/v2\/media?parent=118942"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/livablesoftware.com\/wp-json\/wp\/v2\/categories?post=118942"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/livablesoftware.com\/wp-json\/wp\/v2\/tags?post=118942"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}