Data Science (DS) and Machine Learning (ML) have grown drastically in importance. In the Python ecosystem, the popularity of libraries and frameworks such as NumPy, Pandas, TensorFlow and SciPy reflects this growing interest.

But while it is becoming easier to quickly prototype DS and ML applications, scaling them up is an entirely different challenge. It requires deep skills to best exploit the capabilities of high-performance devices such as multicore CPUs or fast GPUs. Since data scientists are not necessarily experienced software developers, choosing and assessing the tools and techniques that enable such performance enhancements can be very complex.

To fill this knowledge gap, we have conducted a survey that can be used as a practical reference by practitioners. We focused on the Python language because of its market share. In particular, our study concentrates on performance enhancement approaches based on the CPython interpreter, but we also discuss alternative interpreters built for high-performance Python, such as Pyston.

The full details of the work are available in our published article: Landscape of High-performance Python to Develop Data Science and Machine Learning Applications, by Oscar Castro, Pierrick Bruneau, Jean-Sébastien Sottet and Dario Torregrossa, published in ACM Computing Surveys. Keep reading for its main takeaways!

The survey

First, we identified three prototypical usage scenarios:

  • Vanilla Python development 
  • Projects integrating with ML frameworks (e.g., TensorFlow) 
  • Low-level, compute-intensive Python programs (i.e., relying on popular numerical libraries such as NumPy to solve canonical problems).

Then we evaluated the best acceleration approaches for each scenario.

  • For vanilla scenarios, or when legacy code has to be dealt with, we suggest distributed-memory and parallelization approaches such as MPI, OpenMP or task-based parallelization, as well as program transformation and compilation (either semi-automatic or fully automatic).
  • For the second type of project, we suggest parallelism and GPU exploitation.
  • For the optimization of low-level canonical projects, we explored drop-in replacements, decorators and alternative implementations of the NumPy, Pandas and scikit-learn libraries.

A fourth option, when reimplementing the full project is possible, is a reimplementation based on specific frameworks (although this will often force us to change the way we think about and program our way into the problem).
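To make the task-based parallelization mentioned for vanilla scenarios more concrete, here is a minimal sketch that uses only the standard library's concurrent.futures module. It is not taken from the survey: the function names (count_primes, split_range, count_primes_parallel) and the prime-counting workload are hypothetical, chosen only because they are CPU-bound and easy to split into independent tasks.

```python
import concurrent.futures as cf
import math


def count_primes(bounds):
    """CPU-bound task: count the primes in the half-open range [start, stop)."""
    start, stop = bounds
    count = 0
    for n in range(max(start, 2), stop):
        # Trial division up to the integer square root of n.
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def split_range(stop, chunks):
    """Split [0, stop) into at most `chunks` contiguous sub-ranges."""
    step = max(1, -(-stop // chunks))  # ceiling division, at least 1
    return [(lo, min(lo + step, stop)) for lo in range(0, stop, step)]


def count_primes_parallel(stop, workers=4, executor_cls=cf.ProcessPoolExecutor):
    """Run one independent task per sub-range and sum the partial counts.

    `executor_cls` is a hypothetical knob added for this sketch: processes
    sidestep the GIL for CPU-bound work, while a thread pool can be swapped
    in for I/O-bound tasks or for quick experiments.
    """
    with executor_cls(max_workers=workers) as pool:
        return sum(pool.map(count_primes, split_range(stop, workers)))
```

When the default process pool is used from a script, the call to count_primes_parallel should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Note that the sub-ranges here do not have equal cost (larger numbers take longer to test), which is the kind of load-balancing detail such approaches leave to the developer.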

We evaluated each tool, library and framework belonging to those categories according to:

  • level of maturity,
  • maintenance: period of development activity (releases on PyPI, activity on forges) and type of maintainers (enterprise, academic institution, individuals),
  • targeted hardware (CPU, GPU or both),
  • usage complexity: how intrusive it is in the code, whether it requires tweaks, and whether it has a steep learning curve,
  • whether it is open source,
  • level of popularity, according to GitHub stars and/or PyPI downloads.

The full results are available in the ACM Computing Surveys article.

Some promising options are CuPy, a GPU-accelerated drop-in replacement for NumPy; Numba, a JIT approach that accelerates the execution of (parts of) the code simply by adding annotations; and Nuitka, a transpiler approach that generates highly optimized C++ code.
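As an illustration of the annotation-based style, the sketch below decorates a plain Python loop with Numba's njit decorator. The function sum_of_squares is a hypothetical example, and the fallback decorator is an assumption added here so that the snippet still runs (without any speed-up) when Numba is not installed.

```python
try:
    from numba import njit  # Numba's "nopython" JIT decorator, if available
except ImportError:
    def njit(func=None, **kwargs):
        """Fallback no-op decorator: keeps the sketch runnable without Numba."""
        if func is None:
            return lambda f: f
        return func


@njit
def sum_of_squares(n):
    """Plain Python loop; with Numba it is compiled on the first call."""
    total = 0
    for i in range(n):
        total += i * i
    return total


print(sum_of_squares(10))
```

The body of the function is untouched, which is what makes this approach attractive for existing code. The first call pays the compilation cost, so any gain shows up only on repeated or long-running calls.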

Note that we have not deeply explored alternatives beyond the standard Python language and interpreter, because of its current popularity (and available libraries) in the addressed domains. This popularity has, however, potentially eclipsed other environments with more mature dependency management or promising high-performance approaches, such as the Pyston interpreter, or alternative languages such as Julia.

Acceleration of legacy code

Starting from scratch with a complete framework that enforces certain development practices or design thinking, and thereby makes acceleration easier, is in some cases the best way to go for practitioners.

However, in my opinion, accelerating legacy code is also crucial (given the huge amount of Python code already written) and certainly the most complicated task, as manually rewriting everything from scratch is not really an option. So what are our options?

On the one hand, most of the popular approaches to accelerating legacy code are either replacements of existing libraries (called drop-ins) or annotation-based acceleration (e.g., Numba). While they offer quick local performance wins, practitioners rapidly face a significant number of corner cases where this “smooth” drop-in is not that straightforward. Even drop-ins require adding or refactoring code: for instance, memory transfers between main memory and the GPU still have to be managed by hand. Sometimes it is even worse: the refactoring requires a specific trick, not all functions are ported, some data types are not covered, etc.

On the other hand, transformations and compilers that target performance acceleration are complex pieces of engineering that require rigor and technical expertise but that, in their ultimate form, would completely remove the need to manually refactor any piece of code and would automatically transform any legacy code into a better, accelerated version. Well-known examples are transformations that reduce the complexity of nested loops or of function chaining. In practice, however, it is difficult to reach complete coverage of language constructs, libraries, etc.
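To give a feel for what such transformations aim at, here are two hand-written before/after pairs in plain Python: one removes a nested loop, the other fuses a chain of passes into a single one. These are illustrative sketches with hypothetical function names; we do not claim that any specific tool from the survey performs exactly these rewrites.

```python
def pairwise_products_nested(values):
    """Before: O(n^2) nested loops summing values[i] * values[j] for i < j."""
    total = 0
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            total += values[i] * values[j]
    return total


def pairwise_products_single_pass(values):
    """After: O(n) single loop; `prefix` holds the sum of all earlier values."""
    total = 0
    prefix = 0
    for v in values:
        total += prefix * v
        prefix += v
    return total


def chained(values):
    """Before: two chained passes, each building a temporary list."""
    doubled = [2 * v for v in values]
    shifted = [v + 1 for v in doubled]
    return sum(shifted)


def fused(values):
    """After: the two steps fused into one pass, with no temporaries."""
    return sum(2 * v + 1 for v in values)
```

Both rewrites preserve the results for integer inputs, but proving that kind of equivalence automatically, for arbitrary code and libraries, is precisely where full coverage becomes hard.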

This is why, with this survey, we wanted to target not only practitioners but also tool designers, highlighting the gaps and open challenges in accelerating (Python) programs, including the potential of AI-based techniques for this task. We look forward to seeing new developments in this field!
