Artificial intelligence is going to change the way software is developed. On the research side, we’ve seen many initiatives attempting to bring some of the benefits of AI and Machine Learning in particular to software development. Many of these papers are part of the curated repository “Awesome Machine Learning on Source Code”. ML can help in all phases of software development, not just at the code level. However, the combination of Machine Learning and Programming is where most of the works in the area are focusing right now.
While I fully agree that AI is going to massively impact how we develop software, I was especially interested in checking whether a regular programmer already has useful intelligent tools available today that use AI techniques to enhance their productivity.
So I searched for all the so-called intelligent/smart “copilots” for developers and tried to accomplish a “simple” programming task with their help: opening and reading a local text file. This task is simple enough for the tools to be able to help and, at the same time, complex enough (for me) that I look it up online every time I have to do it (I never manage to remember the names of the classes to be combined for this).
Reading the descriptions of some of the smart IDEs I found (“understands the world’s code and provides you with the right suggestion at the right time”, “analyzes all the code on the web and gives you fast, smart completions ordered by popularity”), it seemed that my little experiment would be a great success. But it wasn’t. In part, this is because, as I mention at the end, the quality of the training data plays a major role in the quality of the predictions: while many of the tools claim to learn from Stack Overflow and other public resources, it’s only when you train the tool with your own company data that the results start to make sense.
Let’s dive into the tools I tried (in no particular order). Note that since the first version of this post (2017) I’ve added new tools to the list and moved some others to the list of dead tools.
Visual Studio IntelliCode
IntelliCode is a very recent addition to Visual Studio. Created by Microsoft itself, IntelliCode saves you time by putting what you’re most likely to use at the top of your completion list. IntelliCode recommendations are based on thousands of open source projects on GitHub each with over 100 stars. More specifically, Microsoft trains a base model on public code repositories and (optionally) a custom model based on the developer’s own code repositories to find patterns in API usage. “As of May 2019, IntelliCode uses over 14,000 total repositories”.
As you can see in this screenshot, IntelliCode was able to properly recommend the method I should use in the loop for reading the file. Note how, among all the potential methods available at this point of the code, IntelliCode stars the one I need (readLine) and puts it at the top of the list.
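For reference, the kind of reading loop I was trying to write looks roughly like the sketch below. It is a minimal illustration, not the exact code from my experiment: the class name `ReadFileExample`, the helper `readLines`, and the temporary sample file are all made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReadFileExample {

    // Reads every line of a text file into a list using a readLine() loop.
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            // readLine() returns null once the end of the file is reached
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for whatever local file you want to read
        Path file = Files.createTempFile("sample", ".txt");
        Files.write(file, Arrays.asList("first line", "second line"));
        for (String line : readLines(file)) {
            System.out.println(line);
        }
        Files.delete(file);
    }
}
```

The call to `readLine` inside the `while` condition is the spot where a completion tool has to pick the right method out of everything `BufferedReader` offers.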
According to Microsoft, IntelliCode is also able to merge the recommendations with the context of your code in order to tailor the completion list to promote common practices. IntelliCode isn’t limited to statement completion. Signature help also recommends the most likely overload for your context. And even more exciting, this post announces a new feature coming up soon: repeated edits. IntelliCode will be able to automatically detect when you’re repeating the same modification in your code over and over again (a kind of manual refactoring) and offer to apply that change automatically everywhere in the code where a similar modification is required.
Kite defines itself as the plugin for your IDE that uses machine learning to give you useful code completions for Python.
In terms of features, it’s quite similar to some of the above tools. It gives you inline documentation plus examples taken from its broad collection of samples. This is what I got for my file reading example.
Until now, suggestions were based on the classes/methods referenced in your code but without taking into account how they were actually used in the code. The latest versions improve this behavior with a new type of completion, Line-of-Code Completions, which incorporate context from code you’ve previously written to power smart completions of up to a full line of code. Kite claims this improvement reduces the keystrokes needed to complete the code by around 30-40%.
The following video is also a good explanation of how Kite works.
Codota promises to provide you with the right suggestion at the right time. Codota supports IntelliJ IDEA, Android Studio, and Eclipse and provides autocompletion for Java and Kotlin.
In contrast with the above solutions, it requires you to keep an open internet connection so that the plugin can communicate with Codota’s cloud to find the suggestions (only the minimal contextual information is sent).
A nice feature is that you can benefit from Codota even if you don’t have the plugin installed. Codota’s website allows you to search for code snippets from the web interface itself. See below what I got when trying to find examples using the BufferedReader class. Once you get the first set of results, you can refine the search to improve the accuracy. In this example, if I refine the search to look for examples that, beyond BufferedReader itself, also use its readLine method, I do get a nice example of iterating over a file.
There is also a version for teams which looks more promising and useful since it helps to detect which of your other team members have already written a similar piece of code (and who did it so that you can go and ask for help).
TabNine is the new kid on the block. Similar to the Google paper, it tries to predict the next “token” in the programming sequence based more on the patterns found in the training data set than on the previous code samples from the same user. This makes TabNine especially useful when programming repetitive/common code.
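To make the idea of predicting the next token from patterns in the training data more concrete, here is a deliberately tiny sketch. It is not TabNine’s actual model (which is far more sophisticated); it only counts which token most often follows another one. The class `NextTokenSketch` and its methods are invented for this illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NextTokenSketch {

    // For each token, how often every other token followed it in the training data
    private final Map<String, Map<String, Integer>> followers = new HashMap<>();

    // Counts adjacent token pairs in one tokenized training sample
    void train(List<String> tokens) {
        for (int i = 0; i + 1 < tokens.size(); i++) {
            followers.computeIfAbsent(tokens.get(i), k -> new HashMap<>())
                     .merge(tokens.get(i + 1), 1, Integer::sum);
        }
    }

    // Returns the most frequent follower of the given token, or null if it was never seen
    String predict(String previous) {
        Map<String, Integer> counts = followers.get(previous);
        if (counts == null) {
            return null;
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NextTokenSketch model = new NextTokenSketch();
        // Three hand-made "training samples"; a real tool learns from huge code corpora
        model.train(Arrays.asList("reader", ".", "readLine", "(", ")"));
        model.train(Arrays.asList("reader", ".", "readLine", "(", ")"));
        model.train(Arrays.asList("reader", ".", "close", "(", ")"));
        System.out.println(model.predict(".")); // prints "readLine" (seen twice vs. once for "close")
    }
}
```

Even this naive counter shows why such tools shine on repetitive code: the more often a pattern appears in the training data, the more confidently it is suggested.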
It supports over 20 programming languages and as long as you have enough samples available you can easily add more. You can see below a Java example.
TabNine is free but the business model revolves around a Cloud version of TabNine (still in beta) that you can subscribe to in order to make sure the latency of the suggestions is low enough to make the programming experience with TabNine as smooth as possible.
Other smart assistants
PHPBot is a bot that pretends to be your virtual assistant when writing PHP code. Given the name of a PHP function, it will provide a set of usage examples for that function to help you understand how it works. It’s more a replacement for the official PHP documentation than an intelligent system able to understand your code and suggest adapted examples for it.
Facebook AI has announced Aroma. Aroma is a code-to-code search and recommendation tool that uses machine learning (ML) to make the process of gaining insights from big codebases much easier. So, it’s more of an intelligent code searching tool than a smart autocompletion tool. Still, by showing examples of code similar to what you’re trying to write (and assuming that the examples correspond to high-quality code, as it’s part of your company’s codebase), the recommendation helps you not only to be faster but also to spot possible mistakes or refactoring opportunities early on, like the missing exception handling in the example.
For now, Aroma is not available for the general public.
In case you’re wondering, Google does not yet have a tool for programmers but this doesn’t mean it is not doing some interesting work in this area. In fact, a team from Google Brain has recently published the paper “Neural Networks for modeling source code edits”, where they train a network with millions of fine-grained edits from thousands of Python developers to predict future edits. Note that Google does not focus on the static aspects of the code but more on the code as a dynamic object that evolves over time.
Dead intelligent IDEs
Even if this field is rather new, we already seem to have the first casualties.
CodePilot.ai is more of an advanced search code engine. As they say, search is not a solved problem for software developers. It can search in your local environment or on StackOverflow or GitHub.
In my example, it quickly detected that I already had another piece of code using the same Scanner class. For the moment, CodePilot is not using any kind of machine learning algorithm to provide more meaningful results, so it basically shows all “hits” for the keywords you look for.
CodePilot started as a private company but a few months ago, its founder stopped working on the project and released it on GitHub. Technically speaking the project is not dead, just “open source”. But since that initial release, nobody else has contributed to the project so I’d consider it done.
Eclipse Code Recommender
Eclipse Code Recommender, aka the Intelligent development environment (their words), was an official Eclipse project powered by the CodeTrails company (as of today, it appears as an archived project).
It offers a code completion feature for Java that sorts out the possible methods to use at a given point based on what other developers have done in the past. It uses Jayes, a pure Java, open-source Bayesian Network library, to power this code completion. As the Figure shows, in my reading loop, Eclipse Code Recommender suggests next() as the first option.
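For context, here is roughly what a reading loop built around next() can look like. This sketch assumes the suggestion refers to `Scanner.next()` (the Scanner class mentioned earlier); the class `ScannerLoopSketch` and the helper `readTokens` are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerLoopSketch {

    // Collects whitespace-separated tokens with the hasNext()/next() idiom
    static List<String> readTokens(Readable source) {
        List<String> tokens = new ArrayList<>();
        // try-with-resources closes the scanner (and the underlying source)
        try (Scanner scanner = new Scanner(source)) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // An in-memory string stands in for the contents of a local file
        System.out.println(readTokens(new StringReader("first line\nsecond line")));
        // prints [first, line, second, line]
    }
}
```

To read an actual local file, a `java.io.FileReader` could be passed instead of the `StringReader`. Note that next() returns one token at a time, so for line-by-line reading `hasNextLine()` and `nextLine()` would be the closer fit.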
The “learning” part uses code samples taken from the Eclipse Marketplace. Recommendations are provided for org.eclipse.* APIs. There are plenty of recommendation options to tune the suggestions process. Submitting your own previous code (e.g. to “train” the system with your own coding style) requires a CodeTrails license. As an alternative, you can register (and use later on) code snippets.
CodeCorrect plugs into the StackOverflow API to find solutions to common errors in a developer’s code. It was the result of a TechCrunch Disrupt Hackathon and got some media attention, but the project seems to remain at the “proof-of-concept” stage, with no further updates in recent months.
Ai.codes aims to provide intelligent code completion for a number of languages. It’s developed by a team of AI researchers, compiler writers and software engineers. They released an IntelliJ plugin called “AI predictive coding” in 2016 as part of its public beta phase but it looks like they never really got out of beta.
Final thoughts on the state of Smart Coding IDEs
Based on this simple experiment, I’m confident in concluding that, so far, intelligent IDEs are more marketing than reality. I do believe that these kinds of tools will progress a lot in the coming years and may become real virtual assistants for programmers, but we are far from there yet.
Also, they have very strong competition. It’s not enough for them to be useful; they need to prove they are faster and more accurate than programmers searching for an answer online themselves. If I search on Google for something like “open file in Java”, I get an answer from SO that directly provides a code sample that I can just copy and paste into my editor. You get similar results in Bing (even better, depending on the query, Bing directly provides a code snippet showing how to do it, ready for you to copy to the IDE). It is very difficult to beat this.
Also, note that the “intelligence” comes from sorting and ranking the code samples the smart IDEs find in SO or GitHub, not so much from trying to understand the code you’re trying to write and providing contextual help for it. IMHO, the latter is the direction they should be heading in (and some are starting to go in this direction!). I want the IDE to help me write my code, not to help me find online a code sample that I’ll need to read, understand and adapt. Even better if they are able to automatically adapt the samples they find to the identifiers (and even the coding style) I’m using in my code.
Moreover, as with any ML approach, the quality of the outcome depends on the quality of the training data. If instead of training the system with Stack Overflow or random GitHub data, you train it with your internal / private repositories, you’ll get much better results. If your company has enough internal repos to go ahead with this training, I think you could give these smart IDEs a go. Otherwise, better to wait a little bit longer!
In a more distant future, I guess we’ll also see some coder bots with whom you’ll be able to do some pair programming, chatting about what your goal is for a specific method and letting the chatbot find the best solution for you. In particular, I hope to see these editors going one step beyond line completion. I’d like them to suggest pattern-based completions where they actually “get” what I’m trying to write and complete it for me. Not just the current line, or the next line, but the complete loop or structure.