As part of our research work, we periodically monitor GitHub to know what is being brewed and what is getting more attention (using stars as a proxy for attention / popularity) from the community.
During May 2022, we performed a quick pass on the top 25 starred repositories on the platform, and the results were surprising, starting with the fact that most of them were not software development projects.
We analyzed each repository and identified six main categories according to what the project is about: software, awesome list, books, study plan, algorithm collection and style guide. The software and books categories are self-descriptive. Awesome lists are becoming popular as well-known knowledge bases about a topic; study plans provide a curated list of resources to learn about a specific technology, commonly web technologies or programming languages; and algorithm collections and style guides provide examples of high-quality code snippets and common programming practices.
The following figure shows the distribution of these categories for the top 25 starred repositories in GitHub:
The list of repositories can be found here.
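For readers who want to take a similar snapshot themselves, here is a minimal sketch using the public GitHub REST Search API. It is not necessarily how we collected the list above: the query `stars:>1`, the function names and the `User-Agent` value are assumptions made for illustration, and a snapshot taken today will differ from the May 2022 one.

```python
import json
import urllib.parse
import urllib.request

# Repository search endpoint of the GitHub REST API.
API_ROOT = "https://api.github.com/search/repositories"


def build_search_url(top_n=25):
    """Build a search URL for the top_n most-starred repositories."""
    params = {
        "q": "stars:>1",      # assumed qualifier: any repository with stars
        "sort": "stars",
        "order": "desc",
        "per_page": top_n,    # the API allows at most 100 per page
    }
    return API_ROOT + "?" + urllib.parse.urlencode(params)


def summarize(payload):
    """Reduce a search response to (full_name, stars, primary language) tuples."""
    return [
        (item["full_name"], item["stargazers_count"], item.get("language"))
        for item in payload.get("items", [])
    ]


def fetch_top_repositories(top_n=25):
    """Query GitHub without authentication (rate limits apply) and summarize."""
    request = urllib.request.Request(
        build_search_url(top_n),
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "top-starred-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return summarize(json.load(response))
```

Calling `fetch_top_repositories()` returns one tuple per repository, ordered by star count. The category of each repository (software, awesome list, book, and so on) is not part of the API response; in our analysis it came from inspecting each repository by hand.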
Other “surprises” we think are worth highlighting:
- Only 9 repositories are devoted to developing a software product. As might be expected, only well-known software development projects (e.g., Vue, React, TensorFlow or Flutter) made it to the list. This means that more than half of the top projects on GitHub (16 out of 25) are NOT software projects!
- Awesome lists and study plans together have become the most popular kind of repositories on GitHub. Curiously enough, these repositories are used to track the evolution of documentation rather than code. For awesome lists, changes (and contributions) just add or remove items from a list.
- Although most of the repositories are in English, we are starting to see projects in Chinese in the top 25 (e.g., Python-100-Days or JavaGuide), which demonstrates the relevance of the platform in China.
With these results, it is clear that GitHub has become a social platform that goes beyond coding, which we think is good, as we see more and more collaborative efforts taking place in the open. But this also opens new (research) questions:
- Does the increasing presence of non-software projects on GitHub affect the work of the mining software repositories community, which should always take the necessary precautions to filter out irrelevant projects when conducting large-scale analyses of GitHub data?
- Some projects can be sponsored (or have external Open Collective or Patreon sites set up), which is great for their long-term sustainability. But this should go together with a transparent governance model that clarifies where the money will go and who can benefit from it, especially now that non-technical contributors increasingly play a key role.
Will GitHub become the only platform to publish projects? Would this mean that many of the marketing concerns on other platforms (e.g., SEO, algorithms to recommend related projects, etc.) will start to be very relevant for GitHub, as they can affect which projects rise to the top (with the financial consequences this could bring)? Time will tell, but we can assure you we’ll keep an eye on it!
(Featured image by Sunny studio – stock.adobe.com)
Associate Professor at Universitat Oberta de Catalunya and researcher at SOM Research Team, in Barcelona, Spain. He likes investigating how software is developed, in particular how open-source software is developed and how people collaboratively drive the creation process. He has been working mainly in the areas of programming and domain-specific languages, modeling, modernization and model-driven engineering.