The role of non-tech/non-coding contributors in Open Source Software (OSS) is poorly understood. Most current research on OSS development focuses on the coding aspects of projects (e.g., commits, pull requests or code reviews) while ignoring the potential of other types of contributions. This is often due to the assumption that non-tech contributions are not significant in number and that, in any case, they are handled by the same people who are also part of the “coding team”.
This paper aims to investigate whether this is actually the case by analyzing the frequency and diversity of non-coding contributions in OSS development. Is the number of non-coding contributors indeed significant, or are OSS projects still very much “code-driven”? And what about contributor diversity: do these contributors collaborate on the project only in non-coding roles, or are they the same group of core developers taking on all types of roles depending on the project's needs? And are non-coding contributors migrating to coding roles? These are some of the questions we studied in the paper “On the analysis of non-coding roles in open source development”, led by Javier Cánovas and published in the Empirical Software Engineering journal.
For each project, we classified all project actions and members based on a precise definition of the possible contribution roles on the GitHub platform, and computed several metrics related to role composition, diversity and evolution. Projects were analyzed with SourceCred, which analyzes GitHub repositories and builds a collaboration graph where nodes represent repository assets (e.g., users, comments, issues, pull requests) and edges represent relationships among them (e.g., a user authors a commit, a comment belongs to an issue).
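To make the notion of a collaboration graph concrete, here is a minimal sketch in Python. It does not use SourceCred or its graph format: the `Node` and `CollaborationGraph` types, the node kinds and the relation names (`authors`, `belongs_to`) are hypothetical stand-ins that only mirror the structure described above, with repository assets as nodes and relationships as labeled edges.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A repository asset (hypothetical model, not SourceCred's own format)."""
    kind: str   # e.g. "user", "issue", "comment", "pull_request", "commit"
    ident: str  # e.g. a login, or an issue/comment identifier


@dataclass
class CollaborationGraph:
    # Adjacency list: source node -> list of (relation, target node) pairs.
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_edge(self, src: Node, relation: str, dst: Node) -> None:
        """Record a labeled relationship, e.g. a user *authors* an issue."""
        self.edges[src].append((relation, dst))

    def targets(self, src: Node, relation: str) -> list:
        """Return all nodes reached from `src` through edges labeled `relation`."""
        return [dst for rel, dst in self.edges.get(src, []) if rel == relation]


# Toy example: alice opens issue 12 and comments on it.
alice = Node("user", "alice")
issue = Node("issue", "12")
comment = Node("comment", "12-1")

graph = CollaborationGraph()
graph.add_edge(alice, "authors", issue)
graph.add_edge(alice, "authors", comment)
graph.add_edge(comment, "belongs_to", issue)

print([n.kind for n in graph.targets(alice, "authors")])  # ['issue', 'comment']
```

Keeping the relation as an edge label means new relationship types can be added without changing the node model.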
Among other results, our analysis of the data in these collaboration graphs reveals:
- A significant number of non-coding activities (e.g., opening or commenting on issues, or reacting to others’ contributions) in all projects.
- A deeper analysis also shows that these activities are usually performed by people not involved in coding roles, uncovering the presence and importance of dedicated non-coding contributors in OSS.
- Contributors with a single non-coding role prevail, revealing high community specialization and the need for projects to put in place proper migration or collaboration paths that ensure good communication and interaction among members playing different roles.
You can read the full paper using the open access link above or keep reading for an extended summary.
Role Characterization in GitHub
Open Source principles favor a rich variety of ways to contribute to a project's development and evolution beyond code contributions. Thanks to this community-driven approach to development, users of the software can contribute feature requests and bug reports, comment on those made by others, or vote to help prioritize the project's next steps, among others. Social coding platforms like GitHub provide the infrastructure to facilitate these types of interactions.
Obviously, the same person can play different roles on a project, e.g., she can submit a bug report (with a user hat on) while later committing some code that adds a new feature (with a developer hat on).
To better understand the different types of contributions and their relative importance to the overall project evolution, we defined a set of contributor roles targeting the specificities of the development process enabled by GitHub and other similar social coding platforms (see Section 8 of the paper for previous role classification proposals). GitHub promotes a pull-based development process, where new contributions to the code base are submitted and reviewed via pull requests. This is especially true for external developers and occasional contributors, as project members can (and usually do) push their code directly to any branch in the repository. To facilitate collaborative development, GitHub also offers an issue tracker, a wiki system and project activity reports.
Based on this, we identified six types of contributor roles in GitHub:
- DEVELOPER: The activity of this role is mainly focused on submitting commits and/or pull requests with code modifications. Developers may also comment on their own pull requests.
- REVIEWER: Code contributions can be reviewed by any GitHub user via pull request reviews. This is the role of the REVIEWER, which focuses on reviewing others’ code (and commenting on these reviews).
- MERGER: In GitHub, pull requests must be explicitly accepted and merged into the project’s codebase to make the contribution effective. This is the role of the MERGER. Merging typically happens after the REVIEWERs have completed their job and the DEVELOPERs have modified and resubmitted their code accordingly.
- REPORTER: The activity of this role is devoted to opening issues (and commenting on them) to raise concerns about the project, propose ideas for its future evolution or influence its development.
- COMMENTER: GitHub allows users to comment on any aspect of the project, in particular on issues and pull requests. A COMMENTER is a person who enriches the project discussion by commenting on issues opened by other people. Comments on pull requests fall instead into the REVIEWER category above.
- REACTOR: As on other social platforms, GitHub allows users to react to contributions by others (i.e., issues, pull requests, reviews and comments) via emojis (e.g., thumbs-up, heart). These reactions serve as a quick acknowledgment of a task (e.g., attaching a thumbs-up to a request in an issue comment) and can be considered a less thoughtful contribution than a comment, as they do not enrich the discussion but express support for (or disagreement with) the current line of thought. We call anybody who uses this reaction feature a REACTOR.
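The role definitions above can be read as a mapping from GitHub actions to roles. The sketch below is a simplified illustration under our own assumptions: the `Action` type and the action kinds (`commit`, `open_pr`, `review`, `merge_pr`, `open_issue`, `comment`, `react`) are hypothetical, and the paper's precise mapping of action sequences to roles is more detailed. What it shows is that the same action type (a comment) maps to different roles depending on what is being commented on and who authored it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    actor: str
    kind: str                            # hypothetical action kind, e.g. "comment"
    target_kind: Optional[str] = None    # for comments: "issue" or "pull_request"
    target_author: Optional[str] = None  # author of the commented issue/PR


def role_of(action: Action) -> str:
    """Map a single action to one of the six roles (simplified sketch)."""
    simple = {
        "commit": "DEVELOPER",
        "open_pr": "DEVELOPER",
        "review": "REVIEWER",
        "merge_pr": "MERGER",
        "open_issue": "REPORTER",
        "react": "REACTOR",
    }
    if action.kind in simple:
        return simple[action.kind]
    if action.kind == "comment":
        own = action.actor == action.target_author
        if action.target_kind == "pull_request":
            # Own PR: developer follow-up; someone else's PR: review work.
            return "DEVELOPER" if own else "REVIEWER"
        if action.target_kind == "issue":
            # Own issue: reporter follow-up; someone else's issue: commenter.
            return "REPORTER" if own else "COMMENTER"
    raise ValueError(f"unmapped action: {action}")


print(role_of(Action("bob", "comment", "issue", "alice")))         # COMMENTER
print(role_of(Action("alice", "comment", "issue", "alice")))       # REPORTER
print(role_of(Action("bob", "comment", "pull_request", "alice")))  # REVIEWER
```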
Role-based activity distribution
We first study the activity distribution in OSS projects, grouping activities according to the role set above to better analyze the main driving forces in OSS. The paper provides a precise mapping of action sequences to roles, but the role definitions we have just seen are enough to get the idea.
Activity distribution analysis
The following figure shows this distribution. As can be seen, more than half of the actions map to the developer and commenter roles, while the merger and reviewer roles are the least represented. Nevertheless, no single role is dominant; OSS is therefore a collective effort involving a significant number of actions of all types around the project. Note also that, since the commenter role specifically refers to users commenting on others’ issues, our results highlight the importance of collaboration among project members.
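Once every action carries a role label, the activity distribution is simply the share of actions per role. The sketch below assumes actions have already been classified (one role label per action); the sample data is illustrative only and not taken from the paper.

```python
from collections import Counter


def activity_distribution(roles):
    """Share of all actions that falls into each role.

    `roles` is an iterable with one role label per classified action.
    """
    counts = Counter(roles)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {role: n / total for role, n in counts.items()}


# Toy sample of already-classified actions (illustrative only).
roles = ["DEVELOPER"] * 4 + ["COMMENTER"] * 3 + ["REACTOR"] * 2 + ["MERGER"]
dist = activity_distribution(roles)
print(dist)  # {'DEVELOPER': 0.4, 'COMMENTER': 0.3, 'REACTOR': 0.2, 'MERGER': 0.1}
```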
Note that commenter actions even outnumber developer actions as soon as the project reaches a minimal size. Reviewer and reactor actions also grow as the community does. OSS development cannot be reduced to code-related, or even technical, activities.
Prototypical contributor profile
As an alternative representation of the importance of each role and to better understand the distributions described above, we now characterize the typical profile of an Open Source contributor. The profile is built by depicting the expected number of actions per role of this prototypical contributor.
The following figure shows the results as a radar plot. These are the overall values, so the radar is based on the actions and users across all projects in the dataset.
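One plausible reading of the “expected number of actions per role”, and the assumption behind this sketch, is each role's total action count pooled across all projects divided by the total number of contributors. The function name and the toy data are hypothetical; the paper may compute the profile differently.

```python
from collections import Counter


def prototypical_profile(actions):
    """Average number of actions per role for a single contributor.

    `actions` is an iterable of (user, role) pairs pooled across all projects;
    each role's action count is divided by the total number of contributors.
    """
    actions = list(actions)
    users = {user for user, _ in actions}
    if not users:
        return {}
    per_role = Counter(role for _, role in actions)
    return {role: n / len(users) for role, n in per_role.items()}


actions = [
    ("alice", "DEVELOPER"), ("alice", "DEVELOPER"),
    ("bob", "REACTOR"),
    ("carol", "COMMENTER"), ("carol", "REACTOR"),
]
profile = prototypical_profile(actions)
print(profile)  # DEVELOPER and REACTOR: 2/3 actions each; COMMENTER: 1/3
```

Each value in the result becomes one axis of the radar plot.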
We now analyze whether the prototypical contributor profile characterized in the previous section has, in fact, any resemblance to the reality of OSS contributors.
More specifically, we want to study whether each role is mostly played by a specialized group of people or, on the contrary, there is a large overlap between the subcommunities playing each role, especially between coding and non-coding roles. If the former, projects may consider putting in place specific onboarding strategies and governance policies that target the users of each specific role so that they all feel part of the project.
The next figure shows the role distribution of each project in the dataset, computed as the ratio of members playing each role (i.e., members who actually perform actions belonging to that role). Members can play more than one role. Differences between this distribution and the action-based one above signal whether the actions of a role are concentrated in a small number of people (or vice versa).
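The member-based distribution counts people rather than actions: for each role, how many members perform at least one action of that role, divided by the number of project members. In this hypothetical sketch, `actions` are (user, role) pairs and `members` is the full member set; the data is illustrative only. Because members can play several roles, the ratios need not add up to 1.

```python
from collections import defaultdict


def role_member_ratio(actions, members):
    """Fraction of project members performing at least one action of each role.

    `actions` is an iterable of (user, role) pairs; `members` is the set of all
    project members. A member may count towards several roles, so the ratios
    need not add up to 1.
    """
    players = defaultdict(set)
    for user, role in actions:
        players[role].add(user)
    return {role: len(users) / len(members) for role, users in players.items()}


members = {"alice", "bob", "carol", "dave"}
actions = [
    ("alice", "DEVELOPER"), ("alice", "DEVELOPER"), ("alice", "DEVELOPER"),
    ("bob", "REACTOR"), ("carol", "REACTOR"), ("dave", "REACTOR"),
    ("carol", "COMMENTER"),
]
ratios = role_member_ratio(actions, members)
print(ratios)  # {'DEVELOPER': 0.25, 'REACTOR': 0.75, 'COMMENTER': 0.25}
```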
Globally, we observe a high presence of reactors. This is somewhat surprising, as reactor actions were important but not the most dominant ones when we analyzed the previous research question. This implies that while a large number of project members play the reactor role, they play it only occasionally, not accounting for much reactor activity overall. The opposite happens for developers: results show a relatively low presence of developers, but they account for a large number of project actions.
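The contrast between reactors and developers can be quantified, under our own assumptions, by dividing each role's action count by the number of members who play it: a high value means the role's activity is concentrated in few people. The function and toy data below are hypothetical and chosen only to reproduce the qualitative pattern described above.

```python
from collections import Counter, defaultdict


def actions_per_player(actions):
    """Average number of actions per member who plays each role.

    `actions` is an iterable of (user, role) pairs. High values mean the
    role's activity is concentrated in a small number of people.
    """
    actions = list(actions)
    counts = Counter(role for _, role in actions)
    players = defaultdict(set)
    for user, role in actions:
        players[role].add(user)
    return {role: counts[role] / len(players[role]) for role in counts}


actions = [
    ("alice", "DEVELOPER"), ("alice", "DEVELOPER"), ("alice", "DEVELOPER"),
    ("bob", "REACTOR"), ("carol", "REACTOR"), ("dave", "REACTOR"),
    ("carol", "COMMENTER"),
]
print(actions_per_player(actions))  # {'DEVELOPER': 3.0, 'REACTOR': 1.0, 'COMMENTER': 1.0}
```

Here one developer produces as many actions as three reactors combined, which is the kind of gap that separates the action-based and member-based distributions.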
Most common role configuration
From the previous analysis, it is clear that some project members play different roles. We now analyze which role configurations are most common among these “multi-role” users. We believe that knowing which roles are typically played together, and especially whether the most common configurations mix coding and non-coding roles, helps to understand the community composition of OSS projects.
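A role configuration is the exact set of roles a user plays. A minimal way to find the most common ones, sketched here with hypothetical names and illustrative data, is to collect each user's roles into a `frozenset` (so it can be hashed) and count identical sets with `collections.Counter`.

```python
from collections import Counter, defaultdict


def role_configurations(actions):
    """Count how many users play each exact combination of roles.

    `actions` is an iterable of (user, role) pairs; each user's roles are
    collected into a frozenset so identical combinations can be counted.
    """
    roles_by_user = defaultdict(set)
    for user, role in actions:
        roles_by_user[user].add(role)
    return Counter(frozenset(roles) for roles in roles_by_user.values())


actions = [
    ("alice", "DEVELOPER"), ("alice", "MERGER"),
    ("bob", "REACTOR"), ("carol", "REACTOR"),
    ("dave", "REPORTER"), ("dave", "COMMENTER"),
]
configs = role_configurations(actions)
for config, n in configs.most_common():
    print(sorted(config), n)
```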
Our results reveal that one-role configurations, in particular reactors, reporters and commenters, are the most common in all projects and groups. This finding suggests that for many contributors, the first and only way to contribute to a project is by performing one of these tasks. Note that these roles, and especially the reactor one, are also the easiest to take on (reacting only requires responding to somebody else's contribution), which makes them a good way to detect new project members who could later (with the right onboarding strategies in place) migrate to more involved roles.
We also observed a lack of cross-role configurations combining coding and non-coding roles, which shows the existence of members who specialize in non-coding tasks in the project.
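Checking whether a configuration mixes coding and non-coding roles requires deciding which roles count as coding ones. This sketch assumes DEVELOPER, REVIEWER and MERGER are the coding roles and the other three are non-coding; that split is our assumption, and the configurations are illustrative only.

```python
from collections import Counter

# Assumption: DEVELOPER, REVIEWER and MERGER are treated as the coding roles.
CODING_ROLES = frozenset({"DEVELOPER", "REVIEWER", "MERGER"})


def configuration_kind(config: frozenset) -> str:
    """Classify a user's set of roles as coding-only, non-coding-only or mixed."""
    coding = config & CODING_ROLES
    if not coding:
        return "non-coding only"
    if coding == config:
        return "coding only"
    return "mixed"


# Toy role configurations, one per user (illustrative only).
configs = [
    frozenset({"REACTOR"}),
    frozenset({"REPORTER", "COMMENTER"}),
    frozenset({"DEVELOPER", "MERGER"}),
    frozenset({"DEVELOPER", "COMMENTER"}),
]
kinds = Counter(configuration_kind(c) for c in configs)
print(kinds)
```

A low count for "mixed" relative to the other two buckets is what the specialization finding above corresponds to.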
This paper has analyzed the different roles participating in Open Source development, providing a precise role definition for a quantitative role-based analysis of Open Source projects. This opens the door to a number of other quantitative analyses of Open Source communities that complement existing qualitative studies. Among these analyses, in this paper we focused on the study of the non-coding roles visible in GitHub. Our results show that non-coding roles (e.g., commenter or reactor) have a high presence in the analyzed projects and that these roles are often taken by people who specialize in contributing to the project only through non-coding activities. They complement the work of coding contributors who, conversely, have little involvement in non-coding tasks. This specialization highlights the importance of all types of roles in an OSS project, debunking the myth of a few core coders driving the full spectrum of actions in the project. At the same time, the limited migration of members from non-coding to coding roles emphasizes the need for better onboarding and governance strategies that facilitate this role evolution or, at the very least, better collaboration between the different roles.
We believe these results would be even more evident if we had analyzed other sources of project data outside the GitHub ecosystem (e.g., mailing lists, forums, Twitter discussions), where other non-coding roles are also more visible. This is part of our future work, together with the replication of the study on other sets of projects. In particular, we are interested in studying how these observations evolve when moving from single projects to project ecosystems.