General Programming Languages (GPLs) such as Java, PHP and C++ continuously evolve to adapt to the ever-changing technology landscape. This evolution is rooted in technical aspects, but it is ultimately decided by the group of people who govern the language and work together to discuss, vote on and approve new language extensions and modifications. As in any community, governance rules are used to manage the community, help prioritize its tasks and reach decisions.
Typically, these rules specify the decision-making mechanism used in the project, thus contributing to its long-term sustainability by clarifying how core language developers, external contributors and even end-users of the language can work together. Despite its importance, this topic has been largely ignored in the study of GPLs.
In this post, we present our paper “Analysis and Modeling of the Governance in General Programming Languages” accepted at the ACM SIGPLAN International Conference on Software Language Engineering (SLE’19) where we study eight well-known GPLs and analyze how they govern their evolution. We believe this study helps to clarify the different approaches GPLs use in this regard. These governance models, depicted as a feature model, can then be reused and mixed by developers of new languages to define their own governance.
1. Introduction
The evolution of General Programming Languages (GPLs) is a highly technical activity that may involve improvements to the language syntax, parsing, conformance and performance, among others. As in any (open source) software project, this evolution largely depends on the work of a community of developers and end-users of the language willing to contribute to it, sometimes on a volunteer basis (contributors may also be paid by companies that sponsor the project or, for some languages, employed by the companies that drive the language forward). While several works have focused on the technical challenges of language evolution [11, 13, 14], little attention has been paid to understanding how the community organizes itself to evolve the language.
Ideally, the community organization should be transparent and explicitly explained in a set of governance rules, where each rule partially describes how to contribute to certain aspects of the language evolution and how decisions for the acceptance/rejection of such contributions are made. Governance rules could be as simple as stating the project follows a BDFL model (i.e., a dictatorship model) or complex enough to model more democratic and participatory processes.
So far, little is known regarding the governance practices in current GPLs. Even if there is no one-size-fits-all solution, having a broad perspective of the governance models in place (and how these models relate to other characteristics of the language) can shed some light on the complex problem of language evolution from a fresh perspective. We believe this would also help developers of new languages to make informed decisions when creating or developing languages. This paper is a first step in this direction: we analyze the governance models of eight well-known GPLs and build a feature model that characterizes them and facilitates the configuration of governance rules for new languages. We focus on GPLs, which usually have larger audiences and are therefore easier to analyze, but our work can also help when developing Domain-Specific Languages (DSLs) [1].
The rest of the paper is structured as follows. Section 2 introduces the languages considered in our study. Section 3 analyzes the selected languages and Section 4 summarizes the results as a feature model. Section 5 presents threats to validity. Section 6 shows the related work and Section 7 finalizes the paper and discusses our ideas for future work.
2. Language Selection
To perform our initial study we selected eight well-known GPLs representing the diverse set of governance models in language development. As all the selected languages are well-known, we only comment on the most relevant governance or evolution aspect that we took into account to include the language in our study.
C++ The language was created in 1985 by Bjarne Stroustrup as an extension of C. It has been included in the study as an example of a language standardized by the International Organization for Standardization (ISO; in 1998).
Go Google designed Go in 2009 mainly to address multicore and networked machines. We included Go in our study as an example of a company-driven language.
Java The language was created in 1995 at Sun Microsystems (acquired by Oracle Corporation in 2010), with James Gosling as one of its main designers. The language relies on a decision-making mechanism where membership levels play an important role, which we considered interesting for our study.
Kotlin The language was created by JetBrains in 2011 and adopted by Google to support the development of Android applications. As with Go, we selected this language as an example of a company-driven language. Unlike Go, however, the company behind Kotlin has a business model directly related to the GPL field.
PHP The language was created in 1995 by Rasmus Lerdorf and later developed by the PHP Group. The development process relies on a participatory governance model, which we found interesting to consider in our study.
Python Guido van Rossum created this language in 1990 and ruled the development process until 2018. The Python Software Foundation is currently in charge of its development. We selected this language because its governance model, which started as a classic BDFL model, has evolved in recent years.
R The language was created by several developers in 1993. We selected this language as it is mainly designed for statistical and data analysis, a field different from the other languages considered in our study.
Scala The language was designed by Martin Odersky at the École Polytechnique Fédérale de Lausanne in 2004. We selected this language as an example of a language with clear origins in academia and a visible leader.
3. Analysis of Governance in GPLs
The selected GPLs were analyzed according to four dimensions, namely: transparency, membership, language changes (with two main subdimensions: structural changes and language improvements) and decision-making process to move the language forward. Next we describe each dimension and present the results obtained in our study. Table 1 shows the dimensions and summarizes the results.
Table 1. Analysis of the selected GPLs.
Transparency This dimension examines the existence and content of the informative resources helpful to understand the development process of a GPL. We analyze whether (1) the language is developed under a specific license (column License), (2) there is a code of conduct document (column CoC) and (3) there is a clear and public release cycle (column RC).
Interestingly enough, our results reveal that only half of the analyzed languages are transparent and provide this kind of information. Lacking a clear description of the code of conduct or the release cycle raises the entry barrier for prospective contributors.
Membership This dimension studies membership levels in the GPL community and whether contributors have to be members (and, if so, what kind of members and how they can progress from one membership level to the upper one) before being able to drive the development.
Only three of the analyzed languages use a membership organization in their evolution processes, each with its own particularities:
- As C++ is a standardized language, its development community is clearly defined according to ISO regulations; any developer interested in participating has to contact the national ISO committee and pay the corresponding fee.
- According to the Java Community Process (JCP), Java defines different membership levels, ranging from free to paid ones. Members at any level can suggest improvements, but only paying members have a say in the decision.
- Python defines two main member roles: (1) core developers, who have to demonstrate knowledge of the language and can participate in main language improvements and (2) council members, who mainly participate in the decisions involving structural changes. The former role is earned on merit, while the latter is filled via elections.
Language Changes This dimension analyzes how language changes are addressed in the evolution process. We identify two main change types: (1) structural changes, which involve modifications in the syntax/semantics of the language; and (2) improvements, which generally fix bugs or improve the performance of the current version of the language (i.e., its current reference implementation, if one exists). For each type we are interested in studying: (1) who can propose change requests; (2) what is proposed, that is, the artifact representing the change request (e.g., a document or an issue); (3) how the change request is accepted/rejected (i.e., the decision-making mechanism); (4) the method used to coordinate the change request; and (5) the tool used to implement the change request.
The results reveal that structural changes are generally addressed via “formal” requests (e.g., JSR, PEP or SIP) and accepted via a voting mechanism. Contributors are told what they have to provide, the steps the request will follow for its acceptance and who will be in charge of making the final decision. Nevertheless, the way these changes are coordinated or implemented varies among the evaluated GPLs. For instance, in Python structural changes are described by Python Enhancement Proposals (PEPs), each of which clearly identifies its author, type and status, among other data. In Python, PEPs are also used to provide documentation about libraries or good practices. PEPs are voted on and eventually accepted by the steering council (how decisions are made is specified in PEP 13 at https://www.python.org/dev/peps/pep-0013).
Regarding the treatment of language improvements, all languages rely on the use of issues or pull requests to track the changes. However, only a few of them specify how such issues and pull requests will be treated (and eventually accepted). In Java, Python and Scala we observed the use of some kind of reviewing process where core developers take the responsibility of reviewing the issue or pull request and accept/reject it. Also, some languages require signing a Contributor License Agreement (CLA) (or something similar) to be able to participate in the process.
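The five aspects studied for each change type can be captured in a small record. The following is only an illustrative sketch, not an artifact of the study: the class name `ChangePolicy` and its field names are hypothetical, and the two example instances only fill in what the text above states for Python, leaving every other aspect unset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangePolicy:
    """One change type, described by the five aspects listed in the text.

    Hypothetical structure for illustration; None means "not documented".
    """
    proposer: Optional[str] = None      # (1) who can propose change requests
    artifact: Optional[str] = None      # (2) what is proposed
    decision: Optional[str] = None      # (3) how it is accepted/rejected
    coordination: Optional[str] = None  # (4) how the request is coordinated
    tool: Optional[str] = None          # (5) tool used to implement it

    def unspecified(self) -> list[str]:
        """Return the names of the aspects that are left undocumented."""
        return [name for name, value in vars(self).items() if value is None]


# Only the aspects mentioned in the text are filled in.
python_structural = ChangePolicy(
    artifact="Python Enhancement Proposal (PEP)",
    decision="vote by the steering council",
)
python_improvements = ChangePolicy(
    artifact="issue or pull request",
    decision="review by core developers",
)

print(python_structural.unspecified())
# ['proposer', 'coordination', 'tool']
```

A record like this makes gaps explicit: any aspect reported by `unspecified()` is something a prospective contributor would have to find out by other means.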
Decision-making Model This dimension classifies the decision-making mechanism used in the project, which can be: (1) dictator, when the decisions are made by a single person (normally the lead developer); (2) committee, when there is a group of elected developers who decide by consensus; (3) company, when the decisions are made by a company; and (4) community, when the group of developers has a say in the development process.
Most of the analyzed GPLs rely on a committee or the community. We found cases like Python, which followed a dictatorship model until Guido van Rossum stepped down; or Scala, where Martin Odersky has veto power. On the other hand, Go and Kotlin are driven by a company and most of the decisions they make are not public.
4. Modeling the Governance in GPLs
We use feature modeling [4] to represent the governance models of GPLs. Feature models were proposed to represent valid products of Software Product Lines, where a product is represented by a particular configuration of the features in the model. In our proposal, the feature model enables the configuration of governance models for languages.
Figure 1 shows the governance feature model we propose, which includes five features covering the main dimensions described in Section 3. To simplify the model, features representing the language changes dimension (i.e., Structural Changes and Improvements) reuse the Changes feature.
Figure 1: Feature Model to represent Governance rules in GPLs.
Then, the governance model of any specific GPL (or DSL for that matter) can be expressed as a particular combination/configuration of these features. As an example, Figure 2 configures the model and shows only those features conforming to the Java governance model.
Figure 2: Configuration of the feature model for modeling Java governance.
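To make the idea of configuring a governance feature model concrete, the sketch below encodes a simplified model and checks whether a configuration is valid. It is an assumption-laden reconstruction from the dimensions of Section 3, not a transcription of Figure 1: which features are optional, the use of an "or" group for the decision-making alternatives, and the omission of the shared Changes subfeatures are all our simplifications. The sample configuration describes a hypothetical company-driven language, not the Java configuration of Figure 2.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    name: str
    mandatory: bool = True
    group: str = "and"  # "and", "or" (at least one child), "xor" (exactly one)
    children: list["Feature"] = field(default_factory=list)


def validate(feature: Feature, selected: set[str]) -> list[str]:
    """Return the violations of a configuration (an empty list means valid)."""
    errors = []
    if feature.name not in selected:
        # A deselected feature must not have selected children.
        for child in feature.children:
            if child.name in selected:
                errors.append(f"{child.name} selected without {feature.name}")
            errors.extend(validate(child, selected))
        return errors
    chosen = [c.name for c in feature.children if c.name in selected]
    if feature.group == "and":
        for child in feature.children:
            if child.mandatory and child.name not in selected:
                errors.append(f"{feature.name}: mandatory {child.name} missing")
    elif feature.group == "or" and feature.children and not chosen:
        errors.append(f"{feature.name}: select at least one option")
    elif feature.group == "xor" and len(chosen) != 1:
        errors.append(f"{feature.name}: select exactly one option")
    for child in feature.children:
        errors.extend(validate(child, selected))
    return errors


# Simplified model: five features, one per dimension of Section 3.
GOVERNANCE = Feature("Governance", children=[
    Feature("Transparency", children=[
        Feature("License", mandatory=False),
        Feature("Code of Conduct", mandatory=False),
        Feature("Release Cycle", mandatory=False),
    ]),
    Feature("Membership", mandatory=False),
    Feature("Structural Changes"),
    Feature("Improvements"),
    Feature("Decision-making", group="or", children=[
        Feature("Dictator"), Feature("Committee"),
        Feature("Company"), Feature("Community"),
    ]),
])

# Hypothetical company-driven language.
config = {"Governance", "Transparency", "License", "Structural Changes",
          "Improvements", "Decision-making", "Company"}
print(validate(GOVERNANCE, config))                # []
print(validate(GOVERNANCE, config - {"Company"}))
# ['Decision-making: select at least one option']
```

Representing a configuration as a set of selected feature names keeps the check independent of any particular language, so the same model can be reused to validate several governance configurations.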
Once a GPL community defines its specific governance model, that model can be used in a variety of scenarios. First of all, it serves transparency and documentation purposes: the governance model clearly defines who decides what about language issues, and how. In this sense, the model can be regarded as a kind of public contract for the GPL community.
But the governance model can also be used, for instance, to parametrize the collaboration and development tools used by the GPL community (e.g., Jira or Bugzilla) and even to monitor them to enforce the governance model (e.g., issues are voted on for acceptance) when possible.
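As a rough illustration of such enforcement, the sketch below decides the fate of an issue from the votes cast on it. Everything here is hypothetical: the `VotingRule` fields, the role names and the thresholds are assumptions chosen for the example, and no real issue-tracker API is involved; in practice the votes would have to be extracted from the tool in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VotingRule:
    """Hypothetical voting rule derived from a governance model."""
    eligible_roles: frozenset[str]  # roles whose votes are counted
    min_votes: int                  # minimum number of counted votes
    approval_ratio: float           # share of favorable votes required


def decide(votes: list[tuple[str, str, bool]], rule: VotingRule) -> str:
    """Decide on an issue from (voter, role, in_favor) votes.

    Returns "accepted", "rejected" or "pending" (not enough votes yet).
    """
    counted: dict[str, bool] = {}
    for voter, role, in_favor in votes:
        if role in rule.eligible_roles:
            # A later vote by the same voter replaces the earlier one.
            counted[voter] = in_favor
    if not counted or len(counted) < rule.min_votes:
        return "pending"
    favorable = sum(counted.values())
    if favorable / len(counted) >= rule.approval_ratio:
        return "accepted"
    return "rejected"


rule = VotingRule(frozenset({"core developer"}), min_votes=3, approval_ratio=0.5)
votes = [
    ("ana", "core developer", True),
    ("bo", "contributor", True),      # ignored: role is not eligible
    ("cy", "core developer", True),
    ("di", "core developer", False),
]
print(decide(votes, rule))      # accepted
print(decide(votes[:2], rule))  # pending
```

Keeping the rule as data rather than hard-coding it means that a change in the governance model (for example, a higher approval ratio) only requires a new `VotingRule`, not a change to the monitoring logic.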
5. Threats to Validity
Our work is subject to threats to both (1) internal validity (i.e., related to the inferences we made); and (2) external validity (i.e., the generalization of our findings). Regarding internal validity, our subjectivity in selecting and classifying the GPLs may have affected the outcome of our study. In particular, a wrong perception or misunderstanding on our side may have resulted in a misclassification. As for external validity, our study is based on a subset of GPLs and therefore our results should not be generalized to any other GPL or DSL.
6. Related Work
The study of how people work together to develop software systems has been a research topic for a long time [3, 5, 10, 12, 15], including governance in OSS [6, 7] and the broader topic of IT governance [2, 16, 18, 19]. However, to the best of our knowledge, little attention has been paid to the analysis and modeling of governance in the field of programming languages and, in particular, in GPLs.
There are a couple of exceptions studying the decision-making processes in individual languages. The works by Keertipati et al. [9] and Sharma et al. [17] study how PEPs are treated in Python. In particular, the former analyzes whether the steps followed by actual PEPs during the decision-making process are aligned with the official ones, while the latter analyzes the level of interest and discussion in PEPs. In the context of Java, the work by Kaschesky et al. [8] investigates whether the deliberation arising in the development process has an impact on how decisions are made. These works have helped us to shape some dimensions of our study (e.g., structural changes for these languages).
7. Conclusions
We have analyzed a set of GPLs to understand how their evolution processes are governed. Based on our results we have built a feature model that describes the different governance models and enables the selection and configuration of new governance models for future GPLs, a topic that, so far, has received little attention from the language community.
As future work, we would like to keep validating our results by covering other GPLs (and well-known DSLs), which may result in the need to enrich the proposed feature model. Furthermore, a follow-up qualitative study, comprising interviews with actual users and contributors of languages, will also help to explore their opinions and views on the needs and expectations they have of language development processes, in particular when the process has been evolving over the years. We also plan to study how governance models may be inferred from the way developers use the language (e.g., applying crowdsourcing techniques [20]).
At the very least, we hope this work triggers a deeper discussion in the language community regarding the role and importance of governance issues to attract new contributors to a language community and make sure that each contributor understands how she can best contribute to the sustainability and future evolution of the language. This discussion may be similar to the one raised in general software development regarding the importance of governance models [6]. In this sense, it would also be interesting to study how language governance differs from software governance.
References
[1] Marco Brambilla, Jordi Cabot, and Manuel Wimmer. 2017. Model-Driven Software Engineering in Practice, Second Edition. Morgan & Claypool Publishers.
[2] Sunita Chulani, Clay Williams, and Avi Yaeli. 2008. Software Development Governance and Its Concerns. In Workshop on Software Development Governance. 3–6.
[3] Kevin Crowston, Kangning Wei, Qing Li, U. Yeliz Eseryel, and James Howison. 2005. Coordination of Free/Libre Open Source Software Development. In Int. Conf. on Information Systems.
[4] Krzysztof Czarnecki. 2000. Generative Programming: Methods, Tools, and Applications.
[5] James D. Herbsleb and Rebecca E. Grinter. 1999. Splitting the Organization and Integrating the Code: Conway’s Law Revisited. In Int. Conf. on Software Engineering. 85–95.
[6] Javier Luis Cánovas Izquierdo and Jordi Cabot. 2015. Enabling the Definition and Enforcement of Governance Rules in Open Source Systems. In Int. Conf. on Software Engineering: Software Engineering in Society. 505–514.
[7] Javier Luis Cánovas Izquierdo and Jordi Cabot. 2018. The Role of Foundations in Open Source Projects. In Int. Conf. on Software Engineering: Software Engineering in Society. 3–12.
[8] Michael Kaschesky and Reinhard Riedl. 2009. Top-level Decisions through Public Deliberation on the Internet: Evidence from the Evolution of Java Governance. In Int. Conf. on Digital Government Research, Partnerships for Public Innovation. 42–55.
[9] Smitha Keertipati, Sherlock A. Licorish, and Bastin Tony Roy Savarimuthu. 2016. Exploring Decision-making Processes in Python. In Int. Conf. on Evaluation and Assessment in Software Engineering. 43:1–43:10.
[10] Robert E. Kraut and Lynn A. Streeter. 1995. Coordination in Software Development. Comm. ACM 38, 3 (1995), 69–81.
[11] Brian A. Malloy and James F. Power. 2019. An Empirical Analysis of the Transition from Python 2 to Python 3. Emp. Softw. Eng. 24, 2 (2019), 751–778.
[12] Mary Lynne Markus. 2007. The Governance of Free/Open Source Software Projects: Monolithic, Multidimensional, or Configurational? J. of Manag. & Gov. 11, 2 (2007), 151–163.
[13] Kayvan Memarian, Justus Matthiesen, James Lingard, Kyndylan Nienhuis, David Chisnall, Robert N. M. Watson, and Peter Sewell. 2016. Into the Depths of C: Elaborating the de Facto Standards. In Int. Conf. on Programming Language Design and Implementation. 1–15.
[14] Floréal Morandat, Brandon Hill, Leo Osvald, and Jan Vitek. 2012. Evaluating the Design of the R Language – Objects and Functions for Data Analysis. In Europ. Conf. on Object-Oriented Programming. 104–131.
[15] David Lorge Parnas. 1972. On the Criteria to be Used in Decomposing Systems into Modules. Comm. ACM 15, 12 (1972), 1053–1058.
[16] Narayan Ramasubbu and Rajesh Krishna Balan. 2008. Towards Governance Schemes for Distributed Software Development Projects. In Workshop on Software Development Governance. 11–14.
[17] Pankajeshwara Sharma, Bastin Tony Roy Savarimuthu, Nigel Stanger, Sherlock A. Licorish, and Austen Rainer. 2017. Investigating Developers’ Email Discussions during Decision-making in Python Language Evolution. In Int. Conf. on Evaluation and Assessment in Software Engineering. 286–291.
[18] Wim Van Grembergen. 2003. Strategies for Information Technology Governance. IGI Publishing.
[19] Phyl Webb, Carol Pollard, and Gail Ridley. 2006. Attempting to Define IT Governance: Wisdom or Folly?. In Int. Conf. on Systems Science.
[20] Preston Tunnell Wilson, Justin Pombrio, and Shriram Krishnamurthi. 2017. Can we crowdsource language design?. In Int. Symp. New Ideas, New Paradigms, and Reflections on Programming and Software. 1–17.
Associate Professor at Universitat Oberta de Catalunya and researcher at SOM Research Team, in Barcelona, Spain, he likes investigating how software is developed, in particular how open-source software is developed and how people collaboratively drive the creation process. He has been working mainly in the areas of programming and domain-specific languages, modeling, modernization and model-driven engineering.