Software Ecosystems: Tooling and Analytics

This book highlights recent research advances in various domains related to software ecosystems, such as library reuse, collaborative development, cloud computing, open science, sentiment analysis, and machine learning. A key aspect of software ecosystems is that software products belong to increasingly interdependent networks of co-evolving software components. The ever-growing importance of social coding platforms has made software ecosystems indispensable to software practitioners, in commercial as well as open-source settings.

The book starts with an introductory chapter that gives a historical account of the origins of software ecosystems. It provides the necessary context about the domain by highlighting its different perspectives, definitions, and representations, and it illustrates the variety of software ecosystems that have emerged over the past decades. The remainder of the book is composed of five parts. Part I contains two chapters on software ecosystem representations, and Part II contains two chapters on complementary ways and techniques for analyzing software ecosystems. Part III includes two chapters on aspects of evolution within software ecosystems, while Part IV looks at workflow automation and infrastructure-as-code ecosystems. Finally, Part V focuses on ecosystems for software modeling and for data-intensive software.

This book is intended for researchers and practitioners interested in data mining, tooling, and empirical analysis of software ecosystems. Readers will appreciate chapters that cover a wide spectrum of social and technical aspects of software ecosystems, each including an overview of the state of the art. Chapter 2, "The Software Heritage Open Science Ecosystem", is available open access under a Creative Commons Attribution 4.0 International License via link.springer.com.

Author(s): Tom Mens, Coen De Roover, Anthony Cleve
Publisher: Springer
Year: 2023

Language: English
Pages: 336

Foreword
Preface
How This Book Originated
Who Contributed to This Book
What This Book Is Not About
Who This Book Is Intended For
How This Book Is Structured
Acknowledgments
Contents
Contributors
Acronyms
1 An Introduction to Software Ecosystems
1.1 The Origins of Software Ecosystems
1.2 Perspectives and Definitions of Software Ecosystems
1.3 Examples of Software Ecosystems
1.3.1 Digital Platform Ecosystems
1.3.2 Component-Based Software Ecosystems
1.3.3 Web-Based Code Hosting Platforms
1.3.4 Open-Source Software Communities
1.3.5 Communication-Oriented Ecosystems
1.3.6 Software Automation Ecosystems
1.4 Data Sources for Mining Software Ecosystems
1.4.1 Mining the GitHub Ecosystem
1.4.2 Mining the Java Ecosystem
1.4.3 Mining Software Library Ecosystems
1.4.4 Mining Other Software Ecosystems
1.5 The CHAOSS Project
1.6 Summary
References
Part I Software Ecosystem Representations
2 The Software Heritage Open Science Ecosystem
2.1 The Software Heritage Archive
2.1.1 Data Model
2.1.2 Software Heritage Persistent Identifiers (SWHIDs)
2.2 Large Open Datasets for Empirical Software Engineering
2.2.1 The Software Heritage Datasets
2.2.1.1 The Software Heritage Graph Dataset
2.2.1.2 Accessing Source Code Files
2.2.1.3 License Dataset
2.3 Research Highlights
2.3.1 Enabling Artifact Access and (Large-Scale) Analysis
2.3.2 Software Provenance and Evolution
2.3.3 Software Forks
2.3.4 Diversity, Equity, and Inclusion
2.4 Building the Software Pillar of Open Science
2.4.1 Software in the Scholarly Ecosystem
2.4.2 Extending the Scholarly Ecosystem Architecture to Software
2.4.3 Growing Technical and Policy Support
2.4.4 Supporting Researchers
2.5 Conclusions and Perspectives
References
3 Promises and Perils of Mining Software Package Ecosystem Data
3.1 Introduction
3.2 Software Package Ecosystem
3.3 Data Sources
3.4 Promises and Perils
3.4.1 Planning What Information to Mine
3.4.2 Defining Components and Their Dependencies
3.4.3 Defining Boundaries and Completeness
3.4.4 Analyzing and Visualizing the Data
3.5 Application: When to Apply Which Peril
3.5.1 Two Case Studies
3.5.2 Applying Perils and Their Mitigation Strategies
3.6 Chapter Summary
References
Part II Analyzing Software Ecosystems
4 Mining for Software Library Usage Patterns Within an Ecosystem: Are We There Yet?
4.1 Introduction
4.2 Example of API Usage Patterns in Software Libraries
4.3 Usages as Sets of Frequent Co-occurrences
4.4 Usages as Pairs or Subsequences of APIs via Software Mining
4.5 Graph Representation for Usage Patterns via Static Analysis
4.5.1 Object Usage Representation
4.5.2 Graph-Based API Usage Pattern Mining Algorithm
4.5.2.1 Important Concepts in Graph-Based Usage Pattern Mining
4.5.2.2 Overview of GrouMiner Algorithm
4.5.2.3 Detailed GrouMiner Algorithm
4.5.3 API Usage Graph Pattern Mining
4.5.3.1 Semantic-Aware API Usage Pattern Mining with MUDetect
4.5.4 Cooperative API Usage Pattern Mining Approach
4.5.5 Probabilistic API Usage Mining
4.5.6 API Usage Mining via Topic Modeling
4.5.7 Mining for Less Frequent API Usage Patterns
4.6 Applications of Usage Patterns
4.6.1 Graph-Based API Usage Anomaly Detection
4.6.2 Pattern-Oriented Code Completion
4.6.3 Integration of API Usage Patterns
4.7 Conclusion
References
5 Emotion Analysis in Software Ecosystems
5.1 What Is a Software Ecosystem?
5.2 What Is Emotion?
5.3 Why Would One Study Emotions in Software Engineering?
5.4 How to Measure Emotion?
5.4.1 Tools
5.4.2 Datasets
5.5 What Do We Know About Emotions and Software Ecosystems?
5.5.1 Ecosystems as Communication Platforms
5.5.1.1 Stack Overflow
5.5.1.2 GitHub
5.5.2 Ecosystems as Interrelated Projects
5.5.2.1 GitHub
5.5.2.2 Apache
5.5.2.3 Other Ecosystems
5.6 What Next?
5.7 What Have We Discussed in This Chapter?
References
Part III Evolution Within Software Ecosystems
6 Analyzing Variant Forks of Software Repositories from Social Coding Platforms
6.1 Introduction
6.2 State of the Art
6.3 Motivations for Variant Forking on Social Coding Platforms
6.3.1 Technical
6.3.2 Governance
6.3.3 Legal
6.3.4 Other Categories
6.4 Mining Variant Forks on GitHub
6.4.1 The Different Types of Variant Forks
6.4.2 How to Mine Variant Forks?
6.4.3 What Are Divergent Variants?
6.5 Challenges of Maintaining Variant Forks
6.6 Research Roadmap
6.6.1 Recommendation Tools
6.6.2 Shareable Updates Among Variants
6.6.3 Transplantation Tools
6.7 Conclusion
References
7 Supporting Collateral Evolution in Software Ecosystems
7.1 Introduction
7.2 Supporting Collateral Evolution in Linux Kernel
7.2.1 Recommending Code Changes for Automatic Backporting of Linux Device Drivers
7.2.2 Spinfer: Inferring Semantic Patches for the Linux Kernel
7.2.3 Other Studies
7.3 Supporting Collateral Evolution in Android
7.3.1 An Empirical Study on Deprecated-API Usage Update in Android
7.3.1.1 Datasets
7.3.1.2 Results
7.3.2 Example-Based Automatic Android Deprecated-API Usage Update
7.3.2.1 Design of CocciEvolve
7.3.2.2 Dataset and Evaluation Results
7.3.3 Data-Flow Analysis and Variable Denormalization-Based Automated Android API Update
7.3.3.1 AndroEvolve Architecture
7.3.3.2 Evaluation of AndroEvolve
7.3.4 Other Studies
7.4 Supporting Collateral Evolution in ML Libraries
7.4.1 Characterizing the Updates of Deprecated ML API Usages
7.4.1.1 Datasets
7.4.1.2 Update Operations to Migrate Deprecated API Usages
7.4.2 Automated Update of Deprecated Machine Learning APIs
7.4.2.1 Architecture of MLCatchUp
7.4.2.2 Evaluating MLCatchUp on Updating Deprecated APIs
7.4.3 Other Studies
7.5 Open Problems and Future Work
7.6 Conclusion
References
Part IV Software Automation Ecosystems
8 The GitHub Development Workflow Automation Ecosystems
8.1 Introduction
8.1.1 Collaborative Software Development and Social Coding
8.1.2 The GitHub Social Coding Platform
8.1.3 Continuous Integration and Deployment
8.1.4 The Workflow Automation Ecosystems of GitHub
8.2 Workflow Automation Through Development Bots
8.2.1 What Are Development Bots?
8.2.2 The Role of Bots in GitHub's Socio-technical Ecosystem
8.2.3 Advantages of Using Development Bots
8.2.4 Challenges of Using Development Bots
8.3 Workflow Automation Through GitHub Actions
8.3.1 What Is GitHub Actions?
8.3.2 Empirical Studies on GitHub Actions
8.3.3 The GitHub Actions Ecosystem
8.3.4 Challenges of the GitHub Actions Ecosystem
8.4 Discussion
References
9 Infrastructure-as-Code Ecosystems
9.1 Introduction
9.2 Docker and Its Docker Hub Ecosystem
9.2.1 Introduction to Containerization
9.2.2 The Docker Containerization Tool
9.2.3 The Docker Hub Ecosystem
9.2.3.1 Types of Images Collected on Docker Hub
9.2.3.2 Image Metadata Maintained on Docker Hub
9.2.4 Approaches to Analyzing Docker Hub Images
9.2.4.1 Docker Hub Metadata Analysis
9.2.4.2 Static Analysis of Dockerfiles and Docker Images
9.2.4.3 Dynamic Analysis of Dockerfiles and Docker Images
9.2.5 Empirical Insights from Analyzing Docker Hub Images
9.2.5.1 Technical Lag and Security in the Docker Hub Ecosystem
9.2.5.2 Technical Debt and Code Smells in Dockerfiles
9.2.5.3 Challenges in Maintaining and Evolving Dockerfiles
9.3 Ansible and Its Ansible Galaxy Ecosystem
9.3.1 Introduction to Configuration Management
9.3.2 The Ansible Configuration Management Tool
9.3.2.1 Ansible Plays and Playbooks
9.3.2.2 Ansible Roles
9.3.3 The Ansible Galaxy Ecosystem
9.3.3.1 Types of Ansible Galaxy Content
9.3.3.2 Types of Metadata Maintained by Ansible Galaxy
9.3.4 Approaches to Analyzing Ansible Galaxy
9.3.4.1 Ansible Galaxy Metadata Analysis
9.3.4.2 Static Analysis of Ansible Infrastructure Code
9.3.4.3 Dynamic Analysis of Ansible Infrastructure Code
9.3.5 Empirical Insights from Analyzing Ansible Infrastructure Code
9.3.5.1 Code Smells and Quality in the Ansible Galaxy Ecosystem
9.3.5.2 Defect Prediction for the Ansible Galaxy Ecosystem
9.3.5.3 Evolution Within the Ansible Galaxy Ecosystem
9.4 Conclusion
References
Part V Model-Centered Software Ecosystems
10 Machine Learning for Managing Modeling Ecosystems: Techniques, Applications, and a Research Vision
10.1 Introduction
10.2 Background in Machine Learning
10.2.1 Supervised Learning
10.2.2 Unsupervised Learning
10.2.3 Reinforcement Learning (RL)
10.3 Literature Review
10.3.1 Methodology
10.3.2 Query String
10.3.3 Inclusion and Exclusion Criteria
10.3.4 Manual Labelling
10.3.5 Results
10.4 Existing Machine Learning Applications in MDE
10.4.1 Model Assistants
10.4.2 Model Classification
10.4.3 Model Refactoring
10.4.4 Model Repair
10.4.5 Model Requirements
10.4.6 Model Search
10.4.7 Model Synthesis
10.4.8 Model Transformation Development
10.4.9 Others
10.5 A Roadmap for the Deployment of ML in MDE
10.5.1 Data Privacy Management
10.5.2 Detecting Technical Debt
10.5.3 Adversarial Machine Learning
10.5.4 Mining Time Series Data
10.6 Conclusion
References
11 Mining, Analyzing, and Evolving Data-Intensive Software Ecosystems
11.1 Introduction
11.2 Mining Techniques
11.2.1 Introduction
11.2.2 Static Analysis of Relational Database Accesses
11.2.3 Static Analysis of NoSQL Database Accesses
11.2.4 Reflections
11.3 Analysis Techniques
11.3.1 Introduction
11.3.2 Static Analysis Techniques
11.3.2.1 Example 1: SQLInspect—A Static Analyzer
11.3.2.2 Example 2: Preventing Program Inconsistencies
11.3.3 Visualization
11.3.3.1 Introduction
11.3.3.2 Example 1: DAHLIA
11.3.3.3 Example 2: m3triCity
11.3.4 Reflections
11.4 Empirical Studies
11.4.1 Introduction
11.4.2 The (Joint) Use of Data Models and Technologies
11.4.3 Prevalence, Impact, and Evolution of SQL Bad Smells
11.4.4 Self-Admitted Technical Debt in Database Access Code
11.4.5 Database Code Testing (Best) Practices
11.4.6 Reflections
11.5 Conclusion
References