Research Software Engineering: A Guide to the Open Source Ecosystem

Research Software Engineering: A Guide to the Open Source Ecosystem strives to give a big-picture overview and an understanding of the opportunities of programming as an approach to analytics and statistics. The book argues that a solid "programming" skill level is not only well within reach for many but also worth pursuing for researchers and business analysts. The ability to write a program leverages field-specific expertise and fosters interdisciplinary collaboration as source code continues to become an important communication channel. Given the pace of development in data science, many senior researchers and mentors, alongside non-computer science curricula, lack a basic software engineering component. This book fills the gap by providing a dedicated programming-with-data resource to both academic scholars and practitioners.

Key Features

Overview: breakdown of complex data science software stacks into core components.
Applied: source code of figures, tables and examples is available and reproducible solely with license cost-free, open source software.
Reader guidance: different entry points and rich references to deepen the understanding of selected aspects.

Author(s): Matthias Bannert
Series: Data Science Series
Publisher: CRC Press
Year: 2024

Language: English
Pages: 201

Cover
Half Title
Series Page
Title Page
Copyright Page
Contents
List of Figures
List of Tables
Preface
Acknowledgments
1. Introduction
1.1. Why Work Like a Software Engineer?
1.2. Why Work Like an Operations Engineer?
1.3. How To Read This Book?
1.4. Backlog
1.5. Requirements
2. Stack: A Developer's Toolkit
2.1. Programming Language
2.2. Interaction Environment
2.3. Version Control
2.4. Data Management
2.5. Infrastructure
2.6. Automation
2.7. Communication Tools
2.8. Publishing and Reporting
3. Programming 101
3.1. The Choice That Doesn't Matter
3.2. Plan Your Program
3.2.1. Think Library!
3.2.2. Documentation
3.2.3. Design Your Interface
3.2.4. Dependencies
3.2.5. Folder Structure
3.3. Naming Conventions: Snake, Camel or Kebab Case
3.4. Testing
3.5. Debugging
3.5.1. Read Code from the Inside Out
3.5.2. Debugger, Breakpoints, Traceback
3.6. A Word on Peer Programming
4. Interaction Environment
4.1. Integrated Development Environments
4.1.1. RStudio
4.1.2. Visual Studio Code
4.1.3. Editors on Steroids
4.2. Notebooks
4.3. Console/Terminal
4.3.1. Remote Connections SSH, SCP
4.3.2. Git Through the Console
5. Git Version Control
5.1. What Is Git Version Control?
5.2. Why Use Version Control in Research?
5.3. How Does Git Work?
5.4. Moving Around
5.5. Collaboration Workflow
5.5.1. Feature Branches
5.5.2. Pull Requests from Forks
5.5.3. Rebase vs. Merge
6. Data Management
6.1. Forms of Data
6.2. Representing Data in Files
6.2.1. Spreadsheets
6.2.2. File Formats for Nested Information
6.2.3. A Word on Binaries
6.2.4. Interoperable File Formats
6.3. Databases
6.3.1. Relational Database Management Systems (RDBMS)
6.3.2. A Word on Non-Relational Databases
6.4. Non-Technical Aspects of Managing Data
6.4.1. Etiquette
6.4.2. Security
6.4.3. Privacy
6.4.4. Data Publications
7. Infrastructure
7.1. Why Go Beyond a Local Notebook?
7.2. Hosting Options
7.2.1. Software-as-a-Service
7.2.2. Self-Hosted
7.3. Building Blocks
7.3.1. Virtual Machines
7.3.2. Containers and Images
7.3.3. Kubernetes
7.4. Applied Containerization Basics
7.4.1. Dockerfiles
7.4.2. Building and Running Containers
7.4.3. Docker Compose – Manage Multiple Containers
7.4.4. A Little Docker Debugging Tip
8. Automation
8.1. Continuous Integration/Continuous Deployment
8.2. Cron Jobs
8.3. Workflow Scheduling: Apache Airflow DAGs
8.4. Make-Like Workflows
8.5. Infrastructure as Code
9. Community
9.1. Stay Up-to-Date in a Vastly Evolving Field – Social Media
9.2. Knowledge-Sharing Platforms
9.3. Look Out for Local Community Group
9.4. Attend Conferences – Online Can Be a Viable Option!
9.5. Join a Chat Space
10. Publishing and Reporting
10.1. Getting Started with a Simple Report
10.1.1. This is a level three header
10.1.2. This is another level three header
10.2. Static Website Generators
10.3. Hosting Static Websites
10.3.1. GitHub Pages
10.3.2. GitHub Actions
10.3.3. Netlify
10.4. Visualization
10.4.1. Rendered Graphs
10.4.2. JavaScript Visualization Libraries
10.5. Data Publications
11. Case Studies
11.1. SSH Key Pair Authentication
11.2. Application Programming Interfaces
11.2.1. Example 1: The {kofdata} R Package
11.2.2. Build Your Own API Wrapper
11.3. Create Your Own API
11.3.1. GitHub to Serve Static Files
11.3.2. Simple Dynamic APIs
11.4. A Minimal Webscraper: Extracting Publication Dates
11.5. Automate Script Execution: An Example with GitHub Actions
11.6. Choropleth Map: Link Data to a GeoJSON Map File
11.7. Web Applications with R Shiny
11.7.1. The Web Frontend
11.7.2. Backend
11.7.3. Put Things Together and Run Your App
11.7.4. Serve Your App
11.7.5. Shiny Resources
11.8. Project Management Basics
11.9. Parallel Computation
11.10. Good Practice
Glossary
References
Index