This book discusses what is currently known about software engineering, based on an analysis of all the publicly available data. This aim is not as ambitious as it sounds, because there is not a great deal of data publicly available.
The intent is to provide material that is useful to professional developers working in industry; until recently, researchers in software engineering have been more interested in vanity work, promoted by ego and bluster.
The material is organized in two parts, the first covering software engineering and the second the statistics likely to be needed for the analysis of software engineering data.
http://www.knosof.co.uk/ESEUR/
Author(s): Derek M. Jones
Publisher: Knowledge Software, Ltd
Year: 2020
Language: English
Pages: 454
Introduction
What has been learned?
Replication
Software markets
The primary activities of software engineering
History of software engineering research
Folklore
Research ecosystems
Overview of contents
Why use R?
Terminology, concepts and notation
Further reading
Human cognition
Introduction
Modeling human cognition
Embodied cognition
Perfection is not cost-effective
Motivation
Built-in behaviors
Cognitive effort
Attention
Visual processing
Reading
Memory systems
Short term memory
Episodic memory
Recognition and recall
Serial order information
Forgetting
Learning and experience
Belief
Expertise
Category knowledge
Categorization consistency
Reasoning
Deductive reasoning
Linear reasoning
Causal reasoning
Number processing
Numeric preferences
Symbolic distance and problem size effect
Estimating event likelihood
High-level functionality
Personality & intelligence
Risk taking
Decision-making
Expected utility and Prospect theory
Overconfidence
Time discounting
Developer performance
Miscellaneous
Cognitive capitalism
Introduction
Investment decisions
Discounting for time
Taking risk into account
Incremental investments and returns
Investment under uncertainty
Real options
Capturing cognitive output
Intellectual property
Bumbling through life
Expertise
Group dynamics
Maximizing generated surplus
Motivating members
Social status
Social learning
Group learning and forgetting
Information asymmetry
Moral hazard
Group survival
Group problem solving
Cooperative competition
Software reuse
Company economics
Cost accounting
The shape of money
Valuing software
Maximizing ROI
Value creation
Product/service pricing
Predicting sales volume
Managing customers as investments
Commons-based peer-production
Ecosystems
Introduction
Funding
Hardware
Evolution
Diversity
Lifespan
Entering a market
Population dynamics
Growth processes
Estimating population size
Closed populations
Open populations
Organizations
Customers
Culture
Software vendors
Career paths
Applications and Platforms
Platforms
Pounding the treadmill
Users' computers
Software development
Programming languages
Libraries and packages
Tools
Information sources
Projects
Introduction
Project culture
Project lifespan
Pitching for projects
Contracts
Resource estimation
Estimation models
Time
Size
Paths to delivery
Development methodologies
The Waterfall/iterative approach
The Agile approach
Managing progress
Discovering functionality needed for acceptance
Implementation
Supporting multiple markets
Refactoring
Documentation
Acceptance
Deployment
Development teams
New staff
Ongoing staffing
Post-delivery updates
Database evolution
Reliability
Introduction
It's not a fault, it's a feature
Why do fault experiences occur?
Fault report data
Cultural outlook
Maximizing ROI
Experiencing a fault
Input profile
Propagation of mistakes
Remaining faults: closed populations
Remaining faults: open populations
Where is the mistake?
Requirements
Source code
Libraries and tools
Documentation
Non-software causes of unreliability
System availability
Checking for intended behavior
Code review
Testing
Creating tests
Beta testing
Estimating test effectiveness
Cost of testing
Source code
Introduction
Quantity of source
Experiments
Exponential or Power law
Folklore metrics
Desirable characteristics
The need to know
Narrative structures
Explaining code
Memory for material read
Integrating information
Visual organization
Consistency
Identifier names
Programming languages
Build bureaucracy
Patterns of use
Language characteristics
Runtime characteristics
Statements
Control flow
Loops
Expressions
Literal values
Use of variables
Calls
Declarations
Unused identifiers
Ordering of definitions within aggregate types
Evolution of source code
Function/method modification
Stories told by data
Introduction
Finding patterns in data
Initial data exploration
Guiding the eye through data
Smoothing data
Densely populated measurement points
Visualizing a single column of values
Relationships between items
3-dimensions
Communicating a story
What kind of story?
Technicalities should go unnoticed
People have color vision
Color palette selection
Plot axis: what and how
Communicating numeric values
Communicating fitted models
Probability
Introduction
Useful rules of thumb
Measurement scales
Probability distributions
Are two samples drawn from the same distribution?
Fitting a probability distribution to a sample
Zero-truncated and zero-inflated distributions
Mixtures of distributions
Heavy/Fat tails
Markov chains
A Markov chain example
Social network analysis
Combinatorics
A combinatorial example
Generating functions
Statistics
Introduction
Statistical inference
Samples and populations
Effect-size
Sampling error
Statistical power
Describing a sample
A central location
Sensitivity of central location algorithms
Geometric mean
Harmonic mean
Contaminated distributions
Compositional data
Meta-Analysis
Statistical error
Hypothesis testing
p-value
Confidence intervals
The bootstrap
Permutation tests
Comparing samples
Building regression models
Comparing sample means
Comparing standard deviation
Correlation
Contingency tables
ANOVA
Regression modeling
Introduction
Linear regression
Scattered measurement values
Discrete measurement values
Uncertainty only exists in the response variable
Modeling data that curves
Visualizing the general trend
Influential observations and Outliers
Diagnosing problems in a regression model
A model's goodness of fit
Abrupt changes in a sequence of values
Low signal-to-noise ratio
Moving beyond the default Normal error
Count data
Continuous response variable having a lower bound
Transforming the response variable
Binary response variable
Multinomial data
Rates and proportions response variables
Multiple explanatory variables
Interaction between variables
Correlated explanatory variables
Penalized regression
Non-linear regression
Power laws
Mixed-effects models
Generalised Additive Models
Miscellaneous
Advantages of using lm
Very large datasets
Alternative residual metrics
Quantile regression
Extreme value statistics
Time series
Cleaning time series data
Modeling time series
Building an ARMA model
Non-constant variance
Smoothing and filtering
Spectral analysis
Relationships between time series
Miscellaneous
Survival analysis
Kinds of censoring
Input data format
Survival curve
Regression modeling
Cox proportional-hazards model
Time varying explanatory variables
Competing risks
Multi-state models
Circular statistics
Circular distributions
Fitting a regression model
Linear response with a circular explanatory variable
Compositional data
Miscellaneous techniques
Introduction
Machine learning
Decision trees
Clustering
Sequence mining
Ordering of items
Seriation
Preferred item ordering
Agreement between raters
Simulation
Experiments
Introduction
Measurement uncertainty
Design of experiments
Subjects
The task
What is actually being measured?
Adapting an ongoing experiment
Selecting experimental options
Factorial designs
Benchmarking
Following the herd
Variability in today's computing systems
Hardware variation
Software variation
The cloud
End user systems
Surveys
Data preparation
Introduction
Documenting cleaning operations
Outliers
Malformed file contents
Missing data
Handling missing values
NA handling by library functions
Restructuring data
Reorganizing rows/columns
Miscellaneous issues
Application specific cleaning
Different name, same meaning
Multiple sources of signals
Duplicate data
Default values
Resolution limit of measurements
Detecting fabricated data
Overview of R
Your first R program
Language overview
Differences between R and widely used languages
Objects
Operations on vectors
Creating a vector/array/matrix
Indexing
Lists
Data frames
Symbolic forms
Factors and levels
Operators
Testing for equality
Assignment
The R type (mode) system
Converting the type (mode) of a value
Statements
Defining a function
Commonly used functions
Input/Output
Graphical output
Non-statistical uses of R
Very large datasets