This book provides a complete practical guide to processing public health data with the R language. Drawing on the author's research and teaching experience, it can serve either as a textbook for undergraduate and graduate students in public health or as a self-study tutorial. Many first-hand examples are presented with source data, R scripts, graphs, and detailed explanations, which readers can easily reproduce to better understand the principles and procedures of data processing. Popular and novel R packages for public health are introduced as well.
Years of teaching R to undergraduates have led me to realize that even students who are well equipped with R skills may find it difficult to pick the proper tools when they need them. Conversely, students who conduct research without learning R may have no idea what R can do for them. This book is not solely about the R language, statistics, data science, or public health; it is a mixture of them all. In this book, I aim to demonstrate how to apply the R language to processing public health data through practical examples. Thus, the book is best used as a textbook for teaching or learning R or data science in public health, although it can also be used across disciplines. Data are sets of values of numerical or categorical variables about one or more persons or objects. In scientific research, we process a great deal of data before extending our knowledge and making decisions.
As a result of reading this book, readers will not only gain a solid understanding of the R language and its applications in public health but will also develop the skills necessary to work with data in a rigorous and reproducible manner. I hope that this book will serve as a valuable resource for students and practitioners alike and will inspire readers to continue exploring the rich intersection between data analysis and public health.
Traditionally, data work was done with pen and paper. Statistical hypothesis testing, for example, could be performed with simple calculations and look-up tables. Thanks to the rapid development of modern computer science and technology, it is now possible to choose a preferred tool from a large number of software packages. Microsoft Excel and comparable tools, such as LibreOffice Calc, are frequently used to manage spreadsheet data. In public health, professional data analysis tools with graphical user interfaces (GUIs), such as SPSS, Epi Info, and NVivo, are particularly popular. Beyond GUI-based software, programming languages such as MATLAB, Python, and R offer more powerful and versatile alternatives. This book focuses on R because of its extensive ecosystem of packages and its friendly, supportive community. The open-source RStudio project offers R users an enhanced integrated development environment (IDE), which simplifies working with R. When students' work is required to be reproducible, R Markdown is the preferred writing tool.
We presume that readers of this book have a fundamental understanding of R; that is, they should be familiar with R basics such as variable assignment, vectors, lists, data frames, and functions. Therefore, the purpose of Chapter 1 is to give readers a rapid review of R to keep them on track.
Author(s): Peng Zhao
Publisher: Springer/Xian Jiaotong University Press
Year: 2023
Language: English
Pages: 201
1. Preparing Tools
R
RStudio
2. Planning Data
Relevant R Packages
3. Collecting Data
Databases and Tables
4. Importing and Exporting Data
Using RStudio Dialogues
Default Formats in R
5. Cleaning Data
6. Describing Data
7. Analyzing Data
Probability Distribution Functions in R
R Functions and Common Steps
8. Visualizing Data
Plotting Systems in R
Base R
ggplot2
plotly
9. Presenting Data
R Markdown
Set Up R Markdown
10. Managing Data
Data Management Framework
The prodigenr Package
The rosr Package
Metadata in R Packages