Healthcare is the next frontier for data science. Using the latest in machine learning, deep learning, and natural language processing, you'll be able to solve healthcare's most pressing problems: reducing the cost of care, ensuring patients get the best treatment, and increasing accessibility for the underserved. But first, you have to learn how to access and make sense of all that data.
This book provides pragmatic and hands-on solutions for working with healthcare data, from data extraction to cleaning and harmonization to feature engineering. Author Andrew Nguyen covers specific ML and deep learning examples with a focus on producing high-quality data. You'll discover how graph technologies help you connect disparate data sources so you can solve healthcare's most challenging problems using advanced analytics.
You'll learn:
• Different types of healthcare data: electronic health records, clinical registries and trials, digital health tools, and claims data
• The challenges of working with healthcare data, especially when trying to aggregate data from multiple sources
• Current options for extracting structured data from clinical text
• How to make trade-offs when using tools and frameworks for normalizing structured healthcare data
• How to harmonize healthcare data using terminologies, ontologies, mappings, and crosswalks
Author(s): Andrew Nguyen
Edition: 1
Publisher: O'Reilly Media
Year: 2022
Language: English
Commentary: Publisher's PDF
Pages: 242
City: Sebastopol, CA
Tags: Machine Learning; Data Analysis; Natural Language Processing; Analytics; Graphs; Ontologies; Data Modeling; Compliance; Healthcare; Federated Learning; Medicine; Graph Embeddings; Data Normalization
Cover
Copyright
Table of Contents
Foreword
Preface
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
Chapter 1. Introduction to Healthcare Data
The Enterprise Mindset
The Complexity of Healthcare Data
Sources of Healthcare Data
Electronic Health Records
Claims Data
Clinical/Disease Registries
Clinical Trials Data
Data Collection and How That Affects Data Scientists
Prospective studies
Retrospective studies
Conclusion
Chapter 2. Technical Introduction
Basic Introduction to Docker and Containers
Installing and Testing Docker
Conceptual Introduction to Databases
ACID Compliance
OLTP Systems
OLAP Systems
SQL Versus NoSQL
SQL Databases
(Labeled) Property Graph Databases
Hypergraph Databases
Resource Description Framework Databases
Conclusion
Chapter 3. Standardized Vocabularies in Healthcare
Controlled Vocabularies, Terminologies, and Ontologies
Key Considerations
Pre-coordination Versus Post-coordination
Case Study Example: EHR Data
Common Terminologies
CPT
ICD-9 and ICD-10
LOINC
RxNorm
SNOMED CT
Key Takeaways
Using the Unified Medical Language System
Some Basic Definitions
Concept Orientation
Working with the UMLS
UMLS and Relational Databases
Preprocessing the UMLS
UMLS and Property Graph Databases
UMLS and Hypergraph Databases
Review of the UMLS
Conclusion
Chapter 4. Deep Dive: Electronic Health Records Data
Publicly Accessible Data
Medical Information Mart for Intensive Care
Synthea
Data Models
Goals
Examples of Data Models
Case Study: Medications
The Medication Harmonization Problem
Technical Deep Dive
Connecting to the UMLS
Difficulties Normalizing Structured Medical Data
Conclusion
Chapter 5. Deep Dive: Claims Data
Publicly Accessible Data—SynPUF
Data Models
Choosing a Data Model
Combining Claims and EHR Data
Case Study: Combining Diagnoses and Medications
OMOP Versus Graphs
Considerations When Combining Different Sources of Healthcare Data
Conclusion
Chapter 6. Machine Learning and Analytics
A Primer on Machine Learning
What Is Feature Engineering?
Graph-Based Deep Learning
Extracting Data as a Table
To SQL or Not to SQL
Querying OMOP Data
From Graphs to Dataframes
Why Add the Complexity of Graphs?
Machine Learning and Feature Engineering with Graphs
Graph Embeddings
node2vec
cui2vec
med2vec
snomed2vec
Some Final Thoughts About Embeddings
Making the Case for Graph-Based Analysis
Conclusion
Chapter 7. Trends in Healthcare Analytics
Federated Learning and Federated Analytics
How Does Federated Learning Work?
Why Federated Analytics/Learning?
The Data Harmonization Challenge in a Federated Context
Graphs and Federated Approaches
Natural Language Processing
Concept Extraction
Beyond Concept Extraction
Clinical NLP Tools
Commercial Clinical NLP Solutions
Key Differences Between Clinical NLP and Other Applications of NLP
Conclusion
Chapter 8. Graphs, Harmonization, and Some Final Thoughts
Other Types of Healthcare RWD
Data Normalization and Harmonization
Merging Datasets
Bridging IT and the Business
It’s a Human, Not Technical, Problem
Graphs Can Be Part of the Solution
Graphs Are Not a Silver Bullet
Conclusion
Index
About the Author
Colophon