Data Science for Genomics

Data Science for Genomics presents the foundational concepts of data science as they pertain to genomics, encompassing the process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions and supporting decision-making. Sections cover Data Science, Machine Learning, Deep Learning, data analysis, and visualization techniques. The authors then present the fundamentals of Genomics, Genetics, Transcriptomes and Proteomes as basic concepts of molecular biology, along with DNA and key features of the human genome, as well as the genomes of eukaryotes and prokaryotes.

Techniques that are more specifically used for studying genomes are then described in the order in which they are used in a genome project, including methods for constructing genetic and physical maps, DNA sequencing methodology and the strategies used to assemble a contiguous genome sequence, and methods for identifying genes in a genome sequence and determining the functions of those genes in the cell. Readers will learn how the information contained in the genome is released and made available to the cell, as well as methods centered on cloning and PCR.

Author(s): Amit Kumar Tyagi, Ajith Abraham
Publisher: Academic Press
Year: 2022

Language: English
Pages: 312
City: London

Front Cover
Data Science for Genomics
Copyright
Contents
Contributors
Preface
Acknowledgment
1 - Genomics and neural networks in electrical load forecasting with computational intelligence
1. Introduction
2. Methodology
2.1 RNN
2.2 Long short-term memory
3. Experiment evaluation
3.1 Testing methods effectiveness for PGVCL data
3.2 Testing methods effectiveness for NYISO data
4. Conclusion
References
2 - Application of ensemble learning–based classifiers for genetic expression data classification
1. Introduction
2. Ensemble learning–based classifiers for genetic data classification
2.1 Bagging
2.2 Boosting
2.3 Stacking
3. Stacked ensemble classifier for leukemia classification
3.1 Proposed classification model
3.2 Deep-stacked ensemble classifier
3.3 SVM meta classifier
3.4 Gradient boosting meta classifier
4. Results and discussion
5. Conclusion
References
3 - Machine learning in genomics: identification and modeling of anticancer peptides
1. Introduction
2. Materials and methods
2.1 Google Colaboratory
2.2 Data sets
2.3 Pfeature package
2.4 Feature extraction functions
2.4.1 Amino acid composition
2.4.2 Dipeptide composition
2.4.3 Conjoint triad calculation
2.4.4 Distance distribution of residues
2.5 Machine learning implementation
2.5.1 Data preprocessing
2.5.2 Lazy predict
2.5.3 Modeling
2.5.3.1 Model evaluation when AAC feature function is used
2.5.3.2 Model evaluation when DPC feature function is used
2.5.3.3 Model evaluation when CTC feature function is used
2.5.3.4 Model evaluation when DDOR feature function is used
2.5.4 Results
2.6 Conclusion
References
4 - Genetic factor analysis for an early diagnosis of autism through machine learning
1. Introduction
2. Review of literature
3. Methodology
3.1 Using KNIME software
3.2 Data set analysis through ML algorithms
3.3 Naive Bayes learner
3.4 Fuzzy rule learner
3.5 Decision tree learner
3.6 RProp MLP learner
3.7 Random forest learner
3.8 SVM learner
3.9 K-nearest neighbors learner
3.10 Gradient boosted trees learner
3.11 K-means clustering
4. Results
4.1 Graphs obtained
4.2 Inference
5. Conclusion
Appendix
References
5 - Artificial intelligence and data science in pharmacogenomics-based drug discovery: Future of medicines
1. Introduction
2. Artificial intelligence
3. Artificial intelligence in drug research
4. Drug discovery
4.1 Drug screening
4.2 Drug designing
4.3 Drug repurposing
4.4 ADME prediction
4.5 Dosage form and delivery system
4.6 PK/PD correlation
5. Pharmacogenomics
6. Pharmacogenomics and AI
7. Integration of pharmacogenomics and AI
8. Pharmacogenomic-based clinical evaluation and AI
9. Discussion
10. Conclusion
Abbreviations
References
6 - Recent challenges, opportunities, and issues in various data analytics
1. Introduction
2. Big data
3. Data analytics
4. Challenges in data analytics
5. Various sectors in data analytics
6. Conclusion
References
7 - In silico application of data science, genomics, and bioinformatics in screening drug candidates against COVID- ...
1. Introduction
1.1 A brief overview of SARS-CoV-2
1.2 Compounds reported with antiviral activities
1.3 Herb extracts with antiviral property in India
2. Materials and method
2.1 Target protein preparation
2.2 Ligand preparation
2.3 Binding site/catalytic site prediction
2.4 Structure minimization
2.5 Grid generation
2.6 Molecular docking of protein–ligand using Autodock software
2.7 Hydrogen bond interaction using LigPlot software
2.8 Screening of compounds for drug likeness
2.9 Screening of compounds for toxicity
3. Results and discussion
4. Conclusion
Declaration
Nomenclature
Acknowledgments
References
8 - Toward automated machine learning for genomics: evaluation and comparison of state-of-the-art AutoML approaches
1. Into the world of genomics
2. Need and purpose of analytics in genomics
3. Literature review
4. Research design
4.1 Research design methodology
4.2 AutoML tools used: PyCaret and AutoViML
5. AutoML
5.1 Why AutoML and why it should be democratized
5.2 Architectural design of AutoML
5.3 Democratization of AutoML and beyond
6. Research outcome
6.1 Exploratory data analysis
6.2 Analysis using PyCaret
6.3 Analysis using AutoViML
6.4 Model comparison: PyCaret and AutoViML
7. Business implications
8. Conclusion
References
Further reading
9 - Effective dimensionality reduction model with machine learning classification for microarray gene expression data
1. Introduction
2. Related work
3. Materials and methods
3.1 Feature selection
3.2 Principal component analysis
3.3 Logistic regression
3.4 Extremely randomized trees classifier
3.5 Ridge classifier
3.6 Adaboost
3.7 Linear discriminant analysis
3.8 Random forest
3.9 Gradient boosting machine
3.10 K-nearest neighbors
3.11 Data set used for analysis
4. Results and discussion
4.1 Experimental analysis on 10-fold cross-validation
4.2 Experimental analysis on eightfold cross-validation
4.3 Comparison of our findings with some earlier studies
5. Conclusion and future work
References
10 - Analysis the structural, electronic and effect of light on PIN photodiode achievement through SILVACO software ...
1. Introduction
1.1 Photodiode
1.2 Effect of light on the I–V characteristics of photodiodes [1,7,8]
1.3 I–V characteristics of a photodiode
1.4 Types of photodiodes [11,19,20,21]
1.5 Modes of operation of a photodiode
1.6 Effect of temperature on I–V characteristics of photodiodes
1.7 Signal-to-noise ratio in a photodiode
1.8 Responsivity of a photodiode
1.9 Responsivity versus wavelength
2. PIN photodiode [23,24]
2.1 Operation of PIN photodiode
2.2 Key PIN diode characteristics
2.3 PIN diodes uses and advantages
2.4 PIN photodiode applications
3. Results and simulations
3.1 Effect of light on a PIN photodiode
3.2 Procedure to design and observe the effect of light
3.3 V–I characteristic of a PIN photodiode
4. Conclusion
Appendix (Silvaco Code)
Effect of light on the characteristics of pin diode code
Effect of light on the characteristics of SDD diode code
References
11 - One step to enhancement the performance of XGBoost through GSK for prediction ethanol, ethylene, ammonia, acet ...
1. Introduction
2. Related work
3. Main tools
3.1 Internet of Things (IoTs) [27,28]
3.2 Optimization techniques [9-11]
3.2.1 Particle swarm optimization (PSO) algorithm
3.2.2 Genetic algorithm (GA) [14,22]
3.2.3 Ant lion optimizer (ALO) [13]
3.2.4 Gaining-sharing knowledge-based algorithm (GSK) [14,22]
3.3 Prediction techniques
3.3.1 Decision tree (DT) [15,23]
3.3.2 Extra trees classifier (ETC) [16]
3.3.3 Random forest (RF)
3.3.4 Extreme gradient boosting (XGBoost) [18,25]
3.3.4.1 What is the secret behind XGBoost's success?
3.3.4.2 What are the advantages of the XGBoost algorithm over the regular GBM method?
3.3.4.3 Evaluation measures [19,20]
3.3.4.3.1 Accuracy (A)
3.3.4.3.2 Precision (P)
3.3.4.3.3 Recall (R)
3.3.4.3.4 F-measurement (F)
3.3.4.3.5 Fβ
4. Result of implementation
4.1 Description of dataset
4.2 Result of preprocessing
4.3 Checking missing values
4.3.1 Correlation
4.3.2 Results of DXGBoost-GSK
5. Conclusions
References
12 - A predictive model for classifying colorectal cancer using principal component analysis
1. Introduction
2. Related works
3. Methodology
3.1 Experimental dataset
3.2 Dimensionality reduction tool
3.3 Classification
3.3.1 Support vector machine (SVM)
3.3.2 K-nearest neighbor
3.3.3 Random forest
3.4 Research tool
3.5 Performance evaluation metrics
4. Results and discussions
5. Conclusion
References
13 - Genomic data science systems of Prediction and prevention of pneumonia from chest X-ray images using a two-cha ...
1. Introduction
2. Review of literature
2.1 Introduction
2.2 Convolutional neural networks (CNNs)
3. Materials and methods
3.1 Dataset
3.2 The proposed architecture: two-channel dual-stream CNN (TCDSCNN) model
3.3 Performance matrix for classification
3.3.1 Confusion matrix
4. Result and discussion
4.1 Visualizing the intermediate layer output of CNN
4.2 Model feature map
4.3 Model accuracy
5. Conclusion and future work
References
14 - Predictive analytics of genetic variation in the COVID-19 genome sequence: a data science perspective
1. Introduction
1.1 Objectives
2. Related work
3. The COVID-19 genomic sequence
3.1 The relevance of genome sequences to disease analyses
3.2 Utilization of COVID-19 genome sequencing for processing
4. Methodology
4.1 Implementation analysis
Lung epithelial similarity
5. Discussion
6. Conclusion
7. Future outlook
References
Further reading
15 - Genomic privacy: performance analysis, open issues, and future research directions
1. Introduction
1.1 Genome data
1.2 Genomic data versus other types of data
2. Related work
3. Motivation
4. Importance of genomic data/privacy in real life
5. Techniques for protecting genetic privacy
5.1 Controlled access
5.2 Differential privacy preservation
5.3 Cryptographic solutions
5.4 Other approaches
5.5 Some useful suggestions for protecting genomic data
6. Genomic privacy: use case
7. Challenges in protecting genomic data
8. Opportunities in genomic data privacy
9. Arguments about genetic privacy with several other privacy areas
10. Conclusion with future scope
Appendix A
Authors' contributions
Acknowledgments
References
16 - Automated and intelligent systems for next-generation-based smart applications
1. Introduction
2. Background work
3. Intelligent systems for smart applications
4. Automated systems for smart applications
5. Automated and intelligent systems for smart applications
6. Machine learning and AI technologies for smart applications
7. Analytics for advancements
8. Cloud strategies: hybrid, containerization, serverless, microservices
9. Edge intelligence
10. Data governance and quality for smart applications
11. Digital Ops including DataOps, AIOps, and CloudSecOps
12. AI in healthcare—from data to intelligence
13. Big data analytics in IoT-based smart applications
14. Big data applications in a smart city
15. Big data intelligence for cyber-physical systems
16. Big data science solutions for real-life applications
17. Big data analytics for cybersecurity and privacy
18. Data analytics for privacy-by-design in smart health
19. Case studies and innovative applications
19.1 Innovative bioceramics
19.1.1 Innovative urban surface flooding modeling in the United Kingdom
19.1.2 Innovative solutions for solar-powered electric mobility applications
20. Conclusion and future scope
Acknowledgments
References
Further reading
17 - Machine learning applications for COVID-19: a state-of-the-art review
1. Introduction
2. Forecasting
3. Medical diagnostics
4. Drug development
5. Contact tracing
6. Conclusion
References
Index
Back Cover