Biomedical Data Science
Welcome
Preface
Introduction for readers
What you will learn from this course/book
What we recommend you do while reading this book
Other reference books
Acknowledgements
I Data Science Foundations
1 Introduction to R programming
1.1 Data types
1.1.1 numeric (or double)
1.1.2 integer
1.1.3 logical
1.1.4 character
1.1.5 Memory usage
1.2 Data structures
1.2.1 Vector
1.2.2 Matrix
1.2.3 List
1.2.4 Data Frame
1.2.5 Factor vs vector
1.3 Read and write files (tables)
1.3.1 Read file
1.3.2 Write file
1.4 Functions and Packages
1.4.1 Install packages
1.4.2 Apply function repeatedly
1.4.3 Pattern match
1.5 Flow Control
1.5.1 Logical operator
1.5.2 if-else statements
1.5.3 for-loop
1.6 Plotting
1.6.1 datasets
1.6.2 Basic plotting
1.6.3 ggplot2
1.7 Scientific computing
1.7.1 Order of operators
1.7.2 Functions for statistics
1.7.3 Correlation
1.7.4 Hypothesis testing (t test)
1.7.5 Regression
1.7.6 Resource links
1.7.7 Coding style
1.8 Exercises
1.8.1 Part 1. Basics (~40min)
1.8.2 Part 2. Making plots (~40min)
1.8.3 Part 3. For loop and repeated processing (~40min)
2 Introduction to Hypothesis testing
2.1 Hypothesis testing and p value
2.1.1 Example 1: probability of rolling a six?
2.2 Permutation test
2.2.1 Example 2: difference in birth weight
2.2.2 Null distribution approximated by resampling
2.3 t test
2.3.1 Derivation of t distribution
2.3.2 Direct use of t.test()
2.4 Regression-based test
2.5 Multiple testing
2.5.1 Null distribution (of test statistic)
2.5.2 Null distribution of p value
2.5.3 Minimal p values in 10 tests
2.6 Explore power and sample size (optional)
3 Introduction to Linear Regression
3.1 Linear Regression Using Simulated Data
3.1.1 Simulating data
3.1.2 Model efficacy
3.1.3 R-Squared
3.2 Least Squares Using Simulated Data
3.3 Diagnostic check of a fitted regression model
3.3.1 Residual Standard Errors (RSE)
3.3.2 p-values
3.3.3 F-statistics
3.4 Simple Linear Regression with lm function
3.5 Multiple Regression with lm function
4 Introduction to Classification
4.1 Visualise logistic and logit functions
4.1.1 Logistic function
4.1.2 Logit function
4.1.3 Visualise the distribution
4.2 Logistic regression on Diabetes
4.2.1 Load Pima Indians Diabetes Database
4.2.2 Fit logistic regression
4.2.3 Assess on test data
4.2.4 Model selection and diagnosis
4.3 Cross-validation
4.4 More assessment metrics
4.4.1 Two types of error
4.4.2 ROC curve
4.4.3 Homework
II Biomedical Data Modules
5 Medical Image and Digital Health
6 Cancer genomics and epidemiology
6.1 Case study 1: analysis of cBioportal mutation data
6.1.1 Exploratory analysis
6.1.2 Statistical analysis
6.1.3 Literature search
6.2 Case study 2: Cancer Epidemiology
6.2.1 Scenario
6.2.2 Hong Kong population
6.2.3 Cancer registry data
6.2.4 Existing cancer funding and publication data
6.2.5 Open discussion
7 Population Genetics and Diseases
7.1 Case study 1: Heritability and human traits
7.1.1 Part 1
7.1.2 Part 2
7.1.3 References
III Appendix
Appendix A: Install R & RStudio
A.1 Install R (>=4.3.1)
R on Windows
R on macOS
R on Linux (Ubuntu)
A.2 Install RStudio
A.3 Use R inside RStudio
RStudio
Set working directory
Some general knowledge
Install packages
A.4 Cloud computing
References
Published with bookdown
Biomedical Data Science - introduction with case studies
References