Last updated: 2020-06-23
Markov chains are stochastic processes that describe a sequence of possible events in a countable state space, where the probability of transitioning to the next event depends only on the event immediately preceding it. Markov chains are a staple of computational statistics. Our objective today is to learn the basics of discrete-time Markov chains and their long-run behavior.
The Markov assumption states that in order to predict the future behavior of a system, all that is required is knowledge of its present state, not of its past states. For example, given a set of times \(t_1, t_2, t_3, t_4\) and states \(X_1, X_2, X_3, X_4\), under the Markov assumption (or Markov property):
\[P(X_4=1|X_3=0, X_2=1, X_1=1) = P(X_4=1|X_3=0)\]
In other words, “the past and the future are conditionally independent given the present”. If we have knowledge about the present, then knowing the past does not give us any more information to predict what will happen in the future. Another term that is commonly used to describe Markov chains is “memorylessness.”
Question: What distribution that we have discussed in probability is also described by the property of “memorylessness”?
The central dogma of biology describes how information moves from DNA to RNA to Protein.
\[DNA \rightarrow RNA \rightarrow Protein\]
The assumption under the central dogma is that information flows only in one direction, and never backwards. Under a Markov chain model of the central dogma, the amount of RNA you observe in a cell is some function of the genetic variations seen at the DNA sequence level (in coding and noncoding regulatory regions), and the amount of protein you see in the cell is some function of the abundance of RNA transcripts in the cell coding for that protein. If you know the amount of RNA in the cell, then knowing the underlying DNA sequence of the cell at the gene encoding the protein does not give you more information to better predict the amount of protein in the cell. Obviously, there are exceptions to such a simple model of biology, but in the vast majority of cases this model does a very good job of describing biological networks.
A Markov chain describing the states that a time-dependent random variable \(X_t\) takes on at each time step \(t\) is fully determined by two elements:
A transition probability matrix (P) that defines the transition probabilities between each pair of states i and j (\(P_{ij} = P(X_t = j | X_{t-1}=i)\)).
An initial probability distribution vector (\(x_0\)) containing values for \(P(X_0 = i)\) for each state i.
With these two quantities, we can compute the probability that a Markov chain takes on any given state at any given time.
Let’s explore an example of a Markov chain. Say we live in a world where the weather is very predictable. There are only 3 states of weather (X=s when it’s sunny, X=c when it’s cloudy, and X=r when it’s rainy), and the weather stays the same for the whole day. On a day when it is sunny, the probability that it will be sunny the next day is 0.6, the probability that it will be cloudy the next day is 0.3, and the probability that it will be rainy the next day is 0.1. On a day when it is cloudy, the probability that the next day is sunny is 0.2, the probability that the next day is cloudy is 0.3, and the probability that the next day is rainy is 0.5. On a day when it is rainy, the probability that the next day will be rainy is 0.5, the probability that the next day will be sunny is 0.4, and the probability that the next day will be cloudy is 0.1. A visual depiction of these transition probabilities can be found here: https://stackoverflow.com/questions/36574814/creating-three-state-markov-chain-plot. We can also create a transition probability matrix in R, assuming the states are ordered s, c, r:
P <- matrix(c(0.6, 0.3, 0.1,
              0.2, 0.3, 0.5,
              0.4, 0.1, 0.5),
            nrow = 3, byrow = TRUE)
P
[,1] [,2] [,3]
[1,] 0.6 0.3 0.1
[2,] 0.2 0.3 0.5
[3,] 0.4 0.1 0.5
The matrix is constructed with i in the rows and j in the columns. Each cell is filled in with the value \(P_{ij} = P(X_t = j | X_{t-1}=i)\). Each row must sum to 1, since it describes all possible transitions out of state i.
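As a quick sanity check, we can confirm this in R:
rowSums(P) # each row of the transition probability matrix sums to 1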
Let’s also assume that we have an initial probability vector \(x_0\):
x_0 <- c(0.8, 0.1, 0.1)
This means that on day 0 there is a 0.8 probability that it is sunny, a 0.1 probability that it is cloudy, and a 0.1 probability that it is rainy. The vector must also sum to 1, since it fully describes all possible states on day 0.
If we want to compute the probability that on day 1, the weather will be sunny, we must solve the following equation:
\[P(X_1 = sunny) = \sum\limits_i P(X_0 = i)P(X_1 = sunny|X_0 = i)\]
We could do this manually using our vector \(x_0\) and P:
\[P(X_1 = sunny) = 0.8 \times 0.6 + 0.1 \times 0.2 + 0.1 \times 0.4 = 0.54\]
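We can verify this number in R by taking the elementwise product of \(x_0\) with the first column of P (the column corresponding to sunny) and summing:
sum(x_0 * P[, 1]) # 0.8*0.6 + 0.1*0.2 + 0.1*0.4 = 0.54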
The probability that day 1 is sunny is 0.54. But what if we want to calculate the probability that day 2 is sunny? We would need to first find the state probabilities for all three states for day 1 (repeating what we did above two more times), then plug them in to the following equation:
\[P(X_2 = sunny) = \sum\limits_i P(X_1 = i)P(X_2 = sunny|X_1 = i)\]
You can appreciate that manually summing over all possible states would quickly become unwieldy: each additional day multiplies the number of calculations you need to perform by 3.
Fortunately, we can use matrix algebra to do all of this multiplying and summing for us. In R, \(x_1\), our vector of state probabilities on day 1, is:
x_0 %*% P
[,1] [,2] [,3]
[1,] 0.54 0.28 0.18
We can do this for any number of days. For example, for the vector of state probabilities after 10 days:
library(expm) #the expm package allows us to raise a matrix to a power
Loading required package: Matrix
Attaching package: 'expm'
The following object is masked from 'package:Matrix':
expm
x_0 %*% (P %^% 10)
[,1] [,2] [,3]
[1,] 0.4411769 0.2352953 0.3235278
In our simple example, you may ask yourself, “what if the probability that it rains tomorrow depends not only on the weather today, but also on the weather yesterday?” You can actually slightly change our Markov chain to account for this! Instead of representing the states as S, C, and R, imagine expanding them to SS, SC, SR, CS, CC, CR, RS, RC, and RR, creating a Markov chain in which the weather tomorrow depends only on the weather over the past two days; a sketch of this construction is shown below. Now you can see why Markov chains are so powerful!
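As a rough sketch of that construction (the pair states and the matrix Q below are illustrative; for simplicity it reuses the one-day transition probabilities in P as placeholders, since true two-day probabilities were not specified above):
states <- c("s", "c", "r")
pairs <- as.vector(outer(states, states, paste0)) # "ss", "cs", "rs", "sc", ...
Q <- matrix(0, nrow = 9, ncol = 9, dimnames = list(pairs, pairs))
for (from in pairs) {
  today <- substr(from, 2, 2)              # second letter of the pair = today's weather
  for (to in pairs) {
    if (substr(to, 1, 1) == today) {       # tomorrow's pair must begin with today's weather
      Q[from, to] <- P[match(today, states), match(substr(to, 2, 2), states)]
    }
  }
}
rowSums(Q) # each of the 9 rows still sums to 1
In a genuine second-order model, each allowed entry of Q would instead be its own probability of tomorrow's weather given the previous two days.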
For a certain class of Markov chains, called ergodic Markov chains (https://brilliant.org/wiki/stationary-distributions/), there exists a stationary probability distribution: an equilibrium distribution over states that the chain converges upon after many iterations. To see what this means, let’s look at the probability distribution for the weather on the 100th day given our values for P and \(x_0\).
x_0 %*% (P %^% 100)
[,1] [,2] [,3]
[1,] 0.4411765 0.2352941 0.3235294
What about on the 101st day? 102nd?
x_0 %*% (P %^% 101)
[,1] [,2] [,3]
[1,] 0.4411765 0.2352941 0.3235294
x_0 %*% (P %^% 102)
[,1] [,2] [,3]
[1,] 0.4411765 0.2352941 0.3235294
You can see that the probabilities have converged upon an equilibrium vector that does not change as we step further into the future. In fact, this equilibrium or stationary distribution is determined entirely by the transition probability matrix and does not depend on \(x_0\):
c(1, 0, 0) %*% (P %^% 101)
[,1] [,2] [,3]
[1,] 0.4411765 0.2352941 0.3235294
c(0, 1, 0) %*% (P %^% 102)
[,1] [,2] [,3]
[1,] 0.4411765 0.2352941 0.3235294
This makes sense: the further into the future you go, the less important the current state becomes and the less information it gives you about future states. If I know it is cloudy today, I can probably do a good job of predicting whether it will rain tomorrow. But knowing that today is cloudy rather than sunny will not make me any better at predicting whether it will rain 10 years from now.
If we did not want to simply raise a matrix to a large power in order to find a stationary distribution, eigenvalue decomposition (beyond the scope of this class) could be used to determine the stationary distribution directly.
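As a minimal sketch of that idea (the variable names are our own): the stationary distribution is a left eigenvector of P with eigenvalue 1, i.e. an eigenvector of t(P), rescaled to sum to 1.
eig <- eigen(t(P))                                                  # eigendecomposition of the transpose of P
pi_stationary <- Re(eig$vectors[, which.min(abs(eig$values - 1))])  # eigenvector for eigenvalue 1
pi_stationary <- pi_stationary / sum(pi_stationary)                 # rescale so the entries sum to 1
pi_stationary                                                       # approximately 0.441, 0.235, 0.324, matching the result above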
Note that not all Markov chains will have a unique stationary distribution. For example, consider the following transition matrix describing the transitions between two states:
matrix(c(0,1,1,0), ncol =2, byrow=F)
[,1] [,2]
[1,] 0 1
[2,] 1 0
This is called a periodic Markov chain: it alternates deterministically between the two states, so the state probabilities oscillate instead of converging.
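To see the periodicity concretely (P_periodic is just a name we give this matrix), its powers oscillate rather than converge:
library(expm) # already loaded above; provides %^%
P_periodic <- matrix(c(0, 1, 1, 0), ncol = 2, byrow = FALSE)
c(1, 0) %*% (P_periodic %^% 100) # after an even number of steps the chain is back in state 1
c(1, 0) %*% (P_periodic %^% 101) # after an odd number of steps it is in state 2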
The Markov chain with this transition probability matrix will also not reach a unique stationary distribution. Why?
matrix(c(0.5,0.5, 0, 0, 0, 0.5, 0.5, 0, 0, 0, 0, 0, 0.3, 0.4, 0.3, 0, 0, 0.3, 0.3, 0.4, 0, 0, 0.3, 0.3, 0.4), ncol =5, byrow=T)
[,1] [,2] [,3] [,4] [,5]
[1,] 0.5 0.5 0.0 0.0 0.0
[2,] 0.5 0.5 0.0 0.0 0.0
[3,] 0.0 0.0 0.3 0.4 0.3
[4,] 0.0 0.0 0.3 0.3 0.4
[5,] 0.0 0.0 0.3 0.3 0.4
Markov chains like this are known as reducible Markov chains; the chain can never move between the two blocks of states, as the sketch below demonstrates.
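A quick illustration (P_reducible is our name for the matrix above): the long-run distribution depends on which block of states the chain starts in, so no single stationary distribution applies to every starting point.
library(expm) # already loaded above; provides %^%
P_reducible <- matrix(c(0.5, 0.5, 0, 0, 0,
                        0.5, 0.5, 0, 0, 0,
                        0, 0, 0.3, 0.4, 0.3,
                        0, 0, 0.3, 0.3, 0.4,
                        0, 0, 0.3, 0.3, 0.4), ncol = 5, byrow = TRUE)
c(1, 0, 0, 0, 0) %*% (P_reducible %^% 100) # starting in state 1, all probability stays in states 1-2
c(0, 0, 1, 0, 0) %*% (P_reducible %^% 100) # starting in state 3, all probability stays in states 3-5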
Markov chains that converge to a unique stationary distribution are ergodic, that is, both aperiodic and irreducible.
Why do we care about Markov chains? Besides being able to predict the behavior and states of systems, there are many practical applications. For example, if you’ve ever used a smartphone keyboard and seen the suggestions for the next word, you’ve made use of a Markov chain! The subreddit /r/SubredditSimulator generates complete posts using Markov chains whose transition matrices are built from the words in the last 500 submissions to a variety of subreddit communities on reddit.
Perhaps more relevant to us in biology and Bayesian statistics, Markov chains are an important element of Markov chain Monte Carlo (MCMC) methods.
As the name suggests, MCMC methods bring together two concepts: Markov chains and Monte Carlo methods.
Markov chains are stochastic processes in which future states depend only on the current state (the Markov property).
Monte Carlo methods rely on simulating large numbers of random draws to estimate properties of a distribution. An example is here, where Monte Carlo methods are used to estimate pi: https://academo.org/demos/estimating-pi-monte-carlo/. We also use Monte Carlo methods when we estimate a population mean with a sample mean. If we were to take a larger and larger sample from the population (similar to simulating more and more points in the square), our estimate of the population mean (the sample mean) would become more and more precise. (Question: what mathematical law predicts this behavior?)
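A minimal sketch of the pi example (the sample size and seed are arbitrary choices): scatter random points in the unit square and use the fraction that land inside the quarter circle of radius 1.
set.seed(1)
n <- 1e6                   # number of simulated points
x <- runif(n)
y <- runif(n)
4 * mean(x^2 + y^2 <= 1)   # estimate of pi; the estimate improves as n grows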
Markov chain Monte Carlo techniques use Markov chains to sample repeatedly from a complicated or unnamed distribution, then use Monte Carlo methods to estimate properties of that distribution. Why would we ever need to do this? Let’s return to our previous exploration of Bayes’ theorem.
Recall that under Bayes’ theorem,
\[Posterior\ probability \propto Prior\ probability \times Likelihood\] Although there are cases where multiplying the two distributions (our prior probability distribution and the likelihood) results in a posterior that is a named, well-defined distribution (e.g. a normal or binomial distribution), many times it results in an unnamed distribution that we cannot easily work with. In these cases, MCMC methods are used to sample from the unnamed posterior distribution, and statistics are calculated on those samples to determine the mean, median, 95% credible interval, etc. of that distribution.
MCMC methods construct a Markov chain that tends to stay in states of higher posterior probability and to avoid states of lower posterior probability. The equilibrium distribution of this Markov chain is therefore the posterior distribution, and in the long run the distribution of states sampled by the chain will correspond to the posterior distribution.
The algorithm commonly used to construct such chains is the Metropolis-Hastings algorithm: https://stephens999.github.io/fiveMinuteStats/MH_intro.html. A small sketch is given below.
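A minimal random-walk Metropolis sketch (a simplified special case of Metropolis-Hastings; the target density, proposal standard deviation, and iteration count are illustrative choices): the chain samples from a distribution known only up to a normalizing constant, and Monte Carlo summaries are then computed from the draws.
target <- function(x) exp(-x^2 / 2)       # unnormalized density (here, a standard normal)
n_iter <- 10000
samples <- numeric(n_iter)
current <- 0                              # starting state of the chain
for (i in seq_len(n_iter)) {
  proposal <- current + rnorm(1, sd = 1)  # symmetric random-walk proposal
  if (runif(1) < target(proposal) / target(current)) {
    current <- proposal                   # accept; moves toward higher density are always accepted
  }                                       # otherwise the chain stays where it is
  samples[i] <- current
}
mean(samples)                             # Monte Carlo estimate of the target's mean
quantile(samples, c(0.025, 0.975))        # approximate 95% interval of the target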
Use R and matrix multiplication to solve these problems, taken from Sheldon Ross’s Introduction to Probability Models, 11th ed.:
Coin 1 comes up heads with probability 0.6 and coin 2 with probability 0.5. A coin is continually flipped until it comes up tails, at which time that coin is put aside and we start flipping the other one. If we start the process with coin 1, what is the probability that coin 2 is used on the fifth flip?
Suppose that coin 1 has probability 0.7 of coming up heads, and coin 2 has probability 0.6 of coming up heads. If the coin flipped today comes up heads, then we select coin 1 to flip tomorrow, and if it comes up tails, then we select coin 2 to flip tomorrow. If the coin initially flipped is equally likely to be coin 1 or coin 2, then what is the probability that the coin flipped on the third day after the initial flip is coin 1? Suppose that the coin flipped on Monday comes up heads. What is the probability that the coin flipped on Friday of the same week also comes up heads?
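As a hedged starting point for the first problem (the object names and the setup below are ours, not Ross’s): take the state to be which coin is used for a given flip, so the coin used on the fifth flip is obtained by applying the transition matrix four times to the initial state.
library(expm) # already loaded above; provides %^%
P_coins <- matrix(c(0.6, 0.4,   # from coin 1: stay on coin 1 with prob 0.6 (heads), switch to coin 2 with prob 0.4 (tails)
                    0.5, 0.5),  # from coin 2: switch to coin 1 with prob 0.5 (tails), stay on coin 2 with prob 0.5 (heads)
                  nrow = 2, byrow = TRUE)
x_start <- c(1, 0)              # the process begins with coin 1
x_start %*% (P_coins %^% 4)     # probabilities of each coin being the one used on the fifth flip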
sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.5
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] expm_0.999-4 Matrix_1.2-18
loaded via a namespace (and not attached):
[1] workflowr_1.5.0 Rcpp_1.0.4.6 lattice_0.20-38 rprojroot_1.3-2
[5] digest_0.6.25 later_1.0.0 grid_3.6.3 R6_2.4.1
[9] backports_1.1.6 git2r_0.26.1 magrittr_1.5 evaluate_0.14
[13] stringi_1.4.5 rlang_0.4.5 fs_1.3.1 promises_1.1.0
[17] whisker_0.4 rmarkdown_1.18 tools_3.6.3 stringr_1.4.0
[21] glue_1.4.0 httpuv_1.5.2 xfun_0.12 yaml_2.2.1
[25] compiler_3.6.3 htmltools_0.4.0 knitr_1.26