This vignette illustrates the usage of the SNPknock package for creating knockoff copies of variables distributed as discrete Markov chains and hidden Markov models (Sesia, Sabatti, and Candès 2017). For simplicity, we will use synthetic data.
The SNPknock package also provides a simple interface to the genotype imputation software fastPhase, which can be used to fit hidden Markov models for genotype data. Since fastPhase is not available as an R package, this particular functionality of SNPknock cannot be demonstrated here. A tutorial showing how to use a combination of SNPknock and fastPhase to create knockoff copies of genotype data can be found here: https://web.stanford.edu/~msesia/software.html.
First, we verify that the SNPknock package can be loaded.
library(SNPknock)
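We define a Markov chain model with 50 variables, each taking one of 5 possible values. We specify a uniform marginal distribution for the first variable in the chain and create 49 transition matrices with randomly sampled entries. Adding the identity matrix to each random matrix before normalizing its rows to sum to one makes every variable more likely to stay in the same state as its predecessor than to move to any particular other state.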
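In symbols, the model can be written as follows. Here `Q[j,k,l]` is read as the probability of moving from state $k$ at variable $j$ to state $l$ at variable $j+1$, which is what normalizing the rows implies. The notation $\pi$ for `pInit` and $U_{jkl}$ for the uniform draws is ours. States are indexed $1, \dots, K$ as in R's array indices, whereas the sampled data are printed with states coded $0, \dots, K-1$.

$$
P(X_1 = k) = \pi_k = \frac{1}{K}, \qquad
P(X_{j+1} = l \mid X_j = k) = Q_j[k, l], \qquad j = 1, \dots, p-1,
$$

$$
P(X_1 = x_1, \dots, X_p = x_p) = \pi_{x_1} \prod_{j=1}^{p-1} Q_j[x_j, x_{j+1}],
$$

$$
Q_j[k, l] = \frac{U_{jkl} + \mathbb{1}\{k = l\}}{1 + \sum_{m=1}^{K} U_{jkm}}, \qquad U_{jkl} \sim \mathrm{Uniform}(0, 1) \text{ independently}.
$$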
p = 50  # Number of variables in the model
K = 5   # Number of possible states for each variable
# Marginal distribution for the first variable
pInit = rep(1/K,K)
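# Create p-1 transition matrices: Q[j,,] governs the step from variable j to variable j+1
Q = array(stats::runif((p-1)*K*K),c(p-1,K,K))
for(j in 1:(p-1)) {
  # Add the identity to favour remaining in the same state
  Q[j,,] = Q[j,,] + diag(rep(1,K))
  # Normalize each row so that it sums to one
  Q[j,,] = Q[j,,] / rowSums(Q[j,,])
}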
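As a quick sanity check, the base-R sketch below verifies that an array built with this recipe really consists of valid transition matrices, and that the added identity makes the diagonal dominant. The helper `check_transition_array` and the toy array `Q_demo` are hypothetical names introduced only for this illustration; they are not part of SNPknock. The array is rebuilt under a separate name so that `Q` is left untouched.

```r
# Hypothetical helper (not part of SNPknock): checks that every slice
# Q[j, , ] of a (p-1) x K x K array has nonnegative entries and rows
# summing to one, i.e. is a valid transition matrix.
check_transition_array = function(Q, tol = 1e-12) {
  row_sums = apply(Q, 1, rowSums)   # K x (p-1) matrix of row sums
  all(Q >= 0) && all(abs(row_sums - 1) < tol)
}

# Toy array built with the same recipe as above, under a separate name
p_demo = 10
K_demo = 3
Q_demo = array(stats::runif((p_demo-1)*K_demo*K_demo), c(p_demo-1, K_demo, K_demo))
for(j in 1:(p_demo-1)) {
  Q_demo[j,,] = Q_demo[j,,] + diag(rep(1,K_demo))
  Q_demo[j,,] = Q_demo[j,,] / rowSums(Q_demo[j,,])
}
check_transition_array(Q_demo)   # TRUE

# Average diagonal vs. average off-diagonal entry: since U + 1 > U' for
# uniform draws in (0, 1), the diagonal entry is the largest in every row.
diag_mean = mean(apply(Q_demo, 1, function(M) mean(diag(M))))
offdiag_mean = mean(apply(Q_demo, 1, function(M) mean(M[row(M) != col(M)])))
c(diagonal = diag_mean, off_diagonal = offdiag_mean)
```

Applied to the vignette's own `Q`, `check_transition_array(Q)` should likewise return `TRUE`.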
We can sample 100 independent observations of this Markov chain using the SNPknock package.
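To spell out what sampling a discrete Markov chain involves, the base-R sketch below draws realizations by hand. The first variable comes from the initial distribution, and each subsequent variable comes from the row of the corresponding transition matrix selected by the previous state. The function `sample_dmc_sketch` and the toy model (`pInit_toy`, `Q_toy`) are hypothetical and for illustration only. This is not presented as the implementation behind `SNPknock.models.sampleDMC`, even though it takes its arguments in the same order.

```r
# Hypothetical helper (not part of SNPknock): draws n independent
# realizations of a discrete Markov chain. pInit is the distribution of the
# first variable; Q[j, k, l] is read as P(X[j+1] = l | X[j] = k).
# States are returned as 0, ..., K-1, the coding used in this vignette's output.
sample_dmc_sketch = function(pInit, Q, n) {
  K = length(pInit)
  p = dim(Q)[1] + 1
  X = matrix(0L, nrow = n, ncol = p)
  for (i in seq_len(n)) {
    # First variable: drawn from the initial distribution
    X[i, 1] = sample.int(K, 1, prob = pInit) - 1L
    for (j in 2:p) {
      # Next variable: drawn from the row of the (j-1)-th transition matrix
      # picked out by the current state (+1 for R's 1-based indexing)
      X[i, j] = sample.int(K, 1, prob = Q[j - 1, X[i, j - 1] + 1, ]) - 1L
    }
  }
  X
}

# Usage on a toy "sticky" chain: 4 variables, 2 states, 90% chance of staying
pInit_toy = c(0.5, 0.5)
Q_toy = array(0, c(3, 2, 2))
for (j in 1:3) Q_toy[j, , ] = matrix(c(0.9, 0.1, 0.1, 0.9), 2, 2)
set.seed(1)
Xs = sample_dmc_sketch(pInit_toy, Q_toy, n = 1000)
mean(Xs[, 2] == Xs[, 1])   # close to 0.9
```

The double loop keeps the logic transparent rather than fast. Because the vignette's own code calls `set.seed(1234)` immediately afterwards, running this sketch first does not change the samples printed in this vignette.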
set.seed(1234)
X = SNPknock.models.sampleDMC(pInit, Q, n=100)
print(X[1:5,1:10])
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 0 0 2 1 2 3 2 0 0 2
## [2,] 3 3 2 4 2 3 1 3 3 3
## [3,] 3 0 1 2 3 4 0 2 2 4
## [4,] 3 0 3 3 3 0 2 2 1 1
## [5,] 4 0 2 4 3 3 3 1 1 4
Above, each row of X contains an independent realization of the Markov chain, with the K states encoded as the integers 0 to K-1.
A knockoff copy of X can be sampled as follows.
Xk = SNPknock.knockoffDMC(X, pInit, Q)
print(Xk[1:5,1:10])
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 3 2 2 2 2 2 1 3 0 0
## [2,] 3 3 2 2 1 1 3 3 3 3
## [3,] 0 3 0 3 3 0 0 2 2 2
## [4,] 4 3 3 4 2 0 2 2 2 1
## [5,] 0 0 3 3 3 3 1 3 4 2
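Knockoffs are constructed so that swapping any variable with its knockoff leaves the joint distribution of (X, Xk) unchanged. In particular, each column of Xk should have the same marginal distribution as the corresponding column of X. A rough way to eyeball this is to compare per-column state frequencies, as in the base-R sketch below. The helper `column_state_freqs` is hypothetical and introduced only for this illustration; it is not part of SNPknock.

```r
# Hypothetical helper (not part of SNPknock): empirical frequency of each
# state 0, ..., K-1 in every column of an n x p matrix of states.
# Returns a K x p matrix whose column j sums to one.
column_state_freqs = function(X, K) {
  sapply(seq_len(ncol(X)), function(j) tabulate(X[, j] + 1, nbins = K) / nrow(X))
}

# Toy check with 4 observations of 2 variables taking states 0, 1, 2
X_toy = matrix(c(0, 1, 1, 2,
                 2, 2, 0, 1), nrow = 4)
freqs_toy = column_state_freqs(X_toy, K = 3)
freqs_toy[, 1]   # 0.25 0.50 0.25
freqs_toy[, 2]   # 0.25 0.25 0.50
```

With the objects from this vignette, `max(abs(column_state_freqs(X, K) - column_state_freqs(Xk, K)))` gives the largest per-state discrepancy across all 50 columns. With only 100 observations, some discrepancy from sampling noise is expected, so this is a rough visual check rather than a formal test of the knockoff property.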
For a demonstration of how to use SNPknock to create knockoff copies of genotype data, see the genotypes vignette.
Sesia, M., C. Sabatti, and E. J. Candès. 2017. “Gene Hunting with Knockoffs for Hidden Markov Models.” arXiv e-prints, June. https://arxiv.org/abs/1706.04677.