Install from CRAN:
install.packages("survMS")
Or install the development version from Github:
install.packages("devtools")
devtools::install_github("mathildesautreuil/survMS")
And load the library:
library(survMS)
#> Loading required package: ggplot2
#> Loading required package: ComplexHeatmap
#> Loading required package: grid
#> ========================================
#> ComplexHeatmap version 2.13.1
#> Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
#> Github page: https://github.com/jokergoo/ComplexHeatmap
#> Documentation: http://jokergoo.github.io/ComplexHeatmap-reference
#>
#> If you use it in published research, please cite:
#> Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional
#> genomic data. Bioinformatics 2016.
#>
#> The new InteractiveComplexHeatmap package can directly export static
#> complex heatmaps into an interactive Shiny app with zero effort. Have a try!
#>
#> This message can be suppressed by:
#> suppressPackageStartupMessages(library(ComplexHeatmap))
#> ========================================
#> Loading required package: circlize
#> ========================================
#> circlize version 0.4.15
#> CRAN page: https://cran.r-project.org/package=circlize
#> Github page: https://github.com/jokergoo/circlize
#> Documentation: https://jokergoo.github.io/circlize_book/book/
#>
#> If you use it in published research, please cite:
#> Gu, Z. circlize implements and enhances circular visualization
#> in R. Bioinformatics 2014.
#>
#> This message can be suppressed by:
#> suppressPackageStartupMessages(library(circlize))
#> ========================================
#>
#> Attaching package: 'survMS'
#> The following object is masked from 'package:ComplexHeatmap':
#>
#> Heatmap
survMS is an R package that generates survival data from different survival models with varying levels of complexity. Three survival models are implemented in the package to simulate survival data: a Cox model, an Accelerated Failure Time (AFT) model, and an Accelerated Hazard (AH) model. These models have their specificities. Indeed, a Cox model is a proportional risk model, not an AFT and an AH model. A proportional risk model means that the ratio of the risks between two individuals with different explanatory variables does not depend on the time. In an AFT model, the explanatory variables have an accelerating or decelerating effect on individuals’ survival. In both models, a Cox model and an AFT model, the survival function’s curves never intersect. When one is interested in more complex data with intersecting survival curves, for example, to make it more difficult for methods to predict survival, two approaches are implemented in the package. The first approach consists of modifying an AFT model to have intersecting survival curves. The second approach concerns using of an AH model (for Accelerated Hazards) to generate the survival data. an AH model is more flexible than the two models mentioned above. In an AH model, the variables accelerate or decelerate the hazard risk. The survival curves of an AH model can therefore cross each other. The generation of survival times carried out from the different models mentioned above. The baseline risk function of the models is assumed to be known and follows a particular probability distribution. For these different models, we use parametric distribution of the baseline risk for the generation of survival time.
Note that we stayed in a linear dependency framework without interaction. We will adapt the package in a second step to be in a nonlinear framework to simulate interactions. The package’s implementation is easily adapted to the nonlinear framework with interactions by replacing the explanatory variables’ linear part with a nonlinear function of the explanatory variables. All the functions in the package are coded to introduce this nonlinear function of the explanatory variables easily.
For details on methodology, see Vignette.
The package survMS can simulate data from a Cox model. We have chosen to carry out this simulation to generate survival data that respects the proportional risk hypothesis. For more details, see Vignette. Survival times can follow different distributions: Weibull, log-normal, exponential (in progress) and Gompertz (in progress).
To simulate survival data checking the proportional risk hypothesis, we use a Cox model with survival times following a Weibull distribution.
res_paramW = get_param_weib(med = 1062, mu = 1134)
# c(res_paramW$a, res_paramW$lambda)
listCoxWSim_n500_p1000 <- modelSim(model = "cox", matDistr = "unif", matParam = c(-1,1), n = 500,
p = 1000, pnonull = 20, betaDistr = 1, hazDistr = "weibull",
hazParams = c(3, 4), seed = 1, d = 0)
#> Warning in modelSim(model = "cox", matDistr = "unif", matParam = c(-1, 1), :
#> Options "non-linearity with interactions" not available
hist(listCoxWSim_n500_p1000)
To simulate survival data checking the proportional risk hypothesis, we use a Cox model with survival times following a log-normal distribution.
res_paramLN = get_param_ln(var = 170000, mu = 2320)
listCoxSim_n500_p1000 <- modelSim(model = "cox", matDistr = "unif", matParam = c(-1,1), n = 500,
p = 1000, pnonull = 20, betaDistr = 1, hazDistr = "log-normal",
hazParams = c(res_paramLN$a, res_paramLN$lambda), seed = 1, d = 0)
#> Warning in modelSim(model = "cox", matDistr = "unif", matParam = c(-1, 1), :
#> Options "non-linearity with interactions" not available
hist(listCoxSim_n500_p1000)
For this simulation, we consider the same design as 2.1 to simulate survival times. However, the simulated covariate matrix is correlated. The pertinent covariates are those correlated. We split this set of pertinent covariates in two subsets: one with a correlation parameter (matParam[2]) and the other one with (1 - matParam[2]). The non-pertinent covariates are uncorrelated.
res_paramW = get_param_weib(med = 2.5, mu = 1.2)
listCoxSimCor_n500_p1000 <- modelSim(model = "cox", matDistr = "mvnorm", matParam = c(0,0.6), n = 500,
p = 1000, pnonull = 10, betaDistr = 1, hazDistr = "weibull",
hazParams = c(res_paramW$a, res_paramW$lambda), seed = 1, d = 0)
#> Warning in modelSim(model = "cox", matDistr = "mvnorm", matParam = c(0, :
#> Options "non-linearity with interactions" not available
Heatmap(listCoxSimCor_n500_p1000, ind = 1:15, k = 3)
From survMS package, we can simulate the data from an AFT model. We have chosen to carry out this simulation to generate survival data that does not respect the proportional risk hypothesis. For more details, see Vignette. Survival times can follow different distributions: Weibull, log-normal, exponential (in progress) and Gompertz (in progress).
To simulate survival data not checking the proportional risk hypothesis, we use an AFT model with survival times following a weibull distribution.
res_paramW = get_param_weib(med = 1062, mu = 1134)
# c(res_paramW$a, res_paramW$lambda)
listAFTWSim_n500_p1000 <- modelSim(model = "AFT", matDistr = "unif", matParam = c(-1,1), n = 500,
p = 1000, pnonull = 20, betaDistr = 1, hazDistr = "weibull",
hazParams = c(3, 2), seed = 1, d = 0)
#> Warning in modelSim(model = "AFT", matDistr = "unif", matParam = c(-1, 1), :
#> Options "non-linearity with interactions" not available
hist(listAFTWSim_n500_p1000)
To simulate survival data not checking the proportional risk hypothesis, we use an AFT model with survival times following a log-normal distribution.
res_paramLN = get_param_ln(var = 170000, mu = 2325)
listAFTLNSim_n500_p1000 <- modelSim(model = "AFT", matDistr = "unif", matParam = c(-1,1), n = 500,
p = 10, pnonull = 10, betaDistr = 1, hazDistr = "log-normal",
hazParams = c(res_paramLN$a*3, res_paramLN$lambda), seed = 1, d = 0)
#> Warning in modelSim(model = "AFT", matDistr = "unif", matParam = c(-1, 1), :
#> Options "non-linearity with interactions" not available
hist(listAFTLNSim_n500_p1000)
To obtain survival data from a model that does not respect the proportional risk hypothesis and that their survival curves can not cross, we have modified the AFT model by adding a term betaDistr in the model. For more details, see Vignette. Survival times can follow different distributions: Weibull, log-normal, exponential (in progress) and Gompertz (in progress).
res_paramLN = get_param_ln(var = 170000, mu = 2325)
listAFTsSim_n500_p1000 <- modelSim(model = "AFTshift", matDistr = "unif", matParam = c(-1,1), n = 500,
p = 10, pnonull = 10, betaDistr = "unif", hazDistr = "log-normal",
hazParams = c(0.3, res_paramLN$lambda), seed = 1, d = 0)
#> Warning in modelSim(model = "AFTshift", matDistr = "unif", matParam = c(-1, :
#> Options "non-linearity with interactions" not available
#> Warning in log(mat_grille_ti * (1/as.vector(eta_i)) - matrix(phi2, nrow(Z), :
#> production de NaN
hist(listAFTsSim_n500_p1000)
From survMS package, we can also simulated survival data from another model, an AH model. We have chosen to carry out this simulation to generate survival data that does not respect the proportional risk hypothesis and survival curves can cross. For more details, see Vignette.
res_paramW = get_param_weib(med = 1062, mu = 1134)
listAHwSim_n500_p1000 <- modelSim(model = "AH", matDistr = "unif", matParam = c(-1,1), n = 500,
p = 100, pnonull = 20, betaDistr = 1, hazDistr = "weibull",
hazParams = c(res_paramW$a, res_paramW$lambda),
Phi = 0, seed = 1, d = 0)
#> Warning in modelSim(model = "AH", matDistr = "unif", matParam = c(-1, 1), :
#> Options "non-linearity with interactions" not available
hist(listAHwSim_n500_p1000)
res_paramLN = get_param_ln(var = 170000, mu = 2325)
listAHSim_n500_p1000 <- modelSim(model = "AH", matDistr = "unif", matParam = c(-1,1), n = 500,
p = 100, pnonull = 100, betaDistr = 1.5, hazDistr = "log-normal",
hazParams = c(res_paramLN$a*4, res_paramLN$lambda), Phi = 0,
seed = 1, d = 0)
#> Warning in modelSim(model = "AH", matDistr = "unif", matParam = c(-1, 1), :
#> Options "non-linearity with interactions" not available
# summary(listAHSim_n500_p1000)
# print(listAHSim_n500_p1000)
hist(listAHSim_n500_p1000)
# Weibull distribution
res_paramW = get_param_weib(med = 1062, mu = 1134)
res_paramW
#> $a
#> [1] 1.969765
#>
#> $lambda
#> [1] 7.586963e-07
# Log-normale distribution
res_paramLN = get_param_ln(var = 600000, mu = 1134)
res_paramLN
#> $a
#> [1] 0.6188154
#>
#> $lambda
#> [1] 6.84204
set.seed(1234)
ind = sample(x = 1:500, 8)
## Cox/Weibull model
# df_p1000_n500[1:6,1:10]
plot(listCoxWSim_n500_p1000, ind = ind)
## AFT/Weibull model
# df_p1000_n500[1:6,1:10]
plot(listAFTLNSim_n500_p1000, ind = ind)
## Shifted AFT/LN model
# df_p1000_n500[1:6,1:10]
plot(listAFTsSim_n500_p1000, ind = ind)
## Cox/Weibull model
# df_p1000_n500[1:6,1:10]
plot(listCoxWSim_n500_p1000, ind = ind, type = "hazard")
## AFT/Weibull model
# df_p1000_n500[1:6,1:10]
plot(listAFTWSim_n500_p1000, ind = ind, type = "hazard")
## AFT/LN model
# df_p1000_n500[1:6,1:10]
plot(listAFTLNSim_n500_p1000, ind = ind, type = "hazard")
## Shifted AFT/LN model
# df_p1000_n500[1:6,1:10]
plot(listAFTsSim_n500_p1000, ind = ind, type = "hazard")
## Cox/Weibull model
# df_p1000_n500[1:6,1:10]
plot(listAHSim_n500_p1000, ind = ind, type = "hazard")
Please also see the package vignette for more detailed examples and a description of the methodology underpinning the survMS package.