Data simulation from different survival models

modelSim(
  model = "cox",
  matDistr,
  matParam,
  n,
  p,
  pnonull,
  betaDistr,
  hazDistr,
  hazParams,
  seed,
  Phi = NULL,
  d = 0,
  pourc = 0.9
)

Arguments

model

Survival model: "cox", "AFT", "AFTshift" or "AH"

matDistr

Distribution of the covariate matrix

matParam

Parameters of the covariate matrix distribution

n

Sample size

p

Number of parameters (covariates)

pnonull

Number of pertinent covariates

betaDistr

Distribution of the regression coefficients beta, or a vector of beta values

hazDistr

Distribution of the baseline hazard

hazParams

Parameters of the baseline hazard

seed

Random seed

Phi

Nonlinearity (not yet implemented)

d

Censorship

pourc

Percentage

Value

modelSim returns a list containing:

  • model Survival model (Cox, AFT, AFTshift, AH)

  • Z Matrix of covariates

  • Y random covariates

  • TC Vector of survival times

  • delta Vector of censorship indicators

  • betanorm Vector of normalized regression parameter

  • crate Censorship rate

  • crate_delta Censorship rate

  • vecY Vector of number of individuals at risk at time \(t_i\)

  • hazParams Vector of parameters of the baseline hazard distribution

  • hazDistr Distribution of the baseline hazard function

  • St Matrix of survival functions

  • ht Matrix of hazard risk functions

  • grilleTi Time grid

Details

This function simulates survival data from different models: the Cox model, the AFT model and the AH model.

1. The Cox model is defined as: \( \lambda(t|X_{i.}) = \alpha_0(t) \exp(\beta^T X_{i.}), \) where \(\alpha_0(t)\) is the baseline risk and \(\beta\) is the vector of coefficients. Four distributions are considered for the baseline risk:

  • Weibull: \(\alpha_0(t) = \lambda a t^{(a-1)}\);

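  • Log-normal: \(\alpha_0(t) = \frac{\frac{1}{\sigma t \sqrt{2\pi}} \exp\left[-\frac{(\log t - \mu)^2}{2 \sigma^2}\right]}{1 - \Phi\left[\frac{\log t - \mu}{\sigma}\right]}\), where \(\Phi\) denotes here the standard normal distribution function;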

  • Exponential: \(\alpha_0(t) = \lambda\);

  • Gompertz: \(\alpha_0(t) = \lambda \exp(\alpha t)\).
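The four baseline hazards listed above can be written directly in base R. The following is a minimal sketch: the function names (`haz_weibull`, `haz_lognormal`, and so on) and the parameter values are illustrative assumptions and are not part of survMS.

```r
# Baseline hazards alpha_0(t) for the four distributions listed above.
# All function and argument names here are hypothetical helpers.
haz_weibull     <- function(t, lambda, a) lambda * a * t^(a - 1)
haz_exponential <- function(t, lambda) rep(lambda, length(t))
haz_gompertz    <- function(t, lambda, alpha) lambda * exp(alpha * t)
haz_lognormal   <- function(t, mu, sigma) {
  # log-normal density divided by the log-normal survival function
  dlnorm(t, meanlog = mu, sdlog = sigma) /
    plnorm(t, meanlog = mu, sdlog = sigma, lower.tail = FALSE)
}

# Evaluate two of the hazards on a small time grid (arbitrary values)
t_grid <- c(0.5, 1, 2, 5)
haz_weibull(t_grid, lambda = 0.1, a = 1.5)
haz_lognormal(t_grid, mu = 0, sigma = 1)
```

With `a = 1` the Weibull hazard reduces to the exponential one, which gives a quick sanity check on the parametrization.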

To simulate the covariates, two distributions are proposed:

  • Uniform

  • Normal

The parameters of the chosen distribution are set through matParam. The Phi parameter currently supports simulating survival data in a linear framework with no interactions only; a future implementation will handle a non-linear framework with interactions. If the parameter Phi is NULL (to complete...).
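To illustrate how covariates and a Cox model with a Weibull baseline combine, the sketch below draws uniform covariates and generates survival times by inverting the survival function. This is a standard inverse-transform construction, shown under assumptions; it is not necessarily how modelSim proceeds internally, and all variable names and parameter values are illustrative.

```r
set.seed(1)
n <- 200; p <- 10; pnonull <- 3

# Covariates: uniform on [-1, 1]; use rnorm(n * p, mean, sd) for the normal case
Z <- matrix(runif(n * p, min = -1, max = 1), nrow = n, ncol = p)

# Only the first pnonull coefficients are non-zero (illustrative choice)
beta <- c(rep(1, pnonull), rep(0, p - pnonull))
eta  <- drop(Z %*% beta)

# Weibull baseline hazard lambda * a * t^(a - 1),
# hence cumulative baseline hazard lambda * t^a
lambda <- 0.01; a <- 2

# S(t | X) = exp(-lambda * t^a * exp(eta)); set S = U and solve for t
U <- runif(n)
surv_time <- (-log(U) / (lambda * exp(eta)))^(1 / a)
summary(surv_time)
```

No censoring is applied in this sketch; the simulated times are all event times.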

2. The AFT model is defined from a linear regression of the variable of interest: \( Y_i = X_{i.} \beta + \epsilon_i, \) with \(X_{i.}\) the covariates, \(\beta\) the vector of regression coefficients and \(\epsilon_i\) the error term. The AFT model can also be defined from the baseline survival function \(S_0(t)\), which corresponds to the tail distribution of \(\exp(\epsilon_i)\). The survival function of the AFT model is written as: \( S(t|X_{i.}) = S_0(t\exp(\beta^T X_{i.})), \) and the hazard risk has the form: \( \lambda(t|X_{i.}) = \exp(\beta^T X_{i.}) \alpha_0(t\exp(\beta^T X_{i.})), \) where \(\alpha_0(t)\) is the baseline risk and \(\beta\) is the vector of coefficients. The advantage of the AFT model is that the variables have a multiplicative effect on \(t\) rather than on the risk function, as is the case in the Cox model. Two distributions are considered for the baseline risk:

  • Weibull: \(\alpha_0(t) = \lambda a t^{(a-1)}\);

  • Log-normal: \(\alpha_0(t) = \frac{\frac{1}{\sigma t \sqrt{2\pi}} \exp\left[-\frac{(\log t - \mu)^2}{2 \sigma^2}\right]}{1 - \Phi\left[\frac{\log t - \mu}{\sigma}\right]}\).
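The relation between the AFT survival function and the AFT hazard given above can be checked numerically. The sketch below uses a log-normal baseline; the helper names (`S0`, `alpha0`, `S_aft`, `haz_aft`) and the parameter values are illustrative assumptions, not survMS functions.

```r
# Log-normal baseline: survival function S_0 and hazard alpha_0
S0     <- function(t, mu, sigma) plnorm(t, mu, sigma, lower.tail = FALSE)
alpha0 <- function(t, mu, sigma) dlnorm(t, mu, sigma) / S0(t, mu, sigma)

# AFT survival and hazard for a linear predictor eta = beta^T X
S_aft   <- function(t, eta, mu, sigma) S0(t * exp(eta), mu, sigma)
haz_aft <- function(t, eta, mu, sigma) exp(eta) * alpha0(t * exp(eta), mu, sigma)

# Numerical check: the hazard equals minus the derivative of log S(t | X),
# approximated here by a central finite difference
t0 <- 1.3; eta <- 0.4; h <- 1e-6
num_haz <- -(log(S_aft(t0 + h, eta, 0, 1)) - log(S_aft(t0 - h, eta, 0, 1))) / (2 * h)
all.equal(num_haz, haz_aft(t0, eta, 0, 1), tolerance = 1e-6)
```

The factor `exp(eta)` in front of `alpha0` comes from the chain rule applied to the rescaled time `t * exp(eta)`.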

To simulate the covariates, two distributions are proposed:

  • Uniform

  • Normal

The parameters of the chosen distribution are set through matParam. The Phi parameter currently supports simulating survival data in a linear framework with no interactions only; a future implementation will handle a non-linear framework with interactions. If the parameter Phi is NULL (to complete...).

3. The hazard risk of the AH model is defined for an individual \(i\) as: \( \lambda_{AH}(t|X_{i.}) = \alpha_0(t\exp(\beta^T X_{i.})), \) with \(\alpha_0\) the baseline risk and \(\beta\) the vector of regression parameters. In a model with only one binary variable, corresponding to the treatment, the hazard risk is written as follows: \( \lambda_1(t) = \alpha_0(\beta t). \) The regression vector \(\beta\) characterizes the influence of the variables on the survival time of individuals, and \(\exp(\beta^T X_{i.})\) is a factor altering the time scale of the hazard risk. A positive or negative value of \(\beta^T X_{i.}\) implies, respectively, an acceleration or a deceleration of the risk. Two distributions are considered for the baseline risk:

  • Weibull: \(\alpha_0(t) = \lambda a t^{(a-1)}\);

  • Log-normal: \(\alpha_0(t) = \frac{\frac{1}{\sigma t \sqrt{2\pi}} \exp\left[-\frac{(\log t - \mu)^2}{2 \sigma^2}\right]}{1 - \Phi\left[\frac{\log t - \mu}{\sigma}\right]}\).
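In the AH model the covariates rescale time inside the baseline hazard only, so the cumulative hazard follows by substitution in the integral: it equals \(\exp(-\beta^T X_{i.})\) times the baseline cumulative hazard evaluated at \(t\exp(\beta^T X_{i.})\). The sketch below checks this numerically for a Weibull baseline; the helper names and the parameter values are illustrative assumptions, not part of survMS.

```r
# Weibull baseline hazard and cumulative hazard
alpha0  <- function(t, lambda, a) lambda * a * t^(a - 1)
Lambda0 <- function(t, lambda, a) lambda * t^a

# AH hazard: time is rescaled by exp(eta), with eta = beta^T X
haz_ah <- function(t, eta, lambda, a) alpha0(t * exp(eta), lambda, a)

# Closed-form cumulative hazard obtained by the substitution s -> s * exp(eta)
cumhaz_ah <- function(t, eta, lambda, a) {
  exp(-eta) * Lambda0(t * exp(eta), lambda, a)
}

# Numerical check against direct integration of the AH hazard
t0 <- 2; eta <- 0.5; lambda <- 0.1; a <- 1.5
num <- integrate(haz_ah, lower = 0, upper = t0,
                 eta = eta, lambda = lambda, a = a, rel.tol = 1e-10)$value
all.equal(num, cumhaz_ah(t0, eta, lambda, a), tolerance = 1e-6)
```

The survival function then follows as `exp(-cumhaz_ah(t, eta, lambda, a))`.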

To simulate the covariates, two distributions are proposed:

  • Uniform

  • Normal

The parameters of the chosen distribution are set through matParam. The Phi parameter currently supports simulating survival data in a linear framework with no interactions only; a future implementation will handle a non-linear framework with interactions. If the parameter Phi is NULL (to complete...).


Author

Mathilde Sautreuil

Examples

if (FALSE) {
library(survMS)
### Survival data simulated from Cox model
res_paramW = get_param_weib(med = 2228, mu = 2325)
listCoxSim_n500_p1000 <- modelSim(model = "cox", matDistr = "unif", matParam = c(-1,1), n = 500,
                                p = 1000, pnonull = 20, betaDistr = 1, hazDistr = "weibull",
                                hazParams = c(res_paramW$a, res_paramW$lambda), seed = 1, d = 0)
print(listCoxSim_n500_p1000)
hist(listCoxSim_n500_p1000)
plot(listCoxSim_n500_p1000, ind = sample(1:500, 5))
plot(listCoxSim_n500_p1000, ind = sample(1:500, 5), type = "hazard")

df_p1000_n500 = data.frame(time = listCoxSim_n500_p1000$TC,
                          event = listCoxSim_n500_p1000$delta,
                          listCoxSim_n500_p1000$Z)
df_p1000_n500[1:6,1:10]
dim(df_p1000_n500)
### Survival data simulated from AFT model
res_paramLN = get_param_ln(var = 200000, mu = 1134)
listAFTSim_n500_p1000 <- modelSim(model = "AFT", matDistr = "unif", matParam = c(-1,1), n = 500,
                                p = 100, pnonull = 100, betaDistr = 1, hazDistr = "log-normal",
                                hazParams = c(res_paramLN$a, res_paramLN$lambda),
                                Phi = 0, seed = 1, d = 0)
hist(listAFTSim_n500_p1000)
plot(listAFTSim_n500_p1000, ind = sample(1:500, 5))
df_p1000_n500 = data.frame(time = listAFTSim_n500_p1000$TC,
                           event = listAFTSim_n500_p1000$delta,
                           listAFTSim_n500_p1000$Z)
df_p1000_n500[1:6,1:10]
dim(df_p1000_n500)

### Survival data simulated from AH model
res_paramLN = get_param_ln(var=170000, mu=2325)
listAHSim_n500_p1000 <- modelSim(model = "AH", matDistr = "unif", matParam = c(-1,1), n = 500, 
                                 p = 100, pnonull = 100, betaDistr = 1.5, hazDistr = "log-normal",
                                 hazParams = c(res_paramLN$a*4, res_paramLN$lambda),
                                 Phi = 0, seed = 1, d = 0)
                                 
print(listAHSim_n500_p1000)
hist(listAHSim_n500_p1000)
plot(listAHSim_n500_p1000, ind = sample(1:500, 5))
plot(listAHSim_n500_p1000, ind = sample(1:500, 5), type = "hazard")
}