Maximum Likelihood Estimation for parametric linear regression models
Source:R/maxlogLreg.R
maxlogLreg.RdFunction to compute maximum likelihood estimators (MLE) of regression parameters
of any distribution implemented in R with covariates (linear predictors).
Usage
maxlogLreg(
formulas,
y_dist,
support = NULL,
data = NULL,
subset = NULL,
fixed = NULL,
link = NULL,
optimizer = "nlminb",
start = NULL,
lower = NULL,
upper = NULL,
inequalities = NULL,
control = NULL,
silent = FALSE,
StdE_method = c("optim", "numDeriv"),
...
)Arguments
- formulas
a list of formula objects. Each element must have an
~, with the terms on the right separated by+operators. The response variable on the left side is optional. Linear predictor of each parameter must be specified with the name of the parameter followed by the suffix'.fo'. See the examples below for further illustration.- y_dist
a formula object that specifies the distribution of the response variable. On the left side of
~must be the response, and in the right side must be the name o de probability density/mass function. See the section Details and the examples below for further illustration.- support
a list with the following entries:
interval: a two dimensional atomic vector indicating the set of possible values of a random variable having the distribution specified iny_dist.type: character indicating if distribution has adiscreteor acontinousrandom variable.
- data
an optional data frame containing the variables in the model. If data is not specified, the variables are taken from the environment from which
maxlogLregis called.- subset
an optional vector specifying a subset of observations to be used in the fitting process.
- fixed
a list with fixed/known parameters of distribution of interest. Fixed parameters must be passed with its name and its value (known).
- link
a list with names of parameters to be linked, and names of the link function object. For names of parameters, please visit documentation of density/mass function. There are three link functions available:
log_link,logit_linkandNegInv_link. Take into account: the order used in argumentovercorresponds to the order in argumentlink.- optimizer
a length-one character vector with the name of optimization routine.
nlminb,optimandDEoptimare available;nlminbis the default routine.- start
a numeric vector with initial values for the parameters to be estimated. Zero is the default value.
- lower
a numeric vector with lower bounds, with the same lenght of argument
start(for box-constrained optimization).-Infis the default value.- upper
a numeric vector with upper bounds, with the same lenght of argument
start(for box-constrained optimization).Infis the default value.- inequalities
a character vector with the inequality constrains for the distribution parameters.
- control
control parameters of the optimization routine. Please, visit documentation of selected optimizer for further information.
- silent
logical. If TRUE, warnings of
maxlogLare suppressed.- StdE_method
a length-one character vector with the routine for Hessian matrix computation. The This is needed for standard error estimation. The options available are
"optim"and"numDeriv". For further information, visitoptimorhessian.- ...
Further arguments to be supplied to the optimization routine.
Value
A list with class maxlogL containing the following
lists:
- fit
A list with output information about estimation and method used.
- inputs
A list with all input arguments.
- outputs
A list with additional information. The most important outputs are:
npar: number of parameters.n: sample sizeStde_method: standard error computation method.b_lenght: a list with the number of regression parameters.design_matrix: a list with the \(\mathbf{X}\) matrix for each parameter, the response values (calledy) and the censorship matrix (calledstatus). See the Details section for further information.
Details
maxlogLreg computes programmatically the log-likelihood
(log L) function corresponding for the following model:
$$ y_i \stackrel{iid.}{\sim} \mathcal{D}(\theta_{i1},\theta_{i2},\dots, \theta_{ij}, \dots, \theta_{ik}) $$ $$ g(\boldsymbol{\theta}_{j}) = \boldsymbol{\eta}_{j} = \mathbf{X}_j^\top \boldsymbol{\beta}_j, $$
where,
\(g_k(\cdot)\) is the \(k\)-th link function.
\(\boldsymbol{\eta}_{j}\) is the value of the linear predictor for the \(j^{th}\) for all the observations.
\(\boldsymbol{\beta}_j = (\beta_{0j}, \beta_{1j},\dots, \beta_{(p_j-1)j})^\top\) are the fixed effects vector, where \(p_j\) is the number of parameters in linear predictor \(j\) and \(\mathbf{X}_j\) is a known design matrix of order \(n\times p_j\). These terms are specified in
formulasargument.\(\mathcal{D}\) is the distribution specified in argument
y_dist.
Then, maxlogLreg maximizes the log L through
optim, nlminb or
DEoptim. maxlogLreg generates an S3 obj.
Estimation with censorship can be handled with Surv objects
(see example 2). The output object stores the corresponding censorship matrix,
defined as \(r_{il} = 1\) if sample unit \(i\) has status \(l\), or
\(r_{il} = 0\) in other case. \(i=1,2,\dots,n\) and \(l=1,2,3\)
(\(l=1\): observation status, \(l=2\): right censorship status,
\(l=3\): left censorship status).
Note
The following generic functions can be used with a
maxlogLobject:summary, print, logLik, AIC.Noncentrality parameters must be named as
ncpin the distribution.
References
Nelder JA, Mead R (1965). “A Simplex Method for Function Minimization.” The Computer Journal, 7(4), 308–313. ISSN 0010-4620, doi:10.1093/comjnl/7.4.308 , https://academic.oup.com/comjnl/article-lookup/doi/10.1093/comjnl/7.4.308.
Fox PA, Hall AP, Schryer NL (1978). “The PORT Mathematical Subroutine Library.” ACM Transactions on Mathematical Software, 4(2), 104–126. ISSN 00983500, doi:10.1145/355780.355783 , https://dl.acm.org/doi/10.1145/355780.355783.
Nash JC (1979). Compact Numerical Methods for Computers. Linear Algebra and Function Minimisation, 2nd Edition edition. Adam Hilger, Bristol.
Dennis JE, Gay DM, Walsh RE (1981). “An Adaptive Nonlinear Least-Squares Algorithm.” ACM Transactions on Mathematical Software, 7(3), 348–368. ISSN 00983500, doi:10.1145/355958.355965 , https://dl.acm.org/doi/10.1145/355958.355965.
See also
summary.maxlogL,
optim,
nlminb,
DEoptim,
DEoptim.control,
maxlogL,
bootstrap_maxlogL
Other maxlogL:
cum_hazard.maxlogL(),
expected_value.maxlogL(),
maxlogL()
Author
Jaime Mosquera Gutiérrez, jmosquerag@unal.edu.co
Examples
library(EstimationTools)
#--------------------------------------------------------------------------------
# Example 1: Estimation in simulated normal distribution
n <- 1000
x <- runif(n = n, -5, 6)
y <- rnorm(n = n, mean = -2 + 3 * x, sd = exp(1 + 0.3* x))
norm_data <- data.frame(y = y, x = x)
# It does not matter the order of distribution parameters
formulas <- list(sd.fo = ~ x, mean.fo = ~ x)
support <- list(interval = c(-Inf, Inf), type = 'continuous')
norm_mod <- maxlogLreg(formulas, y_dist = y ~ dnorm, support = support,
data = norm_data,
link = list(over = "sd", fun = "log_link"))
summary(norm_mod)
#> _______________________________________________________________
#> Optimization routine: nlminb
#> Standard Error calculation: Hessian from optim
#> _______________________________________________________________
#> AIC BIC
#> 5026.671 5046.301
#> _______________________________________________________________
#> Fixed effects for mean
#> ---------------------------------------------------------------
#> Estimate Std. Error Z value Pr(>|z|)
#> (Intercept) -1.961867 0.109050 -17.991 < 2.2e-16 ***
#> x 3.027674 0.029349 103.160 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> _______________________________________________________________
#> Fixed effects for log(sd)
#> ---------------------------------------------------------------
#> Estimate Std. Error Z value Pr(>|z|)
#> (Intercept) 0.9581734 0.0225813 42.432 < 2.2e-16 ***
#> x 0.2950347 0.0070254 41.995 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> _______________________________________________________________
#> Note: p-values valid under asymptotic normality of estimators
#> ---
#--------------------------------------------------------------------------------
# Example 2: Fitting with censorship
# (data from https://www.itl.nist.gov/div898/handbook/apr/section4/apr413.htm)
failures <- c(55, 187, 216, 240, 244, 335, 361, 373, 375, 386)
fails <- c(failures, rep(500, 10))
status <- c(rep(1, length(failures)), rep(0, 10))
Wei_data <- data.frame(fails = fails, status = status)
# Formulas with linear predictors
formulas <- list(scale.fo=~1, shape.fo=~1)
support <- list(interval = c(0, Inf), type = 'continuous')
# Bounds for optimization. Upper bound set with default values (Inf)
start <- list(
scale = list(Intercept = 100),
shape = list(Intercept = 10)
)
lower <- list(
scale = list(Intercept = 0),
shape = list(Intercept = 0)
)
mod_weibull <- maxlogLreg(formulas, y_dist = Surv(fails, status) ~ dweibull,
support = c(0, Inf), start = start,
lower = lower, data = Wei_data)
summary(mod_weibull)
#> _______________________________________________________________
#> Optimization routine: nlminb
#> Standard Error calculation: Hessian from optim
#> _______________________________________________________________
#> AIC BIC
#> 154.2437 156.2352
#> _______________________________________________________________
#> Fixed effects for shape
#> ---------------------------------------------------------------
#> Estimate Std. Error Z value Pr(>|z|)
#> (Intercept) 1.7256 0.5034 3.4279 0.0006083 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> _______________________________________________________________
#> Fixed effects for scale
#> ---------------------------------------------------------------
#> Estimate Std. Error Z value Pr(>|z|)
#> (Intercept) 606.00 124.43 4.8703 1.115e-06 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> _______________________________________________________________
#> Note: p-values valid under asymptotic normality of estimators
#> ---
#--------------------------------------------------------------------------------