| Title: | Calibration Performance |
|---|---|
| Description: | Plots calibration curves and computes statistics for assessing calibration performance. See Lasai et al. (2025) <doi:10.48550/arXiv.2503.08389>, De Cock Campo (2023) <doi:10.48550/arXiv.2309.08559> and Van Calster et al. (2016) <doi:10.1016/j.jclinepi.2015.12.005>. |
| Authors: | Bavo De Cock [aut, cre], Lasai Barrenada [aut], Daan Nieboer [aut], Ben Van Calster [aut], Ewout Steyerberg [aut], Yvonne Vergouwe [aut] |
| Maintainer: | Bavo De Cock <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 3.1.0 |
| Built: | 2026-05-26 07:27:04 UTC |
| Source: | https://github.com/cran/CalibrationCurves |
Adjusted version of the rcspline.plot function where only the output is returned and no plot is made
.rcspline.plot( x, y, model = c("logistic", "cox", "ols"), xrange, event, nk = 5, knots = NULL, show = c("xbeta", "prob"), adj = NULL, xlab, ylab, ylim, plim = c(0, 1), plotcl = TRUE, showknots = TRUE, add = FALSE, plot = TRUE, subset, lty = 1, noprint = FALSE, m, smooth = FALSE, bass = 1, main = "auto", statloc ).rcspline.plot( x, y, model = c("logistic", "cox", "ols"), xrange, event, nk = 5, knots = NULL, show = c("xbeta", "prob"), adj = NULL, xlab, ylab, ylim, plim = c(0, 1), plotcl = TRUE, showknots = TRUE, add = FALSE, plot = TRUE, subset, lty = 1, noprint = FALSE, m, smooth = FALSE, bass = 1, main = "auto", statloc )
x |
a numeric predictor |
y |
a numeric response. For binary logistic regression, |
model |
|
xrange |
range for evaluating |
event |
event/censoring indicator if |
nk |
number of knots |
knots |
knot locations, default based on quantiles of |
show |
|
adj |
optional matrix of adjustment variables |
xlab |
|
ylab |
|
ylim |
|
plim |
|
plotcl |
plot confidence limits |
showknots |
show knot locations with arrows |
add |
add this plot to an already existing plot |
plot |
logical to indicate whether a plot has to be made. |
subset |
subset of observations to process, e.g. |
lty |
line type for plotting estimated spline function |
noprint |
suppress printing regression coefficients and standard errors |
m |
for |
smooth |
plot nonparametric estimate if |
bass |
smoothing parameter (see |
main |
main title, default is |
statloc |
location of summary statistics. Default positioning by clicking left mouse button where upper left corner of statistics should appear.
Alternative is |
list with components (‘knots’, ‘x’, ‘xbeta’, ‘lower’, ‘upper’) which are respectively the knot locations, design matrix, linear predictor, and lower and upper confidence limits
lrm, cph, rcspline.eval, plot, supsmu,
coxph.fit, lrm.fit
This infix operator can be used to create a background job for a block of code in RStudio/Posit and, once completed, all objects created in the block of code are imported into the global environment.
lhs %{}% rhslhs %{}% rhs
lhs |
not used, see details and examples |
rhs |
the block of code that you want to run |
You can use this infix operator in two different ways. Either you set the left-hand side to NULL or you use the syntax
`%{}%` ({BlockOfCode})
prints the ID of the background job in the console and, once completed, the objects created in the block of code are imported into the global environment
# Can only be executed in Rstudio ## Not run: NULL %{}% { x = rnorm(1e7) y = rnorm(1e7) } `%{}%` ({ x = rnorm(1e7) y = rnorm(1e7) }) ## End(Not run)# Can only be executed in Rstudio ## Not run: NULL %{}% { x = rnorm(1e7) y = rnorm(1e7) } `%{}%` ({ x = rnorm(1e7) y = rnorm(1e7) }) ## End(Not run)
This infix operator can be used to create a background job in RStudio/Posit and, once completed, the value of rhs is assigned to lhs.
lhs %<=% rhslhs %<=% rhs
lhs |
the object that the rhs value is assigned to |
rhs |
the value you want to assign to lhs |
prints the ID of the background job in the console and, once completed, the value of lhs is assigned to rhs
# Can only be executed in Rstudio ## Not run: x %<=% rnorm(1e7)# Can only be executed in Rstudio ## Not run: x %<=% rnorm(1e7)
Obtain the point estimate and the confidence interval of the AUC by various methods based on the Mann-Whitney statistic.
auc.nonpara.mw(x, y, conf.level=0.95, method=c("newcombe", "pepe", "delong", "jackknife", "bootstrapP", "bootstrapBCa"), nboot)auc.nonpara.mw(x, y, conf.level=0.95, method=c("newcombe", "pepe", "delong", "jackknife", "bootstrapP", "bootstrapBCa"), nboot)
x |
a vector of observations from class P. |
y |
a vector of observations from class N. |
conf.level |
confidence level of the interval. The default is 0.95. |
method |
a method used to construct the CI. |
nboot |
number of bootstrap iterations. |
The function implements various methods based on the Mann-Whitney statistic.
Point estimate and lower and upper bounds of the CI of the AUC.
The observations from class P tend to have larger values than that from class N.
This help-file is a copy of the original help-file of the function auc.nonpara.mw from the auRoc-package. It is important to note
that, when using method="pepe", the confidence interval is computed as documented in Qin and Hotilovac (2008) and that this is
different from the original function.
Elizabeth R Delong, David M Delong, and Daniel L Clarke-Pearson (1988) Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44 837-845
Dai Feng, Giuliana Cortese, and Richard Baumgartner (2015) A comparison of confidence/credible interval methods for the area under the ROC curve for continuous diagnostic tests with small sample size. Statistical Methods in Medical Research DOI: 10.1177/0962280215602040
Robert G Newcombe (2006) Confidence intervals for an effect size measure based on the Mann-Whitney statistic. Part 2: asymptotic methods and evaluation. Statistics in medicine 25(4) 559-573
Margaret Sullivan Pepe (2003) The statistical evaluation of medical tests for classification and prediction. Oxford University Press
Qin, G., & Hotilovac, L. (2008). Comparison of non-parametric confidence intervals for the area under the ROC curve of a continuous-scale diagnostic test. Statistical Methods in Medical Research, 17(2), pp. 207-21
The CalibrationCurves package provides tools to assess and visualize the calibration performance of prediction models. Calibration refers to the agreement between predicted probabilities or values and what is actually observed.
The package covers a broad range of outcome types and modelling settings:
Binary outcomes — val.prob.ci.2 (base R graphics) and
valProbggplot (ggplot2) compute flexible calibration curves (loess or
restricted cubic splines) with pointwise 95% confidence intervals, logistic calibration
slope and intercept, c-statistic, Brier score, and other statistics.
Clustered binary outcomes — valProbCluster assesses
calibration while accounting for clustering via three approaches: Clustered Grouped
Calibration (CGC), Meta-Analytical Calibration Curve (MAC2),
and Mixed-Effects Model Calibration (MIXC). See Barreñada et al. (2025).
Generalized outcomes (exponential family) — genCalCurve
extends the calibration framework to outcomes whose distribution belongs to the
exponential family (e.g., Poisson, Gamma). It estimates the generalized calibration
slope and intercept and plots the generalized calibration curve. See De Cock Campo (2023).
Survival outcomes — valProbSurvival evaluates calibration
for a fitted Cox proportional hazards model at a given time horizon, producing
calibration curves and summary statistics for time-to-event predictions.
A vignette is available that provides a comprehensive overview of the theory and illustrates the functions with worked examples. Further background is available in the linked papers below.
History
Some years ago, Yvonne Vergouwe and Ewout Steyerberg adapted the function val.prob from the rms-package (https://cran.r-project.org/package=rms) into val.prob.ci and added the following features:
Scaled Brier score by relating to max for average calibrated Null model
Risk distribution according to outcome
0 and 1 to indicate outcome label; set with d1lab="..", d0lab=".."
Labels: y axis: "Observed Frequency"; Triangle: "Grouped observations"
Confidence intervals around triangles
A cut-off can be plotted; set x coordinate
In December 2015, Bavo De Cock, Daan Nieboer, and Ben Van Calster adapted
this to val.prob.ci.2:
Flexible calibration curves using loess (default) or restricted cubic splines, with pointwise 95% confidence intervals.
Loess: confidence intervals can be obtained in closed form or using bootstrapping
(CL.BT=TRUE uses 2000 bootstrap samples).
RCS: 3 to 5 knots; knot locations estimated via default quantiles of the
predictor (by rcspline.eval).
Plot customization through standard plot arguments (cex.axis, etc.);
legend size controlled via cex.leg.
Label y-axis: "Observed proportion".
Added the Estimated Calibration Index (ECI) to quantify lack of calibration (Van Hoorde et al., 2015).
By default shows the "abc" of model performance: calibration intercept, calibration slope, and c-statistic (Steyerberg et al., 2011).
Vectors p, y and logit no longer have to be sorted.
A ggplot2-based equivalent, valProbggplot, was subsequently added, offering
the same functionality with ggplot2 graphics.
In 2023, Bavo De Cock (Campo) introduced the generalized calibration framework
(De Cock Campo, 2023), extending logistic
calibration to prediction models with outcomes from any distribution in the exponential
family, implemented in genCalCurve.
Support for survival models was added via valProbSurvival, enabling
calibration assessment of Cox proportional hazards model predictions at a specified time
horizon.
In 2025, methods for clustered data were introduced
(Barreñada et al., 2025), accessible through
valProbCluster, which supports CGC, MAC2, and MIXC approaches.
The most current version of this package can be found on https://github.com/BavoDC/CalibrationCurves.
Barreñada, L., De Cock Campo, B., Wynants, L., Van Calster, B. (2025). Clustered Flexible Calibration Plots for Binary Outcomes Using Random Effects Modeling. arXiv:2503.08389, available at https://arxiv.org/abs/2503.08389.
De Cock Campo, B. (2023). Towards reliable predictive analytics: a generalized calibration framework. arXiv:2309.08559, available at https://arxiv.org/abs/2309.08559.
Steyerberg, E.W., Van Calster, B., Pencina, M.J. (2011). Performance measures for prediction models and markers: evaluation of predictions and classifications. Revista Espanola de Cardiologia, 64(9), pp. 788-794
Van Calster, B., Nieboer, D., Vergouwe, Y., De Cock, B., Pencina M., Steyerberg E.W. (2016). A calibration hierarchy for risk models was defined: from utopia to empirical data. Journal of Clinical Epidemiology, 74, pp. 167-176
Van Hoorde, K., Van Huffel, S., Timmerman, D., Bourne, T., Van Calster, B. (2015). A spline-based tool to assess and visualize the calibration of multiclass risk predictions. Journal of Biomedical Informatics, 54, pp. 283-93
van Geloven, N., Giardiello, D., Bonneville, E.F., Teece, L., Ramspek, C.L., van Smeden, M. et al. (2022). Validation of prediction models in the presence of competing risks: a guide through modern methods. BMJ, 377:e069249, doi:10.1136/bmj-2021-069249
Estimates the calibration curves using the CGC approach. The function supports two grouping methods:
equal-sized groups ("grouped") or interval-based groups ("interval").
Optionally, a calibration plot can be produced with cluster-specific curves.
CGC( data = NULL, p, y, cluster, cl.level = 0.95, ntiles = 10, cluster_curves = FALSE, plot = TRUE, size = 1, linewidth = 0.4, univariate = FALSE, method = c("grouped", "interval") )CGC( data = NULL, p, y, cluster, cl.level = 0.95, ntiles = 10, cluster_curves = FALSE, plot = TRUE, size = 1, linewidth = 0.4, univariate = FALSE, method = c("grouped", "interval") )
data |
optional data frame containing the variables |
p |
predicted probabilities (numeric vector) or name of the column in
|
y |
binary outcome variable or the name of the column in |
cluster |
cluster identifier (factor, character, or integer) or name of
the column in |
cl.level |
the confidence level for the calculation of the confidence interval. Default is |
ntiles |
integer, number of groups (tiles) for calibration. Default is |
cluster_curves |
logical, whether to include cluster-specific calibration
curves in the plot. Default is |
plot |
logical, whether to return a calibration plot. Default is |
size |
numeric, point size for plotted curves. Default is |
linewidth |
numeric, line width for plotted curves. Default is |
univariate |
logical, whether to use univariate meta-analysis. Default is |
method |
character, grouping method: |
When method = "grouped", the predictions are divided into equal-sized bins using quantiles.
Conversely, if method ="interval", the predictions are divided into fixed-width bins across [0, 1].
The function performs a meta-analysis within each group. This can be either a univariate or bivariate analysis,
which is specified in the univariate argument. The univariate analysis is performed using the
metaprop function and the bivariate analysis employs the rma.mv function.
Hereafter, the results are aggregated and plotted as calibration curves.
A list containing:
plot_dataData frame of meta-analysis calibration estimates.
trad_groupedData frame with traditional grouped calibration results.
observed_dataData frame with per-observation calibration data.
cluster_dataData frame with cluster-specific calibration summaries.
plotA ggplot2 object if plot = TRUE, otherwise NULL.
Function to assess the calibration performance of a prediction model where the outcome's distribution is a member of the exponential family (De Cock Campo, 2023). The function plots the generalized calibration curve and computes the generalized calibration slope and intercept.
genCalCurve( y, yHat, family, plot = TRUE, Smooth = FALSE, GLMCal = TRUE, lwdIdeal = 2, colIdeal = "gray", ltyIdeal = 1, lwdSmooth = 1, colSmooth = "blue", ltySmooth = 1, argzSmooth = alist(degree = 2), lwdGLMCal = 1, colGLMCal = "red", ltyGLMCal = 1, AddStats = TRUE, Digits = 3, cexStats = 1, lwdLeg = 1.5, Legend = TRUE, legendPos = "bottomright", xLim = NULL, yLim = NULL, posStats = NULL, confLimitsSmooth = c("none", "bootstrap", "pointwise"), confLevel = 0.95, Title = "Calibration plot", xlab = "Predicted value", ylab = "Empirical average", EmpiricalDistribution = TRUE, length.seg = 1, ... )genCalCurve( y, yHat, family, plot = TRUE, Smooth = FALSE, GLMCal = TRUE, lwdIdeal = 2, colIdeal = "gray", ltyIdeal = 1, lwdSmooth = 1, colSmooth = "blue", ltySmooth = 1, argzSmooth = alist(degree = 2), lwdGLMCal = 1, colGLMCal = "red", ltyGLMCal = 1, AddStats = TRUE, Digits = 3, cexStats = 1, lwdLeg = 1.5, Legend = TRUE, legendPos = "bottomright", xLim = NULL, yLim = NULL, posStats = NULL, confLimitsSmooth = c("none", "bootstrap", "pointwise"), confLevel = 0.95, Title = "Calibration plot", xlab = "Predicted value", ylab = "Empirical average", EmpiricalDistribution = TRUE, length.seg = 1, ... )
y |
a vector with the values for the response variable |
yHat |
a vector with the predicted values |
family |
a description of the type of distribution and link function in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. (See family for details of family functions.) |
plot |
logical, indicating if a plot should be made or not. |
Smooth |
logical, indicating if the flexible calibration curve should be estimated. |
GLMCal |
logical, indicating if the GLM calibration curve has to be estimated. |
lwdIdeal |
the line width of the ideal line. |
colIdeal |
the color of the ideal line. |
ltyIdeal |
the line type of the ideal line. |
lwdSmooth |
the line width of the flexible calibration curve. |
colSmooth |
the color of the flexible calibration curve. |
ltySmooth |
the line type of the flexible calibration curve. |
argzSmooth |
arguments passed to |
lwdGLMCal |
the line width of the GLM calibration curve. |
colGLMCal |
the color of the GLM calibration curve. |
ltyGLMCal |
the line type of the GLM calibration curve. |
AddStats |
logical, indicating whether to add the values of the generalized calibration slope and intercept to the plot. |
Digits |
the number of digits of the generalized calibration slope and intercept. |
cexStats |
the font size of the statistics shown on the plot. |
lwdLeg |
the line width in the legend. |
Legend |
logical, indicating whether the legend has to be added. |
legendPos |
the position of the legend on the plot. |
xLim, yLim
|
numeric vectors of length 2, giving the x and y coordinates ranges (see |
posStats |
numeric vector of length 2, specifying the x and y coordinates of the statistics (generalized calibration curve and intercept) printed on the plot. Default is |
confLimitsSmooth |
character vector to indicate if and how the confidence limits for the flexible calibration curve have to be computed. |
confLevel |
the confidence level for the calculation of the pointwise confidence limits of the flexible calibration curve. |
Title |
the title of the plot |
xlab |
x-axis label, default is |
ylab |
y-axis label, default is |
EmpiricalDistribution |
logical, indicating if the empirical distribution of the predicted values has to be added to the bottom of the plot. |
length.seg |
controls the length of the histogram lines. Default is |
... |
An object of type GeneralizedCalibrationCurve with the following slots:
call |
the matched call. |
ggPlot |
the ggplot object. |
stats |
a vector containing performance measures of calibration. |
cl.level |
the confidence level used. |
Calibration |
contains the calibration intercept and slope, together with their confidence intervals. |
Cindex |
the value of the c-statistic, together with its confidence interval. |
warningMessages |
if any, the warning messages that were printed while running the function. |
CalibrationCurves |
The coordinates for plotting the calibration curves. |
De Cock Campo, B. (2023). Towards reliable predictive analytics: a generalized calibration framework. arXiv:2309.08559, available at https://arxiv.org/abs/2309.08559.
library(CalibrationCurves) library(mgcv) data("poissontraindata") data("poissontestdata") glmFit = glm(Y ~ ., data = poissontraindata, family = poisson) # Example of a well calibrated poisson prediction model yOOS = poissontestdata$Y yHat = predict(glmFit, newdata = poissontestdata, type = "response") genCalCurve(yOOS, yHat, family = "poisson", plot = TRUE) # Example of an overfit poisson prediction model gamFit = gam(Y ~ x1 + x3 + x1:x3 + s(x5), data = poissontraindata, family = poisson) yHat = as.vector(predict(gamFit, newdata = poissontestdata, type = "response")) genCalCurve(yOOS, yHat, family = "poisson", plot = TRUE) # Example of an underfit poisson prediction model glmFit = glm(Y ~ x2, data = poissontraindata, family = poisson) yOOS = poissontestdata$Y yHat = predict(glmFit, newdata = poissontestdata, type = "response") genCalCurve(yOOS, yHat, family = "poisson", plot = TRUE)library(CalibrationCurves) library(mgcv) data("poissontraindata") data("poissontestdata") glmFit = glm(Y ~ ., data = poissontraindata, family = poisson) # Example of a well calibrated poisson prediction model yOOS = poissontestdata$Y yHat = predict(glmFit, newdata = poissontestdata, type = "response") genCalCurve(yOOS, yHat, family = "poisson", plot = TRUE) # Example of an overfit poisson prediction model gamFit = gam(Y ~ x1 + x3 + x1:x3 + s(x5), data = poissontraindata, family = poisson) yHat = as.vector(predict(gamFit, newdata = poissontestdata, type = "response")) genCalCurve(yOOS, yHat, family = "poisson", plot = TRUE) # Example of an underfit poisson prediction model glmFit = glm(Y ~ x2, data = poissontraindata, family = poisson) yOOS = poissontestdata$Y yHat = predict(glmFit, newdata = poissontestdata, type = "response") genCalCurve(yOOS, yHat, family = "poisson", plot = TRUE)
Function to load multiple packages at once
LibraryM(...)LibraryM(...)
... |
the packages that you want to load |
invisible NULL
LibraryM(CalibrationCurves)LibraryM(CalibrationCurves)
Computes meta-analytical calibration curves using multiple methods (logistic regression, loess or splines) and performs meta-analysis across clusters to generate aggregated calibration curves with confidence and prediction intervals.
MAC2( data = NULL, p, y, cluster, grid, cl.level = 0.95, alpha.lr = 0.05/3, plot = TRUE, cluster_curves = FALSE, knots = 3, transf = "logit", method_choice = c("splines", "log", "loess"), method.tau = "REML", prediction = TRUE, random = TRUE, sm = "PLOGIT", hakn = FALSE, linewidth = 1, method.predict = "HTS", verbose = FALSE )MAC2( data = NULL, p, y, cluster, grid, cl.level = 0.95, alpha.lr = 0.05/3, plot = TRUE, cluster_curves = FALSE, knots = 3, transf = "logit", method_choice = c("splines", "log", "loess"), method.tau = "REML", prediction = TRUE, random = TRUE, sm = "PLOGIT", hakn = FALSE, linewidth = 1, method.predict = "HTS", verbose = FALSE )
data |
optional data frame containing the variables |
p |
predicted probabilities (numeric vector) or name of the column in
|
y |
binary outcome variable or the name of the column in |
cluster |
Cluster identifier (factor, character, or integer) or name of
the column in |
grid |
the grid for the calibration curve evaluation |
cl.level |
the confidence level for the calculation of the confidence interval. Default is |
alpha.lr |
the alpha-level used for the likelihood ratio test, selecting the number of knots for the restricted cubic splines |
plot |
logical, indicates whether to plot the calibration curves. Default is |
cluster_curves |
logical, whether to include cluster-specific curves in the plot. Default is |
knots |
integer, number of knots for splines. Default is |
transf |
character, transformation for predictions: |
method_choice |
character, which method to use for meta-analysis. Options are:
|
method.tau |
character, method for between-study heterogeneity estimation. Default is |
prediction |
logical, whether to compute prediction intervals. Default is |
random |
logical, whether to use random-effects model. Default is |
sm |
character, summary measure for meta-analysis. Default is |
hakn |
logical, whether to use Hartung-Knapp adjustment. Default is |
linewidth |
numeric, line width for the meta-curve. Default is |
method.predict |
character, method for prediction intervals. Default is |
verbose |
logical, indicates whether progress has to be printed in the console. |
This function estimates the center-specific calibration curves using logistic regression,
loess or splines. Hereafter, it aggregates the calibration curves using meta-analytical techniques.
The meta-analysis is performed using the function metagen from the meta
package. The method_choice argument determines which method is for the meta-analytical aggregation.
A list containing:
cluster_dataData frame with linear predictors and standard errors for each method per cluster
plot_dataData frame with meta-analysis results including predictions and intervals
plotA ggplot2 object if plot = TRUE, otherwise NULL
Estimates the calibration curve using a logistic generalized linear mixed model.
MIXC( data = NULL, p, y, cluster, grid, method = c("slope", "intercept"), plot = TRUE, cluster_curves = FALSE, nsims_pi = 10000, CI = TRUE, CI_method = c("naive", "delta"), cl.level = 0.95 )MIXC( data = NULL, p, y, cluster, grid, method = c("slope", "intercept"), plot = TRUE, cluster_curves = FALSE, nsims_pi = 10000, CI = TRUE, CI_method = c("naive", "delta"), cl.level = 0.95 )
data |
optional data frame containing the variables |
p |
predicted probabilities (numeric vector) or name of the column in
|
y |
binary outcome variable or the name of the column in |
cluster |
Cluster identifier (factor, character, or integer) or name of
the column in |
grid |
the grid for the calibration curve evaluation |
method |
character, type of mixed-effects model: |
plot |
logical, indicating whether to generate a calibration plot. Default is |
cluster_curves |
logical, whether to include cluster-specific curves in the plot.
Default is |
nsims_pi |
integer, number of simulations for prediction intervals. Default is |
CI |
logical, whether to calculate confidence intervals. Default is |
CI_method |
character, method for computing the confidence intervals of the observed proportions.
If |
cl.level |
the confidence level for the calculation of the confidence interval. Default is |
This function estimates the calibration curves using a logistic generalized linear mixed model.
A list containing:
modelThe fitted mixed-effects model object
cluster_dataData frame with calibration data for each cluster
plot_dataData frame with calibration data for the average cluster
observed_dataData frame with calibration data for individual observations
plotA ggplot2 object if plot = TRUE, otherwise NULL
Prints the call, confidence level and values for the performance measures.
## S3 method for class 'CalibrationCurve' print(x, ...)## S3 method for class 'CalibrationCurve' print(x, ...)
x |
an object of type CalibrationCurve, resulting from |
... |
arguments passed to |
The original CalibrationCurve object is returned.
Prints the ggplot, call, confidence level and values for the performance measures.
## S3 method for class 'ClusteredCalibrationCurve' print(x, ...)## S3 method for class 'ClusteredCalibrationCurve' print(x, ...)
x |
an object of type ggplotCalibrationCurve, resulting from |
... |
arguments passed to |
The original ggplotCalibrationCurve object is returned.
Prints the call, confidence level and values for the performance measures.
## S3 method for class 'GeneralizedCalibrationCurve' print(x, ...)## S3 method for class 'GeneralizedCalibrationCurve' print(x, ...)
x |
an object of type GeneralizedCalibrationCurve, resulting from |
... |
arguments passed to |
The original GeneralizedCalibrationCurve object is returned.
Prints the ggplot, call, confidence level and values for the performance measures.
## S3 method for class 'ggplotCalibrationCurve' print(x, ...)## S3 method for class 'ggplotCalibrationCurve' print(x, ...)
x |
an object of type ggplotCalibrationCurve, resulting from |
... |
arguments passed to |
The original ggplotCalibrationCurve object is returned.
Print function for a SurvivalCalibrationCurve object
## S3 method for class 'SurvivalCalibrationCurve' print(x, ...)## S3 method for class 'SurvivalCalibrationCurve' print(x, ...)
x |
an object of type SurvivalCalibrationCurve, resulting from |
... |
arguments passed to |
The original SurvivalCalibrationCurve object is returned.
Both the clusteredtraindata and clusteredtestdata dataframe are synthetically generated data sets to illustrate the functionality of the package.
The clusteredtraindata has 1000 observations and the clusteredtestdata has 500 observations. The same settings were used to generate both data sets.
data(traindata) data(testdata)data(traindata) data(testdata)
ythe binary outcome variable
clusterthe cluster
x1covariate 1
x2covariate 2
x3covariate 3
x4covariate 4
x5covariate 5
See the examples for how the data sets were generated.
# The data sets were generated as follows lapply(c("magrittr", "dplyr"), library, character.only = TRUE) set.seed(1234) # Simulate training data nClusters = 10 p = 5 Uj = scale(rnorm(nClusters)) nPop = 1e6 nSample = 1e3 nTest = 1e3 X = replicate(p, rnorm(nPop)) Beta = rnorm(p) cluster = sample(seq_len(nClusters), nPop, TRUE) table(cluster) eta = X %*% Beta + Uj[match(cluster, seq_len(nClusters))] y = rbinom(nPop, 1, binomial()$linkinv(eta)) Dt = data.frame(y, X, cluster) colnames(Dt) %<>% tolower clustertraindata = Dt %>% filter(cluster %in% 1:5) %>% group_by(cluster) %>% sample_n(size = nSample) %>% as.data.frame clustertestdata = Dt %>% filter(cluster %in% 6:10) %>% group_by(cluster) %>% sample_n(size = nTest) %>% as.data.frame# The data sets were generated as follows lapply(c("magrittr", "dplyr"), library, character.only = TRUE) set.seed(1234) # Simulate training data nClusters = 10 p = 5 Uj = scale(rnorm(nClusters)) nPop = 1e6 nSample = 1e3 nTest = 1e3 X = replicate(p, rnorm(nPop)) Beta = rnorm(p) cluster = sample(seq_len(nClusters), nPop, TRUE) table(cluster) eta = X %*% Beta + Uj[match(cluster, seq_len(nClusters))] y = rbinom(nPop, 1, binomial()$linkinv(eta)) Dt = data.frame(y, X, cluster) colnames(Dt) %<>% tolower clustertraindata = Dt %>% filter(cluster %in% 1:5) %>% group_by(cluster) %>% sample_n(size = nSample) %>% as.data.frame clustertestdata = Dt %>% filter(cluster %in% 6:10) %>% group_by(cluster) %>% sample_n(size = nTest) %>% as.data.frame
Both the traindata and testdata dataframe are synthetically generated data sets to illustrate the functionality of the package. The traindata has 1000 observations and the testdata has 500 observations. The same settings were used to generate both data sets.
data(traindata) data(testdata)data(traindata) data(testdata)
ythe binary outcome variable
x1covariate 1
x2covariate 2
x3covariate 3
x4covariate 4
See the examples for how the data sets were generated.
# The data sets were generated as follows set.seed(1782) # Simulate training data nTrain = 1000 B = c(0.1, 0.5, 1.2, -0.75, 0.8) X = replicate(4, rnorm(nTrain)) p0true = binomial()$linkinv(cbind(1, X) %*% B) y = rbinom(nTrain, 1, p0true) colnames(X) = paste0("x", seq_len(ncol(X))) traindata = data.frame(y, X) # Simulate validation data nTest = 500 X = replicate(4, rnorm(nTest)) p0true = binomial()$linkinv(cbind(1, X) %*% B) y = rbinom(nTest, 1, p0true) colnames(X) = paste0("x", seq_len(ncol(X))) testdata = data.frame(y, X)# The data sets were generated as follows set.seed(1782) # Simulate training data nTrain = 1000 B = c(0.1, 0.5, 1.2, -0.75, 0.8) X = replicate(4, rnorm(nTrain)) p0true = binomial()$linkinv(cbind(1, X) %*% B) y = rbinom(nTrain, 1, p0true) colnames(X) = paste0("x", seq_len(ncol(X))) traindata = data.frame(y, X) # Simulate validation data nTest = 500 X = replicate(4, rnorm(nTest)) p0true = binomial()$linkinv(cbind(1, X) %*% B) y = rbinom(nTest, 1, p0true) colnames(X) = paste0("x", seq_len(ncol(X))) testdata = data.frame(y, X)
Both the traindata and testdata dataframe are synthetically generated data sets to illustrate the functionality of the package. The traindata has 5000 observations and the testdata has 1000 observations. The same settings were used to generate both data sets.
data(poissontraindata) data(poissontestdata)data(poissontraindata) data(poissontestdata)
ythe poisson distributed outcome variable
x1covariate 1
x2covariate 2
x3covariate 3
x4covariate 4
x5covariate 5
See the examples for how the data sets were generated.
# The data sets were generated as follows library(MASS) library(magrittr) ScaleRange <- function(x, xmin = -1, xmax = 1) { xRange = range(x) (x - xRange[1]) / diff(xRange) * (xmax - xmin) + xmin } set.seed(144) p = 5 N = 1e6 n = 5e3 nOOS = 1e3 S = matrix(NA, 5, 5) rho = c(0.025, 0, 0, 0.05, 0.075, 0, 0, 0.025, 0, 0) S[upper.tri(S)] = rho S[lower.tri(S)] = t(S)[lower.tri(S)] diag(S) = 1 Matrix::isSymmetric(S) X = mvrnorm(N, rep(0, p), Sigma = S, empirical = TRUE) X = apply(X, 2, ScaleRange) B = c(-2.3, 1.5, 2, -1, -2, -1.5) mu = poisson()$linkinv(cbind(1, X) %*% B) Y = rpois(N, mu) Df = data.frame(Y, X) colnames(Df)[-1] %<>% tolower() set.seed(2) DfS = Df[sample(1:nrow(Df), n, FALSE), ] DfOOS = Df[sample(1:nrow(Df), nOOS, FALSE), ] poissontraindata = DfS poissontestdata = DfOOS# The data sets were generated as follows library(MASS) library(magrittr) ScaleRange <- function(x, xmin = -1, xmax = 1) { xRange = range(x) (x - xRange[1]) / diff(xRange) * (xmax - xmin) + xmin } set.seed(144) p = 5 N = 1e6 n = 5e3 nOOS = 1e3 S = matrix(NA, 5, 5) rho = c(0.025, 0, 0, 0.05, 0.075, 0, 0, 0.025, 0, 0) S[upper.tri(S)] = rho S[lower.tri(S)] = t(S)[lower.tri(S)] diag(S) = 1 Matrix::isSymmetric(S) X = mvrnorm(N, rep(0, p), Sigma = S, empirical = TRUE) X = apply(X, 2, ScaleRange) B = c(-2.3, 1.5, 2, -1, -2, -1.5) mu = poisson()$linkinv(cbind(1, X) %*% B) Y = rpois(N, mu) Df = data.frame(Y, X) colnames(Df)[-1] %<>% tolower() set.seed(2) DfS = Df[sample(1:nrow(Df), n, FALSE), ] DfOOS = Df[sample(1:nrow(Df), nOOS, FALSE), ] poissontraindata = DfS poissontestdata = DfOOS
The training dataset contains real-life survival data from patients who underwent primary surgery for breast cancer between 1978 and 1993 in Rotterdam. The patients were followed until 2007, resulting in a model development cohort of 2982 patients after exclusions. The primary outcome measured was recurrence-free survival, defined as the time from primary surgery to recurrence or death.
The validation dataset consists of 686 patients with primary node-positive breast cancer from the German Breast Cancer Study Group. In this cohort, 285 patients suffered a recurrence or died within 5 years of follow-up, while 280 were censored before 5 years. Five-year predictions were chosen as that was the lowest median survival from the two cohorts (Rotterdam cohort, 6.7 years; German cohort, 4.9 years).
data(trainDataSurvival) data(testDataSurvival)data(trainDataSurvival) data(testDataSurvival)
A data frame with observations on the following 26 variables.
patient identifier
year of surgery
age at surgery
menopausal status (0 = premenopausal, 1 = postmenopausal)
tumor size, a factor with levels <= 20, 20-50, >50
differentiation grade
number of positive lymph nodes
progesterone receptors (fmol/l)
estrogen receptors (fmol/l)
hormonal treatment (0 = no, 1 = yes)
chemotherapy
days to relapse or last follow-up
0 = no relapse, 1 = relapse
days to death or last follow-up
0 = alive, 1 = dead
Follow-up time for RFS, in years (numeric)
Recurrence-free survival status (0 = no event, 1 = event) (numeric)
Winsorized progesterone receptor level (numeric)
Winsorized node count (numeric)
Categorized tumor size, copied from size (factor)
Categorized node involvement (factor: "0", "1-3", ">3")
Recoded grade factor (levels: "1-2", "3")
Restricted cubic spline basis for nodes2 (numeric)
Restricted cubic spline basis for original pgr (numeric)
Follow-up epoch indicator after splitting at 5 years (numeric)
The data sets are based on the publicly available code and data used in the repository Prediction_performance_survival by Giardiello et al. (2023), which accompanies the Annals of Internal Medicine article "Assessing Performance and Clinical Usefulness in Prediction Models With Survival Outcomes: Practical Guidance for Cox Proportional Hazards Models".
All preprocessing steps, such as converting survival time to years, defining recurrence-free survival status via 'rfs = pmax(recur, death)', correcting 43 discordant cases using death time, 99th-percentile winsorization of 'pgr' and 'nodes', spline transformations ('nodes3', 'pgr3'), splitting follow-up at 5 years ('epoch'), and recoding categorical variables ('csize', 'cnode', 'grade3')—were performed exactly as in the Giardiello code.
The training dataset, trainDataSurvival, consists of 2982 patients, with 1713 events occurring over a maximum
follow-up time of 19.3 years. The estimated median potential follow-up time, calculated using the reverse Kaplan-
method, was 9.3 years. Out of these patients, 1275 suffered a recurrence or death within the follow-up time of interest
(5 years), and 126 were censored before 5 years.
The validation dataset, testDataSurvival, consists of 686 patients with primary node-positive breast cancer
from the German Breast Cancer Study Group. In this cohort, 285 patients suffered a recurrence or died within 5 years
of follow-up, while 280 were censored before 5 years. Five-year predictions were chosen as that was the lowest median
survival from the two cohorts (Rotterdam cohort, 6.7 years; German cohort, 4.9 years).
David J. McLernon, Daniele Giardiello, Ben Van Calster, et al. (2023). Assessing Performance and Clinical Usefulness in Prediction Models With Survival Outcomes: Practical Guidance for Cox Proportional Hazards Models. Annals of Internal Medicine, 176(1), pp. 105-114, doi:10.7326/M22-0844
data(testDataSurvival) ## Explore the structure of the dataset str(testDataSurvival)data(testDataSurvival) ## Explore the structure of the dataset str(testDataSurvival)
The function val.prob.ci.2 is an adaptation of val.prob from Frank Harrell's rms package,
https://cran.r-project.org/package=rms. Hence, the description of some of the functions of val.prob.ci.2
come from the the original val.prob.
The key feature of val.prob.ci.2 is the generation of logistic and flexible calibration curves and related statistics.
When using this code, please cite: Van Calster, B., Nieboer, D., Vergouwe, Y., De Cock, B., Pencina, M.J., Steyerberg,
E.W. (2016). A calibration hierarchy for risk models was defined: from utopia to empirical data. Journal of Clinical Epidemiology,
74, pp. 167-176
val.prob.ci.2( p, y, logit, group, weights = rep(1, length(y)), normwt = FALSE, pl = TRUE, smooth = c("loess", "rcs", "none"), CL.smooth = "fill", CL.BT = FALSE, lty.smooth = 1, col.smooth = "black", lwd.smooth = 1, nr.knots = 5, logistic.cal = FALSE, lty.log = 1, col.log = "black", lwd.log = 1, xlab = "Predicted probability", ylab = "Observed proportion", xlim = c(-0.02, 1), ylim = c(-0.15, 1), m, g, cuts, emax.lim = c(0, 1), legendloc = c(0.5, 0.27), statloc = c(0, 0.85), dostats = TRUE, cl.level = 0.95, method.ci = "pepe", roundstats = 2, riskdist = "predicted", cex = 0.75, cex.leg = 0.75, connect.group = FALSE, connect.smooth = TRUE, g.group = 4, evaluate = 100, nmin = 0, d0lab = "0", d1lab = "1", cex.d01 = 0.7, dist.label = 0.04, line.bins = -0.05, dist.label2 = 0.03, cutoff, las = 1, length.seg = 1, y.intersp = 1, lty.ideal = 1, col.ideal = "red", lwd.ideal = 1, allowPerfectPredictions = FALSE, argzLoess = alist(degree = 2), ... )val.prob.ci.2( p, y, logit, group, weights = rep(1, length(y)), normwt = FALSE, pl = TRUE, smooth = c("loess", "rcs", "none"), CL.smooth = "fill", CL.BT = FALSE, lty.smooth = 1, col.smooth = "black", lwd.smooth = 1, nr.knots = 5, logistic.cal = FALSE, lty.log = 1, col.log = "black", lwd.log = 1, xlab = "Predicted probability", ylab = "Observed proportion", xlim = c(-0.02, 1), ylim = c(-0.15, 1), m, g, cuts, emax.lim = c(0, 1), legendloc = c(0.5, 0.27), statloc = c(0, 0.85), dostats = TRUE, cl.level = 0.95, method.ci = "pepe", roundstats = 2, riskdist = "predicted", cex = 0.75, cex.leg = 0.75, connect.group = FALSE, connect.smooth = TRUE, g.group = 4, evaluate = 100, nmin = 0, d0lab = "0", d1lab = "1", cex.d01 = 0.7, dist.label = 0.04, line.bins = -0.05, dist.label2 = 0.03, cutoff, las = 1, length.seg = 1, y.intersp = 1, lty.ideal = 1, col.ideal = "red", lwd.ideal = 1, allowPerfectPredictions = FALSE, argzLoess = alist(degree = 2), ... )
p |
predicted probability |
y |
vector of binary outcomes |
logit |
predicted log odds of outcome. Specify either |
group |
a grouping variable. If numeric this variable is grouped into
|
weights |
an optional numeric vector of per-observation weights (usually frequencies),
used only if |
normwt |
set to |
pl |
|
smooth |
|
CL.smooth |
|
CL.BT |
|
lty.smooth |
the linetype of the flexible calibration curve. Default is |
col.smooth |
the color of the flexible calibration curve. Default is |
lwd.smooth |
the line width of the flexible calibration curve. Default is |
nr.knots |
specifies the number of knots for rcs-based calibration curve. The default as well as the highest allowed value is 5. In case the specified number of knots leads to estimation problems, then the number of knots is automatically reduced to the closest value without estimation problems. |
logistic.cal |
|
lty.log |
if |
col.log |
if |
lwd.log |
if |
xlab |
x-axis label, default is |
ylab |
y-axis label, default is |
xlim, ylim
|
numeric vectors of length 2, giving the x and y coordinates ranges (see |
m |
If grouped proportions are desired, minimum no. observations per group |
g |
If grouped proportions are desired, number of quantile groups |
cuts |
If grouped proportions are desired, actual cut points for constructing
intervals, e.g. |
emax.lim |
Vector containing lowest and highest predicted probability over which to
compute |
legendloc |
if |
statloc |
the "abc" of model performance (Steyerberg et al., 2011)-calibration intercept, calibration slope,
and c statistic-will be added to the plot, using statloc as the upper left corner of a box (default is c(0,.85).
You can specify a list or a vector. Use locator(1) for the mouse, |
dostats |
specifies whether and which performance measures are shown in the figure.
|
cl.level |
if |
method.ci |
method to calculate the confidence interval of the c-statistic. The argument is passed to |
roundstats |
specifies the number of decimals to which the statistics are rounded when shown in the plot. Default is 2. |
riskdist |
Use |
cex, cex.leg
|
controls the font size of the statistics ( |
connect.group |
Defaults to |
connect.smooth |
Defaults to |
g.group |
number of quantile groups to use when |
evaluate |
number of points at which to store the |
nmin |
applies when |
d0lab, d1lab
|
controls the labels for events and non-events (i.e. outcome y) for the histograms.
Defaults are |
cex.d01 |
controls the size of the labels for events and non-events. Default is 0.7. |
dist.label |
controls the horizontal position of the labels for events and non-events. Default is 0.04. |
line.bins |
controls the horizontal (y-axis) position of the histograms. Default is -0.05. |
dist.label2 |
controls the vertical distance between the labels for events and non-events. Default is 0.03. |
cutoff |
puts an arrow at the specified risk cut-off(s). Default is none. |
las |
controls whether y-axis values are shown horizontally (1) or vertically (0). |
length.seg |
controls the length of the histogram lines. Default is |
y.intersp |
character interspacing for vertical line distances of the legend ( |
lty.ideal |
linetype of the ideal line. Default is |
col.ideal |
controls the color of the ideal line on the plot. Default is |
lwd.ideal |
controls the line width of the ideal line on the plot. Default is |
allowPerfectPredictions |
Logical, indicates whether perfect predictions (i.e. values of either 0 or 1) are allowed. Default is |
argzLoess |
a list with arguments passed to the |
... |
When using the predicted probabilities of an uninformative model (i.e. equal probabilities for all observations), the model has no predictive value. Consequently, where applicable, the value of the performance measure corresponds to the worst possible theoretical value. For the ECI, for example, this equals 1 (Edlinger et al., 2022).
An object of type CalibrationCurve with the following slots:
call |
the matched call. |
stats |
a vector containing performance measures of calibration. |
cl.level |
the confidence level used. |
Calibration |
contains the calibration intercept and slope, together with their confidence intervals. |
Cindex |
the value of the c-statistic, together with its confidence interval. |
warningMessages |
if any, the warning messages that were printed while running the function. |
CalibrationCurves |
The coordinates for plotting the calibration curves. |
In order to make use (of the functions) of the package auRoc, the user needs to install JAGS. However, since our package only uses the
auc.nonpara.mw function which does not depend on the use of JAGS, we therefore copied the code and slightly adjusted it when
method="pepe".
Edlinger, M, van Smeden, M, Alber, HF, Wanitschek, M, Van Calster, B. (2022). Risk prediction models for discrete ordinal outcomes: Calibration and the impact of the proportional odds assumption. Statistics in Medicine, 41( 8), pp. 1334– 1360
Qin, G., & Hotilovac, L. (2008). Comparison of non-parametric confidence intervals for the area under the ROC curve of a continuous-scale diagnostic test. Statistical Methods in Medical Research, 17(2), pp. 207-21
Steyerberg, E.W., Van Calster, B., Pencina, M.J. (2011). Performance measures for prediction models and markers : evaluation of predictions and classifications. Revista Espanola de Cardiologia, 64(9), pp. 788-794
Van Calster, B., Nieboer, D., Vergouwe, Y., De Cock, B., Pencina M., Steyerberg E.W. (2016). A calibration hierarchy for risk models was defined: from utopia to empirical data. Journal of Clinical Epidemiology, 74, pp. 167-176
Van Hoorde, K., Van Huffel, S., Timmerman, D., Bourne, T., Van Calster, B. (2015). A spline-based tool to assess and visualize the calibration of multiclass risk predictions. Journal of Biomedical Informatics, 54, pp. 283-93
# Load package library(CalibrationCurves) set.seed(1783) # Simulate training data X = replicate(4, rnorm(5e2)) p0true = binomial()$linkinv(cbind(1, X) %*% c(0.1, 0.5, 1.2, -0.75, 0.8)) y = rbinom(5e2, 1, p0true) Df = data.frame(y, X) # Fit logistic model FitLog = glm(y ~ ., Df, family = binomial) # Simulate validation data Xval = replicate(4, rnorm(5e2)) p0true = binomial()$linkinv(cbind(1, Xval) %*% c(0.1, 0.5, 1.2, -0.75, 0.8)) yval = rbinom(5e2, 1, p0true) Pred = binomial()$linkinv(cbind(1, Xval) %*% coef(FitLog)) # Default calibration plot val.prob.ci.2(Pred, yval) # Adding logistic calibration curves and other additional features val.prob.ci.2(Pred, yval, CL.smooth = TRUE, logistic.cal = TRUE, lty.log = 2, col.log = "red", lwd.log = 1.5) val.prob.ci.2(Pred, yval, CL.smooth = TRUE, logistic.cal = TRUE, lty.log = 9, col.log = "red", lwd.log = 1.5, col.ideal = colors()[10], lwd.ideal = 0.5)# Load package library(CalibrationCurves) set.seed(1783) # Simulate training data X = replicate(4, rnorm(5e2)) p0true = binomial()$linkinv(cbind(1, X) %*% c(0.1, 0.5, 1.2, -0.75, 0.8)) y = rbinom(5e2, 1, p0true) Df = data.frame(y, X) # Fit logistic model FitLog = glm(y ~ ., Df, family = binomial) # Simulate validation data Xval = replicate(4, rnorm(5e2)) p0true = binomial()$linkinv(cbind(1, Xval) %*% c(0.1, 0.5, 1.2, -0.75, 0.8)) yval = rbinom(5e2, 1, p0true) Pred = binomial()$linkinv(cbind(1, Xval) %*% coef(FitLog)) # Default calibration plot val.prob.ci.2(Pred, yval) # Adding logistic calibration curves and other additional features val.prob.ci.2(Pred, yval, CL.smooth = TRUE, logistic.cal = TRUE, lty.log = 2, col.log = "red", lwd.log = 1.5) val.prob.ci.2(Pred, yval, CL.smooth = TRUE, logistic.cal = TRUE, lty.log = 9, col.log = "red", lwd.log = 1.5, col.ideal = colors()[10], lwd.ideal = 0.5)
This function evaluates the calibration performance of a model's predicted probabilities whilst accounting for clustering. The function supports multiple approaches ('"default"', '"CGC"', '"MAC2"', '"MIXC"') and returns the results as well as a 'ggplot' object.
valProbCluster( data = NULL, p, y, cluster, plot = TRUE, approach = c("default", "MIXC", "CGC", "MAC2"), cl.level = 0.95, xlab = "Predicted probability", ylab = "Observed proportion", grid_l = 100, rangeGrid = range(p), ... )valProbCluster( data = NULL, p, y, cluster, plot = TRUE, approach = c("default", "MIXC", "CGC", "MAC2"), cl.level = 0.95, xlab = "Predicted probability", ylab = "Observed proportion", grid_l = 100, rangeGrid = range(p), ... )
data |
optional, a data frame containing the variables |
p |
predicted probabilities (numeric vector) or name of the column in |
y |
binary outcome variable or the name of the column in |
cluster |
cluster identifier (factor, character, or integer) or name of the column in |
plot |
logical, indicates whether a plot needs to be produced. If |
approach |
character string specifying which calibration method to use. Must be one of the following:
Defaults to |
cl.level |
the confidence level for the calculation of the confidence intervals. Default is |
xlab |
label for the x-axis of the plot (default is |
ylab |
label for the y-axis of the plot (default is |
grid_l |
integer. Number of points in the probability grid for plotting
(default is |
rangeGrid |
the range of the grid. Default is |
... |
additional arguments to be passed to the selected subfunction
( |
The function internally calls one of the following subfunctions:
CGC(p, y, cluster, plot, ...)
MAC2(p, y, cluster, plot, grid, ...)
MIXC(p, y, cluster, plot, CI, grid, ...)
Extra arguments supplied via the ellipsis argument ... are passed directly to the chosen
subfunction. Please check the additional documentation of
CGC, MAC2 and MIXC for detailed information on the arguments.
An object of class "ClusteredCalibrationCurve" containing:
call: the matched call.
approach: the chosen approach.
cl.level: the confidence level used.
grid: probability grid used for plotting.
ggPlot: a ggplot object if plot = TRUE,
otherwise NULL.
results: results from the chosen subfunction. For
approach = "default", this is a list with two elements:
overall (MAC2 results with the overall curve data) and
clusters (MIXC results with cluster-specific data).
Barreñada, L., De Cock Campo, B., Wynants, L., Van Calster, B. (2025). Clustered Flexible Calibration Plots for Binary Outcomes Using Random Effects Modeling. Research Synthesis Methods. Published online 2025:1-22. doi:10.1017/rsm.2025.10046
library(lme4) data("clustertraindata") data("clustertestdata") mFit = glmer(y ~ x1 + x2 + x3 + x5 + (1 | cluster), data = clustertraindata, family = "binomial") preds = predict(mFit, clustertestdata, type = "response", re.form = NA) y = clustertestdata$y cluster = clustertestdata$cluster valClusterData = data.frame(y = y, preds = preds, center = cluster) # Assess calibration performance Results = valProbCluster( p = valClusterData$preds, y = valClusterData$y, cluster = valClusterData$center, plot = TRUE, approach = "MIXC", method = "slope", grid_l = 100 ) Resultslibrary(lme4) data("clustertraindata") data("clustertestdata") mFit = glmer(y ~ x1 + x2 + x3 + x5 + (1 | cluster), data = clustertraindata, family = "binomial") preds = predict(mFit, clustertestdata, type = "response", re.form = NA) y = clustertestdata$y cluster = clustertestdata$cluster valClusterData = data.frame(y = y, preds = preds, center = cluster) # Assess calibration performance Results = valProbCluster( p = valClusterData$preds, y = valClusterData$y, cluster = valClusterData$center, plot = TRUE, approach = "MIXC", method = "slope", grid_l = 100 ) Results
The function valProbggplot is an adaptation of val.prob from Frank Harrell's rms package,
https://cran.r-project.org/package=rms. Hence, the description of some of the functions of valProbggplot
come from the the original val.prob.
The key feature of valProbggplot is the generation of logistic and flexible calibration curves and related statistics.
When using this code, please cite: Van Calster, B., Nieboer, D., Vergouwe, Y., De Cock, B., Pencina, M.J., Steyerberg,
E.W. (2016). A calibration hierarchy for risk models was defined: from utopia to empirical data. Journal of Clinical Epidemiology,
74, pp. 167-176
valProbggplot( p, y, logit, group, weights = rep(1, length(y)), normwt = FALSE, pl = TRUE, smooth = c("loess", "rcs", "none"), CL.smooth = "fill", CL.BT = FALSE, lty.smooth = 1, col.smooth = "black", lwd.smooth = 1, nr.knots = 5, logistic.cal = FALSE, lty.log = 1, col.log = "black", lwd.log = 1, xlab = "Predicted probability", ylab = "Observed proportion", xlim = c(-0.02, 1), ylim = c(-0.15, 1), m, g, cuts, emax.lim = c(0, 1), legendloc = c(0.5, 0.27), statloc = c(0, 0.85), dostats = TRUE, cl.level = 0.95, method.ci = "pepe", roundstats = 2, riskdist = "predicted", size = 3, size.leg = 5, connect.group = FALSE, connect.smooth = TRUE, g.group = 4, evaluate = 100, nmin = 0, d0lab = "0", d1lab = "1", size.d01 = 5, dist.label = 0.01, line.bins = -0.05, dist.label2 = 0.04, cutoff, length.seg = 0.85, lty.ideal = 1, col.ideal = "red", lwd.ideal = 1, allowPerfectPredictions = FALSE, argzLoess = alist(degree = 2) )valProbggplot( p, y, logit, group, weights = rep(1, length(y)), normwt = FALSE, pl = TRUE, smooth = c("loess", "rcs", "none"), CL.smooth = "fill", CL.BT = FALSE, lty.smooth = 1, col.smooth = "black", lwd.smooth = 1, nr.knots = 5, logistic.cal = FALSE, lty.log = 1, col.log = "black", lwd.log = 1, xlab = "Predicted probability", ylab = "Observed proportion", xlim = c(-0.02, 1), ylim = c(-0.15, 1), m, g, cuts, emax.lim = c(0, 1), legendloc = c(0.5, 0.27), statloc = c(0, 0.85), dostats = TRUE, cl.level = 0.95, method.ci = "pepe", roundstats = 2, riskdist = "predicted", size = 3, size.leg = 5, connect.group = FALSE, connect.smooth = TRUE, g.group = 4, evaluate = 100, nmin = 0, d0lab = "0", d1lab = "1", size.d01 = 5, dist.label = 0.01, line.bins = -0.05, dist.label2 = 0.04, cutoff, length.seg = 0.85, lty.ideal = 1, col.ideal = "red", lwd.ideal = 1, allowPerfectPredictions = FALSE, argzLoess = alist(degree = 2) )
p |
predicted probability |
y |
vector of binary outcomes |
logit |
predicted log odds of outcome. Specify either |
group |
a grouping variable. If numeric this variable is grouped into
|
weights |
an optional numeric vector of per-observation weights (usually frequencies),
used only if |
normwt |
set to |
pl |
|
smooth |
|
CL.smooth |
|
CL.BT |
|
lty.smooth |
the linetype of the flexible calibration curve. Default is |
col.smooth |
the color of the flexible calibration curve. Default is |
lwd.smooth |
the line width of the flexible calibration curve. Default is |
nr.knots |
specifies the number of knots for rcs-based calibration curve. The default as well as the highest allowed value is 5. In case the specified number of knots leads to estimation problems, then the number of knots is automatically reduced to the closest value without estimation problems. |
logistic.cal |
|
lty.log |
if |
col.log |
if |
lwd.log |
if |
xlab |
x-axis label, default is |
ylab |
y-axis label, default is |
xlim, ylim
|
numeric vectors of length 2, giving the x and y coordinates ranges (see |
m |
If grouped proportions are desired, minimum no. observations per group |
g |
If grouped proportions are desired, number of quantile groups |
cuts |
If grouped proportions are desired, actual cut points for constructing
intervals, e.g. |
emax.lim |
Vector containing lowest and highest predicted probability over which to
compute |
legendloc |
if |
statloc |
the "abc" of model performance (Steyerberg et al., 2011)-calibration intercept, calibration slope,
and c statistic-will be added to the plot, using statloc as the upper left corner of a box (default is c(0,.85).
You can specify a list or a vector. Use locator(1) for the mouse, |
dostats |
specifies whether and which performance measures are shown in the figure.
|
cl.level |
if |
method.ci |
method to calculate the confidence interval of the c-statistic. The argument is passed to |
roundstats |
specifies the number of decimals to which the statistics are rounded when shown in the plot. Default is 2. |
riskdist |
Use |
size, size.leg
|
controls the font size of the statistics ( |
connect.group |
Defaults to |
connect.smooth |
Defaults to |
g.group |
number of quantile groups to use when |
evaluate |
number of points at which to store the |
nmin |
applies when |
d0lab, d1lab
|
controls the labels for events and non-events (i.e. outcome y) for the histograms.
Defaults are |
size.d01 |
controls the size of the labels for events and non-events. Default is 5. |
dist.label |
controls the horizontal position of the labels for events and non-events. Default is 0.01. |
line.bins |
controls the horizontal (y-axis) position of the histograms. Default is -0.05. |
dist.label2 |
controls the vertical distance between the labels for events and non-events. Default is 0.03. |
cutoff |
puts an arrow at the specified risk cut-off(s). Default is none. |
length.seg |
controls the length of the histogram lines. Default is |
lty.ideal |
linetype of the ideal line. Default is |
col.ideal |
controls the color of the ideal line on the plot. Default is |
lwd.ideal |
controls the line width of the ideal line on the plot. Default is |
allowPerfectPredictions |
Logical, indicates whether perfect predictions (i.e. values of either 0 or 1) are allowed. Default is |
argzLoess |
a list with arguments passed to the |
When using the predicted probabilities of an uninformative model (i.e. equal probabilities for all observations), the model has no predictive value. Consequently, where applicable, the value of the performance measure corresponds to the worst possible theoretical value. For the ECI, for example, this equals 1 (Edlinger et al., 2022).
An object of type ggplotCalibrationCurve with the following slots:
call |
the matched call. |
ggPlot |
the ggplot object. |
stats |
a vector containing performance measures of calibration. |
cl.level |
the confidence level used. |
Calibration |
contains the calibration intercept and slope, together with their confidence intervals. |
Cindex |
the value of the c-statistic, together with its confidence interval. |
warningMessages |
if any, the warning messages that were printed while running the function. |
CalibrationCurves |
The coordinates for plotting the calibration curves. |
In order to make use (of the functions) of the package auRoc, the user needs to install JAGS. However, since our package only uses the
auc.nonpara.mw function which does not depend on the use of JAGS, we therefore copied the code and slightly adjusted it when
method="pepe".
Edlinger, M, van Smeden, M, Alber, HF, Wanitschek, M, Van Calster, B. (2022). Risk prediction models for discrete ordinal outcomes: Calibration and the impact of the proportional odds assumption. Statistics in Medicine, 41( 8), pp. 1334– 1360
Qin, G., & Hotilovac, L. (2008). Comparison of non-parametric confidence intervals for the area under the ROC curve of a continuous-scale diagnostic test. Statistical Methods in Medical Research, 17(2), pp. 207-21
Steyerberg, E.W., Van Calster, B., Pencina, M.J. (2011). Performance measures for prediction models and markers : evaluation of predictions and classifications. Revista Espanola de Cardiologia, 64(9), pp. 788-794
Van Calster, B., Nieboer, D., Vergouwe, Y., De Cock, B., Pencina M., Steyerberg E.W. (2016). A calibration hierarchy for risk models was defined: from utopia to empirical data. Journal of Clinical Epidemiology, 74, pp. 167-176
Van Hoorde, K., Van Huffel, S., Timmerman, D., Bourne, T., Van Calster, B. (2015). A spline-based tool to assess and visualize the calibration of multiclass risk predictions. Journal of Biomedical Informatics, 54, pp. 283-93
# Load package library(CalibrationCurves) set.seed(1783) # Simulate training data X = replicate(4, rnorm(5e2)) p0true = binomial()$linkinv(cbind(1, X) %*% c(0.1, 0.5, 1.2, -0.75, 0.8)) y = rbinom(5e2, 1, p0true) Df = data.frame(y, X) # Fit logistic model FitLog = glm(y ~ ., Df, family = binomial) # Simulate validation data Xval = replicate(4, rnorm(5e2)) p0true = binomial()$linkinv(cbind(1, Xval) %*% c(0.1, 0.5, 1.2, -0.75, 0.8)) yval = rbinom(5e2, 1, p0true) Pred = binomial()$linkinv(cbind(1, Xval) %*% coef(FitLog)) # Default calibration plot valProbggplot(Pred, yval) # Adding logistic calibration curves and other additional features valProbggplot(Pred, yval, CL.smooth = TRUE, logistic.cal = TRUE, lty.log = 2, col.log = "red", lwd.log = 1.5) valProbggplot(Pred, yval, CL.smooth = TRUE, logistic.cal = TRUE, lty.log = 9, col.log = "red", lwd.log = 1.5, col.ideal = colors()[10], lwd.ideal = 0.5)# Load package library(CalibrationCurves) set.seed(1783) # Simulate training data X = replicate(4, rnorm(5e2)) p0true = binomial()$linkinv(cbind(1, X) %*% c(0.1, 0.5, 1.2, -0.75, 0.8)) y = rbinom(5e2, 1, p0true) Df = data.frame(y, X) # Fit logistic model FitLog = glm(y ~ ., Df, family = binomial) # Simulate validation data Xval = replicate(4, rnorm(5e2)) p0true = binomial()$linkinv(cbind(1, Xval) %*% c(0.1, 0.5, 1.2, -0.75, 0.8)) yval = rbinom(5e2, 1, p0true) Pred = binomial()$linkinv(cbind(1, Xval) %*% coef(FitLog)) # Default calibration plot valProbggplot(Pred, yval) # Adding logistic calibration curves and other additional features valProbggplot(Pred, yval, CL.smooth = TRUE, logistic.cal = TRUE, lty.log = 2, col.log = "red", lwd.log = 1.5) valProbggplot(Pred, yval, CL.smooth = TRUE, logistic.cal = TRUE, lty.log = 9, col.log = "red", lwd.log = 1.5, col.ideal = colors()[10], lwd.ideal = 0.5)
Plot a calibration curve for a Cox Proportional Hazards model
valProbSurvival( fit, valdata, weights = NULL, alpha = 0.05, timeHorizon = 5, nk = 3, plotCal = c("none", "base", "ggplot"), addCox = FALSE, addRCS = TRUE, CL.cox = c("fill", "line"), CL.rcs = c("fill", "line"), xlab = "Predicted probability", ylab = "Observed proportion", xlim = c(-0.02, 1), ylim = c(-0.15, 1), lty.ideal = 1, col.ideal = "red", lwd.ideal = 1, lty.cox = 1, col.cox = "grey", lwd.cox = 1, fill.cox = "lightgrey", lty.rcs = 1, col.rcs = "black", lwd.rcs = 1, fill.rcs = rgb(177, 177, 177, 177, maxColorValue = 255), riskdist = "predicted", d0lab = "0", d1lab = "1", size.d01 = 5, dist.label = 0.01, line.bins = -0.05, dist.label2 = 0.04, length.seg = 0.85, legendloc = c(0.5, 0.27) )valProbSurvival( fit, valdata, weights = NULL, alpha = 0.05, timeHorizon = 5, nk = 3, plotCal = c("none", "base", "ggplot"), addCox = FALSE, addRCS = TRUE, CL.cox = c("fill", "line"), CL.rcs = c("fill", "line"), xlab = "Predicted probability", ylab = "Observed proportion", xlim = c(-0.02, 1), ylim = c(-0.15, 1), lty.ideal = 1, col.ideal = "red", lwd.ideal = 1, lty.cox = 1, col.cox = "grey", lwd.cox = 1, fill.cox = "lightgrey", lty.rcs = 1, col.rcs = "black", lwd.rcs = 1, fill.rcs = rgb(177, 177, 177, 177, maxColorValue = 255), riskdist = "predicted", d0lab = "0", d1lab = "1", size.d01 = 5, dist.label = 0.01, line.bins = -0.05, dist.label2 = 0.04, length.seg = 0.85, legendloc = c(0.5, 0.27) )
fit |
the model fit, has to be of type |
valdata |
the validation data set |
weights |
vector of case weights |
alpha |
the significance level |
timeHorizon |
the time point at which the predictions have to be evaluated |
nk |
the number of knots, for the restricted cubic splines fit |
plotCal |
indicates if and how the calibration curve has to be plotted.
|
addCox |
logical, indicates if the Cox's estimated calibration curve has to be added to the plot |
addRCS |
logical, indicates if the restricted cubic splines' (RCS) estimated calibration curve has to be added to the plot |
CL.cox |
|
CL.rcs |
|
xlab |
x-axis label, default is |
ylab |
y-axis label, default is |
xlim, ylim
|
numeric vectors of length 2, giving the x and y coordinates ranges (see |
lty.ideal |
linetype of the ideal line. Default is |
col.ideal |
controls the color of the ideal line on the plot. Default is |
lwd.ideal |
controls the line width of the ideal line on the plot. Default is |
lty.cox |
if |
col.cox |
if |
lwd.cox |
if |
fill.cox |
if |
lty.rcs |
if |
col.rcs |
if |
lwd.rcs |
if |
fill.rcs |
if |
riskdist |
Use |
d0lab, d1lab
|
controls the labels for events and non-events (i.e. outcome y) for the histograms.
Defaults are |
size.d01 |
controls the size of the labels for events and non-events. Default is 5 and
this value is multiplied by 0.25 when |
dist.label |
controls the horizontal position of the labels for events and non-events. Default is 0.04. |
line.bins |
controls the horizontal (y-axis) position of the histograms. Default is -0.05. |
dist.label2 |
controls the vertical distance between the labels for events and non-events. Default is 0.03. |
length.seg |
controls the length of the histogram lines. Default is |
legendloc |
if |
An object of type SurvivalCalibrationCurves with the following slots:
call |
the matched call. |
stats |
a list containing performance measures of calibration. |
alpha |
the significance level used. |
Calibration |
contains the estimated calibration slope, together with their confidence intervals. |
CalibrationCurves |
The coordinates for plotting the calibration curves. |
van Geloven N, Giardiello D, Bonneville E F, Teece L, Ramspek C L, van Smeden M et al. (2022). Validation of prediction models in the presence of competing risks: a guide through modern methods. BMJ, 377:e069249, doi:10.1136/bmj-2021-069249
## Not run: library(CalibrationCurves) library(survival) data(trainDataSurvival) data(testDataSurvival) sFit = coxph(Surv(ryear, rfs) ~ csize + cnode + grade3, data = trainDataSurvival, x = TRUE, y = TRUE) calPerf = valProbSurvival(sFit, testDataSurvival, plotCal = "base", nk = 5) ## End(Not run)## Not run: library(CalibrationCurves) library(survival) data(trainDataSurvival) data(testDataSurvival) sFit = coxph(Surv(ryear, rfs) ~ csize + cnode + grade3, data = trainDataSurvival, x = TRUE, y = TRUE) calPerf = valProbSurvival(sFit, testDataSurvival, plotCal = "base", nk = 5) ## End(Not run)