Introduction
The assessment of risk, or predictive modelling, is a fundamental strategy across nearly all medical disciplines [1–3], including respiratory conditions like community-acquired pneumonia [4], asthma [5] and COPD [6]. Similarly, risk assessment has implications at several levels of intensive care medicine. For example, at the population level, predictive modelling can support intensive care unit (ICU) benchmarking by assessing whether observed mortality matches predicted mortality. Furthermore, predictive models could be employed for personalised applications, ranging from identifying patients at risk for preventive measures and clinical trials, through ICU allocation strategies in resource-limited settings, to severity-tailored treatment regimens.
Numerous intensive care scoring systems, such as the Acute Physiology and Chronic Health Evaluation (APACHE) score [7] and the Simplified Acute Physiology Score (SAPS) [8], have been developed over the last decades, many of which have been validated across multiple patient populations [9]. Subsequently, several risk tools have been updated with improved algorithms and additional predictors [10, 11]. More recently, with the surge of big data and machine learning, new well-performing algorithms have been proposed [12, 13].
Because score performance is limited and there is little evidence supporting severity-guided treatment approaches, physicians rarely use ICU scores on individual patients. Still, to inform relatives, physicians subjectively estimate patient outcomes based on clinical parameters, experience and personal factors. Whether accurate or not, these subjective estimates may affect treatment decisions, such as life support limitations [14]. While subjective clinical assessments are easy to obtain and may capture pathophysiological features that are difficult to measure otherwise, they are also prone to bias.
This prospective international study addresses the strengths and limitations of subjective and objective survival prediction markers in mechanically ventilated critically ill patients. We assess potential predictors across groups of patient characteristics, diseases and biomarkers individually, combine them in models and validate these models for predicting short- and long-term outcomes. Finally, we propose how combining subjective probability estimates with objective markers can unite the strengths of different prognostic assessments.
Discussion
This study highlights the strengths and limitations of different ICU risk assessment strategies. While objective predictive measures are generally preferable, prognostic models lack reproducibility and do not sufficiently predict outcome. Subjective estimates of the clinical staff, presumably frequently used on individual patients, perform similarly well; however, subjective assessments overestimate death, potentially affecting patient information and medical decision making. We conclude that subjective individual high-risk estimates need to be interpreted with caution and should be compared to objective risk measures.
Our first goal was to characterise different predictors individually, thereby assessing the contribution of particular pathophysiological aspects to outcome. We identified two groups of predictors. Predictors of the first group were poorly associated with other predictors and were considered more specific disease markers, reflecting distinct pathophysiological states. Examples are liver disorders or poor oxygenation (pO2/FiO2), which contribute to outcome if present but do not capture prognosis over a wide range of disease states. Predictors of the second group were closely related to one another. Rather than reflecting a narrow disease spectrum, they cover a broader range of disease entities and their severity. Several predictors within this group were related to kidney function, such as the requirement for dialysis, urea and urinary output. These markers performed exceptionally well, emphasising the already well-established role of renal failure in outcome [21, 22]. Other predictors of this group were the newer biomarkers proADM and proANP, which have previously been shown to predict outcome across different acute and chronic diseases [17, 18, 20].
Single parameters do not perform sufficiently well and are commonly combined in prognostic prediction models. However, proposed models are often overly optimistic, primarily because of a small number of outcomes, a large number of predictors and data-driven feature selection, a problem referred to as overfitting [23]. We tried to overcome these statistical limitations with several strategies. First, we used different feature selection approaches and compared their performance in order to evaluate the contribution of modelling. Furthermore, we validated all models in a predefined independent cohort using the model performance measures of discrimination and calibration. We observed that several models adequately discriminated survivors from non-survivors in the development cohort. However, despite a resampling step, the performance of all models declined considerably from the development to the validation cohort. Only the models including biomarkers or all predictors had a moderate performance in the validation cohort. In contrast, the pooled markers of patient characteristics, diseases and treatment were poorly or not predictive. This finding stresses the limited reliability of internal cross-validation and highlights the importance of independent validation. A particular goal of our study was to address long-term outcome. It is well known that several acute diseases contribute to mortality beyond the immediate phase and the hospital stay. However, new unrelated events are more likely to occur during a more extended prediction period and might restrict prediction. Surprisingly, model performance improved from 28-day to 1-year survival, indicating that the initial event leading to mechanical ventilation contributes significantly to mortality beyond the acute stage.
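The two performance measures named above can be illustrated with a minimal sketch. It is not the study's analysis code: the function names and the toy risks and outcomes are made up for illustration. Discrimination is computed as the area under the ROC curve (the probability that a non-survivor receives a higher predicted risk than a survivor), and calibration is summarised crudely as mean predicted risk minus observed mortality.

```python
from itertools import product


def auc(risks, died):
    """Discrimination: chance that a randomly chosen non-survivor has a higher
    predicted risk of death than a randomly chosen survivor (ties count half)."""
    deaths = [r for r, d in zip(risks, died) if d == 1]
    survivors = [r for r, d in zip(risks, died) if d == 0]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    score = 0.0
    for r_death, r_surv in product(deaths, survivors):
        if r_death > r_surv:
            score += 1.0
        elif r_death == r_surv:
            score += 0.5
    return score / (len(deaths) * len(survivors))


def calibration_in_the_large(risks, died):
    """Calibration summary: mean predicted risk minus observed mortality.
    A positive value means that death is overestimated on average."""
    return sum(risks) / len(risks) - sum(died) / len(died)


# Hypothetical predicted risks of death and outcomes (1 = died, 0 = survived).
dev_risks = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
dev_died = [1, 1, 1, 0, 0, 0]
val_risks = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
val_died = [1, 0, 1, 0, 1, 0]

print("development AUC:", auc(dev_risks, dev_died))
print("validation AUC:", round(auc(val_risks, val_died), 3))
print("validation calibration-in-the-large:",
      round(calibration_in_the_large(val_risks, val_died), 3))
```

In this toy example the same predicted risks separate survivors from non-survivors perfectly in the development data but less well in the validation data, which mirrors the kind of decline described above.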
While these models provide insights into the pathophysiological compartments related to outcome, they did not outperform SAPS2 and their performance was insufficient to recommend their use in individual patients. However, we argue that individual risk assessments are frequently performed to inform relatives and guide treatment decisions, such as deciding whether to initiate resuscitation procedures, life support treatments or palliative care [14]. Most commonly, ICU physicians subjectively estimate individual patient risk rather than using objective tools. We investigated whether subjective survival estimates could provide additional prognostic information. We observed that these estimates performed similarly to more complicated objective prediction models. Junior physicians in particular performed well in discriminating survivors from non-survivors, whereas the performance of nurses was slightly lower. While nurses have better information on the patient's social history and life, they mostly do not know all details of the patient's examinations [14]. Both factors could negatively influence the accuracy of their estimates. Regardless of the discriminative performance, all clinical staff overestimated death in high-risk patients. Death was overestimated by nurses, junior doctors and attending physicians, in medical and surgical patients, in older and younger patients, and at different study centres. Overestimation of death is of particular concern since treatment and life support may be withheld from patients who are estimated to be at the highest risk for death [14, 24, 25].
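How overestimation in high-risk patients shows up in a calibration check can be sketched as follows. This is an illustrative example only: the band cutoffs (0.33 and 0.66), the function name and the data are assumptions, not values from the study. Patients are grouped by the estimated risk of death, and the mean estimate in each band is compared with the observed mortality.

```python
def calibration_by_band(estimates, died, cutoffs=(0.33, 0.66)):
    """Compare mean estimated risk with observed mortality within
    low, moderate and high bands (cutoffs are hypothetical)."""
    low_cut, high_cut = cutoffs
    bands = {"low": [], "moderate": [], "high": []}
    for est, outcome in zip(estimates, died):
        if est < low_cut:
            bands["low"].append((est, outcome))
        elif est < high_cut:
            bands["moderate"].append((est, outcome))
        else:
            bands["high"].append((est, outcome))

    summary = {}
    for name, rows in bands.items():
        if not rows:
            continue  # skip empty bands rather than dividing by zero
        mean_estimate = sum(e for e, _ in rows) / len(rows)
        observed = sum(o for _, o in rows) / len(rows)
        summary[name] = {
            "n": len(rows),
            "mean_estimate": mean_estimate,
            "observed": observed,
            # positive gap: death was overestimated in this band
            "gap": mean_estimate - observed,
        }
    return summary


# Made-up subjective estimates of death and outcomes (1 = died, 0 = survived).
estimates = [0.1, 0.2, 0.1, 0.2, 0.1, 0.5, 0.4, 0.9, 0.8, 0.9, 0.8]
died = [0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

result = calibration_by_band(estimates, died)
for band, stats in result.items():
    print(band, {key: round(value, 2) for key, value in stats.items()})
```

In the toy data the estimates are close to the observed mortality in the low and moderate bands, while the high band shows a large positive gap, the pattern described above as overestimation of death.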
In order to minimise misclassification and poor calibration, we combined subjective and objective prediction tools. We demonstrate that objective tools can refine subjective risk estimates. Objective measures identified patients at lower risk within different subjective groups and improved calibration of subjective assessments. While subjective low-risk estimates are relatively accurate and moderate-risk estimates presumably have a minor impact on clinical management, the focus should be on subjective high-risk estimates. High-risk estimates need to be interpreted cautiously and, if possible, compared to objective risk tools. Whereas concordant subjective and objective prediction measures may reinforce the evaluation, discordant results should prompt a re-evaluation of the assessment.
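The comparison of subjective high-risk estimates with an objective tool could look roughly like the sketch below. It is a hypothetical illustration and not the combination method used in the study: the function name, the high-risk cutoff (0.66) and the tolerated gap (0.25) are all assumed values, and the objective risk stands for any predicted mortality from an objective score.

```python
def review_flag(subjective_risk, objective_risk, high_cut=0.66, max_gap=0.25):
    """Label a subjective estimate of death by comparing it with an objective
    predicted risk. Thresholds are illustrative assumptions only."""
    for value in (subjective_risk, objective_risk):
        if not 0.0 <= value <= 1.0:
            raise ValueError("risks must be probabilities between 0 and 1")
    if subjective_risk < high_cut:
        # Low and moderate subjective estimates are not the focus here.
        return "not high-risk"
    if abs(subjective_risk - objective_risk) <= max_gap:
        # Both assessments agree, which may reinforce the evaluation.
        return "concordant high-risk"
    # Disagreement between the two assessments should prompt a second look.
    return "discordant: re-examine"


# Hypothetical patients: (subjective estimate, objective predicted risk).
patients = [(0.9, 0.8), (0.9, 0.4), (0.3, 0.8)]
for subjective, objective in patients:
    print(subjective, objective, review_flag(subjective, objective))
```

The design choice is deliberately conservative: the flag never overrides either assessment, it only marks high-risk estimates where the two sources disagree.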
We have to report several limitations of our study. Not all eligible ICU patients were screened throughout the study period, which creates a risk of selection bias. However, patient inclusion was mainly driven by the availability of study personnel, and given that the study population was extremely diverse, it is unlikely that a minor patient selection would have a strong effect on predictive markers. The assessed risk may have an impact on treatment, and high-risk assessments in particular may lead to withdrawing or withholding treatments (self-fulfilling prophecy). Since we have no details on withholding or withdrawing treatments, we cannot exclude that risk assessments changed outcomes in single patients and may have slightly increased the performance of predictions. Importantly, however, if a self-fulfilling prophecy occurred, it would have decreased the observed overestimation of death, and the true overestimation of death would have been even higher. Many risk assessment tools exist to benchmark ICUs. Since most scores were not routinely assessed at our study centres, comparing and assessing many predictive scores was beyond the scope and goal of this study. We focussed on mechanically ventilated patients since they are at the highest risk for death. Therefore, we do not know if our findings can be translated to non-ventilated ICU patients. However, since the study population covered many disease entities and severities, several results could also apply to the ICU population not requiring mechanical ventilation. Finally, subjective risk estimates are driven by patient and physician factors, with variable relevance. Therefore, survival perceptions probably vary across nurses and physicians and very likely across countries. We do not know if clinicians discriminate survivors from non-survivors equally well in different hospitals and if the overestimation of death occurs globally. Therefore, our findings need to be validated in larger international cohorts, including patients and clinicians of multiple backgrounds.
To summarise, we report several findings on risk assessment in mechanically ventilated ICU patients. We identify specialised and more general predictors, which capture more specific or broader pathophysiological mechanisms related to outcome, respectively. We assessed different groups of objective markers for predicting short- and long-term outcomes and showed that performance declines from the development to the validation cohort, with the best combinations not outperforming SAPS2. Finally, we demonstrate that all clinical staff overestimate death and propose combining subjective and objective tools to identify misclassified patients.