Explorative Data Analysis of the National Pandemic Cohort Network (NAPKON) Public Data Set¶

Welcome! The following statistics provide some visual insights into the NAPKON Public Data Set. The Public Data Set comprises patient data from NAPKON after a data cleaning process and includes data from patients documented until September 28, 2023. It originates from the NAPKON SecuTrial. The data anonymization pipeline is described by Jakob et al. in "Design and evaluation of a data anonymization pipeline to promote Open Science on COVID-19". The public data is anonymized according to our "data protection concept". The anonymization process was carried out with the "ARX software".

Copyright: This work is licensed under the Creative Commons Attribution Non-Commercial 4.0 License. With the use of this data you agree to include a proper acknowledgement of the NAPKON study group in any work based on the data set. By working with this notebook you agree to maintain the confidentiality of the data set at all times and to not attempt to compromise or otherwise violate the privacy of the patients described. To view a copy of the license, visit https://creativecommons.org/licenses/by-nc/4.0/.

If you have any comments on the notebook, please drop us a message at support@napkon.de.

Data Set Structure¶

Here we provide information on the basic structure of the NAPKON Public Data Set.

The data set consists of 5988 patients and 16 variables. A row represents anonymized data of a single patient.

The columns are described by the variables:

  • age (categorical): age group
  • gender (categorical): gender
  • month_first_diagnosis (categorical): quarter of first confirmed diagnosis of COVID-19
  • year_first_diagnosis (categorical): year of first confirmed diagnosis of COVID-19
  • cohort (categorical): cohort in which the patient was included
  • mild_disease_phase (categorical): yes if patient was in the Mild Phase*
  • moderate_disease_phase (categorical): yes if patient was in the Moderate Phase*
  • severe_disease_phase (categorical): yes if patient was in the Severe Phase*
  • patient_status_at_end_of_acute_phase (categorical): status of the patient at the end of the acute phase of SARS-CoV-2
  • hospitalization_during_acute_phase (categorical): patient hospitalized during the acute phase of SARS-CoV-2
  • intensive_care_treatment (categorical): patient with intensive care treatment during the acute phase of SARS-CoV-2
  • inv_ventilation (categorical): patient with invasive ventilation during the acute phase of SARS-CoV-2
  • availability_3month_followup (categorical): completion of the 3-month follow-up
  • ability_to_work_3MFU (categorical): patient is able to work at the time of the 3-month follow-up
  • any_symptom_3MFU (categorical): patient has any symptoms at the 3-month follow-up

For further information regarding the variables please refer to: https://cloud.idcohorts.net/s/ELRzpzgBkKb5ejY and https://cloud.napkon.de/s/W2PQpmqzoRkjABL

*The Clinical Phases are defined according to the WHO clinical progression scale:

[Figure: WHO clinical progression scale (41385_2021_464_Fig1_HTML.png)]

To help you get to know the Public Data Set, the values that each variable takes in this data set are listed below. Please be aware that the Public Data Set is only a part of the complete NAPKON data set. Anonymization may leave variables with fewer distinct values than in the complete NAPKON data set. For example, the variable 'gender' can also have the value 'diverse', but there is no patient with this gender in the Public Data Set.

age:
18 - 39 years, 40 - 59 years, 60 - 79 years, >= 80 years

gender:
female, male

quarter_first_diagnosis:
Q1, Q2, Q3, Q4, unknown/missing

year_first_diagnosis:
2020, 2021, 2022, 2023, unknown/missing

cohort:
HAP, POP, SUEP

mild_phase:
no, yes

moderate_phase:
no, yes

severe_phase:
no, unknown/missing, yes

patient_status_end_acute_phase:
ambulant, dead, discharged, referral/transfer, unknown/missing

hospitalisation:
no, yes

intensive_care_treatment:
no, yes

inv_ventilation:
no, unknown/missing, yes

availability_3month_followup:
no/not yet, yes

ability_to_work_3MFU:
n/a, no, unknown/missing, yes

any_symptom_3MFU:
n/a, no, unknown/missing, yes

n/a: If a patient was not in the phase that a variable refers to, the variable is given the value 'Not applicable (n/a)'. For example, if a patient has not completed the 3-month follow-up, 'any symptoms at the 3-month follow-up' is not applicable to this patient.
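A listing like the one above can be produced by collecting the distinct values of each column. The following is a minimal sketch using only the Python standard library. The three toy records are invented for illustration and are not taken from the data set, and the helper name `distinct_values` is hypothetical.

```python
# Toy records (invented for illustration, not real patient data).
records = [
    {"gender": "female", "inv_ventilation": "no", "any_symptom_3MFU": "n/a"},
    {"gender": "male", "inv_ventilation": "unknown/missing", "any_symptom_3MFU": "yes"},
    {"gender": "female", "inv_ventilation": "yes", "any_symptom_3MFU": "no"},
]


def distinct_values(rows):
    """Return {column: sorted list of distinct values} for a list of dict rows."""
    values = {}
    for row in rows:
        for column, value in row.items():
            values.setdefault(column, set()).add(value)
    return {column: sorted(found) for column, found in values.items()}


# Print each column name followed by its values, as in the listing above.
for column, found in distinct_values(records).items():
    print(f"{column}:")
    print(", ".join(found))
```

Sorting the values alphabetically gives the same order as the listing above, for example "no, unknown/missing, yes".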

1. Descriptive Analysis¶

The following descriptive statistics are computed in this section:

  • Cohort Distribution
  • Date of diagnosis Distribution by Cohort
  • Gender Distribution by Cohort
  • Age Distribution by Cohort
  • Age - Gender Distribution by Cohort

The number of patients from HAP is lower than from SUEP and POP. Therefore, please pay attention to the differences in scaling, especially for HAP.

The total number of patients is 5988.
Only cases with complete documentation and Review A status are considered.

The number of patients for SUEP is 2164.
The number of patients for POP is 3509.
The number of patients for HAP is 315.
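To put the difference in cohort sizes into numbers, the reported counts can be converted into shares of the total. This is a small sketch based only on the three counts stated above. The variable names are chosen for illustration.

```python
# Cohort sizes as reported in the text above.
cohort_counts = {"SUEP": 2164, "POP": 3509, "HAP": 315}

# The three cohorts should add up to the total number of patients.
total = sum(cohort_counts.values())

# Share of each cohort in percent, rounded to two decimals.
shares = {cohort: round(100 * n / total, 2) for cohort, n in cohort_counts.items()}

print(total)
for cohort, share in shares.items():
    print(f"{cohort}: {share} %")
```

HAP accounts for only about 5 % of all patients, which is why its plots use a different scale.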

2. Patient status at the end of acute phase¶

The following descriptive statistics on the health status at the end of medical consultation are computed in this section:

  • Frequency of Health Status at the End of Medical Consultation
  • Hospitalisation in the acute phase
  • Intensive care treatment in the acute phase
  • Invasive ventilation in the acute phase
  • COVID-19 Mortality and Recovery Rates
  • COVID-19 Mortality vs Recovery by Age
  • COVID-19 Mortality vs Recovery by Gender
  • Crosstable Mortality vs Recovery vs Gender vs Age

Note that we will use a filtered data set for computing the rates, which we describe below.

Frequency of Health Status at the End of Medical Consultation¶

                    SUEP   POP   HAP
discharged          1316   136   239
ambulant             333  3217     0
referral/transfer    296     0    18
dead                 124     0    46
unknown/missing       95   156    12
palliative care        0     0     0
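As a quick consistency check, the columns of the frequency table can be summed and compared with the cohort sizes reported in section 1. The sketch below copies the counts from the table above. The dictionary layout is an assumption made for illustration.

```python
# Counts copied from the frequency table above (rows: status, columns: cohort).
status_counts = {
    "discharged": {"SUEP": 1316, "POP": 136, "HAP": 239},
    "ambulant": {"SUEP": 333, "POP": 3217, "HAP": 0},
    "referral/transfer": {"SUEP": 296, "POP": 0, "HAP": 18},
    "dead": {"SUEP": 124, "POP": 0, "HAP": 46},
    "unknown/missing": {"SUEP": 95, "POP": 156, "HAP": 12},
    "palliative care": {"SUEP": 0, "POP": 0, "HAP": 0},
}

# Cohort sizes as reported in section 1.
cohort_sizes = {"SUEP": 2164, "POP": 3509, "HAP": 315}

# Sum each cohort column over all status rows.
column_totals = {
    cohort: sum(row[cohort] for row in status_counts.values())
    for cohort in cohort_sizes
}

print(column_totals)
print(column_totals == cohort_sizes)
```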

Hospitalisation in the acute phase¶

The number of patients for SUEP is 2164.
The number of patients for POP is 3509.
The number of patients for HAP is 315.

Intensive care treatment in the acute phase¶

The number of patients for SUEP is 2164.
The number of patients for POP is 3509.
The number of patients for HAP is 315.

Invasive ventilation in the acute phase¶

The number of patients for SUEP is 2164.
The number of patients for POP is 3509.
The number of patients for HAP is 315.

For the remainder of section 2 we proceed with a filtered data set.¶

For the COVID-19 mortality and recovery rate computations, we exclude patients with a documented health status at the end of medical consultation of 'unknown/missing'. Please note that this influences the following computations and plots.

The number of patients in the filtered data set for SUEP is 2069.
The number of patients in the filtered data set for POP is 3353.
The number of patients in the filtered data set for HAP is 303.
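The filter described above can be sketched as follows. The example rebuilds status-only pseudo-records for HAP from the frequency table and then drops the 'unknown/missing' rows. The key name `status` and the helper `drop_unknown_status` are hypothetical and not taken from the notebook.

```python
def drop_unknown_status(rows, status_key="status", unknown="unknown/missing"):
    """Keep only rows whose status is documented."""
    return [row for row in rows if row[status_key] != unknown]


# HAP counts copied from the frequency table above.
hap_counts = {
    "discharged": 239,
    "referral/transfer": 18,
    "dead": 46,
    "unknown/missing": 12,
}

# Expand the counts into one pseudo-record per patient (status only).
hap_rows = [
    {"cohort": "HAP", "status": status}
    for status, n in hap_counts.items()
    for _ in range(n)
]

filtered_rows = drop_unknown_status(hap_rows)
print(len(hap_rows), len(filtered_rows))
```

The same subtraction gives 2164 - 95 = 2069 for SUEP and 3509 - 156 = 3353 for POP.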

Frequency of Health Status at the End of Medical Consultation in the Filtered Data Set¶

           SUEP   POP   HAP
recovery   1945  3353   257
dead        124     0    46

COVID-19 Overall Mortality and Recovery Rate for Filtered Data Set:¶

SUEP
COVID-19 Overall Mortality Rate: 5.99 %
COVID-19 Overall Recovery Rate: 94.01 %

POP
All POP patients survived at least 6 to 12 months after first diagnosis.

HAP
COVID-19 Overall Mortality Rate: 15.18 %
COVID-19 Overall Recovery Rate: 84.82 %
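The rates above follow directly from the filtered counts: the mortality rate is the number of deaths divided by the number of patients in the filtered cohort, and the recovery rate is its complement. The 'recovery' counts appear to combine the statuses discharged, ambulant and referral/transfer, since for SUEP 1316 + 333 + 296 = 1945. A minimal sketch, with the function name `rates` chosen for illustration:

```python
# Counts copied from the filtered frequency table above.
filtered_counts = {
    "SUEP": {"recovery": 1945, "dead": 124},
    "POP": {"recovery": 3353, "dead": 0},
    "HAP": {"recovery": 257, "dead": 46},
}


def rates(counts):
    """Return (mortality %, recovery %) for one cohort, rounded to two decimals."""
    total = counts["recovery"] + counts["dead"]
    mortality = 100 * counts["dead"] / total
    return round(mortality, 2), round(100 - mortality, 2)


for cohort, counts in filtered_counts.items():
    mortality, recovery = rates(counts)
    print(f"{cohort}: mortality {mortality} %, recovery {recovery} %")
```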

COVID-19 Mortality/Recovery for Filtered Data Set by Age:¶

All POP patients survived at least 6 to 12 months after first diagnosis. Thus POP is not shown in the following graphic.

SUEP

age             Recovery rate (%)  Mortality rate (%)
18 - 39 years               98.97                1.03
40 - 59 years               97.13                2.87
60 - 79 years               89.38               10.62
>= 80 years                 88.82               11.18

HAP

age             Recovery rate (%)  Mortality rate (%)
18 - 39 years               91.67                8.33
40 - 59 years               89.35               10.65
60 - 79 years               77.87               22.13
>= 80 years                  0.00                0.00

COVID-19 Mortality/Recovery for Filtered Data Set by Gender:¶

All POP patients survived at least 6 to 12 months after first diagnosis. Thus POP is not shown in the following graphic.

SUEP

gender  Recovery rate (%)  Mortality rate (%)
female              95.76                4.24
male                92.85                7.15

HAP

gender  Recovery rate (%)  Mortality rate (%)
female              85.42               14.58
male                84.71               15.29
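The by-age and by-gender tables above are the same rate computation applied within each group. One detail deserves care: a row with 0.00 in both columns, such as HAP's '>= 80 years', cannot be a pair of rates, since the two should add up to 100. One possible explanation is a group without patients. The sketch below uses invented toy rows and the hypothetical key `outcome`, and returns None for empty groups instead of zeros.

```python
from collections import defaultdict


def rates_by_group(rows, group_key, groups, outcome_key="outcome"):
    """Return {group: (recovery %, mortality %)}, or None for groups without patients."""
    tally = defaultdict(lambda: {"recovery": 0, "dead": 0})
    for row in rows:
        tally[row[group_key]][row[outcome_key]] += 1

    result = {}
    for group in groups:
        recovered, died = tally[group]["recovery"], tally[group]["dead"]
        n = recovered + died
        if n == 0:
            result[group] = None  # no patients: the rates are undefined
        else:
            result[group] = (round(100 * recovered / n, 2), round(100 * died / n, 2))
    return result


# Toy rows (invented for illustration, not real patient data).
toy_rows = (
    [{"age": "18 - 39 years", "outcome": "recovery"}] * 3
    + [{"age": "18 - 39 years", "outcome": "dead"}]
    + [{"age": "60 - 79 years", "outcome": "recovery"}] * 2
)
age_groups = ["18 - 39 years", "60 - 79 years", ">= 80 years"]

by_age = rates_by_group(toy_rows, "age", age_groups)
print(by_age)
```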

Crosstable Mortality vs Recovery vs Gender vs Age (percentage of patients in each of the subgroups)¶

SUEP

patient_status_end_acute_phase  discharged  ambulant  referral/transfer   dead     All
gender  age
female  18 - 39 years                 4.93      3.96               0.43   0.05    9.38
        40 - 59 years                 9.71      3.24               1.45   0.14   14.55
        60 - 79 years                 8.60      0.58               2.27   1.30   12.76
        >= 80 years                   2.17      0.00               0.82   0.19    3.19
male    18 - 39 years                 5.94      2.80               0.48   0.14    9.38
        40 - 59 years                14.02      4.25               3.29   0.92   22.47
        60 - 79 years                15.95      1.16               4.40   2.61   24.12
        >= 80 years                   2.27      0.10               1.16   0.63    4.16
All                                  63.61     16.09              14.31   5.99  100.00

HAP

patient_status_end_acute_phase  discharged  referral/transfer   dead     All
gender  age
female  40 - 59 years                 9.24               0.99   0.99   11.22
        60 - 79 years                 2.97               0.33   1.32    4.62
male    40 - 59 years                37.29               2.31   4.95   44.55
        60 - 79 years                25.74               2.31   7.59   35.64
        18 - 39 years                 3.63               0.00   0.33    3.96
All                                  78.88               5.94  15.18  100.00
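In the crosstables above, each cell is the number of patients in a subgroup divided by the total number of patients in the filtered cohort, and the "All" column is the share of each gender and age group. A minimal sketch of that computation follows. The toy rows are invented for illustration and the helper `percent_crosstab` is hypothetical.

```python
from collections import Counter


def percent_crosstab(rows, row_keys, col_key):
    """Return (cells, row_margins) with every value as a percentage of all rows."""
    total = len(rows)
    # Count patients per (row label, column value) cell.
    cells = Counter((tuple(r[k] for k in row_keys), r[col_key]) for r in rows)
    table = {cell: round(100 * n / total, 2) for cell, n in cells.items()}
    # Row margins correspond to the "All" column.
    row_counts = Counter(tuple(r[k] for k in row_keys) for r in rows)
    margins = {label: round(100 * n / total, 2) for label, n in row_counts.items()}
    return table, margins


# Toy rows (invented for illustration, not real patient data).
toy_rows = (
    [{"gender": "female", "age": "40 - 59 years", "status": "discharged"}] * 3
    + [{"gender": "female", "age": "40 - 59 years", "status": "dead"}]
    + [{"gender": "male", "age": "60 - 79 years", "status": "discharged"}] * 2
    + [{"gender": "male", "age": "60 - 79 years", "status": "referral/transfer"}]
    + [{"gender": "male", "age": "60 - 79 years", "status": "dead"}]
)

table, margins = percent_crosstab(toy_rows, ("gender", "age"), "status")
for cell, pct in sorted(table.items()):
    print(cell, pct)
print(margins)
```

Because every cell is rounded separately, the cells of a row may differ from its "All" value by a few hundredths, as in the tables above.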

3. Clinical Phases¶

From here on we will indicate the three clinical phases as

  • Mild Phase
  • Moderate Phase
  • Severe Phase

In the following we will plot the:

  • Frequency of Phases

Since there might be patients who have no phase documented at all, we need to proceed with a filtered data set in which those patients are dropped.

The number of patients with at least one documented phase (mild, moderate or severe) in this filtered data set is 5988.

  • HAP: 315.
  • POP: 3509.
  • SUEP: 2164.

Maximum phase reached by the patients.

cohort  max_phase  count
HAP     moderate     189
        severe       126
POP     mild        3328
        moderate     140
        severe        41
SUEP    mild         333
        moderate    1415
        severe       416
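The maximum phase can be derived from the three yes/no phase variables by taking the most severe phase that is marked 'yes'. The sketch below is one way to do this, assuming that 'unknown/missing' counts as not reached. The toy patients and the short keys `mild`, `moderate` and `severe` are invented for illustration.

```python
from collections import Counter

# Phases in ascending order of severity.
PHASE_ORDER = ["mild", "moderate", "severe"]


def max_phase(flags):
    """Return the most severe phase marked 'yes', or None if no phase is documented."""
    reached = [phase for phase in PHASE_ORDER if flags.get(phase) == "yes"]
    return reached[-1] if reached else None


# Toy patients (invented for illustration, not real patient data).
patients = [
    {"cohort": "HAP", "mild": "no", "moderate": "yes", "severe": "no"},
    {"cohort": "HAP", "mild": "no", "moderate": "yes", "severe": "yes"},
    {"cohort": "POP", "mild": "yes", "moderate": "no", "severe": "unknown/missing"},
    {"cohort": "POP", "mild": "yes", "moderate": "yes", "severe": "no"},
]

# Count patients per (cohort, maximum phase), as in the table above.
counts = Counter((p["cohort"], max_phase(p)) for p in patients)
for key, n in sorted(counts.items()):
    print(key, n)
```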

4. 3 Months Follow Up¶

In the following we will plot the:

  • Number of 3MFU patients
  • Ability to work - 3MFU
  • Any Symptom - 3MFU

Note: All POP patients have their first visit at least 6 to 12 months after first diagnosis.