# Explorative Data Analysis of the National Pandemic Cohort Network (NAPKON) Public Data Set

Welcome! The following statistics provide some visual insights into the NAPKON Public Data Set. The Public Data Set comprises patient data from NAPKON after a data cleaning process and includes data from patients documented until January 17, 2023. It originates from the NAPKON SecuTrial. The data anonymization pipeline is described by Jakob et al. in "Design and evaluation of a data anonymization pipeline to promote Open Science on COVID-19". The public data is anonymized according to our "data protection concept". The anonymization process was carried out with the "ARX software".

Copyright: This work is licensed under the Creative Commons Attribution Non-Commercial 4.0 License. With the use of this data you agree to include a proper acknowledgement of the NAPKON study group in any work based on the data set. By working with this notebook you agree to maintain the confidentiality of the data set at all times and to not attempt to compromise or otherwise violate the privacy of the patients described. To view a copy of the license, visit https://creativecommons.org/licenses/by-nc/4.0/.

If you have any comments on the notebook, please drop us a message at support@napkon.de.

## Data Set Structure

Here we provide information on the basic structure of the NAPKON Public Data Set.

The data set consists of 3904 patients and 15 variables. A row represents anonymized data of a single patient.
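A quick structural check along these lines can be sketched with pandas. The small in-memory CSV below is only a stand-in for the real file, and the helper name `check_shape` is an assumption made for illustration; for the actual Public Data Set the expected shape would be (3904, 15).

```python
import io
import pandas as pd

# Toy stand-in for the Public Data Set (the real file has 3904 rows and 15 columns).
toy_csv = io.StringIO(
    "age,gender,cohort\n"
    "18 - 39 years,female,POP\n"
    "60 - 79 years,male,SUEP\n"
    ">= 80 years,male,HAP\n"
)

def check_shape(df, n_rows, n_cols):
    """Return True if the frame has the expected number of rows and columns."""
    return df.shape == (n_rows, n_cols)

# All variables are categorical, so we read every column as a pandas category.
df = pd.read_csv(toy_csv, dtype="category")
print(df.shape)
print(check_shape(df, 3, 3))
```

Reading every column as `category` mirrors the fact that all variables are categorical.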

The columns are described by the variables:

• age (categorical): age group
• gender (categorical): gender
• month_first_diagnosis (categorical): quarter of first confirmed diagnosis of COVID-19
• year_first_diagnosis (categorical): year of first confirmed diagnosis of COVID-19
• cohort (categorical): the cohort in which the patient was included
• mild_disease_phase (categorical): yes if patient was in the Mild Phase*
• moderate_disease_phase (categorical): yes if patient was in the Moderate Phase*
• severe_disease_phase (categorical): yes if patient was in the Severe Phase*
• patient_status_at_end_of_acute_phase (categorical): status of the patient at the end of acute phase of SARS-CoV-2
• hospitalization_during_acute_phase (categorical): patient hospitalized during acute phase of SARS-CoV-2
• intensive_care_treatment (categorical): patient with intensive care treatment during acute phase of SARS-CoV-2
• inv_ventilation (categorical): patient with invasive ventilation during acute phase of SARS-CoV-2
• availability_3month_followup (categorical): completion of the 3-month follow-up
• ability_to_work_3MFU (categorical): patient is able to work at the time of 3-month follow-up
• any_symptom_3MFU (categorical): patient has any symptoms at the 3-month follow-up

For further information regarding the variables please refer to: https://cloud.idcohorts.net/s/ELRzpzgBkKb5ejY and https://cloud.napkon.de/s/W2PQpmqzoRkjABL

*The Clinical Phases are defined according to the WHO clinical progression scale.

To get to know the Public Data Set better, the values of each variable as they occur in this data set are shown below. Please be aware that the Public Data Set is only a part of the complete NAPKON data set. Anonymization may lead to variables having fewer values than in the complete NAPKON data set. For example, the variable 'gender' can also have the value 'diverse', but there is no patient with this gender in the Public Data Set.

age:
18 - 39 years, 40 - 59 years, 60 - 79 years, >= 80 years, nan

gender:
female, male

quarter_first_diagnosis:
Q1, Q2, Q3, Q4, unknown/missing

year_first_diagnosis:
2020, 2021, 2022, unknown/missing

cohort:
HAP, POP, SUEP

mild_phase:
no, yes

moderate_phase:
no, yes

severe_phase:
no, unknown/missing, yes

patient_status_end_acute_phase:

hospitalisation:
no, yes

intensive_care_treatment:
no, yes

inv_ventilation:
no, unknown/missing, yes

availability_3month_followup:
no/not yet, yes

ability_to_work_3MFU:
n/a, no, unknown/missing, yes

any_symptom_3MFU:
n/a, no, unknown/missing, yes


n/a: If a patient was never in the phase a variable refers to, the variable is given the value 'Not applicable (n/a)'. For example, if a patient has not completed the 3-month follow-up, 'any symptoms at the 3-month follow-up' is not applicable to this patient.
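A listing of values per variable can be sketched as follows. The toy CSV and its contents are assumptions for illustration only. By default `pandas.read_csv` parses the literal strings `n/a` and `nan` as missing values, so this sketch passes `keep_default_na=False` to keep 'n/a' and 'unknown/missing' as two distinct categories.

```python
import io
import pandas as pd

# Toy excerpt with two of the 3-month follow-up variables (values are illustrative).
toy_csv = io.StringIO(
    "availability_3month_followup,any_symptom_3MFU\n"
    "yes,yes\n"
    "yes,unknown/missing\n"
    "no/not yet,n/a\n"
)

# keep_default_na=False stops pandas from turning the literal "n/a" into NaN.
df = pd.read_csv(toy_csv, keep_default_na=False)

# Collect the sorted distinct values of every column.
levels = {col: sorted(df[col].unique()) for col in df.columns}
for col, values in levels.items():
    print(f"{col}: {', '.join(values)}")
```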

## 1. Descriptive Analysis

The following descriptive statistics are computed in this section:

• Cohort Distribution
• Date of diagnosis Distribution by Cohort
• Gender Distribution by Cohort
• Age Distribution by Cohort
• Age - Gender Distribution by Cohort

The number of patients in HAP is lower than in SUEP and POP. Please pay attention to the differences in scaling, especially for HAP.

The total number of patients is 3904.
Only cases with complete documentation and Review A status are considered.

The number of patients for SUEP is 1387.
The number of patients for POP is 2280.
The number of patients for HAP is 237.
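The cohort distribution can be reproduced with a simple frequency count. The sketch below rebuilds a `cohort` column from the three counts reported above rather than from the real file; the `summary` frame and its column names are our own choice for illustration.

```python
import pandas as pd

# Cohort sizes as reported above; the Series mimics the `cohort` column.
cohort = pd.Series(["SUEP"] * 1387 + ["POP"] * 2280 + ["HAP"] * 237, name="cohort")

counts = cohort.value_counts()                      # absolute number of patients per cohort
shares = (counts / counts.sum() * 100).round(2)     # share of each cohort in percent
summary = pd.DataFrame({"patients": counts, "share_percent": shares})
print(summary)
```

The shares make the imbalance explicit: HAP contributes only about 6 % of all patients, which is why its plots need a different scale.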

## 2. Patient status at the end of acute phase

The following descriptive statistics on the health status at the end of medical consultation are computed in this section:

• Frequency of Health Status at the End of Medical Consultation
• Hospitalisation in the acute phase
• Intensive care treatment in the acute phase
• Invasive ventilation in the acute phase
• COVID-19 Mortality and Recovery Rates
• COVID-19 Mortality vs Recovery by Age
• COVID-19 Mortality vs Recovery by Gender
• Crosstable Mortality vs Recovery vs Gender vs Age

Note that we will use a filtered data set for computing the rates, which we describe below.

## Frequency of Health Status at the End of Medical Consultation

| | SUEP | POP | HAP |
|---|---|---|---|
| discharged | 1343 | 133 | 253 |
| ambulant | 309 | 2846 | 0 |
| referral/transfer | 293 | 0 | 20 |
| unknown/missing | 135 | 122 | 14 |
| palliative care | 0 | 0 | 0 |

## Hospitalisation in the acute phase

The number of patients for SUEP is 1387.
The number of patients for POP is 2280.
The number of patients for HAP is 237.

## Intensive care treatment in the acute phase

The number of patients for SUEP is 1387.
The number of patients for POP is 2280.
The number of patients for HAP is 237.

## Invasive ventilation in the acute phase

The number of patients for SUEP is 1387.
The number of patients for POP is 2280.
The number of patients for HAP is 237.

### For the remainder of Section 2 we proceed with a filtered data set.

For the COVID-19 mortality and recovery rate computations, we exclude patients with a documented health status at the end of medical consultation of 'unknown/missing'. Please note that this influences the following computations and plots.

The number of patients in the filtered data set for SUEP is 1323.
The number of patients in the filtered data set for POP is 2181.
The number of patients in the filtered data set for HAP is 228.
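The filter itself is a single boolean mask. The sketch below uses a five-row toy frame; the column name `patient_status_end_acute_phase` follows the value listing in this notebook and is an assumption about the actual file.

```python
import pandas as pd

# Column name as used in the value listing above; an assumption about the real file.
STATUS = "patient_status_end_acute_phase"

df = pd.DataFrame({
    "cohort": ["SUEP", "SUEP", "POP", "HAP", "HAP"],
    STATUS: ["discharged", "unknown/missing", "ambulant", "dead", "unknown/missing"],
})

# Drop every patient whose status at the end of the acute phase is undocumented.
filtered = df[df[STATUS] != "unknown/missing"]

# Number of remaining patients per cohort.
per_cohort = filtered.groupby("cohort").size()
print(per_cohort)
```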

#### Frequency of Health Status at the End of Medical Consultation in the Filtered Data Set

| | SUEP | POP | HAP |
|---|---|---|---|
| recovery | 1945 | 2979 | 273 |

#### COVID-19 Overall Mortality and Recovery Rate for Filtered Data Set:

SUEP
COVID-19 Overall Mortality Rate: 6.58 %
COVID-19 Overall Recovery Rate: 93.42 %

POP
All POP patients survived at least 6 to 12 months after first diagnosis.

HAP
COVID-19 Overall Mortality Rate: 14.91 %
COVID-19 Overall Recovery Rate: 85.09 %
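Since the two rates always add up to 100 %, they can be sketched as the share of patients with status 'dead' and its complement. This assumes that every status other than 'dead' counts as recovery; the helper name `mortality_and_recovery` and the toy data are illustrative only.

```python
import pandas as pd

def mortality_and_recovery(status):
    """Percent of 'dead' vs. all other statuses in a filtered status column."""
    mortality = (status == "dead").mean() * 100
    return round(mortality, 2), round(100 - mortality, 2)

# Toy filtered status column: 20 patients, one of whom died.
status = pd.Series(["discharged"] * 17 + ["ambulant"] * 2 + ["dead"] * 1)

mortality, recovery = mortality_and_recovery(status)
print(f"COVID-19 Overall Mortality Rate: {mortality} %")
print(f"COVID-19 Overall Recovery Rate: {recovery} %")
```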

#### COVID-19 Mortality/Recovery for Filtered Data Set by Age:

All POP patients survived at least 6 to 12 months after first diagnosis. Thus POP is not shown in the following graphic.

SUEP

| age | Recovery rate (%) | Mortality rate (%) |
|---|---|---|
| 18 - 39 years | 99.18 | 0.82 |
| 40 - 59 years | 97.02 | 2.98 |
| 60 - 79 years | 89.14 | 10.86 |
| >= 80 years | 89.86 | 10.14 |



HAP

| age | Recovery rate (%) | Mortality rate (%) |
|---|---|---|
| 18 - 39 years | 91.67 | 8.33 |
| 40 - 59 years | 88.89 | 11.11 |
| 60 - 79 years | 79.07 | 20.93 |
| >= 80 years | 0.00 | 0.00 |



#### COVID-19 Mortality/Recovery for Filtered Data Set by Gender:

All POP patients survived at least 6 to 12 months after first diagnosis. Thus POP is not shown in the following graphic.

SUEP

| gender | Recovery rate (%) | Mortality rate (%) |
|---|---|---|
| female | 96.14 | 3.86 |
| male | 93.11 | 6.89 |

HAP

| gender | Recovery rate (%) | Mortality rate (%) |
|---|---|---|
| female | 86.54 | 13.46 |
| male | 84.76 | 15.24 |
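Rates per subgroup follow the same idea as the overall rates, computed within each group. The sketch below groups a boolean "died" flag by gender; the toy data and the column name `status` are assumptions, and the same pattern would apply to `age`.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["female", "female", "male", "male", "male", "male"],
    "status": ["discharged", "dead", "discharged", "ambulant", "dead", "discharged"],
})

# True for patients who died; the mean of a boolean column is the share of True values.
died = df["status"].eq("dead")

rates = died.groupby(df["gender"]).mean().mul(100).round(2).to_frame("Mortality rate")
# Recovery is taken as the complement of mortality, as in the tables above.
rates.insert(0, "Recovery rate", (100 - rates["Mortality rate"]).round(2))
print(rates)
```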

#### Crosstable Mortality vs Recovery vs Gender vs Age (percentage of patients in each of the subgroups)

SUEP

patient_status_end_acute_phase by gender and age:

| gender | age | discharged | ambulant | referral/transfer | dead | All |
|---|---|---|---|---|---|---|
| female | 18 - 39 years | 5.10 | 3.72 | 0.51 | 0.05 | 9.38 |
| | 40 - 59 years | 10.19 | 2.91 | 1.43 | 0.15 | 14.68 |
| | 60 - 79 years | 8.66 | 0.51 | 2.24 | 1.22 | 12.64 |
| | >= 80 years | 2.34 | 0.00 | 0.87 | 0.20 | 3.41 |
| male | 18 - 39 years | 6.07 | 2.65 | 0.51 | 0.10 | 9.33 |
| | 40 - 59 years | 14.58 | 3.87 | 3.52 | 0.97 | 22.94 |
| | 60 - 79 years | 15.44 | 0.97 | 4.38 | 2.70 | 23.50 |
| | >= 80 years | 2.29 | 0.10 | 1.17 | 0.56 | 4.13 |
| All | | 64.68 | 14.73 | 14.63 | 5.96 | 100.00 |

HAP

gender age
female 40 - 59 years 9.03 0.93 0.93 10.90
60 - 79 years 3.74 0.31 1.25 5.30
male 40 - 59 years 37.07 2.80 5.30 45.17
60 - 79 years 25.55 2.18 7.17 34.89
18 - 39 years 3.43 0.00 0.31 3.74
All 78.82 6.23 14.95 100.00
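A cross table of this kind can be sketched with `pandas.crosstab`: count patients per gender/age/status combination, add the 'All' margins, and divide by the total number of patients to obtain percentages. The toy frame and the column name `status` are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["female", "female", "male", "male"],
    "age": ["18 - 39 years", "60 - 79 years", "18 - 39 years", "60 - 79 years"],
    "status": ["discharged", "dead", "ambulant", "discharged"],
})

# Absolute counts per (gender, age) row and status column, with 'All' row and column totals.
counts = pd.crosstab(
    index=[df["gender"], df["age"]],
    columns=df["status"],
    margins=True,
)

# Convert to percentages of all patients, so the bottom-right cell is 100.
table = (counts / len(df) * 100).round(2)
print(table)
```

Because every cell is divided by the same total, each row's 'All' entry is the sum of that row, and the 'All' row gives the overall share of each status.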

## 3. Clinical Phases

From here on we will indicate the three clinical phases as

• Mild Phase
• Moderate Phase
• Severe Phase

In the following we will plot the:

• Frequency of Phases

Since there might be patients who have no phase documented at all we need to proceed with a filtered data set in which those patients are dropped.

The number of patients with at least one documented phase (mild, moderate or severe) in this filtered data set is 3904.

• HAP: 237.
• POP: 2280.
• SUEP: 1387.

Maximum phase reached by the patients.

| cohort | max_phase | count |
|---|---|---|
| HAP | moderate | 199 |
| | severe | 136 |
| POP | mild | 2925 |
| | moderate | 137 |
| | severe | 39 |
| SUEP | mild | 308 |
| | moderate | 1473 |
| | severe | 416 |
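One way to derive the maximum phase is to check the three yes/no phase variables from most to least severe and keep the first one flagged 'yes'. This is a minimal sketch on toy data; the column names `mild_phase`, `moderate_phase` and `severe_phase` follow the value listing above, and the helper `max_phase` is hypothetical, not necessarily how the notebook computes it.

```python
import pandas as pd

# Ordered from mild to severe; column names are assumed from the value listing.
PHASES = ["mild_phase", "moderate_phase", "severe_phase"]

df = pd.DataFrame({
    "cohort": ["POP", "POP", "SUEP", "HAP", "SUEP"],
    "mild_phase": ["yes", "yes", "yes", "no", "no"],
    "moderate_phase": ["no", "yes", "yes", "yes", "no"],
    "severe_phase": ["no", "no", "yes", "unknown/missing", "no"],
})

def max_phase(row):
    """Return the most severe phase flagged 'yes', or None if no phase is documented."""
    for col in reversed(PHASES):
        if row[col] == "yes":
            return col.replace("_phase", "")
    return None

df["max_phase"] = df.apply(max_phase, axis=1)

# Patients without any documented phase are dropped before counting.
with_phase = df.dropna(subset=["max_phase"])
counts = with_phase.groupby(["cohort", "max_phase"]).size().rename("count")
print(counts)
```

In this sketch a value of 'unknown/missing' is treated like 'no', so a patient with an undocumented severe phase is counted under the most severe phase that is documented.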