Welcome! The following statistics provide some visual insights into the NAPKON Public Data Set. The Public Data Set consists of NAPKON patient data that has passed through a data cleaning process, and it includes data from patients documented until September 28, 2023. The NAPKON Public Data Set originates from the NAPKON SecuTrial system. The data anonymization pipeline is described by Jakob et al. in "Design and evaluation of a data anonymization pipeline to promote Open Science on COVID-19". The public data is anonymized according to our "data protection concept", and the anonymization process was carried out with the "ARX software".
Copyright: This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License. By using this data, you agree to include a proper acknowledgement of the NAPKON study group in any work based on the data set. By working with this notebook, you agree to maintain the confidentiality of the data set at all times and not to attempt to compromise or otherwise violate the privacy of the patients described. To view a copy of the license, visit https://creativecommons.org/licenses/by-nc/4.0/.
If you have any comments on the notebook, please drop us a message at support@napkon.de.
Here we provide information on the basic structure of the NAPKON Public Data Set.
The data set consists of 5988 patients and 16 variables. Each row contains the anonymized data of a single patient.
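If the data set is available locally as a CSV file, its shape can be checked with a few lines of standard-library Python. This is a minimal sketch assuming a comma-separated export with one header row and one row per patient; the function names `describe_shape` and `describe_file` are hypothetical and not part of the notebook.

```python
import csv
from typing import Iterable, Tuple


def describe_shape(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of patients, number of variables) for CSV text lines.

    Assumes the first line is a header naming the variables and every
    following non-empty line holds the data of exactly one patient.
    """
    reader = csv.reader(lines)
    header = next(reader)                    # column names = variables
    n_rows = sum(1 for row in reader if row)  # skip blank lines
    return n_rows, len(header)


def describe_file(path: str) -> Tuple[int, int]:
    """Open a CSV file (hypothetical path) and describe its shape."""
    with open(path, newline="", encoding="utf-8") as handle:
        return describe_shape(handle)
```

For example, `describe_file("napkon_public_data_set.csv")` (a hypothetical file name) would be expected to return `(5988, 16)` for the data set described here.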
The columns are described by the variables:
For further information on the variables, please refer to: https://cloud.idcohorts.net/s/ELRzpzgBkKb5ejY and https://cloud.napkon.de/s/W2PQpmqzoRkjABL
*The Clinical Phases are defined according to the WHO clinical progression scale:
To help you get to know the Public Data Set better, the values that each variable takes in this data set are listed below. Please be aware that the Public Data Set is only a part of the complete NAPKON data set. Anonymization may therefore lead to variables having fewer values than in the complete NAPKON data set. For example, the variable 'gender' can also take the value 'diverse', but no patient with this gender is included in the Public Data Set.
age: 18 - 39 years, 40 - 59 years, 60 - 79 years, >= 80 years
gender: female, male
quarter_first_diagnosis: Q1, Q2, Q3, Q4, unknown/missing
year_first_diagnosis: 2020, 2021, 2022, 2023, unknown/missing
cohort: HAP, POP, SUEP
mild_phase: no, yes
moderate_phase: no, yes
severe_phase: no, unknown/missing, yes
patient_status_end_acute_phase: ambulant, dead, discharged, referral/transfer, unknown/missing
hospitalisation: no, yes
intensive_care_treatment: no, yes
inv_ventilation: no, unknown/missing, yes
availability_3month_followup: no/not yet, yes
ability_to_work_3MFU: n/a, no, unknown/missing, yes
any_symptom_3MFU: n/a, no, unknown/missing, yes
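The value sets above can be encoded as a lookup table and used to flag unexpected entries, for instance after loading the data with `csv.DictReader`. This is a sketch assuming that values are stored exactly as the strings listed here; `ALLOWED_VALUES` and `unexpected_values` are hypothetical names, and variables absent from a record are simply skipped.

```python
from typing import Dict, List, Mapping, Set

# Value sets as listed for the Public Data Set. The complete NAPKON data set
# may allow further values (e.g. 'diverse' for gender).
ALLOWED_VALUES: Dict[str, Set[str]] = {
    "age": {"18 - 39 years", "40 - 59 years", "60 - 79 years", ">= 80 years"},
    "gender": {"female", "male"},
    "quarter_first_diagnosis": {"Q1", "Q2", "Q3", "Q4", "unknown/missing"},
    "year_first_diagnosis": {"2020", "2021", "2022", "2023", "unknown/missing"},
    "cohort": {"HAP", "POP", "SUEP"},
    "mild_phase": {"no", "yes"},
    "moderate_phase": {"no", "yes"},
    "severe_phase": {"no", "unknown/missing", "yes"},
    "patient_status_end_acute_phase": {
        "ambulant", "dead", "discharged", "referral/transfer", "unknown/missing",
    },
    "hospitalisation": {"no", "yes"},
    "intensive_care_treatment": {"no", "yes"},
    "inv_ventilation": {"no", "unknown/missing", "yes"},
    "availability_3month_followup": {"no/not yet", "yes"},
    "ability_to_work_3MFU": {"n/a", "no", "unknown/missing", "yes"},
    "any_symptom_3MFU": {"n/a", "no", "unknown/missing", "yes"},
}


def unexpected_values(record: Mapping[str, str]) -> List[str]:
    """Return the names of listed variables whose value is not in its value set."""
    return [
        name
        for name, allowed in ALLOWED_VALUES.items()
        if name in record and record[name] not in allowed
    ]
```

A record with `gender` set to `'diverse'` would be reported, which matches the note above that this value does not occur in the Public Data Set.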
n/a: If a patient was never in the phase that a variable refers to, the variable has been given the value 'Not applicable (n/a)'. For example, if a patient has not completed the 3-month follow-up, 'any symptoms at the 3-month follow-up' (any_symptom_3MFU) is not applicable to this patient.
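The n/a convention can be expressed as a simple consistency check. This sketch assumes, by analogy with the example above, that a patient whose `availability_3month_followup` is `'no/not yet'` should have `'n/a'` in both 3-month follow-up variables; that rule and the function name `followup_na_consistent` are hypothetical and not taken from the NAPKON documentation.

```python
from typing import Mapping

# Hypothetical rule derived from the example above: without an available
# 3-month follow-up, the follow-up variables are expected to be 'n/a'.
FOLLOWUP_VARIABLES = ("ability_to_work_3MFU", "any_symptom_3MFU")


def followup_na_consistent(record: Mapping[str, str]) -> bool:
    """Check the assumed n/a rule for the 3-month follow-up variables."""
    if record.get("availability_3month_followup") != "no/not yet":
        return True  # the rule only constrains patients without a follow-up
    return all(record.get(name) == "n/a" for name in FOLLOWUP_VARIABLES)
```

The check deliberately says nothing about patients with an available follow-up, since the text above does not state whether 'n/a' can occur for them.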
The following descriptive statistics are computed in this section:
The number of patients in HAP is lower than in SUEP and POP. Therefore, please pay attention to differences in scaling, especially for HAP.
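One way to compare cohorts of very different sizes is to look at relative frequencies within each cohort instead of absolute counts. The sketch below is an illustration, not necessarily how the notebook's plots are computed; it assumes each patient is a dict with a `cohort` key (e.g. a row from `csv.DictReader`), and `within_cohort_shares` is a hypothetical name.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Mapping


def within_cohort_shares(
    records: Iterable[Mapping[str, str]], variable: str
) -> Dict[str, Dict[str, float]]:
    """Share of each value of `variable` within each cohort.

    The shares of each cohort sum to 1, so a small cohort such as HAP
    can be compared with SUEP and POP on the same scale.
    """
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for record in records:
        counts[record["cohort"]][record[variable]] += 1

    shares: Dict[str, Dict[str, float]] = {}
    for cohort, counter in counts.items():
        total = sum(counter.values())
        shares[cohort] = {value: n / total for value, n in counter.items()}
    return shares
```

For instance, `within_cohort_shares(rows, "hospitalisation")` would give the proportion of hospitalised and non-hospitalised patients per cohort.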
The total number of patients is 5988.
Only cases with complete documentation and Review A status are considered.
The number of patients for SUEP is 2164.
The number of patients for POP is 3509.
The number of patients for HAP is 315.
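The cohort counts above add up to the reported total, and converting them to percentages shows how small HAP is relative to the other cohorts. This short sketch uses only the numbers given in this section; `cohort_shares` is a hypothetical helper name.

```python
from typing import Dict, Tuple

# Patient counts per cohort as reported above.
COHORT_COUNTS: Dict[str, int] = {"SUEP": 2164, "POP": 3509, "HAP": 315}


def cohort_shares(counts: Dict[str, int]) -> Tuple[int, Dict[str, float]]:
    """Return the total and each cohort's share in percent (2 decimals)."""
    total = sum(counts.values())
    return total, {c: round(100 * n / total, 2) for c, n in counts.items()}


total, shares = cohort_shares(COHORT_COUNTS)
# total == 5988; shares: SUEP 36.14 %, POP 58.6 %, HAP 5.26 %
```

HAP thus accounts for only about 5 % of the patients, which is why its plots are on a much smaller scale.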