The HiRID database contains a large selection of the routinely collected data relating to patient admissions to the Department of Intensive Care Medicine (ICU) of the Bern University Hospital, Switzerland. The data was extracted from the ICU Patient Data Management System, which is used to prospectively register patient health information, measurements of organ function parameters, results of laboratory tests and treatment parameters from ICU admission to discharge. The data includes:
- Demographic data
- Measurements from bedside monitoring
- Measurements and settings of medical devices such as mechanical ventilation
- Observations by health care providers, e.g. GCS, RASS, urine and other fluid output
- Lab values
- Administered drugs, fluids and nutrition
HiRID has a higher time resolution than other published datasets, most importantly for bedside monitoring, where most parameters are recorded every 2 minutes.
To ensure the anonymization of individuals in the dataset, we followed the procedures successfully applied to the MIMIC-III and AmsterdamUMCdb datasets, which in turn adopted the Health Insurance Portability and Accountability Act (HIPAA) standards and, in the case of AmsterdamUMCdb, also those of the European Union's General Data Protection Regulation (GDPR):
- Removal of all eighteen identifying data elements listed in HIPAA
- Dates were shifted by a random offset such that the admission date lies between 2100 and 2200. We made sure to preserve the seasonality, time of day and the day of week.
- Patient age, height and weight are binned into bins of size 5. For patient age, the highest bin is 90 years and also contains all older patients.
- Measurements and medications whose units changed over time were standardized to the most recent unit used. This standardization prevents admission times from being estimated from the units recorded for a specific patient.
- Free text was removed from the database.
- k-anonymization was applied on patient age, weight, height and sex.
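The date shifting, binning and k-anonymity steps listed above can be illustrated with a minimal Python sketch. This is not the procedure actually used for HiRID. It assumes that dates are shifted by a whole number of weeks (which keeps weekday and time of day exact and the calendar date within a few days), that bins are labeled by their lower edge, and that k-anonymity is checked by counting identical attribute combinations. The function names `shift_datetime`, `bin_value` and `is_k_anonymous` are hypothetical.

```python
import random
from collections import Counter
from datetime import datetime, timedelta

DAYS_PER_YEAR = 365.2425  # mean length of a Gregorian year


def shift_datetime(ts, target_year):
    """Shift ts by a whole number of weeks so that it lands near target_year.

    Assumption: a whole-week offset is used, so weekday and time of day are
    unchanged and the calendar date moves by a few days at most. Dates close
    to a year boundary may end up in the neighbouring year.
    """
    years = target_year - ts.year
    weeks = round(years * DAYS_PER_YEAR / 7)
    return ts + timedelta(weeks=weeks)


def bin_value(value, size=5, cap=None):
    """Map value to the lower edge of its bin; values >= cap share the cap bin."""
    if cap is not None and value >= cap:
        return cap
    return int(value // size) * size


def is_k_anonymous(records, k):
    """Return True if every attribute combination occurs at least k times."""
    counts = Counter(records)
    return all(n >= k for n in counts.values())


# Usage with made-up values
admission = datetime(2015, 3, 10, 14, 30)
target_year = random.randrange(2100, 2200)  # random year in [2100, 2200)
shifted = shift_datetime(admission, target_year)

age_bin = bin_value(93, cap=90)   # 90: all patients aged 90 or older share a bin
height_bin = bin_value(172.4)     # 170
```

In this sketch the random part of the offset is the choice of target year. A real pipeline would also need to apply the same offset to every timestamp of one admission so that intervals within a stay are preserved.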
The data is published as original source data as well as in two pre-processed states. The source data contains all raw variables, whereas the pre-processed data contains only a small subset of aggregated variables. Further information can be found here:
Patient ID / ICU admission
The dataset treats each ICU admission as a separate case, and it is not possible to identify multiple ICU admissions as originating from the same patient. For each ICU (re-)admission, a unique "Patient ID" is generated.
We store an APACHE II or IV group for most stays. This table can be used to look up the encoding.
Variables in raw data
We provide a list of all variables as CSV files included in the downloadable dataset, as well as a continuously updated Google spreadsheet:
In addition to the variables in these files, the dataset contains the age at admission and the sex of each patient in the general table of the dataset.
Source variable IDs
Systolic BP (invasive)
Diastolic BP (invasive)
peak inspiratory pressure (ventilator)
20005110, 24000523, 24000585
1000706, 1000707, 1000698, 1000267
275, 1000471, 1000472, 1000473, 1000489, 1000490, 1000683, 1000900, 225, 1000605, 1000632, 1000858
binary indication of drug presence [yes/no]
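
As a rough illustration of how several source variable IDs could be collapsed into a single binary drug-presence flag, consider the following Python sketch. It assumes that records are available as `(variable_id, given_dose)` pairs and that a positive dose counts as presence. Both the record layout and the function name `drug_present` are hypothetical, and the example ID set is copied from one ID list above without knowing which drug it denotes.

```python
# Hypothetical grouping: the IDs are copied from one ID list in the table
# above; the drug they denote is not stated here.
EXAMPLE_SOURCE_IDS = {1000706, 1000707, 1000698, 1000267}


def drug_present(records, source_ids):
    """Return True if any record with a matching variable ID has a positive dose.

    records: iterable of (variable_id, given_dose) pairs (assumed layout).
    """
    return any(vid in source_ids and dose > 0 for vid, dose in records)


# Usage with made-up records for one stay
stay_records = [(1000707, 2.5), (42, 1.0)]
flag = drug_present(stay_records, EXAMPLE_SOURCE_IDS)
```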