Introduction #
The COVID-19 Radiography dataset includes 3616 COVID-19-positive chest X-ray (CXR) images collected from publicly available datasets, online sources, and published articles. A team of researchers from Qatar University (Doha, Qatar) and the University of Dhaka (Bangladesh), together with collaborators from Pakistan and Malaysia and with medical doctors, created this database of chest X-ray images of COVID-19-positive cases along with Normal and Viral Pneumonia images. The dataset, which covers COVID-19, normal, and other lung infection images, was released in stages.
In the first release, the authors published 219 COVID-19, 1341 Normal, and 1345 Viral Pneumonia CXR images. In the first update, they increased the COVID-19 class to 1200 CXR images. In the second update, they expanded the database to 3616 COVID-19-positive cases along with 10,192 Normal, 6012 Lung Opacity (non-COVID lung infection), and 1345 Viral Pneumonia images, with corresponding lung masks.
Of the 3616 COVID-19 X-ray images, 2473 were collected from the BIMCV-COVID19+ dataset, 183 from a German medical school, 559 from the Italian Society of Medical Radiology (SIRM), GitHub, Kaggle, and Twitter, and 400 from another COVID-19 CXR repository (these per-source counts sum to 3615). The BIMCV-COVID19+ dataset is the single largest public dataset, with 2473 CXR images of COVID-19 patients acquired from digital X-ray (DX) and computerized X-ray (CX) machines. The major difference between the non-COVID and COVID categories is the cause of the lung opacity in the CXR images: other lung-related diseases and COVID-19, respectively.
Summary #
COVID-19 Radiography is a dataset for semantic segmentation and classification tasks. It is used in the medical industry and in medical research.
The dataset consists of 21165 images with 21165 labeled objects belonging to a single class (lungs).
Images in the COVID-19 dataset have pixel-level semantic segmentation annotations. All images are labeled (i.e. have annotations). There are no pre-defined train/val/test splits in the dataset. Alternatively, the dataset can be split into 4 classification image sets: normal (10192 images), lung_opacity (6012 images), covid (3616 images), and viral_pneumonia (1345 images). The dataset was released in 2021 by Qatar University, the University of Dhaka, Hamad General Hospital, North South University, Bangabandhu Sheikh Mujib Medical University, the University of Engineering and Technology, and Universiti Kebangsaan Malaysia.
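Since no train/val/test splits are provided, one possible way to build them is to derive each image's class from its file name and split per class. The following is a minimal sketch, not part of any dataset tooling. It assumes file names start with a class prefix followed by a hyphen and an index, as in `Normal-2717.png`, `COVID-2811.png`, and `Lung_Opacity-1467.png` on this page. The `Viral Pneumonia` prefix is a guess, and the names `PREFIX_TO_CLASS`, `class_of`, and `stratified_split` are hypothetical.

```python
import random
from collections import defaultdict

# Assumed mapping from file-name prefix to the class names used above.
# "Viral Pneumonia" is a guessed prefix; adjust it to the actual files.
PREFIX_TO_CLASS = {
    "Normal": "normal",
    "Lung_Opacity": "lung_opacity",
    "COVID": "covid",
    "Viral Pneumonia": "viral_pneumonia",
}


def class_of(filename: str) -> str:
    """Return the class for a name like 'Lung_Opacity-1467.png'."""
    prefix, _, _ = filename.rpartition("-")
    return PREFIX_TO_CLASS[prefix]  # KeyError on an unknown prefix


def stratified_split(filenames, val_frac=0.1, test_frac=0.1, seed=0):
    """Split file names into train/val/test, keeping class proportions."""
    by_class = defaultdict(list)
    for name in filenames:
        by_class[class_of(name)].append(name)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    splits = {"train": [], "val": [], "test": []}
    for cls in sorted(by_class):
        names = sorted(by_class[cls])
        rng.shuffle(names)
        n_val = int(len(names) * val_frac)
        n_test = int(len(names) * test_frac)
        splits["val"] += names[:n_val]
        splits["test"] += names[n_val:n_val + n_test]
        splits["train"] += names[n_val + n_test:]
    return splits


# The per-class counts quoted above add up to the dataset size.
CLASS_COUNTS = {
    "normal": 10192,
    "lung_opacity": 6012,
    "covid": 3616,
    "viral_pneumonia": 1345,
}
assert sum(CLASS_COUNTS.values()) == 21165
```

Splitting per class rather than over the whole file list keeps the minority classes (for example viral_pneumonia with 1345 images) represented in every split.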
Explore #
The COVID-19 dataset has 21165 images. Click on one of the examples below or open the "Explore" tool whenever you need to view dataset images with annotations. The tool has extended visualization capabilities such as zoom, translation, an objects table, custom filters, and more. Hover the mouse over an image to hide or show its annotations.
Class balance #
There is 1 annotation class in the dataset. Find the general statistics and balances for every class in the table below. Click any row to preview images that have labels of the selected class. Sort by column to find the rarest or most prevalent classes.
Class | Images | Objects | Count on image average | Area on image average |
---|---|---|---|---|
lungs mask | 21165 | 21165 | 1 | 23.92% |
Images #
Explore every image in the dataset with respect to the number of annotations of each class it has. Click a row to preview the selected image. Sort by any column to find anomalies and edge cases. Use horizontal scroll if the table has many columns because the dataset has a large number of classes.
Object distribution #
The interactive heatmap chart of object distribution shows, for every class, how many images in the dataset contain a certain number of objects of that class. Click a cell to see the list of all corresponding images.
Class sizes #
The table below gives various size properties of objects for every class. Click a row to see the image with annotations of the selected class. Sort columns to find classes with the smallest or largest objects or understand the size differences between classes.
Class | Object count | Avg area | Max area | Min area | Min height (px) | Min height (%) | Max height (px) | Max height (%) | Avg height (px) | Avg height (%) | Min width (px) | Min width (%) | Max width (px) | Max width (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
lungs mask | 21165 | 23.92% | 57.95% | 2.91% | 53px | 17.73% | 299px | 100% | 204px | 68.29% | 63px | 21.07% | 299px | 100% |
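How the size properties are computed is not stated on this page. The following minimal sketch shows one plausible reading: height and width are the extent of the mask's bounding box in pixels and as a share of the image side, and area is the share of mask pixels in the image. The function name `mask_stats` and the list-of-rows mask representation are assumptions for illustration. With NumPy the same logic could be vectorized.

```python
def mask_stats(mask):
    """Bounding-box and area statistics for a binary mask.

    `mask` is a list of rows, each a list of 0/1 (or False/True) values.
    Returns None when the mask has no foreground pixels.
    """
    img_h = len(mask)
    img_w = len(mask[0])

    # Indices of rows and columns that contain at least one mask pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(img_w) if any(row[x] for row in mask)]
    if not rows:
        return None

    height = rows[-1] - rows[0] + 1
    width = cols[-1] - cols[0] + 1
    area = sum(1 for row in mask for value in row if value)

    return {
        "height_px": height,
        "height_pct": round(100 * height / img_h, 2),
        "width_px": width,
        "width_pct": round(100 * width / img_w, 2),
        "area_pct": round(100 * area / (img_h * img_w), 2),
    }
```

Under this reading, a mask spanning 204 rows and 200 columns of a 299 x 299 image gives a height of 68.23% and a width of 66.89%, which matches the pixel-to-percent relation in the tables on this page.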
Spatial Heatmap #
The heatmaps below give the spatial distributions of all objects for every class. These visualizations provide insights into the most probable and the rarest object locations on the image. They help analyze object placement in a dataset.
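A spatial heatmap of this kind can be approximated by counting, for each pixel, how many masks cover it. The sketch below is an illustration under assumptions, not the page's actual implementation. It assumes all masks share one size (the images listed on this page are 299 x 299) and are given as lists of rows of 0/1 values. The name `spatial_heatmap` is hypothetical.

```python
def spatial_heatmap(masks):
    """Per-pixel fraction of masks covering each pixel (values in 0..1)."""
    masks = list(masks)
    if not masks:
        raise ValueError("need at least one mask")

    h, w = len(masks[0]), len(masks[0][0])
    counts = [[0] * w for _ in range(h)]

    for mask in masks:
        if len(mask) != h or any(len(row) != w for row in mask):
            raise ValueError("all masks must share one size")
        for y, row in enumerate(mask):
            for x, value in enumerate(row):
                if value:
                    counts[y][x] += 1

    n = len(masks)
    return [[c / n for c in row] for row in counts]
```

Values near 1 mark pixels covered by almost every lung mask, and values near 0 mark locations where lungs rarely appear.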
Objects #
The table contains all 21165 objects (the first 10 rows are shown here). Click a row to preview an image with annotations, and use search or pagination to navigate. Sort columns to find outliers in the dataset.
Object ID | Class | Image name | Image size (height x width) | Height (px) | Height (%) | Width (px) | Width (%) | Area |
---|---|---|---|---|---|---|---|---|
1 | lungs mask | Normal-2717.png | 299 x 299 | 204px | 68.23% | 200px | 66.89% | 21.55% |
2 | lungs mask | COVID-2811.png | 299 x 299 | 224px | 74.92% | 249px | 83.28% | 31.05% |
3 | lungs mask | Normal-4816.png | 299 x 299 | 173px | 57.86% | 186px | 62.21% | 15.82% |
4 | lungs mask | Normal-2173.png | 299 x 299 | 195px | 65.22% | 215px | 71.91% | 22.1% |
5 | lungs mask | Normal-9097.png | 299 x 299 | 173px | 57.86% | 201px | 67.22% | 20.24% |
6 | lungs mask | COVID-418.png | 299 x 299 | 222px | 74.25% | 208px | 69.57% | 25.86% |
7 | lungs mask | Lung_Opacity-1467.png | 299 x 299 | 206px | 68.9% | 235px | 78.6% | 22.13% |
8 | lungs mask | Normal-7501.png | 299 x 299 | 173px | 57.86% | 237px | 79.26% | 21.58% |
9 | lungs mask | Lung_Opacity-1599.png | 299 x 299 | 225px | 75.25% | 207px | 69.23% | 22.09% |
10 | lungs mask | Normal-9077.png | 299 x 299 | 157px | 52.51% | 207px | 69.23% | 19.46% |
License #
License is unknown for the COVID-19 Radiography dataset.
Citation #
If you make use of the COVID-19 data, please cite the following references:
- M.E.H. Chowdhury, T. Rahman, A. Khandakar, R. Mazhar, M.A. Kadir, Z.B. Mahbub, K.R. Islam, M.S. Khan, A. Iqbal, N. Al-Emadi, M.B.I. Reaz, M.T. Islam, “Can AI help in screening Viral and COVID-19 pneumonia?” IEEE Access, Vol. 8, 2020, pp. 132665-132676.
- T. Rahman, A. Khandakar, Y. Qiblawey, A. Tahir, S. Kiranyaz, S.B.A. Kashem, M.T. Islam, S.A. Maadeed, S.M. Zughaier, M.S. Khan, M.E. Chowdhury, “Exploring the Effect of Image Enhancement Techniques on COVID-19 Detection using Chest X-ray Images,” 2020.
If you are happy with Dataset Ninja and use the provided visualizations and tools in your work, please cite us:
@misc{ visualization-tools-for-covid-dataset,
title = { Visualization Tools for COVID-19 Dataset },
type = { Computer Vision Tools },
author = { Dataset Ninja },
howpublished = { \url{ https://datasetninja.com/covid-19 } },
url = { https://datasetninja.com/covid-19 },
journal = { Dataset Ninja },
publisher = { Dataset Ninja },
year = { 2024 },
month = { oct },
note = { visited on 2024-10-15 },
}
Download #
Please visit the dataset homepage to download the data.
Disclaimer #
Our gal from the legal dep told us we need to post this:
Dataset Ninja provides visualizations and statistics for some datasets that can be found online and can be downloaded by the general public. Dataset Ninja is not a dataset hosting platform and can only be used for informational purposes. The platform does not claim any rights to the original content, including images, videos, annotations, and descriptions. Joint publishing is prohibited.
You take full responsibility when you use datasets presented at Dataset Ninja, as well as other information, including the visualizations and statistics we provide. You are in charge of compliance with any dataset license and all other permissions. You are required to visit the dataset's homepage and make sure that you are permitted to use it. In case of any questions, get in touch with us at hello@datasetninja.com.