Introduction #
The bespoke EMPS (Electron Microscopy Particle Segmentation) dataset was constructed by the authors to serve as the training data for their work. It consists of 465 electron micrographs and their corresponding human-labeled ground-truth semantic instance segmentation maps, as well as the coordinates of the polygons drawn around each particle to construct these segmentation maps.
The figure shows 16 sample images and their segmentation maps and qualitatively portrays the diversity of particle sizes, shapes, textures, densities, and (grayscale) colors that exist in the dataset. Although not relevant for computing quantitative measures, the authors included several images in which particles overlap to varying degrees, as this is common in electron micrographs of nanoparticles. The third EM image in the second row of the figure is an example of highly overlapping particles, while the particles in the fourth EM image of the third row show minor overlap.
All images in the EMPS dataset were mined from published scientific literature using the data-mining application programming interface (API) of Elsevier. The authors first used the Article Retrieval API to obtain the digital object identifiers (DOIs) of articles published between 2015 and 2020 that could contain EM images. This was achieved using the search query “SEM−TEM−scanning electron microscopy−transmission electron microscopy.” Next, using the Object Retrieval API, the authors iterated through the figures in these articles and obtained high-resolution images of any figure that contained one or more of the acronyms or phrases from their search query. This resulted in 34 091 images of figures, of which 788 were manually determined to be suitable and set aside for postprocessing.
EM images were often part of a panel of several images in these figures. The EM images were therefore cropped from these 788 figures, resulting in 962 candidate images for labeling (many figures contained several relevant EM images). The authors annotated 465 of these images using the VGG Image Annotator (VIA) by drawing a polygon around each individual particle in each image. Once annotation was complete, the authors assigned pixels to particle instances in each image by finding all pixels enclosed by the polygon of each particle.
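The last step above, turning per-particle polygons into a per-pixel instance map, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, pixel centres are tested with a plain even-odd ray-casting rule, and where polygons overlap the later polygon simply overwrites the earlier one (the source does not say how overlaps were resolved).

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygons_to_instance_map(height, width, polygons):
    """Return a height x width grid of instance ids (0 = background)."""
    instance_map = [[0] * width for _ in range(height)]
    for instance_id, polygon in enumerate(polygons, start=1):
        xs = [p[0] for p in polygon]
        ys = [p[1] for p in polygon]
        # Only test pixels inside the polygon's bounding box.
        x_min, x_max = max(0, int(min(xs))), min(width - 1, int(max(xs)))
        y_min, y_max = max(0, int(min(ys))), min(height - 1, int(max(ys)))
        for row in range(y_min, y_max + 1):
            for col in range(x_min, x_max + 1):
                # Test the pixel centre; later polygons overwrite earlier ones.
                if point_in_polygon(col + 0.5, row + 0.5, polygon):
                    instance_map[row][col] = instance_id
    return instance_map


# Toy example: polygons given as (x, y) vertex lists on a 6 x 8 grid.
square = [(1, 1), (4, 1), (4, 4), (1, 4)]
triangle = [(5, 0), (8, 0), (5, 3)]
demo_map = polygons_to_instance_map(6, 8, [square, triangle])
for row in demo_map:
    print("".join(str(v) for v in row))
```

For real micrographs a vectorized rasterizer from an imaging library would be far faster; the pure-Python loop is used here only to keep the sketch dependency-free.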
The figure presents some statistics of the images and particles in the EMPS dataset. Most images contain few particles, and only a few images contain many particles (left). Similarly, most particles in the dataset are small, with the number of particles dropping significantly as particle size increases (right).
Summary #
EMPS: Electron Microscopy Particle Segmentation is a dataset for instance segmentation, semantic segmentation, and object detection tasks. It is used in materials research.
The dataset consists of 465 images with 11594 labeled objects belonging to a single class (particle).
Images in the EMPS dataset have pixel-level instance segmentation annotations. Because the annotations are instance-level, they can be automatically converted for semantic segmentation (a single mask per class) or object detection (a bounding box per object) tasks. All images are labeled (i.e. have annotations). There are no pre-defined train/val/test splits in the dataset. The dataset also contains a doi tag. The dataset was released in 2021 by the University of Cambridge.
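The conversion from instance annotations to the other two task types can be sketched as below. This is an illustration under assumptions, not part of any dataset tooling: the instance map is assumed to be a grid of integer ids with 0 as background, and both function names are hypothetical.

```python
def instance_to_semantic(instance_map):
    """Collapse all instance ids into one foreground class (1 = particle)."""
    return [[1 if v > 0 else 0 for v in row] for row in instance_map]


def instance_to_boxes(instance_map):
    """Return {instance_id: (x_min, y_min, x_max, y_max)}, inclusive pixel indices."""
    boxes = {}
    for y, row in enumerate(instance_map):
        for x, v in enumerate(row):
            if v == 0:
                continue  # background pixel
            if v not in boxes:
                boxes[v] = (x, y, x, y)
            else:
                x0, y0, x1, y1 = boxes[v]
                boxes[v] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes


# Toy instance map with two particles (ids 1 and 2).
toy_instances = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 2, 2],
]
semantic = instance_to_semantic(toy_instances)
boxes = instance_to_boxes(toy_instances)
print(semantic)
print(boxes)  # {1: (1, 0, 2, 1), 2: (3, 1, 4, 2)}
```

The reverse direction is not generally possible: a semantic mask or a set of boxes does not determine which pixels belong to which touching or overlapping particle.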
Class balance #
There is 1 annotation class in the dataset. Find the general statistics and balances for every class in the table below.
Class | Images | Objects | Count on image average | Area on image average |
---|---|---|---|---|
particle (mask) | 465 | 11594 | 24.93 | 33.47% |
Class sizes #
The table below gives various size properties of objects for every class. Heights and widths are given both in pixels and as a percentage of the image dimension.
Class | Object count | Avg area | Max area | Min area | Min height (px) | Min height (%) | Max height (px) | Max height (%) | Avg height (px) | Avg height (%) | Min width (px) | Min width (%) | Max width (px) | Max width (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
particle (mask) | 11594 | 1.34% | 76.75% | 0% | 1px | 0.2% | 578px | 100% | 57px | 11.04% | 2px | 0.29% | 1022px | 100% |
Spatial Heatmap #
The heatmaps below give the spatial distributions of all objects for every class. These visualizations provide insights into the most probable and rare object locations on the image. They help analyze object placement in the dataset.
Objects #
The full table contains all 11594 objects; the first 10 are shown below.
Object ID | Class | Image name | Image size (height x width) | Height (px) | Height (%) | Width (px) | Width (%) | Area (%) |
---|---|---|---|---|---|---|---|---|
1 | particle (mask) | 22c776b059.png | 512 x 734 | 91px | 17.77% | 125px | 17.03% | 1.46% |
2 | particle (mask) | 22c776b059.png | 512 x 734 | 135px | 26.37% | 200px | 27.25% | 1.56% |
3 | particle (mask) | 22c776b059.png | 512 x 734 | 90px | 17.58% | 41px | 5.59% | 0.6% |
4 | particle (mask) | 22c776b059.png | 512 x 734 | 79px | 15.43% | 132px | 17.98% | 1.04% |
5 | particle (mask) | 22c776b059.png | 512 x 734 | 131px | 25.59% | 24px | 3.27% | 0.59% |
6 | particle (mask) | 22c776b059.png | 512 x 734 | 182px | 35.55% | 130px | 17.71% | 1.64% |
7 | particle (mask) | 22c776b059.png | 512 x 734 | 128px | 25% | 261px | 35.56% | 3.8% |
8 | particle (mask) | 22c776b059.png | 512 x 734 | 257px | 50.2% | 122px | 16.62% | 4.68% |
9 | particle (mask) | 22c776b059.png | 512 x 734 | 39px | 7.62% | 63px | 8.58% | 0.41% |
10 | particle (mask) | 22c776b059.png | 512 x 734 | 217px | 42.38% | 202px | 27.52% | 8.5% |
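The percentage columns in the table appear to be the object's pixel height and width divided by the image height and width. That reading is an assumption, but it reproduces the listed rows, as this small check shows (`relative_size` is a hypothetical helper name):

```python
def relative_size(height_px, width_px, image_height, image_width):
    """Return (height %, width %) of an object relative to its image."""
    return (
        round(100 * height_px / image_height, 2),
        round(100 * width_px / image_width, 2),
    )


# Object 1 from the table: 91px x 125px in a 512 x 734 (height x width) image.
obj1 = relative_size(91, 125, 512, 734)
# Object 2 from the table: 135px x 200px in the same image.
obj2 = relative_size(135, 200, 512, 734)
print(obj1)  # (17.77, 17.03)
print(obj2)  # (26.37, 27.25)
```

Note that the area column cannot be derived from height and width alone, since it depends on the mask shape rather than on the bounding extent.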
License #
Citation #
If you make use of the EMPS data, please cite the following reference:
@article{doi:10.1021/acs.jcim.0c01455,
author = {Yildirim, Batuhan and Cole, Jacqueline M.},
title = {Bayesian Particle Instance Segmentation for Electron Microscopy Image Quantification},
journal = {Journal of Chemical Information and Modeling},
volume = {61},
number = {3},
pages = {1136-1149},
year = {2021},
doi = {10.1021/acs.jcim.0c01455},
note ={PMID: 33682402},
URL = {https://doi.org/10.1021/acs.jcim.0c01455},
eprint = {https://doi.org/10.1021/acs.jcim.0c01455}
}
If you are happy with Dataset Ninja and use the provided visualizations and tools in your work, please cite us:
@misc{ visualization-tools-for-emps-dataset,
title = { Visualization Tools for EMPS Dataset },
type = { Computer Vision Tools },
author = { Dataset Ninja },
howpublished = { \url{ https://datasetninja.com/emps } },
url = { https://datasetninja.com/emps },
journal = { Dataset Ninja },
publisher = { Dataset Ninja },
year = { 2024 },
month = { sep },
note = { visited on 2024-09-15 },
}
Download #
The EMPS dataset can be downloaded in Supervisely format.
As an alternative, it can be downloaded with the dataset-tools package:
pip install --upgrade dataset-tools
… using the following Python code:
import dataset_tools as dtools
dtools.download(dataset='EMPS', dst_dir='~/dataset-ninja/')
Make sure not to overlook the Python code example available on the Supervisely Developer Portal. It will give you a clear idea of how to work with the downloaded dataset.
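As a rough starting point before turning to the official example, the sketch below counts labeled objects per class by reading annotation JSON files directly. It rests on assumptions that should be checked against the actual download: that annotations are stored as one JSON file per image under an `ann` folder, and that each file has an `objects` list whose entries carry a `classTitle` field. The function name is hypothetical, and the demo uses a tiny synthetic folder rather than the real dataset.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_objects_per_class(project_dir):
    """Count labeled objects per class title across all annotation files."""
    counts = Counter()
    # Assumed layout: <project_dir>/<dataset>/ann/<image name>.json
    for ann_path in Path(project_dir).expanduser().glob("**/ann/*.json"):
        with open(ann_path, encoding="utf-8") as f:
            annotation = json.load(f)
        # Assumed fields: "objects" list, each entry with a "classTitle".
        for obj in annotation.get("objects", []):
            counts[obj.get("classTitle", "unknown")] += 1
    return counts


# Demo on a synthetic one-image project, so the sketch runs without a download.
with tempfile.TemporaryDirectory() as tmp:
    ann_dir = Path(tmp) / "ds" / "ann"
    ann_dir.mkdir(parents=True)
    toy_annotation = {
        "size": {"height": 4, "width": 4},
        "objects": [{"classTitle": "particle"}, {"classTitle": "particle"}],
    }
    (ann_dir / "toy.png.json").write_text(json.dumps(toy_annotation), encoding="utf-8")
    demo_counts = count_objects_per_class(tmp)

print(demo_counts)
```

If the layout assumption holds, pointing the function at the folder used as `dst_dir` above should report a single `particle` class, matching the class balance table.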
Disclaimer #
Our gal from the legal dep told us we need to post this:
Dataset Ninja provides visualizations and statistics for some datasets that can be found online and can be downloaded by a general audience. Dataset Ninja is not a dataset hosting platform and can only be used for informational purposes. The platform does not claim any rights for the original content, including images, videos, annotations and descriptions. Joint publishing is prohibited.
You take full responsibility when you use datasets presented at Dataset Ninja, as well as other information, including visualizations and statistics we provide. You are in charge of compliance with any dataset license and all other permissions. You are required to visit the dataset's homepage and make sure that you can use it. In case of any questions, get in touch with us at hello@datasetninja.com.