EMPS - Dataset Ninja

Introduction #

Released 2021-03-08 · Batuhan Yildirim, Jacqueline M. Cole

A bespoke EMPS: Electron Microscopy Particle Segmentation dataset was constructed to serve as the training data for the authors’ work. It consists of 465 electron micrographs and their corresponding human-labeled ground-truth semantic instance segmentation maps, as well as the coordinates of the polygons drawn around each particle to construct these segmentation maps.

Fig

The figure shows 16 sample images and their segmentation maps, illustrating the diversity of particle sizes, shapes, textures, densities, and (grayscale) colors in the dataset. Although not relevant for computing quantitative measures, the authors included several images in which particles overlap to varying degrees, as this is common in electron micrographs of nanoparticles. The third EM image in the second row of the figure is an example of highly overlapping particles, while the particles in the fourth EM image of the third row show minor overlap.

All images in the EMPS dataset were mined from published scientific literature using the data-mining application programming interface (API) of Elsevier. The authors first used the Article Retrieval API to obtain the digital object identifiers (DOIs) of articles published between 2015 and 2020 that were likely to contain EM images. This was achieved using the search query “SEM−TEM−scanning electron microscopy−transmission electron microscopy.” Next, using the Object Retrieval API, the authors iterated through the figures in these articles and obtained high-resolution images of any figure that contained one or more of the acronyms or phrases from their search query. This resulted in 34 091 images of figures, of which 788 were manually judged suitable and set aside for postprocessing. EM images were often part of a panel of several images in these figures. The EM images were therefore cropped from these 788 figures, resulting in 962 candidate images for labeling (many figures contained several relevant EM images). The authors annotated 465 of these images using the VGG Image Annotator (VIA), drawing a polygon around each individual particle in each image. Once annotation was complete, the authors assigned pixels to particle instances in each image by finding all pixels enclosed by the polygon of each particle.
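The last step, assigning pixels to particle instances from the drawn polygons, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the function names `point_in_polygon` and `polygons_to_instance_map` are hypothetical, pixel centres are tested with a standard even-odd (ray casting) rule, and where polygons overlap the later polygon simply overwrites the earlier one (the source does not say how overlaps were resolved).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Even-odd (ray casting) test: is the point (x, y) inside the polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate at which this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygons_to_instance_map(
    height: int, width: int, polygons: Sequence[Sequence[Point]]
) -> List[List[int]]:
    """Return a height x width grid: 0 is background, k marks the k-th polygon."""
    instance_map = [[0] * width for _ in range(height)]
    for instance_id, polygon in enumerate(polygons, start=1):
        xs = [p[0] for p in polygon]
        ys = [p[1] for p in polygon]
        # Restrict the scan to the polygon's bounding box, clipped to the image.
        x_min = max(0, math.floor(min(xs)))
        x_max = min(width - 1, math.floor(max(xs)))
        y_min = max(0, math.floor(min(ys)))
        y_max = min(height - 1, math.floor(max(ys)))
        for row in range(y_min, y_max + 1):
            for col in range(x_min, x_max + 1):
                # Assumption: a pixel belongs to the particle if its centre is inside.
                if point_in_polygon(col + 0.5, row + 0.5, polygon):
                    instance_map[row][col] = instance_id
    return instance_map


# A 3 x 3 pixel square drawn inside a 6 x 6 image.
square = [(1, 1), (4, 1), (4, 4), (1, 4)]
instance_map = polygons_to_instance_map(6, 6, [square])
```

For real micrographs a library rasterizer would be much faster; the pure-Python loop is used here only to keep the logic visible.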

Fig2

The figure presents some statistics of the images and particles in the EMPS dataset. Most images contain few particles, and only a few images contain many (left). Similarly, most particles in the dataset are small, and the number of particles drops sharply as particle size increases (right).

Summary #

EMPS: Electron Microscopy Particle Segmentation is a dataset for instance segmentation, semantic segmentation, and object detection tasks. It is used in materials research.

The dataset consists of 465 images with 11594 labeled objects belonging to a single class (particle).

Images in the EMPS dataset have pixel-level instance segmentation annotations. Because of the nature of instance segmentation, the annotations can be automatically transformed for a semantic segmentation task (only one mask for every class) or an object detection task (bounding boxes for every object). All images are labeled (i.e. have annotations). There are no pre-defined train/val/test splits in the dataset. The dataset also contains a doi tag. The dataset was released in 2021 by the University of Cambridge.
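The conversion from instance annotations to the other two task types can be sketched as below. This is a minimal illustration, assuming the instance annotations are available as a 2D grid of integer ids (0 for background, a distinct positive id per particle); the function names are hypothetical and are not part of any Dataset Ninja or Supervisely API.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def instance_to_semantic(instance_map: List[List[int]]) -> List[List[int]]:
    """Collapse all instance ids into one binary mask for the single class."""
    return [[1 if value > 0 else 0 for value in row] for row in instance_map]


def instance_to_boxes(instance_map: List[List[int]]) -> Dict[int, Box]:
    """Compute one axis-aligned bounding box per instance id."""
    boxes: Dict[int, Box] = {}
    for y, row in enumerate(instance_map):
        for x, value in enumerate(row):
            if value == 0:
                continue  # background pixel
            if value not in boxes:
                boxes[value] = (x, y, x, y)
            else:
                x0, y0, x1, y1 = boxes[value]
                boxes[value] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes


# Toy 3 x 4 instance map with two particles.
toy_map = [
    [0, 1, 1, 0],
    [0, 1, 0, 2],
    [0, 0, 2, 2],
]
semantic_mask = instance_to_semantic(toy_map)
boxes = instance_to_boxes(toy_map)
print(boxes)  # {1: (1, 0, 2, 1), 2: (2, 1, 3, 2)}
```

Both conversions discard information: the semantic mask loses the boundaries between touching particles, and the boxes lose particle shape, so neither can be inverted back to the instance annotations.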

Explore #

The EMPS dataset has 465 images. Click on one of the examples below or open the "Explore" tool whenever you need to view dataset images with annotations. The tool offers extended visualization capabilities such as zoom, translation, an objects table, custom filters, and more. Hover the mouse over an image to hide or show its annotations.

Class balance #

There is 1 annotation class in the dataset. Find the general statistics and balances for every class in the table below. Click any row to preview images that have labels of the selected class. Sort by column to find the rarest or most prevalent classes.

Class	Images	Objects	Count on image average	Area on image average
particle (mask)	465	11594	24.93	33.47%
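The "Count on image average" value in the table can be reproduced from the other two columns. The snippet below is a small sanity check, assuming the statistic is the total object count divided by the number of images that contain the class; the function name is hypothetical. The "Area on image average" column cannot be recomputed from the table alone, since it depends on per-image mask areas.

```python
def count_on_image_average(total_objects: int, images_with_class: int) -> float:
    """Mean number of objects per image, over images containing the class."""
    if images_with_class <= 0:
        raise ValueError("images_with_class must be positive")
    return round(total_objects / images_with_class, 2)


print(count_on_image_average(11594, 465))  # 24.93
```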

Images #

Explore every single image in the dataset with respect to the number of annotations of each class it has. Click a row to preview selected image. Sort by any column to find anomalies and edge cases. Use horizontal scroll if the table has many columns for a large number of classes in the dataset.

Class sizes #

The table below gives various size properties of objects for every class. Click a row to see the image with annotations of the selected class. Sort columns to find classes with the smallest or largest objects or understand the size differences between classes.

Class	Object count	Avg area	Max area	Min area	Min height (px)	Min height (%)	Max height (px)	Max height (%)	Avg height (px)	Avg height (%)	Min width (px)	Min width (%)	Max width (px)	Max width (%)
particle (mask)	11594	1.34%	76.75%	0%	1px	0.2%	578px	100%	57px	11.04%	2px	0.29%	1022px	100%

Spatial Heatmap #

The heatmaps below give the spatial distributions of all objects for every class. These visualizations show where objects are most and least likely to appear in an image, which helps when analyzing object placement in a dataset.

Objects #

The table contains all 11594 objects. Click a row to preview an image with annotations, and use search or pagination to navigate. Sort columns to find outliers in the dataset.

Rows 1-10 of 11594

Object ID	Class	Image name	Image size (height x width)	Height (px)	Height (%)	Width (px)	Width (%)	Area (%)
1	particle (mask)	22c776b059.png	512 x 734	91px	17.77%	125px	17.03%	1.46%
2	particle (mask)	22c776b059.png	512 x 734	135px	26.37%	200px	27.25%	1.56%
3	particle (mask)	22c776b059.png	512 x 734	90px	17.58%	41px	5.59%	0.6%
4	particle (mask)	22c776b059.png	512 x 734	79px	15.43%	132px	17.98%	1.04%
5	particle (mask)	22c776b059.png	512 x 734	131px	25.59%	24px	3.27%	0.59%
6	particle (mask)	22c776b059.png	512 x 734	182px	35.55%	130px	17.71%	1.64%
7	particle (mask)	22c776b059.png	512 x 734	128px	25%	261px	35.56%	3.8%
8	particle (mask)	22c776b059.png	512 x 734	257px	50.2%	122px	16.62%	4.68%
9	particle (mask)	22c776b059.png	512 x 734	39px	7.62%	63px	8.58%	0.41%
10	particle (mask)	22c776b059.png	512 x 734	217px	42.38%	202px	27.52%	8.5%
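The percentage columns in the table above follow from the pixel sizes and the image size. A minimal sketch, assuming the percentages are the object's height and width relative to the image height and width, rounded to two decimals (the function name is hypothetical):

```python
from typing import Tuple


def relative_size(
    height_px: int, width_px: int, image_height: int, image_width: int
) -> Tuple[float, float]:
    """Return (height %, width %) of an object relative to its image."""
    height_pct = round(100 * height_px / image_height, 2)
    width_pct = round(100 * width_px / image_width, 2)
    return height_pct, width_pct


# Object 1 in the table: 91 px x 125 px inside a 512 x 734 image.
print(relative_size(91, 125, 512, 734))  # (17.77, 17.03)
```

The area column is not the product of these two percentages. For object 1 the bounding box covers about 3.03% of the image while the listed area is 1.46%, which is consistent with the area being measured on the mask pixels rather than on the bounding box.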

License #

EMPS: Electron Microscopy Particle Segmentation is released under the CC BY 4.0 license.

Citation #

If you make use of the EMPS data, please cite the following reference:

@article{doi:10.1021/acs.jcim.0c01455,
  author = {Yildirim, Batuhan and Cole, Jacqueline M.},
  title = {Bayesian Particle Instance Segmentation for Electron Microscopy Image Quantification},
  journal = {Journal of Chemical Information and Modeling},
  volume = {61},
  number = {3},
  pages = {1136-1149},
  year = {2021},
  doi = {10.1021/acs.jcim.0c01455},
  note ={PMID: 33682402},
  URL = {https://doi.org/10.1021/acs.jcim.0c01455},
  eprint = {https://doi.org/10.1021/acs.jcim.0c01455}
}

If you are happy with Dataset Ninja and use the provided visualizations and tools in your work, please cite us:

@misc{ visualization-tools-for-emps-dataset,
  title = { Visualization Tools for EMPS Dataset },
  type = { Computer Vision Tools },
  author = { Dataset Ninja },
  howpublished = { \url{ https://datasetninja.com/emps } },
  url = { https://datasetninja.com/emps },
  journal = { Dataset Ninja },
  publisher = { Dataset Ninja },
  year = { 2026 },
  month = { mar },
  note = { visited on 2026-03-17 },
}

Download #

Dataset EMPS can be downloaded in Supervisely format:

As an alternative, it can be downloaded with the dataset-tools package:

pip install --upgrade dataset-tools

… using the following Python code:

import dataset_tools as dtools

dtools.download(dataset='EMPS', dst_dir='~/dataset-ninja/')

Make sure not to overlook the Python code example available on the Supervisely Developer Portal. It will give you a clear idea of how to work with the downloaded dataset.
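As a quick check after downloading, the labeled objects can be counted straight from the annotation files. The sketch below rests on assumptions that should be verified against the actual download: that annotations are stored as one JSON file per image inside folders named `ann`, and that each file holds an `objects` list whose entries carry a `classTitle` field, as in the Supervisely annotation format. The function name `count_objects` and the example path are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def count_objects(dataset_dir: str) -> Counter:
    """Count labeled objects per class across all annotation JSON files."""
    counts: Counter = Counter()
    root = Path(dataset_dir).expanduser()
    for ann_path in root.rglob("*.json"):
        # Assumption: per-image annotations live in folders named 'ann';
        # other JSON files (e.g. project metadata) are skipped.
        if ann_path.parent.name != "ann":
            continue
        with ann_path.open(encoding="utf-8") as f:
            annotation = json.load(f)
        for obj in annotation.get("objects", []):
            counts[obj.get("classTitle", "unknown")] += 1
    return counts


# Hypothetical usage; the path depends on where the dataset was downloaded:
# print(count_objects("~/dataset-ninja/"))
```

If the assumptions hold, the count for the particle class should match the 11594 objects reported on this page.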

Disclaimer #

Our gal from the legal dep told us we need to post this:

Dataset Ninja provides visualizations and statistics for some datasets that can be found online and can be downloaded by general audience. Dataset Ninja is not a dataset hosting platform and can only be used for informational purposes. The platform does not claim any rights for the original content, including images, videos, annotations and descriptions. Joint publishing is prohibited.

You take full responsibility when you use datasets presented at Dataset Ninja, as well as other information, including visualizations and statistics we provide. You are in charge of compliance with any dataset license and all other permissions. You are required to visit the dataset's homepage and make sure that you are permitted to use it. In case of any questions, get in touch with us at hello@datasetninja.com.