Dataset Ninja LogoDataset Ninja:

MVTec D2S Dataset

53114602767
Tagretail
Taskinstance segmentation
Release YearMade in 2018
LicenseCC BY-NC-SA 4.0
Download11 GB

Introduction #

Released 2018-04-23 ·Patrick Follmann, Tobias Böttger, Philipp Härtingeret al.

The authors of the dataset introduced the MVTec D2S: MVTec Densely Segmented Supermarket Dataset as a novel benchmark for instance-aware semantic segmentation in an industrial domain. The benchmark was meticulously designed to mimic real-world settings of automatic checkout, inventory, or warehouse systems. In the training images, only objects of a single class appeared against a homogeneous background, while the validation and test sets were more complex and diverse. The scenes were captured under different lighting conditions, rotations, and backgrounds to evaluate the robustness of instance segmentation methods.

Care was taken to ensure unambiguous labeling of every instance, resulting in pixel-precise annotations. The comprehensive labeling allowed for the use of individual instance crops for artificial data augmentation. The dataset presented various challenges relevant to the field, including limited training data and high diversity in the validation and test sets.

The instance segmentation dataset offered high-resolution images representing a real-world, industrial environment. Annotations for 60 different object categories were obtained through meticulous labeling, ensuring high-quality annotations. The authors emphasized that the training set’s high-quality region annotations could be effectively used for artificial data augmentation, leading to a significant improvement in average precision (AP) on the test set.

image

The dataset was designed to realistically represent applications such as automatic checkout, inventory, or warehouse systems. In these scenarios, isolated products on a conveyor belt were identified, but external influences and partly occluded objects posed challenges. The dataset’s object categories included common everyday products like fruits, vegetables, cereal packets, pasta, and bottles, organized in a class hierarchy tree. The dataset provided scenes captured from various angles and under different lighting settings to evaluate the robustness of instance segmentation methods.

split all train val test
scenes 700 146 120 434
images 21000 4380 3600 13020
objects 72447 6900 15654 49893
objects/image 3.45 1.58 4.35 3.83
scenes w. occlusion 393 10 84 299
scenes w. clutter 86 0 18 68
rotations
lighting variation
background variation
clutter

To meet the mentioned industrial requirements, the training scenes are selected to be as simple as possible: They have a homogeneous background, mostly contain only one object and the amount of occlusions is reduced to a minimum.

train split properties:

  • contain only objects of one category
  • provide new views of an object
  • only contain objects with no or marginal overlap
  • have no clutter and a homogeneous background

The remaining scenes are split between the validation and the test set. They consist of scenes with:

  • single or multiple objects of different classes
  • touching or objects with occlusion
  • clutter objects
  • random background

The dataset also included ablation study which is performed to evaluate the importance of different variations including all rotations and lightings for one placement of objects in the scene, and the ability of methods to learn invariance with respect to rotations and illumination. For this purpose, the authors created three subsets of the full training set train. The train rot0 set contains all three lightings, but only the first rotation of each scene. The train light0 set contains only the default lighting, but all ten rotations of each scene. The train rot0_light0 set contains only the default lighting and the first rotation for each scene.

The data collection utilized a high-resolution industrial color camera with 1920 × 1440 pixels mounted above a turntable. The camera’s intentional off-centered mounting introduced more variations in the rotated images. The scenes were rotated ten times in increments of 36 degrees, ensuring precise rotation angles. To assess the robustness to illumination changes and reflection, each scene and rotation were captured under three different lighting settings using an LED ring light attached to the camera, spanning a wide spectrum of possible lightings.

Note, that this is a version 1.1 of D2S annotations - with new subsets containing specific difficulties (not mentioned in paper), such as occlusion, clutter or random background.

ExpandExpand
Dataset LinkHomepageDataset LinkResearch Paper

Summary #

MVTec D2S: MVTec Densely Segmented Supermarket Dataset is a dataset for instance segmentation, semantic segmentation, and object detection tasks. It is used in the retail industry.

The dataset consists of 53114 images with 258748 labeled objects belonging to 60 different classes including dr_oetker_vitalis_knuspermuesli_klassisch, clementine_single, salad_iceberg, and other: pasta_reggia_fusilli, gepa_italienischer_bio_espresso, adelholzener_gourmet_mineralwasser_02, cocoba_cocoa, douwe_egberts_professional_ground_coffee, zucchini, vine_tomatoes, franken_tafelreiniger, cafe_wunderbar_espresso, corny_schoko_banane_single, clementine, corny_nussvoll, kilimanjaro_tea_earl_grey, augustiner_weissbier_05, apple_braeburn_bundle, corny_nussvoll_single, gepa_bio_caffe_crema, rocket, pasta_reggia_spaghetti, koelln_muesli_schoko, coca_cola_light_05, gepa_bio_und_fair_kraeuterteemischung, koelln_muesli_fruechte, caona_cocoa, kiwi, and 32 more.

Images in the MVTec D2S dataset have pixel-level instance segmentation and bounding box annotations. Due to the nature of the instance segmentation task, it can be automatically transformed into a semantic segmentation task (only one mask for every class). There are 28740 (54% of the total) unlabeled images (i.e. without annotations). There are 4 splits in the dataset: test (28740 images), augmented (10000 images), validation (7950 images), and training (6424 images). Alternatively, the dataset could be split into 3 train ablation study sets: light0 (1460 images), rot0 (438 images), and rot0_light0 (146 images), or into 6 test specific difficulties: info (13020 images), info_random_background (4650 images), info_occlusion (3930 images), info_random_background_wo_clutter (2640 images), info_wo_occlusion (2490 images), and info_clutter (2010 images), or into 5 validation specific difficulties: random_background (1350 images), occlusion (990 images), random_background_wo_clutter (810 images), wo_occlusion (660 images), and clutter (540 images). The dataset was released in 2018 by the MVTec Software GmbH, Germany and Technical University of Munich.

Dataset Poster

Explore #

MVTec D2S dataset has 53114 images. Click on one of the examples below or open "Explore" tool anytime you need to view dataset images with annotations. This tool has extended visualization capabilities like zoom, translation, objects table, custom filters and more. Hover the mouse over the images to hide or show annotations.

OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
OpenSample annotation mask from MVTec D2SSample image from MVTec D2S
👀
Have a look at 53114 images
View images along with annotations and tags, search and filter by various parameters

Class balance #

There are 60 annotation classes in the dataset. Find the general statistics and balances for every class in the table below. Click any row to preview images that have labels of the selected class. Sort by column to find the most rare or prevalent classes.

Search
Rows 1-10 of 60
Class
Images
Objects
Count on image
average
Area on image
average
dr_oetker_vitalis_knuspermuesli_klassisch
any
1167
2658
2.28
13.63%
salad_iceberg
any
1136
2456
2.16
11.29%
clementine_single
any
1134
2499
2.2
0.86%
pasta_reggia_fusilli
any
1128
2545
2.26
16.39%
gepa_italienischer_bio_espresso
any
1104
2484
2.25
9.63%
adelholzener_gourmet_mineralwasser_02
any
1086
2535
2.33
3.92%
vine_tomatoes
any
1078
2485
2.31
18.82%
corny_nussvoll
any
1070
2312
2.16
8.41%
cocoba_cocoa
any
1069
2525
2.36
8.91%
cafe_wunderbar_espresso
any
1053
2377
2.26
21.23%

Images #

Explore every single image in the dataset with respect to the number of annotations of each class it has. Click a row to preview selected image. Sort by any column to find anomalies and edge cases. Use horizontal scroll if the table has many columns for a large number of classes in the dataset.

Object distribution #

Interactive heatmap chart for every class with object distribution shows how many images are in the dataset with a certain number of objects of a specific class. Users can click cell and see the list of all corresponding images.

Class sizes #

The table below gives various size properties of objects for every class. Click a row to see the image with annotations of the selected class. Sort columns to find classes with the smallest or largest objects or understand the size differences between classes.

Search
Rows 1-10 of 60
Class
Object count
Avg area
Max area
Min area
Min height
Min height
Max height
Max height
Avg height
Avg height
Min width
Min width
Max width
Max width
dr_oetker_vitalis_knuspermuesli_klassisch
any
5405
9.24%
22.4%
0%
2px
0.14%
819px
56.88%
539px
37.43%
2px
0.1%
821px
42.76%
corny_schoko_banane_single
any
5289
2.24%
6.2%
0%
2px
0.14%
472px
32.78%
294px
20.41%
2px
0.1%
472px
24.58%
cocoba_cocoa
any
5075
5.9%
17.5%
0%
2px
0.14%
719px
49.93%
440px
30.52%
2px
0.1%
748px
38.96%
adelholzener_gourmet_mineralwasser_02
any
5048
2.6%
10.37%
0%
2px
0.14%
661px
45.9%
291px
20.2%
2px
0.1%
661px
34.43%
pasta_reggia_fusilli
any
4965
11.07%
31.11%
0%
2px
0.14%
950px
65.97%
587px
40.76%
2px
0.1%
951px
49.53%
gepa_italienischer_bio_espresso
any
4958
6.65%
17.44%
0%
2px
0.14%
799px
55.49%
472px
32.77%
2px
0.1%
799px
41.61%
clementine_single
any
4926
0.68%
2.02%
0%
8px
0.56%
237px
16.46%
143px
9.95%
6px
0.31%
238px
12.4%
douwe_egberts_professional_ground_coffee
any
4909
6.12%
13.02%
0%
2px
0.14%
605px
42.01%
441px
30.63%
2px
0.1%
606px
31.56%
augustiner_weissbier_05
any
4889
4.31%
12.61%
0%
2px
0.14%
743px
51.6%
383px
26.61%
2px
0.1%
738px
38.44%
salad_iceberg
any
4850
8.79%
16.07%
0%
2px
0.14%
712px
49.44%
514px
35.72%
2px
0.1%
744px
38.75%

Spatial Heatmap #

The heatmaps below give the spatial distributions of all objects for every class. These visualizations provide insights into the most probable and rare object locations on the image. It helps analyze objects' placements in a dataset.

Spatial Heatmap

Objects #

Table contains all 258748 objects. Click a row to preview an image with annotations, and use search or pagination to navigate. Sort columns to find outliers in the dataset.

Search
Rows 1-10 of 258748
Object ID
Class
Image name
click row to open
Image size
height x width
Height
Height
Width
Width
Area
1
zucchini
any
D2S_057307_D2S_validation_random_background.jpg
1440 x 1920
289px
20.07%
568px
29.58%
2.48%
2
zucchini
any
D2S_057307_D2S_validation_random_background.jpg
1440 x 1920
290px
20.14%
569px
29.64%
5.97%
3
vine_tomatoes
any
D2S_057307_D2S_validation_random_background.jpg
1440 x 1920
830px
57.64%
1045px
54.43%
15.07%
4
vine_tomatoes
any
D2S_057307_D2S_validation_random_background.jpg
1440 x 1920
831px
57.71%
1046px
54.48%
31.44%
5
gepa_bio_caffe_crema
any
D2S_039728_D2S_validation_occlusion.jpg
1440 x 1920
975px
67.71%
876px
45.62%
15.62%
6
gepa_bio_caffe_crema
any
D2S_039728_D2S_validation_occlusion.jpg
1440 x 1920
976px
67.78%
877px
45.68%
30.96%
7
avocado
any
D2S_039728_D2S_validation_occlusion.jpg
1440 x 1920
234px
16.25%
265px
13.8%
1.51%
8
avocado
any
D2S_039728_D2S_validation_occlusion.jpg
1440 x 1920
235px
16.32%
266px
13.85%
2.26%
9
salad_iceberg
any
D2S_055321_D2S_validation.jpg
1440 x 1920
601px
41.74%
561px
29.22%
9.04%
10
salad_iceberg
any
D2S_055321_D2S_validation.jpg
1440 x 1920
602px
41.81%
562px
29.27%
12.24%

License #

MVTec D2S: MVTec D2S Densely Segmented Supermarket Dataset is under CC BY-NC-SA 4.0 license.

Source

Citation #

If you make use of the MVTEC D2S data, please cite the following reference:

Patrick Follmann, Tobias Böttger, Philipp Härtinger, Rebecca König, Markus Ulrich;
MVTec D2S: Densely Segmented Supermarket Dataset;
European Conference on Computer Vision (ECCV), 569-585, 2018

Source

If you are happy with Dataset Ninja and use provided visualizations and tools in your work, please cite us:

@misc{ visualization-tools-for-mvtec-d2s-dataset,
  title = { Visualization Tools for MVTec D2S Dataset },
  type = { Computer Vision Tools },
  author = { Dataset Ninja },
  howpublished = { \url{ https://datasetninja.com/mvtec-d2s } },
  url = { https://datasetninja.com/mvtec-d2s },
  journal = { Dataset Ninja },
  publisher = { Dataset Ninja },
  year = { 2024 },
  month = { dec },
  note = { visited on 2024-12-07 },
}

Download #

Dataset MVTec D2S can be downloaded in Supervisely format:

As an alternative, it can be downloaded with dataset-tools package:

pip install --upgrade dataset-tools

… using following python code:

import dataset_tools as dtools

dtools.download(dataset='MVTec D2S', dst_dir='~/dataset-ninja/')

Make sure not to overlook the python code example available on the Supervisely Developer Portal. It will give you a clear idea of how to effortlessly work with the downloaded dataset.

The data in original format can be downloaded here.

. . .

Disclaimer #

Our gal from the legal dep told us we need to post this:

Dataset Ninja provides visualizations and statistics for some datasets that can be found online and can be downloaded by general audience. Dataset Ninja is not a dataset hosting platform and can only be used for informational purposes. The platform does not claim any rights for the original content, including images, videos, annotations and descriptions. Joint publishing is prohibited.

You take full responsibility when you use datasets presented at Dataset Ninja, as well as other information, including visualizations and statistics we provide. You are in charge of compliance with any dataset license and all other permissions. You are required to navigate datasets homepage and make sure that you can use it. In case of any questions, get in touch with us at hello@datasetninja.com.