Dataset Ninja LogoDataset Ninja:

dacl10k Dataset

992019699
Tagconstruction
Taskinstance segmentation
Release YearMade in 2023
LicenseCC BY-NC 4.0
Download5 GB

Introduction #

Released 2023-11-21 ·Johannes Flotzinger, Philipp J. Rosch, Thomas Braml

The dacl10k: Benchmark for Semantic Bridge Damage Segmentation is the first large-scale dataset for semantic bridge damage segmentation. It includes images collected during concrete bridge inspections acquired from databases at authorities and engineering offices, thus, it represents real-world scenarios. Concrete bridges represent the most common building type, besides steel, steel composite, and wooden bridges.

Civil engineering structures such as power plants, sewers, and bridges form essential components of the public infrastructure. It is mandatory to keep these structures in a safe and operational state. In order to ensure this, the authors of the dataset frequently inspected where the current recognition and documentation of defects and building components is mostly carried out manually.

A failure of individual structures results in enormous costs. For example, the economic costs caused by the closure of a bridge due to congestion is many times the cost of the bridge itself and its maintenance. The authors primary objective was to develop a dataset that enables the training of models which later support the inspector during damage recognition and documentation to a maximum.

Dataset acquisition

Approximately one half of the images originate from databases of engineering offices, while the other half was provided by local authorities from Germany. The images were taken between 2000 and 2020. Both data sources supplied highly heterogeneous images regarding camera type, pose, lighting condition, and resolution.

Dataset description

The authors recognize 13 frequently occurring defects on reinforced concrete bridges (e.g., crack, spalling, efflorescence) and 6 important building parts (e.g., exposed reinforcement bar exposed rebar, bearing, expansion joint, protective equipment). All these classes play an important role for determining the building’s structural integrity, traffic safety and durability. Dacl10k includes 9,920 images from more than 100 different bridges, specifically designed for practical use, as it comprises all visually unique damage types defined by bridge inspection standards. Every object class in dataset can be labeled with:

  • concrete defect tag: cavity,efflorescence,alligator crack,spalling,restformwork,exposed rebars, hollowareas, crack, rockpocket classes,
  • general defect tag: rust, wetspot, weathering, graffiti classes.
  • object part tag: bearing, expansion joint, drainage, protective equipment, joint tape, washouts/concrete corrosion classes.
image

Example annotations from dacl10k. Top row: original image. Middle row: polygonal annotations. Bottom row: stacked masks. The following classes are abbreviated: Alligator Crack (ACrack), Washouts/Concrete corrosion (WConccor), Expansion Joint (EJoint), Protective Equipment (PEquipment) and Joint Tape (JTape). From left to right, the images display the individual classes: 1. Weathering, Spalling, Exposed Rebars, Rust; 2. Weathering, Crack; 3. Alligator Crack, Restformwork, Efflorescence; 4. Weathering, Crack, Spalling, Rockpocket; 5. Crack, Rust, Expansion Joint, Spalling; 6. Weathering, Rockpocket, Spalling, Efflorescence, Crack, Rust, Restformwork, Joint Tape; 7. Weathering, Protective Eq.

The 19 classes considered within dacl10k are separated into three groups: concrete defects, general defects and objects. The concrete defects appear only on building parts made of (reinforced) concrete, while general defects may be present on all materials (e.g., concrete or steel). The only defect within dacl10k that is not visually recognizable is hollowareas. This damage is usually identified by hammering on the concrete surface, thus, it can only be detected acoustically (not visually) but, as it is bordered with chalk during hands-on inspections, the authors annotated its markings. The objects group includes all components of a bridge that are not made of concrete, such as joint tapes, railings or impact attenuation devices (protective equipment). The objects often show defects such as geometrical irregularities or deficits in structural capacity. Geometrical irregularities can arise from wrong distances between the railing rods or if the railing height is less than the minimum according to the given national standard. These visually challenging recognizable issues are not part of the dataset.

Dataset comprises coarse pixel-level annotations. The authors border each defect and object on a given image with one polygon (shape) and assign its label. Furthermore, they include polygons of the same class that overlap with each other in one shape. With respect to inspection standards and application, for all the defect classes, it is not important to differentiate between instances of a given class. Instead, their size and localization on a class-basis is important. Thereby, the authors utilize the open-source labeling tool LabelMe.

It often appears that shapes of different damage or object classes overlap with each other. Consequently, the underlying task can be described as multi-label semantic segmentation because one pixel can be part of multiple defects and objects. In other words, the labels are not assigned mutually exclusive to the pixels. The authors labeling process consisted of two consecutive steps: annotation of data received from engineering offices by civil engineering students (in-house) and annotation data from the authorities by an external annotation team, previously filtered for relevant image content. The students labeled approximately 7,000 images in accordance to our guidelines. Nearly 30% of the images had to be rejected due to flags indicating bad quality (blurring, overexposure) or personal data.

The pipeline during in-house labeling was separated into three parts. The first part consisted of the regular annotation which comprises annotating a batch of 100 images, getting feedback by a domain expert and correcting the failures accordingly. The second part included an extensive analysis of the dataset to find structural failures in the annotations. Thirdly, the dataset was divided into subtasks with respect to the failures most commonly made. Then, one student corrected each failure type consecutively. The quality assessment of the data annotated by the external team was divided into four quality checks for each data batch. In average, one batch included 250 images. Each check included one iteration over the annotated data by experts. Based on the analysis, the error rate was determined which is the ratio of false-labeled images and total amount of frames in the according batch. Starting with an error rate of 60%, the rate could be lowered to a final value of 1%.

Statistical analysis

Regarding the pixel counts over the whole dataset, it can be stated that weathering, followed by protective equipment, are the most dominant classes with
nearly four and 1.5 billion pixels. Within the range of 0.1 billion and 1 billion pixels, the majority of defects and objects with respect to the number of pixels can be found. Clearly underrepresented are the defects restformwork (66 million) and the objects joint tape (68 million) and exposed rebars (65 million).

image

Pixel counts with respect to each class in dacl10k based on the original image sizes. The bars are arranged according to the group affiliation.

The average image size is 1581 px in height and 1950 px in width. The mean image area is approximately 4 megapixels while the total pixel area in the dataset is approx. 43 billion px. Table below provides an overview of statistics describing the classwise density of polygons, size of polygons, density of pixels, share of polygons and share of pixels over the whole dataset. In average, 1.8 crack shapes are present on a crack image, whereby, one polygon includes 27,467 px. The average crack image shows approx. 50,000 px labeled as crack. With respect to the displayed shares, four out of 100 polygons are labeled as crack and
0.3% of the total pixel area received the label crack.

Furthermore, table below reveals the cause of the overrepresented classes weathering and protective equipment. They display a share regarding the number of polygons of 5.31% (top 20%) and 2.09% (exactly the median). This, in combination with the fact that an according image shows 900,000 px or rather 786,000 px of that class, leads to their dominant role. The overrepresentation of weathering can be more fatal than the one of protective equipment with respect to the model performance. This is due to the fact thatthe features (shape and texture) of weathering are similar to the ones from wetspot. Both are of round shape and represented by a darker area surrounded by a brighter “rest”. They vary slightly with regards to their texture. weathering is more noisy and more matt than wetspot which is smooth and sometimes mirroring.

In addition, wetspot and weathering often overlap which makes it difficult to distinguish between them from the model’s perspective. The features of the object protective equipment, in contrast, are unique and therefore shouldn’t interfere with other classes during learning. The lack of pixels representing restformwork, exposed rebars and joint tape, as mentioned before, originates from their relatively rare occurrences but mostly from their small shapes. In average, polygons bordering restformwork or joint tape have a size of approx. 50,000 px while polygons labeled as exposed rebars include 26,000 px. The average size of their polygons is equal to or less than the lower quartile (50,367 px). Additionally, exposed rebars shows the smallest average polygon size of all classes.

image

Overall statistics of the dataset regarding average number of polygons per image, number of pixels per polygon, number of pixels per image, share of polygons and share of pixels. Midrules separate the classes according to their group affiliation.

ExpandExpand
Dataset LinkHomepageDataset LinkResearch PaperDataset LinkDacl Challenge Data

Summary #

dacl10k: Benchmark for Semantic Bridge Damage Segmentation is a dataset for instance segmentation, semantic segmentation, and object detection tasks. It is used in the construction industry.

The dataset consists of 9920 images with 62327 labeled objects belonging to 19 different classes including rust, spalling, weathering, and other: crack, efflorescence, protective equipment, cavity, hollowareas, drainage, wetspot, graffiti, joint tape, exposed rebars, restformwork, bearing, expansion joint, alligator crack, rockpocket, and washouts/concrete corrosion.

Images in the dacl10k dataset have pixel-level instance segmentation annotations. Due to the nature of the instance segmentation task, it can be automatically transformed into a semantic segmentation (only one mask for every class) or object detection (bounding boxes for every object) tasks. There are 2081 (21% of the total) unlabeled images (i.e. without annotations). There are 3 splits in the dataset: train (6935 images), test (2010 images), and val (975 images). Additionally, each label in images has concrete defect, general defect, or object part tag. Also every image in test dataset has one of following tags: test dev or test challenge. Explore it in supervisely labeling tool. The dataset was released in 2023 by the University of the Bundeswehr Munich, Germany.

Here are the visualized examples for the classes:

Explore #

dacl10k dataset has 9920 images. Click on one of the examples below or open "Explore" tool anytime you need to view dataset images with annotations. This tool has extended visualization capabilities like zoom, translation, objects table, custom filters and more. Hover the mouse over the images to hide or show annotations.

OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
OpenSample annotation mask from dacl10kSample image from dacl10k
👀
Have a look at 9920 images
View images along with annotations and tags, search and filter by various parameters

Class balance #

There are 19 annotation classes in the dataset. Find the general statistics and balances for every class in the table below. Click any row to preview images that have labels of the selected class. Sort by column to find the most rare or prevalent classes.

Search
Rows 1-10 of 19
Class
Images
Objects
Count on image
average
Area on image
average
rust
polygon
3915
14720
3.76
4.55%
spalling
polygon
3739
9884
2.64
5.48%
weathering
polygon
3096
4634
1.5
34.5%
crack
polygon
1981
3626
1.83
1.33%
efflorescence
polygon
1729
3965
2.29
9.34%
protective equipment
polygon
1494
1890
1.27
20.66%
cavity
polygon
1355
9271
6.84
4.67%
hollowareas
polygon
1252
1557
1.24
17.87%
drainage
polygon
1181
1652
1.4
6.62%
wetspot
polygon
1109
1661
1.5
18.57%

Co-occurrence matrix #

Co-occurrence matrix is an extremely valuable tool that shows you the images for every pair of classes: how many images have objects of both classes at the same time. If you click any cell, you will see those images. We added the tooltip with an explanation for every cell for your convenience, just hover the mouse over a cell to preview the description.

Images #

Explore every single image in the dataset with respect to the number of annotations of each class it has. Click a row to preview selected image. Sort by any column to find anomalies and edge cases. Use horizontal scroll if the table has many columns for a large number of classes in the dataset.

Class sizes #

The table below gives various size properties of objects for every class. Click a row to see the image with annotations of the selected class. Sort columns to find classes with the smallest or largest objects or understand the size differences between classes.

Search
Rows 1-10 of 19
Class
Object count
Avg area
Max area
Min area
Min height
Min height
Max height
Max height
Avg height
Avg height
Min width
Min width
Max width
Max width
rust
polygon
14720
1.2%
98.49%
0%
4px
0.25%
4032px
100%
174px
11.24%
4px
0.22%
5176px
100%
spalling
polygon
9884
2.06%
99.71%
0%
6px
0.25%
4030px
100%
222px
15.66%
6px
0.44%
5472px
100%
cavity
polygon
9271
0.68%
99.87%
0%
2px
0.21%
3000px
100%
55px
4.75%
3px
0.18%
3898px
100%
weathering
polygon
4634
23.04%
99.94%
0%
11px
0.77%
4032px
100%
642px
49.79%
11px
0.62%
5152px
100%
efflorescence
polygon
3965
4.05%
99.94%
0%
8px
0.6%
3742px
100%
419px
25.68%
4px
0.21%
5053px
100%
crack
polygon
3626
0.7%
11.01%
0%
8px
0.45%
4240px
100%
388px
23.85%
9px
0.4%
4240px
100%
graffiti
polygon
2200
7.53%
99.94%
0%
6px
0.62%
3267px
100%
327px
27.65%
5px
0.52%
4673px
100%
exposed rebars
polygon
2000
0.54%
41.58%
0%
6px
0.41%
3130px
100%
199px
11.37%
5px
0.33%
4256px
99.95%
protective equipment
polygon
1890
16.27%
100.47%
0.01%
8px
0.67%
4032px
100%
605px
41.01%
8px
0.62%
5152px
100%
wetspot
polygon
1661
12.37%
99.94%
0.01%
9px
1.32%
4000px
100%
486px
40.47%
15px
0.94%
4441px
100%

Spatial Heatmap #

The heatmaps below give the spatial distributions of all objects for every class. These visualizations provide insights into the most probable and rare object locations on the image. It helps analyze objects' placements in a dataset.

Spatial Heatmap

Objects #

Table contains all 62327 objects. Click a row to preview an image with annotations, and use search or pagination to navigate. Sort columns to find outliers in the dataset.

Search
Rows 1-10 of 62327
Object ID
Class
Image name
click row to open
Image size
height x width
Height
Height
Width
Width
Area
1
spalling
polygon
dacl10k_v2_train_3793.jpg
2448 x 3264
384px
15.69%
372px
11.4%
1.31%
2
spalling
polygon
dacl10k_v2_train_3793.jpg
2448 x 3264
150px
6.13%
150px
4.6%
0.13%
3
spalling
polygon
dacl10k_v2_train_3793.jpg
2448 x 3264
484px
19.77%
377px
11.55%
1.51%
4
rust
polygon
dacl10k_v2_train_3793.jpg
2448 x 3264
214px
8.74%
77px
2.36%
0.14%
5
spalling
polygon
dacl10k_v2_train_3793.jpg
2448 x 3264
311px
12.7%
128px
3.92%
0.34%
6
drainage
polygon
dacl10k_v2_train_3793.jpg
2448 x 3264
389px
15.89%
475px
14.55%
1.42%
7
graffiti
polygon
dacl10k_v2_train_3793.jpg
2448 x 3264
1047px
42.77%
2835px
86.86%
22.42%
8
graffiti
polygon
dacl10k_v2_train_3793.jpg
2448 x 3264
831px
33.95%
1219px
37.35%
8.21%
9
graffiti
polygon
dacl10k_v2_train_3793.jpg
2448 x 3264
133px
5.43%
122px
3.74%
0.13%
10
hollowareas
polygon
dacl10k_v2_train_3793.jpg
2448 x 3264
508px
20.75%
412px
12.62%
1.85%

License #

dacl10k: Benchmark for Semantic Bridge Damage Segmentation is under CC BY-NC 4.0 license.

Source

Citation #

If you make use of the dacl10k data, please cite the following reference:

@dataset{dacl10k,
  author={Johannes Flotzinger and Philipp J. Rosch and Thomas Braml},
  title={dacl10k: Benchmark for Semantic Bridge Damage Segmentation},
  year={2023},
  url={https://dacl.ai/workshop.html}
}

Source

If you are happy with Dataset Ninja and use provided visualizations and tools in your work, please cite us:

@misc{ visualization-tools-for-dacl10k-dataset,
  title = { Visualization Tools for dacl10k Dataset },
  type = { Computer Vision Tools },
  author = { Dataset Ninja },
  howpublished = { \url{ https://datasetninja.com/dacl10k } },
  url = { https://datasetninja.com/dacl10k },
  journal = { Dataset Ninja },
  publisher = { Dataset Ninja },
  year = { 2024 },
  month = { mar },
  note = { visited on 2024-03-05 },
}

Download #

Dataset dacl10k can be downloaded in Supervisely format:

As an alternative, it can be downloaded with dataset-tools package:

pip install --upgrade dataset-tools

… using following python code:

import dataset_tools as dtools

dtools.download(dataset='dacl10k', dst_dir='~/dataset-ninja/')

Make sure not to overlook the python code example available on the Supervisely Developer Portal. It will give you a clear idea of how to effortlessly work with the downloaded dataset.

The data in original format can be downloaded here:

. . .

Disclaimer #

Our gal from the legal dep told us we need to post this:

Dataset Ninja provides visualizations and statistics for some datasets that can be found online and can be downloaded by general audience. Dataset Ninja is not a dataset hosting platform and can only be used for informational purposes. The platform does not claim any rights for the original content, including images, videos, annotations and descriptions. Joint publishing is prohibited.

You take full responsibility when you use datasets presented at Dataset Ninja, as well as other information, including visualizations and statistics we provide. You are in charge of compliance with any dataset license and all other permissions. You are required to navigate datasets homepage and make sure that you can use it. In case of any questions, get in touch with us at hello@datasetninja.com.