Introduction #
Pink-Eggs Dataset V1 has been specifically curated for object detection tasks within the environmental industry. Comprising 1261 images, this dataset includes 2518 labeled objects falling under a single class: eggs. The dataset presents a collection of images of pink eggs identified as belonging to the Pomacea canaliculata species, each accompanied by precise bounding box annotations. Its primary objective is to serve as a resource for researchers who use deep learning techniques to analyze and understand the distribution and proliferation of Pomacea canaliculata. The dataset also supports other investigations that rely on visual data of Pomacea canaliculata eggs, aiding studies in ecological research and environmental sciences.
Motivation
The authors were driven by the need to address the urgent ecological threat posed by the invasive apple snail species, Pomacea canaliculata. The species originated in South America, and its rapid global spread, triggered by human activities, has had detrimental impacts on wetland ecosystems, potentially endangering native species and human health. With diverse control methods under consideration, ranging from pesticides to natural predators, each with distinct risks and benefits, the authors developed the Pink-Eggs Dataset. This initiative uses machine learning and computer vision to identify Pomacea canaliculata eggs by their distinct pink color and clustering pattern. The strategy not only offers a promising approach to invasive species management but also improves the authors' understanding of the behaviors and population dynamics of such species, paving the way for more sustainable and environmentally friendly solutions. Further research in this area has the potential to introduce new strategies in the ongoing battle against invasive species.
About Dataset
Four examples of Pomacea canaliculata detection result with object bounding box localization.
Based on the morphological characteristics of the eggs observed, as well as the presence of Pomacea canaliculata in the surrounding area, the authors infer that the specimens captured in Shenzhen between October and December of 2022, during daylight hours and in clear weather, are the eggs of Pomacea canaliculata. For close-range photography, a Redmi K50 Ultra phone with default camera settings was used to capture images of the eggs. For distant images, a D7200 camera equipped with a lens with an 18-140mm focal range was used in auto mode. In both cases, the images were saved in the JPG format.
After detecting distortions and imperfections in the collected data, the authors performed data cleansing by removing images that did not meet predetermined quality standards. Specifically, images that could be reliably identified as depicting Pomacea canaliculata eggs were retained, while images with severe blurriness were removed. Such blurriness could be attributed to distance, motion-induced distortion, size, and camera-specific effects. Each image in the dataset was annotated with bounding box labels using the labelImg tool by three annotators, with all annotations incorporated into the dataset to facilitate object detection and classification. To minimize subjectivity in the labeling process, random samples of labeled data were reviewed. Furthermore, three sets of annotations were provided to enable evaluation through methods such as cross-validation and bootstrapping. The average Intersection over Union (IoU) was calculated between every pair of annotation sets, and all values are above 0.87. Based on these measures, the authors are highly confident that the collected images are suitable for supporting research hypotheses.
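The pairwise agreement check described above can be illustrated with a minimal sketch. It assumes that boxes are given as `(xmin, ymin, xmax, ymax)` pixel tuples and that the three annotators' boxes for one object have already been matched to each other. The source does not describe the authors' matching procedure, and the function names and example coordinates here are hypothetical.

```python
from itertools import combinations


def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (xmin, ymin, xmax, ymax)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def mean_pairwise_iou(boxes):
    """Average IoU over every pair of boxes drawn for the same object."""
    pairs = list(combinations(boxes, 2))
    return sum(iou(a, b) for a, b in pairs) / len(pairs)


# Hypothetical boxes from three annotators around the same egg cluster.
example = [(100, 100, 300, 400), (105, 98, 298, 405), (95, 102, 305, 395)]
print(round(mean_pairwise_iou(example), 3))
```

Averaging such per-object scores over a whole annotation set would give one agreement value per pair of annotators, comparable in spirit to the 0.87 threshold quoted above.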
The dataset was partitioned into three distinct subsets, namely train, val, and test sets, each being mutually exclusive. The training set consisted of a total of 1000 images, randomly selected from the dataset, whereas the val set and test set comprised 100 and 161 images, respectively. Additionally, all images were subjected to a modification process that involved the removal of embedded camera messages while preserving the original pixel values.
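As an illustration of a mutually exclusive random partition with the sizes stated above, here is a minimal sketch. The seed, the function name, and the placeholder file names are assumptions for the example only. It is not the authors' actual splitting script.

```python
import random


def split_dataset(image_names, n_train=1000, n_val=100, seed=0):
    """Shuffle names and cut them into disjoint train/val/test lists."""
    names = sorted(image_names)  # fixed starting order so the seed is reproducible
    random.Random(seed).shuffle(names)
    train = names[:n_train]
    val = names[n_train:n_train + n_val]
    test = names[n_train + n_val:]  # whatever remains (161 of 1261)
    return train, val, test


# Placeholder file names standing in for the 1261 dataset images.
images = [f"img_{i:04d}.jpg" for i in range(1261)]
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))  # 1000 100 161
```

Because the three lists are slices of one shuffled list, no image can appear in more than one subset.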
In the pursuit of acquiring a comprehensive dataset, further images of Pomacea canaliculata eggs were sourced through online search engines. However, due to a dearth of explicit consent for their reuse, download, and distribution, these images could not be included in the final curation.
Summary #
Pink-Eggs Dataset V1 is a dataset for an object detection task. It is used in biological research.
The dataset consists of 1261 images with 2518 labeled objects belonging to a single class (eggs).
Images in Pink-Eggs Dataset V1 have bounding box annotations. All images are labeled (i.e. with annotations). There are 3 splits in the dataset: train (1000 images), test (161 images), and val (100 images). The dataset was released in 2023.
Explore #
Pink-Eggs Dataset V1 has 1261 images. Click on one of the examples below or open the "Explore" tool anytime you need to view dataset images with annotations. This tool has extended visualization capabilities like zoom, translation, objects table, custom filters and more. Hover the mouse over the images to hide or show annotations.
Class balance #
There is 1 annotation class in the dataset. Find the general statistics and balances for every class in the table below. Click any row to preview images that have labels of the selected class. Sort by column to find the rarest or most prevalent classes.
Class | Images | Objects | Count on image average | Area on image average |
---|---|---|---|---|
eggs rectangle | 1261 | 2518 | 2 | 3.5% |
Images #
Explore every single image in the dataset with respect to the number of annotations of each class it has. Click a row to preview selected image. Sort by any column to find anomalies and edge cases. Use horizontal scroll if the table has many columns for a large number of classes in the dataset.
Object distribution #
The interactive heatmap chart shows, for every class, how many images in the dataset contain a given number of objects of that class. Click a cell to see the list of all corresponding images.
Class sizes #
The table below gives various size properties of objects for every class. Click a row to see the image with annotations of the selected class. Sort columns to find classes with the smallest or largest objects or understand the size differences between classes.
Class | Object count | Avg area | Max area | Min area | Min height (px) | Min height (% of image) | Max height (px) | Max height (% of image) | Avg height (px) | Avg height (% of image) | Min width (px) | Min width (% of image) | Max width (px) | Max width (% of image) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
eggs rectangle | 2518 | 1.76% | 36.66% | 0.01% | 35px | 0.88% | 2993px | 74.83% | 383px | 9.77% | 36px | 0.6% | 2162px | 72.07% |
Spatial Heatmap #
The heatmaps below give the spatial distributions of all objects for every class. These visualizations provide insights into the most probable and rare object locations on the image. It helps analyze objects' placements in a dataset.
Objects #
Table contains all 2518 objects. Click a row to preview an image with annotations, and use search or pagination to navigate. Sort columns to find outliers in the dataset.
Object ID | Class | Image name | Image size (height x width) | Height (px) | Height (% of image) | Width (px) | Width (% of image) | Area (% of image) |
---|---|---|---|---|---|---|---|---|
1 | eggs rectangle | IMG_20220925_135206.jpg | 4000 x 3000 | 419px | 10.47% | 401px | 13.37% | 1.4% |
2 | eggs rectangle | _WGX2651.JPG | 4000 x 6000 | 174px | 4.35% | 101px | 1.68% | 0.07% |
3 | eggs rectangle | _WGX2651.JPG | 4000 x 6000 | 158px | 3.95% | 101px | 1.68% | 0.07% |
4 | eggs rectangle | _WGX2957.JPG | 4000 x 6000 | 180px | 4.5% | 135px | 2.25% | 0.1% |
5 | eggs rectangle | _WGX2957.JPG | 4000 x 6000 | 114px | 2.85% | 53px | 0.88% | 0.03% |
6 | eggs rectangle | _WGX2957.JPG | 4000 x 6000 | 279px | 6.97% | 140px | 2.33% | 0.16% |
7 | eggs rectangle | _WGX2957.JPG | 4000 x 6000 | 227px | 5.67% | 105px | 1.75% | 0.1% |
8 | eggs rectangle | _WGX2957.JPG | 4000 x 6000 | 188px | 4.7% | 92px | 1.53% | 0.07% |
9 | eggs rectangle | _WGX2957.JPG | 4000 x 6000 | 280px | 7% | 97px | 1.62% | 0.11% |
10 | eggs rectangle | _WGX2957.JPG | 4000 x 6000 | 175px | 4.38% | 83px | 1.38% | 0.06% |
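The percentage columns in the table can be reproduced from the pixel values. The sketch below assumes that each percentage is taken relative to the corresponding image dimension (or to the image area). That assumption is consistent with the rows shown but is not stated on the page. The function name is hypothetical.

```python
def box_stats(box_h_px, box_w_px, img_h_px, img_w_px):
    """Return box height, width and area as percentages of the image."""
    height_pct = 100.0 * box_h_px / img_h_px
    width_pct = 100.0 * box_w_px / img_w_px
    area_pct = 100.0 * (box_h_px * box_w_px) / (img_h_px * img_w_px)
    return height_pct, width_pct, area_pct


# Object 1 from the table: a 419 x 401 px box in a 4000 x 3000 (height x width) image.
h, w, a = box_stats(419, 401, 4000, 3000)
print(f"height {h:.2f}%  width {w:.2f}%  area {a:.2f}%")
```

For object 1 this gives roughly 10.5%, 13.4% and 1.4%, in line with the listed values up to rounding.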
License #
Pink-Eggs Dataset V1 is released under the GNU GPL 2.0 license.
Citation #
If you make use of the Pink-Eggs Dataset V1 data, please cite the following reference:
@misc{xu2023pinkeggs,
title={Pink-Eggs Dataset V1: A Step Toward Invasive Species Management Using Deep Learning Embedded Solutions},
author={Di Xu and Yang Zhao and Xiang Hao and Xin Meng},
year={2023},
eprint={2305.09302},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
If you are happy with Dataset Ninja and use provided visualizations and tools in your work, please cite us:
@misc{ visualization-tools-for-pink-eggs-dataset-v1-dataset,
title = { Visualization Tools for Pink-Eggs Dataset V1 Dataset },
type = { Computer Vision Tools },
author = { Dataset Ninja },
howpublished = { \url{ https://datasetninja.com/pink-eggs-dataset-v1 } },
url = { https://datasetninja.com/pink-eggs-dataset-v1 },
journal = { Dataset Ninja },
publisher = { Dataset Ninja },
year = { 2024 },
month = { nov },
note = { visited on 2024-11-21 },
}
Download #
Dataset Pink-Eggs Dataset V1 can be downloaded in Supervisely format:
As an alternative, it can be downloaded with dataset-tools package:
pip install --upgrade dataset-tools
… using the following Python code:
import dataset_tools as dtools
dtools.download(dataset='Pink-Eggs Dataset V1', dst_dir='~/dataset-ninja/')
Make sure not to overlook the python code example available on the Supervisely Developer Portal. It will give you a clear idea of how to effortlessly work with the downloaded dataset.
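Once downloaded, the rectangle annotations can be read with the standard library alone. The sketch below is an assumption-laden illustration: it presumes the per-image annotation JSON has an `objects` list whose entries carry `classTitle`, `geometryType`, and `points.exterior` with two corner points. That matches the Supervisely annotation layout as commonly documented, but it should be checked against the actual files. The helper name and sample coordinates are hypothetical.

```python
import json


def load_boxes(annotation_json):
    """Extract (class, xmin, ymin, xmax, ymax) tuples from one annotation JSON string."""
    ann = json.loads(annotation_json)
    boxes = []
    for obj in ann.get("objects", []):
        if obj.get("geometryType") != "rectangle":
            continue  # this dataset is expected to contain rectangles only
        (x1, y1), (x2, y2) = obj["points"]["exterior"]
        # Normalise corner order so that min values come first.
        boxes.append((obj["classTitle"], min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)))
    return boxes


# Hypothetical annotation for one 4000 x 3000 image with a single egg cluster.
sample = json.dumps({
    "size": {"height": 4000, "width": 3000},
    "objects": [{
        "classTitle": "eggs",
        "geometryType": "rectangle",
        "points": {"exterior": [[1200, 1500], [1601, 1919]], "interior": []},
    }],
})
print(load_boxes(sample))
```

In practice, the JSON string would come from reading an annotation file in the downloaded dataset directory rather than from an inline sample.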
The data in original format can be downloaded here.
Disclaimer #
Our gal from the legal dep told us we need to post this:
Dataset Ninja provides visualizations and statistics for some datasets that can be found online and can be downloaded by general audience. Dataset Ninja is not a dataset hosting platform and can only be used for informational purposes. The platform does not claim any rights for the original content, including images, videos, annotations and descriptions. Joint publishing is prohibited.
You take full responsibility when you use datasets presented at Dataset Ninja, as well as other information, including visualizations and statistics we provide. You are in charge of compliance with any dataset license and all other permissions. You are required to visit the dataset's homepage and make sure that you can use it. In case of any questions, get in touch with us at hello@datasetninja.com.