Simulated-Orchards - Dataset Ninja

Introduction #

Dylan Hasperhoven, Maya Aghaei, Klaas Dijkstra

Simulated-Orchards is a dataset designed for object detection. It contains 1499 images with a total of 44885 labeled objects, all belonging to a single class: apple. The dataset is generated systematically with a tool built in the Unity 3D game engine. With its focus on a single class, it is a resource for training models to detect and locate apples in simulated orchard environments, which is useful for agricultural and computer vision research.

For the systematic generation of simulated datasets, the authors propose a tool made in the Unity 3D game engine. The tool can be configured so that the resulting dataset fits a specific application. Once the configuration is chosen, the software automatically creates a dataset to be used as training data for object detection networks.

Simulating apple orchards in Unity

To generate realistic data, a simulator that mimics an apple orchard is needed, and it should reflect the real-world scenario. The authors’ tool achieves this by using components supplied by the Unity 3D game engine. The camera component is used to simulate an RGB camera in a 3D scene.

To simulate the ‘orchard’ component, the authors’ simulator uses a 2D plane with an image of a real-world orchard projected onto it. This strategy makes it possible to simulate a representative orchard without a true-to-life 3D reconstruction, which would take considerably more time and effort to realise.

Simulated apple orchard with 2D plane with projected image and 3D textured apple models.

The simulator uses textured 3D models that are positioned between the 2D plane and the camera to simulate apples. The placements of the 3D apples are randomly determined. A selection of apple models that represent a wide range of varieties was used.

Textured 3D apples used in the simulation.

Camera transformation

By altering the camera transform for every data point, the data is generated from different angles and distances. The tool does this by generating pseudo-random transforms. Because the simulator uses a 2D background plane with 3D apples positioned in front of it, only a limited space of camera transforms produces valid data. In this case, valid data is data where:

Every pixel on the RGB image is projected from the 2D background or an apple.
The camera is facing the front-side of the 2D plane.
Apples are clearly visible (i.e. not occluded, fully in frame of the camera, sufficiently sized to be recognizable).

Invalid camera positions: (a) Camera is too close to the 2D plane, causing the apples not to be clearly visible. (b) Camera is oriented downwards, causing pixels not projected from the 2D background plane or an apple to appear in the camera image. (c) Camera is too far from the 2D plane, causing the apples not to be clearly visible.

Limits on orientations and positions can be configured in the tool. The generator then draws random values and uses them to interpolate linearly between these limits, which yields a transform. The tool also enforces a minimum and maximum distance between the 2D plane and the camera. This guarantees that the generated camera transforms are ‘valid’ according to the conditions described above.
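The interpolation between configured limits can be sketched in a few lines of Python. This is only an illustration and not the authors' Unity code: the `CameraLimits` fields, their default values, and the convention that the background plane sits at z = 0 with the camera on the negative z side are all assumptions made for the example.

```python
import random
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


@dataclass
class CameraLimits:
    # Hypothetical limits; the values are illustrative, not taken from the tool.
    min_xy: Tuple[float, float] = (-2.0, 0.5)
    max_xy: Tuple[float, float] = (2.0, 2.5)
    min_euler_deg: Vec3 = (-10.0, -15.0, 0.0)
    max_euler_deg: Vec3 = (10.0, 15.0, 0.0)
    min_distance: float = 3.0  # closest allowed distance to the plane
    max_distance: float = 6.0  # farthest allowed distance to the plane


def sample_camera_transform(limits: CameraLimits, rng: random.Random) -> Tuple[Vec3, Vec3]:
    """Draw one pseudo-random camera position and orientation within the limits."""
    x = lerp(limits.min_xy[0], limits.max_xy[0], rng.random())
    y = lerp(limits.min_xy[1], limits.max_xy[1], rng.random())
    # Assumed convention: plane at z = 0, camera in front of it at negative z.
    z = -lerp(limits.min_distance, limits.max_distance, rng.random())
    rotation = tuple(
        lerp(lo, hi, rng.random())
        for lo, hi in zip(limits.min_euler_deg, limits.max_euler_deg)
    )
    return (x, y, z), rotation


position, rotation = sample_camera_transform(CameraLimits(), random.Random(0))
```

Because every component is interpolated with a factor in [0, 1), each sampled value stays inside its configured range, which is what keeps the camera within the distance bounds.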

Apple positions

New 3D positions for apples are generated after a specified number of data points; this number is configurable. Two methods are available for generating the positions. In the first method, x and y-coordinates are drawn from a uniform random distribution within the target resolution of the image data. Back-projection and Unity’s raycasting system are then used to find the 3D position on the 2D background plane that corresponds to these coordinates on the camera frame.
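As a rough illustration of back-projection onto the background plane, the sketch below uses a simple pinhole model. It assumes an unrotated camera looking along +z, a plane at a fixed z value, and a 60 degree vertical field of view. These are simplifications for the example. In the tool itself, Unity's raycasting handles arbitrary camera transforms. The function names are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def pixel_to_ray(u: float, v: float, width: int, height: int,
                 vertical_fov_deg: float) -> Vec3:
    """Direction (not normalised) of the ray through pixel (u, v).

    Pixel (0, 0) is the top-left pixel; the camera looks along +z with y up.
    """
    half_h = math.tan(math.radians(vertical_fov_deg) / 2.0)
    half_w = half_h * width / height
    x = (2.0 * (u + 0.5) / width - 1.0) * half_w
    y = (1.0 - 2.0 * (v + 0.5) / height) * half_h
    return (x, y, 1.0)


def pixel_to_plane(u: float, v: float, width: int, height: int,
                   vertical_fov_deg: float, camera_pos: Vec3,
                   plane_z: float = 0.0) -> Vec3:
    """Intersect the ray through pixel (u, v) with the plane z = plane_z."""
    dx, dy, dz = pixel_to_ray(u, v, width, height, vertical_fov_deg)
    t = (plane_z - camera_pos[2]) / dz
    if t <= 0:
        raise ValueError("plane is behind the camera")
    return (camera_pos[0] + t * dx, camera_pos[1] + t * dy, plane_z)


# Image centre of a 1296 x 972 frame, camera 5 units in front of the plane.
centre = pixel_to_plane(647.5, 485.5, 1296, 972, 60.0, (0.0, 0.0, -5.0))
```

With these assumptions, the image centre maps to the point on the plane directly in front of the camera, and pixels further from the centre map to points further out on the plane.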

If the smallest distance between the generated position and other generated positions is less than the diameter of the apples (adjusted for maximum overlap), the generated position is discarded to ensure minimal overlap among apples.
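The discard rule amounts to rejection sampling. The sketch below works directly in 2D image coordinates for brevity, whereas the tool compares positions after back-projection. How the diameter is "adjusted for maximum overlap" is not specified in the description, so the formula `diameter * (1 - max_overlap)` and all parameter names here are assumptions.

```python
import math
import random
from typing import List, Tuple


def sample_positions(count: int, width: float, height: float,
                     apple_diameter: float, max_overlap: float,
                     rng: random.Random,
                     max_attempts: int = 10000) -> List[Tuple[float, float]]:
    """Draw up to `count` uniform positions, discarding ones that overlap too much."""
    # Assumed adjustment: allow apples to overlap by at most `max_overlap` of a diameter.
    min_dist = apple_diameter * (1.0 - max_overlap)
    positions: List[Tuple[float, float]] = []
    attempts = 0
    while len(positions) < count and attempts < max_attempts:
        attempts += 1
        candidate = (rng.uniform(0.0, width), rng.uniform(0.0, height))
        # Keep the candidate only if it is far enough from every accepted position.
        if all(math.dist(candidate, p) >= min_dist for p in positions):
            positions.append(candidate)
    return positions


positions = sample_positions(30, 1296, 972, apple_diameter=40,
                             max_overlap=0.25, rng=random.Random(1))
```

The attempt cap is a design choice for the sketch: it keeps the loop from running forever when the requested number of apples cannot fit within the frame.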

In the second method, the 2D coordinates of the centers of all apples are extracted from the instance segmentation masks of the APPLE MOTS dataset. Only segmentation masks from scenarios featuring a line of apple trees perpendicular to the camera are considered. These positions are stored per segmentation mask. When new positions are generated, the tool randomly selects a segmentation mask and uses its positions. As in the first method, back-projection and Unity’s raycasting system are used to determine the 3D position on the 2D background plane that corresponds to the x and y-coordinates of each extracted position on the image frame.
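A minimal sketch of extracting apple centers from an instance mask follows. It assumes the mask has already been decoded into a 2D grid of integer instance identifiers with 0 as background, and it takes the pixel centroid as the "center". The description does not say how the center is defined or how APPLE MOTS masks are decoded, so both points are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Mask = List[List[int]]


def instance_centers(mask: Mask) -> Dict[int, Tuple[float, float]]:
    """Return the (x, y) pixel centroid of every non-zero instance id in the mask."""
    sums = defaultdict(lambda: [0, 0, 0])  # id -> [sum_x, sum_y, pixel_count]
    for y, row in enumerate(mask):
        for x, instance_id in enumerate(row):
            if instance_id != 0:
                acc = sums[instance_id]
                acc[0] += x
                acc[1] += y
                acc[2] += 1
    return {i: (sx / n, sy / n) for i, (sx, sy, n) in sums.items()}


# Toy masks standing in for decoded segmentation masks.
masks = [
    [[0, 1, 1, 0],
     [0, 1, 1, 0],
     [0, 0, 0, 2]],
    [[3, 0, 0, 0],
     [0, 0, 4, 4],
     [0, 0, 0, 0]],
]
stored_positions = [list(instance_centers(m).values()) for m in masks]

# During generation, one stored mask is picked at random and its positions are reused.
chosen = random.Random(0).choice(stored_positions)
```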

Augmenting data

To ensure the generated data reflects real-world scenarios, the generator tool incorporates what are known as augmentations. These augmentations can be enabled or disabled to diversify the datasets.

In the standard scenario (where no augmentations are applied), all apples within the simulator share identical 3D models, textures, scales, rotations, and z-axis coordinates. The lighting remains consistent across the dataset. The variations within the dataset are limited to changes in apple positions, quantities, and the camera’s transformation.

Augmentation	Description
rotation	Gives a random orientation to every apple.
scale	Gives a random scale to every apple.
depth	Gives a random z-coordinate (depth) to every apple.
lighting	Changes the lighting of the scene.
color per apple	Gives a random (realistic) color to each apple.
color per scene	Generates a random (realistic) color for all apples in the scene.
model per apple	Chooses a random 3D model for every apple from the set of 3D models shown in Fig. 2.
model per scene	Chooses a random 3D apple model from the set of 3D models shown in Fig. 2 and applies it to all apples in the scene.

List of all augmentations for generating Simulated-Orchards datasets.
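The way such toggles might drive per-apple parameters can be sketched as follows. This covers only the rotation, scale, depth, and model augmentations. Lighting and color are left out for brevity. The class name, model names, and value ranges are hypothetical and not taken from the tool.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Augmentations:
    # Each flag mirrors one row of the table; all are off in the standard scenario.
    rotation: bool = False
    scale: bool = False
    depth: bool = False
    model_per_apple: bool = False
    model_per_scene: bool = False


MODELS = ["apple_a", "apple_b", "apple_c"]  # hypothetical model names


def sample_scene(num_apples: int, aug: Augmentations, rng: random.Random) -> List[Dict]:
    """Return per-apple parameters for one scene under the enabled augmentations."""
    # "model per scene": one random model shared by every apple in the scene.
    scene_model = rng.choice(MODELS) if aug.model_per_scene else MODELS[0]
    apples = []
    for _ in range(num_apples):
        apples.append({
            "model": rng.choice(MODELS) if aug.model_per_apple else scene_model,
            "rotation_deg": (tuple(rng.uniform(0.0, 360.0) for _ in range(3))
                             if aug.rotation else (0.0, 0.0, 0.0)),
            "scale": rng.uniform(0.8, 1.2) if aug.scale else 1.0,
            "depth": rng.uniform(0.1, 0.5) if aug.depth else 0.3,
        })
    return apples


scene = sample_scene(5, Augmentations(rotation=True, scale=True), random.Random(0))
```

With every flag off, all apples receive identical parameters, which matches the standard scenario described above.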

Generating annotations

The technique introduced by this tool enables precise bounding box calculations within Unity 3D.

The algorithm starts by computing an instance segmentation mask. For each x and y-coordinate on the image plane, back-projection gives the direction vector of the ray from the camera origin through that pixel. Unity’s raycasting system then casts a ray from the camera origin in that direction and returns the first object it intersects, which is either the background or an apple. If the ray hits the background, a value of ‘0’ is written into the mask. If it hits an apple, the instance identifier of that apple is written into the mask. Bounding boxes are then derived by finding the minimum and maximum x and y-coordinates for each instance identifier.
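The last step, turning an instance mask into bounding boxes, can be sketched in plain Python. The mask is assumed to be a 2D grid of integer identifiers with 0 for background. The raycasting that fills the mask is done by Unity and is not reproduced here, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixel indices


def bounding_boxes(mask: List[List[int]]) -> Dict[int, Box]:
    """Compute one axis-aligned box per non-zero instance id in the mask."""
    boxes: Dict[int, List[int]] = {}
    for y, row in enumerate(mask):
        for x, instance_id in enumerate(row):
            if instance_id == 0:
                continue  # background pixel
            if instance_id not in boxes:
                boxes[instance_id] = [x, y, x, y]
            else:
                b = boxes[instance_id]
                b[0] = min(b[0], x)
                b[1] = min(b[1], y)
                b[2] = max(b[2], x)
                b[3] = max(b[3], y)
    return {i: (b[0], b[1], b[2], b[3]) for i, b in boxes.items()}


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 2, 2],
]
boxes = bounding_boxes(mask)
```

Because the boxes come from the visible pixels of each instance, they cover only the visible part of an apple and not its full extent.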

Process of calculating bounding boxes. a: RGB image from the virtual camera. b: instance segmentation mask created using back-projection. c: bounding boxes (overlaid on original RGB image) calculated by finding the minimum and maximum x-coordinate and y-coordinate for every instance.

Real-world datasets

A real-world dataset was used to evaluate and compare the performance of models trained on simulated data.

See the Mini-Orchards dataset (available on Dataset Ninja) for a more thorough understanding.

Summary #

Simulated-Orchards is a dataset for an object detection task. It is used in the agricultural industry.

The dataset consists of 1499 images with 44885 labeled objects belonging to a single class (apple).

Images in the Simulated-Orchards dataset have bounding box annotations. There are 63 (4% of the total) unlabeled images (i.e. without annotations). There are no pre-defined train/val/test splits in the dataset. The dataset was released in 2023 by the NHL Stenden University of Applied Sciences, Netherlands.

Explore #

The Simulated-Orchards dataset has 1499 images. Click on one of the examples below or open the "Explore" tool whenever you need to view dataset images with annotations. The tool offers extended visualization capabilities such as zoom, translation, an objects table, custom filters and more. Hover the mouse over the images to hide or show annotations.

Class balance #

There is 1 annotation class in the dataset. Find the general statistics and balances for every class in the table below. Click any row to preview images that have labels of the selected class. Sort by column to find the rarest or most prevalent classes.

Class	Images	Objects	Count on image average	Area on image average
apple rectangle	1436	44885	31.26	1.79%
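The figures in this table can be cross-checked against the summary with a little arithmetic. The check below suggests that "Count on image average" is computed over the 1436 images that contain at least one label and not over all 1499 images. This reading is inferred from the numbers and is not stated on the page.

```python
total_images = 1499
unlabeled_images = 63
total_objects = 44885

# Images that carry at least one annotation.
labeled_images = total_images - unlabeled_images

# Average object count over labeled images versus over all images.
avg_per_labeled_image = total_objects / labeled_images
avg_per_image = total_objects / total_images

# Share of images without any annotation, in percent.
unlabeled_share = 100 * unlabeled_images / total_images

print(labeled_images, round(avg_per_labeled_image, 2),
      round(avg_per_image, 2), round(unlabeled_share, 1))
```

Only the average over labeled images reproduces the 31.26 shown in the table.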

Images #

Explore every single image in the dataset with respect to the number of annotations of each class it has. Click a row to preview selected image. Sort by any column to find anomalies and edge cases. Use horizontal scroll if the table has many columns for a large number of classes in the dataset.

Class sizes #

The table below gives various size properties of objects for every class. Click a row to see the image with annotations of the selected class. Sort columns to find classes with the smallest or largest objects or understand the size differences between classes.

Class	Object count	Avg area	Max area	Min area	Min height (px)	Min height (%)	Max height (px)	Max height (%)	Avg height (px)	Avg height (%)	Min width (px)	Min width (%)	Max width (px)	Max width (%)
apple rectangle	44885	0.06%	3.86%	0.01%	7px	0.72%	224px	23.05%	24px	2.45%	8px	0.62%	217px	16.74%

Spatial Heatmap #

The heatmaps below give the spatial distributions of all objects for every class. These visualizations provide insights into the most probable and rare object locations on the image. It helps analyze objects' placements in a dataset.

Objects #

The table contains all 44885 objects. Click a row to preview an image with annotations, and use search or pagination to navigate. Sort columns to find outliers in the dataset.

Rows 1-10 of 44885

Object ID	Class	Image name	Image size (height x width)	Height (px)	Height (%)	Width (px)	Width (%)	Area (%)
1	apple rectangle	837.png	972 x 1296	42px	4.32%	33px	2.55%	0.11%
2	apple rectangle	837.png	972 x 1296	33px	3.4%	26px	2.01%	0.07%
3	apple rectangle	837.png	972 x 1296	32px	3.29%	25px	1.93%	0.06%
4	apple rectangle	837.png	972 x 1296	32px	3.29%	25px	1.93%	0.06%
5	apple rectangle	837.png	972 x 1296	40px	4.12%	34px	2.62%	0.11%
6	apple rectangle	837.png	972 x 1296	31px	3.19%	25px	1.93%	0.06%
7	apple rectangle	837.png	972 x 1296	31px	3.19%	24px	1.85%	0.06%
8	apple rectangle	837.png	972 x 1296	37px	3.81%	33px	2.55%	0.1%
9	apple rectangle	837.png	972 x 1296	35px	3.6%	32px	2.47%	0.09%
10	apple rectangle	837.png	972 x 1296	28px	2.88%	23px	1.77%	0.05%

License #

Simulated-Orchards is released under the ODbL v1.0 license.

Citation #

If you make use of the Simulated-Orchards data, please cite the following reference:

@dataset{Simulated-Orchards,
  author={Dylan Hasperhoven and Maya Aghaei and Klaas Dijkstra},
  title={Simulated-Orchards},
  year={2023},
  url={https://www.kaggle.com/datasets/dylanhasperhoven/simulated-orchards}
}

If you are happy with Dataset Ninja and use the provided visualizations and tools in your work, please cite us:

@misc{ visualization-tools-for-simulated-orchards-dataset,
  title = { Visualization Tools for Simulated-Orchards Dataset },
  type = { Computer Vision Tools },
  author = { Dataset Ninja },
  howpublished = { \url{ https://datasetninja.com/simulated-orchards } },
  url = { https://datasetninja.com/simulated-orchards },
  journal = { Dataset Ninja },
  publisher = { Dataset Ninja },
  year = { 2025 },
  month = { aug },
  note = { visited on 2025-08-08 },
}

Download #

The Simulated-Orchards dataset can be downloaded in Supervisely format.

As an alternative, it can be downloaded with the dataset-tools package:

pip install --upgrade dataset-tools

… using the following Python code:

import dataset_tools as dtools

dtools.download(dataset='Simulated-Orchards', dst_dir='~/dataset-ninja/')

Make sure not to overlook the Python code example available on the Supervisely Developer Portal. It gives a clear idea of how to work with the downloaded dataset.

The data in its original format can be downloaded here.

Disclaimer #

Our gal from the legal dep told us we need to post this:

Dataset Ninja provides visualizations and statistics for some datasets that can be found online and can be downloaded by a general audience. Dataset Ninja is not a dataset hosting platform and can only be used for informational purposes. The platform does not claim any rights to the original content, including images, videos, annotations and descriptions. Joint publishing is prohibited.

You take full responsibility when you use datasets presented at Dataset Ninja, as well as other information, including the visualizations and statistics we provide. You are in charge of compliance with any dataset license and all other permissions. You are required to visit the dataset's homepage and make sure that you are allowed to use it. In case of any questions, get in touch with us at hello@datasetninja.com.