FoodSeg103 Dataset

Tag: food
Task: semantic segmentation
Release year: 2021
License: Apache 2.0
Download: 1 GB

Introduction #

Released 2021-05-12 · Xiongwei Wu, Xin Fu, Ying Liu et al.

The authors built a new food image dataset, FoodSeg103, together with its extension FoodSeg154. The two together contain 9,490 images, and each image has an average of 6 ingredient labels and pixel-wise masks. FoodSeg103 itself encompasses 7,118 images of Western cuisine, annotated with 103 distinct ingredient classes along with their corresponding segmentation masks.

Motivation

In recent years, food computing has garnered increasing attention from the public, serving as the backbone for advancements in food and health-related research and applications. A key objective within food computing is the automated recognition of various types of food and the profiling of their nutritional and caloric values. In the realm of computer vision, related endeavors encompass dish classification, recipe generation, and food image retrieval. However, the majority of these efforts focus on representing and analyzing food images as a whole entity, without explicitly localizing or classifying the individual ingredients comprising the cooked dish.

This distinction leads to two primary tasks: food image classification, which addresses the holistic identification of food items, and food image segmentation, which delves into the localization and classification of individual ingredients within the food image. Of the two, food image segmentation presents a greater level of complexity as it endeavors to discern each ingredient category and its respective pixel-wise locations within the food image. For instance, in an image depicting a “hamburger,” a proficient segmentation model would need to accurately delineate and mask out components like “beef,” “tomato,” “lettuce,” “onion,” and the “bread roll.”

Compared to semantic segmentation of general object images, food image segmentation is more challenging because food appearances are highly diverse and ingredient categories are often imbalanced. First, the same ingredient cooked in different ways can look very different, and different ingredients can look alike; for instance, “pineapples” cooked with meat might closely resemble “potatoes” cooked with meat. Second, food datasets commonly suffer from imbalanced distributions, both across food classes and across ingredient categories. This imbalance has two main causes: a few popular food classes account for a large portion of food images while most classes are sparsely represented, and selection bias may be introduced when food image collections are constructed.

The first row shows a source image and its segmentation masks in FoodSeg103. The second row shows example images that illustrate the difficulties of food image segmentation: the pineapples in (a) and (b) look different, while the pineapple in (a) and the potato in (c) look quite similar.

Dataset description

To enable fine-grained food image segmentation, the authors developed the FoodSeg103 dataset. It encompasses 7,118 images of Western cuisine, annotated with 103 distinct ingredient classes along with their corresponding segmentation masks. The annotation process involved careful data selection and iterative refinement of labels and masks to ensure high-quality annotations, which made it both resource-intensive and time-consuming. The source images were drawn from an existing food dataset, Recipe1M, which contains around 900k images with accompanying cooking recipes. These recipes not only detail the cooking instructions but also specify the ingredients used. Leveraging this auxiliary information, the authors incorporated recipe details into the training of semantic segmentation models, with the aim of improving the model’s understanding of food composition and producing more accurate segmentation.

FoodSeg103 is a subset of the broader FoodSeg154 dataset, which adds a further subset of Asian cuisine images and annotations. For FoodSeg103, the authors curated 7,118 images, defined 103 distinct ingredient categories, and provided the corresponding category labels along with segmentation masks. For FoodSeg154, they additionally collected 2,372 images of diverse Asian food, a set that is more varied than the Western food images in FoodSeg103. The authors use this subset specifically to assess the domain adaptation capabilities of their food image segmentation models. While FoodSeg103 is publicly available to support research, the Asian food subset cannot be released at this time due to confidentiality constraints associated with the images.

FoodSeg103 examples: source images (left) and annotations (right).

The authors utilized FoodSeg103 as a case study to elucidate the process of constructing the dataset. They detailed the origins of the images, the compilation of ingredient categories, and the selection of images as follows:

  • Image Source: The authors sourced images from Recipe1M, a dataset comprising 900k images featuring cooking instructions and ingredient labels. This dataset serves various purposes such as food image retrieval and recipe generation tasks.
  • Categories: Initially, the authors surveyed the frequency of all ingredient categories within Recipe1M. Despite the dataset containing around 1.5k ingredient categories, many proved challenging to mask out from images effectively. Consequently, the authors streamlined the categories to retain only the top 124 (later refined to 103) ingredients. Any ingredients not falling under these categories were assigned to the other ingredients category.
  • Image Selection: Within each fine-grained ingredient category, the authors sampled images from Recipe1M based on two criteria: 1) Each image should feature at least two ingredients, either of the same or different categories, with a maximum of 16 ingredients per image; and 2) The ingredients must be clearly visible and easily annotatable within the images. Following this selection process, the authors obtained 7,118 images for annotation with segmentation masks.

More annotation examples from FoodSeg103. The source images are on the left and the annotation masks are on the right.
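
The two image selection criteria described above can be sketched as a simple filter. This is an illustrative sketch only, not the authors' code: the `Candidate` record, its field names, and the `clearly_visible` flag (which stands in for the manual visibility judgment) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one candidate image (not a Recipe1M structure)."""
    name: str
    ingredients: list[str]   # one entry per ingredient instance in the image
    clearly_visible: bool    # stands in for the manual visibility check


def is_selected(c: Candidate, min_items: int = 2, max_items: int = 16) -> bool:
    """Criterion 1: between 2 and 16 ingredients. Criterion 2: clearly visible."""
    return min_items <= len(c.ingredients) <= max_items and c.clearly_visible


candidates = [
    Candidate("a.jpg", ["bread", "tomato", "lettuce"], True),
    Candidate("b.jpg", ["rice"], True),              # too few ingredients
    Candidate("c.jpg", ["steak", "potato"], False),  # not clearly visible
]
selected = [c.name for c in candidates if is_selected(c)]
print(selected)
```

Because the ingredient list holds one entry per instance, two ingredients of the same category still satisfy the "at least two ingredients" rule, as the text allows.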

The subsequent phase involves annotating segmentation masks, which entails delineating polygons to cover the pixel-wise locations of various ingredients. This process comprises two main stages: annotation and refinement.

  • Annotation: The authors engaged a data annotation company to carry out the mask annotation. Each image was examined by a human annotator, who first identified the ingredient categories present, assigned the appropriate category label to each ingredient, and delineated the pixel-wise mask accordingly. Annotators were instructed to disregard very small image regions, even if they contained some ingredients, when their area covered less than 5% of the entire image.
  • Refinement: Upon receiving all the masks from the annotation company, the authors carried out an extensive refinement process guided by three criteria: 1) correcting any mislabeled data; 2) removing unpopular category labels assigned to fewer than 5 images; and 3) merging visually similar ingredient categories, such as “orange” and “citrus.” Following refinement, the initial set of 125 ingredient categories (apparently the 124 retained categories plus the other ingredients category) was reduced to 103. The annotation and refinement work took approximately one year.

Examples of dataset refinement: (a) source images, (b) before refinement (wrong or confusing labels exist), and (c) after refinement.
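
Two of the refinement criteria (dropping rare categories and merging similar ones) are mechanical enough to sketch in code. The following is a minimal sketch under stated assumptions, not the authors' procedure: the `MERGE` map and its direction (the text does not say which of “orange” and “citrus” was kept), the choice to merge before counting, and the decision to simply discard a dropped label are all assumptions made for illustration. Correcting mislabeled data is a manual step and is not modelled.

```python
from collections import defaultdict

# Assumption: "citrus" is folded into "orange"; the source does not state the direction.
MERGE = {"citrus": "orange"}
MIN_IMAGES = 5  # labels assigned to fewer than 5 images are removed


def refine(labels_per_image: dict[str, set[str]]) -> dict[str, set[str]]:
    """Merge similar categories, then drop categories seen in < MIN_IMAGES images."""
    merged = {
        img: {MERGE.get(label, label) for label in labels}
        for img, labels in labels_per_image.items()
    }
    # Count how many images each (merged) category appears in.
    image_count = defaultdict(int)
    for labels in merged.values():
        for label in labels:
            image_count[label] += 1
    keep = {label for label, n in image_count.items() if n >= MIN_IMAGES}
    return {img: labels & keep for img, labels in merged.items()}


# Toy data: "orange" appears in 4 images, "citrus" and "kiwi" in 1 image each.
toy = {f"img{i}": {"orange"} for i in range(4)}
toy["img4"] = {"citrus", "kiwi"}
refined = refine(toy)
print(refined["img4"])
```

In the toy data, merging lifts “orange” to five images so it survives the threshold, while “kiwi” is removed. Running the two steps in the opposite order would give a different result, which is why the ordering is flagged as an assumption.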

Summary #

FoodSeg103 Dataset is a dataset for semantic segmentation and object detection tasks. It is used in the food industry.

The dataset consists of 7118 images with 26016 labeled objects belonging to 103 different classes including bread, carrot, chicken duck, sauce, tomato, potato, steak, broccoli, ice cream, cilantro mint, rice, pork, lemon, lettuce, strawberry, pie, cucumber, onion, corn, cake, pepper, cheese butter, french beans, fish, biscuit, egg, asparagus, noodles, and 75 more.

Images in the FoodSeg103 dataset have pixel-level semantic segmentation annotations. All images are labeled (i.e. with annotations). There are 2 splits in the dataset: train (4983 images) and test (2135 images). Alternatively, the objects can be grouped into 15 supercategories: vegetable (9635 objects), main (3665 objects), meat (3561 objects), fruit (3007 objects), dessert (2429 objects), sauce (1145 objects), beverage (714 objects), seafood (560 objects), fungus (303 objects), nut (299 objects), egg (292 objects), other ingredients (227 objects), soup (89 objects), tofu (68 objects), and salad (22 objects). The dataset was released in 2021 by Singapore Management University, Singapore and Beijing Jiaotong University, China.
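
As a quick sanity check on the figures above, the short script below recomputes the totals and relative shares from the numbers quoted in this summary. It uses only those numbers; the variable names are illustrative.

```python
# Split sizes and supercategory object counts as quoted in the summary.
splits = {"train": 4983, "test": 2135}
supercategories = {
    "vegetable": 9635, "main": 3665, "meat": 3561, "fruit": 3007,
    "dessert": 2429, "sauce": 1145, "beverage": 714, "seafood": 560,
    "fungus": 303, "nut": 299, "egg": 292, "other ingredients": 227,
    "soup": 89, "tofu": 68, "salad": 22,
}

total_images = sum(splits.values())
total_objects = sum(supercategories.values())

# Percentage shares, rounded to one decimal place.
split_share = {k: round(100 * v / total_images, 1) for k, v in splits.items()}
object_share = {
    k: round(100 * v / total_objects, 1) for k, v in supercategories.items()
}

print(total_images, total_objects)
print(split_share)
for name, share in object_share.items():
    print(f"{name:18s}{share:5.1f}%")
```

The totals match the 7118 images and 26016 objects stated above, the split is roughly 70/30, and the vegetable supercategory alone accounts for more than a third of all objects, which illustrates the class imbalance discussed in the introduction.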

Explore #

The FoodSeg103 dataset has 7118 images. Open the "Explore" tool on the dataset page any time you need to view dataset images with annotations. This tool has extended visualization capabilities such as zoom, translation, an objects table, custom filters and more.

Class balance #

There are 103 annotation classes in the dataset. Find the general statistics and balances for every class in the table below. Click any row to preview images that have labels of the selected class. Sort by column to find the most rare or prevalent classes.

Rows 1-10 of 103 (shape type for all listed classes: mask)

Class          Images  Objects  Avg count on image  Avg area on image
bread          1405    1405     1                   21.47%
carrot         1279    1279     1                   10.63%
chicken duck   1242    1242     1                   22.54%
sauce          1145    1145     1                   10.25%
tomato         1139    1139     1                   8.43%
potato         1091    1091     1                   14.79%
steak          1065    1065     1                   20.24%
broccoli       1013    1013     1                   12.73%
ice cream      913     913      1                   12.66%
cilantro mint  900     900      1                   7.32%

Images #

Explore every single image in the dataset with respect to the number of annotations of each class it has. Click a row to preview selected image. Sort by any column to find anomalies and edge cases. Use horizontal scroll if the table has many columns for a large number of classes in the dataset.

Object distribution #

An interactive heatmap chart of the object distribution for every class shows how many images in the dataset contain a certain number of objects of a specific class. Users can click a cell to see the list of all corresponding images.

Class sizes #

The table below gives various size properties of objects for every class. Click a row to see the image with annotations of the selected class. Sort columns to find classes with the smallest or largest objects or understand the size differences between classes.

Rows 1-10 of 103 (shape type for all listed classes: mask; heights and widths are given in pixels with the share of the image dimension in parentheses)

Class          Object count  Avg area  Max area  Min area  Min height    Max height     Avg height      Min width     Max width
bread          1405          21.47%    78.11%    0%        7px (1.82%)   3479px (100%)  351px (62.31%)  1px (0.2%)    4128px (100%)
carrot         1279          10.62%    80.04%    0.16%     21px (6%)     2706px (100%)  231px (46.78%)  18px (3.52%)  3254px (100%)
chicken duck   1242          22.54%    75.52%    0.26%     23px (4.1%)   3873px (100%)  411px (58.98%)  56px (5.61%)  3264px (100%)
sauce          1145          10.25%    60.47%    0.17%     7px (2.35%)   2806px (100%)  234px (39.25%)  18px (5.77%)  2668px (100%)
tomato         1139          8.43%     55.42%    0.06%     9px (4.17%)   2417px (100%)  221px (40.86%)  11px (3.43%)  3078px (100%)
potato         1091          14.79%    70.2%     0.36%     26px (6.77%)  3402px (100%)  305px (50.29%)  34px (7.48%)  4128px (100%)
steak          1065          20.24%    71.45%    0.52%     20px (5.75%)  2430px (100%)  299px (55.04%)  55px (8.9%)   3103px (100%)
broccoli       1013          12.73%    89.4%     0.03%     10px (2.94%)  2448px (100%)  289px (48.24%)  36px (7.42%)  4056px (100%)
ice cream      913           12.66%    79.22%    0%        1px (0.3%)    2313px (100%)  231px (42.9%)   1px (0.2%)    2734px (100%)
cilantro mint  900           7.32%     57.18%    0.15%     23px (7.05%)  3135px (100%)  229px (41.82%)  20px (5.61%)  2915px (100%)

Spatial Heatmap #

The heatmaps below give the spatial distributions of all objects for every class. These visualizations provide insights into the most probable and rare object locations on the image. It helps analyze objects' placements in a dataset.

Objects #

Table contains all 26016 objects. Click a row to preview an image with annotations, and use search or pagination to navigate. Sort columns to find outliers in the dataset.

Rows 1-10 of 26016 (shape type for all listed objects: mask; image size is height x width)

Object ID  Class              Image name    Image size  Height          Width            Area
1          orange             00000794.jpg  800 x 1280  285px (35.62%)  340px (26.56%)   4.12%
2          chicken duck       00000794.jpg  800 x 1280  570px (71.25%)  820px (64.06%)   30.37%
3          sauce              00000794.jpg  800 x 1280  26px (3.25%)    144px (11.25%)   0.24%
4          other ingredients  00000794.jpg  800 x 1280  337px (42.12%)  1018px (79.53%)  9.8%
5          grape              00001420.jpg  410 x 512   180px (43.9%)   297px (58.01%)   18.21%
6          chicken duck       00001420.jpg  410 x 512   290px (70.73%)  512px (100%)     44.73%
7          potato             00001420.jpg  410 x 512   238px (58.05%)  223px (43.55%)   18.51%
8          egg                00001681.jpg  256 x 256   113px (44.14%)  109px (42.58%)   12.14%
9          pork               00001681.jpg  256 x 256   138px (53.91%)  135px (52.73%)   19.65%
10         carrot             00001681.jpg  256 x 256   73px (28.52%)   87px (33.98%)    5.83%
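
The percentage values in the Height and Width columns appear to be the object's pixel extent divided by the corresponding image dimension. The sketch below reproduces them for object 1 (orange in 00000794.jpg); the helper name `pct` is illustrative, and the interpretation of the Area column is an inference rather than something the page states.

```python
def pct(part: int, whole: int) -> float:
    """Return part / whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Object 1 from the table: orange in 00000794.jpg, image size 800 x 1280 (height x width).
img_h, img_w = 800, 1280
obj_h, obj_w = 285, 340

rel_h = pct(obj_h, img_h)   # compare with the Height column
rel_w = pct(obj_w, img_w)   # compare with the Width column

# Share of the image covered by the object's bounding box.
bbox_area = pct(obj_h * obj_w, img_h * img_w)

print(rel_h, rel_w, bbox_area)
```

The bounding-box share comes out at roughly 9.5%, more than twice the 4.12% listed in the Area column. That is consistent with Area measuring the mask pixels rather than the bounding box, although the page does not say so explicitly.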

License #

FoodSeg103 Dataset is released under the Apache 2.0 license.

Citation #

If you make use of the FoodSeg103 data, please cite the following reference:

@inproceedings{wu2021foodseg,
	title={A Large-Scale Benchmark for Food Image Segmentation},
	author={Wu, Xiongwei and Fu, Xin and Liu, Ying and Lim, Ee-Peng and Hoi, Steven CH and Sun, Qianru},
	booktitle={Proceedings of ACM international conference on Multimedia},
	year={2021}
}

If you are happy with Dataset Ninja and use provided visualizations and tools in your work, please cite us:

@misc{ visualization-tools-for-food-seg-dataset,
  title = { Visualization Tools for FoodSeg103 Dataset },
  type = { Computer Vision Tools },
  author = { Dataset Ninja },
  howpublished = { \url{ https://datasetninja.com/food-seg-103 } },
  url = { https://datasetninja.com/food-seg-103 },
  journal = { Dataset Ninja },
  publisher = { Dataset Ninja },
  year = { 2024 },
  month = { nov },
  note = { visited on 2024-11-21 },
}

Download #

Dataset FoodSeg103 can be downloaded in Supervisely format from the Dataset Ninja page.

As an alternative, it can be downloaded with the dataset-tools package:

pip install --upgrade dataset-tools

… using the following Python code:

import dataset_tools as dtools

dtools.download(dataset='FoodSeg103', dst_dir='~/dataset-ninja/')

Make sure not to overlook the python code example available on the Supervisely Developer Portal. It will give you a clear idea of how to effortlessly work with the downloaded dataset.
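
Once the dataset is unpacked, a first inspection can be done with the standard library alone. The sketch below counts objects per class across annotation files. It rests on assumptions that should be checked against the actual download: that each split directory contains an `ann/` folder with one JSON file per image, and that each file has an `objects` list whose entries carry a `classTitle` field. The function name and the project path used in the usage note are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def count_objects_per_class(project_dir: str) -> Counter:
    """Count annotated objects per class title under project_dir.

    Assumes annotation files live at <project_dir>/<split>/ann/*.json and
    contain {"objects": [{"classTitle": ...}, ...]}.
    """
    counts = Counter()
    for ann_path in Path(project_dir).expanduser().glob("**/ann/*.json"):
        with open(ann_path, encoding="utf-8") as f:
            ann = json.load(f)
        for obj in ann.get("objects", []):
            # Fall back to a placeholder if the assumed field is missing.
            counts[obj.get("classTitle", "<unknown>")] += 1
    return counts
```

For example, `count_objects_per_class("~/dataset-ninja/FoodSeg103").most_common(10)` would list the ten most frequent classes, assuming the download was unpacked to that (hypothetical) location. If the layout assumptions hold, the result should be comparable with the Class balance table.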

The data in original format can be downloaded here.

Disclaimer #

Our gal from the legal dep told us we need to post this:

Dataset Ninja provides visualizations and statistics for some datasets that can be found online and can be downloaded by general audience. Dataset Ninja is not a dataset hosting platform and can only be used for informational purposes. The platform does not claim any rights for the original content, including images, videos, annotations and descriptions. Joint publishing is prohibited.

You take full responsibility when you use datasets presented at Dataset Ninja, as well as other information, including visualizations and statistics we provide. You are in charge of compliance with any dataset license and all other permissions. You are required to visit the dataset's homepage and make sure that you can use it. In case of any questions, get in touch with us at hello@datasetninja.com.