There is a newer version of the record available.

Published April 9, 2026 | Version v1
Dataset Open

Testing, Training and Validation Synthetic Dataset of Transmission Electron Microscopy (TEM) Images of Gold Nano-particles for Segmentation

Description

This dataset contains Transmission Electron Microscopy (TEM) images of gold nanoparticles for segmentation. It comprises synthetic data, generated using TEMPOS to mimic experimental TEM images (from Professor Lee Cronin, University of Glasgow, https://doi.org/10.1038/s41467-020-16501-4), split into training and test sets. A validation set of experimental images with manual annotations is also included. The Croissant metadata file was heavily inspired by https://github.com/mlcommons/croissant/tree/828034a45d5c536789c7f6311d4c4a68f7804129/datasets/1.0/coco2014

FileObjects:

contentUrl | description
./images.zip | Zip file containing TEM .png image files, grouped into test/train/val folders.
./instances_annotations_train.json | Metadata for the training dataset and each of its image files, along with segmentation annotations for each image file in COCO format (https://cocodataset.org/#home, very common for instance segmentation).
./instances_annotations_test.json | Metadata for the test dataset and each of its image files, along with segmentation annotations for each image file in COCO format (https://cocodataset.org/#home, very common for instance segmentation).
./instances_annotations_val.json | Metadata for the experimental validation dataset and each of its image files, along with segmentation annotations for each image file in COCO format (https://cocodataset.org/#home, very common for instance segmentation).
./val_binary_masks.zip | Zip file containing binary mask image (.png) files for the experimental validation dataset. Binary masks are widely used with U-Net models (popular models in the microscopy field).

Field:

FileObject (all fields): ./instances_annotations_train.json; ./instances_annotations_test.json; ./instances_annotations_val.json

Name | Extract | dataType | description
Image ID | $.images[*].id; $.annotations[*].image_id | Integer | The ID of the image.
Image filename | $.images[*].file_name | Text | Filename of TEM image file.
Image width | $.images[*].width | Integer | Width of TEM image file in pixels.
Image height | $.images[*].height | Integer | Height of TEM image file in pixels.
Annotation ID | $.annotations[*].id | Integer | The ID of the annotation.
Category ID | $.annotations[*].category_id; $.categories[*].id | Integer | The ID of the annotation category.
Is Crowd | $.annotations[*].iscrowd | Integer | A binary parameter used to indicate whether the segmentation annotation is an individual object instance (0) or a group or cluster of objects (1).
Segmentation | $.annotations[*].segmentation | Integer; SegmentationMask; GeoShape | Segmentation annotations of each TEM image file.
Bounding box | $.annotations[*].bbox | BoundingBox | The bounding box around the annotated object(s).
Area | $.annotations[*].area | Float | The area of the segmented region.
Bounding box mode | $.annotations[*].bbox_mode | Integer | The ID of the bounding box mode.
Category name | $.categories[*].name | Text | The name of the annotation category.
Supercategory name | $.categories[*].supercategory | Text | The name of the supercategory of the segmentation annotation category.
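The extraction paths above imply that annotations link to images through `image_id` and to categories through `category_id`. A minimal sketch of that join, using only the standard library, might look as follows. The miniature `example` dictionary, including its file name, category names and numeric values, is entirely hypothetical and only mirrors the key names listed in the table; the real files may carry additional keys.

```python
import json


def flatten_annotations(coco):
    """Yield one flat record per annotation, joined to its image and category."""
    images = {img["id"]: img for img in coco["images"]}
    categories = {cat["id"]: cat for cat in coco["categories"]}
    for ann in coco["annotations"]:
        img = images[ann["image_id"]]          # $.annotations[*].image_id
        cat = categories[ann["category_id"]]   # $.annotations[*].category_id
        yield {
            "annotation_id": ann["id"],
            "file_name": img["file_name"],
            "width": img["width"],
            "height": img["height"],
            "category": cat["name"],
            "supercategory": cat.get("supercategory"),
            "iscrowd": ann["iscrowd"],
            "bbox": ann["bbox"],
            "area": ann["area"],
        }


# Hypothetical stand-in for a parsed annotation file; for real data use e.g.
#   with open("instances_annotations_train.json") as fh: coco = json.load(fh)
example = {
    "images": [
        {"id": 1, "file_name": "example_0001.png", "width": 512, "height": 512},
    ],
    "categories": [
        {"id": 1, "name": "particle", "supercategory": "object"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "iscrowd": 0,
         "bbox": [5.0, 5.0, 20.0, 10.0], "area": 200.0,
         "segmentation": [[5, 5, 25, 5, 25, 15, 5, 15]]},
        {"id": 11, "image_id": 1, "category_id": 1, "iscrowd": 0,
         "bbox": [40.0, 40.0, 8.0, 8.0], "area": 64.0,
         "segmentation": [[40, 40, 48, 40, 48, 48, 40, 48]]},
    ],
}

records = list(flatten_annotations(example))
print(json.dumps(records[0], indent=2))
```

Flattening in this way gives one row per particle, which is convenient for loading into a table or for computing per-image particle counts.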

 

Files

croissant_metadata.json

Files (1.3 GB)

Checksum | Size
md5:88877d9cc890f9d9fbc2bddb53171926 | 27.3 kB
md5:3c81f816417ae979b042b11b04742653 | 925.6 MB
md5:0f2fedc6cd541e5715d5600f145fa41f | 29.3 MB
md5:43eb21f2e44bbcc7116e5fb83edf72bc | 295.7 MB
md5:50b2ab3d9a51a5eb6df155917c26271f | 1.1 MB
md5:6ca526372f8463bd41b86c26256fc276 | 961.5 kB
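The MD5 checksums listed above can be used to confirm that a download completed without corruption. The sketch below, which relies only on `hashlib`, reads data in chunks so that the larger archives do not need to fit in memory. The function names are illustrative, and because this listing does not pair checksums with file names, the mapping between a file and its expected checksum has to be taken from the record itself.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path):
    """Return the hex MD5 digest of the file at `path`."""
    with open(path, "rb") as fh:
        return md5_of_stream(fh)


def matches_checksum(actual_hex, expected):
    """Compare a digest with an expected value such as 'md5:88877d...'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return actual_hex.lower() == expected_hex


# Self-contained demonstration on in-memory bytes rather than a real download.
demo_digest = md5_of_stream(io.BytesIO(b"hello"))
print(matches_checksum(demo_digest, "md5:5d41402abc4b2a76b9719d911017c592"))
```

For a downloaded file, `matches_checksum(md5_of_file(path), expected)` returns `True` only if the local copy is byte-identical to the published one.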

Additional details

Related works

Is supplemented by
Journal article: 10.1038/s41467-020-16501-4 (DOI)

Funding

UK Research and Innovation
Provision of ‘AI ready’ data: prototyping data pipelines and repositories UKRI2697

Domain Specific Metadata

 
Property Value
Data Collection Experimental TEM images of gold nanoparticles from Professor Lee Cronin, University of Glasgow (https://doi.org/10.1038/s41467-020-16501-4) are included in the dataset for validation. A synthetic dataset has been generated using the TEMPOS software to look visually similar to the corresponding experimental images and to be used for training and testing. TEMPOS generates synthetic TEM images by procedurally creating randomized nanoparticle scenes and applying image transformations to mimic imaging effects, while simultaneously producing pixel-perfect annotations from the known ground-truth geometry.
Data Collection Type Synthetic; Experimental
Data Collection Missing Data Not applicable
Data Collection Raw Data Transmission Electron Microscopy (TEM) image files (.png format)
Data Annotation Protocol Annotations are segmentation results. For the experimental validation dataset, annotations were made manually.
Data Annotation Platform See machineAnnotationTools for details
Data Annotation Analysis Not applicable
Annotator Demographics Single annotator - demographics not relevant
Machine Annotation Tools For the experimental validation dataset, the VIA annotator (https://www.robots.ox.ac.uk/~vgg/software/via/) was used to make the segmentation annotations. Synthetic datasets have been annotated with Mask R-CNN.
Annotations Per Item Multiple segmentation annotations per TEM image (depending on number of particles)
Data Preprocessing Protocol Not applicable
Data Manipulation Protocol TEM images were split randomly into training and test sets, with the overall percentage split set to 80:20.
Data Imputation Protocol Not applicable
Data Use Cases Training, test and validation datasets for segmentation using machine learning models. Initial segmentation has been done using Mask R-CNN; other machine learning models can be compared against these results.
Data Biases Not applicable
Personal Sensitive Information No personal or sensitive information is included in the data.
Data Social Impact Not applicable
Data Limitations Not applicable
Data Release Maintenance Plan The data are being released as a one-off, with no immediate plans for revisions.
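
The random 80:20 train/test split described under Data Manipulation Protocol could be reproduced along the following lines. This is only a sketch under stated assumptions: the record does not say which tool, seed or rounding rule was used, so the `seed` parameter, the rounding behaviour and the file names in the example are all hypothetical.

```python
import random


def split_train_test(items, train_fraction=0.8, seed=0):
    """Randomly partition `items` into (train, test) lists.

    The seed makes the shuffle repeatable; it is an illustrative choice,
    not a value taken from the dataset record.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names standing in for the synthetic TEM images.
file_names = [f"image_{i:04d}.png" for i in range(10)]
train, test = split_train_test(file_names)
print(len(train), len(test))
```

Splitting at the level of whole images, as here, keeps all annotations of one image in the same subset.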