Published April 30, 2026
| Version v2
Dataset
Open
Testing, Training and Validation Synthetic Dataset of Transmission Electron Microscopy (TEM) Images of Gold Nano-particles for Segmentation
Creators
Description
This dataset contains Transmission Electron Microscopy (TEM) images of gold nano-particles for segmentation. It comprises synthetic data (generated using TEMPOS) that mimic experimental TEM images (from Professor Lee Cronin, University of Glasgow, https://doi.org/10.1038/s41467-020-16501-4). These data have been split into test and training sets. A validation set of experimental images with manual annotations is also included. The Croissant file was heavily inspired by https://github.com/mlcommons/croissant/tree/828034a45d5c536789c7f6311d4c4a68f7804129/datasets/1.0/coco2014
FileObjects:
| contentUrl | description |
|---|---|
| ./images.zip | Zip file containing TEM image files (.png), grouped into test/train/val folders. |
| ./instances_annotations_train.json | Metadata for the training dataset and each image file, along with segmentation annotations for each image file in COCO format (https://cocodataset.org/#home), which is widely used for instance segmentation. |
| ./instances_annotations_test.json | Metadata for the test dataset and each image file, along with segmentation annotations for each image file in COCO format (https://cocodataset.org/#home), which is widely used for instance segmentation. |
| ./instances_annotations_val.json | Metadata for the experimental validation dataset and each image file, along with segmentation annotations for each image file in COCO format (https://cocodataset.org/#home), which is widely used for instance segmentation. |
| ./val_binary_masks.zip | Zip file containing binary mask image files (.png) for the experimental validation dataset. Binary masks are widely used with U-Net models, which are popular in the microscopy field. |
Field:
| FileObject | Name | Extract | dataType | description |
|---|---|---|---|---|
| ./instances_annotations_train.json; ./instances_annotations_test.json; ./instances_annotations_val.json | Image ID | $.images[*].id; $.annotations[*].image_id | Integer | The ID of the image. |
| ./instances_annotations_train.json; ./instances_annotations_test.json; ./instances_annotations_val.json | Image filename | $.images[*].file_name | Text | Filename of TEM image file. |
| ./instances_annotations_train.json; ./instances_annotations_test.json; ./instances_annotations_val.json | Image width | $.images[*].width | Integer | Width of TEM image file in pixels |
| ./instances_annotations_train.json; ./instances_annotations_test.json; ./instances_annotations_val.json | Image height | $.images[*].height | Integer | Height of TEM image file in pixels |
| ./instances_annotations_train.json; ./instances_annotations_test.json; ./instances_annotations_val.json | Annotation ID | $.annotations[*].id | Integer | The ID of the annotation. |
| ./instances_annotations_train.json; ./instances_annotations_test.json; ./instances_annotations_val.json | Category ID | $.annotations[*].category_id; $.categories[*].id | Integer | The ID of the annotation category. |
| ./instances_annotations_train.json; ./instances_annotations_test.json; ./instances_annotations_val.json | Is Crowd | $.annotations[*].iscrowd | Integer | A binary parameter used to indicate whether the segmentation annotation is an individual object instance (0) or a group or cluster of objects (1). |
| ./instances_annotations_train.json; ./instances_annotations_test.json; ./instances_annotations_val.json | Segmentation | $.annotations[*].segmentation | Integer; SegmentationMask; GeoShape | Segmentation annotations of each TEM image file |
| ./instances_annotations_train.json; ./instances_annotations_test.json; ./instances_annotations_val.json | Bounding box | $.annotations[*].bbox | BoundingBox | The bounding box around annotated object[s]. |
| ./instances_annotations_train.json; ./instances_annotations_test.json; ./instances_annotations_val.json | Area | $.annotations[*].area | Float | The area of the segmented region. |
| ./instances_annotations_train.json; ./instances_annotations_test.json; ./instances_annotations_val.json | Bounding box mode | $.annotations[*].bbox_mode | Integer | The ID of the bounding box mode. |
| ./instances_annotations_train.json; ./instances_annotations_test.json; ./instances_annotations_val.json | Category name | $.categories[*].name | Text | The name of the annotation category. |
| ./instances_annotations_train.json; ./instances_annotations_test.json; ./instances_annotations_val.json | Supercategory name | $.categories[*].supercategory | Text | The name of the supercategory of the segmentation annotation category. |
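
The JSONPath expressions in the Extract column correspond to the usual COCO layout of `images`, `annotations` and `categories` lists. As a minimal sketch, assuming the annotation files follow that layout, the fields can be read with Python's standard library. The helper name `index_annotations`, the inline sample record and its category names are illustrative assumptions, not values taken from the dataset.

```python
from collections import defaultdict


def index_annotations(coco):
    """Group COCO annotations by image file name.

    `coco` is a dict parsed from a COCO-format JSON file,
    e.g. with json.load() on one of the annotation files.
    """
    # $.images[*].id -> $.images[*].file_name
    id_to_name = {img["id"]: img["file_name"] for img in coco["images"]}
    # $.categories[*].id -> $.categories[*].name
    id_to_category = {cat["id"]: cat["name"] for cat in coco["categories"]}

    by_image = defaultdict(list)
    for ann in coco["annotations"]:
        by_image[id_to_name[ann["image_id"]]].append(
            {
                "annotation_id": ann["id"],
                "category": id_to_category[ann["category_id"]],
                "bbox": ann["bbox"],
                "area": ann["area"],
                "iscrowd": ann["iscrowd"],
            }
        )
    return dict(by_image)


# Hypothetical sample record, shaped like the fields listed above.
sample = {
    "images": [
        {"id": 1, "file_name": "img_0001.png", "width": 512, "height": 512}
    ],
    "categories": [
        {"id": 1, "name": "particle", "supercategory": "nanoparticle"}
    ],
    "annotations": [
        {
            "id": 10,
            "image_id": 1,
            "category_id": 1,
            "iscrowd": 0,
            "bbox": [10.0, 20.0, 30.0, 40.0],
            "area": 1200.0,
            "segmentation": [[10, 20, 40, 20, 40, 60, 10, 60]],
        }
    ],
}

annotations_by_image = index_annotations(sample)
```

For the real files, the same function would be applied to the result of `json.load` on, for example, `instances_annotations_train.json`.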
Files (1.3 GB total)
croissant_metadata.json
| MD5 checksum | Size |
|---|---|
| 88877d9cc890f9d9fbc2bddb53171926 | 27.3 kB |
| 3c81f816417ae979b042b11b04742653 | 925.6 MB |
| 0f2fedc6cd541e5715d5600f145fa41f | 29.3 MB |
| 43eb21f2e44bbcc7116e5fb83edf72bc | 295.7 MB |
| 50b2ab3d9a51a5eb6df155917c26271f | 1.1 MB |
| 9fb4b82df677bb7f09f7169dcee65a4e | 3.2 kB |
| c12d5336fb372e3ff2b71689af0bdf90 | 4.2 kB |
| 6ca526372f8463bd41b86c26256fc276 | 961.5 kB |
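
The MD5 checksums listed for each file can be used to confirm that a download is intact. The following is a minimal sketch using Python's `hashlib`. The helper names `md5_of_file` and `verify` are illustrative, and the listing above does not say which checksum belongs to which file name, so the expected value has to be taken from the record page for the file in question.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 digest matches the expected hex string."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A call such as `verify("images.zip", expected)` would then return `True` only if the local copy matches the published checksum, where `expected` is the value copied from the listing.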
Additional details
Related works
- Is supplemented by
- Journal article: 10.1038/s41467-020-16501-4 (DOI)
Funding
Domain Specific Metadata
| Property | Value |
|---|---|
| Data Collection | Experimental TEM images of gold nanoparticles from Professor Lee Cronin, University of Glasgow (https://doi.org/10.1038/s41467-020-16501-4) are included in the dataset for validation. A synthetic dataset has been generated using the TEMPOS software to look visually similar to these experimental images and to be used for training and testing. TEMPOS generates synthetic TEM images by procedurally creating randomized nanoparticle scenes and applying image transformations to mimic imaging effects, while simultaneously producing pixel-perfect annotations from the known ground-truth geometry. |
| Data Collection Type | Synthetic; Experimental |
| Data Collection Missing Data | Not applicable |
| Data Collection Raw Data | Transmission Electron Microscopy (TEM) image files (.png format) |
| Data Annotation Protocol | Annotations are segmentation results. For the experimental validation dataset, the annotations were made manually. |
| Data Annotation Platform | See machineAnnotationTools for details |
| Data Annotation Analysis | Not applicable |
| Annotator Demographics | Single annotator - demographics not relevant |
| Machine Annotation Tools | For the experimental validation dataset, the VIA annotator (https://www.robots.ox.ac.uk/~vgg/software/via/) was used to make the segmentation annotations. Synthetic datasets have been annotated with MASKRCNN. |
| Annotations Per Item | Multiple segmentation annotations per TEM image (depending on number of particles) |
| Data Preprocessing Protocol | Not applicable |
| Data Manipulation Protocol | Split of TEM images into training and test sets was random (with overall percentage split set to 80:20). |
| Data Imputation Protocol | Not applicable |
| Data Use Cases | Training, test and validation datasets for segmentation using machine learning models. Initial segmentation has been done using MASKRCNN but other machine learning models can be compared to this. |
| Data Biases | Not applicable |
| Personal Sensitive Information | No personal or sensitive information is included in the data. |
| Data Social Impact | Not applicable |
| Data Limitations | Not applicable |
| Data Release Maintenance Plan | The data are being released as a one-off with no immediate plans for revisions. |
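
The Data Manipulation Protocol entry states only that the training/test split was random with an overall 80:20 ratio. As a rough sketch of what such a split could look like (not the procedure actually used for this dataset), the function below shuffles a list of image file names with a fixed seed and cuts it at 80%. The function name, the `seed` parameter and the example file names are assumptions made for illustration.

```python
import random


def split_train_test(items, train_fraction=0.8, seed=0):
    """Randomly split items into (train, test) lists.

    A fixed seed makes the shuffle reproducible; the seed value
    here is arbitrary and not taken from the dataset record.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names, for illustration only.
file_names = [f"img_{i:04d}.png" for i in range(100)]
train_files, test_files = split_train_test(file_names)
```

With 100 items and the default fraction, this yields 80 training and 20 test file names with no overlap between the two lists.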