Published April 30, 2026 | Version v3
Dataset Open

Single Crystal 3D ED Electron Diffraction Dataset (University of Southampton NCS/NEDF) for Sample Screening

Description

Aggregated dataset of 3D ED electron diffraction experiments, intended for training a machine learning model that classifies each experiment as 'good', 'bad' or 'complex' quality. The class indicates whether a diffraction pattern is likely to produce a reasonable crystal structure.

FileObjects:

contentUrl | description
metadata.csv | Key parameters for each experiment, obtained by initial processing of the electron diffraction patterns (with the program indicated by the 'processing_program' column) and then extracted with a custom script (based on https://github.com/robertbuecker/cap-tools/blob/main/generate_learning_set.py).
learning_set.zip | Zip file containing the 3D diffraction image files for each experiment, arranged in folders that indicate the grid and experiment number, with file names given by the 'diff_img_tiff_filename' column of metadata.csv. All identifiers have been anonymised.

Fields:

FileObject | Name | Extract | dataType | description
metadata.csv | grid_name | grid_name | Text | Anonymised name of the grid.
metadata.csv | experiment_name | experiment_name | Text | Identifier for the experiment.
metadata.csv | collection_temperature | collection_temperature | Float | Temperature at which the electron diffraction experiment was collected, in kelvin.
metadata.csv | scan_range | scan_range | Float | The rotation range (in degrees) covered during the 3D ED experiment.
metadata.csv | detector_distance | detector_distance | Float | The virtual detector distance ('camera length') in millimetres. The value is periodically calibrated on the instrument using an aluminium standard.
metadata.csv | indexation | indexation | Float | Percentage of reflections in the whole dataset that were successfully indexed into a consistent unit cell. Based on the full 3D ED data collection (i.e. full scan range).
metadata.csv | diff_limit | diff_limit | Float | Diffraction limit of the full data collection. Based on the full 3D ED data collection (i.e. full scan range).
metadata.csv | r_int | r_int | Float | Internal agreement R factor of the full data collection. Based on the full 3D ED data collection (i.e. full scan range).
metadata.csv | collection_program | collection_program | Text | Name of the program used to collect the electron diffraction patterns.
metadata.csv | processing_program | processing_program | Text | Name of the program used to process the electron diffraction patterns.
metadata.csv | frames_collected | frames_collected | Integer | Number of electron diffraction pattern frames selected for further analysis (not relevant for this dataset).
metadata.csv | frame_conversion_program | frame_conversion_program | Text | Name of the program used to convert electron diffraction pattern frames from .rodhypix format to .tiff.
metadata.csv | diff_img_tiff_filename | diff_img_tiff_filename | Text | File name of the still electron diffraction image acquired prior to data collection of the 3D ED experiment.
metadata.csv | grain_img_tiff_filename | grain_img_tiff_filename | Text | File name of the real space image of the particle on which the 3D ED experiment was performed.
metadata.csv | frames_tiff_filenames | frames_tiff_filenames | Text | File names of the particular electron diffraction images selected for further analysis.
metadata.csv | 3D ED quality (calculated) | 3D ED quality (calculated) | Text | Classification of this experiment as 'good', 'bad' or 'complex', calculated as described in machineAnnotationTools. Please note that these values gave poor (<50%) agreement with the manual annotations ('3D ED quality'), so they should not be used directly in machine learning models.
metadata.csv | 3D ED quality | 3D ED quality | Text | Classification of this experiment as 'good', 'bad' or 'complex', assigned by manual annotation. This is the main target of the machine learning model.
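
As a minimal sketch of how the fields above might be consumed, the following Python reads the metadata with the standard library and pairs each still diffraction image with its manual label. It assumes that metadata.csv is comma-delimited with a header row matching the field names listed here; the two sample rows and their values are invented purely for illustration and are not taken from the dataset.

```python
import csv
import io

# Hypothetical excerpt for illustration only; real values come from metadata.csv.
SAMPLE = """grid_name,experiment_name,indexation,diff_limit,diff_img_tiff_filename,3D ED quality
grid_A,exp_001,93.5,0.85,diff_001.tiff,good
grid_A,exp_002,41.0,2.40,diff_002.tiff,bad
"""

# Columns documented as Float that this sketch converts to numbers.
FLOAT_COLUMNS = ("indexation", "diff_limit")


def read_metadata(handle):
    """Read metadata rows from an open text handle into a list of dicts."""
    rows = []
    for row in csv.DictReader(handle):
        for name in FLOAT_COLUMNS:
            # Leave missing or empty cells untouched rather than guessing a value.
            if row.get(name) not in (None, ""):
                row[name] = float(row[name])
        rows.append(row)
    return rows


rows = read_metadata(io.StringIO(SAMPLE))

# (input image, target label) pairs, using the manually annotated column only.
pairs = [(r["diff_img_tiff_filename"], r["3D ED quality"]) for r in rows]
```

For the real file, the same function could be called with `open("metadata.csv", newline="", encoding="utf-8")` in place of the in-memory sample, assuming that encoding applies.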

 

Files

croissant_metadata.json

Files (618.4 MB)

Name | Size
md5:a25f0c0817d489aea505e970efa03da8 | 25.3 kB
md5:888a4e6e025df0a1dde2d2037c375f08 | 616.6 MB
md5:1834a65451c5c893102db5d126004813 | 1.8 MB
md5:bfd5a0867a3e779617277c74ac5b4538 | 2.5 kB
md5:f4d0a33bb00771d28fc223dc41fe9ca4 | 2.7 kB
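
The md5 values in the listing can be used to check a download for corruption. The sketch below is one way to do this with Python's standard `hashlib` module; the function names `md5_of_file` and `matches_listing` are illustrative, and which local file corresponds to which checksum has to be taken from the record itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:<hex digest>'."""
    # Drop an optional 'md5:' prefix and ignore letter case in the hex digits.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as `matches_listing("learning_set.zip", "md5:...")`, with the checksum copied from the listing, returns `True` only if the local file is byte-identical to the published one.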

Additional details

Funding

UK Research and Innovation
Provision of ‘AI ready’ data: prototyping data pipelines and repositories UKRI2697
UK Research and Innovation
A National Electron Diffraction Facility for Nanomaterial Structural Studies EP/X014444/1

Domain Specific Metadata

 
Property Value
Data Collection Collection site: University of Southampton, NCS/NEDF; Instrument: Rigaku XtaLAB Synergy-ED, electron diffractometer; Radiation source: LaB6; Accelerating voltage: 200 kV; Wavelength: 0.0251 Angstroms; Probe type: Parallel beam (SAED); Beam convergence: Parallel beam; Detector: Rigaku HyPix-ED, hybrid pixel array detector; Number of pixels in the image: 775 x 385; Pixel size: 100 micrometres; Hardware binning: 1
Data Collection Type Experiments
Data Collection Missing Data Not applicable
Data Collection Raw Data Experiments are individual 3D ED data collections (in continuous rotation electron diffraction mode) of particles on a TEM grid. Each frame is an electron diffraction pattern obtained during such a 3D ED experiment. The diffraction pattern in the file named in the 'diff_img_tiff_filename' column is acquired prior to the experiment and is a still pattern (i.e. no rotation). The original diffraction image files are in .rodhypix format.
Data Annotation Protocol Categories assigned were 'good'/'bad'/'complex', as recorded in the '3D ED quality' column of metadata.csv. The classification indicates whether the 3D ED electron diffraction pattern from the sample is 'good' (good quality image, likely to lead to a successful structure determination), 'bad' (unlikely or impossible to obtain a crystal structure) or 'complex' (difficulties expected with crystal structure determination; will likely require extended data analysis time or even specialised data collection parameters).
Data Annotation Platform Not applicable
Data Annotation Analysis An in-house Python script provides a graphical user interface that displays a randomly chosen image from this set together with its calculated annotation ('3D ED quality (calculated)', obtained by the method described in machineAnnotationTools) and allows that annotation to be confirmed or revised (giving the manual annotation '3D ED quality'). The tool was initially written to spot-check the calculated values, but the analysis concluded that the calculated values were not reliable enough (fewer than 50% were correct) and that only manually annotated values should be used.
Annotator Demographics Single annotator - demographics not relevant.
Machine Annotation Tools Step 1: The diff_limit and indexation parameters were extracted using a custom script (based on https://github.com/robertbuecker/cap-tools/blob/main/generate_learning_set.py). These parameters originate from on-the-fly processing during data collection.
Step 2: Python code applied cut-offs to the diff_limit and indexation parameters of each sample to derive the calculated annotation ('good'/'bad'/'complex'). diff_limit indicates the crystallinity of the particle, i.e. whether it diffracts well: diff_limit less than 1 is 'good'; diff_limit between 1 and 2 (inclusive) is 'complex'; diff_limit greater than or equal to 2 is 'bad'. indexation indicates how well the determined reflections agree with a single crystal lattice: indexation greater than 90 is 'good'; indexation between 50 and 90 (inclusive) is 'complex'; indexation less than 50 is 'bad'. The worse of these two scores is taken as the overall quality score: if either is 'bad', the overall score is 'bad'; otherwise, if either is 'complex', the overall score is 'complex'; otherwise it is 'good'.
Step 3: This categorisation was reviewed and spot-checked as described in dataAnnotationAnalysis.
Note that the calculated values '3D ED quality (calculated)' were not used as the final target; only the manually annotated values '3D ED quality' were.
Annotations Per Item 1 annotation (classification) per dataset item (experiment)
Data Preprocessing Protocol Not applicable
Data Manipulation Protocol The software used for initial processing of the electron diffraction patterns is indicated by the 'processing_program' column of metadata.csv. Key parameters for each 3D ED experiment were extracted using a custom script (based on https://github.com/robertbuecker/cap-tools/blob/main/generate_learning_set.py) and stored in metadata.csv. The original electron diffraction .rodhypix files were converted into .tiff files using the software indicated by the 'frame_conversion_program' column of metadata.csv.
Data Imputation Protocol Not applicable.
Data Use Cases Training, test and validation sets for a model that predicts the value in the '3D ED quality' column ('good', 'bad' or 'complex') from a single input: the image specified by the 'diff_img_tiff_filename' column.
Data Biases Not applicable
Personal Sensitive Information Not applicable
Data Social Impact Not applicable
Data Limitations Please note that the indexation and diff_limit parameters are calculated from analysis of all 3D ED frames, not just the initial frame.
Data Release Maintenance Plan TBD (to be decided)
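
The cut-offs given under Machine Annotation Tools (Step 2), together with the agreement check described under Data Annotation Analysis, can be sketched in Python as follows. This is an illustrative reconstruction from the description above, not the in-house script itself. The function names are hypothetical, and because the stated diff_limit ranges overlap at exactly 2, the sketch assumes that a value of 2 is graded 'bad'.

```python
# Severity ranking used to pick the worse of two grades.
ORDER = {"good": 0, "complex": 1, "bad": 2}


def grade_diff_limit(diff_limit):
    """Grade crystallinity from the diffraction limit."""
    if diff_limit < 1:
        return "good"
    # Assumption: the description overlaps at exactly 2; here 2 counts as 'bad'.
    if diff_limit < 2:
        return "complex"
    return "bad"


def grade_indexation(indexation):
    """Grade lattice consistency from the percentage of indexed reflections."""
    if indexation > 90:
        return "good"
    if indexation >= 50:
        return "complex"
    return "bad"


def calculated_quality(diff_limit, indexation):
    """Overall grade is the worse of the two individual grades."""
    grades = (grade_diff_limit(diff_limit), grade_indexation(indexation))
    return max(grades, key=ORDER.__getitem__)


def agreement(calculated, manual):
    """Fraction of experiments where the calculated and manual labels match."""
    if not manual or len(calculated) != len(manual):
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(c == m for c, m in zip(calculated, manual))
    return matches / len(manual)
```

Applying `agreement` to the '3D ED quality (calculated)' and '3D ED quality' columns gives the kind of figure referred to above, where fewer than half of the calculated labels matched the manual ones.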