Published April 30, 2026
| Version v3
Dataset
Open
Single Crystal 3D ED Electron Diffraction Dataset (University of Southampton NCS/NEDF) for Sample Screening
Creators
Description
Aggregated dataset of 3D ED (three-dimensional electron diffraction) experiments, intended for training a machine learning model that classifies each experiment as 'good', 'bad' or 'complex'. The class indicates whether a diffraction pattern is likely to produce a reasonable crystal structure.
FileObjects:
| contentUrl | description |
|---|---|
| metadata.csv | Key parameters for each experiment obtained by initial processing of the electron diffraction patterns (as indicated by the 'processing_program' column) and then by using a custom script (based on https://github.com/robertbuecker/cap-tools/blob/main/generate_learning_set.py). |
| learning_set.zip | zip file containing the 3D diffraction image files for each experiment, arranged in a folder which indicates their grid and experiment number and with name indicated by the 'diff_img_tiff_filename' column of metadata.csv. All identifiers have been anonymised. |
Fields:
| FileObject | Name | Extract | dataType | description |
|---|---|---|---|---|
| metadata.csv | grid_name | grid_name | Text | anonymised name of grid |
| metadata.csv | experiment_name | experiment_name | Text | identifier for experiment |
| metadata.csv | collection_temperature | collection_temperature | Float | temperature of collection of electron diffraction experiments in Kelvin |
| metadata.csv | scan_range | scan_range | Float | The rotation range (in degrees) covered during the 3D ED experiment. |
| metadata.csv | detector_distance | detector_distance | Float | The virtual detector distance ('camera length') in millimetres. The value is periodically calibrated on the instrument using an aluminium standard. |
| metadata.csv | indexation | indexation | Float | Percentage of successfully indexed reflections of the whole dataset into a consistent unit cell. This is based on the full 3D ED data collection (i.e. full scan range). |
| metadata.csv | diff_limit | diff_limit | Float | Diffraction limit of the full data collection. This is based on the full 3D ED data collection (i.e. full scan range). |
| metadata.csv | r_int | r_int | Float | Internal agreement R factor of the full data collection. This is based on the full 3D ED data collection (i.e. full scan range). |
| metadata.csv | collection_program | collection_program | Text | Name of program used to collect electron diffraction pattern |
| metadata.csv | processing_program | processing_program | Text | Name of program used to process electron diffraction patterns |
| metadata.csv | frames_collected | frames_collected | Integer | Number of electron diffraction pattern frames selected for further analysis (not relevant for this dataset) |
| metadata.csv | frame_conversion_program | frame_conversion_program | Text | Name of program used to convert electron diffraction pattern frames from .rodhypix format to .tiff |
| metadata.csv | diff_img_tiff_filename | diff_img_tiff_filename | Text | The diff_img_tiff file of an experiment (whose name is indicated by this column) is a still electron diffraction image acquired prior to data collection of the 3D ED experiment. |
| metadata.csv | grain_img_tiff_filename | grain_img_tiff_filename | Text | The grain_img_tiff file of an experiment (whose name is indicated by this column) is a real space image of the particle the 3D ED experiment was performed on. |
| metadata.csv | frames_tiff_filenames | frames_tiff_filenames | Text | The frames_tiff_filenames files of an experiment (whose names are indicated by this column) are particular electron diffraction frames selected for further analysis. |
| metadata.csv | 3D ED quality (calculated) | 3D ED quality (calculated) | Text | Classification of this experiment as 'good', 'bad' or 'complex', calculated as described in dataAnnotationProtocol. Please note that these values agreed poorly (<50%) with the manual annotations in '3D ED quality' and should not be used directly in machine learning models. |
| metadata.csv | 3D ED quality | 3D ED quality | Text | Classification of this experiment as either 'good'/'bad'/'complex' as assigned by manual annotation. This is the main target of the machine learning model. |
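The field descriptions above suggest a simple way to pair each still diffraction image with its manual label and to check how often the calculated label agrees with it. The following is a minimal sketch, assuming metadata.csv is a comma-separated file whose header row uses the column names listed in the Fields table. The helper names and the four sample rows are hypothetical placeholders for illustration and are not taken from the dataset.

```python
import csv
import io

MANUAL = "3D ED quality"                  # manual annotation (training target)
CALCULATED = "3D ED quality (calculated)"  # cut-off based annotation
IMAGE = "diff_img_tiff_filename"          # still image used as model input


def load_rows(handle):
    """Read metadata.csv-style text into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def agreement(rows):
    """Fraction of rows whose calculated label equals the manual label."""
    if not rows:
        raise ValueError("no rows to compare")
    hits = sum(r[MANUAL].strip() == r[CALCULATED].strip() for r in rows)
    return hits / len(rows)


def training_pairs(rows):
    """(image filename, manual label) pairs for a classifier."""
    return [(r[IMAGE], r[MANUAL].strip()) for r in rows]


# Hypothetical placeholder rows, NOT real dataset values.
SAMPLE = """diff_img_tiff_filename,3D ED quality (calculated),3D ED quality
a.tiff,good,good
b.tiff,bad,complex
c.tiff,complex,bad
d.tiff,good,complex
"""

rows = load_rows(io.StringIO(SAMPLE))
print(agreement(rows))       # share of matching labels in the placeholder rows
print(training_pairs(rows))  # inputs and targets for training
```

For the real file, the same functions could be called with `open("metadata.csv", newline="")` in place of the in-memory sample, provided the delimiter assumption holds.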
Files
croissant_metadata.json
Files (618.4 MB)
| md5 checksum | Size |
|---|---|
| md5:a25f0c0817d489aea505e970efa03da8 | 25.3 kB |
| md5:888a4e6e025df0a1dde2d2037c375f08 | 616.6 MB |
| md5:1834a65451c5c893102db5d126004813 | 1.8 MB |
| md5:bfd5a0867a3e779617277c74ac5b4538 | 2.5 kB |
| md5:f4d0a33bb00771d28fc223dc41fe9ca4 | 2.7 kB |
Additional details
Funding
Domain Specific Metadata
| Property | Value |
|---|---|
| Data Collection | Collection site: University of Southampton, NCS/NEDF; Instrument: Rigaku XtaLAB Synergy-ED, electron diffractometer; Radiation source: LaB6; Accelerating voltage: 200 kV; Wavelength: 0.0251 Angstroms; Probe type: Parallel beam (SAED); Beam convergence: Parallel beam; Detector: Rigaku HyPix-ED, hybrid pixel array detector; Number of pixels in the image: 775 x 385; Pixel size: 100 micrometres; Hardware binning: 1 |
| Data Collection Type | Experiments |
| Data Collection Missing Data | Not applicable |
| Data Collection Raw Data | Experiments are individual 3D ED data collections (in continuous rotation electron diffraction mode) of particles on a TEM grid. Each frame is an electron diffraction pattern obtained during such a 3D ED experiment. The diffraction pattern contained in the file labelled as diff_img_tiff is acquired prior to the experiment and is a still pattern (i.e. no rotation). Original diffraction image files are .rodhypix files. |
| Data Annotation Protocol | Categories assigned were 'good'/'bad'/'complex' as indicated in the '3D ED quality' column of metadata.csv. This classification indicates whether the 3D ED electron diffraction pattern from this sample is 'good' (good quality image, likely to lead to a successful structure determination), 'bad' (unlikely or impossible to obtain a crystal structure) or 'complex' (difficulties expected with crystal structure determination, will likely require extended data analysis time or even specialised data collection parameters). |
| Data Annotation Platform | Not applicable |
| Data Annotation Analysis | An in-house Python script provides a graphical user interface that displays an image chosen at random from this set along with its initially calculated annotation ('3D ED quality (calculated)', calculated by the method described in machineAnnotationTools) and allows this to be confirmed or revised (as the manual annotation '3D ED quality'). The script was initially written to spot-check the calculated values, but this analysis concluded that the calculated values were not reliable enough (less than 50% were correct) and that only manually annotated values should be used. |
| Annotator Demographics | Single annotator - demographics not relevant. |
| Machine Annotation Tools | Step 1: The diff_limit and indexation parameters were extracted using a custom script (based on https://github.com/robertbuecker/cap-tools/blob/main/generate_learning_set.py). These parameters are originally obtained by on-the-fly processing during data collection. Step 2: Python code applied cut-offs to the diff_limit and indexation parameters of each sample to derive the calculated annotation of 'good'/'bad'/'complex' for each. diff_limit indicates the crystallinity of the particle, i.e. whether it is diffracting well ('diff_limit' less than 1 is 'good'; 'diff_limit' between 1 and 2 (inclusive) is 'complex'; and 'diff_limit' greater than or equal to 2 is 'bad'). indexation indicates how well the determined reflections agree with a single crystal lattice (indexation greater than 90 is 'good'; indexation between 90 and 50 (inclusive) is 'complex'; indexation less than 50 is 'bad'). The worst quality score from these two contributions is taken as the overall quality score: if either is 'bad' then the overall quality score is 'bad'; otherwise, if either is 'complex' then it is 'complex'; otherwise it is 'good'. Step 3: This categorisation was reviewed and spot-checked as described in dataAnnotationAnalysis. Note that the machine-annotated values '3D ED quality (calculated)' were not used as the final target; only the manually annotated values '3D ED quality' were used. |
| Annotations Per Item | 1 annotation (classification) per dataset item (experiment) |
| Data Preprocessing Protocol | Not applicable |
| Data Manipulation Protocol | The software used for initial processing of the electron diffraction patterns is indicated by the 'processing_program' column of metadata.csv. Key parameters for each 3D ED experiment were extracted using a custom script (based on https://github.com/robertbuecker/cap-tools/blob/main/generate_learning_set.py) and stored in metadata.csv. The original electron diffraction .rodhypix files were converted into .tiff files using the software indicated by the 'frame_conversion_program' column of metadata.csv. |
| Data Imputation Protocol | Not applicable. |
| Data Use Cases | Training/test/validation set for a model that predicts the values in the '3D ED quality' column ('good', 'bad' or 'complex') from a single input: the image specified by the 'diff_img_tiff_filename' column. |
| Data Biases | Not applicable |
| Personal Sensitive Information | Not applicable |
| Data Social Impact | Not applicable |
| Data Limitations | Please note that the indexation and diff_limit parameters are calculated from analysis of all 3D ED frames, not just the initial frame. |
| Data Release Maintenance Plan | TBD (to be decided) |
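The cut-off rule described under Machine Annotation Tools (Step 2) can be sketched as follows. This is an illustrative reconstruction from the description only, not the original script. The function names are hypothetical, and because the stated diff_limit ranges overlap at exactly 2, the sketch assumes that a value of 2 is graded 'bad'. As noted in the record, labels produced this way agreed poorly with the manual annotations and are not the intended training target.

```python
# Severity ordering used to pick the worse of two grades.
SEVERITY = {"good": 0, "complex": 1, "bad": 2}


def grade_diff_limit(diff_limit):
    """Grade crystallinity from the diffraction limit (lower is better)."""
    if diff_limit < 1:
        return "good"
    if diff_limit < 2:  # assumption: exactly 2 falls into 'bad'
        return "complex"
    return "bad"


def grade_indexation(indexation):
    """Grade the percentage of reflections indexed into one unit cell."""
    if indexation > 90:
        return "good"
    if indexation >= 50:  # 50 to 90 inclusive
        return "complex"
    return "bad"


def calculated_quality(diff_limit, indexation):
    """Overall grade is the worse of the two individual grades."""
    grades = (grade_diff_limit(diff_limit), grade_indexation(indexation))
    return max(grades, key=SEVERITY.__getitem__)


print(calculated_quality(0.8, 95.0))  # both criteria graded 'good'
print(calculated_quality(1.5, 95.0))  # diff_limit graded 'complex'
print(calculated_quality(0.8, 40.0))  # indexation graded 'bad'
```

Taking the maximum over an explicit severity ordering keeps the "worst score wins" rule in one place, so a further criterion could be added without changing the combination logic.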