As covered in more detail on the About page, the Data to Knowledge repository, part of PSDI Community Data Collections, is a centralised facility for storing data used or generated by machine learning models for modelling materials and molecular systems. This currently includes simulation data, training data and the models themselves. The repository accepts datasets with minimal metadata as well as datasets with more detailed metadata descriptions. These may include provenance data for the simulation setup, whether machine-readable (e.g. generated by AiiDA) or written by hand. The Data to Knowledge repository exists for the benefit of the community, which includes CCP5, CCP9, MCC and other modelling communities, and is operated on their behalf by the partner organisations within PSDI. It runs a community-led review process that screens data submissions for remit and quality. Membership, and therefore the right to submit, is limited to UK-based researchers within the remit areas of these communities who perform materials and molecular simulations using machine learning techniques. Community-appointed reviewers will check eligibility when reviewing contributions.
Deposit:
- Depositors are expected to comply with the PSDI Community Data Collections Policy.
- Depositors are required to accept and comply with the PSDI deposit conditions.
- Any file format is accepted.
- Multiple files can be zipped before deposit.
- Acceptance into the community is determined by the Data to Knowledge Community Administrators.
- Descriptive metadata conforming to accepted standards for discovery and description must be assigned to each dataset.
- A Creative Commons CC0 licence is the default licence for deposits. However, depositors are expected to give this careful consideration and must always ensure that an appropriate licence is selected.
- There is an option to set an embargo period for datasets.
- There are no charges to individual researchers for deposit or storage of datasets.
- Before publication in Data to Knowledge, datasets and metadata will be reviewed for accuracy by appointed reviewers. The deposit will also be inspected to ensure it is within the scientific scope of the repository as outlined above.
- Datasets should be no larger than 100 GB.
All items in this policy are subject to review as the service matures and may therefore change. Users should check this page for updates and the latest version.
Data to Knowledge Data Curation Policy version 1.0 March 2025