Readme

About

  • Name: chili100k_strat: Dataset to train or fine-tune CrystaLLM-pi for the targeted generation of experimental materials conditioned on XRD profiles
  • Description: The dataset contains experimentally determined crystal structures sourced from CHILI (described in "CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning"), a curated and filtered subset of the Crystallography Open Database (COD). The structural data underwent text augmentation following the pre-processing pipeline in CrystaLLM-pi. Each structure was labelled with its theoretical X-ray diffraction (XRD) pattern. The complete dataset, published on Hugging Face (https://huggingface.co/datasets/c-bone/chili100k_strat), can be used to train or fine-tune CrystaLLM-pi for the targeted generation of materials conditioned on XRD profiles.
  • Type: Dataset
  • Date published: 2026-04-24T00:00:00Z
  • Version: 1.0.0
  • License: None
  • Keywords: 1K - 10K, parquet, Text, Datasets, pandas, Croissant, Polars, US Region: US, CrystaLLM-pi, Crystallography Open Database (COD), CIF, condition vector, crystal
  • Main entity: train-00000-of-00001.parquet
  • About: None

People and Organisations

Creator

Main Resource

  • train-00000-of-00001.parquet
    • Type: File
    • Description: Training subset of dataset (11091 records)
    • Encoding Format: application/x-parquet

File Structure

| Location | File | Format | Description |
|---|---|---|---|
| https://huggingface.co/datasets/c-bone/chili100k_strat/resolve/main/data/ | test-00000-of-00001.parquet?download=true | application/x-parquet | Test subset of dataset (1500 records) |
| https://huggingface.co/datasets/c-bone/chili100k_strat/resolve/main/data/ | train-00000-of-00001.parquet?download=true | application/x-parquet | Training subset of dataset (11091 records) |
| https://huggingface.co/datasets/c-bone/chili100k_strat/resolve/main/data/ | validation-00000-of-00001.parquet?download=true | application/x-parquet | Validation subset of dataset (1500 records) |
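
The three split files listed in the File Structure table can be fetched directly by URL. The following is a minimal sketch, assuming the `resolve/main/data/` location is the Hugging Face download endpoint and that `pandas` with a parquet engine such as `pyarrow` is installed. `BASE_URL`, `SPLIT_SIZES`, `split_url` and `load_split` are illustrative names chosen here; they are not part of the dataset or of CrystaLLM-pi.

```python
# Base location of the split files, taken from the File Structure table.
BASE_URL = "https://huggingface.co/datasets/c-bone/chili100k_strat/resolve/main/data/"

# Record counts as stated in the File Structure table.
SPLIT_SIZES = {"train": 11091, "validation": 1500, "test": 1500}


def split_url(split: str) -> str:
    """Return the download URL for one split (hypothetical helper)."""
    if split not in SPLIT_SIZES:
        raise ValueError(f"unknown split: {split!r}")
    return f"{BASE_URL}{split}-00000-of-00001.parquet?download=true"


def load_split(split: str):
    """Download one split into a pandas DataFrame (requires network access)."""
    import pandas as pd  # imported lazily so split_url works without pandas

    return pd.read_parquet(split_url(split))
```

Calling `load_split("train")` should return a DataFrame whose length matches the 11091 records stated above, if the published file agrees with this metadata. The column names are not listed in this README, so inspect `df.columns` before relying on any particular field.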

Metadata Notes

  • JSON-LD context: https://w3id.org/ro/crate/1.1/context, {"croissant": "http://mlcommons.org/croissant/", "rai": "http://mlcommons.org/croissant/RAI/", "dct": "http://purl.org/dc/terms/"}
  • Conforms to: https://w3id.org/ro/crate/1.1
  • Root entity type: Dataset
  • Entities described: 6
  • Generated by: This README was generated automatically from the RO-Crate JSON metadata loaded in the application.
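
The notes above can be checked against the RO-Crate JSON itself. The following is a rough sketch, assuming the metadata follows the RO-Crate 1.1 layout in which a descriptor entity with `@id` `ro-crate-metadata.json` points at the root entity through `about`. `summarize_crate` and `SAMPLE` are hypothetical names, and `SAMPLE` is a made-up three-entity crate, not this dataset's real metadata. How the generating application counted "Entities described" is not stated, so counting every `@graph` entry is also an assumption.

```python
import json


def summarize_crate(metadata: dict) -> dict:
    """Summarise an RO-Crate 1.1 JSON-LD document (illustrative helper)."""
    graph = metadata.get("@graph", [])
    by_id = {entity["@id"]: entity for entity in graph if "@id" in entity}

    # In RO-Crate 1.1 the metadata descriptor's "about" names the root entity.
    descriptor = by_id.get("ro-crate-metadata.json", {})
    root_id = descriptor.get("about", {}).get("@id")
    root = by_id.get(root_id, {})

    return {
        "conforms_to": descriptor.get("conformsTo", {}).get("@id"),
        "root_type": root.get("@type"),
        "entity_count": len(graph),  # assumption: every @graph entry counts
    }


# Made-up example crate for demonstration only.
SAMPLE = json.loads("""
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
     "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
     "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset", "name": "example"},
    {"@id": "data.parquet", "@type": "File"}
  ]
}
""")

summary = summarize_crate(SAMPLE)
```

To apply this to the real crate, load its metadata file with `json.load` and pass the resulting dictionary to `summarize_crate`.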