Name: chili100k_strat: Dataset to train or fine-tune CrystaLLM-pi for the targeted generation of experimental materials conditioned on XRD profiles
Description: The dataset contains experimentally determined crystal structures sourced from the dataset described in "CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning", a curated and filtered subset of the Crystallography Open Database (COD). The structural data underwent text augmentation following the pre-processing pipeline in CrystaLLM-pi. Each structure is labelled with its theoretical X-ray diffraction (XRD) pattern. The complete dataset, published on Hugging Face (https://huggingface.co/datasets/c-bone/chili100k_strat), can be used to train or fine-tune CrystaLLM-pi for the targeted generation of materials conditioned on XRD profiles.
Type: Dataset
Date published: 2026-04-24T00:00:00Z
Version: 1.0.0
License: None
Keywords: 1K - 10K, parquet, Text, Datasets, pandas, Croissant, Polars, Region: US, CrystaLLM-pi, Crystallography Open Database (COD), CIF, condition vector, crystal
Main entity: train-00000-of-00001.parquet
About: None

| Location | File | Format | Description |
|---|---|---|---|
| https://huggingface.co/datasets/c-bone/chili100k_strat/resolve/main/data/ | test-00000-of-00001.parquet?download=true | application/x-parquet | Test subset of the dataset (1500 records) |
| https://huggingface.co/datasets/c-bone/chili100k_strat/resolve/main/data/ | train-00000-of-00001.parquet?download=true | application/x-parquet | Training subset of the dataset (11091 records) |
| https://huggingface.co/datasets/c-bone/chili100k_strat/resolve/main/data/ | validation-00000-of-00001.parquet?download=true | application/x-parquet | Validation subset of the dataset (1500 records) |
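
The three Parquet files listed above can be read directly with pandas. The following is a minimal sketch, not an official loader: the helper names `split_url` and `load_split` are hypothetical, the URLs are built from the Location and File columns of the table, and the column layout is not assumed, so the code only reports whatever columns the file contains. It assumes pandas with a Parquet engine such as pyarrow is installed and that the files are reachable over HTTPS.

```python
# Base location and expected record counts, both taken from the file table.
BASE_URL = "https://huggingface.co/datasets/c-bone/chili100k_strat/resolve/main/data/"
EXPECTED_RECORDS = {"train": 11091, "validation": 1500, "test": 1500}


def split_url(split: str) -> str:
    """Build the download URL of one split (hypothetical helper)."""
    if split not in EXPECTED_RECORDS:
        raise ValueError(f"unknown split: {split!r}")
    return f"{BASE_URL}{split}-00000-of-00001.parquet"


def load_split(split: str):
    """Download one split into a pandas DataFrame (hypothetical helper).

    Requires network access. No column names are assumed here; inspect
    the returned DataFrame's columns to see what the file provides.
    """
    # Imported lazily so that split_url works even without pandas installed.
    import pandas as pd

    df = pd.read_parquet(split_url(split))
    expected = EXPECTED_RECORDS[split]
    if len(df) != expected:
        # The table lists the record counts; a mismatch may mean the
        # dataset was updated after this record was published.
        print(f"warning: {split} has {len(df)} rows, expected {expected}")
    return df


# Example usage (performs a download, so it is left commented out):
# train = load_split("train")
# print(train.shape, list(train.columns))
```

Reading the splits separately keeps the train, validation, and test partitions as published instead of re-splitting the data locally.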