Published April 23, 2026 | Version 1.0
Dataset
Open
A PChProp (v.1) export of an AI-ready dataset of 3973 compounds with both melting point and boiling point data
- 1. University of Southampton
Description
These two files contain the melting point and boiling point records exported from Physical Chemistry Properties Data Sets (PChProp) version 1, provided by the Physical Science Data Infrastructure project. MP_BP_records.csv contains the data points and limited molecular identifiers. MP_BP_compounds.csv contains further molecular information about the compounds.
To support AI use of these files, the croissant.json metadata file conforms to the Croissant version 1.1 schema. The dataset can be downloaded via the "Download all" link, which produces a packaged Research Object Crate (RO-Crate) zip file that also contains ro-crate-metadata.json and README.md files describing it.
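As a rough illustration of how the Croissant metadata might be inspected before loading the data, the sketch below parses a small stand-in JSON document and lists the file URLs and field names it declares. The sample document and the helper name `summarise_croissant` are hypothetical, and the key names (`distribution`, `recordSet`, `field`, `contentUrl`) are assumptions based on the general Croissant layout; they should be checked against the actual croissant.json.

```python
import json

# Hypothetical stand-in for croissant.json. The real file is larger and its
# exact keys should be checked before relying on this sketch.
SAMPLE_CROISSANT = """
{
  "name": "example",
  "distribution": [
    {"@type": "cr:FileObject", "name": "MP_BP_records.csv",
     "contentUrl": "MP_BP_records.csv"}
  ],
  "recordSet": [
    {"name": "records",
     "field": [
       {"name": "Melting Point (C)", "dataType": "sc:Float"},
       {"name": "Boiling Point (C)", "dataType": "sc:Float"}
     ]}
  ]
}
"""


def summarise_croissant(metadata: dict) -> dict:
    """Collect declared file URLs and the field names of each record set."""
    files = [entry.get("contentUrl") for entry in metadata.get("distribution", [])]
    fields = {
        record_set.get("name"): [f.get("name") for f in record_set.get("field", [])]
        for record_set in metadata.get("recordSet", [])
    }
    return {"files": files, "fields": fields}


summary = summarise_croissant(json.loads(SAMPLE_CROISSANT))
print(summary)
```

For the real dataset, the same function could be applied to the result of `json.load` on the downloaded croissant.json, assuming it follows this layout.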
FileObjects:
| contentUrl | description |
|---|---|
| MP_BP_records.csv | This file contains the melting point and boiling point information for the compounds within PChProp version 1 that have both boiling point and melting |
| MP_BP_compounds.csv | Records extracted from the CSV file, with their schema. |
Fields:
| FileObject | Name | Extract | dataType | description |
|---|---|---|---|---|
| MP_BP_records.csv | PSDI | PSDI | Text | Stable identifier for the entry in PChProp. Use as the primary key for joining, deduplication, and dataset indexing |
| MP_BP_records.csv | InChi | InChi | Text | IUPAC International Chemical Identifier for the structure. Recommended canonical identity string for deduplication and joining across datasets. Use InChIKey for indexing |
| MP_BP_records.csv | Melting Point (C) | Melting Point (C) | Float | Temperature at which a pure substance transitions between solid and liquid at the specified pressure and (if relevant) composition. Values are condition-dependent; do not treat multiple values as duplicates without checking conditions and measurement basis |
| MP_BP_records.csv | Boiling Point (C) | Boiling Point (C) | Float | Temperature at which a liquid’s vapour pressure equals the specified external pressure. Strongly pressure-dependent; pressure should be captured for comparability |
| MP_BP_records.csv | Source | Source | Text | The originating data file for this data |
| MP_BP_records.csv | Citation | Citation | Text | The citation text provided by the originating paper or location for that data point |
| MP_BP_records.csv | Master Version | Master Version | Text | Version number of the originating data collection. This is for internal use to understand the upload history of the data into the data collection |
| MP_BP_compounds.csv | PSDI_ID | PSDI_ID | Text | Stable identifier for the entry in PChProp. Use as the primary key for joining, deduplication, and dataset indexing. |
| MP_BP_compounds.csv | Canonical name | Canonical name | Text | The canonical name of the compound. This column is empty because this data was not exported from PChProp |
| MP_BP_compounds.csv | InChi | InChi | Text | IUPAC International Chemical Identifier for the structure. Recommended canonical identity string for deduplication and joining across datasets. Use InChIKey for indexing |
| MP_BP_compounds.csv | InChiKey | InChiKey | Text | Hashed InChI identifier. Useful for indexing/joining. Not reversible; keep InChI if you need structure reconstruction |
| MP_BP_compounds.csv | SMILES | SMILES | Text | SMILES string representing the structure. May not be canonical unless your pipeline enforces canonicalisation; treat as a representation, not guaranteed unique identity |
| MP_BP_compounds.csv | Tautomers | Tautomers | Integer | Number of enumerated tautomers under the calculation method used. Counts depend on enumeration rules; calculated by RDKit |
| MP_BP_compounds.csv | Isomers | Isomers | Integer | Number of enumerated isomers under the calculation method used. Interpretation depends on whether stereochemistry/tautomerism is included; calculated by RDKit |
| MP_BP_compounds.csv | n_MeltingPoint | n_MeltingPoint | Integer | Number of melting point entries for that compound within PChProp |
| MP_BP_compounds.csv | n_BoilingPoint | n_BoilingPoint | Integer | Number of boiling point entries for that compound within PChProp |
| MP_BP_compounds.csv | n_HLC | n_HLC | Integer | Number of Henry's law constant entries for that compound within PChProp |
| MP_BP_compounds.csv | n_LogS | n_LogS | Integer | Number of solubility entries for that compound within PChProp |
| MP_BP_compounds.csv | n_Miscibility | n_Miscibility | Integer | Number of miscibility entries for that compound within PChProp |
| MP_BP_compounds.csv | Molecular Weight | Molecular Weight | Float | Molecular weight of the compound |
| MP_BP_compounds.csv | CLogP | CLogP | Float | Calculated octanol/water partition coefficient on a log10 scale. Calculated by RDKit |
| MP_BP_compounds.csv | Heavy Atom Count | Heavy Atom Count | Integer | Count of non-hydrogen atoms in the structure. Deterministic given a defined structure representation |
| MP_BP_compounds.csv | Hydrogen Bond Acceptors | Hydrogen Bond Acceptors | Integer | Number of hydrogen bond acceptor sites as defined by RDKit |
| MP_BP_compounds.csv | Hydrogen Bond Donors | Hydrogen Bond Donors | Integer | Number of hydrogen bond donor sites as defined by RDKit |
| MP_BP_compounds.csv | Rotatable Bonds | Rotatable Bonds | Integer | Number of rotatable bonds as calculated by RDKit |
| MP_BP_compounds.csv | Rings | Rings | Integer | Total ring count under the ring perception model used, as calculated by RDKit |
| MP_BP_compounds.csv | Hetero Aromatic Rings | Hetero Aromatic Rings | Integer | Count of aromatic rings containing heteroatoms under the method’s aromaticity model |
| MP_BP_compounds.csv | Aromatic Rings | Aromatic Rings | Integer | Count of aromatic rings under the method’s aromaticity model |
| MP_BP_compounds.csv | Topological Polar Surface Area | Topological Polar Surface Area | Float | Topological polar surface area in square ångströms |
| MP_BP_compounds.csv | Quantitative Estimation of Drug-likeness | Quantitative Estimation of Drug-likeness | Float | Quantitative estimation of drug-likeness score |
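The field descriptions above suggest joining the two files on the InChI string and keeping repeated measurements rather than discarding them as duplicates. A minimal sketch of that workflow follows, using only the Python standard library. The inline rows are made-up sample data with a subset of the documented columns, not values taken from the dataset, and the choice of median and range as summary statistics is an assumption for illustration.

```python
import csv
import io
import statistics
from collections import defaultdict

# Made-up sample rows using the documented column names (subset only).
# InChI strings can contain commas, so they are quoted where needed.
RECORDS_CSV = '''PSDI,InChi,Melting Point (C),Boiling Point (C),Source
r1,InChI=1S/H2O/h1H2,0.0,100.0,src_a
r2,InChI=1S/H2O/h1H2,0.1,99.9,src_b
r3,"InChI=1S/CH4O/c1-2/h2H,1H3",-97.6,64.7,src_a
'''

COMPOUNDS_CSV = '''PSDI_ID,InChi,InChiKey,SMILES
c1,InChI=1S/H2O/h1H2,XLYOFNOQVPJJNP-UHFFFAOYSA-N,O
c2,"InChI=1S/CH4O/c1-2/h2H,1H3",OKKJLVBELUTLKV-UHFFFAOYSA-N,CO
'''


def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def to_float(cell):
    """Convert a cell to float, returning None for blank cells."""
    cell = (cell or "").strip()
    return float(cell) if cell else None


records = load_rows(RECORDS_CSV)
compounds_by_inchi = {row["InChi"]: row for row in load_rows(COMPOUNDS_CSV)}

# Group every measurement by InChI so repeated values are kept, not dropped.
grouped = defaultdict(lambda: {"mp": [], "bp": []})
for row in records:
    for key, column in (("mp", "Melting Point (C)"), ("bp", "Boiling Point (C)")):
        value = to_float(row[column])
        if value is not None:
            grouped[row["InChi"]][key].append(value)

summary = {}
for inchi, values in grouped.items():
    compound = compounds_by_inchi.get(inchi, {})
    summary[compound.get("InChiKey", inchi)] = {
        "SMILES": compound.get("SMILES"),
        "n_mp": len(values["mp"]),
        "mp_median": statistics.median(values["mp"]),
        "mp_range": max(values["mp"]) - min(values["mp"]),
        "bp_median": statistics.median(values["bp"]),
        "bp_range": max(values["bp"]) - min(values["bp"]),
    }

for key, stats in summary.items():
    print(key, stats)
```

A large range for a compound may reflect different measurement conditions rather than error, so it is probably worth inspecting the Source and Citation columns before averaging such values.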
Files
(3.1 MB)
| Checksum | Size |
|---|---|
| md5:951c2c834c7aa8082a1586afaa97fd28 | 18.5 kB |
| md5:a9265e5ca40817be93eaea140b821e26 | 722.6 kB |
| md5:81bfac74230092f68e75fe3e356a2af2 | 2.4 MB |
| md5:d286206bf5b7adf0f36866143a85522e | 2.3 kB |
| md5:5c0178cf5ebb2c1c2965dff61379b443 | 3.0 kB |
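The md5 values listed above can be used to check that a downloaded file is intact. The sketch below shows one way to do this with Python's `hashlib`; the helper names are hypothetical, and the demonstration hashes a temporary file containing the bytes `hello` rather than one of the dataset files. Which checksum belongs to which file should be taken from the record page.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Demonstration on a temporary file, not on the dataset itself.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.bin"
    sample_path.write_bytes(b"hello")
    sample_digest = md5_of_file(sample_path)
    ok = matches_listed_checksum(sample_path, "md5:5d41402abc4b2a76b9719d911017c592")
    mismatch = matches_listed_checksum(sample_path, "md5:951c2c834c7aa8082a1586afaa97fd28")

print(sample_digest, ok, mismatch)
```

For a real download, `matches_listed_checksum` would be called with the local file path and the corresponding md5 string from the file listing.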
Additional details
Funding
Dates
- Created: 2025-10-20
- Issued: 2026-04-23
Domain Specific Metadata
| Property | Value |
|---|---|
| Data Collection | Example data exported from Physical Chemistry Properties Data Sets (PChProp) (https://resources.psdi.ac.uk/data/44160b2f-5938-442e-9f0a-652eb55d1c2b) |
| Data Collection Type | Secondary Data Analysis |
References
- Jeremy G. Frey, Samantha Pearman-Kanza, Joshua Cheung, Joanna Grundy and Matthew Partridge. Physical Chemistry Properties Data Collection. Online. 20 October 2024