Published April 23, 2026 | Version 1.0
Dataset Open

A PChProp (v.1) export of an AI-ready dataset of 3973 compounds with both melting point and boiling point data

  • 1. University of Southampton

Description

These two files contain the melting point and boiling point records exported from Physical Chemistry Properties Data Sets (PChProp) version 1, provided by the Physical Sciences Data Infrastructure project. MP_BP_records.csv contains the data points and limited molecular identifiers. MP_BP_compounds.csv contains further molecular information about the compounds.
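As an illustration of how the two files relate, the following minimal sketch merges each record with its compound row using only the Python standard library. It assumes that the `PSDI` column of MP_BP_records.csv holds the same identifiers as the `PSDI_ID` column of MP_BP_compounds.csv, and that the CSV headers match the names in the Fields listing; neither assumption is confirmed by this page. The function names are hypothetical.

```python
import csv
from typing import Dict, Iterator, TextIO


def load_compounds(handle: TextIO) -> Dict[str, dict]:
    """Index compound rows by their PSDI_ID column."""
    return {row["PSDI_ID"]: row for row in csv.DictReader(handle)}


def join_records(records: TextIO, compounds: Dict[str, dict]) -> Iterator[dict]:
    """Yield each record row merged with its compound row, if one exists."""
    for row in csv.DictReader(records):
        # Assumption: PSDI (records) and PSDI_ID (compounds) share values.
        compound = compounds.get(row["PSDI"], {})
        # Record values take precedence on shared column names such as InChi.
        yield {**compound, **row}
```

Both functions take open text handles, so they can be called with `open("MP_BP_compounds.csv", newline="", encoding="utf-8")` and `open("MP_BP_records.csv", newline="", encoding="utf-8")`. Records without a matching compound row are still yielded, just without the extra descriptor columns.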

To support AI use of these files, the croissant.json metadata file conforms to the Croissant version 1.1 schema. The dataset can be downloaded via the "Download all" link as a packaged Research Object Crate (RO-Crate) zip file, which also contains ro-crate-metadata.json and README.md files that describe it.
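A quick way to see what the metadata declares is to read croissant.json as ordinary JSON. The sketch below assumes the file uses the `distribution`, `recordSet`, and `field` keys from the Croissant vocabulary at the top level; the actual layout of this particular file is not shown on this page, so treat the key paths as assumptions. The helper names are hypothetical.

```python
import json
from typing import Dict, List


def load_metadata(path: str) -> dict:
    """Read a Croissant JSON-LD file as a plain dictionary."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def list_files(metadata: dict) -> List[str]:
    """Return the contentUrl of every entry in the distribution."""
    # Assumption: file objects are listed under a top-level "distribution" key.
    return [d.get("contentUrl", "") for d in metadata.get("distribution", [])]


def list_fields(metadata: dict) -> Dict[str, List[str]]:
    """Map each record set name to the names of its fields."""
    result: Dict[str, List[str]] = {}
    # Assumption: record sets sit under "recordSet", each with a "field" list.
    for record_set in metadata.get("recordSet", []):
        fields = record_set.get("field", [])
        result[record_set.get("name", "")] = [f.get("name", "") for f in fields]
    return result
```

Calling `list_fields(load_metadata("croissant.json"))` should then give a per-record-set list of column names that can be compared against the CSV headers before any further processing.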

FileObjects:

| contentUrl | description |
| --- | --- |
| MP_BP_records.csv | This file contains the melting point and boiling point information for the compounds within PChProp version 1 that have both boiling point and melting point data |
| MP_BP_compounds.csv | Records extracted from the CSV file, with their schema. |

Fields:

| FileObject | Name | Extract | dataType | description |
| --- | --- | --- | --- | --- |
| MP_BP_records.csv | PSDI | PSDI | Text | Stable identifier for the entry in PChProp. Use as the primary key for joining, deduplication, and dataset indexing |
| MP_BP_records.csv | InChi | InChi | Text | IUPAC International Chemical Identifier for the structure. Recommended canonical identity string for deduplication and joining across datasets. Use InChIKey for indexing |
| MP_BP_records.csv | Melting Point (C) | Melting Point (C) | Float | Temperature at which a pure substance transitions between solid and liquid at the specified pressure and (if relevant) composition. Values are condition-dependent; do not treat multiple values as duplicates without checking conditions and measurement basis |
| MP_BP_records.csv | Boiling Point (C) | Boiling Point (C) | Float | Temperature at which a liquid’s vapour pressure equals the specified external pressure. Strongly pressure-dependent; pressure should be captured for comparability |
| MP_BP_records.csv | Source | Source | Text | The originating data file for this data |
| MP_BP_records.csv | Citation | Citation | Text | The citation text provided by the originating paper or location for that data point |
| MP_BP_records.csv | Master Version | Master Version | Text | Version number of the originating data collection. This is for internal use to understand the upload history of the data into the data collection |
| MP_BP_compounds.csv | PSDI_ID | PSDI_ID | Text | Stable identifier for the entry in PChProp. Use as the primary key for joining, deduplication, and dataset indexing |
| MP_BP_compounds.csv | Canonical name | Canonical name | Text | The canonical name of the compound. This column is empty because this data was not exported from PChProp |
| MP_BP_compounds.csv | InChi | InChi | Text | IUPAC International Chemical Identifier for the structure. Recommended canonical identity string for deduplication and joining across datasets. Use InChIKey for indexing |
| MP_BP_compounds.csv | InChiKey | InChiKey | Text | Hashed InChI identifier. Useful for indexing/joining. Not reversible; keep InChI if you need structure reconstruction |
| MP_BP_compounds.csv | SMILES | SMILES | Text | SMILES string representing the structure. May not be canonical unless your pipeline enforces canonicalisation; treat as a representation, not guaranteed unique identity |
| MP_BP_compounds.csv | Tautomers | Tautomers | Integer | Number of enumerated tautomers under the calculation method used. Counts depend on enumeration rules; calculated by RDKit |
| MP_BP_compounds.csv | Isomers | Isomers | Integer | Number of enumerated isomers under the calculation method used. Interpretation depends on whether stereochemistry/tautomerism is included; calculated by RDKit |
| MP_BP_compounds.csv | n_MeltingPoint | n_MeltingPoint | Integer | Number of melting point entries for that compound within PChProp |
| MP_BP_compounds.csv | n_BoilingPoint | n_BoilingPoint | Integer | Number of boiling point entries for that compound within PChProp |
| MP_BP_compounds.csv | n_HLC | n_HLC | Integer | Number of Henry's law constant entries for that compound within PChProp |
| MP_BP_compounds.csv | n_LogS | n_LogS | Integer | Number of solubility entries for that compound within PChProp |
| MP_BP_compounds.csv | n_Miscibility | n_Miscibility | Integer | Number of miscibility entries for that compound within PChProp |
| MP_BP_compounds.csv | Molecular Weight | Molecular Weight | Float | Molecular weight of the compound |
| MP_BP_compounds.csv | CLogP | CLogP | Float | Calculated octanol/water partition coefficient on a log10 scale. Calculated by RDKit |
| MP_BP_compounds.csv | Heavy Atom Count | Heavy Atom Count | Integer | Count of non-hydrogen atoms in the structure. Deterministic given a defined structure representation |
| MP_BP_compounds.csv | Hydrogen Bond Acceptors | Hydrogen Bond Acceptors | Integer | Number of hydrogen bond acceptor sites as defined by RDKit |
| MP_BP_compounds.csv | Hydrogen Bond Donors | Hydrogen Bond Donors | Integer | Number of hydrogen bond donor sites as defined by RDKit |
| MP_BP_compounds.csv | Rotatable Bonds | Rotatable Bonds | Integer | Number of rotatable bonds as calculated by RDKit |
| MP_BP_compounds.csv | Rings | Rings | Integer | Total ring count under the ring perception model used, as calculated by RDKit |
| MP_BP_compounds.csv | Hetero Aromatic Rings | Hetero Aromatic Rings | Integer | Count of aromatic rings containing heteroatoms under the method’s aromaticity model |
| MP_BP_compounds.csv | Aromatic Rings | Aromatic Rings | Integer | Count of aromatic rings under the method’s aromaticity model |
| MP_BP_compounds.csv | Topological Polar Surface Area | Topological Polar Surface Area | Float | Topological polar surface area in square ångströms |
| MP_BP_compounds.csv | Quantitative Estimation of Drug-likeness | Quantitative Estimation of Drug-likeness | Float | Quantitative estimation of drug-likeness score |
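The field descriptions warn against treating repeated melting or boiling point values for one compound as duplicates. One cautious approach, sketched below, is to summarise the spread of values per identifier first and only then decide how to collapse them. This is an illustrative sketch, not part of the dataset tooling: the function name is hypothetical, the default column names are taken from the Fields listing, and it assumes every non-empty value parses as a float.

```python
import csv
from collections import defaultdict
from statistics import median
from typing import Dict, List, TextIO


def value_spread(records: TextIO, key: str = "PSDI",
                 column: str = "Melting Point (C)") -> Dict[str, dict]:
    """Summarise the values reported in `column` for each `key`."""
    values: Dict[str, List[float]] = defaultdict(list)
    for row in csv.DictReader(records):
        raw = (row.get(column) or "").strip()
        if not raw:
            continue  # skip rows with no value in this column
        # Assumption: non-empty cells are plain numbers; float() raises otherwise.
        values[row[key]].append(float(raw))
    return {
        k: {"n": len(v), "min": min(v), "max": max(v), "median": median(v)}
        for k, v in values.items()
    }
```

Identifiers with a large gap between `min` and `max` are candidates for manual inspection of the Source and Citation columns, since the records file does not carry a pressure column to explain the difference. Passing `column="Boiling Point (C)"` gives the same summary for boiling points.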

Files

Files (3.1 MB)

| Checksum | Size |
| --- | --- |
| md5:951c2c834c7aa8082a1586afaa97fd28 | 18.5 kB |
| md5:a9265e5ca40817be93eaea140b821e26 | 722.6 kB |
| md5:81bfac74230092f68e75fe3e356a2af2 | 2.4 MB |
| md5:d286206bf5b7adf0f36866143a85522e | 2.3 kB |
| md5:5c0178cf5ebb2c1c2965dff61379b443 | 3.0 kB |
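The MD5 checksums listed above can be used to confirm that a downloaded file arrived intact. The following minimal sketch computes a file's MD5 with the standard library and compares it with a listed value. The function names are hypothetical, and because this listing does not pair checksums with file names, the caller has to supply the matching pair.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, listed: str) -> bool:
    """Compare a file's MD5 with a listed value such as 'md5:951c...'."""
    # Accept the value with or without the 'md5:' prefix, in any letter case.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listing("MP_BP_records.csv", "md5:...")` returns `True` only when the local copy is byte-identical to the published file with that checksum.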

Additional details

Funding

UK Research and Innovation
Provision of ‘AI ready’ data: prototyping data pipelines and repositories (award UKRI2697)

Dates

Created
2025-10-20
Issued
2026-04-23

Domain Specific Metadata

 
| Property | Value |
| --- | --- |
| Data Collection | Example data exported from Physical Chemistry Properties Data Sets (PChProp) (https://resources.psdi.ac.uk/data/44160b2f-5938-442e-9f0a-652eb55d1c2b) |
| Data Collection Type | Secondary Data Analysis |

References

  • Jeremy G. Frey, Samantha Pearman-Kanza, Joshua Cheung, Joanna Grundy and Matthew Partridge. Physical Chemistry Properties Data Collection. Online. 20 October 2024