This community contains a collection of “AI Ready Datasets”.
Each record contains a dataset that can be useful for AI tools and systems. Some datasets are made available in a general form, so that consumers can decide for their own application which columns to use for categorisation/annotation and how to split the data into training, validation and test subsets. Others have been tailored to a particular machine learning task: they include annotations, and the split between training, validation and test entries is indicated explicitly. Every record comprises one or more data files together with an accompanying Croissant metadata file (JSON) that describes the dataset. Croissant metadata makes it easier to load ML datasets into different ML frameworks.
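Because a Croissant metadata file is JSON, its structure can be inspected with nothing more than a JSON parser. The sketch below assumes the Croissant 1.0 layout: a top-level `distribution` list of file objects and a `recordSet` list whose entries carry `field` lists. It summarises which files a record ships and which fields each record set exposes. The function names and the example dataset, file and field names are hypothetical. To load records into a framework, the `mlcroissant` Python library from the Croissant project is the more complete option.

```python
import json
from typing import Any


def _as_list(value: Any) -> list:
    """Croissant properties may hold a single object or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def summarise_croissant(metadata: dict) -> dict:
    """Summarise the files and record-set fields in a Croissant metadata dict.

    Assumes (hypothetically, for this sketch) the Croissant 1.0 layout:
    a top-level "distribution" of file entries and a "recordSet" whose
    entries each list their columns under "field".
    """
    files = [
        {
            "name": item.get("name"),
            "contentUrl": item.get("contentUrl"),
            "encodingFormat": item.get("encodingFormat"),
        }
        for item in _as_list(metadata.get("distribution"))
    ]
    # Map each record set name to the names of the fields (columns) it exposes.
    record_sets = {
        rs.get("name"): [field.get("name") for field in _as_list(rs.get("field"))]
        for rs in _as_list(metadata.get("recordSet"))
    }
    return {"name": metadata.get("name"), "files": files, "recordSets": record_sets}


def load_croissant(path: str) -> dict:
    """Read a Croissant metadata file from disk and summarise it."""
    with open(path, encoding="utf-8") as fh:
        return summarise_croissant(json.load(fh))


if __name__ == "__main__":
    # Hypothetical minimal metadata; real records ship their own Croissant file.
    example = {
        "@type": "sc:Dataset",
        "name": "example-dataset",
        "distribution": [
            {"@type": "cr:FileObject", "name": "data.csv",
             "contentUrl": "data.csv", "encodingFormat": "text/csv"}
        ],
        "recordSet": [
            {"@type": "cr:RecordSet", "name": "molecules",
             "field": [
                 {"@type": "cr:Field", "name": "smiles", "dataType": "sc:Text"},
                 {"@type": "cr:Field", "name": "split", "dataType": "sc:Text"},
             ]}
        ],
    }
    print(summarise_croissant(example))
```

For a downloaded record, `load_croissant` can be pointed at the record's own metadata file to see at a glance which columns are available before deciding how to annotate or split the data.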
This collection is part of the “AI for Science” PSDI sub-project (“Provision of ‘AI ready’ data: prototyping data pipelines and repositories”, grant application APP84520, award UKRI2697, opportunity OPP1033:EPSRC AI for Science) and as such has been seeded with examples of “AI ready” data that is useful as training data for AI tools and systems.
Any feedback, suggestions or contributions can be sent to support@psdi.ac.uk. Feedback on the use of these datasets by AI tools and systems would be particularly welcome.