As described in more detail on the About page, this community contains a collection of “AI Ready Datasets”. Every dataset must include a Croissant dataset description metadata file in JSON format, which describes the data files and their fields in more detail, records their provenance, and includes additional fields of specific relevance to the dataset’s use in AI, together with a description of any annotation processes. These files can (but don’t have to) indicate splits into training, test, and validation sets, either by column values or by separate files.
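
To give a rough idea of what such a metadata file holds, the sketch below builds an abbreviated Croissant-style description in Python and writes it out as JSON. It is an illustration under stated assumptions, not a template endorsed by this community: the dataset name, file name, and column names are hypothetical, the helper functions `csv_field` and `field_names` are invented for this example, and the JSON-LD `@context` block and file checksums are left out, so the result is not a complete, valid Croissant file. Splits are shown here using the column-value approach, with a hypothetical `split` column.

```python
import json


def csv_field(record_set, file_id, column, data_type):
    """Build one Croissant-style field that reads a column from a CSV file.

    Hypothetical helper for this sketch; not part of any library.
    """
    return {
        "@type": "cr:Field",
        "@id": f"{record_set}/{column}",
        "name": column,
        "dataType": data_type,
        "source": {
            "fileObject": {"@id": file_id},
            "extract": {"column": column},
        },
    }


# Abbreviated, illustrative description of a hypothetical dataset.
# The "@context" block and file checksums are omitted for brevity.
metadata = {
    "@type": "sc:Dataset",
    "name": "example_solubility",  # hypothetical dataset name
    "description": "Hypothetical example dataset.",
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
    "conformsTo": "http://mlcommons.org/croissant/1.0",
    "distribution": [
        {
            "@type": "cr:FileObject",
            "@id": "data.csv",
            "name": "data.csv",
            "contentUrl": "data.csv",
            "encodingFormat": "text/csv",
        }
    ],
    "recordSet": [
        {
            "@type": "cr:RecordSet",
            "@id": "examples",
            "name": "examples",
            "field": [
                csv_field("examples", "data.csv", "smiles", "sc:Text"),
                csv_field("examples", "data.csv", "solubility", "sc:Float"),
                # Column whose values mark each row as train, test, or validation.
                csv_field("examples", "data.csv", "split", "sc:Text"),
            ],
        }
    ],
}


def field_names(meta):
    """List the field names declared across all record sets."""
    return [f["name"] for rs in meta.get("recordSet", []) for f in rs.get("field", [])]


croissant_json = json.dumps(metadata, indent=2)
```

A real deposit would normally be checked with a Croissant validator before upload; this sketch only shows how data files, fields, and a split column relate to one another.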

Deposit: 

  • Depositors are expected to comply with the PSDI Community Data Collections Policy. 

  • Depositors are required to accept and comply with the “AI Ready Datasets” deposit conditions. 

  • Acceptance into the community is determined by the “AI Ready Datasets” Community Administrators.

  • Descriptive metadata to accepted standards for discovery and description must be assigned to each dataset. 

  • Specifically, all records in this collection should contain a Croissant dataset description metadata JSON file.

  • A Creative Commons CC0 licence is the default licence for deposits; however, depositors are expected to consider this carefully and must always ensure that an appropriate licence is selected. 

  • There is an option to set an embargo period for datasets. 

  • There are no charges to individual researchers for deposit or storage of datasets. 

  • Before publication, datasets and metadata will be reviewed for accuracy by appointed reviewers.  

  • Datasets should be no larger than 100 GB.
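
Two of the conditions above (a JSON metadata file is present, and the dataset stays within the size limit) can be checked locally before depositing. The following is a minimal sketch, not an official tool: the function name `check_deposit` is hypothetical, it assumes that “100 GB” means decimal gigabytes (10^9 bytes each), and it only confirms that some JSON file parses as an object, without validating it against Croissant.

```python
import json
from pathlib import Path

# Assumption: "100 GB" is read as decimal gigabytes (10**9 bytes each).
MAX_BYTES = 100 * 10**9


def check_deposit(folder, max_bytes=MAX_BYTES):
    """Return a list of problems found in a deposit folder (empty if none)."""
    root = Path(folder)
    files = [p for p in root.rglob("*") if p.is_file()]
    problems = []

    # Size condition: sum the size of every file under the folder.
    total = sum(p.stat().st_size for p in files)
    if total > max_bytes:
        problems.append(f"total size {total} bytes exceeds limit of {max_bytes} bytes")

    def is_json_object(path):
        # True only if the file parses as JSON and the top level is an object.
        try:
            return isinstance(json.loads(path.read_text(encoding="utf-8")), dict)
        except (ValueError, OSError):
            return False

    # Metadata condition: at least one .json file that parses as an object.
    if not any(p.suffix.lower() == ".json" and is_json_object(p) for p in files):
        problems.append("no parseable JSON metadata file found")

    return problems
```

Calling `check_deposit("my_dataset")` on a hypothetical folder returns an empty list when both checks pass, and one message per failed check otherwise.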