Configuring an Ingest

(For CLI usage, see the CLI commands page.)

Koza processes and transforms existing data into a target format: CSV, JSON, or JSONL.

This process is internally known as an ingest. Ingests are defined by:

  1. Source config YAML: the ingest configuration, including metadata, formats, required columns, any SSSOM files, etc.
  2. Map config YAML: (optional) configures the creation of a mapping dictionary.
  3. Transform code: a Python script containing the specific transform instructions.
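
To make the division of labor between the three pieces concrete, here is a minimal sketch in plain Python. It does not use Koza's API: the names `source_config`, `id_map`, `transform`, and `run_ingest`, the config keys, and the sample data are all hypothetical stand-ins chosen for illustration. In a real ingest, the first two would come from the YAML files described above.

```python
import csv
import io
import json

# Hypothetical stand-in for a source config (in a real ingest, a YAML file).
source_config = {
    "name": "example-ingest",
    "format": "csv",
    "delimiter": ",",
    "required_columns": ["gene_id", "disease_id"],
}

# Hypothetical stand-in for the mapping dictionary a map config would produce.
id_map = {"G1": "HGNC:1", "G2": "HGNC:2"}

# Made-up sample input, inlined so the sketch is self-contained.
RAW_DATA = "gene_id,disease_id\nG1,D1\nG2,D2\n"


def transform(row, mapping):
    """Transform step: turn one source row into one output record."""
    return {
        # Fall back to the original ID when the mapping has no entry.
        "subject": mapping.get(row["gene_id"], row["gene_id"]),
        "object": row["disease_id"],
    }


def run_ingest(raw_text, config, mapping):
    """Read rows per the config, check required columns, emit JSONL lines."""
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=config["delimiter"])
    header = reader.fieldnames or []
    missing = [c for c in config["required_columns"] if c not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    return [json.dumps(transform(row, mapping)) for row in reader]


if __name__ == "__main__":
    for line in run_ingest(RAW_DATA, source_config, id_map):
        print(line)
```

The sketch keeps the same separation the list describes: the configuration says what the input looks like, the optional mapping supplies lookups, and the transform function holds the row-level logic.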