
Koza

A data transformation framework in Python

Overview

Koza is a data transformation framework that lets you write semi-declarative "ingests":

  • Transform csv, json, yaml, jsonl, or xml source data into a target csv, json, or jsonl format based on your dataclass model
  • Koza can also output data in the KGX format
  • Write data transforms in semi-declarative Python
  • Configure source files, expected columns/json properties and path filters, field filters, and metadata in yaml
  • Create or import mapping files to be used in ingests (e.g. ID mappings, type mappings)
  • Create and use translation tables to map between source and target vocabularies
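
To illustrate the last point, the idea of a translation table can be sketched as a plain lookup from source terms to target identifiers. This is only a conceptual sketch: the table contents, the `EX:` identifiers, and the `translate` helper are hypothetical and do not reflect Koza's actual translation table format or API.

```python
# Hypothetical vocabulary mapping; the terms and identifiers are invented for illustration.
translation_table = {
    "increased": "EX:0001",
    "decreased": "EX:0002",
}


def translate(term: str, table: dict[str, str]) -> str:
    """Map a source term to its target identifier, failing loudly on unknown terms."""
    try:
        return table[term]
    except KeyError:
        raise KeyError(f"No translation for source term: {term!r}") from None


print(translate("increased", translation_table))
```

Raising on an unknown term, rather than passing it through unchanged, keeps unmapped vocabulary from silently leaking into the output.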

Installation

Koza is available on PyPI and can be installed via pip:

pip install koza

Usage

See the Ingests page for information on how to configure ingests for Koza.

Koza can be used as a Python library, or via the command line.
CLI commands are available for validating and transforming data.
See the Module page for information on using Koza as a library.

Koza also includes some examples to help you get started (see koza/examples).

Basic Examples

Validate

Give Koza a local or remote csv file to get some basic information about it (headers, number of rows):

koza validate \
  --file ./examples/data/string.tsv \
  --delimiter ' '
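
For readers who want to see what "headers, number of rows" amounts to, here is a minimal sketch using only Python's standard `csv` module. It is not Koza's implementation; the `summarize_delimited` function and the sample data are assumptions made for illustration.

```python
import csv
import io


def summarize_delimited(text: str, delimiter: str = ",") -> dict:
    """Return the header names and the number of data rows in delimited text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    headers = next(reader, [])          # first row is treated as the header
    row_count = sum(1 for _ in reader)  # every remaining row counts as data
    return {"headers": headers, "rows": row_count}


# Hypothetical space-delimited sample, not taken from the Koza example files.
sample = "protein1 protein2 combined_score\nA B 900\nC D 150\n"
summary = summarize_delimited(sample, delimiter=" ")
print(summary)
```

As in the command above, the delimiter has to be passed explicitly when the file is not comma-separated.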

Passing a json or jsonl file checks whether the file is valid json or jsonl:

koza validate \
  --file ./examples/data/ZFIN_PHENOTYPE_0.jsonl.gz \
  --format jsonl

koza validate \
  --file ./examples/data/ddpheno.json.gz \
  --format json
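
The difference between the two formats can be sketched with the standard `json` module: a json file is a single document, while a jsonl file holds one document per line. The helper names below are hypothetical, and this is a conceptual sketch rather than the check Koza itself performs.

```python
import json


def is_valid_json(text: str) -> bool:
    """True if the whole text parses as a single JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def is_valid_jsonl(text: str) -> bool:
    """True if every non-blank line parses as its own JSON document."""
    return all(is_valid_json(line) for line in text.splitlines() if line.strip())


records = '{"id": 1}\n{"id": 2}\n'
print(is_valid_jsonl(records), is_valid_json(records))
```

The same two-line input is valid jsonl but not valid json, which is why the `--format` option matters.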

Transform

Try one of Koza's example ingests:

koza transform \
  --source examples/string-declarative/protein-links-detailed.yaml \
  --global-table examples/translation_table.yaml

Note: Koza expects the directory structure used in the example above,
with the source config file and the transform code in the same directory
(by default, these files are named transform.yaml and transform.py):

.
├── ...
│   ├── some_source
│   │   ├── your_ingest.yaml
│   │   └── your_ingest.py
│   └── some_translation_table.yaml
└── ...
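
The layout above pairs each source config with a transform file of the same name in the same directory. A minimal sketch of that naming convention, using `pathlib`, is shown below. The `transform_code_for` helper is hypothetical, and how Koza actually locates the transform code is not described here.

```python
from pathlib import Path


def transform_code_for(source_config: str) -> Path:
    """Return the path of the .py file expected next to a source config file."""
    # Assumption: the transform code shares the config's directory and base name.
    return Path(source_config).with_suffix(".py")


config = "some_source/your_ingest.yaml"
print(transform_code_for(config))
```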