Koza
A data transformation framework in Python
Overview
Koza is a data transformation framework that lets you write semi-declarative "ingests":
- Transform csv, json, yaml, jsonl, or xml source data, converting it to a target csv, json, or jsonl format based on your dataclass model.
- Koza can also output data in the KGX format
- Write data transforms in semi-declarative Python
- Configure source files, expected columns/json properties and path filters, field filters, and metadata in yaml
- Create or import mapping files to be used in ingests (e.g. id mappings, type mappings)
- Create and use translation tables to map between source and target vocabularies
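As a rough illustration of what mapping between source and target vocabularies means, here is a minimal sketch in plain Python. It does not use Koza's API: the `TRANSLATION_TABLE` contents and the `translate` helper are hypothetical and exist only for this example.

```python
# Hypothetical translation table: source terms (keys) mapped to target terms (values).
# These entries are placeholders, not real vocabulary terms.
TRANSLATION_TABLE = {
    "source term a": "TARGET:0001",
    "source term b": "TARGET:0002",
}


def translate(term: str, table: dict[str, str]) -> str:
    """Return the target term for a source term, failing loudly if it is unmapped."""
    try:
        return table[term]
    except KeyError:
        raise KeyError(f"No translation found for source term: {term!r}") from None


translated = translate("source term a", TRANSLATION_TABLE)
print(translated)
```

Failing on an unmapped term, rather than passing it through silently, is one possible design choice; it makes gaps in the table visible early.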
Installation
Koza is available on PyPI and can be installed via pip:
pip install koza
Usage
See the Ingests page for information on how to configure ingests for Koza to use.
Koza can be used as a Python library, or via the command line.
CLI commands are available for validating and transforming data.
See the Module page for information on using Koza as a library.
Koza also includes some examples to help you get started (see koza/examples).
Basic Examples
Validate
Give Koza a local or remote delimited file (such as csv or tsv) to get some basic information about it (headers, number of rows):
koza validate \
--file ./examples/data/string.tsv \
--delimiter ' '
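To make the kind of information reported here concrete (headers and number of rows), the following is a minimal sketch using only the Python standard library. It is not Koza's implementation; the `summarize_delimited` helper and the sample rows are made up for illustration.

```python
import csv
import io


def summarize_delimited(text: str, delimiter: str = ",") -> dict:
    """Return the header names and the number of non-empty data rows."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    headers = next(reader, [])  # first line is treated as the header row
    row_count = sum(1 for row in reader if row)  # skip blank lines
    return {"headers": headers, "rows": row_count}


# Made-up, space-delimited sample data (not taken from the example file).
sample = "col_a col_b col_c\nA B 900\nC D 150\n"
summary = summarize_delimited(sample, delimiter=" ")
print(summary)
```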
Passing a json or jsonl file confirms whether the file is valid json or jsonl:
koza validate \
--file ./examples/data/ZFIN_PHENOTYPE_0.jsonl.gz \
--format jsonl
koza validate \
--file ./examples/data/ddpheno.json.gz \
--format json
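The difference between the two formats can be sketched with the standard library: json is a single document, while jsonl holds one json value per line. This is an illustrative sketch, not how Koza performs the check; `is_valid_json` and `is_valid_jsonl` are hypothetical helpers, and the gzip step only mirrors the fact that the example files are compressed.

```python
import gzip
import json


def is_valid_json(text: str) -> bool:
    """True if the whole text parses as a single json document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def is_valid_jsonl(text: str) -> bool:
    """True if every non-blank line parses as json on its own."""
    return all(is_valid_json(line) for line in text.splitlines() if line.strip())


# Made-up gzip-compressed jsonl payload, decompressed before checking.
raw = gzip.compress(b'{"id": 1}\n{"id": 2}\n')
text = gzip.decompress(raw).decode("utf-8")
print(is_valid_jsonl(text), is_valid_json(text))
```

Note that the same text is valid jsonl but not valid json, since two top-level values in one document are rejected by the json parser.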
Transform
Try one of Koza's example ingests:
koza transform \
--source examples/string-declarative/protein-links-detailed.yaml \
--global-table examples/translation_table.yaml
Note:
Koza expects the directory structure shown in the example above, with the source config file and transform code in the same directory (these files can also simply be named transform.yaml and transform.py, as is the default):
.
├── ...
│ ├── some_source
│ │ ├── your_ingest.yaml
│ │ └── your_ingest.py
│ └── some_translation_table.yaml
└── ...
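The pairing in this layout can be sketched as follows: given the path to a source config, look for a Python file with the same stem next to it. This assumes the naming convention suggested by the tree above (your_ingest.yaml beside your_ingest.py); the `expected_transform_code` helper is hypothetical and is not part of Koza.

```python
from pathlib import Path


def expected_transform_code(source_config: str) -> Path:
    """Return the sibling .py path that shares the config file's stem."""
    return Path(source_config).with_suffix(".py")


config = "some_source/your_ingest.yaml"
code_path = expected_transform_code(config)
print(code_path.as_posix())

# Optional check before running an ingest: does the paired file exist?
print(code_path.is_file())
```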