Source Config
The source config YAML file sets properties for ingesting a single file type from within a source.
Paths are relative to the directory from which you execute Koza.
Source Configuration Properties
Required properties | Description |
---|---|
name | Name of the data ingest, as `<data source>_<type_of_ingest>`, ex. hpoa_gene_to_disease |
files | List of files to process |
node_properties | List of node properties to include in output |
edge_properties | List of edge properties to include in output |
Note | Either node or edge properties (or both) must be defined in the primary config YAML for your transform |
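As an illustration, a source config that sets only the required properties might look like the sketch below. The ingest name, file path, and property lists are hypothetical placeholders, and a CSV source would also need `delimiter`, described under the CSV-specific properties.

```yaml
# Hypothetical minimal source config: only the required properties are set
name: "hpoa_gene_to_disease"        # <data source>_<type_of_ingest>
files:
  - "./data/gene_to_disease.tsv"    # hypothetical path, relative to where Koza is run
node_properties:                    # node and/or edge properties must be defined
  - "id"
  - "category"
edge_properties:
  - "id"
  - "subject"
  - "predicate"
  - "object"
```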
Optional properties | Description |
---|---|
file_archive | Path to a file archive containing the file(s) to process. Supported archive formats: zip, gzip |
format | Format of the data file(s) (CSV or JSON) |
sssom_config | Configures usage of SSSOM mapping files |
depends_on | List of map config files to use |
metadata | Metadata for the source, either a list of properties or a path to a `metadata.yaml` |
transform_code | Path to a Python file to transform the data |
transform_mode | How to process the transform file |
global_table | Path to a global translation table file |
local_table | Path to a local translation table file |
field_type_map | Dict of field names and their types (using the `FieldType` enum) |
filters | List of filters to apply |
json_path | Path within the JSON object containing the data to process |
required_properties | List of properties that must be present in output (JSON only) |
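Several optional properties can be layered on top of the required ones. The sketch below is a hypothetical JSON source. Every path is a placeholder, and the lowercase `json` spelling for `format` is an assumption rather than something the table states.

```yaml
# Hypothetical optional settings for a JSON source; all paths are placeholders
format: "json"                              # CSV or JSON (lowercase spelling assumed)
file_archive: "./data/source_files.zip"     # zip and gzip archives are supported
metadata: "./path/to/metadata.yaml"         # or inline metadata properties
transform_code: "./path/to/transform.py"    # Python file that transforms the data
depends_on:
  - "./path/to/map_config.yaml"             # map config file(s) to use
required_properties:                        # JSON only: must be present in output
  - "id"
```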
CSV-Specific Properties | Description |
---|---|
delimiter | Delimiter for CSV files (required for CSV format) |

Optional CSV Properties | Description |
---|---|
columns | List of columns to include in output (CSV only) |
header | Header row index for CSV files |
header_delimiter | Delimiter for the header in CSV files |
header_prefix | Prefix for the header in CSV files |
comment_char | Comment character for CSV files |
skip_blank_lines | Whether to skip blank lines in CSV files |
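For a CSV source, the CSV-specific properties sit alongside the required ones. In this hypothetical tab-delimited example, the column names are placeholders. `header: 0` assumes the header is the first row under zero-based indexing, which the table above does not specify.

```yaml
# Hypothetical CSV settings for a tab-delimited file
format: "csv"               # lowercase spelling assumed
delimiter: "\t"             # required for CSV format; "\t" is a tab in a double-quoted YAML string
header: 0                   # header row index (zero-based indexing assumed)
comment_char: "#"           # comment character
skip_blank_lines: true      # skip blank lines
columns:                    # columns to include in output (CSV only)
  - "gene_id"
  - "disease_id"
```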
Metadata Properties
Metadata is optional, and can be defined as a list of properties and values, or as a path to a `metadata.yaml` file, for example `metadata: "./path/to/metadata.yaml"`.
Remember that the path is relative to the directory from which you execute Koza.
Metadata Properties | Description |
---|---|
name | Name of data source, ex. "FlyBase" |
description | Description of data/ingest |
ingest_title | *Title of the data source, maps to Biolink name |
ingest_url | *URL of the data source, maps to Biolink iri |
provided_by | `<data source>_<type_of_ingest>`, ex. hpoa_gene_to_disease (see config property "name") |
rights | Link to license information for the data source |
*Note: For more information on `ingest_title` and `ingest_url`, see the infores catalog.
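Metadata can also be written inline in the source config instead of in a separate file. The sketch below uses the properties from the table with placeholder values. The example.org URLs and the ingest name are hypothetical.

```yaml
# Hypothetical inline metadata; all values are placeholders
metadata:
  name: "Example Source"
  description: "Gene to disease associations from Example Source"
  ingest_title: "Example Source"               # maps to Biolink name
  ingest_url: "https://example.org"            # maps to Biolink iri
  provided_by: "example_gene_to_disease"       # matches the config's `name`
  rights: "https://example.org/license"
```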
Composing Configuration from Multiple Yaml Files
Koza's custom YAML loader supports importing (including) other YAML files with an `!include` tag.
For example, if you had a file named `standard-columns.yaml`:
```yaml
- "column_1"
- "column_2"
- "column_3"
- "column_4": "int"
```
Then, in any ingest where you want to use these columns, you can simply `!include` them:

```yaml
columns: !include "./path/to/standard-columns.yaml"
```
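This sketch assumes the loader replaces the tag with the parsed contents of the included file, which is the behavior the description above suggests. Under that assumption, the `!include` line is equivalent to writing the list inline:

```yaml
# Equivalent inline form of `columns: !include "./path/to/standard-columns.yaml"`
columns:
  - "column_1"
  - "column_2"
  - "column_3"
  - "column_4": "int"   # presumably a column name paired with its type
```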
Next Steps: Mapping and Additional Data