
koza

Usage:

$ koza [OPTIONS] COMMAND [ARGS]...

Options:

  • --version
  • --install-completion: Install completion for the current shell.
  • --show-completion: Show completion for the current shell, to copy it or customize the installation.
  • --help: Show this message and exit.

Commands:

  • transform: Transform a source file
  • join: Join multiple KGX files into a unified DuckDB database
  • split: Split a KGX file by specified fields with format conversion support
  • prune: Prune graph by removing dangling edges and handling singleton nodes
  • append: Append new KGX files to an existing graph database
  • normalize: Apply SSSOM mappings to normalize edge subject/object references
  • merge: Complete merge pipeline: join → normalize → prune
  • report: Generate comprehensive reports for KGX graph databases
  • node-report: Generate tabular node report with GROUP BY ALL categorical columns
  • edge-report: Generate tabular edge report with denormalized node info
  • node-examples: Generate sample rows per node type.
  • edge-examples: Generate sample rows per edge type.

koza transform

Transform a source file

Usage:

$ koza transform [OPTIONS] CONFIGURATION_YAML

Arguments:

  • CONFIGURATION_YAML: Configuration YAML file [required]

Options:

  • -i, --input-file TEXT: Override input files
  • -o, --output-dir TEXT: Path to output directory [default: ./output]
  • -f, --output-format [tsv|jsonl|kgx|passthrough]: Output format [default: tsv]
  • -n, --limit INTEGER: Number of rows to process (if skipped, processes entire source file) [default: 0]
  • -p, --progress: Display progress of transform
  • -q, --quiet: Disable log output
  • --help: Show this message and exit.
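A typical invocation passes the ingest's configuration YAML and, optionally, the output settings documented above. A minimal sketch (the file name my_ingest.yaml is a placeholder, not a file shipped with koza):

# Transform a source, writing JSONL output to ./output
koza transform my_ingest.yaml -f jsonl

# Process only the first 1000 rows and display progress (useful for testing)
koza transform my_ingest.yaml -n 1000 -p -o ./test_output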

koza join

Join multiple KGX files into a unified DuckDB database

Examples:

# Auto-discover files in directory
koza join --input-dir tmp/ -o graph.duckdb

# Use glob patterns
koza join -n "tmp/*_nodes.tsv" -e "tmp/*_edges.tsv" -o graph.duckdb

# Mix directory discovery with additional files
koza join --input-dir tmp/ -n extra_nodes.tsv -o graph.duckdb

# Multiple individual files
koza join -n file1.tsv -n file2.tsv -e edges.tsv -o graph.duckdb

Usage:

$ koza join [OPTIONS]

Options:

  • -n, --nodes TEXT: Node files or glob patterns (can specify multiple)
  • -e, --edges TEXT: Edge files or glob patterns (can specify multiple)
  • -d, --input-dir TEXT: Directory to auto-discover KGX files
  • -o, --output TEXT: Path to output database file (default: in-memory)
  • -f, --format [tsv|jsonl|parquet]: Output format for any exported files [default: tsv]
  • --schema-report: Generate schema compliance report
  • -q, --quiet: Suppress output
  • -p, --progress: Show progress bars [default: True]
  • --help: Show this message and exit.
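The output of a join is an ordinary DuckDB database, so it can be inspected directly with the DuckDB CLI. A minimal sketch, assuming the joined data lands in tables named nodes and edges (verify the actual table names in your database):

# Count joined records (illustrative table names)
duckdb graph.duckdb -c "SELECT count(*) FROM nodes"
duckdb graph.duckdb -c "SELECT count(*) FROM edges"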

koza split

Split a KGX file by specified fields with format conversion support

Usage:

$ koza split [OPTIONS] FILE FIELDS

Arguments:

  • FILE: Path to the KGX file to split [required]
  • FIELDS: Comma-separated list of fields to split on [required]

Options:

  • -o, --output-dir TEXT: Output directory for split files [default: ./output]
  • -f, --format [tsv|jsonl|parquet]: Output format (default: preserve input format)
  • --remove-prefixes: Remove prefixes from values in filenames
  • -q, --quiet: Suppress output
  • -p, --progress: Show progress bars [default: True]
  • --help: Show this message and exit.

koza prune

Prune graph by removing dangling edges and handling singleton nodes
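Here, a dangling edge is an edge whose subject or object identifier has no matching node, and a singleton node is a node that participates in no edge. A conceptual sketch of those checks as DuckDB queries (table and column names follow the usual KGX layout and are illustrative, not koza's internal implementation):

# Dangling edges: subject or object missing from the node table
duckdb graph.duckdb -c "SELECT count(*) FROM edges e LEFT JOIN nodes ns ON e.subject = ns.id LEFT JOIN nodes nobj ON e.object = nobj.id WHERE ns.id IS NULL OR nobj.id IS NULL"

# Singleton nodes: never referenced as subject or object of any edge
duckdb graph.duckdb -c "SELECT count(*) FROM nodes n WHERE NOT EXISTS (SELECT 1 FROM edges e WHERE e.subject = n.id OR e.object = n.id)"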

Examples:

# Keep singleton nodes, move dangling edges
koza prune graph.duckdb

# Remove singleton nodes to separate table
koza prune graph.duckdb --remove-singletons

# Experimental: filter small components
koza prune graph.duckdb --min-component-size 10

Usage:

$ koza prune [OPTIONS] DATABASE

Arguments:

  • DATABASE: Path to the DuckDB database file to prune [required]

Options:

  • --keep-singletons: Keep singleton nodes in main table
  • --remove-singletons: Move singleton nodes to separate table
  • --min-component-size INTEGER: Minimum connected component size (experimental)
  • -q, --quiet: Suppress output
  • -p, --progress: Show progress bars [default: True]
  • --help: Show this message and exit.

koza append

Append new KGX files to an existing graph database

Examples:

# Append specific files to existing database
koza append graph.duckdb -n new_nodes.tsv -e new_edges.tsv

# Auto-discover files in directory and append
koza append graph.duckdb --input-dir new_data/

# Append with deduplication and schema reporting
koza append graph.duckdb -n "*.tsv" --deduplicate --schema-report

Usage:

$ koza append [OPTIONS] DATABASE

Arguments:

  • DATABASE: Path to existing DuckDB database file [required]

Options:

  • -n, --nodes TEXT: Node files or glob patterns (can specify multiple)
  • -e, --edges TEXT: Edge files or glob patterns (can specify multiple)
  • -d, --input-dir TEXT: Directory to auto-discover KGX files
  • --deduplicate: Remove duplicates during append
  • --schema-report: Generate schema compliance report
  • -q, --quiet: Suppress output
  • -p, --progress: Show progress bars [default: True]
  • --help: Show this message and exit.

koza normalize

Apply SSSOM mappings to normalize edge subject/object references

This operation loads SSSOM mapping files and applies them to rewrite edge subject and object identifiers to their canonical/equivalent forms. Node identifiers themselves are not changed - only edge references are normalized.
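SSSOM mapping files are plain TSV with standard columns; edge subject and object values that match a mapped identifier are rewritten to the canonical/equivalent identifier given by the mapping. A minimal illustrative mapping file (core columns only, tab-separated; the BRCA1 row is just an example of an exact cross-namespace match):

subject_id	predicate_id	object_id	mapping_justification
HGNC:1100	skos:exactMatch	NCBIGene:672	semapv:ManualMappingCuration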

Examples:

# Apply specific mapping files
koza normalize graph.duckdb -m gene_mappings.sssom.tsv -m mondo.sssom.tsv

# Auto-discover SSSOM files in directory
koza normalize graph.duckdb --mappings-dir ./sssom/

# Apply mappings with glob pattern
koza normalize graph.duckdb -m "*.sssom.tsv"

Usage:

$ koza normalize [OPTIONS] DATABASE

Arguments:

  • DATABASE: Path to existing DuckDB database file [required]

Options:

  • -m, --mappings TEXT: SSSOM mapping files or glob patterns (can specify multiple)
  • -d, --mappings-dir TEXT: Directory containing SSSOM mapping files
  • -q, --quiet: Suppress output
  • -p, --progress: Show progress bars [default: True]
  • --help: Show this message and exit.

koza merge

Complete merge pipeline: join → normalize → prune

This composite operation orchestrates the full graph processing pipeline:

  1. Join: Load and combine multiple KGX files into a unified database
  2. Normalize: Apply SSSOM mappings to edge subject/object references
  3. Prune: Remove dangling edges and handle singleton nodes

The pipeline can be customized by skipping steps or configuring options.
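The composite command is roughly equivalent to running the individual commands in sequence; a sketch using only the flags documented for each step (paths are placeholders):

# Roughly: join, then normalize, then prune
koza join --input-dir ./data/ -o graph.duckdb
koza normalize graph.duckdb --mappings-dir ./sssom/
koza prune graph.duckdb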

Examples:

# Full pipeline with auto-discovery
koza merge --input-dir ./data/ --mappings-dir ./sssom/ -o clean_graph.duckdb

# Specific files with export
koza merge -n nodes.tsv -e edges.tsv -m mappings.sssom.tsv --export --export-dir ./output/

# Skip normalization, only join and prune
koza merge -n "*.tsv" -e "*.tsv" --skip-normalize -o graph.duckdb

# Custom singleton handling
koza merge --input-dir ./data/ -m "*.sssom.tsv" --remove-singletons

Usage:

$ koza merge [OPTIONS]

Options:

  • -n, --nodes TEXT: Node files or glob patterns (can specify multiple)
  • -e, --edges TEXT: Edge files or glob patterns (can specify multiple)
  • -m, --mappings TEXT: SSSOM mapping files or glob patterns (can specify multiple)
  • -d, --input-dir TEXT: Directory to auto-discover KGX files
  • --mappings-dir TEXT: Directory containing SSSOM mapping files
  • -o, --output TEXT: Path to output database file (default: temporary)
  • --export: Export final clean data to files
  • --export-dir TEXT: Directory for exported files (required if --export)
  • -f, --format [tsv|jsonl|parquet]: Output format for exported files [default: tsv]
  • --archive: Export as archive (tar) instead of loose files
  • --compress: Compress archive as tar.gz (requires --archive)
  • --graph-name TEXT: Name for graph files in archive (default: merged_graph)
  • --skip-normalize: Skip normalization step
  • --skip-prune: Skip pruning step
  • --keep-singletons: Keep singleton nodes (default) [default: True]
  • --remove-singletons: Move singleton nodes to separate table
  • -q, --quiet: Suppress output
  • -p, --progress: Show progress bars [default: True]
  • --help: Show this message and exit.
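The export options above can be combined to produce a single compressed archive of the merged graph; for example (paths and the graph name are placeholders):

# Merge, then export the clean graph as a gzipped tar archive
koza merge --input-dir ./data/ --mappings-dir ./sssom/ -o graph.duckdb \
    --export --export-dir ./output/ --archive --compress --graph-name my_graph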

koza report

Generate comprehensive reports for KGX graph databases.

Available report types:

• qc: Quality control analysis by data source

• graph-stats: Comprehensive graph statistics (similar to merged_graph_stats.yaml)

• schema: Database schema analysis and biolink compliance

Examples:

# Generate QC report
koza report qc -d merged.duckdb -o qc_report.yaml

# Generate graph statistics
koza report graph-stats -d merged.duckdb -o graph_stats.yaml

# Generate schema report
koza report schema -d merged.duckdb -o schema_report.yaml

# Quick QC analysis (console output only)
koza report qc -d merged.duckdb

Usage:

$ koza report [OPTIONS] REPORT_TYPE

Arguments:

  • REPORT_TYPE: Type of report to generate: qc, graph-stats, or schema [required]

Options:

  • -d, --database TEXT: Path to DuckDB database file [required]
  • -o, --output TEXT: Path to output report file (YAML format)
  • -q, --quiet: Suppress progress output
  • --help: Show this message and exit.

koza node-report

Generate tabular node report with GROUP BY ALL categorical columns.

Outputs count of nodes grouped by categorical columns (namespace, category, etc.).
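Conceptually the report is a GROUP BY ALL aggregation over the node table. A rough sketch of the equivalent DuckDB query (table and column names are illustrative; the actual report may derive extra columns such as namespace from the node id):

# Roughly what the node report computes (illustrative)
duckdb merged.duckdb -c "SELECT category, provided_by, count(*) AS count FROM nodes GROUP BY ALL ORDER BY count DESC"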

Examples:

# From database
koza node-report -d merged.duckdb -o node_report.tsv

# From file
koza node-report -f nodes.tsv -o node_report.parquet --format parquet

# Custom columns
koza node-report -d merged.duckdb -o report.tsv -c namespace -c category -c provided_by

Usage:

$ koza node-report [OPTIONS]

Options:

  • -d, --database TEXT: Path to DuckDB database file
  • -f, --file TEXT: Path to node file (TSV, JSONL, or Parquet)
  • -o, --output TEXT: Path to output report file
  • --format [tsv|parquet|jsonl]: Output format [default: tsv]
  • -c, --column TEXT: Categorical columns to group by (can specify multiple)
  • -q, --quiet: Suppress progress output
  • --help: Show this message and exit.

koza edge-report

Generate tabular edge report with denormalized node info.

Joins edges to nodes to get subject_category, object_category, etc., then outputs count of edges grouped by categorical columns.
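Conceptually this is an edge-to-node join on subject and object, followed by a grouped count. A rough sketch of the equivalent DuckDB query (illustrative table and column names, not koza's internal SQL):

# Roughly what the edge report computes (illustrative)
duckdb merged.duckdb -c "SELECT ns.category AS subject_category, e.predicate, nobj.category AS object_category, count(*) AS count FROM edges e JOIN nodes ns ON e.subject = ns.id JOIN nodes nobj ON e.object = nobj.id GROUP BY ALL ORDER BY count DESC"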

Examples:

# From database
koza edge-report -d merged.duckdb -o edge_report.tsv

# From files
koza edge-report -n nodes.tsv -e edges.tsv -o edge_report.parquet --format parquet

# Custom columns
koza edge-report -d merged.duckdb -o report.tsv \
    -c subject_category -c predicate -c object_category -c primary_knowledge_source

Usage:

$ koza edge-report [OPTIONS]

Options:

  • -d, --database TEXT: Path to DuckDB database file
  • -n, --nodes TEXT: Path to node file (for denormalization)
  • -e, --edges TEXT: Path to edge file (TSV, JSONL, or Parquet)
  • -o, --output TEXT: Path to output report file
  • --format [tsv|parquet|jsonl]: Output format [default: tsv]
  • -c, --column TEXT: Categorical columns to group by (can specify multiple)
  • -q, --quiet: Suppress progress output
  • --help: Show this message and exit.

koza node-examples

Generate sample rows per node type.

Samples N example rows for each distinct value in the type column (default: category).
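Conceptually this is a per-group row sample, which can be expressed with a window function. A rough DuckDB sketch of the idea (illustrative table and column names, not koza's internal query):

# Up to 5 example rows per category (illustrative)
duckdb merged.duckdb -c "SELECT * FROM (SELECT *, row_number() OVER (PARTITION BY category ORDER BY id) AS rn FROM nodes) AS t WHERE rn <= 5"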

Examples:

# From database (5 examples per category)
koza node-examples -d merged.duckdb -o node_examples.tsv

# From file with 10 examples per type
koza node-examples -f nodes.tsv -o examples.tsv -n 10

# Group by different column
koza node-examples -d merged.duckdb -o examples.tsv -t provided_by

Usage:

$ koza node-examples [OPTIONS]

Options:

  • -d, --database TEXT: Path to DuckDB database file
  • -f, --file TEXT: Path to node file (TSV, JSONL, or Parquet)
  • -o, --output TEXT: Path to output examples file
  • --format [tsv|parquet|jsonl]: Output format [default: tsv]
  • -n, --sample-size INTEGER: Number of examples per type [default: 5]
  • -t, --type-column TEXT: Column to partition examples by [default: category]
  • -q, --quiet: Suppress progress output
  • --help: Show this message and exit.

koza edge-examples

Generate sample rows per edge type.

Samples N example rows for each distinct combination of type columns (default: subject_category, predicate, object_category).

Examples:

# From database (5 examples per edge type)
koza edge-examples -d merged.duckdb -o edge_examples.tsv

# From files with 10 examples
koza edge-examples -n nodes.tsv -e edges.tsv -o examples.tsv -s 10

# Custom type columns
koza edge-examples -d merged.duckdb -o examples.tsv -t predicate -t primary_knowledge_source

Usage:

$ koza edge-examples [OPTIONS]

Options:

  • -d, --database TEXT: Path to DuckDB database file
  • -n, --nodes TEXT: Path to node file (for denormalization)
  • -e, --edges TEXT: Path to edge file (TSV, JSONL, or Parquet)
  • -o, --output TEXT: Path to output examples file
  • --format [tsv|parquet|jsonl]: Output format [default: tsv]
  • -s, --sample-size INTEGER: Number of examples per type [default: 5]
  • -t, --type-column TEXT: Columns to partition examples by (can specify multiple)
  • -q, --quiet: Suppress progress output
  • --help: Show this message and exit.