hinoki-experiments #
Dataset Management #
publish_dataset.py #
Script to publish datasets to the Weave platform.
python src/scripts/publish_dataset.py \
--xero_folder_path data/invoices/GG \
--tenant_name GG_JPN \
--data_range XERO_202411_202412
Features:
- Publishing regular Xero datasets
- Publishing invoice datasets
- Publishing historical datasets
Evaluation #
run_evaluation.py #
Script to run model evaluation. Supports multiple evaluation modes.
python src/scripts/run_evaluation.py \
--mode extraction \
--xero_folder_path data/invoices/GG \
--tenant_name GG_JPN \
--data_range XERO_202411_202412 \
--debug
Evaluation modes:
- classification: Classification evaluation
- extraction: Extraction evaluation
- accounting: Accounting evaluation
- matching: Matching evaluation
- knowledge: Vendor knowledge evaluation
Note: In classification and knowledge modes, the data_range parameter has no effect. Both modes use the same dataset regardless of the data range specified.
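The sketch below shows one way the flags from the example command could be parsed. It assumes the script uses `argparse`, which the source does not confirm. `build_parser` and `DATA_RANGE_IGNORED_MODES` are hypothetical names introduced only for illustration, and which flags are required is a guess.

```python
import argparse

# Modes as listed in this README; the real script may define them differently.
MODES = ["classification", "extraction", "accounting", "matching", "knowledge"]

# Hypothetical helper set: modes for which data_range has no effect (per the note above).
DATA_RANGE_IGNORED_MODES = {"classification", "knowledge"}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags in the example command (assumed layout)."""
    parser = argparse.ArgumentParser(description="Run model evaluation (sketch).")
    parser.add_argument("--mode", choices=MODES, required=True)
    parser.add_argument("--xero_folder_path")
    parser.add_argument("--tenant_name")
    parser.add_argument("--data_range")
    parser.add_argument("--debug", action="store_true")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--mode", "knowledge", "--data_range", "XERO_202411_202412"]
)

# False for classification and knowledge, True for every other mode.
data_range_used = args.mode not in DATA_RANGE_IGNORED_MODES
```

With this layout, a caller could warn when `--data_range` is passed in a mode that ignores it, rather than silently accepting the value.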
DataRange values:
- XERO_202211_202212 = "202211-202212"
- XERO_202301_202312 = "202301-202312"
- XERO_202401_202406 = "202401-202406"
- XERO_202407_202412 = "202407-202412"
- XERO_202401_202412 = "202401-202412"
- XERO_202411_202412 = "202411-202412"
- XERO_202201_202412 = "202201-202412" (Suitable for historical data)
TenantName values:
- GG_JPN = "Japan - GoGlobal K.K."
- GG_UK = "UK - GoGlobal GEO UK Limited"
- PROCEPT_UK = "UK - PROCEPT BioRobotics UK Ltd"
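The name/value pairs above read like enum members. As a minimal sketch, assuming the project models them as Python `Enum` classes (the source does not show the definitions), the command-line names could map to their values as follows. The member names and values are copied from the lists above; the class layout is an assumption.

```python
from enum import Enum


class DataRange(Enum):
    """Data ranges from this README; the real class definition is assumed."""
    XERO_202211_202212 = "202211-202212"
    XERO_202301_202312 = "202301-202312"
    XERO_202401_202406 = "202401-202406"
    XERO_202407_202412 = "202407-202412"
    XERO_202401_202412 = "202401-202412"
    XERO_202411_202412 = "202411-202412"
    XERO_202201_202412 = "202201-202412"  # suitable for historical data


class TenantName(Enum):
    """Tenant names from this README; the real class definition is assumed."""
    GG_JPN = "Japan - GoGlobal K.K."
    GG_UK = "UK - GoGlobal GEO UK Limited"
    PROCEPT_UK = "UK - PROCEPT BioRobotics UK Ltd"


# Look up members by the names passed on the command line.
data_range = DataRange["XERO_202411_202412"]
tenant = TenantName["GG_JPN"]
```

Looking up by name (`DataRange[...]`) raises `KeyError` for an unknown name, which would surface a mistyped `--data_range` or `--tenant_name` early.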
Debug mode (--debug) runs the evaluation on a smaller dataset.