Run Evaluation

hinoki-experiments

Dataset Management

publish_dataset.py

Script to publish datasets to the Weave platform.

python src/scripts/publish_dataset.py \
  --xero_folder_path data/invoices/GG \
  --tenant_name GG_JPN \
  --data_range XERO_202411_202412

Features:

  • Publishing regular Xero datasets
  • Publishing invoice datasets
  • Publishing historical datasets

Evaluation

run_evaluation.py

Script to run model evaluation. Supports multiple evaluation modes.

python src/scripts/run_evaluation.py \
  --mode extraction \
  --xero_folder_path data/invoices/GG \
  --tenant_name GG_JPN \
  --data_range XERO_202411_202412 \
  --debug

Evaluation modes:

  • classification: Classification evaluation
  • extraction: Extraction evaluation
  • accounting: Accounting evaluation
  • matching: Matching evaluation
  • knowledge: Vendor knowledge evaluation

Note: In the classification and knowledge modes, the data_range value has no effect. Both modes use the same dataset whatever data range is specified.
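The note above can be expressed as a small lookup. This is a minimal sketch, not code taken from run_evaluation.py: the names MODES, RANGE_INDEPENDENT_MODES and uses_data_range are hypothetical, and the real script may handle the distinction differently.

```python
# Hypothetical helper illustrating the note above; not part of the repository.
MODES = ("classification", "extraction", "accounting", "matching", "knowledge")

# Modes documented as using the same dataset regardless of the data range.
RANGE_INDEPENDENT_MODES = frozenset({"classification", "knowledge"})


def uses_data_range(mode: str) -> bool:
    """Return True if the data range changes which dataset the mode evaluates."""
    if mode not in MODES:
        raise ValueError(f"unknown evaluation mode: {mode!r}")
    return mode not in RANGE_INDEPENDENT_MODES
```

A wrapper script could use such a check to avoid re-running classification or knowledge evaluations once per data range, since the results would not differ.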

DataRange values:

  • XERO_202211_202212 = "202211-202212"
  • XERO_202301_202312 = "202301-202312"
  • XERO_202401_202406 = "202401-202406"
  • XERO_202407_202412 = "202407-202412"
  • XERO_202401_202412 = "202401-202412"
  • XERO_202411_202412 = "202411-202412"
  • XERO_202201_202412 = "202201-202412" (suitable for historical data)
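The NAME = "value" pairs above read like members of a Python Enum. The sketch below assumes that, and assumes the command line passes the member name (as in --data_range XERO_202411_202412). The helper month_bounds is hypothetical and only illustrates how the value string could be split.

```python
from enum import Enum


class DataRange(Enum):
    """Sketch of the DataRange values listed above (assumed to be an Enum)."""

    XERO_202211_202212 = "202211-202212"
    XERO_202301_202312 = "202301-202312"
    XERO_202401_202406 = "202401-202406"
    XERO_202407_202412 = "202407-202412"
    XERO_202401_202412 = "202401-202412"
    XERO_202411_202412 = "202411-202412"
    XERO_202201_202412 = "202201-202412"  # suitable for historical data


def month_bounds(data_range: DataRange) -> tuple[str, str]:
    """Split a value such as "202411-202412" into (start, end) YYYYMM strings."""
    start, end = data_range.value.split("-")
    return start, end


# Look up a member by the name given on the command line.
selected = DataRange["XERO_202411_202412"]
start_month, end_month = month_bounds(selected)
```

Looking members up by name keeps the command-line argument short while the value still carries the human-readable range.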

TenantName values:

  • GG_JPN = "Japan - GoGlobal K.K."
  • GG_UK = "UK - GoGlobal GEO UK Limited"
  • PROCEPT_UK = "UK - PROCEPT BioRobotics UK Ltd"
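As with the data ranges, the tenant list can be modelled as an Enum keyed by the short name passed to --tenant_name. This is an illustrative sketch under that assumption: parse_tenant is a hypothetical helper, not a function from the repository.

```python
from enum import Enum


class TenantName(Enum):
    """Sketch of the TenantName values listed above (assumed to be an Enum)."""

    GG_JPN = "Japan - GoGlobal K.K."
    GG_UK = "UK - GoGlobal GEO UK Limited"
    PROCEPT_UK = "UK - PROCEPT BioRobotics UK Ltd"


def parse_tenant(name: str) -> TenantName:
    """Map a short name such as "GG_JPN" to its member, with a readable error."""
    try:
        return TenantName[name]
    except KeyError:
        valid = ", ".join(t.name for t in TenantName)
        raise ValueError(f"unknown tenant {name!r}; expected one of: {valid}") from None
```

Listing the valid names in the error message makes a mistyped tenant easier to correct than a bare KeyError would.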

Debug mode (--debug) runs the evaluation on a smaller dataset.
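To run several modes in sequence, the command line can be assembled programmatically. The sketch below only builds the argument list, using the script path and flags shown in this document. build_eval_command is a hypothetical helper, and the sketch assumes it is run from the repository root.

```python
import sys


def build_eval_command(mode, xero_folder_path, tenant_name, data_range, debug=False):
    """Build the argument list for run_evaluation.py from the documented flags."""
    cmd = [
        sys.executable,
        "src/scripts/run_evaluation.py",
        "--mode", mode,
        "--xero_folder_path", xero_folder_path,
        "--tenant_name", tenant_name,
        "--data_range", data_range,
    ]
    if debug:
        # --debug is a bare flag: the evaluation runs on a smaller dataset.
        cmd.append("--debug")
    return cmd


# One debug command per mode that depends on the data range.
commands = [
    build_eval_command(mode, "data/invoices/GG", "GG_JPN", "XERO_202411_202412", debug=True)
    for mode in ("extraction", "accounting", "matching")
]
```

Each list can then be passed to subprocess.run(cmd, check=True). Passing a list rather than a single shell string avoids quoting problems with paths.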