Extraction Export Library

@bonsai/extraction-export is an independent TypeScript library that converts extracted document data (invoices, bank statements, direct expenses) into 27+ accounting software formats. It was extracted from apps/webapp/src/shared/lib/export into libs/typescript/extraction-export/ to enable reuse across services and enforce a clean dependency boundary.

Architecture

Directory Layout

libs/typescript/extraction-export/
├── build.ts                         # Bun bundler configuration
├── package.json                     # Scripts, exports, dependencies
├── tsconfig.json                    # TypeScript config (noEmit for dev)
├── tsconfig.build.json              # TypeScript config (declaration-only for build)
├── scripts/
│   └── generate-metadata.ts         # Codegen: ts-morph → export-format-metadata.tsx
└── src/
    ├── index.ts                     # Server entry point (full exports)
    ├── client.ts                    # Client entry point (no Node.js deps)
    ├── format-converters.tsx         # Base types: FormatConverter, ExportFormat, enriched types
    ├── ap-bill-format-converters.tsx  # AP Bill converters (22 formats)
    ├── ar-invoice-format-converters.tsx # AR Invoice converters (15 formats)
    ├── bank-statement-format-converters.tsx # Bank Statement converters (8 formats)
    ├── direct-expense-format-converters.tsx # Direct Expense converters (3 formats)
    ├── export-format-metadata.tsx    # Auto-generated (gitignored) — client-safe metadata
    ├── export-utils.ts              # Export orchestration: enrichment, validation, archiving
    ├── direct-expense-types.ts      # Enriched direct expense types
    ├── logger.ts                    # Pluggable logger (configureLogger)
    ├── icons/                       # React SVG icons for accounting software logos
    ├── types/                       # API types (Invoice, BankStatement, etc.)
    └── utils/                       # Shared utilities (round, StrictSubset)

Key Modules

| Module | Purpose |
|---|---|
| format-converters.tsx | Base interfaces (FormatConverter, ExportFormat), enriched types, ExportFormatType union, formatDate utility |
| ap-bill-format-converters.tsx | All AP Bill converter classes (QBO, Xero, AutoCount, MYOB, Sage, etc.) |
| ar-invoice-format-converters.tsx | All AR Invoice converter classes |
| bank-statement-format-converters.tsx | All Bank Statement converter classes |
| direct-expense-format-converters.tsx | All Direct Expense converter classes (Zoho, MYOB, SQL Accounting) |
| export-utils.ts | High-level functions: enrichInvoicesWithAccountingData, exportInvoices, createMultiFileExport, and validation |
| export-format-metadata.tsx | Auto-generated; lightweight metadata (id, name, icon, disabled) for client-side UI format selectors |
| logger.ts | Pluggable logger; the webapp bridges its Sentry/Datadog logger via configureLogger() |

Dual Entry Points

The library provides two entry points to separate server-side and client-side concerns:

@bonsai/extraction-export (server)

Full library with all converter classes, export utilities, and Node.js dependencies (archiver, node-xlsx). Use this in API routes and server-side code.

import {
  exportInvoices,
  getApBillConverter,
  configureLogger,
} from '@bonsai/extraction-export';

@bonsai/extraction-export/client (client-safe)

Types, metadata getters, icons, and lightweight values only. No archiver or node-xlsx — safe for browser bundles.

import {
  getApBillExportFormatMetadata,
  configureAssetBaseUrl,
  type ApBillExportFormatType,
} from '@bonsai/extraction-export/client';

The client entry point exports:

  • All types (type-only re-exports from converter modules)
  • Export format metadata getters (from the auto-generated export-format-metadata.tsx)
  • Icons and configureAssetBaseUrl
  • Logger, utility functions, and enums
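As a rough illustration of how a UI format selector might consume the client-safe metadata, here is a self-contained sketch. The ExportFormatMetadata shape mirrors the fields the generated file is described as containing (id, name, icon, disabled); the sample entries, the relative icon paths, and the toSelectOptions helper are hypothetical and are not part of the library's API.

```typescript
// Hypothetical shape mirroring the metadata fields listed for the generated file.
interface ExportFormatMetadata {
  id: string;
  name: string;
  icon: string; // assumption: a relative icon path resolved against the asset base URL
  disabled?: boolean;
}

interface SelectOption {
  value: string;
  label: string;
  iconSrc: string;
  isDisabled: boolean;
}

// Hypothetical sample entries; real ids, names, and icons come from the generated file.
const sampleFormats: ExportFormatMetadata[] = [
  { id: 'qbo', name: 'QBO', icon: 'icons/qbo.svg' },
  { id: 'xero', name: 'Xero', icon: 'icons/xero.svg' },
  { id: 'example-beta', name: 'Example (beta)', icon: 'icons/example.svg', disabled: true },
];

// Disabled formats stay visible in the selector but cannot be chosen.
function toSelectOptions(
  formats: readonly ExportFormatMetadata[],
  assetBaseUrl: string,
): SelectOption[] {
  const base = assetBaseUrl.replace(/\/+$/, ''); // avoid a double slash when joining
  return formats.map((f) => ({
    value: f.id,
    label: f.name,
    iconSrc: `${base}/${f.icon}`,
    isDisabled: f.disabled ?? false,
  }));
}

const options = toSelectOptions(sampleFormats, 'https://assets.gotofu.com/');
console.log(options.map((o) => (o.isDisabled ? `${o.label} (disabled)` : o.label)).join(', '));
// QBO, Xero, Example (beta) (disabled)
```

Because only plain data and types cross this boundary, a component like this never pulls archiver or node-xlsx into the browser bundle.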

How Webapp Consumes the Library

The webapp configures the library at import time through two config files:

  • export-config.ts — Sets the R2 CDN base URL for format icons: configureAssetBaseUrl('https://assets.gotofu.com')
  • export-config.server.ts — Bridges the webapp’s structured logger: configureLogger(logger as Logger)
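Both config files rely on module-level configuration: a setter is called once at startup, stores a value, and library code reads that value later. The sketch below shows the pattern in isolation. The Logger interface, the console fallback, and the getIconUrl and logExportStarted helpers are assumptions for illustration; the library's real signatures for configureAssetBaseUrl and configureLogger may differ.

```typescript
// Assumed minimal logger contract; the library's Logger type may have more members.
interface Logger {
  info(message: string, context?: Record<string, unknown>): void;
  warn(message: string, context?: Record<string, unknown>): void;
  error(message: string, context?: Record<string, unknown>): void;
}

// Fallback: forward to the console until the host application configures a logger.
let activeLogger: Logger = {
  info: (m, c) => console.info(m, c ?? {}),
  warn: (m, c) => console.warn(m, c ?? {}),
  error: (m, c) => console.error(m, c ?? {}),
};
let assetBaseUrl = '';

function configureLogger(logger: Logger): void {
  activeLogger = logger;
}

function configureAssetBaseUrl(url: string): void {
  assetBaseUrl = url.replace(/\/+$/, ''); // drop trailing slashes
}

// Hypothetical helper: library code resolves icon paths against the configured base URL.
function getIconUrl(iconPath: string): string {
  return `${assetBaseUrl}/${iconPath.replace(/^\/+/, '')}`;
}

// Hypothetical library-internal call site that logs through whatever was configured.
function logExportStarted(formatId: string): void {
  activeLogger.info('export started', { formatId });
}

// Host application side, e.g. inside a config module imported once at startup.
const captured: string[] = [];
configureLogger({
  info: (m) => captured.push(`info:${m}`),
  warn: (m) => captured.push(`warn:${m}`),
  error: (m) => captured.push(`error:${m}`),
});
configureAssetBaseUrl('https://assets.gotofu.com');

logExportStarted('xero');
console.log(captured, getIconUrl('/icons/xero.svg'));
```

Keeping the logger behind a setter means the library itself never imports Sentry or Datadog; the webapp decides what backs it.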

Server-side API routes (/api/export-invoices, /api/export-bank-statements, etc.) import from @bonsai/extraction-export. Client-side UI components import from @bonsai/extraction-export/client.

Build System

Build Pipeline

The build runs two stages:

pnpm build
# Which executes: pnpm codegen && bun run build.ts
  1. Codegen (pnpm codegen): Runs scripts/generate-metadata.ts, which uses ts-morph to parse the 4 converter files, extract lightweight metadata (id, name, icon, disabled) from each converter class, and write src/export-format-metadata.tsx. This file is gitignored and regenerated on every build.

  2. Bundle (bun run build.ts):

    • Cleans dist/
    • Builds ESM bundles with Bun for both entry points (index.ts, client.ts)
    • Externalizes react, archiver, date-fns, node-xlsx
    • Generates source maps
    • Runs tsc --project tsconfig.build.json to emit .d.ts declaration files
    • Runs tsc-alias to resolve @/ path aliases in declarations

Postinstall

"postinstall": "[ -d dist ] || pnpm build"

Builds the library on pnpm install if dist/ doesn’t exist. This ensures the library is available to consumers without an explicit build step in CI and local development.

Codegen Details

scripts/generate-metadata.ts uses ts-morph to:

  1. Parse the 4 converter source files (ap-bill-format-converters.tsx, etc.)
  2. Walk class hierarchies to find format, name, icon, and disabled properties
  3. Generate a single export-format-metadata.tsx with typed metadata arrays and getter functions
  4. Format the output with Biome

Each section of the generated file (one per document type) is configured with a SectionConfig that specifies the source file, type name, and output variable name. Bank statement formats are sorted alphabetically; the other sections preserve source order.
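The emission half of the codegen can be pictured as turning already-extracted metadata into source text, one section at a time. This sketch skips the ts-morph parsing and Biome formatting entirely and starts from plain objects. The SectionConfig fields shown (sourceFile, typeName, variableName, sortAlphabetically), the emitSection helper, and the id-based getter it emits are assumptions loosely modeled on the description, not the script's real shape.

```typescript
// Metadata as it might look after extraction (parsing is not shown here).
interface FormatMetadata {
  id: string;
  name: string;
  icon: string;
  disabled: boolean;
}

// Assumed config shape: one entry per document-type section.
interface SectionConfig {
  sourceFile: string;          // converter file the metadata came from
  typeName: string;            // union type emitted for the section's format ids
  variableName: string;        // name of the emitted metadata array
  sortAlphabetically: boolean; // true only for bank statements, per the text above
}

// Emits a typed metadata array plus a hypothetical id-based getter for one section.
function emitSection(config: SectionConfig, entries: FormatMetadata[]): string {
  const ordered = config.sortAlphabetically
    ? [...entries].sort((a, b) => a.name.localeCompare(b.name))
    : entries; // other sections keep source order
  const union =
    ordered.length > 0 ? ordered.map((e) => JSON.stringify(e.id)).join(' | ') : 'never';
  const rows = ordered.map((e) => `  ${JSON.stringify(e)},`).join('\n');
  const getterName = `get${config.variableName.charAt(0).toUpperCase()}${config.variableName.slice(1)}`;
  return [
    `// Generated from ${config.sourceFile}. Do not edit by hand.`,
    `export type ${config.typeName} = ${union};`,
    `export const ${config.variableName} = [`,
    rows,
    `] as const;`,
    `export function ${getterName}(id: ${config.typeName}) {`,
    `  return ${config.variableName}.find((m) => m.id === id);`,
    `}`,
  ].join('\n');
}

const bankSection = emitSection(
  {
    sourceFile: 'bank-statement-format-converters.tsx',
    typeName: 'BankStatementExportFormatType',
    variableName: 'bankStatementExportFormatMetadata',
    sortAlphabetically: true,
  },
  [
    { id: 'xero', name: 'Xero', icon: 'xero.svg', disabled: false },
    { id: 'autocount', name: 'AutoCount', icon: 'autocount.svg', disabled: false },
  ],
);
console.log(bankSection);
```

Emitting the id union alongside the array is what lets client code import narrow format types without touching the converter modules.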

To verify codegen is up to date:

pnpm codegen:check
# Runs codegen, then git diff --exit-code to detect uncommitted changes

Format Converter Types

The library supports 4 document types, each with its own set of accounting software converters:

AP Bill (Accounts Payable)

22 formats: QBO, QBO Desktop, Xero, AutoCount, AutoCount Cloud, Bukku, FAST, FAST Voucher, Winton, MYOB, Sage, Odoo, Freee, Yayoi, SAP, Zoho Bill, QNE Past Bills, QNE Purchase Invoices, PEAK, ThaiTax, SQL Accounting, AccountingAI

AR Invoice (Accounts Receivable)

15 formats: QBO, QBO Desktop, Xero, AutoCount, Bukku, FAST, Winton, MYOB, Sage, QNE Past Invoices, QNE Sales Invoices, Zoho Sales Invoice, PEAK, SQL Accounting, AccountingAI

Bank Statement

8 formats: QBO, Xero, AutoCount, Bukku, Odoo, Zoho, PEAK, AccountingAI

Direct Expense

3 formats: Zoho Expense, MYOB Direct Expense, SQL Accounting
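One way to key a per-document-type set of converters, in the spirit of a lookup such as getApBillConverter, is a Record indexed by a union of format ids. The sketch below uses the three direct expense formats; the id strings, the ConverterStub type, and getDirectExpenseConverterSketch are hypothetical and do not describe how the library actually registers converters.

```typescript
// Hypothetical ids for the three direct expense formats listed above.
type DirectExpenseFormatId = 'zoho-expense' | 'myob-direct-expense' | 'sql-accounting';

interface ConverterStub {
  formatId: DirectExpenseFormatId;
  label: string;
}

// Keying a Record by the id union makes the compiler flag any format without an entry.
const directExpenseRegistry: Record<DirectExpenseFormatId, () => ConverterStub> = {
  'zoho-expense': () => ({ formatId: 'zoho-expense', label: 'Zoho Expense' }),
  'myob-direct-expense': () => ({ formatId: 'myob-direct-expense', label: 'MYOB Direct Expense' }),
  'sql-accounting': () => ({ formatId: 'sql-accounting', label: 'SQL Accounting' }),
};

// Narrows an untrusted string (e.g. from a request body) to a known format id.
function isDirectExpenseFormatId(id: string): id is DirectExpenseFormatId {
  return Object.prototype.hasOwnProperty.call(directExpenseRegistry, id);
}

// Hypothetical lookup; returns undefined for unknown ids instead of throwing.
function getDirectExpenseConverterSketch(id: string): ConverterStub | undefined {
  return isDirectExpenseFormatId(id) ? directExpenseRegistry[id]() : undefined;
}

console.log(getDirectExpenseConverterSketch('sql-accounting')?.label); // SQL Accounting
console.log(getDirectExpenseConverterSketch('toString')); // undefined
```

The own-property check matters: a plain `id in registry` test would accept inherited keys such as "toString".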

Converter Interface

Each converter implements a consistent interface:

interface FormatConverter {
  format: ExportFormatType;       // e.g. 'qbo', 'xero'
  headers: string[];              // CSV/XLSX column headers
  convertInvoice: (invoice: EnrichedInvoice) => (string | number | Date | null)[][];
  getFileExtension: () => string; // 'csv' or 'xlsx'
  getContentType: () => string;   // MIME type
  getExportWithMetadata?: (       // Optional: custom export with metadata rows
    invoices: EnrichedInvoice[],
    metadata?: ExportMetadata
  ) => string | Buffer;
}

Bank statement and direct expense converters follow the same pattern with type-appropriate methods (convertBankStatement, convertDirectExpense).
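To make the interface concrete, here is a self-contained sketch of a converter for an imaginary CSV format. The EnrichedInvoice fields used here (invoice_number, issued_at_ms, contact_name, line_items with accounting_account_code) are hypothetical stand-ins rather than the library's real enriched types, and the interface is trimmed to the members the sketch needs.

```typescript
// Hypothetical, trimmed-down enriched types for illustration only.
interface EnrichedLineItem {
  description: string;
  amount: number;
  accounting_account_code?: string;
}
interface EnrichedInvoice {
  invoice_number: string;
  issued_at_ms: number;
  contact_name?: string;
  line_items: EnrichedLineItem[];
}

type Cell = string | number | Date | null;

// Trimmed version of the FormatConverter contract (format widened to string here).
interface FormatConverter {
  format: string;
  headers: string[];
  convertInvoice: (invoice: EnrichedInvoice) => Cell[][];
  getFileExtension: () => string;
  getContentType: () => string;
}

// One output row per line item; invoice-level fields repeat on every row.
class ExampleCsvConverter implements FormatConverter {
  format = 'example-csv';
  headers = ['Invoice No', 'Date', 'Contact', 'Description', 'Account', 'Amount'];

  convertInvoice = (invoice: EnrichedInvoice): Cell[][] =>
    invoice.line_items.map((line) => [
      invoice.invoice_number,
      new Date(invoice.issued_at_ms),
      invoice.contact_name ?? null,
      line.description,
      line.accounting_account_code ?? null, // missing enrichment becomes an empty cell
      line.amount,
    ]);

  getFileExtension = () => 'csv';
  getContentType = () => 'text/csv';
}

const converter = new ExampleCsvConverter();
const rows = converter.convertInvoice({
  invoice_number: 'INV-001',
  issued_at_ms: Date.UTC(2024, 0, 15),
  contact_name: 'Acme Ltd',
  line_items: [
    { description: 'Consulting', amount: 500, accounting_account_code: '400' },
    { description: 'Travel', amount: 120.5 },
  ],
});
console.log(rows.length); // 2
```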

Mise Tasks

All tasks are defined in libs/typescript/.tasks.toml:

| Task | Command | Description |
|---|---|---|
| mise ts-lib-build | pnpm build | Run codegen + Bun bundler + tsc declarations |
| mise ts-lib-codegen | pnpm codegen | Generate export-format-metadata.tsx |
| mise ts-lib-check | lint + format + typecheck, run in parallel | All quality checks |
| mise ts-lib-lint-check | pnpm lint (oxlint) | Lint source files |
| mise ts-lib-format-check | pnpm format (biome) | Check formatting |
| mise ts-lib-format-fix | pnpm format:fix | Auto-fix formatting |
| mise ts-lib-type-check | pnpm typecheck (tsc --noEmit) | Type checking |

CI/CD

The library is built in CI through two mechanisms:

  • ci.yml: The postinstall script handles it — pnpm install triggers [ -d dist ] || pnpm build, building the library if dist/ is absent.
  • deploy.yaml: Explicitly sets up mise and runs mise ts-lib-build to ensure a fresh build before deployment.

Development Workflow

Local Development

# Build the library (codegen + bundle + declarations)
mise ts-lib-build

# Run all quality checks
mise ts-lib-check

# Fix formatting issues
mise ts-lib-format-fix

# Regenerate metadata after modifying converters
mise ts-lib-codegen

# Type-check only
mise ts-lib-type-check

After Modifying Converters

When you add or change a converter class, you must regenerate the metadata. Run codegen alone to refresh src/export-format-metadata.tsx, or run a full build, which also runs codegen and refreshes dist/:

mise ts-lib-codegen   # Regenerates export-format-metadata.tsx
mise ts-lib-build     # Full rebuild including codegen

The postinstall hook rebuilds only when dist/ is missing; it does not detect a stale dist/, so during development run an explicit build after changes.

Key Conventions

File Formats

  • CSV (text/csv): Most formats export as CSV. Headers are the first row, followed by one row per line item.
  • Excel (application/vnd.openxmlformats-officedocument.spreadsheetml.sheet): Some formats (FAST, FAST Voucher, SAP, QNE, SQL Accounting, MYOB Direct Expense) export as .xlsx using node-xlsx. These use getExportWithMetadata for custom sheet layouts.
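For the CSV path, the export step comes down to writing the header row, then each converted row, quoting any field that contains a comma, a double quote, or a line break and doubling embedded quotes (the RFC 4180 convention). The escapeCsvField and toCsv names, the CRLF line ending, and rendering Date cells as yyyy-MM-dd in UTC are assumptions for this sketch; the library's actual serialization may differ.

```typescript
type Cell = string | number | Date | null;

// Quote a field only when it contains a delimiter, a quote, or a line break.
function escapeCsvField(cell: Cell): string {
  if (cell === null) return '';
  const text = cell instanceof Date ? cell.toISOString().slice(0, 10) : String(cell);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Header row first, then one row per converted line item.
function toCsv(headers: string[], rows: Cell[][]): string {
  const allRows: Cell[][] = [headers, ...rows];
  return allRows.map((row) => row.map(escapeCsvField).join(',')).join('\r\n');
}

const csv = toCsv(
  ['Invoice No', 'Description', 'Amount'],
  [
    ['INV-001', 'Consulting, March', 500],
    ['INV-001', 'Said "thanks"', 12.5],
  ],
);
console.log(csv);
```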

Date Formatting

Each accounting package expects dates in its own format. The formatDate(dateMs, format) utility wraps date-fns/format. Common patterns:

  • dd/MM/yyyy — Default, Xero, AutoCount
  • MM/dd/yyyy — QBO
  • yyyy-MM-dd — Zoho, Odoo, Sage, SAP, PEAK
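The patterns above are date-fns format tokens (dd is the day of month, MM the month, yyyy the calendar year). To show what each one produces without pulling in date-fns, here is a dependency-free stand-in covering only those three patterns. It is not the library's formatDate: formatDateUtc is a hypothetical name, it reads UTC fields for deterministic output whereas date-fns format renders in the local time zone, and it rejects any other pattern.

```typescript
type SupportedPattern = 'dd/MM/yyyy' | 'MM/dd/yyyy' | 'yyyy-MM-dd';

// Dependency-free stand-in for the three patterns above; uses UTC fields, not local time.
function formatDateUtc(dateMs: number, pattern: SupportedPattern): string {
  const d = new Date(dateMs);
  const dd = String(d.getUTCDate()).padStart(2, '0');
  const mm = String(d.getUTCMonth() + 1).padStart(2, '0'); // getUTCMonth is 0-based
  const yyyy = String(d.getUTCFullYear()).padStart(4, '0');
  switch (pattern) {
    case 'dd/MM/yyyy':
      return `${dd}/${mm}/${yyyy}`; // Default, Xero, AutoCount
    case 'MM/dd/yyyy':
      return `${mm}/${dd}/${yyyy}`; // QBO
    case 'yyyy-MM-dd':
      return `${yyyy}-${mm}-${dd}`; // Zoho, Odoo, Sage, SAP, PEAK
    default:
      throw new Error(`Unsupported pattern: ${pattern as string}`);
  }
}

const ms = Date.UTC(2024, 2, 5); // 5 March 2024, 00:00 UTC
console.log(formatDateUtc(ms, 'dd/MM/yyyy')); // 05/03/2024
console.log(formatDateUtc(ms, 'MM/dd/yyyy')); // 03/05/2024
console.log(formatDateUtc(ms, 'yyyy-MM-dd')); // 2024-03-05
```

The same instant prints as 05/03/2024 for Xero but 03/05/2024 for QBO, which is why each converter pins its own pattern.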

Tax Calculations

  • Converters respect InvoiceDataLineAmountType (INCLUSIVE vs EXCLUSIVE) to determine how tax is calculated
  • Tax amounts are extracted from enriched line items (accounting_tax_rate)
  • Rounding uses the round() utility from utils/number.ts
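A minimal sketch of how the INCLUSIVE/EXCLUSIVE distinction changes the arithmetic, assuming InvoiceDataLineAmountType behaves like a string enum with those two values and that round() takes a value and a number of decimal places. The const-object stand-in for the enum, the local round(), and the splitLineAmount helper are all hypothetical; individual converters may round at different points.

```typescript
// Stand-in for the InvoiceDataLineAmountType enum (assumed string values).
const InvoiceDataLineAmountType = {
  INCLUSIVE: 'INCLUSIVE', // line amount already includes tax
  EXCLUSIVE: 'EXCLUSIVE', // tax is added on top of the line amount
} as const;
type InvoiceDataLineAmountType =
  (typeof InvoiceDataLineAmountType)[keyof typeof InvoiceDataLineAmountType];

// Stand-in for utils/number.ts round(); the real signature may differ.
function round(value: number, decimals = 2): number {
  const factor = 10 ** decimals;
  return Math.round(value * factor) / factor;
}

interface LineAmounts {
  net: number;
  tax: number;
  gross: number;
}

// ratePercent would come from the enriched line's tax rate (e.g. 10 for 10%).
function splitLineAmount(
  amount: number,
  ratePercent: number,
  amountType: InvoiceDataLineAmountType,
): LineAmounts {
  if (amountType === InvoiceDataLineAmountType.INCLUSIVE) {
    // Back the tax out of a tax-inclusive amount: net = gross / (1 + rate).
    const net = round(amount / (1 + ratePercent / 100));
    return { net, tax: round(amount - net), gross: round(amount) };
  }
  // Exclusive: tax is computed on the net amount and added on top.
  const tax = round((amount * ratePercent) / 100);
  return { net: round(amount), tax, gross: round(amount + tax) };
}

console.log(splitLineAmount(110, 10, InvoiceDataLineAmountType.INCLUSIVE)); // { net: 100, tax: 10, gross: 110 }
console.log(splitLineAmount(100, 10, InvoiceDataLineAmountType.EXCLUSIVE)); // { net: 100, tax: 10, gross: 110 }
```

Deriving the inclusive tax as gross minus the rounded net keeps net + tax equal to the original gross, so exported totals reconcile with the source document.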

Enriched Data

Before export, invoices and bank statements are enriched with accounting data (contacts, accounts, tax rates, tags, items, locations) via enrichInvoicesWithAccountingData and enrichBankStatementsWithAccountingData in export-utils.ts. The enriched types (EnrichedInvoice, EnrichedInvoiceLineItem, etc.) extend the base API types with optional accounting fields.
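Enrichment is essentially a join between extracted documents and the organization's accounting reference data. The sketch below shows that idea for line items only, using in-memory lookup maps. Apart from accounting_tax_rate, the field names (account_id, tax_rate_id, accounting_account), the Account and TaxRate shapes, and the enrichLineItems helper are assumptions; they do not reflect the real signature of enrichInvoicesWithAccountingData.

```typescript
interface Account { id: string; code: string; accountName: string }
interface TaxRate { id: string; code: string; ratePercent: number }

// Hypothetical base line item as extracted from a document.
interface LineItem {
  description: string;
  amount: number;
  account_id?: string;
  tax_rate_id?: string;
}

// The enriched type extends the base type with optional accounting fields.
interface EnrichedLineItem extends LineItem {
  accounting_account?: Account;
  accounting_tax_rate?: TaxRate;
}

interface AccountingData {
  accounts: Map<string, Account>;
  taxRates: Map<string, TaxRate>;
}

// Attach matching reference records; unmatched or missing ids leave the fields undefined.
function enrichLineItems(items: LineItem[], data: AccountingData): EnrichedLineItem[] {
  return items.map((item) => ({
    ...item,
    accounting_account: item.account_id ? data.accounts.get(item.account_id) : undefined,
    accounting_tax_rate: item.tax_rate_id ? data.taxRates.get(item.tax_rate_id) : undefined,
  }));
}

const data: AccountingData = {
  accounts: new Map([['acc-1', { id: 'acc-1', code: '400', accountName: 'Sales' }]]),
  taxRates: new Map([['tax-1', { id: 'tax-1', code: 'SST', ratePercent: 6 }]]),
};
const enriched = enrichLineItems(
  [
    { description: 'Consulting', amount: 500, account_id: 'acc-1', tax_rate_id: 'tax-1' },
    { description: 'Misc', amount: 20, account_id: 'missing' },
  ],
  data,
);
console.log(enriched[0].accounting_tax_rate?.ratePercent); // 6
```

Leaving unmatched fields undefined, rather than throwing, pushes the decision about what is mandatory to the validation step and to each converter.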

Validation

Export utilities include validation functions (validateInvoiceForExport, validateBankStatementForExport, validateDirectExpenseForExport) that check for required fields before export.
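A validator of this kind usually collects every problem instead of stopping at the first, so all of them can be reported at once. The sketch below assumes a hypothetical invoice shape, a hypothetical set of required fields, and a hypothetical validateInvoiceSketch name; the real checks, return type, and messages of validateInvoiceForExport are not documented here and may differ.

```typescript
// Hypothetical invoice shape; every field is optional so missing data can be detected.
interface InvoiceForValidation {
  invoice_number?: string;
  issued_at_ms?: number;
  contact_name?: string;
  line_items: { description?: string; amount?: number }[];
}

interface ValidationIssue {
  field: string;
  message: string;
}

// Collects all missing required fields so they can be reported together.
function validateInvoiceSketch(invoice: InvoiceForValidation): ValidationIssue[] {
  const issues: ValidationIssue[] = [];
  if (!invoice.invoice_number?.trim()) {
    issues.push({ field: 'invoice_number', message: 'Invoice number is required' });
  }
  if (invoice.issued_at_ms === undefined || Number.isNaN(invoice.issued_at_ms)) {
    issues.push({ field: 'issued_at_ms', message: 'Invoice date is required' });
  }
  if (!invoice.contact_name) {
    issues.push({ field: 'contact_name', message: 'Contact is required' });
  }
  if (invoice.line_items.length === 0) {
    issues.push({ field: 'line_items', message: 'At least one line item is required' });
  }
  invoice.line_items.forEach((line, index) => {
    if (typeof line.amount !== 'number' || !Number.isFinite(line.amount)) {
      issues.push({ field: `line_items[${index}].amount`, message: 'Line amount is required' });
    }
  });
  return issues;
}

const issues = validateInvoiceSketch({
  invoice_number: 'INV-002',
  line_items: [{ description: 'Widgets' }],
});
console.log(issues.map((issue) => issue.field));
// [ 'issued_at_ms', 'contact_name', 'line_items[0].amount' ]
```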