Email Integration

BonsAI allows users to submit documents via email. The system uses Postfix as the SMTP server to receive emails, extracts attachments, and processes them through the document pipeline.

Overview #

The email integration provides:

  • Document Ingestion via Email: Users can email documents directly to BonsAI
  • Extraction Type Routing: Email address tags determine document type (AP bills, AR invoices, etc.)
  • Attachment Processing: Only attachments are extracted and imported as documents
  • Sender Whitelisting: Entity-level control over allowed senders
  • Deduplication: Prevents reprocessing of duplicate emails
  • TLS Encryption: Let’s Encrypt certificates for secure email transport

Architecture #

Email arrives on SMTP (port 25)
         ↓
    Postfix MTA
         ↓
  tofu-postfix (Rust)
         ↓
 ┌───────┴───────┐
 │               │
Redis         S3 (raw email)
(dedup)          │
                 ↓
            RabbitMQ
                 ↓
       bonsai-trigger-sync
                 ↓
    ┌────────────┴────────────┐
    │                         │
PostgreSQL              S3 (documents)
(Document records)      (Attachment files)

Components #

| Component | Location | Purpose |
|---|---|---|
| tofu-postfix | apps/tofu-postfix/ | SMTP server that receives and parses emails |
| bonsai-trigger-sync | apps/bonsai-trigger-sync/ | Worker that processes emails and creates documents |
| Deployment | deployment/resources/tofu-postfix/ | Kubernetes manifests |

Email Address Format #

Users must send emails to a specially formatted address:

{username}-{random}+{extraction-type}@{domain}

Default Format #

When an email integration is created, the system generates a default email address:

{organization-slug}-{entity-slug}-{random}+{extraction-type}@{domain}

  • organization-slug: The organization’s URL slug (lowercase)
  • entity-slug: The entity’s URL slug (lowercase)
  • random: A 6-character alphanumeric string for uniqueness
  • extraction-type: The document type tag (e.g., ap-bills)
  • domain: The email ingestion domain (environment-specific)

Example #

acme-corp-headquarters-x7k2m9+ap-bills@docs.gotofu.com

| Part | Description |
|---|---|
| acme-corp-headquarters | Default username (org-slug + entity-slug) |
| x7k2m9 | Random suffix for uniqueness |
| + | Delimiter (SMTP recipient delimiter) |
| ap-bills | Extraction type tag |
| @docs.gotofu.com | Email ingestion domain (environment-specific) |
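
The address format above can be split back into its parts with plain string operations. The following is a minimal sketch, not the parser used by tofu-postfix: `IngestionAddress` and `parse_ingestion_address` are hypothetical names, and treating everything after the last `-` in the mailbox as the 6-character random suffix is an assumption based on the example.

```rust
/// Parts of an ingestion address such as
/// `acme-corp-headquarters-x7k2m9+ap-bills@docs.gotofu.com`.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct IngestionAddress {
    pub username: String,
    pub random: String,
    pub extraction_tag: String,
    pub domain: String,
}

/// Splits a recipient address into its parts (hypothetical helper).
/// Returns `None` if the address does not match the documented shape.
pub fn parse_ingestion_address(address: &str) -> Option<IngestionAddress> {
    // Slugs and tags are documented as lowercase, so normalize first.
    let address = address.trim().to_ascii_lowercase();
    let (local, domain) = address.rsplit_once('@')?;
    // `+` is the recipient delimiter between the mailbox and the tag.
    let (mailbox, tag) = local.split_once('+')?;
    // Assumption: the random suffix is whatever follows the last `-`.
    let (username, random) = mailbox.rsplit_once('-')?;
    let suffix_ok =
        random.len() == 6 && random.chars().all(|c| c.is_ascii_alphanumeric());
    if username.is_empty() || tag.is_empty() || domain.is_empty() || !suffix_ok {
        return None;
    }
    Some(IngestionAddress {
        username: username.to_string(),
        random: random.to_string(),
        extraction_tag: tag.to_string(),
        domain: domain.to_string(),
    })
}
```

Calling `parse_ingestion_address` on the example address yields the username `acme-corp-headquarters`, the suffix `x7k2m9`, the tag `ap-bills`, and the domain `docs.gotofu.com`. An address without a `+` tag yields `None`.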

Custom Username #

Users can customize the username portion in the UI. When the username is changed:

  • A new random suffix is generated
  • The username must be 40 characters or fewer
  • The resulting address must be a valid email address

This allows organizations to create more memorable email addresses like:

invoices-x7k2m9+ap-bills@docs.gotofu.com
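
The username rules can be checked before a new suffix is generated. This sketch covers only the documented 40-character limit plus a conservative character check. The allowed character set (lowercase ASCII letters, digits, and hyphens, with no leading or trailing hyphen) is an assumption chosen so the result stays a valid email local part; the real validation may be more permissive. `validate_username` and `UsernameError` are hypothetical names.

```rust
/// Documented upper bound for a custom username.
const MAX_USERNAME_LEN: usize = 40;

#[derive(Debug, PartialEq)]
pub enum UsernameError {
    Empty,
    TooLong,
    InvalidCharacter(char),
    LeadingOrTrailingHyphen,
}

/// Validates a custom username (hypothetical helper).
pub fn validate_username(username: &str) -> Result<(), UsernameError> {
    if username.is_empty() {
        return Err(UsernameError::Empty);
    }
    if username.chars().count() > MAX_USERNAME_LEN {
        return Err(UsernameError::TooLong);
    }
    // Assumption: only lowercase ASCII letters, digits, and hyphens.
    let allowed = |c: char| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-';
    if let Some(bad) = username.chars().find(|c| !allowed(*c)) {
        return Err(UsernameError::InvalidCharacter(bad));
    }
    if username.starts_with('-') || username.ends_with('-') {
        return Err(UsernameError::LeadingOrTrailingHyphen);
    }
    Ok(())
}
```

Under these assumptions `invoices` passes, while a 41-character name or one containing a space is rejected.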

Supported Extraction Types #

| Tag | Extraction Type | Description |
|---|---|---|
| ap-bills | AP Bill | Accounts Payable Bill |
| ar-invoices | AR Invoice | Accounts Receivable Invoice |
| bank-statements | Bank Statement | Bank Statement |
| direct-expenses | Direct Expense | Direct Expense |
| expense-claims | Expense Claim | Expense Claim |
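
The tag-to-type mapping in the table can be expressed as a small enum. This is an illustrative sketch: the enum and method names are hypothetical and may not match the types used in the codebase.

```rust
/// Extraction types from the table above (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExtractionType {
    ApBill,
    ArInvoice,
    BankStatement,
    DirectExpense,
    ExpenseClaim,
}

impl ExtractionType {
    /// Maps the `+tag` part of the address to an extraction type.
    /// Unknown tags yield `None`.
    pub fn from_tag(tag: &str) -> Option<Self> {
        match tag.to_ascii_lowercase().as_str() {
            "ap-bills" => Some(Self::ApBill),
            "ar-invoices" => Some(Self::ArInvoice),
            "bank-statements" => Some(Self::BankStatement),
            "direct-expenses" => Some(Self::DirectExpense),
            "expense-claims" => Some(Self::ExpenseClaim),
            _ => None,
        }
    }

    /// Human-readable name, as listed in the "Extraction Type" column.
    pub fn display_name(self) -> &'static str {
        match self {
            Self::ApBill => "AP Bill",
            Self::ArInvoice => "AR Invoice",
            Self::BankStatement => "Bank Statement",
            Self::DirectExpense => "Direct Expense",
            Self::ExpenseClaim => "Expense Claim",
        }
    }
}
```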

Domain Configuration #

| Environment | Domain |
|---|---|
| Development | docs-dev.gotofu.com |
| Production | docs.gotofu.com |

The domain is configured via the EMAIL_DOC_INGESTION_DOMAIN environment variable.

Email Processing Flow #

1. Email Reception (tofu-postfix) #

  1. An email arrives on SMTP port 25
  2. Postfix receives and validates the email
  3. Postfix invokes the tofu-postfix-wrapper script with the sender and recipient
  4. The Rust binary reads the email from stdin and parses it as an RFC 822 message
  5. It checks that the email has attachments and exits if there are none
  6. It extracts the Message-ID header, or generates an ID if the header is missing
  7. It checks Redis for a duplicate (24-hour TTL)
  8. It uploads the raw email bytes to S3
  9. It enqueues an EmailTriggerSyncJob to RabbitMQ
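
Step 6 depends on reading one header out of the raw message. The sketch below shows the idea with the standard library only. It is a simplification, not the parser tofu-postfix uses: `extract_message_id` is a hypothetical helper, it ignores folded (multi-line) headers, and it leaves the fallback ID generation to the caller.

```rust
/// Returns the value of the first `Message-ID` header, if any.
/// Simplified: folded (multi-line) header values are not handled.
pub fn extract_message_id(raw_email: &str) -> Option<String> {
    for line in raw_email.lines() {
        // The first empty line separates the headers from the body.
        if line.is_empty() {
            break;
        }
        if let Some((name, value)) = line.split_once(':') {
            // Header names are case-insensitive.
            if name.trim().eq_ignore_ascii_case("message-id") {
                let value = value.trim();
                if !value.is_empty() {
                    return Some(value.to_string());
                }
            }
        }
    }
    None
}
```

A caller would fall back to a generated identifier when this returns `None`, as described in step 6.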

2. Attachment Processing (bonsai-trigger-sync) #

  1. Reads the raw email from S3
  2. Looks up the entity by username through its Email integration
  3. Validates that the sender is on the whitelist
  4. Checks that the extraction type is enabled for the entity
  5. For each attachment:
    • Validates the MIME type (PDF, JPEG, or PNG only)
    • Checks the file size (max 10 MB)
    • Uploads the file to the S3 document bucket
    • Creates a Document record with the email metadata
    • Enqueues a document conversion job

Attachment Handling #

Supported File Types #

| MIME Type | Extension |
|---|---|
| application/pdf | .pdf |
| image/jpeg | .jpg, .jpeg |
| image/png | .png |

Size Limits #

  • Maximum attachment size: 10 MB per file
  • Maximum email size: 25 MB (including all attachments and body)

Inline Images #

  • Inline images with Content-ID headers (typically email signatures) are skipped
  • Only actual file attachments are processed
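
The attachment rules in this section (inline images, file types, size limit) can be combined into a single decision function. This is a sketch under stated assumptions: the type and function names are hypothetical, "10 MB" is interpreted as 10 × 1024 × 1024 bytes (matching how the 25 MB message limit is expressed in the Postfix configuration), and an attachment counts as an inline image when it is an image with a Content-ID.

```rust
/// Assumption: 10 MB is interpreted as 10 * 1024 * 1024 bytes.
const MAX_ATTACHMENT_BYTES: u64 = 10 * 1024 * 1024;

/// Minimal view of a parsed attachment (hypothetical type).
pub struct AttachmentInfo<'a> {
    pub mime_type: &'a str,
    pub size_bytes: u64,
    pub content_id: Option<&'a str>,
}

#[derive(Debug, PartialEq)]
pub enum AttachmentDecision {
    Accept,
    SkipInline,
    RejectUnsupportedType,
    RejectTooLarge,
}

/// Applies the documented rules in order: inline images are skipped,
/// then the MIME type is checked, then the size.
pub fn classify_attachment(a: &AttachmentInfo<'_>) -> AttachmentDecision {
    let mime = a.mime_type.to_ascii_lowercase();
    // Inline images (typically signatures) carry a Content-ID header.
    if a.content_id.is_some() && mime.starts_with("image/") {
        return AttachmentDecision::SkipInline;
    }
    if !matches!(mime.as_str(), "application/pdf" | "image/jpeg" | "image/png") {
        return AttachmentDecision::RejectUnsupportedType;
    }
    if a.size_bytes > MAX_ATTACHMENT_BYTES {
        return AttachmentDecision::RejectTooLarge;
    }
    AttachmentDecision::Accept
}
```

Checking the type before the size means an oversized file of an unsupported type is reported as unsupported; the order used by the real worker is not stated here.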

Entity Integration Setup #

For an entity to receive documents via email:

  1. Create an Entity Integration with the provider set to Email
  2. Set unique_internal_query_key to the entity’s username (lowercase)
  3. Optionally, configure a sender whitelist in the integration metadata

Whitelist Configuration #

The integration metadata supports:

{
  "address_whitelist": ["supplier@example.com", "finance@vendor.com"],
  "domain_whitelist": ["trusted-vendor.com", "partner.org"]
}

  • If both lists are empty or null, all senders are allowed
  • If either list is set, the sender must match a whitelisted address or domain
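
The whitelist rule can be sketched as a single predicate. The struct fields mirror the metadata keys above; the function name is hypothetical. Case-insensitive comparison and exact domain matching (no subdomain matching) are assumptions, and null lists are represented as empty vectors.

```rust
/// Mirrors the integration metadata shown above (hypothetical type).
#[derive(Debug, Default)]
pub struct SenderWhitelist {
    pub address_whitelist: Vec<String>,
    pub domain_whitelist: Vec<String>,
}

/// Returns true if the sender may submit documents.
pub fn is_sender_allowed(whitelist: &SenderWhitelist, sender: &str) -> bool {
    // Both lists empty (or null in the metadata): everyone is allowed.
    if whitelist.address_whitelist.is_empty() && whitelist.domain_whitelist.is_empty() {
        return true;
    }
    let sender = sender.trim().to_ascii_lowercase();
    let domain = match sender.rsplit_once('@') {
        Some((_, domain)) => domain,
        None => return false, // not a usable address
    };
    // Assumption: comparisons are case-insensitive and domains match exactly.
    whitelist
        .address_whitelist
        .iter()
        .any(|a| a.eq_ignore_ascii_case(&sender))
        || whitelist
            .domain_whitelist
            .iter()
            .any(|d| d.eq_ignore_ascii_case(domain))
}
```

With the example metadata, `supplier@example.com` and any address at `trusted-vendor.com` are allowed, while `someone@other.org` is not.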

Disposable Domain Blocking #

Emails from disposable/temporary email domains are automatically blocked. The blocklist is loaded from /etc/config/disposable_domains.txt.
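
A blocklist like this can be loaded into a set and checked against the sender’s domain. The file format is not described here, so the sketch assumes one domain per line, with blank lines and `#` comment lines ignored. Both function names are hypothetical.

```rust
use std::collections::HashSet;

/// Parses blocklist file contents into a set of lowercase domains.
/// Assumption: one domain per line; blank lines and `#` comments are skipped.
pub fn parse_blocklist(contents: &str) -> HashSet<String> {
    contents
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(|line| line.to_ascii_lowercase())
        .collect()
}

/// Returns true if the sender's domain is on the blocklist.
pub fn is_disposable(blocklist: &HashSet<String>, sender: &str) -> bool {
    match sender.trim().rsplit_once('@') {
        Some((_, domain)) => blocklist.contains(&domain.to_ascii_lowercase()),
        None => false,
    }
}
```

In a service, the contents would typically come from `std::fs::read_to_string("/etc/config/disposable_domains.txt")`, read once at startup.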

Deployment #

Kubernetes Resources #

The deployment consists of:

| Resource | File | Purpose |
|---|---|---|
| Deployment | deployment.yaml | Pod specification with containers |
| Service | service.yaml | AWS NLB LoadBalancer for SMTP traffic |
| Certificate | certificate.yaml | Let’s Encrypt TLS certificate |

TLS Certificate Management #

Certificates are managed automatically using cert-manager with Let’s Encrypt:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: tofu-postfix-tls
spec:
  secretName: tofu-postfix-tls
  duration: 2160h         # 90 days
  renewBefore: 720h       # Renew 30 days before expiry
  issuerRef:
    name: letsencrypt
    kind: ClusterIssuer
  dnsNames:
    - mail.docs.gotofu.com

Certificate Rotation #

Automatic certificate rotation is handled by:

  1. cert-manager: Renews certificates 30 days before expiry
  2. Reloader: Detects secret changes and restarts pods

The deployment annotation enables automatic restarts:

metadata:
  annotations:
    secret.reloader.stakater.com/reload: "tofu-postfix-tls"

Mounting Certificates to Pods #

The same TLS certificate secret is mounted to all tofu-postfix pods:

volumes:
  - name: tls-certs
    secret:
      secretName: tofu-postfix-tls
      defaultMode: 0400  # Read-only for owner

containers:
  - volumeMounts:
      - name: tls-certs
        mountPath: /etc/postfix/tls
        readOnly: true

Postfix is configured to use these certificates at startup via entrypoint.sh.

Network Configuration #

The service uses AWS Network Load Balancer:

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
  type: LoadBalancer
  ports:
    - port: 25
      targetPort: 25
      protocol: TCP

Configuration #

Environment Variables #

| Variable | Description | Example |
|---|---|---|
| EMAIL_DOC_INGESTION_DOMAIN | Email domain for receiving documents | docs.gotofu.com |
| EMAIL_INGESTION_S3_BUCKET_NAME | S3 bucket for raw emails | email-ingestion-prod |
| AWS_REGION | AWS region | eu-central-1 |
| RABBITMQ_HOST | RabbitMQ hostname | rabbitmq.internal |
| RABBITMQ_PORT | RabbitMQ port | 5672 |
| REDIS_HOST | Redis hostname (for deduplication) | redis.internal |
| REDIS_PORT | Redis port | 6379 |
| ENVIRONMENT_NAME | Environment name | production |
| RUST_APPS_OTEL_ENABLED | Enable OpenTelemetry tracing | true |

Postfix Configuration #

Key settings in apps/tofu-postfix/postfix/main.cf:

# Route local mail to tofu-postfix binary
local_transport = tofu-postfix:

# Enable +tag parsing
recipient_delimiter = +

# Message size limit (25 MB)
message_size_limit = 26214400

# Rate limiting (per client, per anvil_rate_time_unit; 60s by default)
smtpd_client_connection_rate_limit = 60
smtpd_client_message_rate_limit = 100

Document Metadata #

Documents created from email include metadata:

DocumentMetadata::Email(EmailDocumentMetadata {
    sender: "supplier@example.com",
    from: ["Supplier Name <supplier@example.com>"],
    to: ["acme-corp-headquarters-x7k2m9+ap-bills@docs.gotofu.com"],
    cc: [],
    subject: "Invoice #12345",
    body: "Please find attached...",
    date: "2024-01-15T10:30:00Z",
})

This metadata is preserved with the document for reference.

Deduplication #

Emails are deduplicated using Redis:

  1. Extract the Message-ID header (or generate a UUID if it is missing)
  2. Build the key: email:dedupe:{message_id}|{sender}|{recipient}
  3. Attempt to claim the key in Redis with a 24-hour TTL
  4. If the key is already claimed, skip processing

This prevents duplicate documents when:

  • Emails are forwarded multiple times
  • Mail servers retry delivery
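
The claim logic can be illustrated without a Redis connection. In the sketch below, `ClaimStore` is an in-memory stand-in for Redis (a hypothetical type, not part of the codebase) that reproduces the "claim if absent, expire after 24 hours" behavior. In Redis itself, an atomic claim of this kind is commonly written as `SET key value NX EX 86400`; whether tofu-postfix uses exactly that command is not stated here.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Documented claim lifetime: 24 hours.
pub const DEDUPE_TTL: Duration = Duration::from_secs(24 * 60 * 60);

/// Builds the deduplication key in the documented format.
pub fn dedupe_key(message_id: &str, sender: &str, recipient: &str) -> String {
    format!("email:dedupe:{}|{}|{}", message_id, sender, recipient)
}

/// In-memory stand-in for Redis, for illustration only.
#[derive(Default)]
pub struct ClaimStore {
    expires_at: HashMap<String, Instant>,
}

impl ClaimStore {
    /// Returns true if this call claimed the key, or false if an
    /// unexpired claim already exists (the email is a duplicate).
    pub fn try_claim(&mut self, key: &str, now: Instant, ttl: Duration) -> bool {
        let live = self
            .expires_at
            .get(key)
            .map_or(false, |expiry| *expiry > now);
        if live {
            return false;
        }
        self.expires_at.insert(key.to_string(), now + ttl);
        true
    }
}
```

The first delivery claims the key and is processed. A retry within 24 hours finds the claim and is skipped. After the TTL has passed, the same key can be claimed again.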

Error Handling #

Rejection Notifications #

When email processing fails, users receive notifications:

| Scenario | Notification |
|---|---|
| Sender not in whitelist | “Sender email address X is not in the whitelist.” |
| Disposable domain | “Sender domain X is a disposable email domain and is not allowed.” |
| Feature not enabled | “AP Bill feature is not enabled.” |
| Unsupported file type | “File type not supported.” |
| File too large | “File size exceeds 10 MB limit.” |
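
One way to keep these messages consistent is to model each scenario as an enum variant and render the text through `Display`. The `Rejection` enum below is a hypothetical sketch. The message strings come from the table, and interpolating the feature name into the "feature not enabled" message is an assumption generalized from the AP Bill example.

```rust
use std::fmt;

/// Rejection scenarios from the table above (hypothetical enum).
#[derive(Debug, PartialEq)]
pub enum Rejection {
    SenderNotWhitelisted { address: String },
    DisposableDomain { domain: String },
    FeatureNotEnabled { feature: String },
    UnsupportedFileType,
    FileTooLarge,
}

impl fmt::Display for Rejection {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Rejection::SenderNotWhitelisted { address } => {
                write!(f, "Sender email address {} is not in the whitelist.", address)
            }
            Rejection::DisposableDomain { domain } => write!(
                f,
                "Sender domain {} is a disposable email domain and is not allowed.",
                domain
            ),
            // Assumption: the feature name is interpolated, e.g. "AP Bill".
            Rejection::FeatureNotEnabled { feature } => {
                write!(f, "{} feature is not enabled.", feature)
            }
            Rejection::UnsupportedFileType => write!(f, "File type not supported."),
            Rejection::FileTooLarge => write!(f, "File size exceeds 10 MB limit."),
        }
    }
}
```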

Non-Blocking Design #

  • Email reception succeeds even if downstream processing fails
  • Failed attachments don’t prevent other attachments from processing
  • Notifications are enqueued asynchronously
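
The second point, that one failed attachment does not block the others, amounts to collecting per-item results instead of returning on the first error. The following is a generic sketch of that pattern; `process_all` is a hypothetical helper, not a function from bonsai-trigger-sync.

```rust
/// Runs `process` on every item and keeps going after failures.
/// Returns the number of successes and the (index, reason) of each failure.
pub fn process_all<T, F>(items: &[T], mut process: F) -> (usize, Vec<(usize, String)>)
where
    F: FnMut(&T) -> Result<(), String>,
{
    let mut succeeded = 0;
    let mut failures = Vec::new();
    for (index, item) in items.iter().enumerate() {
        match process(item) {
            Ok(()) => succeeded += 1,
            // Record the failure and continue with the next attachment.
            Err(reason) => failures.push((index, reason)),
        }
    }
    (succeeded, failures)
}
```

Each recorded failure can then be turned into an asynchronous notification, while the successful attachments continue through the pipeline.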

Security #

  • TLS Encryption: All SMTP connections support STARTTLS
  • Sender Validation: Whitelist and disposable domain blocking
  • Rate Limiting: Postfix enforces per-client connection and message rate limits
  • Non-Root Processing: Mail handling runs as unprivileged user
  • No Secrets in Logs: Sensitive data is not logged