Email Integration #
BonsAI allows users to submit documents via email. The system uses Postfix as the SMTP server to receive emails, extracts attachments, and processes them through the document pipeline.
Overview #
The email integration provides:
- Document Ingestion via Email: Users can email documents directly to BonsAI
- Extraction Type Routing: Email address tags determine document type (AP bills, AR invoices, etc.)
- Attachment Processing: Only attachments are extracted and imported as documents
- Sender Whitelisting: Entity-level control over allowed senders
- Deduplication: Prevents reprocessing of duplicate emails
- TLS Encryption: Let’s Encrypt certificates for secure email transport
Architecture #
     Email arrives on SMTP (port 25)
                    ↓
               Postfix MTA
                    ↓
           tofu-postfix (Rust)
                    ↓
            ┌───────┴───────┐
            │               │
          Redis      S3 (raw email)
         (dedup)            │
                            ↓
                        RabbitMQ
                            ↓
                   bonsai-trigger-sync
                            ↓
               ┌────────────┴────────────┐
               │                         │
          PostgreSQL              S3 (documents)
      (Document records)        (Attachment files)
Components #
| Component | Location | Purpose |
|---|---|---|
| tofu-postfix | apps/tofu-postfix/ | SMTP server that receives and parses emails |
| bonsai-trigger-sync | apps/bonsai-trigger-sync/ | Worker that processes emails and creates documents |
| Deployment | deployment/resources/tofu-postfix/ | Kubernetes manifests |
Email Address Format #
Users must send emails to a specially formatted address:
{username}-{random}+{extraction-type}@{domain}
Default Format #
When an email integration is created, the system generates a default email address:
{organization-slug}-{entity-slug}-{random}+{extraction-type}@{domain}
- organization-slug: The organization’s URL slug (lowercase)
- entity-slug: The entity’s URL slug (lowercase)
- random: A 6-character alphanumeric string for uniqueness
- extraction-type: The document type tag (e.g., ap-bills)
- domain: The email ingestion domain (environment-specific)
Example #
acme-corp-headquarters-x7k2m9+ap-bills@docs.gotofu.com
| Part | Description |
|---|---|
| acme-corp-headquarters | Default username (org-slug + entity-slug) |
| x7k2m9 | Random suffix for uniqueness |
| + | Delimiter (SMTP recipient delimiter) |
| ap-bills | Extraction type tag |
| @docs.gotofu.com | Email ingestion domain (environment-specific) |
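The breakdown above can be sketched as a small parser. This is a minimal illustration using only the Rust standard library. The `IngestionAddress` struct and `parse_ingestion_address` function are hypothetical names, not the actual tofu-postfix code. It assumes the random suffix is everything after the last `-` before the `+` delimiter.

```rust
/// Parts of an ingestion address such as
/// `acme-corp-headquarters-x7k2m9+ap-bills@docs.gotofu.com`.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
pub struct IngestionAddress {
    pub username: String,
    pub random: String,
    pub extraction_tag: String,
    pub domain: String,
}

/// Splits an address into its parts; returns `None` if any part is missing.
pub fn parse_ingestion_address(address: &str) -> Option<IngestionAddress> {
    let address = address.trim().to_ascii_lowercase();
    // Everything after the last '@' is the domain.
    let (local, domain) = address.rsplit_once('@')?;
    // The first '+' is the recipient delimiter that introduces the tag.
    let (mailbox, tag) = local.split_once('+')?;
    // Assumption: the random suffix follows the last '-' in the mailbox.
    let (username, random) = mailbox.rsplit_once('-')?;
    if username.is_empty() || random.is_empty() || tag.is_empty() || domain.is_empty() {
        return None;
    }
    Some(IngestionAddress {
        username: username.to_string(),
        random: random.to_string(),
        extraction_tag: tag.to_string(),
        domain: domain.to_string(),
    })
}
```

Applied to the example address, this would yield the username acme-corp-headquarters, the suffix x7k2m9, the tag ap-bills, and the domain docs.gotofu.com. An address without a `+` tag returns `None`.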
Custom Username #
Users can customize the username portion via the UI. When the username is changed:
- A new random suffix is generated
- The username must be 40 characters or fewer
- The username must form a valid email address
This allows organizations to create more memorable email addresses like:
invoices-x7k2m9+ap-bills@docs.gotofu.com
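A rough sketch of the username rules follows. Only the 40-character limit comes from the text above. The allowed character set and the dot rules are assumptions chosen to keep the result a plausible email local part, and `is_valid_username` is a hypothetical helper name.

```rust
/// Maximum username length stated in the documentation.
const MAX_USERNAME_LEN: usize = 40;

/// Checks a custom username before a random suffix is appended.
/// The character rules below are an assumption for illustration only.
pub fn is_valid_username(username: &str) -> bool {
    if username.is_empty() || username.len() > MAX_USERNAME_LEN {
        return false;
    }
    // '+' is reserved as the recipient delimiter, so it is not in the allowed set.
    let chars_ok = username
        .chars()
        .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || matches!(c, '-' | '_' | '.'));
    // An email local part may not start or end with a dot or contain "..".
    let dots_ok =
        !username.starts_with('.') && !username.ends_with('.') && !username.contains("..");
    chars_ok && dots_ok
}
```

Under these assumptions, `invoices` passes, while a 41-character name or a name containing `+` is rejected.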
Supported Extraction Types #
| Tag | Extraction Type | Description |
|---|---|---|
| ap-bills | AP Bill | Accounts Payable Bill |
| ar-invoices | AR Invoice | Accounts Receivable Invoice |
| bank-statements | Bank Statement | Bank Statement |
| direct-expenses | Direct Expense | Direct Expense |
| expense-claims | Expense Claim | Expense Claim |
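The tag table maps naturally onto an enum. The sketch below is illustrative. The `ExtractionType` enum and its method names are assumptions, and only the tag strings and display names are taken from the table.

```rust
/// Extraction types from the table above (hypothetical enum for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExtractionType {
    ApBill,
    ArInvoice,
    BankStatement,
    DirectExpense,
    ExpenseClaim,
}

impl ExtractionType {
    /// Maps the `+tag` part of the recipient address to an extraction type.
    pub fn from_tag(tag: &str) -> Option<Self> {
        match tag.trim().to_ascii_lowercase().as_str() {
            "ap-bills" => Some(Self::ApBill),
            "ar-invoices" => Some(Self::ArInvoice),
            "bank-statements" => Some(Self::BankStatement),
            "direct-expenses" => Some(Self::DirectExpense),
            "expense-claims" => Some(Self::ExpenseClaim),
            _ => None, // unknown tags are not routed
        }
    }

    /// Human-readable name as shown in the table.
    pub fn display_name(self) -> &'static str {
        match self {
            Self::ApBill => "AP Bill",
            Self::ArInvoice => "AR Invoice",
            Self::BankStatement => "Bank Statement",
            Self::DirectExpense => "Direct Expense",
            Self::ExpenseClaim => "Expense Claim",
        }
    }
}
```

Returning `None` for unrecognised tags leaves it to the caller to decide whether to reject or ignore the email.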
Domain Configuration #
| Environment | Domain |
|---|---|
| Development | docs-dev.gotofu.com |
| Production | docs.gotofu.com |
The domain is configured via the EMAIL_DOC_INGESTION_DOMAIN environment variable.
Email Processing Flow #
1. Email Reception (tofu-postfix) #
- Email arrives on SMTP port 25
- Postfix receives and validates the email
- Postfix invokes the tofu-postfix-wrapper script with the sender and recipient
- Rust binary reads the email from stdin and parses the RFC822 format
- Validates email has attachments (exits if none)
- Extracts or generates Message-ID for deduplication
- Checks Redis for duplicate (24-hour TTL)
- Uploads raw email bytes to S3
- Enqueues EmailTriggerSyncJob to RabbitMQ
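As an illustration of the Message-ID step in the list above, the sketch below scans the header section of a raw message with the standard library only. `extract_message_id` is a hypothetical helper. It ignores folded (multi-line) headers, and it leaves the fallback of generating an ID to the caller.

```rust
/// Returns the value of the `Message-ID` header, if present.
/// Only the header section (up to the first blank line) is scanned.
/// Folded headers are not handled in this sketch.
pub fn extract_message_id(raw_email: &str) -> Option<String> {
    for line in raw_email.lines() {
        if line.is_empty() {
            break; // a blank line separates headers from the body
        }
        if let Some((name, value)) = line.split_once(':') {
            // Header names are case-insensitive.
            if name.trim().eq_ignore_ascii_case("message-id") {
                let id = value.trim();
                if !id.is_empty() {
                    return Some(id.to_string());
                }
            }
        }
    }
    None
}
```

A caller would typically fall back to a generated identifier when this returns `None`, as the list above describes.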
2. Attachment Processing (bonsai-trigger-sync) #
- Reads raw email from S3
- Looks up entity by username via Email integration
- Validates sender is in whitelist
- Checks extraction type is enabled for entity
- For each attachment:
- Validates MIME type (PDF, JPEG, PNG only)
- Checks file size (max 10 MB)
- Uploads to S3 document bucket
- Creates Document record with email metadata
- Enqueues document conversion job
Attachment Handling #
Supported File Types #
| MIME Type | Extension |
|---|---|
| application/pdf | .pdf |
| image/jpeg | .jpg, .jpeg |
| image/png | .png |
Size Limits #
- Maximum attachment size: 10 MB per file
- Maximum email size: 25 MB (including all attachments and body)
Inline Images #
- Inline images with Content-ID headers (typically email signatures) are skipped
- Only actual file attachments are processed
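The attachment rules in this section can be summarised in one check. This is a sketch under stated assumptions. `check_attachment` and `AttachmentDecision` are hypothetical names, 10 MB is interpreted as 10 × 1024 × 1024 bytes, and the order of the checks is a guess.

```rust
/// Assumption: "10 MB" means 10 * 1024 * 1024 bytes.
const MAX_ATTACHMENT_BYTES: u64 = 10 * 1024 * 1024;

/// Outcome of checking a single MIME part (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum AttachmentDecision {
    Accept,
    SkipInline,
    RejectUnsupportedType,
    RejectTooLarge,
}

/// Applies the inline, MIME type, and size rules to one MIME part.
pub fn check_attachment(mime_type: &str, size_bytes: u64, has_content_id: bool) -> AttachmentDecision {
    // Inline images (e.g. signature logos) carry a Content-ID and are skipped.
    if has_content_id {
        return AttachmentDecision::SkipInline;
    }
    // Drop any parameters such as `; name="invoice.pdf"` before comparing.
    let mime = mime_type
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();
    if !matches!(mime.as_str(), "application/pdf" | "image/jpeg" | "image/png") {
        return AttachmentDecision::RejectUnsupportedType;
    }
    if size_bytes > MAX_ATTACHMENT_BYTES {
        return AttachmentDecision::RejectTooLarge;
    }
    AttachmentDecision::Accept
}
```

Because each part is checked on its own, one rejected attachment does not affect the others in the same email.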
Entity Integration Setup #
For an entity to receive documents via email:
- Create an Entity Integration with provider Email
- Set unique_internal_query_key to the entity username (lowercase)
- Configure optional sender whitelist in metadata
Whitelist Configuration #
The integration metadata supports:
{
  "address_whitelist": ["supplier@example.com", "finance@vendor.com"],
  "domain_whitelist": ["trusted-vendor.com", "partner.org"]
}
- If both lists are empty or null: all senders are allowed
- If either list is set: the sender must match an entry in address_whitelist, or the sender's domain must match an entry in domain_whitelist
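The two whitelist rules can be sketched as a single predicate. `is_sender_allowed` is a hypothetical function. Case-insensitive comparison and exact domain matching (no subdomain matching) are assumptions, not confirmed behaviour.

```rust
/// Returns true if the sender passes the whitelist rules.
/// Empty slices stand in for empty/null metadata fields.
pub fn is_sender_allowed(
    sender: &str,
    address_whitelist: &[&str],
    domain_whitelist: &[&str],
) -> bool {
    // Both lists empty: every sender is allowed.
    if address_whitelist.is_empty() && domain_whitelist.is_empty() {
        return true;
    }
    let sender = sender.trim().to_ascii_lowercase();
    // Full-address match.
    if address_whitelist.iter().any(|a| a.eq_ignore_ascii_case(&sender)) {
        return true;
    }
    // Domain match (assumption: exact match, subdomains are not included).
    match sender.rsplit_once('@') {
        Some((_, domain)) => domain_whitelist.iter().any(|d| d.eq_ignore_ascii_case(domain)),
        None => false,
    }
}
```

With the example metadata, a message from anyone at partner.org would be accepted, while a sender on an unlisted domain would be rejected.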
Disposable Domain Blocking #
Emails from disposable/temporary email domains are automatically blocked. The blocklist is loaded from /etc/config/disposable_domains.txt.
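A minimal sketch of a blocklist check is shown below. The file format is an assumption (one domain per line, with blank lines and `#` comments ignored), and `parse_blocklist` and `is_disposable` are hypothetical names.

```rust
use std::collections::HashSet;

/// Parses blocklist text into a set of lowercase domains.
/// Assumed format: one domain per line; blank lines and '#' comments are ignored.
pub fn parse_blocklist(contents: &str) -> HashSet<String> {
    contents
        .lines()
        .map(|line| line.trim().to_ascii_lowercase())
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .collect()
}

/// Returns true if the sender's domain appears in the blocklist.
pub fn is_disposable(sender: &str, blocklist: &HashSet<String>) -> bool {
    match sender.trim().rsplit_once('@') {
        Some((_, domain)) => blocklist.contains(&domain.to_ascii_lowercase()),
        None => false,
    }
}
```

In a real process the text would come from something like `std::fs::read_to_string("/etc/config/disposable_domains.txt")`, loaded once at startup.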
Deployment #
Kubernetes Resources #
The deployment consists of:
| Resource | File | Purpose |
|---|---|---|
| Deployment | deployment.yaml | Pod specification with containers |
| Service | service.yaml | AWS NLB LoadBalancer for SMTP traffic |
| Certificate | certificate.yaml | Let’s Encrypt TLS certificate |
TLS Certificate Management #
Certificates are managed automatically using cert-manager with Let’s Encrypt:
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: tofu-postfix-tls
spec:
  secretName: tofu-postfix-tls
  duration: 2160h # 90 days
  renewBefore: 720h # Renew 30 days before expiry
  issuerRef:
    name: letsencrypt
    kind: ClusterIssuer
  dnsNames:
    - mail.docs.gotofu.com
Certificate Rotation #
Automatic certificate rotation is handled by:
- cert-manager: Renews certificates 30 days before expiry
- Reloader: Detects secret changes and restarts pods
The deployment annotation enables automatic restarts:
metadata:
  annotations:
    secret.reloader.stakater.com/reload: "tofu-postfix-tls"
Mounting Certificates to Pods #
The same TLS certificate secret is mounted to all tofu-postfix pods:
volumes:
  - name: tls-certs
    secret:
      secretName: tofu-postfix-tls
      defaultMode: 0400 # Read-only for owner
containers:
  - volumeMounts:
      - name: tls-certs
        mountPath: /etc/postfix/tls
        readOnly: true
Postfix is configured to use these certificates at startup via entrypoint.sh.
Network Configuration #
The service uses AWS Network Load Balancer:
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
  type: LoadBalancer
  ports:
    - port: 25
      targetPort: 25
      protocol: TCP
Configuration #
Environment Variables #
| Variable | Description | Example |
|---|---|---|
| EMAIL_DOC_INGESTION_DOMAIN | Email domain for receiving documents | docs.gotofu.com |
| EMAIL_INGESTION_S3_BUCKET_NAME | S3 bucket for raw emails | email-ingestion-prod |
| AWS_REGION | AWS region | eu-central-1 |
| RABBITMQ_HOST | RabbitMQ hostname | rabbitmq.internal |
| RABBITMQ_PORT | RabbitMQ port | 5672 |
| REDIS_HOST | Redis hostname (for deduplication) | redis.internal |
| REDIS_PORT | Redis port | 6379 |
| ENVIRONMENT_NAME | Environment name | production |
| RUST_APPS_OTEL_ENABLED | Enable OpenTelemetry tracing | true |
Postfix Configuration #
Key settings in apps/tofu-postfix/postfix/main.cf:
# Route local mail to tofu-postfix binary
local_transport = tofu-postfix:
# Enable +tag parsing
recipient_delimiter = +
# Message size limit (25 MB)
message_size_limit = 26214400
# Rate limiting
smtpd_client_connection_rate_limit = 60
smtpd_client_message_rate_limit = 100
Document Metadata #
Documents created from email include metadata:
DocumentMetadata::Email(EmailDocumentMetadata {
    sender: "supplier@example.com",
    from: ["Supplier Name <supplier@example.com>"],
    to: ["acme-corp+ap-bills@docs.gotofu.com"],
    cc: [],
    subject: "Invoice #12345",
    body: "Please find attached...",
    date: "2024-01-15T10:30:00Z",
})
This metadata is preserved with the document for reference.
Deduplication #
Emails are deduplicated using Redis:
- Extract the Message-ID header (or generate a UUID if missing)
- Create key: email:dedupe:{message_id}|{sender}|{recipient}
- Attempt to claim the key with a 24-hour TTL
- If already claimed, skip processing
This prevents duplicate documents when:
- Emails are forwarded multiple times
- Mail servers retry delivery
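The claim logic described above can be sketched without a Redis dependency. In the code below, an in-memory `HashMap` stands in for Redis, and `DedupeStore`, `dedupe_key`, and `try_claim` are hypothetical names. With Redis, a claim of this kind is commonly expressed as `SET key value NX EX 86400`, though whether tofu-postfix uses that exact command is an assumption.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// 24-hour TTL from the documentation.
const DEDUPE_TTL: Duration = Duration::from_secs(24 * 60 * 60);

/// Builds the deduplication key in the documented format.
pub fn dedupe_key(message_id: &str, sender: &str, recipient: &str) -> String {
    format!("email:dedupe:{}|{}|{}", message_id, sender, recipient)
}

/// In-memory stand-in for Redis, for illustration only.
pub struct DedupeStore {
    claims: HashMap<String, Instant>, // key -> expiry time
}

impl DedupeStore {
    pub fn new() -> Self {
        Self { claims: HashMap::new() }
    }

    /// Returns true if the key was claimed, false if an unexpired claim exists.
    pub fn try_claim(&mut self, key: &str, now: Instant) -> bool {
        if let Some(expires_at) = self.claims.get(key) {
            if *expires_at > now {
                return false; // duplicate within the TTL window: skip processing
            }
        }
        self.claims.insert(key.to_string(), now + DEDUPE_TTL);
        true
    }
}
```

Passing `now` explicitly keeps the sketch testable. A second claim within 24 hours returns false, and a claim after expiry succeeds again.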
Error Handling #
Rejection Notifications #
When email processing fails, users receive notifications:
| Scenario | Notification |
|---|---|
| Sender not in whitelist | “Sender email address X is not in the whitelist.” |
| Disposable domain | “Sender domain X is a disposable email domain and is not allowed.” |
| Feature not enabled | “AP Bill feature is not enabled.” |
| Unsupported file type | “File type not supported.” |
| File too large | “File size exceeds 10 MB limit.” |
Non-Blocking Design #
- Email reception succeeds even if downstream processing fails
- Failed attachments don’t prevent other attachments from processing
- Notifications are enqueued asynchronously
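One way to keep the notification wording in a single place is an enum with a `Display` implementation. The sketch below reproduces the messages from the table above. The `Rejection` type and its variant names are hypothetical.

```rust
use std::fmt;

/// Reasons an email or attachment may be rejected (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Rejection {
    SenderNotWhitelisted { sender: String },
    DisposableDomain { domain: String },
    FeatureNotEnabled { feature: String },
    UnsupportedFileType,
    FileTooLarge,
}

impl fmt::Display for Rejection {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Rejection::SenderNotWhitelisted { sender } => {
                write!(f, "Sender email address {} is not in the whitelist.", sender)
            }
            Rejection::DisposableDomain { domain } => write!(
                f,
                "Sender domain {} is a disposable email domain and is not allowed.",
                domain
            ),
            Rejection::FeatureNotEnabled { feature } => {
                write!(f, "{} feature is not enabled.", feature)
            }
            Rejection::UnsupportedFileType => write!(f, "File type not supported."),
            Rejection::FileTooLarge => write!(f, "File size exceeds 10 MB limit."),
        }
    }
}
```

A worker could collect one `Rejection` per failed attachment and enqueue the formatted text as a notification, leaving the remaining attachments to proceed.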
Security #
- TLS Encryption: All SMTP connections support STARTTLS
- Sender Validation: Whitelist and disposable domain blocking
- Rate Limiting: Postfix enforces per-IP connection limits
- Non-Root Processing: Mail handling runs as unprivileged user
- No Secrets in Logs: Sensitive data is not logged