# Tofie Architecture
This document provides a deep dive into the Tofie system architecture, explaining how all components work together to automate the development workflow.
## System Overview
Tofie is a distributed system that orchestrates multiple services to automate software development tasks. The architecture follows an event-driven pattern with n8n as the central orchestrator.
```mermaid
sequenceDiagram
    participant User
    participant Linear
    participant n8n
    participant Coder
    participant Claude
    participant GitHub
    User->>Linear: Comment "@Tofie plan"
    Linear->>n8n: Webhook: issue.comment_created
    n8n->>n8n: Parse command & extract data
    n8n->>Linear: Update status to "Planning"
    n8n->>Coder: SSH: Execute planning.sh
    Coder->>Coder: Create git worktree
    Coder->>Claude: Call Claude CLI
    Claude->>Claude: Generate PLAN.md
    Claude->>Coder: Save to .plans/PLAN.md
    Coder->>n8n: Return JSON result
    n8n->>Linear: Comment with plan
    n8n->>Linear: Update status to "Planned"
```
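To reproduce the n8n → Coder hop by hand (for example while debugging a workflow), the call reduces to piping JSON into a script over SSH and reading JSON back from stdout. A minimal sketch, assuming an SSH host alias `coder` and the script location from the directory layout in this document; `run_tofie_script` and `TOFIE_SSH_HOST` are hypothetical names, and inside the workflow itself n8n's SSH step plays this role.

```bash
#!/bin/bash
# Hypothetical helper that reproduces n8n's call into the Coder instance:
# input JSON goes to the script's stdin, result JSON comes back on stdout,
# and the script's stderr logs pass straight through.
# Assumption: "coder" is an SSH host alias for the Coder instance.
TOFIE_SSH_HOST="${TOFIE_SSH_HOST:-coder}"
TOFIE_SCRIPT_DIR="/home/coder/bonsai/tools/local/scripts/tofie"

run_tofie_script() {
  local script="$1" input_json="$2"
  # The pipeline's exit status is ssh's, i.e. the remote script's exit code
  # (or 255 if the SSH connection itself fails).
  printf '%s\n' "$input_json" | ssh "$TOFIE_SSH_HOST" "bash $TOFIE_SCRIPT_DIR/$script"
}
```

For example, `run_tofie_script planning.sh "$(cat input.json)" > result.json` leaves the parsed-by-n8n output in `result.json` while logs still appear on the terminal.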
## Component Architecture
### 1. Linear (Project Management Layer)
Role: Source of truth for requirements and project tracking
Key Features:
- Issue management with descriptions and comments
- Status workflow (Todo → Planning → Planned → In Progress → In Review → Done)
- Webhooks for real-time event notifications
- API for programmatic status updates
Integration Points:
- Outbound: Webhooks to n8n on issue comments
- Inbound: API calls from n8n for status updates and comments
Data Flow:
```text
User creates/updates issue
↓
Issue contains:
- Title and description
- Acceptance criteria
- Comments with clarifications
- Labels and priority
↓
Webhook fires on @Tofie mention
↓
n8n receives full issue context
```
### 2. n8n (Orchestration Layer)
Role: Central automation hub that coordinates all services
Key Features:
- Webhook receivers for Linear events
- Workflow automation with conditional logic
- SSH execution for remote scripts
- Error handling and retry logic
- Credential management
Workflow Structure:
```text
Webhook Trigger
↓
Parse Command (plan/implement/review/pr)
↓
Extract Linear Context
↓
Build Script Input JSON
↓
SSH to Coder Instance
↓
Execute Tofie Script
↓
Parse Script Output
↓
Update Linear (status + comment)
```
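The "Parse Command" step can be pictured as extracting the first word after an `@Tofie` mention and rejecting anything outside the four commands listed. A hypothetical shell version follows; n8n would normally do this inside a workflow node, and the regex assumes the mention appears as plain `@Tofie` text in the comment body, which may not match how Linear serializes mentions.

```bash
#!/bin/bash
# Hypothetical "Parse Command" step: find an @Tofie mention in a Linear comment
# body and print the command word after it, if it is one of the known commands.
parse_tofie_command() {
  local body="$1" cmd
  local re='@[Tt]ofie[[:space:]]+([A-Za-z]+)'
  [[ $body =~ $re ]] || return 1          # no @Tofie mention
  cmd="${BASH_REMATCH[1],,}"              # normalise to lower case
  case "$cmd" in
    plan|implement|review|pr) printf '%s\n' "$cmd" ;;
    *) return 1 ;;                        # unknown command
  esac
}
```

`parse_tofie_command '@Tofie plan'` prints `plan`; a comment such as `@Tofie deploy` returns a non-zero status, which maps onto the "Unknown commands" input-validation error.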
Configuration:
- Environment: `bonsai.app.n8n.cloud`
- Webhook URL: `https://bonsai.app.n8n.cloud/webhook/tofie-event`
- Signing key: Validates webhook authenticity
- SSH credentials: Stored in n8n credential vault
### 3. Coder Instance (Execution Layer)
Role: Development environment where all work happens
Infrastructure:
- EC2 instance (Ubuntu Linux)
- Persistent storage for repositories
- Claude Code CLI installed
- Git configured with SSH keys
- Access to Doppler for secrets
Directory Structure:
```text
/home/coder/
├── bonsai/                          # Main repository (on main branch)
│   ├── .claude/                     # Claude Code configuration
│   │   ├── commands/tofie/          # Slash commands
│   │   └── skills/                  # Claude skills
│   ├── tools/local/scripts/tofie/   # Tofie automation scripts
│   │   ├── planning.sh
│   │   ├── planning-subagent.sh
│   │   ├── implement.sh
│   │   ├── implement-subagent.sh
│   │   ├── adjust.sh
│   │   ├── adjust-subagent.sh
│   │   └── submit-pr.sh
│   └── [rest of codebase]
│
└── trees/                           # Git worktrees (isolated branches)
    ├── john-eng-1144/               # Worktree for issue ENG-1144
    │   ├── .plans/                  # Gitignored artifacts
    │   │   ├── PLAN.md
    │   │   ├── linear-metadata.json
    │   │   ├── n8n-metadata.json
    │   │   └── pr-info.json
    │   └── [full codebase on feature branch]
    │
    └── jane-eng-2205/               # Another parallel worktree
        └── ...
```
Git Worktree Benefits:
- Isolation: Each issue works in separate directory
- Parallelization: Multiple issues can be worked on simultaneously
- No switching: No need to checkout/switch branches
- Clean state: Each worktree starts from clean main branch
- Safety: Changes don’t affect main repository
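The "reuse or create" decision behind these benefits might look like the following sketch. It assumes worktrees live under `/home/coder/trees`, are named `<user>-<issue id in lower case>` as in the layout above, and branch from the local `main`; `worktree_path` and `ensure_worktree` are hypothetical helpers.

```bash
#!/bin/bash
# Sketch of the "Worktree exists?" decision from the planning flow.
# Assumed locations; override via environment if they differ.
REPO_ROOT="${REPO_ROOT:-/home/coder/bonsai}"
TREES_DIR="${TREES_DIR:-/home/coder/trees}"

# e.g. john + ENG-1144 -> $TREES_DIR/john-eng-1144
worktree_path() {
  local user="$1" issue_id="$2"
  printf '%s/%s-%s\n' "$TREES_DIR" "${user,,}" "${issue_id,,}"
}

ensure_worktree() {
  local path="$1" branch="$2"
  if [[ -d "$path" ]]; then
    echo "[INFO] Reusing worktree $path" >&2
    return 0
  fi
  # Create a new branch off main in its own directory; the main checkout
  # under $REPO_ROOT is never switched.
  git -C "$REPO_ROOT" worktree add -b "$branch" "$path" main
}
```

If the directory was removed but the branch still exists, `-b` fails because the branch name is taken; `git worktree add "$path" "$branch"` (without `-b`) checks out the existing branch instead.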
### 4. Claude Code (AI Layer)
Role: AI-powered development assistant
Capabilities:
- Code generation and modification
- Implementation planning
- Code review and quality checks
- PR description generation
- Following project conventions
Invocation Methods:
Direct Slash Commands (Fast, for single operations):
```bash
claude "/tofie-plan --branch feature/my-feature"
```
Via Subagents (For complex, isolated operations):
```bash
claude "Use tofie-planner SUBAGENT for comprehensive research"
```
Permission Modes:
- Standard scripts: `--permission-mode acceptEdits`
- Subagent scripts: `--dangerously-skip-permissions`
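A sketch of how a script might map its variant to these flags. It assumes the scripts run Claude non-interactively via `-p` (print mode), which the bare `claude "..."` examples above do not show; `run_claude` is a hypothetical helper, and `CLAUDE_CMD` mirrors the variable in the script skeleton.

```bash
#!/bin/bash
# Sketch: pick Claude CLI permission flags by script variant.
# -p (print mode) runs one prompt non-interactively and exits.
CLAUDE_CMD="${CLAUDE_CMD:-/home/$USER/.local/bin/claude}"

run_claude() {
  local variant="$1" prompt="$2"
  local -a args=(-p "$prompt")
  case "$variant" in
    standard) args+=(--permission-mode acceptEdits) ;;
    subagent) args+=(--dangerously-skip-permissions) ;;
    *) echo "[ERROR] Unknown variant: $variant" >&2; return 1 ;;
  esac
  "$CLAUDE_CMD" "${args[@]}"
}
```

For example, `run_claude standard "/tofie-plan --branch feature/my-feature"`. Building the arguments as an array keeps the prompt a single argument even when it contains spaces.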
Context Engineering: Claude uses a structured planning framework with:
- Background research phase
- Requirements analysis
- Phase breakdown
- Risk assessment
- Success criteria
### 5. GitHub (Code Hosting Layer)
Role: Version control and collaboration platform
Integration:
- PRs created via the `gh` CLI
- All Tofie PRs are drafts
- PR descriptions reference Linear issues
- Conventional commit format for titles
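For reference, a Conventional Commits title has the shape `type(scope)!: description`, with the scope and `!` optional. A small shell check, assuming the common Angular-style type list and a lower-case scope (the specification itself only requires `feat` and `fix`; the rest is convention); `is_conventional_title` is a hypothetical name.

```bash
#!/bin/bash
# Sketch: succeed if a PR title looks like "type(scope)!: description".
# The type list and scope character set are assumptions, not Tofie rules.
is_conventional_title() {
  local re='^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([a-z0-9._-]+\))?!?: .+'
  [[ $1 =~ $re ]]
}
```

Titles like `fix: handle empty plan` pass; `Add retry logic` (no type) and `feat:missing space` (no space after the colon) fail.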
PR Creation Flow:
```text
submit-pr.sh script
↓
Check if PR already exists (gh pr list)
↓
If exists: Return existing PR info
↓
If not exists:
  ├─ Review branch commits
  ├─ Read PLAN.md for context
  ├─ Generate PR title (conventional format)
  ├─ Build PR description
  ├─ Create draft PR (gh pr create --draft)
  └─ Return PR URL and number
```
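The "check, then create" steps could be sketched with `gh` as below. `submit_draft_pr` is a hypothetical name, title and body generation are left to the caller, the base branch `main` is an assumption, and only open PRs count as existing here.

```bash
#!/bin/bash
# Sketch of submit-pr.sh's idempotency check: reuse an open PR for the
# branch if one exists, otherwise open a draft. Prints the PR URL.
submit_draft_pr() {
  local branch="$1" title="$2" body="$3" existing
  existing=$(gh pr list --head "$branch" --state open --json url \
               --jq '.[0].url // empty') || return 1
  if [[ -n "$existing" ]]; then
    echo "[INFO] PR already exists for $branch" >&2
    printf '%s\n' "$existing"
    return 0
  fi
  # gh pr create prints the new PR's URL on stdout.
  gh pr create --draft --base main --head "$branch" --title "$title" --body "$body"
}
```

Both paths print a URL, so the caller can fetch the number afterwards with `gh pr view <url> --json number`.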
## Data Flow
### Planning Workflow
```mermaid
flowchart TD
    A[User: @Tofie plan] --> B[Linear Webhook]
    B --> C[n8n: Parse command]
    C --> D[n8n: Build JSON input]
    D --> E[n8n: SSH to Coder]
    E --> F[Coder: planning.sh]
    F --> G{Worktree exists?}
    G -->|No| H[Create worktree from main]
    G -->|Yes| I[Reuse existing]
    H --> J[Create .plans/ directory]
    I --> J
    J --> K[Write linear-metadata.json]
    J --> L[Write n8n-metadata.json]
    K --> M[Execute Claude CLI]
    L --> M
    M --> N[Claude: Research codebase]
    N --> O[Claude: Analyze requirements]
    O --> P[Claude: Generate PLAN.md]
    P --> Q[Write .plans/PLAN.md]
    Q --> R[Return JSON success]
    R --> S[n8n: Parse result]
    S --> T[n8n: Post plan to Linear]
    T --> U[n8n: Update status to Planned]
```
### Implementation Workflow
```mermaid
flowchart TD
    A[User: @Tofie implement] --> B[Linear Webhook]
    B --> C[n8n: Parse command]
    C --> D[n8n: Find worktree]
    D --> E[n8n: SSH to Coder]
    E --> F[Coder: implement.sh]
    F --> G[Read .plans/PLAN.md]
    G --> H[Read linear-metadata.json]
    H --> I[Execute Claude CLI]
    I --> J[Claude: Review plan]
    J --> K[Claude: Implement changes]
    K --> L{Run quality checks}
    L -->|Fail| M[Claude: Fix issues]
    M --> L
    L -->|Pass| N[Claude: Create commits]
    N --> O[Push to remote branch]
    O --> P{Create PR?}
    P -->|Yes| Q[Run submit-pr.sh]
    P -->|No| R[Return JSON success]
    Q --> S[Create draft PR]
    S --> R
    R --> T[n8n: Parse result]
    T --> U[n8n: Post summary to Linear]
    U --> V[n8n: Update status]
```
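As drawn, the fix loop only exits when checks pass. A sketch that caps the number of fix attempts, assuming hypothetical `run_quality_checks` and `request_fix` hooks (for example the repository's lint/test commands and a follow-up Claude prompt) and an assumed `MAX_FIX_ATTEMPTS` of 3:

```bash
#!/bin/bash
# Sketch: run checks, ask for a fix on failure, and give up after
# MAX_FIX_ATTEMPTS fixes. Checks run at most MAX_FIX_ATTEMPTS + 1 times.
MAX_FIX_ATTEMPTS="${MAX_FIX_ATTEMPTS:-3}"

checks_with_fixes() {
  local attempt
  for (( attempt = 0; attempt <= MAX_FIX_ATTEMPTS; attempt++ )); do
    if run_quality_checks; then
      return 0
    fi
    if (( attempt < MAX_FIX_ATTEMPTS )); then
      echo "[INFO] Checks failed; fix attempt $((attempt + 1))/$MAX_FIX_ATTEMPTS" >&2
      request_fix
    fi
  done
  echo "[ERROR] Checks still failing after $MAX_FIX_ATTEMPTS fix attempts" >&2
  return 1
}
```

A non-zero return is the point where the "Execution Errors" recovery (update Linear, allow manual intervention) would take over instead of committing.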
## Script Architecture
All Tofie scripts follow a consistent pattern.
### Script Structure
```bash
#!/bin/bash
set -euo pipefail

# 1. Configuration
CLAUDE_CMD="/home/$USER/.local/bin/claude"
REPO_ROOT="$(cd ... && pwd)"

# 2. Logging functions (stderr, so stdout carries only the JSON result)
log_info()  { echo "[INFO] $*" >&2; }
log_error() { echo "[ERROR] $*" >&2; }

# 3. JSON output function (stdout)
output_json() {
  local success="$1"   # bare JSON boolean: true or false
  local message="$2"
  # Structured JSON output for n8n
  cat <<EOF
{
  "success": $success,
  "message": "$message",
  ...
}
EOF
}

# 4. Input validation
INPUT_JSON=$(cat)  # Read from stdin
BRANCH_NAME=$(echo "$INPUT_JSON" | jq -r '.branchName // empty')
if [[ -z "$BRANCH_NAME" ]]; then
  log_error "branchName is required"
  output_json false "Missing branchName"
  exit 1
fi

# 5. Main logic
# - Create/find worktree
# - Prepare metadata
# - Execute Claude CLI
# - Parse results

# 6. Output results
output_json true "Success message" ...
```
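One caveat with the heredoc in `output_json`: a message containing a double quote, backslash, or newline produces invalid JSON. A minimal pure-Bash escaping helper could be applied to string fields before they are interpolated; `json_escape` is hypothetical and only handles quotes, backslashes, newlines, carriage returns, and tabs.

```bash
#!/bin/bash
# Sketch: escape a string for use inside a JSON string literal.
# Other control characters (below 0x20) are passed through unescaped.
json_escape() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      '"')   out+='\"' ;;
      '\')   out+='\\' ;;
      $'\n') out+='\n' ;;
      $'\r') out+='\r' ;;
      $'\t') out+='\t' ;;
      *)     out+="$c" ;;
    esac
  done
  printf '%s' "$out"
}
```

Inside the heredoc this would read `"message": "$(json_escape "$message")",`. Since the scripts already depend on `jq`, building the object with `jq -n --arg message "$message" '{message: $message}'` is an equally valid choice that escapes every control character.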
### Script Variants
Standard Scripts (planning.sh, implement.sh, adjust.sh):
- Use slash commands directly
- Faster execution
- Synchronous operation
- Permission mode: `--permission-mode acceptEdits`

Subagent Scripts (planning-subagent.sh, implement-subagent.sh, adjust-subagent.sh):
- Launch isolated Claude subagents
- For complex, thorough operations
- Parallel execution possible
- Permission mode: `--dangerously-skip-permissions`
When to use which:
- Standard: Quick operations, single plan/implementation
- Subagent: Parallel planning, long-running research, experimental approaches
## State Management
### Worktree Lifecycle
```mermaid
stateDiagram-v2
    [*] --> Created: git worktree add
    Created --> Planning: planning.sh
    Planning --> Planned: PLAN.md created
    Planned --> Implementation: implement.sh
    Implementation --> InReview: Code committed
    InReview --> PRCreated: submit-pr.sh
    PRCreated --> Merged: GitHub PR merged
    Merged --> Cleaned: git worktree remove
    Cleaned --> [*]
    note right of Planning
        .plans/PLAN.md
        .plans/linear-metadata.json
        .plans/n8n-metadata.json
    end note
    note right of PRCreated
        .plans/pr-info.json
    end note
```
### Linear Status Transitions
```text
Todo
  ↓ @Tofie plan
Planning
  ↓ PLAN.md created
Planned
  ↓ @Tofie implement
In Progress
  ↓ Implementation complete
In Review
  ↓ @Tofie pr
PR Created
  ↓ GitHub PR merged
Done
```
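A strict reading of the user-triggered transitions above can be written as a guard that n8n might apply before dispatching a command. Whether Tofie actually rejects out-of-order commands (for example `implement` on a Todo issue) is an assumption, and `status_after_command` is a hypothetical helper.

```bash
#!/bin/bash
# Sketch: the user-triggered transitions from the diagram as a guard.
# Prints the status shown after the command; fails for any other pairing.
status_after_command() {
  local current="$1" command="$2"
  case "$command:$current" in
    "plan:Todo")         echo "Planning" ;;
    "implement:Planned") echo "In Progress" ;;
    "pr:In Review")      echo "PR Created" ;;
    *) echo "[ERROR] '$command' is not expected from status '$current'" >&2
       return 1 ;;
  esac
}
```

The remaining transitions (Planning → Planned, In Progress → In Review, PR Created → Done) are driven by script results and GitHub events rather than by a user command, so they do not appear in the guard.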
### Metadata Persistence
linear-metadata.json:
- Created during planning
- Contains full issue context (description, comments, labels)
- Read during implementation for context
- Never modified after creation
n8n-metadata.json:
- Created during planning
- Contains n8n webhook URL and signing key
- Used for future event notifications
- Static configuration
pr-info.json:
- Created during PR submission
- Contains PR number, URL, title
- Used to avoid duplicate PR creation
- Updated if PR is recreated
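The two persistence rules, write-once for `linear-metadata.json` and overwrite for `pr-info.json`, could be expressed as two small helpers. The function names are hypothetical.

```bash
#!/bin/bash
# Sketch: write-once vs. overwrite semantics for .plans/ metadata files.

# For files that must never change after creation (e.g. linear-metadata.json).
write_once() {
  local file="$1" content="$2"
  if [[ -e "$file" ]]; then
    echo "[INFO] Keeping existing $file" >&2
    return 0
  fi
  mkdir -p "$(dirname "$file")"
  printf '%s\n' "$content" > "$file"
}

# For files that are replaced when their source changes (e.g. pr-info.json).
write_always() {
  local file="$1" content="$2"
  mkdir -p "$(dirname "$file")"
  printf '%s\n' "$content" > "$file"
}
```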
## Security Architecture
### Authentication Flow
```mermaid
sequenceDiagram
    participant n8n
    participant Coder
    participant Doppler
    participant GitHub
    n8n->>Coder: SSH (key-based auth)
    Coder->>Doppler: Request GITHUB_TOKEN
    Doppler->>Coder: Return token (over TLS)
    Coder->>GitHub: gh CLI with token
    GitHub->>GitHub: Validate token
    GitHub->>Coder: Success
```
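On the Coder side, the token hop in this diagram might be sketched as follows, assuming the Doppler CLI is already authenticated and scoped to the right project and config on the instance. `doppler secrets get NAME --plain` prints only the secret value, and `gh` reads a token from the `GH_TOKEN` environment variable; `gh_with_doppler_token` is a hypothetical wrapper.

```bash
#!/bin/bash
# Sketch: fetch the GitHub token from Doppler for a single gh invocation,
# so the token is never written to disk on the Coder instance.
gh_with_doppler_token() {
  local token
  token=$(doppler secrets get GITHUB_TOKEN --plain) || {
    echo "[ERROR] Could not read GITHUB_TOKEN from Doppler" >&2
    return 1
  }
  # Scope GH_TOKEN to this one command rather than exporting it.
  GH_TOKEN="$token" gh "$@"
}
```

`doppler run -- gh pr list` is an alternative that injects every secret from the config into the command's environment.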
### Security Measures
- SSH Key Authentication
  - n8n → Coder via SSH keys (no passwords)
  - Keys stored in n8n credential vault
  - Limited to specific user account
- Secrets Management
  - GitHub tokens in Doppler
  - Linear API keys in n8n credentials
  - Webhook signing keys for validation
- Least Privilege
  - Coder instance: Limited SSH access
  - GitHub tokens: Repo and workflow scope only
  - Linear API: Read issues, update status, and post comments only
- Isolation
  - Each worktree is isolated
  - No cross-contamination between issues
  - Clean state from main branch each time
## Scalability Considerations
### Current Limits
- Coder Instance: Single EC2 instance
  - Can handle ~10 parallel worktrees
  - Limited by disk space and memory
- n8n: Cloud-hosted, auto-scaling
  - No practical webhook limits
  - Execution queue for SSH operations
- Claude Code: Rate-limited by API
  - Concurrent requests limited
  - Long operations may time out
### Future Improvements
- Multiple Coder Instances
  - Load balancing across instances
  - Geographic distribution
  - Dedicated instances per team
- Worktree Cleanup
  - Auto-remove merged worktrees
  - Archive old worktrees
  - Disk space monitoring
- Caching & Optimization
  - Cache common codebase analysis
  - Reuse research across similar issues
  - Pre-warm frequently used contexts
- Monitoring & Observability
  - Execution time tracking
  - Success/failure metrics
  - Resource usage dashboards
## Error Handling & Recovery
### Error Categories
1. Input Validation Errors
   - Missing required fields
   - Invalid JSON format
   - Unknown commands

   Recovery: Return error JSON immediately; don’t execute.
2. Infrastructure Errors
   - SSH connection failure
   - Claude CLI not found
   - Git worktree creation failure

   Recovery: Retry with exponential backoff; alert on repeated failures.
3. Execution Errors
   - Claude generates invalid code
   - Tests fail after implementation
   - PR creation fails

   Recovery: Roll back changes, update Linear with the error, and allow manual intervention.
4. Timeout Errors
   - Planning takes too long
   - Implementation exceeds timeout
   - Claude API timeout

   Recovery: Kill the process, preserve partial work, and update Linear.
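For the timeout category, a long step could be bounded with GNU coreutils `timeout`, which exits with status 124 when the limit is reached; leaving the worktree in place afterwards is what preserves partial work. The helper name and limits are assumptions, not documented Tofie settings, and because `timeout` runs an external program it cannot wrap a shell function directly.

```bash
#!/bin/bash
# Sketch: run a command with a time limit and report timeouts distinctly.
# Returns the command's own exit status, or 124 if the limit was hit.
run_with_timeout() {
  local limit="$1"; shift
  local status=0
  # --kill-after sends SIGKILL if the command ignores the initial SIGTERM.
  timeout --kill-after=30s "$limit" "$@" || status=$?
  if (( status == 124 )); then
    echo "[ERROR] Timed out after $limit: $*" >&2
  fi
  return "$status"
}
```

For example, `run_with_timeout 30m "$CLAUDE_CMD" -p "$PROMPT"`, where the 30-minute limit and `PROMPT` are placeholders; a 124 status is the cue to update Linear with a timeout message rather than a generic failure.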
### Retry Strategy
```text
Attempt 1: Immediate execution
  ↓ (failure)
Wait 30 seconds
  ↓
Attempt 2: Retry
  ↓ (failure)
Wait 2 minutes
  ↓
Attempt 3: Final retry
  ↓ (failure)
Alert team + update Linear with error
```
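The schedule above (an immediate attempt, then waits of 30 seconds and 2 minutes) maps onto a small retry wrapper. A sketch, assuming the delays live in a `RETRY_DELAYS` array so they can be shortened for testing; `with_retries` is hypothetical, and the final alert and Linear update are left to the caller when it returns non-zero.

```bash
#!/bin/bash
# Sketch: run a command up to len(RETRY_DELAYS) + 1 times, sleeping
# RETRY_DELAYS[i] seconds after the (i+1)-th failure.
RETRY_DELAYS=(30 120)

with_retries() {
  local attempt=1 max=$(( ${#RETRY_DELAYS[@]} + 1 ))
  while true; do
    if "$@"; then
      return 0
    fi
    if (( attempt >= max )); then
      echo "[ERROR] Failed after $attempt attempts: $*" >&2
      return 1
    fi
    echo "[INFO] Attempt $attempt failed; retrying in ${RETRY_DELAYS[attempt-1]}s" >&2
    sleep "${RETRY_DELAYS[attempt-1]}"
    (( attempt++ ))
  done
}
```

Retrying is only safe because the scripts are meant to be idempotent; a partially completed `submit-pr.sh` run, for instance, finds the existing PR on the next attempt instead of opening a second one.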
## Performance Characteristics
### Typical Execution Times
| Operation | Standard Script | Subagent Script | Notes |
|---|---|---|---|
| Planning | 2-5 minutes | 5-15 minutes | Subagent does deeper research |
| Implementation | 5-15 minutes | 10-30 minutes | Depends on complexity |
| PR Submission | 1-2 minutes | N/A | Only standard script |
| Full Workflow | 10-25 minutes | 20-60 minutes | Planning + Implementation + PR |
### Resource Usage
Coder Instance:
- CPU: 2-4 cores during Claude execution
- Memory: 4-8 GB per worktree
- Disk: ~500 MB per worktree
- Network: Minimal (GitHub/API calls only)
n8n:
- Webhook response: <100ms
- SSH overhead: ~1-2 seconds
- Workflow execution: <1 minute (excluding script time)
## Monitoring & Debugging
### Log Locations
Coder Instance:
```text
/home/coder/
├── .claude/logs/     # Claude CLI logs
└── trees/*/logs/     # Script execution logs (if enabled)
```
n8n:
- Workflow execution logs in n8n UI
- Webhook payload history
- Error traces with stack traces
Linear:
- Issue comment history shows Tofie responses
- Status change history
- Activity timeline
### Debug Mode
Enable debug logging:
```bash
DEBUG=1 ./planning.sh < input.json
```
Output includes:
- Parsed input values
- File paths and existence checks
- Claude CLI prompts and responses
- Git command outputs
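A `DEBUG=1` switch like this could be wired into the logging helpers as follows. `log_debug` is a hypothetical addition alongside the skeleton's `log_info`/`log_error`; it keeps writing to stderr so stdout stays valid JSON. Bash's `set -x` can additionally trace each command when `DEBUG` is set.

```bash
#!/bin/bash
# Sketch: debug logging that is silent unless DEBUG is exactly "1".
log_debug() {
  [[ "${DEBUG:-0}" == "1" ]] || return 0
  # stderr, like the other log helpers, so the JSON on stdout is untouched
  echo "[DEBUG] $*" >&2
}
```

A script would then call, for example, `log_debug "Worktree path: $WORKTREE_PATH"` (variable name hypothetical) at each step it wants to expose.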
## Integration Points
### n8n ↔ Coder
- Protocol: SSH
- Format: JSON stdin/stdout
- Error Handling: Exit codes + JSON error field
### Coder ↔ Claude
- Protocol: CLI invocation
- Format: Text prompts + file operations
- Permission Mode: Varies by script
### Coder ↔ GitHub
- Protocol: gh CLI (GitHub API)
- Authentication: Token from Doppler
- Operations: PR create, PR list, PR view
### n8n ↔ Linear
- Protocol: GraphQL API + Webhooks
- Authentication: API key + webhook signature
- Operations: Update status, post comments, read issues
## Best Practices
### For Script Developers
- Always use JSON I/O: Structured data for reliable parsing
- Log to stderr, output to stdout: Keep streams separate
- Include timestamps: Help debugging timing issues
- Handle partial failures: Don’t fail entire workflow for minor issues
- Make scripts idempotent: Safe to run multiple times
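For the timestamp practice, the skeleton's logging helpers could carry a UTC ISO 8601 timestamp while still writing to stderr. A minimal sketch; the shared `log` function is a hypothetical refactoring.

```bash
#!/bin/bash
# Sketch: timestamped log helpers (UTC, ISO 8601), all on stderr.
log() {
  local level="$1"; shift
  printf '%s [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$level" "$*" >&2
}
log_info()  { log INFO "$@"; }
log_error() { log ERROR "$@"; }
```

Using UTC keeps Coder-side log lines directly comparable with execution timestamps in the n8n UI, whatever time zone the instance is set to.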
### For System Administrators
- Monitor disk space: Worktrees accumulate over time
- Rotate logs: Claude logs can grow large
- Update tokens: GitHub tokens expire periodically
- Test webhooks: Use n8n’s test webhook feature
- Keep n8n workflows versioned: Export and commit to git
### For Users
- Be specific in Linear: Better input → better output
- Review plans before implementing: Catch issues early
- Test draft PRs: Tofie creates drafts for your review
- Provide feedback: Help improve prompts and outputs
- Report issues: Tag DevOps in Linear if something fails
## Future Architecture Directions
### Short-term (Q1 2026)
- Worktree auto-cleanup after PR merge
- Better error notifications (Slack integration)
- Execution time dashboards
- Cost tracking per operation
### Medium-term (Q2-Q3 2026)
- Multi-instance Coder support
- Parallel implementation exploration
- A/B testing different approaches
- Caching and optimization
### Long-term (Q4 2026+)
- Self-healing infrastructure
- Predictive planning (analyze similar issues)
- Auto-tuning prompts based on success rates
- Integration with deployment pipeline