Building a Self-Improving Multi-Agent AI Stack on a Mac Mini
How I built NxtOps - 7 specialized AI agents with a feedback loop that makes the system smarter over time.
I run a marketing team of four people across five business lines. We manage paid media, email sequences, influencer programs, content, SEO, sales enablement, and reporting. The standard approach would be to hire more people or outsource. Instead, I built a team of AI agents.
NxtOps is a self-hosted multi-agent system running on a Mac Mini M2 Pro in my home office. It has seven specialized agents, each with defined permissions, dedicated skills, and a shared feedback loop that makes the entire system smarter over time. The orchestration layer is n8n. The agents run via Claude Code CLI. The memory lives in Qdrant. The tracking lives in NocoDB. The whole thing runs in Docker behind Portainer.
Here's how it works and how to build one yourself.
Architecture Overview
┌─────────────────────────────────────────────────────┐
│ n8n Orchestration │
│ (Cron triggers, webhooks, approval queues) │
└──────────────────────┬──────────────────────────────┘
│ HTTP POST
▼
┌─────────────────────────────────────────────────────┐
│ Agent Bridge (Express.js:3333) │
│ • Per-agent permissions • Kill switch │
│ • Concurrency control (2) • Skill loading │
│ • Startup recovery • Run history │
└──────────────────────┬──────────────────────────────┘
│ Claude Code CLI
▼
┌─────────────────────────────────────────────────────┐
│ 7 Agents (Read-Only) │
│ ops · content · seo · crm · reporting · prospect │
│ · devops │
│ Each: CLAUDE.md + permissions.json + skills/ │
└──────────────────────┬──────────────────────────────┘
│
┌────────────┼────────────┐
▼ ▼ ▼
NocoDB Qdrant Langfuse
(feedback) (memory) (tracing)
The Permission System
Every agent has a permissions.json that defines exactly what it can do. This is the critical safety layer - agents cannot modify themselves (their directories are mounted read-only in Docker), and each tier restricts what tools are available:
// Example: ops-agent permissions.json
{
"tier": 2,
"allowedTools": ["bash", "mcp"],
"allowedMcp": ["hubspot", "n8n", "nocodb"],
"bashEnabled": true,
"bashAllowList": ["curl", "jq", "cat", "echo", "node", "grep", "wc"],
"maxTurns": 20,
"timeoutMs": 300000
}
Tier 1 agents can only call MCP tools - no bash access. Tier 2 gets scoped bash with an explicit allowlist. Tier 3 (devops only) gets full bash but is restricted to manual triggering with audit logging. No agent can run on a cron schedule at Tier 3.
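The allowlist check above can be sketched in a few lines of Node. This is illustrative, not the bridge's actual code: the permissions object mirrors the permissions.json shown earlier, and checkBashCommand is a hypothetical helper name.

```javascript
// Hypothetical sketch of per-agent bash allowlist enforcement.
// Mirrors the ops-agent permissions.json from the article.
const permissions = {
  tier: 2,
  bashEnabled: true,
  bashAllowList: ["curl", "jq", "cat", "echo", "node", "grep", "wc"],
};

function checkBashCommand(perms, command) {
  if (!perms.bashEnabled) {
    return { allowed: false, reason: "bash disabled for this tier" };
  }
  // Compare only the first token (the binary) against the allowlist.
  const binary = command.trim().split(/\s+/)[0];
  if (!perms.bashAllowList.includes(binary)) {
    return { allowed: false, reason: `'${binary}' not in allowlist` };
  }
  return { allowed: true };
}

console.log(checkBashCommand(permissions, "curl -s http://localhost:3333/health").allowed); // true
console.log(checkBashCommand(permissions, "rm -rf /tmp/x").allowed); // false
```

A real implementation would also need to handle shell operators (pipes, subshells, `&&` chains) that could smuggle in a second binary; checking only the first token is the minimal version.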
The Kill Switch
The bridge scans all agent output - both stdout and stderr - for destructive patterns before returning results. If any pattern matches, the run is terminated immediately, logged as CRITICAL, and an alert fires through the watchdog webhook. Patterns that trigger the kill switch:
- rm -rf / or home directory deletion
- DROP TABLE, DROP DATABASE, TRUNCATE TABLE
- chmod 777, fork bombs, pipe-to-shell patterns
- Docker container manipulation (stop, rm, kill)
- System shutdown or reboot commands
- Force push to git repositories
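A scanner like this reduces to a list of regexes run over the combined stdout/stderr. The pattern list and the scanOutput name below are illustrative assumptions, not the bridge's real implementation:

```javascript
// Hypothetical kill-switch scanner: regexes over agent output.
const KILL_PATTERNS = [
  /rm\s+-rf\s+(\/|~)/,                     // root or home directory deletion
  /\b(DROP\s+(TABLE|DATABASE)|TRUNCATE\s+TABLE)\b/i,
  /chmod\s+777/,
  /:\(\)\s*\{\s*:\|:&\s*\};:/,             // classic bash fork bomb
  /curl[^|]*\|\s*(ba)?sh/,                 // pipe-to-shell
  /docker\s+(stop|rm|kill)\b/,
  /\b(shutdown|reboot)\b/,
  /git\s+push\s+--force/,
];

function scanOutput(text) {
  const hit = KILL_PATTERNS.find((p) => p.test(text));
  return hit ? { kill: true, pattern: String(hit) } : { kill: false };
}

console.log(scanOutput("fetched 200 rows, wrote report.csv").kill); // false
console.log(scanOutput("about to run: rm -rf /var").kill);          // true
```

Scanning output is a last line of defense, not a sandbox; it catches an agent describing or echoing a destructive command, while the read-only mounts and allowlists prevent most of them from executing at all.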
The Self-Improvement Loop
This is what makes NxtOps different from a collection of scripts. The system learns from its own output:
- ops-agent analyzes performance data using the kpi-analysis skill, pulling from HubSpot and Google Ads APIs.
- It generates structured feedback entries with severity (low/medium/high/critical) and confidence scores (0–100).
- Entries are stored in NocoDB in the agent_feedback table with fields for agent, category, recommendation, severity, confidence, and applied status.
- Other agents query this table before acting. Their CLAUDE.md files include instructions to check recent feedback before making recommendations.
- Entries are embedded in Qdrant using Ollama (phi4-mini) for semantic search across the feedback corpus.
- Weekly synthesis retires stale entries and identifies systemic patterns. The ops-agent runs this as a scheduled skill.
After 50+ feedback entries, the system's recommendations measurably improve because each agent has access to the cumulative learning of every other agent. The feedback loop was validated end-to-end when the ops-agent proposed a KPI revision (the "Monday Melt" adjustment to account for weekend lead decay) that was confirmed as a real pattern in the data.
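The shape of a feedback entry can be sketched as a small constructor with validation. Field names follow the agent_feedback table described above; makeFeedbackEntry itself is a hypothetical helper, not code from NxtOps:

```javascript
// Hypothetical constructor for an agent_feedback row, with validation
// of the severity enum and 0-100 confidence range from the article.
const SEVERITIES = ["low", "medium", "high", "critical"];

function makeFeedbackEntry({ agent, category, recommendation, severity, confidence }) {
  if (!SEVERITIES.includes(severity)) throw new Error(`bad severity: ${severity}`);
  if (confidence < 0 || confidence > 100) throw new Error(`confidence out of range: ${confidence}`);
  return {
    agent,
    category,
    recommendation,
    severity,
    confidence,
    applied: false, // flipped once another agent acts on the recommendation
    created_at: new Date().toISOString(),
  };
}

const entry = makeFeedbackEntry({
  agent: "ops-agent",
  category: "kpi",
  recommendation: "Adjust Monday lead targets for weekend decay",
  severity: "medium",
  confidence: 82,
});
```

Validating at write time keeps the corpus queryable: downstream agents can filter on severity and confidence without defending against malformed rows.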
n8n Workflow: Watchdog Health Monitor
Schedule Trigger (every 5 minutes)
│
▼
HTTP Request > GET http://agent-bridge:3333/health
│
├── 200 OK > Check 'consecutive_failures' variable
│ ├── Was failing > Send recovery notification email
│ └── Was healthy > Do nothing (exit)
│
└── Non-200 or timeout > Increment failure counter
├── failures < 3 > Log and wait
└── failures >= 3 > Send alert email + Slack notification
The watchdog tracks consecutive failures in an n8n static data variable. After three consecutive failures (15 minutes down), it sends an alert. When the service recovers, it sends a recovery notice. This handles Mac Mini reboots, Docker restarts, and network blips without false positives.
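The counter logic above can be sketched as it might look inside an n8n Code node (n8n exposes persistent state via getWorkflowStaticData; the function below just takes the state object directly so the logic stands alone). The action names match the workflow diagram; evaluateHealth is an illustrative name:

```javascript
// Sketch of the watchdog's consecutive-failure counter.
// staticData stands in for n8n's getWorkflowStaticData('global') object.
function evaluateHealth(staticData, httpStatus) {
  const wasFailing = (staticData.consecutive_failures || 0) >= 3;
  if (httpStatus === 200) {
    staticData.consecutive_failures = 0;
    return wasFailing ? "send_recovery_notification" : "do_nothing";
  }
  staticData.consecutive_failures = (staticData.consecutive_failures || 0) + 1;
  return staticData.consecutive_failures >= 3 ? "send_alert" : "log_and_wait";
}

// Three failed checks (15 minutes), then recovery:
const state = {};
console.log(evaluateHealth(state, 500)); // log_and_wait
console.log(evaluateHealth(state, 500)); // log_and_wait
console.log(evaluateHealth(state, 500)); // send_alert
console.log(evaluateHealth(state, 200)); // send_recovery_notification
```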
Reboot Resilience
The Mac Mini restarts occasionally - power outages, macOS updates, manual reboots. The system is designed to survive all of these:
- Docker: restart: always on all containers. Portainer manages the stack.
- Health check: curl /health every 30 seconds. The container restarts automatically if the health check fails 3 times.
- Startup recovery: On boot, the bridge validates all agent directories, confirms the Claude CLI is accessible, checks that the log directory exists, and sends a webhook notification when it comes online.
- Persistent data: Feedback lives in NocoDB (external container). Logs persist to a Docker volume. Agent definitions are mounted read-only from the host filesystem.
What I'd Do Differently
Start with 2–3 agents, not 7. Most of the value comes from ops-agent and crm-agent. The others were nice-to-have.
Build the feedback loop from day one. It's the highest-leverage component and gets more valuable the longer it runs.
Use Langfuse tracing from the start. Debugging agent behavior without traces is painful.
Set max concurrency to 2 immediately. Running 4+ Claude Code instances simultaneously hits API rate limits and the Mac Mini's memory ceiling.
Download the Workflow
The Agent Watchdog health monitor workflow referenced in this article is available as a ready-to-import n8n JSON file. It includes the Schedule Trigger, HTTP health check against the agent-bridge /health endpoint, status evaluation logic, and the alert email template. All credentials have been replaced with placeholders.
Download Agent Watchdog Workflow
Requires: SMTP, HTTP access to your agent-bridge /health endpoint.
Edward Chalupa is a digital marketing specialist and founder of Whtnxt, a digital marketing and automation consultancy. Connect with him on LinkedIn or explore more at echalupa.com.