Building a Self-Improving Multi-Agent AI Stack on a Mac Mini
How I built NxtOps - 7 specialized AI agents with a feedback loop that makes the system smarter over time.
I run a marketing team of four people across five business lines. We manage paid media, email sequences, influencer programs, content, SEO, sales enablement, and reporting. The standard approach would be to hire more people or outsource. Instead, I built a team of AI agents.
NxtOps is a self-hosted multi-agent system running on a Mac Mini M2 Pro in my home office. It has seven specialized agents, each with defined permissions, dedicated skills, and a shared feedback loop that makes the entire system smarter over time. The orchestration layer is n8n. The agents run via Claude Code CLI. The memory lives in Qdrant. The tracking lives in NocoDB. The whole thing runs in Docker behind Portainer.
Here's how it works and how to build one yourself.
Architecture Overview
┌─────────────────────────────────────────────────────┐
│ n8n Orchestration │
│ (Cron triggers, webhooks, approval queues) │
└──────────────────────┬──────────────────────────────┘
│ HTTP POST
▼
┌─────────────────────────────────────────────────────┐
│ Agent Bridge (Express.js:3333) │
│ • Per-agent permissions • Kill switch │
│ • Concurrency control (2) • Skill loading │
│ • Startup recovery • Run history │
└──────────────────────┬──────────────────────────────┘
│ Claude Code CLI
▼
┌─────────────────────────────────────────────────────┐
│ 7 Agents (Read-Only) │
│ ops · content · seo · crm · reporting · prospect │
│ · devops │
│ Each: CLAUDE.md + permissions.json + skills/ │
└──────────────────────┬──────────────────────────────┘
│
┌────────────┼────────────┐
▼ ▼ ▼
NocoDB Qdrant Langfuse
(feedback) (memory) (tracing)
The Permission System
Every agent has a permissions.json that defines exactly what it can do. This is the critical safety layer - agents cannot modify themselves (their directories are mounted read-only in Docker), and each tier restricts what tools are available:
// Example: ops-agent permissions.json
{
"tier": 2,
"allowedTools": ["bash", "mcp"],
"allowedMcp": ["hubspot", "n8n", "nocodb"],
"bashEnabled": true,
"bashAllowList": ["curl", "jq", "cat", "echo", "node", "grep", "wc"],
"maxTurns": 20,
"timeoutMs": 300000
}
Tier 1 agents can only call MCP tools - no bash access. Tier 2 gets scoped bash with an explicit allowlist. Tier 3 (devops only) gets full bash but is restricted to manual triggering with audit logging. No agent can run on a cron schedule at Tier 3.
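The allowlist check above can be sketched in a few lines of Node. This is illustrative, not the bridge's actual code: the permissions object mirrors the permissions.json shown earlier, and checkBashCommand is a hypothetical helper name.

```javascript
// Hypothetical sketch of per-agent bash allowlist enforcement.
// Mirrors the ops-agent permissions.json from the article.
const permissions = {
  tier: 2,
  bashEnabled: true,
  bashAllowList: ["curl", "jq", "cat", "echo", "node", "grep", "wc"],
};

function checkBashCommand(perms, command) {
  if (!perms.bashEnabled) {
    return { allowed: false, reason: "bash disabled for this tier" };
  }
  // Compare only the first token (the binary) against the allowlist.
  const binary = command.trim().split(/\s+/)[0];
  if (!perms.bashAllowList.includes(binary)) {
    return { allowed: false, reason: `'${binary}' not in allowlist` };
  }
  return { allowed: true };
}

console.log(checkBashCommand(permissions, "curl -s http://localhost:3333/health").allowed); // true
console.log(checkBashCommand(permissions, "rm -rf /tmp/x").allowed); // false
```

A real implementation would also need to handle shell operators (pipes, subshells, `&&` chains) that could smuggle in a second binary; checking only the first token is the minimal version.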
The Kill Switch
The bridge scans all agent output - both stdout and stderr - for destructive patterns before returning results. If any pattern matches, the run is terminated immediately, logged as CRITICAL, and an alert fires through the watchdog webhook. Patterns that trigger the kill switch:
- rm -rf / or home directory deletion
- DROP TABLE, DROP DATABASE, TRUNCATE TABLE
- chmod 777, fork bombs, pipe-to-shell patterns
- Docker container manipulation (stop, rm, kill)
- System shutdown or reboot commands
- Force push to git repositories
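A scanner like this reduces to a list of regexes run over the combined stdout/stderr. The pattern list and the scanOutput name below are illustrative assumptions, not the bridge's real implementation:

```javascript
// Hypothetical kill-switch scanner: regexes over agent output.
const KILL_PATTERNS = [
  /rm\s+-rf\s+(\/|~)/,                     // root or home directory deletion
  /\b(DROP\s+(TABLE|DATABASE)|TRUNCATE\s+TABLE)\b/i,
  /chmod\s+777/,
  /:\(\)\s*\{\s*:\|:&\s*\};:/,             // classic bash fork bomb
  /curl[^|]*\|\s*(ba)?sh/,                 // pipe-to-shell
  /docker\s+(stop|rm|kill)\b/,
  /\b(shutdown|reboot)\b/,
  /git\s+push\s+--force/,
];

function scanOutput(text) {
  const hit = KILL_PATTERNS.find((p) => p.test(text));
  return hit ? { kill: true, pattern: String(hit) } : { kill: false };
}

console.log(scanOutput("fetched 200 rows, wrote report.csv").kill); // false
console.log(scanOutput("about to run: rm -rf /var").kill);          // true
```

Scanning output is a last line of defense, not a sandbox; it catches an agent describing or echoing a destructive command, while the read-only mounts and allowlists prevent most of them from executing at all.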
The Self-Improvement Loop
This is what makes NxtOps different from a collection of scripts. The system learns from its own output:
- ops-agent analyzes performance data using the kpi-analysis skill, pulling from HubSpot and Google Ads APIs.
- It generates structured feedback entries with severity (low/medium/high/critical) and confidence scores (0–100).
- Entries are stored in NocoDB in the agent_feedback table with fields for agent, category, recommendation, severity, confidence, and applied status.
- Other agents query this table before acting. Their CLAUDE.md files include instructions to check recent feedback before making recommendations.
- Entries are embedded in Qdrant using Ollama (phi4-mini) for semantic search across the feedback corpus.
- Weekly synthesis retires stale entries and identifies systemic patterns. The ops-agent runs this as a scheduled skill.
After 50+ feedback entries, the system's recommendations measurably improve because each agent has access to the cumulative learning of every other agent. The feedback loop was validated end-to-end when the ops-agent proposed a KPI revision (the "Monday Melt" adjustment to account for weekend lead decay) that was confirmed as a real pattern in the data.
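The shape of a feedback entry can be sketched as a small constructor with validation. Field names follow the agent_feedback table described above; makeFeedbackEntry itself is a hypothetical helper, not code from NxtOps:

```javascript
// Hypothetical constructor for an agent_feedback row, with validation
// of the severity enum and 0-100 confidence range from the article.
const SEVERITIES = ["low", "medium", "high", "critical"];

function makeFeedbackEntry({ agent, category, recommendation, severity, confidence }) {
  if (!SEVERITIES.includes(severity)) throw new Error(`bad severity: ${severity}`);
  if (confidence < 0 || confidence > 100) throw new Error(`confidence out of range: ${confidence}`);
  return {
    agent,
    category,
    recommendation,
    severity,
    confidence,
    applied: false, // flipped once another agent acts on the recommendation
    created_at: new Date().toISOString(),
  };
}

const entry = makeFeedbackEntry({
  agent: "ops-agent",
  category: "kpi",
  recommendation: "Adjust Monday lead targets for weekend decay",
  severity: "medium",
  confidence: 82,
});
```

Validating at write time keeps the corpus queryable: downstream agents can filter on severity and confidence without defending against malformed rows.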
n8n Workflow: Watchdog Health Monitor
Schedule Trigger (every 5 minutes)
│
▼
HTTP Request > GET http://agent-bridge:3333/health
│
├── 200 OK > Check 'consecutive_failures' variable
│ ├── Was failing > Send recovery notification email
│ └── Was healthy > Do nothing (exit)
│
└── Non-200 or timeout > Increment failure counter
├── failures < 3 > Log and wait
└── failures >= 3 > Send alert email + Slack notification
The watchdog tracks consecutive failures in an n8n static data variable. After three consecutive failures (15 minutes down), it sends an alert. When the service recovers, it sends a recovery notice. This handles Mac Mini reboots, Docker restarts, and network blips without false positives.
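The counter logic above can be sketched as it might look inside an n8n Code node (n8n exposes persistent state via getWorkflowStaticData; the function below just takes the state object directly so the logic stands alone). The action names match the workflow diagram; evaluateHealth is an illustrative name:

```javascript
// Sketch of the watchdog's consecutive-failure counter.
// staticData stands in for n8n's getWorkflowStaticData('global') object.
function evaluateHealth(staticData, httpStatus) {
  const wasFailing = (staticData.consecutive_failures || 0) >= 3;
  if (httpStatus === 200) {
    staticData.consecutive_failures = 0;
    return wasFailing ? "send_recovery_notification" : "do_nothing";
  }
  staticData.consecutive_failures = (staticData.consecutive_failures || 0) + 1;
  return staticData.consecutive_failures >= 3 ? "send_alert" : "log_and_wait";
}

// Three failed checks (15 minutes), then recovery:
const state = {};
console.log(evaluateHealth(state, 500)); // log_and_wait
console.log(evaluateHealth(state, 500)); // log_and_wait
console.log(evaluateHealth(state, 500)); // send_alert
console.log(evaluateHealth(state, 200)); // send_recovery_notification
```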
Reboot Resilience
The Mac Mini restarts occasionally - power outages, macOS updates, manual reboots. The system is designed to survive all of these:
- Docker: restart: always on all containers. Portainer manages the stack.
- Health check: curl /health every 30 seconds. The container restarts automatically if the health check fails 3 times.
- Startup recovery: On boot, the bridge validates all agent directories, confirms the Claude CLI is accessible, checks that the log directory exists, and sends a webhook notification when it comes online.
- Persistent data: Feedback lives in NocoDB (external container). Logs persist to a Docker volume. Agent definitions are mounted read-only from the host filesystem.
What I'd Do Differently
Start with 2–3 agents, not 7. Most of the value comes from ops-agent and crm-agent. The others were nice-to-have.
Build the feedback loop from day one. It's the highest-leverage component and gets more valuable the longer it runs.
Use Langfuse tracing from the start. Debugging agent behavior without traces is painful.
Set max concurrency to 2 immediately. Running 4+ Claude Code instances simultaneously hits API rate limits and the Mac Mini's memory ceiling.
Download the Workflow
The Agent Watchdog health monitor workflow referenced in this article is available as a ready-to-import n8n JSON file. It includes the Schedule Trigger, HTTP health check against the agent-bridge /health endpoint, status evaluation logic, and the alert email template. All credentials have been replaced with placeholders.
Download Agent Watchdog Workflow
Requires: SMTP, HTTP access to your agent-bridge /health endpoint.
Edward Chalupa is a digital marketing specialist and founder of Whtnxt, a digital marketing and automation consultancy. Connect with him on LinkedIn or explore more at echalupa.com.