Local LLM Setup: Self-Hosted AI Infrastructure

Self-hosted AI infrastructure providing private, offline capabilities. No cloud dependencies, no subscriptions, complete data sovereignty. Ollama, Qwen 2.5:14B, OpenWebUI, RAG integration.

Project Type: Personal Infrastructure | AI Implementation
Status: Production (2023-Present)
Tech Stack: Ollama, Qwen 2.5:14B, OpenWebUI, macOS LaunchAgents, RAG


The Problem: Dependency and Privacy

If you use AI daily, you're dependent on someone else's infrastructure.

The reality of cloud AI:

  • Privacy concerns: Every conversation sent to cloud servers
  • Subscription dependency: Pay monthly or lose access
  • Internet requirement: No connection = no AI
  • Data sovereignty: Someone else controls your data
  • Service changes: Features removed, prices raised, terms changed without notice
  • Platform risk: What if the service shuts down?

For casual use, this is fine. For building systems, writing daily, or working with sensitive information—it's a problem.

The question: What if AI could run on your machine, with your data, under your control?


The Solution: Self-Hosted AI Infrastructure

The Local LLM Setup is a self-hosted AI infrastructure running on a Mac Mini that provides private, offline AI capabilities.

Architecture

Hardware:

  • Mac Mini (M-series Apple Silicon for efficient inference)
  • Local storage for models and data
  • No cloud connectivity required

Core Components:

1. Ollama (Port 11434)

  • Local model inference engine
  • Primary model: Qwen 2.5:14B (14 billion parameters)
  • Models stored locally, no cloud calls (a minimal query sketch follows this list)

2. OpenWebUI (Port 3000)

  • ChatGPT-like web interface
  • Clean, familiar UX for AI conversations
  • Knowledge base integration (RAG)

3. MCP Bridge (Port 11620)

  • Model Context Protocol filesystem access
  • Integration with AI Memory System
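
To make the Ollama piece concrete, here is a minimal query sketch in Python, assuming the default port 11434 and the qwen2.5:14b model tag described above. It illustrates the request shape against Ollama's /api/generate endpoint; it is not the project's actual tooling.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's local inference endpoint


def ask_local_llm(prompt: str, model: str = "qwen2.5:14b") -> str:
    """Send a prompt to the local Ollama server and return the complete response text."""
    payload = json.dumps({
        "model": model,    # model tag as it appears in `ollama list`
        "prompt": prompt,
        "stream": False,   # one complete response instead of a token stream
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]


if __name__ == "__main__":
    # Everything runs against localhost; no data leaves the machine.
    print(ask_local_llm("Summarize the benefits of local-first AI in two sentences."))
```

OpenWebUI (port 3000) talks to this same local endpoint behind its web interface; every hop stays on the machine.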

Management:

  • Auto-starts via macOS LaunchAgents (survives reboots)
  • Managed with custom rtai command
  • Service monitoring and restart capabilities

Technical Highlights

Model Selection: Qwen 2.5:14B

Why Qwen 2.5:14B?

  • 14 billion parameters balance quality and speed
  • Multilingual, with strong instruction following
  • Runs efficiently on Apple Silicon
  • Comparable to GPT-3.5 for many tasks
  • Open weights, freely available (a quick availability check is sketched below)
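
As a quick sanity check that the model really lives on local disk, here is a small sketch against Ollama's /api/tags endpoint, which lists locally stored models. The assertion assumes the tag produced by a standard `ollama pull qwen2.5:14b`.

```python
import json
import urllib.request


def installed_models(host: str = "http://localhost:11434") -> list[str]:
    """Return the model tags Ollama has stored on local disk."""
    with urllib.request.urlopen(f"{host}/api/tags") as response:
        data = json.loads(response.read().decode("utf-8"))
    return [entry["name"] for entry in data.get("models", [])]


if __name__ == "__main__":
    models = installed_models()
    print("Locally available models:", models)
    # Assumes the tag produced by `ollama pull qwen2.5:14b`.
    assert any(name.startswith("qwen2.5:14b") for name in models), "qwen2.5:14b not pulled yet"
```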

RAG Knowledge Base Integration

Continuous Sync:

  • A watcher script monitors the /articles folder
  • Automatically uploads new or changed files to the OpenWebUI knowledge base (a minimal watcher is sketched below)
  • Enables RAG-powered responses about my writing and projects

Result: Local AI that knows my work without manual context-loading.
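
A minimal sketch of what such a watcher can look like, assuming a simple polling loop over Markdown files. The folder path is illustrative, and the upload step is left as a labeled placeholder because the exact OpenWebUI knowledge-base route and auth token vary by version; this is not the project's actual script.

```python
import time
from pathlib import Path

ARTICLES_DIR = Path.home() / "articles"  # illustrative location of the watched folder
POLL_SECONDS = 30                        # how often to look for new or changed files


def upload_to_openwebui(path: Path) -> None:
    """Placeholder: push one file into the OpenWebUI knowledge base.

    The real endpoint and auth token depend on your OpenWebUI version;
    check the API docs served by your instance (port 3000 here) and fill this in.
    """
    print(f"Would upload {path.name} to the knowledge base")


def watch(folder: Path) -> None:
    """Poll the folder and re-upload any file whose modification time changed."""
    seen: dict[Path, float] = {}
    while True:
        for path in folder.glob("**/*.md"):
            mtime = path.stat().st_mtime
            if seen.get(path) != mtime:  # new file or changed content
                upload_to_openwebui(path)
                seen[path] = mtime
        time.sleep(POLL_SECONDS)


if __name__ == "__main__":
    watch(ARTICLES_DIR)
```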

Auto-Start System

macOS LaunchAgents:

  • Services start automatically on boot (a plist sketch follows below)
  • Run in background (no manual intervention)
  • Managed with custom rtai command

Reliability: The system runs 24/7 and is always available when needed.
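
For illustration, here is a hedged sketch of a LaunchAgent definition written with Python's stdlib plistlib. The label, binary path, and log paths are assumptions for a typical Apple Silicon install; they are not the exact plists or the rtai tooling this setup uses.

```python
import plistlib
from pathlib import Path

# Illustrative values; adjust the label and binary path for your machine.
LABEL = "local.ollama.serve"
OLLAMA_BIN = "/opt/homebrew/bin/ollama"  # typical Homebrew path on Apple Silicon
AGENT_PATH = Path.home() / "Library" / "LaunchAgents" / f"{LABEL}.plist"

agent = {
    "Label": LABEL,
    "ProgramArguments": [OLLAMA_BIN, "serve"],  # command launchd runs: the Ollama server
    "RunAtLoad": True,                          # start when the agent is loaded at login
    "KeepAlive": True,                          # restart the service if it ever exits
    "StandardOutPath": "/tmp/ollama.out.log",
    "StandardErrorPath": "/tmp/ollama.err.log",
}

AGENT_PATH.parent.mkdir(parents=True, exist_ok=True)
with AGENT_PATH.open("wb") as handle:
    plistlib.dump(agent, handle)

print(f"Wrote {AGENT_PATH}; activate it with: launchctl load {AGENT_PATH}")
```

KeepAlive is what makes the service self-healing: launchd restarts it if it exits, and RunAtLoad brings it back up at every login.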


Integration with AI Memory System

The Local LLM Setup integrates seamlessly with the AI Memory System:

Shared Context:

  • The local LLM reads memory.jsonl for project context
  • Writes new memories back to the ledger
  • Uses the same structured format as Claude/ChatGPT (see the sketch after this list)
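
A minimal sketch of reading and appending ledger entries, assuming one JSON object per line. The field names (timestamp, source, content) and the ledger path are illustrative; the real schema is defined by the AI Memory System.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

MEMORY_PATH = Path.home() / "memory.jsonl"  # illustrative location of the shared ledger


def read_memories(path: Path = MEMORY_PATH) -> list[dict]:
    """Load every memory entry; the ledger holds one JSON object per line."""
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines if line.strip()]


def append_memory(content: str, source: str = "local-llm", path: Path = MEMORY_PATH) -> None:
    """Append one record; the field names here are illustrative, not the real schema."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "content": content,
    }
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    append_memory("Local LLM answered a question about the RAG watcher.")
    print(f"{len(read_memories())} memories in the ledger")
```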

Workflow:

  1. Start conversation in Claude (cloud AI)
  2. Switch to local LLM for sensitive work
  3. All context preserved via memory.jsonl
  4. Continue conversation offline (a sketch of this handoff follows below)

Privacy Benefit: Sensitive discussions stay local while leveraging cross-platform context.
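
Putting steps 2-4 together, here is a hedged sketch of continuing a conversation offline: recent ledger entries become a system message for Ollama's /api/chat endpoint. The ledger path and field names carry over the same assumptions as the previous sketch.

```python
import json
import urllib.request
from pathlib import Path

MEMORY_PATH = Path.home() / "memory.jsonl"           # illustrative ledger location
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # local chat endpoint


def recent_context(limit: int = 20) -> str:
    """Turn the last few ledger entries into a plain-text context block."""
    if not MEMORY_PATH.exists():
        return "(no prior context)"
    lines = MEMORY_PATH.read_text(encoding="utf-8").splitlines()[-limit:]
    entries = [json.loads(line) for line in lines if line.strip()]
    return "\n".join(f"- {entry.get('content', '')}" for entry in entries)


def continue_offline(user_message: str, model: str = "qwen2.5:14b") -> str:
    """Send the ledger context plus a new message to the local model."""
    payload = json.dumps({
        "model": model,
        "stream": False,
        "messages": [
            {"role": "system", "content": "Prior context from the memory ledger:\n" + recent_context()},
            {"role": "user", "content": user_message},
        ],
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_CHAT_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["message"]["content"]


if __name__ == "__main__":
    print(continue_offline("Pick up where the cloud conversation left off on the RAG watcher."))
```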


The Results

Data Sovereignty:

  • 100% of conversations stay on my machine
  • No third-party access to my data
  • Complete control over what gets stored

Offline Capability:

  • AI works without an internet connection
  • Always available during travel, outages, and privacy-sensitive work

Cost Savings:

  • $0/month ongoing costs (vs $20-80/month for cloud AI)
  • One-time hardware investment
  • No subscription lock-in

Performance:

  • Fast enough for interactive use
  • Quality comparable to GPT-3.5 for most use cases
  • Handles complex tasks (code, analysis, writing)

Integration:

  • Works with AI Memory System for cross-platform context
  • RAG integration for my writing and projects
  • MCP filesystem access for file operations

Why This Matters

This setup implements two pillars from my 7 Pillars framework:

Pillar 4 (Knowledge Stewardship):

"Systems that don't require permission. Knowledge management infrastructure you control."

Pillar 5 (Communication Independence):

"Own your channels. AI built to work for you, not on you. Data sovereignty."

Philosophy: Building systems that reduce dependency on fragile centralized infrastructure while still living in the world as it is.

For AI Implementation: Hands-On Infrastructure

This project demonstrates practical AI infrastructure skills:

  • Self-hosted model deployment
  • Service orchestration and management
  • RAG (Retrieval-Augmented Generation) integration
  • Auto-start and reliability engineering
  • Local-first architecture design
  • MCP integration for filesystem access
  • Performance optimization for Apple Silicon

Real-world application: Not just theory, but a production system running 24/7 for real work.


Lessons Learned

Start with Quality Models: Qwen 2.5:14B hits the sweet spot for local inference. Smaller models (7B) feel limiting; larger models (70B+) are slow on consumer hardware.

Auto-Start is Essential: Manual service management gets old fast. LaunchAgents make the system reliable and maintainable.

RAG Adds Huge Value: Local knowledge base integration transforms generic AI into personalized assistant that knows your work.

Local ≠ Weak: With the right model and hardware, local LLMs can match cloud AI for many tasks. Not every conversation needs GPT-4.

Privacy is Freedom: Once you experience AI that doesn't phone home, it's hard to go back. The peace of mind is worth the setup effort.

Cloud Still Has a Place: Local LLMs complement cloud AI; they don't replace it. I use both, depending on the task and privacy needs.


The Bigger Picture

The Local LLM Setup is part of my broader OfflineAI infrastructure, which includes:

  • AI Memory System: Cross-platform context management (JSONL ledger)
  • Local LLM: Self-hosted inference (this project)
  • RAG Knowledge Base: Continuous sync for my writing
  • MCP Integration: Filesystem access for AI tools

Together, these projects form a local-first AI workflow that prioritizes:

  • Privacy
  • Sovereignty
  • Resilience
  • Integration
  • Practical utility

Goal: AI that works for me, not on me, with data where it belongs—on my machine.

