Local LLM Setup: Self-Hosted AI Infrastructure

Self-hosted AI infrastructure providing private, offline capabilities. No cloud dependencies, no subscriptions, complete data sovereignty. Ollama, Qwen 2.5:14B, OpenWebUI, RAG integration.

Project Type: Personal Infrastructure | AI Implementation
Status: Production (2023-Present)
Tech Stack: Ollama, Qwen 2.5:14B, OpenWebUI, macOS LaunchAgents, RAG


The Problem: Dependency and Privacy

If you use AI daily, you're dependent on someone else's infrastructure.

The reality of cloud AI:

  • Privacy concerns: Every conversation sent to cloud servers
  • Subscription dependency: Pay monthly or lose access
  • Internet requirement: No connection = no AI
  • Data sovereignty: Someone else controls your data
  • Service changes: Features removed, prices raised, terms changed without notice
  • Platform risk: What if the service shuts down?

For casual use, this is fine. For building systems, writing daily, or working with sensitive information—it's a problem.

The question: What if AI could run on your machine, with your data, under your control?


The Solution: Self-Hosted AI Infrastructure

The Local LLM Setup is a self-hosted AI infrastructure running on a Mac Mini that provides private, offline AI capabilities.

Architecture

Hardware:

  • Mac Mini (M-series Apple Silicon for efficient inference)
  • Local storage for models and data
  • No cloud connectivity required

Core Components:

1. Ollama (Port 11434)

  • Local model inference engine
  • Primary model: Qwen 2.5:14B (14 billion parameters)
  • Models stored locally, no cloud calls (a minimal query sketch follows this list)

2. OpenWebUI (Port 3000)

  • ChatGPT-like web interface
  • Clean, familiar UX for AI conversations
  • Knowledge base integration (RAG)

3. MCP Bridge (Port 11620)

  • Model Context Protocol filesystem access
  • Integration with AI Memory System
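
To make the Ollama piece concrete, here is a minimal query sketch in Python, assuming the default port 11434 and the qwen2.5:14b model tag described above. It illustrates the request shape against Ollama's /api/generate endpoint; it is not the project's actual tooling.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's local inference endpoint


def ask_local_llm(prompt: str, model: str = "qwen2.5:14b") -> str:
    """Send a prompt to the local Ollama server and return the complete response text."""
    payload = json.dumps({
        "model": model,    # model tag as it appears in `ollama list`
        "prompt": prompt,
        "stream": False,   # one complete response instead of a token stream
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]


if __name__ == "__main__":
    # Everything runs against localhost; no data leaves the machine.
    print(ask_local_llm("Summarize the benefits of local-first AI in two sentences."))
```

OpenWebUI (port 3000) talks to this same local endpoint behind its web interface; every hop stays on the machine.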

Management:

  • Auto-starts via macOS LaunchAgents (survives reboots)
  • Managed with custom rtai command
  • Service monitoring and restart capabilities

Technical Highlights

Model Selection: Qwen 2.5:14B

Why Qwen 2.5:14B?

  • 14 billion parameters balance quality and speed
  • Multilingual, with strong instruction following
  • Runs efficiently on Apple Silicon
  • Comparable to GPT-3.5 for many tasks
  • Open weights, freely available (a quick availability check is sketched below)
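
As a quick sanity check that the model really lives on local disk, here is a small sketch against Ollama's /api/tags endpoint, which lists locally stored models. The assertion assumes the tag produced by a standard `ollama pull qwen2.5:14b`.

```python
import json
import urllib.request


def installed_models(host: str = "http://localhost:11434") -> list[str]:
    """Return the model tags Ollama has stored on local disk."""
    with urllib.request.urlopen(f"{host}/api/tags") as response:
        data = json.loads(response.read().decode("utf-8"))
    return [entry["name"] for entry in data.get("models", [])]


if __name__ == "__main__":
    models = installed_models()
    print("Locally available models:", models)
    # Assumes the tag produced by `ollama pull qwen2.5:14b`.
    assert any(name.startswith("qwen2.5:14b") for name in models), "qwen2.5:14b not pulled yet"
```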

RAG Knowledge Base Integration

Continuous Sync:

  • A watcher script monitors the /articles folder
  • Automatically uploads new or changed files to the OpenWebUI knowledge base (a minimal watcher is sketched below)
  • Enables RAG-powered responses about my writing and projects

Result: Local AI that knows my work without manual context-loading.
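
A minimal sketch of what such a watcher can look like, assuming a simple polling loop over Markdown files. The folder path is illustrative, and the upload step is left as a labeled placeholder because the exact OpenWebUI knowledge-base route and auth token vary by version; this is not the project's actual script.

```python
import time
from pathlib import Path

ARTICLES_DIR = Path.home() / "articles"  # illustrative location of the watched folder
POLL_SECONDS = 30                        # how often to look for new or changed files


def upload_to_openwebui(path: Path) -> None:
    """Placeholder: push one file into the OpenWebUI knowledge base.

    The real endpoint and auth token depend on your OpenWebUI version;
    check the API docs served by your instance (port 3000 here) and fill this in.
    """
    print(f"Would upload {path.name} to the knowledge base")


def watch(folder: Path) -> None:
    """Poll the folder and re-upload any file whose modification time changed."""
    seen: dict[Path, float] = {}
    while True:
        for path in folder.glob("**/*.md"):
            mtime = path.stat().st_mtime
            if seen.get(path) != mtime:  # new file or changed content
                upload_to_openwebui(path)
                seen[path] = mtime
        time.sleep(POLL_SECONDS)


if __name__ == "__main__":
    watch(ARTICLES_DIR)
```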

Auto-Start System

macOS LaunchAgents:

  • Services start automatically on boot (a plist sketch follows below)
  • Run in background (no manual intervention)
  • Managed with custom rtai command

Reliability: The system runs 24/7 and is always available when needed.
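
For illustration, here is a hedged sketch of a LaunchAgent definition written with Python's stdlib plistlib. The label, binary path, and log paths are assumptions for a typical Apple Silicon install; they are not the exact plists or the rtai tooling this setup uses.

```python
import plistlib
from pathlib import Path

# Illustrative values; adjust the label and binary path for your machine.
LABEL = "local.ollama.serve"
OLLAMA_BIN = "/opt/homebrew/bin/ollama"  # typical Homebrew path on Apple Silicon
AGENT_PATH = Path.home() / "Library" / "LaunchAgents" / f"{LABEL}.plist"

agent = {
    "Label": LABEL,
    "ProgramArguments": [OLLAMA_BIN, "serve"],  # command launchd runs: the Ollama server
    "RunAtLoad": True,                          # start when the agent is loaded at login
    "KeepAlive": True,                          # restart the service if it ever exits
    "StandardOutPath": "/tmp/ollama.out.log",
    "StandardErrorPath": "/tmp/ollama.err.log",
}

AGENT_PATH.parent.mkdir(parents=True, exist_ok=True)
with AGENT_PATH.open("wb") as handle:
    plistlib.dump(agent, handle)

print(f"Wrote {AGENT_PATH}; activate it with: launchctl load {AGENT_PATH}")
```

KeepAlive is what makes the service self-healing: launchd restarts it if it exits, and RunAtLoad brings it back up at every login.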


Integration with AI Memory System

The Local LLM Setup integrates seamlessly with the AI Memory System:

Shared Context:

  • The local LLM reads memory.jsonl for project context
  • Writes new memories back to the ledger
  • Uses the same structured format as Claude/ChatGPT (see the sketch after this list)
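
A minimal sketch of reading and appending ledger entries, assuming one JSON object per line. The field names (timestamp, source, content) and the ledger path are illustrative; the real schema is defined by the AI Memory System.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

MEMORY_PATH = Path.home() / "memory.jsonl"  # illustrative location of the shared ledger


def read_memories(path: Path = MEMORY_PATH) -> list[dict]:
    """Load every memory entry; the ledger holds one JSON object per line."""
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines if line.strip()]


def append_memory(content: str, source: str = "local-llm", path: Path = MEMORY_PATH) -> None:
    """Append one record; the field names here are illustrative, not the real schema."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "content": content,
    }
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    append_memory("Local LLM answered a question about the RAG watcher.")
    print(f"{len(read_memories())} memories in the ledger")
```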

Workflow:

  1. Start conversation in Claude (cloud AI)
  2. Switch to local LLM for sensitive work
  3. All context preserved via memory.jsonl
  4. Continue conversation offline (a sketch of this handoff follows below)

Privacy Benefit: Sensitive discussions stay local while leveraging cross-platform context.
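
Putting steps 2-4 together, here is a hedged sketch of continuing a conversation offline: recent ledger entries become a system message for Ollama's /api/chat endpoint. The ledger path and field names carry over the same assumptions as the previous sketch.

```python
import json
import urllib.request
from pathlib import Path

MEMORY_PATH = Path.home() / "memory.jsonl"           # illustrative ledger location
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # local chat endpoint


def recent_context(limit: int = 20) -> str:
    """Turn the last few ledger entries into a plain-text context block."""
    if not MEMORY_PATH.exists():
        return "(no prior context)"
    lines = MEMORY_PATH.read_text(encoding="utf-8").splitlines()[-limit:]
    entries = [json.loads(line) for line in lines if line.strip()]
    return "\n".join(f"- {entry.get('content', '')}" for entry in entries)


def continue_offline(user_message: str, model: str = "qwen2.5:14b") -> str:
    """Send the ledger context plus a new message to the local model."""
    payload = json.dumps({
        "model": model,
        "stream": False,
        "messages": [
            {"role": "system", "content": "Prior context from the memory ledger:\n" + recent_context()},
            {"role": "user", "content": user_message},
        ],
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_CHAT_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["message"]["content"]


if __name__ == "__main__":
    print(continue_offline("Pick up where the cloud conversation left off on the RAG watcher."))
```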


The Results

Data Sovereignty:

  • 100% of conversations stay on my machine
  • No third-party access to my data
  • Complete control over what gets stored

Offline Capability:

  • AI works without an internet connection
  • Always available during travel, outages, and privacy-sensitive work

Cost Savings:

  • $0/month ongoing costs (vs $20-80/month for cloud AI)
  • One-time hardware investment
  • No subscription lock-in

Performance:

  • Fast enough for interactive use
  • Quality comparable to GPT-3.5 for most use cases
  • Handles complex tasks (code, analysis, writing)

Integration:

  • Works with AI Memory System for cross-platform context
  • RAG integration for my writing and projects
  • MCP filesystem access for file operations

Why This Matters

This setup implements two pillars from my 7 Pillars framework:

Pillar 4 (Knowledge Stewardship):

"Systems that don't require permission. Knowledge management infrastructure you control."

Pillar 5 (Communication Independence):

"Own your channels. AI built to work for you, not on you. Data sovereignty."

Philosophy: Building systems that reduce dependency on fragile centralized infrastructure while still living in the world as it is.

For AI Implementation: Hands-On Infrastructure

This project demonstrates practical AI infrastructure skills:

  • Self-hosted model deployment
  • Service orchestration and management
  • RAG (Retrieval-Augmented Generation) integration
  • Auto-start and reliability engineering
  • Local-first architecture design
  • MCP integration for filesystem access
  • Performance optimization for Apple Silicon

Real-world application: Not just theory, but a production system running 24/7 for real work.


Lessons Learned

Start with Quality Models: Qwen 2.5:14B hits the sweet spot for local inference. Smaller models (7B) feel limiting; larger models (70B+) are slow on consumer hardware.

Auto-Start is Essential: Manual service management gets old fast. LaunchAgents make the system reliable and maintainable.

RAG Adds Huge Value: Local knowledge base integration transforms generic AI into personalized assistant that knows your work.

Local ≠ Weak: With the right model and hardware, local LLMs can match cloud AI for many tasks. Not every conversation needs GPT-4.

Privacy is Freedom: Once you experience AI that doesn't phone home, it's hard to go back. The peace of mind is worth the setup effort.

Cloud Still Has a Place: Local LLMs complement cloud AI; they don't replace it. I use both, depending on the task and privacy needs.


The Bigger Picture

The Local LLM Setup is part of my broader OfflineAI infrastructure, which includes:

  • AI Memory System: Cross-platform context management (JSONL ledger)
  • Local LLM: Self-hosted inference (this project)
  • RAG Knowledge Base: Continuous sync for my writing
  • MCP Integration: Filesystem access for AI tools

Together, these projects form a local-first AI workflow that prioritizes:

  • Privacy
  • Sovereignty
  • Resilience
  • Integration
  • Practical utility

Goal: AI that works for me, not on me, with data where it belongs—on my machine.

