Local LLM Setup: Self-Hosted AI Infrastructure
Project Type: Personal Infrastructure | AI Implementation
Status: Production (2023-Present)
Tech Stack: Ollama, Qwen 2.5:14B, OpenWebUI, macOS LaunchAgents, RAG
The Problem: Dependency and Privacy
If you use AI daily, you're dependent on someone else's infrastructure.
The reality of cloud AI:
- Privacy concerns: Every conversation sent to cloud servers
- Subscription dependency: Pay monthly or lose access
- Internet requirement: No connection = no AI
- Data sovereignty: Someone else controls your data
- Service changes: Features removed, prices raised, terms changed without notice
- Platform risk: What if the service shuts down?
For casual use, this is fine. For building systems, writing daily, or working with sensitive information—it's a problem.
The question: What if AI could run on your machine, with your data, under your control?
The Solution: Self-Hosted AI Infrastructure
The Local LLM Setup is self-hosted AI infrastructure running on a Mac Mini, providing private, offline AI capabilities.
Architecture
Hardware:
- Mac Mini (M-series Apple Silicon for efficient inference)
- Local storage for models and data
- No cloud connectivity required
Core Components:
1. Ollama (Port 11434)
- Local model inference engine
- Primary model: Qwen 2.5:14B (14 billion parameters)
- Models stored locally, no cloud calls
2. OpenWebUI (Port 3000)
- ChatGPT-like web interface
- Clean, familiar UX for AI conversations
- Knowledge base integration (RAG)
3. MCP Bridge (Port 11620)
- Model Context Protocol filesystem access
- Integration with AI Memory System
Management:
- Auto-starts via macOS LaunchAgents (survives reboots)
- Managed with custom rtai command
- Service monitoring and restart capabilities
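For a sense of how simple the core loop is, here's a minimal sketch of calling the Ollama endpoint directly from Python. The port and model tag match the setup above; the prompt and timeout are just placeholders:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def ask_local(prompt: str) -> str:
    """Send a prompt to the local Ollama instance and return the reply."""
    resp = requests.post(
        OLLAMA_URL,
        json={
            "model": "qwen2.5:14b",   # the primary model in this setup
            "prompt": prompt,
            "stream": False,          # return a single JSON object instead of a stream
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    print(ask_local("Summarize the benefits of local inference in two sentences."))
```

Because everything goes through localhost:11434, nothing in this exchange ever leaves the machine.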
Technical Highlights
Model Selection: Qwen 2.5:14B
Why Qwen 2.5:14B?
- At 14B parameters, it balances quality and speed
- Multilingual, strong instruction following
- Runs efficiently on Apple Silicon
- Comparable to GPT-3.5 for many tasks
- Open weights, freely available
RAG Knowledge Base Integration
Continuous Sync:
- Watcher script monitors the /articles folder
- Automatically uploads new/changed files to OpenWebUI knowledge base
- Enables RAG-powered responses about my writing and projects
Result: Local AI that knows my work without manual context-loading.
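Here's a rough sketch of how such a sync loop can look in Python. The OpenWebUI base URL, API token, knowledge-base ID, watched path, and endpoint routes are illustrative assumptions and will vary by installation and version:

```python
import time
from pathlib import Path
import requests

ARTICLES_DIR = Path("/articles")         # placeholder: the watched articles folder
OPENWEBUI = "http://localhost:3000"      # OpenWebUI from the setup above
TOKEN = "YOUR_API_TOKEN"                 # placeholder: OpenWebUI API key
KNOWLEDGE_ID = "YOUR_KNOWLEDGE_BASE_ID"  # placeholder: target knowledge base
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def upload(path: Path) -> None:
    """Upload one file, then attach it to the knowledge base (endpoints assumed)."""
    with path.open("rb") as f:
        r = requests.post(f"{OPENWEBUI}/api/v1/files/", headers=HEADERS,
                          files={"file": (path.name, f)})
    r.raise_for_status()
    file_id = r.json()["id"]
    requests.post(f"{OPENWEBUI}/api/v1/knowledge/{KNOWLEDGE_ID}/file/add",
                  headers=HEADERS, json={"file_id": file_id}).raise_for_status()

seen: dict[Path, float] = {}
while True:
    for md in ARTICLES_DIR.glob("*.md"):
        mtime = md.stat().st_mtime
        if seen.get(md) != mtime:        # new or changed file since last pass
            upload(md)
            seen[md] = mtime
    time.sleep(30)                       # poll every 30 seconds
```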
Auto-Start System
macOS LaunchAgents:
- Services start automatically on boot
- Run in background (no manual intervention)
- Managed with custom rtai command
Reliability: System runs 24/7, always available when needed.
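A LaunchAgent is just a plist in ~/Library/LaunchAgents. The sketch below generates and loads one for Ollama; the label and binary path are illustrative (the real setup uses one agent per service):

```python
import plistlib
import subprocess
from pathlib import Path

LABEL = "com.example.ollama"  # hypothetical reverse-DNS label
agent = {
    "Label": LABEL,
    "ProgramArguments": ["/opt/homebrew/bin/ollama", "serve"],  # default Homebrew path
    "RunAtLoad": True,   # start on login/boot
    "KeepAlive": True,   # restart the process if it dies
}

plist_path = Path.home() / "Library/LaunchAgents" / f"{LABEL}.plist"
plist_path.parent.mkdir(parents=True, exist_ok=True)
with plist_path.open("wb") as f:
    plistlib.dump(agent, f)

subprocess.run(["launchctl", "load", str(plist_path)], check=True)
```

KeepAlive is what makes the services survive crashes as well as reboots.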
Integration with AI Memory System
The Local LLM Setup integrates seamlessly with the AI Memory System:
Shared Context:
- Local LLM reads memory.jsonl for project context
- Writes new memories back to ledger
- Same structured format as Claude/ChatGPT
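As a rough sketch of what that sharing looks like in code (the field names here are placeholders; the actual schema is defined in the AI Memory System):

```python
import json
from datetime import datetime, timezone
from pathlib import Path

LEDGER = Path("memory.jsonl")  # shared ledger from the AI Memory System

def load_context(project: str) -> list[dict]:
    """Read ledger entries for a project to prime the local model's context."""
    entries = []
    with LEDGER.open() as f:
        for line in f:
            entry = json.loads(line)
            if entry.get("project") == project:  # "project" is an assumed field name
                entries.append(entry)
    return entries

def append_memory(project: str, text: str) -> None:
    """Append a new memory so cloud and local assistants see the same context."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "project": project,
        "text": text,
        "source": "local-llm",
    }
    with LEDGER.open("a") as f:
        f.write(json.dumps(record) + "\n")
```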
Workflow:
1. Start a conversation in Claude (cloud AI)
2. Switch to the local LLM for sensitive work
3. All context is preserved via memory.jsonl
4. Continue the conversation offline
Privacy Benefit: Sensitive discussions stay local while leveraging cross-platform context.
The Results
Data Sovereignty:
- 100% of conversations stay on my machine
- No third-party access to my data
- Complete control over what gets stored
Offline Capability:
- AI works without internet connection
- Travel, outages, privacy-sensitive work—always available
Cost Savings:
- $0/month ongoing costs (vs $20-80/month for cloud AI)
- One-time hardware investment
- No subscription lock-in
Performance:
- Fast enough for interactive use
- Quality comparable to GPT-3.5 for most use cases
- Handles complex tasks (code, analysis, writing)
Integration:
- Works with AI Memory System for cross-platform context
- RAG integration for my writing and projects
- MCP filesystem access for file operations
Why This Matters
This setup implements two pillars from my 7 Pillars framework:
Pillar 4 (Knowledge Stewardship):
"Systems that don't require permission. Knowledge management infrastructure you control."
Pillar 5 (Communication Independence):
"Own your channels. AI built to work for you, not on you. Data sovereignty."
Philosophy: Building systems that reduce dependency on fragile centralized infrastructure while still living in the world as it is.
For AI Implementation: Hands-On Infrastructure
This project demonstrates practical AI infrastructure skills:
- Self-hosted model deployment
- Service orchestration and management
- RAG (Retrieval-Augmented Generation) integration
- Auto-start and reliability engineering
- Local-first architecture design
- MCP integration for filesystem access
- Performance optimization for Apple Silicon
Real-world application: Not just theory—a production system running 24/7 for real work.
Lessons Learned
Start with Quality Models: Qwen 2.5:14B hits the sweet spot for local inference. Smaller models (7B) feel limiting; larger models (70B+) are slow on consumer hardware.
Auto-Start is Essential: Manual service management gets old fast. LaunchAgents make the system reliable and maintainable.
RAG Adds Huge Value: Local knowledge base integration transforms generic AI into personalized assistant that knows your work.
Local ≠ Weak: With the right model and hardware, local LLMs can match cloud AI for many tasks. Not every conversation needs GPT-4.
Privacy is Freedom: Once you experience AI that doesn't phone home, it's hard to go back. The peace of mind is worth the setup effort.
Cloud Still Has a Place: Local LLMs complement cloud AI; they don't replace it. I use both, depending on the task and privacy needs.
The Bigger Picture
The Local LLM Setup is part of my broader OfflineAI infrastructure, which includes:
- AI Memory System: Cross-platform context management (JSONL ledger)
- Local LLM: Self-hosted inference (this project)
- RAG Knowledge Base: Continuous sync for my writing
- MCP Integration: Filesystem access for AI tools
Together, these projects form a local-first AI workflow that prioritizes:
- Privacy
- Sovereignty
- Resilience
- Integration
- Practical utility
Goal: AI that works for me, not on me, with data where it belongs—on my machine.
Related Projects:
- AI Memory System - Cross-platform knowledge management
- NeighborhoodShare - Tool-sharing with AI categorization
Writing:
- Resilient Tomorrow - Community resilience and parallel systems
- Velocity Partners - Fractional PMO with AI-augmented workflows
Get in Touch:
Interested in self-hosted AI infrastructure for your team? Let's talk.
No spam, no sharing with third parties. Only you and me.