Engineering principles
The decisions that shaped the stack, and that guide every future choice.
Open by default
Prefer open-source over SaaS where reliability is comparable. You can always self-host the full stack.
Data sovereignty first
Your agent conversations, documents, and tenant data never leave the database without your explicit action.
Model agnosticism
LLM providers are swappable. Groq, Gemini, Anthropic, or local Ollama: change a config line, not code.
Multi-tenant by design
Tenant isolation at the database schema level (not just a WHERE clause), so data leakage is structurally impossible.
Lean runtime
No heavyweight frameworks. Vanilla JS frontend keeps the client bundle under 50 kB with zero build tooling.
Auditable logs
Every agent action and tool call is written to an audit log. You can reconstruct exactly what any agent did and why.
Backend & API
Python 3.11
Language
The application runtime. Chosen for its mature AI/ML ecosystem, readable code, and broad library support for every integration we need.
Flask
Web framework
Lightweight WSGI framework handling all HTTP routing, blueprints, and session management. Jinja2 templating for server-rendered pages.
Gunicorn
WSGI server
Production WSGI server running 3 worker processes behind Nginx. Handles concurrent requests with graceful restarts on deploy.
Flask-Login
Auth sessions
Session-based authentication with secure cookie storage, login_required decorators, and remember-me persistent sessions.
Data Layer
PostgreSQL 16
Primary database
All structured data: users, tenants, agents, tasks, schedules, chat history. Multi-tenant isolation via per-tenant schemas (tenant_{id}).
pgvector
Vector search
PostgreSQL extension for storing and querying 768-dim embedding vectors. Powers Knowledge Base and Memoria semantic search without a separate vector DB. All tenants unified at 768-dim for consistent cosine similarity.
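A typical similarity query against such a table might look like the sketch below (table and column names are illustrative, not taken from the AEGIS schema); pgvector's `<=>` operator computes cosine distance, so `1 - distance` gives cosine similarity:

```sql
-- Top 5 knowledge-base chunks closest to a query embedding.
-- :query_embedding is a 768-dim vector bound by the application.
SELECT id, chunk_text,
       1 - (embedding <=> :query_embedding) AS cosine_similarity
FROM kb_chunks
ORDER BY embedding <=> :query_embedding
LIMIT 5;
```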
psycopg2
DB driver
Battle-tested synchronous PostgreSQL adapter. Used with connection pooling and RealDictCursor for ergonomic row-as-dict access patterns.
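Schema-level tenant isolation means every pooled connection must be pointed at the right schema before queries run. A minimal sketch of the pattern (the function name and validation are illustrative, not AEGIS's actual code); the returned statement would be passed to `cursor.execute()` on a psycopg2 connection checked out of the pool:

```python
def tenant_search_path(tenant_id: int) -> str:
    """Build a SET search_path statement for the tenant_{id} schema.

    Validating the id up front means the schema name can never carry
    injected SQL, even though it is interpolated into the statement.
    """
    if not isinstance(tenant_id, int) or tenant_id < 0:
        raise ValueError("tenant_id must be a non-negative integer")
    return f"SET search_path TO tenant_{tenant_id}, public"
```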
Docker
Database container
PostgreSQL runs in a Docker container on the VPS with persistent volume mounts. Isolated from the application process for clean upgrades. Enterprise self-hosted deployments use a full Docker Compose stack (pgvector, LiteLLM, AEGIS app, nginx, optional Ollama) for a complete on-premise setup in 2-3 hours.
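A Compose file for that enterprise stack might be shaped roughly like this (service names, images, and ports are assumptions for illustration, not the shipped file):

```yaml
services:
  db:
    image: pgvector/pgvector:pg16        # PostgreSQL 16 + pgvector
    volumes:
      - pgdata:/var/lib/postgresql/data  # persistent volume mount
  litellm:
    image: ghcr.io/berriai/litellm:main  # model router / proxy
    depends_on: [db]
  app:
    build: .                             # the AEGIS application
    depends_on: [db, litellm]
  nginx:
    image: nginx:stable
    ports: ["443:443"]
    depends_on: [app]
  # ollama:                              # optional, for air-gapped inference
  #   image: ollama/ollama

volumes:
  pgdata:
```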
AI & Agent Inference
OpenClaw Gateway
Agent orchestrator
Internal agent orchestration layer. Routes messages to the correct agent, loads SOUL personality profiles, and manages the tool-calling loop.
LiteLLM Proxy
Model router
Translates between API formats. Enables seamless switching between Groq, Gemini, Anthropic Claude, and local Ollama with automatic fallback chains.
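Provider switching is configuration, not code. A LiteLLM proxy config along these lines would wire up the primary and fallback models (the key names follow LiteLLM's documented schema, but the exact model identifiers and fallback syntax should be checked against your LiteLLM version):

```yaml
model_list:
  - model_name: primary-chat
    litellm_params:
      model: groq/llama-3.3-70b-versatile
      api_key: os.environ/GROQ_API_KEY
  - model_name: fallback-chat
    litellm_params:
      model: gemini/gemini-1.5-flash
      api_key: os.environ/GEMINI_API_KEY

router_settings:
  fallbacks:
    - primary-chat: [fallback-chat]
```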
Groq
Primary LLM provider
Ultra-fast inference via Llama 3.3 70B. Used as the primary model for all 7 agents: fast enough for real-time chat, powerful enough for complex reasoning.
Google Gemini
Fallback LLM
Gemini 1.5 Flash serves as automatic fallback when Groq hits rate limits. Transparent to the user: the agent response is the same quality.
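The fallback chain itself is simple in principle; a provider-agnostic sketch (the RateLimitError and the provider callables are illustrative stand-ins, not AEGIS internals):

```python
class RateLimitError(Exception):
    """Raised by a provider callable when its rate limit is hit."""

def complete_with_fallback(prompt, providers):
    """Try each (name, call) pair in order; first success wins."""
    failures = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except RateLimitError:
            failures.append(name)  # fall through to the next provider
    raise RuntimeError(f"all providers rate-limited: {failures}")
```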
Ollama
Local inference
Runs local models (Qwen 2.5 Coder, nomic-embed-text) for embedding generation and code tasks. Zero egress cost. Fully air-gapped option for sensitive deployments.
text-embedding-3-small
Embedding model
1536-dim embeddings for Knowledge Base ingestion and semantic search. Can be replaced with nomic-embed-text via Ollama for a fully local setup.
Integrations & Channels
Telegram Bot API
Primary notification channel
Agents deliver results, workflow completions, and Boardroom syntheses directly to your Telegram. Long-polling keeps it free of webhooks and public endpoints.
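Long-polling hinges on offset bookkeeping: per the Bot API contract, passing the highest seen update_id + 1 to the next getUpdates call acknowledges everything before it. A sketch of that loop logic with the network call injected (the fetch callable is an illustrative stand-in for the actual API request):

```python
def next_offset(updates, current):
    """Advance past every update we have seen; highest update_id + 1
    acknowledges all earlier updates on the next poll."""
    if not updates:
        return current
    return max(u["update_id"] for u in updates) + 1

def poll_once(fetch, offset):
    """One long-poll cycle: fetch(offset) stands in for getUpdates."""
    updates = fetch(offset)
    return updates, next_offset(updates, offset)
```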
Google OAuth 2.0
Auth & workspace
Two separate flows: Google Sign-In for authentication (PKCE flow, no client secret in browser) and Google Workspace OAuth for Calendar/Gmail agent access.
HubSpot OAuth
CRM integration
OAuth 2.0 connection to HubSpot CRM. Sales agent reads contacts and company data to enrich outreach. Tokens stored encrypted per tenant.
HeyGen
AI video generation
Luna (Creative Producer) generates personalized avatar videos from agent-written scripts. Completed videos are delivered via Telegram.
ElevenLabs
AI voice synthesis
Voice clone synthesis for audio briefings. Faster than video: agents can produce audio updates in seconds and deliver them as voice messages.
Gmail SMTP/IMAP
Email channel
Inbound email parsing and outbound delivery via Gmail. Agents can read, draft, and send on behalf of your connected inbox.
Microsoft Bot Framework
Teams connector (Enterprise)
Azure Bot resource + Bot Connector REST API for Microsoft Teams. Employees message the AEGIS bot in Teams exactly as they message each other; all processing happens server-side. Token cache is thread-safe for multi-worker deployments. Enabled via the TEAMS_ENABLED=true env var.
python3-saml
SAML 2.0 SSO (Enterprise)
OneLogin's python3-saml library implements SAML 2.0 SP-initiated SSO. Enterprise customers paste their Azure AD / Entra Federation Metadata XML once; employees then log in with existing corporate credentials. SP certificate generated with OpenSSL, stored in environment variables.
Frontend & UI
Vanilla JS + CSS
Client-side
No frameworks, no build step, no hydration latency. The entire client bundle is under 50 kB. Pages render in <200 ms even on slow connections.
CSS Custom Properties
Design system
Theming via CSS variables: light/dark mode, brand colours, spacing. Instant theme switching with zero flash via an inline anti-flash script in <head>.
Jinja2
Server-side rendering
Flask's default template engine. All pages are server-rendered HTML โ fast first paint, excellent SEO, no client-side routing complexity.
Inter (Google Fonts)
Typography
Variable-weight Inter for all UI text. Loaded asynchronously with a system-font fallback stack to prevent layout shift.
Infrastructure & DevOps
VPS (Linux)
Compute
Dedicated VPS running Ubuntu. Single-tenant capable: one AEGIS instance per customer for maximum isolation and compliance.
Nginx
Reverse proxy
Handles TLS termination, HTTP-to-HTTPS redirects, static file serving, and proxying to Gunicorn. Security headers applied at the Nginx layer.
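That proxy layer might be configured roughly like this (paths, the upstream port, and the header set are illustrative assumptions, not the production file):

```nginx
server {
    listen 443 ssl;
    server_name aegis.example.com;

    # Security headers applied at the proxy, not in the app
    add_header Strict-Transport-Security "max-age=31536000" always;
    add_header X-Content-Type-Options "nosniff" always;

    location /static/ {
        alias /srv/aegis/static/;         # Nginx serves static files directly
    }

    location / {
        proxy_pass http://127.0.0.1:8000; # Gunicorn upstream
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```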
Let's Encrypt / HTTPS
TLS
Auto-renewed TLS certificates via Certbot. HSTS enabled. All traffic is encrypted in transit; no plain-HTTP fallback in production.
systemd
Process management
Gunicorn and the scheduler run as systemd services with auto-restart on failure. Deployments are a git pull plus a service restart that completes in under 5 seconds.
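A unit file for the Gunicorn service might look like this sketch (paths and the app module are assumptions; the 3-worker count comes from the Gunicorn card above):

```ini
[Unit]
Description=AEGIS Gunicorn service
After=network.target

[Service]
WorkingDirectory=/srv/aegis
ExecStart=/srv/aegis/venv/bin/gunicorn -w 3 -b 127.0.0.1:8000 app:app
Restart=on-failure

[Install]
WantedBy=multi-user.target
```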
GitHub
Version control & deploy
All application code is versioned on GitHub. Deploys are manual git pulls, giving you full control and an audit trail of exactly what ran when.
APScheduler
Cron scheduler
In-process cron scheduler for agent task and workflow schedules. Runs in the same Python process โ no Celery, no Redis, no separate broker to manage.
Document Processing (Knowledge Base)
PyPDF2
PDF parsing
Extracts text from PDF files page-by-page. Pure Python โ no external binary dependencies.
python-docx
Word document parsing
Reads .docx files paragraph by paragraph. Supports modern OOXML format (Word 2007+).
openpyxl
Spreadsheet parsing
Reads .xlsx files sheet-by-sheet with read-only mode for memory efficiency. Extracts cell values as plain text.
python-pptx
Presentation parsing
Extracts text from .pptx slide shapes. Each slide becomes a labelled section in the Knowledge Base chunk.
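Whatever the source format, the extracted text is chunked before embedding. A minimal sliding-window sketch (the chunk size and overlap are illustrative defaults, not AEGIS's actual parameters):

```python
def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks with overlap, so sentences that
    straddle a boundary still appear intact in at least one chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```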
Skills Engine & Memoria
Skills Engine
Python singleton (skill_engine.py)
Runs versioned business procedures: 23 skills total (8 system + 15 business templates) with per-tenant overrides, execution logs, and a self-improving feedback loop. Three skills auto-inject live CRM or Memoria context before the LLM call. Auto-Learn Mode (Pro+) runs nightly: detects underperforming skills (avg rating < -0.2 over ≥3 runs), generates an improved prompt via LLM, auto-applies it, and notifies via Telegram.
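The Auto-Learn trigger condition reduces to a small predicate; a sketch using the thresholds stated above (the function itself is illustrative, not AEGIS's code):

```python
def is_underperforming(ratings: list[float],
                       threshold: float = -0.2,
                       min_runs: int = 3) -> bool:
    """True when a skill has enough runs and its average rating has
    dropped below the threshold (avg < -0.2 over >= 3 runs)."""
    if len(ratings) < min_runs:
        return False                      # not enough signal yet
    return sum(ratings) / len(ratings) < threshold
```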
Memoria (Knowledge Graph)
pgvector + D3.js force graph
Stores agent insights, vault documents, and user knowledge as typed nodes with edges. Semantic search via pgvector cosine similarity. D3 force-directed graph for exploration. Crystallization service extracts 2-3 insights per agent response automatically.
Crystallization Service
Background LLM extraction
Fire-and-forget daemon threads extract business insights from every agent response. Rate-limited to 5 extractions per 10 min per tenant. Semantic linking finds related Memoria nodes and creates 'crystallized_from' edges automatically.
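The per-tenant cap (5 extractions per 10 minutes) is a sliding-window rate limit; a minimal sketch with the clock injected for testability (class and method names are illustrative):

```python
from collections import defaultdict, deque

class ExtractionLimiter:
    """Allow at most max_calls extractions per window_s seconds, per tenant."""

    def __init__(self, max_calls: int = 5, window_s: int = 600):
        self.max_calls = max_calls
        self.window_s = window_s
        self._calls = defaultdict(deque)   # tenant_id -> call timestamps

    def allow(self, tenant_id: int, now: float) -> bool:
        q = self._calls[tenant_id]
        while q and now - q[0] >= self.window_s:
            q.popleft()                    # drop calls outside the window
        if len(q) < self.max_calls:
            q.append(now)
            return True
        return False
```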
Service Guardian
Heartbeat watchdog (guardian_service.py)
All background workers write heartbeats to PostgreSQL every loop cycle. Guardian detects stale workers, evicts zombie advisory locks via pg_terminate_backend, and sends admin Telegram alerts. Admin panel shows live service health.
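Staleness detection is a comparison of heartbeat timestamps against a cutoff; a sketch (the 120-second threshold is an illustrative assumption, and in AEGIS the timestamps would come from PostgreSQL, not a dict):

```python
def stale_workers(heartbeats: dict[str, float],
                  now: float,
                  max_age_s: float = 120.0) -> list[str]:
    """Return the workers whose last heartbeat is older than max_age_s."""
    return [name for name, last in heartbeats.items()
            if now - last > max_age_s]
```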
Trust Infrastructure
agent_runs · prompt_hash · audit_logs
Every LLM call is logged to a per-tenant
agent_runs table: agent, model, trigger type, input/output tokens, latency, cost, and success flag. Every skill run records a prompt_hash (SHA256[:16] of the template) and a schema_valid boolean, so you always know which prompt version produced which output. Audit events (skill_executed, workflow_triggered, scheduled_task_fired) are written to audit_logs per tenant. 90-day retention, cleaned nightly by the guardian. Migrations are serialised with pg_advisory_xact_lock to prevent DDL races across Gunicorn workers.
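The prompt_hash convention (SHA256[:16] of the template) is reproducible in a couple of lines:

```python
import hashlib

def prompt_hash(template: str) -> str:
    """First 16 hex chars of the SHA-256 of the prompt template,
    matching the SHA256[:16] convention described above."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:16]
```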