Our offices

  • Exceev Consulting
    61 Rue de Lyon
    75012, Paris, France
  • Exceev Technology
    332 Bd Brahim Roudani
    20330, Casablanca, Morocco

Self-Hosting Our Infrastructure: The Observability, Security, and Deployment Stack

Infrastructure & DevOps

We self-host everything. Not out of ideology — out of economics and control.

Managed services are great until your monitoring bill exceeds your compute bill, or you need to inspect a query plan on a database you don't own, or a provider quietly deprecates the feature your pipeline depends on. We've hit all three. So we run our own infrastructure, and for a lean consultancy the math works out heavily in our favor.

This post walks through how we actually operate it — organized not by tool, but by the problem each layer solves. If you read Part 1 on our framework stack, this is what sits underneath it.

The Stack at a Glance

  • Observability: Grafana, Prometheus, Loki, k6, InfluxDB, Beszel (metrics, logs, alerting, load testing, host monitoring)
  • Data: PostgreSQL, MongoDB, Redis (relational, document, ephemeral state)
  • Deployment: Docker, Coolify (containers, CI/CD, zero-downtime deploys)
  • AI inference: Ollama (local LLM serving and model evaluation)
  • Networking & secrets: Tailscale, Cloudflared, Infisical (mesh VPN, tunnels, centralized secrets)
  • Backup: Duplicati (encrypted off-site backups, monthly restore tests)

Total cost: under 200 EUR/month. Every tool is open-source.

Seeing What's Happening

You can't operate what you can't see. We solved this first, and it's still the most important layer.

Prometheus scrapes metrics from every service — APIs, databases, background workers, the deployment platform itself — on a 15-second interval. Thirty days of high-resolution data, downsampled after that. Loki handles logs: structured output from every container flows in via Promtail sidecars, indexed by label (service, environment, severity, trace ID) without indexing the full log body. That keeps storage an order of magnitude cheaper than Elasticsearch while still letting us answer "what happened at 3:47 AM on the payments service?" in seconds.
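As a concrete sketch of what that structured output can look like, a service might emit one JSON object per line for Promtail to pick up. The field names and the `payments` logger here are illustrative, not our exact schema:

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line so Promtail can parse fields as labels."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "severity": record.levelname,
            "service": getattr(record, "service", "unknown"),
            "trace_id": getattr(record, "trace_id", None),
            "message": record.getMessage(),
        })


logger = logging.getLogger("payments")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Extra fields become top-level JSON keys Promtail can promote to labels
logger.info("charge created", extra={"service": "payments", "trace_id": "abc123"})
```

Keeping the labels low-cardinality (service, environment, severity) is what keeps Loki's index small; the free-form message stays unindexed.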

Grafana ties it all together. It's where metrics, logs, and performance test results from InfluxDB land in unified dashboards. It's also our alerting system — P95 latency above 500ms, error rate above 1%, disk at 80%, certificate expiring within a week. When something goes wrong, Grafana is usually the first to know.
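Expressed as Prometheus alert expressions, those thresholds look roughly like this. The metric names (`http_request_duration_seconds_bucket` and friends) are the common Prometheus and node_exporter conventions, not necessarily the exact series we expose:

```promql
# P95 latency above 500ms over the last 5 minutes
histogram_quantile(0.95,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)) > 0.5

# Error rate above 1%
sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
  / sum(rate(http_requests_total[5m])) by (service) > 0.01

# Disk at 80%
(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) > 0.8
```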

Before releases, we run k6 load tests against staging — realistic traffic patterns, auth flows, concurrent form submissions — and stream the results into InfluxDB. Grafana dashboards then show P50/P95/P99 latency trends across releases. "Does it work?" is a different question from "does it work under load?" Managed APM tools can answer the second question too, at roughly 10x the price.
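For illustration, those headline percentiles can be computed with the nearest-rank method (k6 itself interpolates slightly differently; the latency samples below are made up):

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile over a list of samples."""
    ranked = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[k]


# Hypothetical response times in ms from one load-test run
latencies = [120, 85, 430, 95, 110, 250, 101, 99, 640, 105]
for p in (50, 95, 99):
    print(f"P{p}: {percentile(latencies, p)} ms")
```

The spread between P50 and P99 is the interesting signal: a healthy median with a runaway tail usually points at lock contention or connection-pool exhaustion, not raw capacity.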

One gap Prometheus doesn't cover well: bare-metal visibility. CPU steal time, disk I/O saturation, memory pressure, per-interface network throughput. Beszel fills that with lightweight agents on every host, feeding into the same Grafana dashboards. It gives us the hardware perspective that container metrics miss.

The whole observability stack runs on a single 4-core, 8 GB VM. About 1.5 GB of RAM, negligible CPU outside query time.

Three Databases, No Overlap

We run PostgreSQL, MongoDB, and Redis. Each has a job, and they don't share.

PostgreSQL gets everything relational and transactional. User accounts, billing, project metadata, audit logs — anything that needs ACID, foreign keys, or complex joins. We're on PostgreSQL 16 with logical replication to a standby and PgBouncer in front to keep connection counts under control.

MongoDB handles document-shaped data where schema flexibility matters more than relational integrity. Form definitions in FormAI are deeply nested JSON structures that change shape constantly as we iterate. MongoDB lets us evolve those without migration scripts. We also use it for event sourcing where append-only document collections fit the domain naturally.

Redis isn't really a database for us — it's an operational primitive. Sessions, rate limiting, caching, pub/sub for real-time features, job queues via BullMQ. All ephemeral. If Redis vanished, no permanent data would be lost; things would just get slower until it came back.

Yes, three engines is more complex than one. But each one stays in its strength zone, and we avoid the antipattern of forcing relational queries into a document store or cramming cache data into a transactional database.

Shipping Code Without the Platform Tax

Everything ships as a Docker container. No exceptions. Reproducible builds, dev/prod parity, runs anywhere.

For deployment we use Coolify — self-hosted, sits on a $20/month VM, and manages every service we operate. Push to main, Coolify builds the image, runs health checks, does a zero-downtime rolling update. Staging, production, and preview environments are first-class. Traefik handles SSL, routing, and load balancing under the hood. CPU and memory limits per service are visible in a dashboard next to build logs and deployment history.

The platform tax is real and it adds up fast. A service that costs $7/month in raw compute runs $25-50/month on Heroku or Railway once you factor in egress, build minutes, and per-seat pricing. Multiply across 15-20 services and the gap is significant.
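Plugging the post's own figures into a quick back-of-the-envelope (using the lower end of the 15-20 service range, and ignoring the maintenance hours accounted for below):

```python
services = 15                       # lower end of our service count
self_hosted = 7 * services          # ~$7/month raw compute per service
managed_low = 25 * services         # $25-50/month on a managed platform
managed_high = 50 * services

print(f"self-hosted: ${self_hosted}/mo")
print(f"managed:     ${managed_low}-${managed_high}/mo")
print(f"annual gap:  ${(managed_low - self_hosted) * 12}"
      f"-${(managed_high - self_hosted) * 12}")
```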

The tradeoff: we maintain Coolify ourselves. Updates, config database backups, occasional debugging when a build fails in a weird way. Roughly two hours a month. A fraction of what the platform fee savings cover.

Running Models Locally

Ollama gives us local LLM inference. No API calls, no per-token billing, no data leaving our network.

We run it on a dedicated machine with a modest GPU, serving quantized open models for internal use — code review suggestions, documentation drafts, test data generation, meeting summaries. Not every task needs a frontier model. A 7B-parameter model handles most of this perfectly well, and it responds faster than any API endpoint because there's no network round-trip.
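Talking to Ollama is a plain HTTP POST to its default port. A sketch against the `/api/generate` endpoint, using only the standard library (the model name is just an example of a quantized model you might have pulled):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port


def build_request(model: str, prompt: str) -> dict:
    # stream=False asks for one JSON response instead of a chunk stream
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str) -> str:
    """POST to the local Ollama server and return the generated text."""
    data = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# generate("llama3.1:8b", "Draft release notes from this changelog: ...")
# requires a running Ollama server, so it is left commented out here.
```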

It also doubles as our evaluation platform. When we're assessing open-weight models for a client project, we pull them locally, run benchmarks, compare results — no cloud GPU provisioning, no API key juggling. That speed matters when a client asks "which model should we use?" and wants an answer this week, not next month.

The Security Perimeter

When you self-host, there's no cloud provider abstracting away the network for you. You build the perimeter yourself, or you don't have one.

Tailscale is our mesh VPN. Every server, developer laptop, and CI runner joins the network. Internal services — databases, admin panels, monitoring dashboards — live exclusively on the Tailscale mesh. No public IPs, no open ports on the public internet. WireGuard-based tunnels are fast enough that the latency overhead is imperceptible, and identity-based access controls let us set granular permissions per person and per service.
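Those per-person, per-service permissions live in Tailscale's ACL policy file. A fragment in that shape (the group and tag names are invented for illustration, not our actual policy) might read:

```json
{
  "groups": {
    "group:infra": ["alice@example.com"],
    "group:dev":   ["bob@example.com"]
  },
  "tagOwners": {
    "tag:db":         ["group:infra"],
    "tag:monitoring": ["group:infra"]
  },
  "acls": [
    {"action": "accept", "src": ["group:infra"], "dst": ["tag:db:5432"]},
    {"action": "accept", "src": ["group:dev"],   "dst": ["tag:monitoring:3000"]}
  ]
}
```

Anything not explicitly accepted is denied, which is the property that makes the mesh a perimeter rather than just a VPN.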

Cloudflared works the other direction: exposing specific services to the public internet without opening firewall ports. Our public APIs and websites connect outbound to Cloudflare through encrypted tunnels. From the outside, traffic routes through Cloudflare's CDN and DDoS protection. From the inside, the server initiates the connection. No inbound ports. That kills an entire class of attack surface.
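A cloudflared config in this setup follows the standard ingress shape; the hostnames, ports, and paths below are placeholders:

```yaml
tunnel: <tunnel-id>
credentials-file: /etc/cloudflared/<tunnel-id>.json

ingress:
  - hostname: api.example.com
    service: http://localhost:3000
  - hostname: www.example.com
    service: http://localhost:8080
  - service: http_status:404   # catch-all: anything unmatched gets a 404
```

Rules are evaluated top to bottom, and the final catch-all is required, so nothing is exposed by accident.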

Infisical holds the secrets. API keys, database credentials, third-party tokens, encryption keys — all centralized instead of scattered across .env files and CI variable stores. Services pull secrets at runtime via Infisical's agent, injected as environment variables. Rotation, audit logs, per-environment scoping — all built in.
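From the application's point of view, consuming those secrets is just reading environment variables the agent injected, ideally with a fail-fast guard at startup. A sketch (the variable name and demo value are illustrative, not our actual keys):

```python
import os
import sys


def require_secret(name: str) -> str:
    """Read a secret injected as an environment variable; fail fast at
    startup instead of deep inside a request handler."""
    value = os.environ.get(name)
    if not value:
        sys.exit(f"missing required secret: {name}")
    return value


# Demo value so the sketch runs standalone; the agent sets the real one
os.environ.setdefault("DATABASE_URL", "postgres://app@db/appdb")
DATABASE_URL = require_secret("DATABASE_URL")
```

The point of the guard is operational: a service missing a credential should die loudly at deploy time, where Coolify's health check catches it, not at 3 AM on the first request that needs it.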

Together these three create a zero-trust posture: every connection authenticated and encrypted, secrets centralized with audit trails, public attack surface limited to Cloudflare-proxied endpoints. It's the same security model enterprises spend six figures implementing with commercial vendors. We get it with three open-source projects and zero licensing fees.

When Things Break

Self-hosting means you own the failure modes. There's no "call support" button. Your backup strategy is your insurance policy, and if you've never tested a restore, you don't have a backup — you have a hope.

Duplicati runs our backup pipeline. Scheduled backups of everything that matters: PostgreSQL logical dumps, MongoDB exports, config files, secret vault exports, Grafana dashboard definitions. Everything is encrypted client-side before being pushed to an off-site S3-compatible object store on a different provider than our primary infrastructure.

The schedule: databases every 6 hours (PostgreSQL via pg_dump, MongoDB via mongodump, both scripted to verify integrity before push). Config and secrets daily. Full VM snapshots weekly, retained for 30 days.
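The integrity check before push can be as cheap as validating the dump's magic bytes. This sketch assumes pg_dump's custom format, whose files begin with `PGDMP`; the database name and paths are examples:

```python
import subprocess
from pathlib import Path


def dump_command(db: str, out: Path) -> list[str]:
    """pg_dump in custom format: compressed, restorable with pg_restore."""
    return ["pg_dump", "--format=custom", f"--file={out}", db]


def looks_valid(dump_file: Path) -> bool:
    """Custom-format dumps start with the magic bytes 'PGDMP'; a cheap
    sanity check before the file is pushed off-site."""
    with open(dump_file, "rb") as f:
        return f.read(5) == b"PGDMP"


# subprocess.run(dump_command("appdb", Path("/backups/appdb.dump")), check=True)
# requires a reachable PostgreSQL server, so it is left commented out here.
```

A magic-byte check catches truncated or zero-length dumps; it is not a substitute for the monthly full restore described below.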

We test restores every month. Spin up a clean VM, pull the latest backup set, restore databases, deploy services through Coolify, run the smoke test suite. Under 45 minutes end to end. The day we skip that test is the day we stop trusting our backups.

The Real Numbers

Here's what we actually pay each month:

  • Primary compute (8 vCPU, 32 GB RAM): ~90 EUR
  • Observability VM (4 vCPU, 8 GB RAM): ~25 EUR
  • Off-site backup storage (500 GB): ~10 EUR
  • Domain and DNS: ~5 EUR
  • Tailscale (free tier covers our team size): 0 EUR
  • Cloudflare (free tier + tunnels): 0 EUR
  • Total: ~130 EUR/month

Every tool in this stack is open-source. The only costs are compute, storage, and bandwidth. The managed equivalent — Grafana Cloud, managed databases, Heroku or Railway, a commercial secrets manager, a cloud APM tool — would run over 800 EUR/month for the same workload.

The tradeoff is time. We spend 4-6 hours a month on maintenance: updates, monitoring review, backup verification, occasional debugging. For our team size, that math works.


This post covered how we keep the infrastructure running. Part 3 covers the business layer — CRM, scheduling, email, content publishing, and the automation that ties them together.

If you're thinking about self-hosting and want to know whether it makes sense for your team size and workload, we've been through it and are happy to share what we learned — including the parts that didn't go smoothly.

We should talk.

Exceev works with startups and SMEs on consulting, open-source tooling, and production-ready software.

