Our offices

  • Exceev Consulting
    61 Rue de Lyon
    75012, Paris, France
  • Exceev Technology
    332 Bd Brahim Roudani
    20330, Casablanca, Morocco


Open-Source RAG Starter Stack for Growing Companies

A lot of teams start RAG with a simple request: "Can we chat with our docs?"

Then reality shows up. The wrong policy gets retrieved. A restricted doc leaks into an answer. The model sounds confident, but the source is stale. Users try it twice and go back to Slack.

If you're building a RAG stack for your business, the goal is not to pick the fanciest framework. The goal is to ship a system that is reliable, permission-aware, and measurable.

What you'll learn

  • The minimum viable RAG architecture for a growing company
  • The quality levers that matter (and the ones that do not)
  • How to handle permissions and sensitive data
  • How to evaluate RAG so trust does not drift
  • A checklist you can use before you ship

TL;DR

An open-source RAG stack works when retrieval is treated as a product: clean ingestion, permission-aware filtering, and evaluation you run continuously. Start with a minimal pipeline (ingest -> index -> retrieve -> generate), add metadata and citations, then operationalize with monitoring and a rollback path. Most failures are retrieval and access control, not the model.

Open-source RAG stack for business: start with the job

Before you choose tools, answer two questions:

  1. Who is asking questions (roles, teams, tenants)?
  2. What is an acceptable failure mode (wrong answer, no answer, human review)?

If you are building internal knowledge search, “no answer” is often acceptable. If you are building customer support, “wrong answer” is expensive.

Write a one-paragraph product definition:

  • Workflow: who asks what
  • Allowed data sources
  • Restricted sources
  • What must be cited
  • When to escalate to a human

This definition will do more for quality than any framework choice.

Minimal viable architecture (ingest -> index -> retrieve -> generate)

A basic RAG system has five parts:

  1. Ingestion: where content comes from (docs, tickets, CRM exports)
  2. Parsing: extract text and structure
  3. Chunking: split content into retrievable units
  4. Indexing: store embeddings and metadata
  5. Retrieval + generation: fetch relevant chunks and produce an answer

You can build a starter stack with common open-source building blocks:

  • Storage: your existing database (often Postgres) for metadata
  • Vector index: a vector database or vector extension
  • Orchestration: a thin service layer (avoid over-abstracting early)

Do not over-invest in orchestration until you know what quality problems you are solving.
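The five parts above fit in a surprisingly small amount of code. Here is a minimal end-to-end sketch; the embed() function is a toy placeholder you would swap for a real embedding model, and generate() would call an LLM with the retrieved chunks as context:

```python
# Minimal RAG pipeline sketch: ingest -> index -> retrieve -> generate.
# embed() is a toy placeholder; swap in a real embedding model.

def embed(text: str) -> list[float]:
    # Character-frequency vector, for illustration only.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

index: list[dict] = []  # each entry: id, text, vector, metadata

def ingest(doc_id: str, text: str, metadata: dict) -> None:
    # Parsing and chunking would happen here; we index one chunk per doc.
    index.append({"id": doc_id, "text": text,
                  "vector": embed(text), "metadata": metadata})

def retrieve(query: str, top_k: int = 3) -> list[dict]:
    qv = embed(query)
    ranked = sorted(index, key=lambda c: cosine(qv, c["vector"]), reverse=True)
    return ranked[:top_k]

def generate(query: str) -> str:
    # A real system would prompt an LLM with the retrieved chunks.
    sources = ", ".join(c["id"] for c in retrieve(query))
    return f"Answer based on: {sources}"
```

Everything in the rest of this article (metadata, ACLs, evaluation) is layered on top of this shape.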

Picking open-source components (avoid framework addiction)

Teams often get stuck debating libraries. A better approach is to choose based on the constraints you already know:

  • Where will metadata and ACLs live? If you already run Postgres reliably, use it for metadata and treat vector search as a capability, not a separate product religion.
  • How will you ingest documents? If ingestion is fragile, everything downstream will look “like the model is bad.”
  • Can you version the index? If you can’t roll back an index build, you’ll be afraid to improve the pipeline.

The stack is less about brand names and more about whether you can operate it: deploy it, monitor it, and debug it when retrieval goes weird.

Quality levers that actually move outcomes

When users say “RAG is bad,” they usually mean one of these is broken.

1) Chunking and metadata

Chunking is not a fixed number. It is a design choice.

  • Chunk by meaning (sections, headings), not only by token count
  • Add metadata: source, owner, date, tenant, ACL
  • Store a stable document ID and version
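As a sketch of meaning-based chunking: split on headings and attach metadata to every chunk. The field names (doc_id, version, acl) are an illustrative schema, not a required one:

```python
def chunk_by_headings(text: str, doc_id: str, version: int, acl: list[str]) -> list[dict]:
    """Split on markdown-style headings so chunks follow meaning, not token count."""
    chunks = []
    title, lines = "intro", []

    def flush():
        body = "\n".join(lines).strip()
        if body:
            chunks.append({
                "doc_id": doc_id,    # stable ID for citations
                "version": version,  # lets you deduplicate old builds
                "section": title,
                "acl": acl,          # filtered at query time
                "text": body,
            })

    for line in text.splitlines():
        if line.startswith("#"):
            flush()
            title = line.lstrip("#").strip()
            lines = []
        else:
            lines.append(line)
    flush()
    return chunks
```

In practice you would also cap chunk length, but the point stands: section boundaries carry meaning that fixed-size windows destroy.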

Ingestion pitfalls that look like “LLM hallucination”

Before you tune prompts, check ingestion. These issues cause confident wrong answers:

  • the parser dropped headings and tables, so chunks lost meaning
  • old docs were indexed twice, and retrieval prefers the wrong version
  • timestamps and owners are missing, so stale policy wins retrieval

Add basic ingestion validation: count docs, diff versions, and spot-check a few parsed outputs every index build.
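That validation step can be a short function run on every build. This sketch compares doc counts and versions between builds; the 10% threshold is an illustrative default you should tune:

```python
def validate_build(prev_docs: dict, new_docs: dict, max_drop_pct: float = 10.0) -> list[str]:
    """prev_docs/new_docs map doc_id -> version. Returns human-readable warnings."""
    warnings = []
    # A sharp drop in doc count usually means a parser or crawler regression.
    if prev_docs:
        drop = 100.0 * (len(prev_docs) - len(new_docs)) / len(prev_docs)
        if drop > max_drop_pct:
            warnings.append(f"doc count dropped {drop:.0f}% -- parser regression?")
    # A version going backwards means an old doc got re-indexed over a new one.
    for doc_id, version in new_docs.items():
        old = prev_docs.get(doc_id)
        if old is not None and version < old:
            warnings.append(f"{doc_id}: version went backwards ({old} -> {version})")
    stale = set(prev_docs) - set(new_docs)
    if stale:
        warnings.append(f"{len(stale)} docs disappeared: {sorted(stale)[:3]}")
    return warnings
```

Fail the build (or at least page someone) when warnings come back non-empty.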

2) Retrieval strategy

Start simple, then add complexity when you can prove it helps.

  • Use hybrid retrieval when keyword matching matters
  • Add reranking when you have many near-matches
  • Force citations when trust matters
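Hybrid retrieval is simpler than it sounds: blend a keyword score with a vector score. This sketch uses token overlap as a stand-in for BM25 and assumes you already have vector similarities; alpha is a weight you tune on your evaluation set:

```python
def keyword_score(query: str, text: str) -> float:
    # Fraction of query tokens present in the text (BM25 stand-in).
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def hybrid_rank(query: str, chunks: list[dict], vector_scores: dict, alpha: float = 0.5) -> list[str]:
    """chunks: [{'id','text'}]; vector_scores: id -> similarity in [0,1].
    alpha weights keyword vs vector matching."""
    scored = []
    for c in chunks:
        score = (alpha * keyword_score(query, c["text"])
                 + (1 - alpha) * vector_scores.get(c["id"], 0.0))
        scored.append((score, c["id"]))
    return [cid for _, cid in sorted(scored, reverse=True)]
```

The payoff shows up on exact identifiers (error codes, SKUs, policy numbers) that embeddings often blur together.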

Debugging retrieval (a simple method)

When an answer is wrong, isolate whether retrieval or generation caused it:

  1. Inspect the top retrieved chunks (titles, dates, owners, ACL).
  2. Ask: “Were the right sources retrieved?” If no, fix retrieval: metadata, query rewriting, chunking, or filters.
  3. If yes, ask: “Did generation use the sources correctly?” If no, fix prompting/output constraints and add rubric tests for grounding.

This saves teams from endlessly swapping models when the real bug is “we retrieved the wrong document.”
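Step 1 is worth automating. A small helper that dumps what retrieval actually returned makes the right/wrong-source question answerable at a glance (field names here are illustrative):

```python
def inspect_retrieval(query: str, chunks: list[dict]) -> list[str]:
    """Render retrieved chunks' metadata as one line each for eyeballing."""
    report = [f"query: {query}"]
    for i, c in enumerate(chunks, 1):
        report.append(
            f"{i}. {c.get('title', '?')} | owner={c.get('owner', '?')} "
            f"| date={c.get('date', '?')} | acl={c.get('acl', '?')}"
        )
    return report
```

Print this next to every bad answer during triage; stale dates and wrong owners jump out immediately.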

3) Grounding and refusal

Decide what happens when retrieval is weak.

  • If confidence is low: answer with “I do not know” + show top sources
  • If sources conflict: show both and ask a clarifying question
  • If the doc is restricted: refuse and point to the right access path
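The low-confidence and restricted cases above can be made explicit as a decision function that runs before generation. The 0.25 threshold is an illustrative number to calibrate on your evaluation set; conflict detection would need an extra comparison step:

```python
def grounding_decision(chunks: list[dict], user_groups: list[str], min_score: float = 0.25):
    """chunks: [{'score','source','acl'}]. Returns (action, detail)."""
    # Restricted docs: refuse and point at the access path, never answer.
    allowed = [c for c in chunks if set(c["acl"]) & set(user_groups)]
    if not allowed:
        return ("refuse", "restricted -- request access to the source")
    # Weak retrieval: say so and show what was closest.
    best = max(c["score"] for c in allowed)
    if best < min_score:
        sources = [c["source"] for c in allowed]
        return ("no_answer", f"I do not know. Closest sources: {sources}")
    return ("answer", [c["source"] for c in allowed if c["score"] >= min_score])
```

Wiring this in before the LLM call means "I do not know" is a policy, not a prompt suggestion.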

Permissions and security (do this early)

Permissions are not a UI feature. They are part of retrieval.

Practical rules:

  • Store ACL metadata with each chunk (tenant, team, document visibility)
  • Filter retrieval by ACL at query time
  • Redact sensitive fields during ingestion (or keep them in a separate store)
  • Log access: who asked, what sources were used (without leaking content)

If you cannot guarantee permission filtering, do not ship the workflow.
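"Part of retrieval" means the ACL filter runs before ranking, so restricted chunks never reach the model at all. A sketch, with illustrative field names (tenant, acl, user_groups):

```python
def retrieve_with_acl(vector_scores: dict, chunks: list[dict],
                      tenant: str, user_groups: list[str], top_k: int = 3) -> list[dict]:
    """vector_scores: chunk_id -> similarity. Filter first, rank second."""
    visible = [
        c for c in chunks
        if c["tenant"] == tenant
        and (set(c["acl"]) & set(user_groups) or "public" in c["acl"])
    ]
    ranked = sorted(visible, key=lambda c: vector_scores.get(c["id"], 0.0), reverse=True)
    return ranked[:top_k]
```

With pgvector this is just a WHERE clause on tenant and ACL columns next to the similarity ORDER BY; the principle is the same: filtering is part of the query, not a post-processing step.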

Multi-team and multi-tenant reality (permissions are a product feature)

If multiple teams use the system, you need a permissions model early, not “we’ll add it later.”

Practical approach:

  • store tenant/team IDs on every chunk
  • treat ACL filtering as mandatory in retrieval queries
  • test permissioning with an explicit evaluation set (“User A must not see Doc B”)

This is one of the places open-source stacks can fail silently: everything “works” until someone sees something they shouldn’t.
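The "User A must not see Doc B" evaluation set is worth making executable. This sketch wraps whatever retrieval function your stack exposes; the case format is an illustrative convention:

```python
def run_permission_suite(retrieve_fn, cases: list[dict]) -> list[str]:
    """cases: [{'user_groups','query','forbidden_doc'}]. Returns failures."""
    failures = []
    for case in cases:
        results = retrieve_fn(case["query"], case["user_groups"])
        ids = {r["doc_id"] for r in results}
        if case["forbidden_doc"] in ids:
            failures.append(f"{case['user_groups']} saw {case['forbidden_doc']}")
    return failures
```

Run it in CI, and treat any failure as a blocker, the same way you would treat a failing auth test.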

Evaluation and operations (how you keep trust)

RAG systems drift. Docs change. Policies update. Embeddings change.

Treat evaluation like a unit test suite.

A minimal evaluation loop

  • Collect 20 to 50 real questions
  • For each question: define expected source docs (or doc sections)
  • Score two layers:
    • retrieval: did we fetch the right source?
    • answer: is the response correct and grounded?

Run the suite weekly and after major changes.

Fallbacks and rollback

  • Provide a search-only fallback
  • Provide human escalation for high-risk queries
  • Keep the ability to roll back an index version

Index refresh and versioning (the boring part that saves you)

Growing companies change fast: docs move, policies change, and ownership shifts. If your index refresh is ad-hoc, users will get stale answers and lose trust.

Two practical rules:

  • Define a refresh cadence per source. Some sources can refresh nightly; others weekly. Tie it to how often the underlying docs change.
  • Version index builds. If a new build makes retrieval worse, you need the ability to roll back to the previous index quickly.

This is also where “open-source stack” discipline matters: you’re responsible for operating the pipeline, not just building the demo.
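One way to get cheap rollback is an "active version" pointer that builds are published behind. In production that pointer might be a database row or a vector-store collection alias; this in-memory sketch shows the shape:

```python
class IndexRegistry:
    """Versioned index builds behind an atomically-swapped active pointer."""

    def __init__(self):
        self.builds = {}      # version -> index payload
        self.active = None    # version currently being served
        self.previous = None  # version to fall back to

    def publish(self, version: str, index) -> None:
        self.builds[version] = index
        self.previous = self.active
        self.active = version

    def rollback(self) -> None:
        if self.previous is None:
            raise RuntimeError("no previous build to roll back to")
        self.active, self.previous = self.previous, self.active

    def current(self):
        return self.builds[self.active]
```

Because queries only ever read current(), promoting or rolling back a build is a pointer swap, not a re-index.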

RAG readiness checklist

Use this before you let the workflow touch important decisions.

Workflow:
- Owner named:
- Allowed data sources:
- Restricted sources:

Security:
- ACL filtering implemented and tested:
- Redaction policy:
- Audit logging:

Quality:
- Evaluation set created:
- Citation policy:
- Fallback behavior:

Operations:
- Index refresh plan:
- Monitoring and alerts:
- Rollback plan:

Treat retrieval as a product

An open-source RAG stack is a trust system. Treat retrieval, permissions, and evaluation as first-class, and your users will rely on it. Treat it like a demo, and it will quietly die. Need help building a production RAG system? Let's talk.

