Our offices

  • Exceev Consulting
    61 Rue de Lyon
    75012, Paris, France
  • Exceev Technology
    332 Bd Brahim Roudani
    20330, Casablanca, Morocco

Statement of Work Template for AI Projects (Founders + Procurement)


Most AI projects do not fail because the team cannot build.

They fail because the agreement is unclear.

A vague statement of work creates predictable problems: “just one more feature,” unclear data access, no acceptance criteria for outputs, and a debate about who owns maintenance.

What you'll learn

  • The sections every AI SOW must include
  • How to write acceptance criteria for LLM-enabled workflows
  • The scope boundaries that prevent scope creep
  • Security and data boundary clauses procurement expects
  • A reusable AI project SOW template

TL;DR

A strong AI project SOW defines scope boundaries, acceptance criteria, and a change-control rule. For LLM workflows, acceptance criteria should include an evaluation set, quality thresholds, refusal/citation requirements, and a rollback path. Procurement also needs clear data handling, IP, and support terms. A good SOW turns an AI project into a measurable delivery system.

AI project SOW template: the non-negotiable sections

A procurement-friendly AI SOW should contain:

  • Problem statement and workflow definition
  • Scope in / scope out
  • Data boundary and security requirements
  • Acceptance criteria and evaluation plan
  • Delivery cadence and milestones
  • Change control and commercial terms
  • Handoff and support expectations

If one of these is missing, the project will drift.
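A lightweight way to enforce this checklist is to treat the SOW as structured data and refuse sign-off while any section is blank. This is an illustrative sketch; the section names mirror the bullet list above and are not a standard schema.

```python
# Hypothetical pre-sign-off check: every non-negotiable SOW section
# must be filled in. Section keys are illustrative, not a standard.
SOW_SECTIONS = [
    "problem_statement", "scope_in", "scope_out", "data_boundary",
    "acceptance_criteria", "delivery_plan", "change_control",
    "handoff_and_support", "commercial_terms",
]

def missing_sections(sow: dict) -> list:
    """Return the sections that are absent or left blank."""
    return [s for s in SOW_SECTIONS if not sow.get(s)]
```

If `missing_sections` returns anything, the draft is not ready to sign.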

Scope in / scope out (use plain language)

Scope language should be understandable by a non-technical stakeholder.

Examples of useful scope statements:

  • In scope: “Support ticket summarization for internal agents, with citations to ticket history.”
  • Out of scope: “Fully automated customer support replies without human review.”

A practical rule:

A request is in-scope if it fits the agreed backlog and does not change the data boundary, security boundary, or acceptance criteria.
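The rule above is mechanical enough to write down as a predicate, which makes it easy to apply consistently during triage. A minimal sketch, with illustrative field names that are not part of the SOW template itself:

```python
from dataclasses import dataclass

@dataclass
class Request:
    # Field names are hypothetical, chosen to mirror the scope rule.
    in_agreed_backlog: bool
    changes_data_boundary: bool
    changes_security_boundary: bool
    changes_acceptance_criteria: bool

def is_in_scope(req: Request) -> bool:
    """In scope only if it fits the backlog and leaves the data,
    security, and acceptance boundaries untouched."""
    return req.in_agreed_backlog and not (
        req.changes_data_boundary
        or req.changes_security_boundary
        or req.changes_acceptance_criteria
    )
```

Anything that fails the predicate goes through change control rather than the backlog.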

Acceptance criteria for LLM-enabled workflows

Do not accept “it feels good.”

Use measurable criteria:

  • Quality: accuracy threshold on a test set
  • Safety: refusal behavior for restricted topics
  • Grounding: citation requirements when sources exist
  • Latency: acceptable response time
  • Fallback: what happens when confidence is low

Write down the evaluation set requirement. Even a small set (20 to 50 examples) is enough to prevent silent regressions.
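Even a tiny evaluation set only helps if it runs the same way every time. A minimal harness sketch, where the grading function and threshold are placeholders you would define per workflow:

```python
# Hypothetical evaluation harness. `grade` encodes the SOW's quality
# criteria; the threshold comes from the acceptance criteria.
def run_eval(examples, grade, threshold=0.9):
    """examples: list of (input, expected); grade returns True/False.
    Returns (score, passed_threshold)."""
    passed = sum(1 for inp, expected in examples if grade(inp, expected))
    score = passed / len(examples)
    return score, score >= threshold

# Usage with a toy grader: 2 of 3 pass, below the 0.9 gate,
# so the change would be blocked.
examples = [("q1", "a1"), ("q2", "a2"), ("q3", "wrong")]
score, ok = run_eval(examples, lambda i, e: e != "wrong", threshold=0.9)
```

Running this on every material change is what turns the evaluation set into a regression guard rather than a one-time demo.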

Examples: acceptance criteria that don’t collapse under scrutiny

Founders and procurement teams usually want the same thing: fewer arguments at the end. The fastest way to reduce arguments is to show examples in the SOW itself.

Here are a few patterns you can copy and tailor:

Example A: summarization workflow

  • Inputs: internal tickets and notes only (explicitly listed systems)
  • Output format: bullet summary + “next steps” + confidence tag
  • Quality gate: on an evaluation set of 30 real tickets, summaries must include all required fields in at least 27/30 cases
  • Safety gate: no customer identifiers appear in the summary output
  • Fallback: if required fields are missing, the workflow routes to human review instead of auto-posting
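The gates in Example A can be expressed as a single routing function. This is a sketch under assumptions: the field names and the customer-identifier pattern are placeholders, not a real schema.

```python
import re

REQUIRED_FIELDS = ("summary", "next_steps", "confidence")  # per Example A
CUSTOMER_ID = re.compile(r"\bCUST-\d+\b")  # hypothetical identifier format

def gate(output: dict) -> str:
    """Return 'pass', or the route to take instead of auto-posting."""
    # Fallback gate: missing required fields -> human review.
    if any(f not in output or not output[f] for f in REQUIRED_FIELDS):
        return "human_review"
    # Safety gate: customer identifiers must not appear in the summary.
    if CUSTOMER_ID.search(output["summary"]):
        return "blocked_pii"
    return "pass"
```

The quality gate (27/30 on the evaluation set) then runs this same function over the evaluation examples.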

Example B: classification/routing workflow

  • Output format: category label + short rationale + escalation flag
  • Quality gate: minimum accuracy threshold on the evaluation set, plus a confusion-matrix review for the top 3 error types
  • Auditability: every output stores the version of the rules/prompt and the input ID (not the raw sensitive input)
  • Fallback: unknown/low-confidence routes to a default queue
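Example B's quality gate combines an accuracy threshold with a confusion review. A minimal sketch of how that check might run against the evaluation set (threshold and error count are illustrative):

```python
from collections import Counter

def evaluate_router(preds, labels, min_accuracy=0.85, top_errors=3):
    """Accuracy gate plus the most frequent (expected, predicted)
    confusion pairs to review, per the SOW's quality gate."""
    correct = sum(p == y for p, y in zip(preds, labels))
    accuracy = correct / len(labels)
    confusions = Counter(
        (y, p) for p, y in zip(preds, labels) if p != y
    )
    return accuracy >= min_accuracy, confusions.most_common(top_errors)
```

The returned confusion pairs give the review meeting something concrete to look at instead of a bare percentage.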

Example C: internal knowledge assistant (RAG)

  • Data boundary: only approved internal docs; explicit exclusions (HR, legal, customer exports, etc.)
  • Permissions: the assistant must respect existing access controls; no “one index for everyone”
  • Grounding: answers must cite internal documents when available, and must refuse to answer when sources are missing or access is denied
  • Regression check: evaluation run after every material change (retrieval settings, indexing, prompt, document set)
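Example C's grounding and permission rules can be enforced at one choke point in the answer path. A sketch under assumptions: `retrieve`, `generate`, and the `user.can_read` permission check are stand-ins for whatever your stack provides.

```python
def answer_with_grounding(question, retrieve, generate, user):
    """Refuse when sources are missing or access is denied;
    otherwise answer with citations to the retrieved documents.
    `retrieve`, `generate`, and `user.can_read` are hypothetical hooks."""
    # Permission gate: filter to documents this user may read.
    docs = [d for d in retrieve(question) if user.can_read(d)]
    if not docs:
        # Grounding gate: no accessible sources -> refuse, don't guess.
        return {"answer": None, "refusal": "no accessible sources"}
    answer = generate(question, docs)
    return {"answer": answer, "citations": [d.id for d in docs]}
```

Keeping the permission filter inside the answer path, rather than relying on "one index for everyone," is exactly the boundary the SOW clause is asking for.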

The point is not to write a novel. The point is to remove ambiguity about what “good” means.

Data boundary and security clauses

This is where procurement and security teams will focus.

Include:

  • Allowed data sources
  • Restricted data sources
  • Access control expectations (least privilege)
  • Logging policy (redaction, retention)
  • Vendor risk and third-party dependencies — see our cybersecurity clauses guide for standard contract language

If the workflow touches PII, add explicit rules for what can be stored and what must be ephemeral.
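One concrete way to honor a "stored vs ephemeral" rule is to redact before anything is persisted, keeping the raw input only in memory. A minimal sketch; the email pattern is illustrative and a real PII policy would cover more identifier types.

```python
import re

# Hypothetical redaction rule: emails never reach logs or storage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact(text: str) -> str:
    """Return the storable form of `text`; the raw input stays ephemeral."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)
```

The SOW's logging clause then only ever applies to the output of `redact`, never to the raw input.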

Change control (the scope creep kill switch)

Change control does not have to be heavy.

Define:

  • One intake channel
  • Weekly triage
  • A trigger rule: when a request becomes new scope

This prevents “just one more integration” from turning into a hidden project.

Add a simple change request form (so scope changes stay polite)

Scope creep becomes conflict when nobody can see the tradeoff. A change request makes it visible.

Add this as an appendix and keep it lightweight:

Change request

Requested change:

Why it’s needed (business reason):

Does it change any of these?
- Data boundary (yes/no)
- Security boundary (yes/no)
- Acceptance criteria (yes/no)
- Timeline (yes/no)

Impact estimate:
- New deliverables:
- Timeline impact:
- Cost impact:

Decision:
- Approved / Rejected / Deferred
- Approved by:
- Date:

This single page prevents the “we thought it was included” debate that burns relationships.
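If you track change requests in a tool rather than on paper, the same form maps directly onto a small record type. A sketch with illustrative field names mirroring the form above:

```python
from dataclasses import dataclass

@dataclass
class ChangeRequest:
    # Fields mirror the one-page form; names are illustrative.
    requested_change: str
    business_reason: str
    changes_data_boundary: bool
    changes_security_boundary: bool
    changes_acceptance_criteria: bool
    changes_timeline: bool
    decision: str = "deferred"  # approved / rejected / deferred

    def is_new_scope(self) -> bool:
        """Boundary or acceptance changes mean new scope, not backlog."""
        return (self.changes_data_boundary
                or self.changes_security_boundary
                or self.changes_acceptance_criteria)
```

The `is_new_scope` check is the same trigger rule from the change-control section, applied to the form's yes/no answers.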

Milestones, payments, and support (make it explicit)

Procurement and founders both want fewer surprises, and the easiest way to create surprises is to leave “go-live” and “support” undefined.

Include:

  • Milestones that map to real deliverables (not vague phases)
  • A payment schedule tied to those milestones
  • A short warranty window (bug fixes vs new scope)
  • Post-launch support expectations (response time, escalation path)

If you skip this, you will end up renegotiating during the most stressful part of the project: right after launch, when stakeholders want changes and the team is still stabilizing the workflow.

A simple pattern is:

  • Milestone 1: scoped plan + acceptance criteria + evaluation set
  • Milestone 2: pilot shipped + rollback path + demo
  • Milestone 3: handoff pack + training + operational readiness review

This keeps both delivery and procurement conversations grounded in outputs.

Procurement notes (what they quietly care about)

If you’re a founder, procurement can feel like a wall. If you’re procurement, founders can feel like chaos. The SOW is where you make both sides comfortable.

The pieces that reduce friction:

  • Clear ownership: one accountable owner per side. Procurement hates “five stakeholders, no owner.”
  • Third-party dependencies: list them explicitly. Surprises late in the project trigger risk reviews and delays.
  • Support expectations: define what happens after launch. “We’ll be available” is not a plan.
  • IP and confidentiality: keep it boring, keep it clear. Ambiguity here slows everything down.

If your client is small and has no procurement function, you still benefit: the same clarity prevents internal stakeholder drift.

What to attach as appendices (so the core SOW stays readable)

One mistake teams make is stuffing everything into the SOW body until it becomes unreadable. Keep the SOW short and attach the details as appendices.

Appendices that work well for AI projects:

  • Glossary: define terms like “evaluation set,” “data boundary,” “PII,” “rollback,” and “change request.” It prevents semantic arguments.
  • Security schedule: logging rules, retention, access controls, vendor/tool list, and incident reporting process.
  • Acceptance test examples: a few representative test cases and the rubric. Not the entire dataset, just enough to make “good” concrete.
  • Delivery artifacts list: runbook, ownership map, decision log, and how to run the evaluation suite. (See our handoff playbook for the complete list.)

This helps procurement, too: they can route the appendices to security or IT without forcing every stakeholder to read the whole document.

The SOW skeleton

Use this and fill in the blanks. Keep it short.

Project:
Workflow:
Owner (client):
Owner (delivery):

Scope in:
-

Scope out:
-

Data boundary:
- Allowed sources:
- Restricted sources:
- Access controls:

Acceptance criteria:
- Evaluation set and schedule:
- Quality threshold:
- Safety/citation requirements:
- Latency target:
- Fallback/rollback:

Delivery plan:
- Milestones:
- Cadence (demo/triage):

Change control:
- Intake channel:
- Trigger for new scope:

Handoff:
- Runbook:
- Ownership and support:

Commercial terms:
- Payment schedule:
- IP and confidentiality:
- Third-party dependencies:

A SOW is a delivery accelerator

A good SOW reduces ambiguity, protects both sides, and makes value measurable. If you need help structuring an AI project agreement, get in touch.



