The Problem
Regulated organisations want AI systems that act — not just summarise. They want an agent that retrieves the relevant control from a policy document, assesses how a proposed change affects it, and creates a ticket to track the gap. The operational desire is clear. The architecture problem is not.
A single monolithic LLM call that retrieves, reasons, and acts cannot be audited. You cannot explain which document grounded which claim, which agent made which decision, or whether a human reviewed the action before it executed. In a regulated environment, an answer that cannot be traced to a specific source is not an answer — it is a liability. An action that executes without human review is not automation — it is risk.
The architecture challenge is not building an agent that can do these things. It is building one that does them under governed control — with an audit trail that satisfies a compliance team, a guardrail layer that satisfies a CISO, and a human approval gate that satisfies both.
How It Works — The Golden Workflow
A compliance question arrives. Three specialist agents handle it in sequence, supervised by a LangGraph state machine. Every transition is traced in LangSmith and appended to an immutable PostgreSQL audit log.
An input guardrail screens the question at the boundary for denied topics (hacking, evasion, harmful content) and PII patterns. A request that fails never reaches the supervisor. The guardrails run at the application level and are API-compatible with AWS Bedrock Guardrails for the production swap.
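A minimal sketch of what an application-level input screen could look like, assuming regex-based matching. The deny-list terms, the PII patterns, and the names `screen_input` and `GuardrailResult` are hypothetical illustrations, not the system's actual rules.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical deny-list and PII patterns, for illustration only.
DENIED_TOPICS = {
    "hacking": re.compile(r"\b(hack|exploit|malware)\w*", re.IGNORECASE),
    "evasion": re.compile(r"\b(evade|bypass|circumvent)\w*", re.IGNORECASE),
}
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass(frozen=True)
class GuardrailResult:
    allowed: bool
    reasons: tuple[str, ...]


def screen_input(question: str) -> GuardrailResult:
    """Block the question if any denied topic or PII pattern matches."""
    reasons = [
        f"denied_topic:{name}"
        for name, pattern in DENIED_TOPICS.items()
        if pattern.search(question)
    ]
    reasons += [
        f"pii:{name}"
        for name, pattern in PII_PATTERNS.items()
        if pattern.search(question)
    ]
    # The request is allowed only when nothing matched.
    return GuardrailResult(allowed=not reasons, reasons=tuple(reasons))
```

A caller would check `screen_input(question).allowed` before handing the request to the supervisor, and log `reasons` when it is blocked.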
The question is embedded and matched by cosine similarity against seeded regulatory documents (GDPR, ISO 27001, internal governance frameworks) using pgvector with an HNSW index. The top-k chunks are returned with their source metadata.
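The retrieval step can be illustrated with a brute-force stand-in. This sketch does not use pgvector or an HNSW index; it ranks a handful of in-memory chunks by exact cosine similarity to show what "top-k with source metadata" means. The `Chunk` type, the sample texts, and the three-dimensional toy embeddings are assumptions for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    source: str                    # metadata returned alongside the text
    embedding: tuple[float, ...]   # toy vector; real embeddings are much longer


def cosine_similarity(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: tuple[float, ...], chunks: list[Chunk], k: int = 3):
    """Return the k (score, chunk) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query, c.embedding), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Placeholder corpus, not real regulatory text.
CHUNKS = [
    Chunk("Sample text about breach notification.", "GDPR (sample chunk)", (0.9, 0.1, 0.0)),
    Chunk("Sample text about access reviews.", "ISO 27001 (sample chunk)", (0.1, 0.9, 0.0)),
    Chunk("Sample text about change approval.", "Internal framework (sample chunk)", (0.0, 0.2, 0.9)),
]
```

In pgvector the same ordering is typically expressed in SQL with the cosine distance operator `<=>`, and an HNSW index makes the search approximate rather than exhaustive.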
Retrieved chunks are assembled into a structured context block. The agent is prompted to cite every factual claim with a numbered reference — un-cited claims are flagged before emission, not just discouraged. Hallucination is constrained structurally, not just instructed against.
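One way the citation check could be enforced is sketched below. It assumes that every sentence counts as a factual claim, that references look like `[1]`, and that each reference sits before the sentence's closing punctuation. The function name `check_citations` and the return shape are hypothetical; only the `is_cited` flag name comes from the description of the system.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")
# Naive sentence splitter: break after ., ! or ? followed by whitespace.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def check_citations(draft: str, num_sources: int) -> dict:
    """Flag sentences with no reference or with a reference outside 1..num_sources."""
    sentences = [s.strip() for s in SENTENCE_SPLIT.split(draft) if s.strip()]
    uncited, invalid = [], []
    for sentence in sentences:
        refs = [int(n) for n in CITATION.findall(sentence)]
        if not refs:
            uncited.append(sentence)
        elif any(n < 1 or n > num_sources for n in refs):
            invalid.append(sentence)
    return {
        "is_cited": not uncited and not invalid,
        "uncited": uncited,
        "invalid": invalid,
    }
```

A draft whose result has `is_cited` set to `False` would be held back or sent for regeneration rather than emitted.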
PII patterns and harmful content are caught on the response, not just the input. Both layers are production-swappable to AWS Bedrock Guardrails as a configuration change.
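The "swap by configuration" idea can be sketched as a small interface with two backends selected by an environment variable. Everything here is an assumption for illustration: the `Guardrail` protocol, the class names, and the `GUARDRAIL_BACKEND` and `BEDROCK_GUARDRAIL_ID` variables. The managed backend is a placeholder that does not call AWS.

```python
from __future__ import annotations

import os
import re
from typing import Mapping, Protocol


class Guardrail(Protocol):
    def check(self, text: str) -> bool:
        """Return True when the text is safe to emit."""
        ...


class LocalGuardrail:
    """Application-level screen (hypothetical single PII pattern)."""

    _EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

    def check(self, text: str) -> bool:
        return self._EMAIL.search(text) is None


class BedrockGuardrail:
    """Placeholder for a managed backend; the real service call is not shown."""

    def __init__(self, guardrail_id: str) -> None:
        self.guardrail_id = guardrail_id

    def check(self, text: str) -> bool:
        raise NotImplementedError("wire the managed guardrail call in here")


def build_guardrail(env: Mapping[str, str] | None = None) -> Guardrail:
    """Pick the backend from configuration; calling code stays unchanged."""
    env = os.environ if env is None else env
    backend = env.get("GUARDRAIL_BACKEND", "local")
    if backend == "local":
        return LocalGuardrail()
    if backend == "bedrock":
        return BedrockGuardrail(env["BEDROCK_GUARDRAIL_ID"])
    raise ValueError(f"unknown guardrail backend: {backend}")
```

Because both classes expose the same `check` method, the response path would call `build_guardrail().check(response_text)` in either tier.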
The Action Agent generates a structured proposal: a GitHub issue or Jira ticket with title, body, and labels. LangGraph's interrupt_before pauses the graph here. A human must explicitly approve or reject before any write executes. Rejection is also logged.
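The gating logic can be shown in plain Python. This sketch does not use LangGraph and does not reproduce its pause and resume mechanics; it only illustrates the rule that a proposal stays pending until a human decides, that rejection is logged, and that the write runs only after approval. `Proposal`, `ApprovalGate`, and the `executor` callback are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Decision(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Proposal:
    title: str
    body: str
    labels: list[str]
    decision: Decision = Decision.PENDING


@dataclass
class ApprovalGate:
    executor: Callable[[Proposal], str]   # performs the real write, e.g. creates a ticket
    log: list[dict] = field(default_factory=list)

    def propose(self, proposal: Proposal) -> Proposal:
        # Proposing never executes anything; it only records the intent.
        self.log.append({"event": "proposed", "title": proposal.title})
        return proposal

    def decide(self, proposal: Proposal, approved: bool, reviewer: str) -> str | None:
        if proposal.decision is not Decision.PENDING:
            raise ValueError("proposal has already been decided")
        proposal.decision = Decision.APPROVED if approved else Decision.REJECTED
        self.log.append(
            {"event": proposal.decision.value, "title": proposal.title, "reviewer": reviewer}
        )
        if not approved:
            return None  # rejection is logged, nothing is written
        result = self.executor(proposal)
        self.log.append({"event": "executed", "title": proposal.title, "result": result})
        return result
```

Keeping the executor behind `decide` means there is no code path from proposal to write that bypasses the human decision.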
Every agent step, tool call, human decision, and action outcome is appended to the PostgreSQL audit log with a UTC timestamp. BEFORE DELETE and BEFORE UPDATE triggers raise an exception on any modification attempt — immutability enforced at the database level, not by convention.
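The trigger-based immutability can be demonstrated with SQLite as a stand-in for PostgreSQL, so the example runs with the Python standard library alone. The table layout and trigger names are assumptions; in PostgreSQL the same idea would be written as a trigger function that raises an exception, which is not shown here.

```python
import sqlite3
from datetime import datetime, timezone

# SQLite stand-in: BEFORE UPDATE / BEFORE DELETE triggers abort the statement.
SCHEMA = """
CREATE TABLE audit_log (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    occurred_at TEXT NOT NULL,   -- UTC timestamp in ISO 8601
    actor       TEXT NOT NULL,
    event       TEXT NOT NULL
);
CREATE TRIGGER audit_log_no_update BEFORE UPDATE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
CREATE TRIGGER audit_log_no_delete BEFORE DELETE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
"""


def open_audit_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def append_event(conn: sqlite3.Connection, actor: str, event: str) -> None:
    """Insert is the only write the log accepts."""
    conn.execute(
        "INSERT INTO audit_log (occurred_at, actor, event) VALUES (?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), actor, event),
    )
    conn.commit()
```

Any `UPDATE` or `DELETE` against `audit_log` then fails with a database error, regardless of which application code issued it.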
The Governance Layer
Six non-negotiable pillars make this system auditable, safe, and honest for a regulated environment. Each maps to a production AWS equivalent described in the ADRs — the production swap is a configuration change, not a rewrite.
Application-level guardrails screen both the question and the response for denied topics, PII patterns, and harmful content. They are API-compatible with AWS Bedrock Guardrails, so the production swap is a single config change.
Every agent step, tool call, and human approval is appended to PostgreSQL. Database-level triggers reject any DELETE or UPDATE, so the log is append-only by design, not by convention.
The Action Agent proposes write actions — it never executes them. interrupt_before pauses the LangGraph graph until a human explicitly approves. The graph state is preserved during the pause; approval resumes exactly where it stopped.
Every factual claim must carry a numbered source reference. Responses where is_cited=False are flagged before emission. Hallucination is constrained at both the prompt and the enforcement layer.
The Bedrock IAM user is scoped to bedrock:InvokeModel on target model ARNs only — no other AWS permissions. In production on AWS compute, this becomes an IAM role with no long-lived access keys at all. See ADR-005.
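A least-privilege policy of this shape could be generated as follows. This is a sketch: the helper name, the statement `Sid`, the region, and the model ID passed in the usage example are placeholders, not the values used by the system or recorded in ADR-005.

```python
import json


def bedrock_invoke_policy(region: str, model_ids: list[str]) -> dict:
    """Build an IAM policy that allows bedrock:InvokeModel on the given models only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeTargetModelsOnly",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": ["bedrock:InvokeModel"],
                # Foundation-model ARNs carry no account ID, hence the double colon.
                "Resource": [
                    f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
                    for model_id in model_ids
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder region and model ID, for illustration only.
    print(json.dumps(bedrock_invoke_policy("us-east-1", ["example-model-id"]), indent=2))
```

With no other statements attached, a leaked key can invoke the listed models and nothing else.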
Every agent step and LLM call is traced in LangSmith — supervisor routing, retrieval results, draft generations, guardrail decisions. The full graph execution is visible, replayable, and attributable to a specific run ID.
Two-Tier Deployment Strategy
The system runs at under CAD $35 per month in the demo tier. This is a deliberate architectural decision — not a constraint. Moving to the production tier changes the infrastructure; it does not change the application code. The swap map is documented in full in ADR-004.
| Concern | Demo Tier (Now) | Production Path (AWS) |
|---|---|---|
| Compute | Railway Hobby (FastAPI container) | AWS App Runner / ECS Fargate |
| Database | Neon free tier (pgvector) | Aurora Serverless v2 + pgvector |
| Models | OpenRouter → Claude Sonnet | Amazon Bedrock (direct) |
| Auth | IAM access key (scoped, Bedrock only) | IAM role — no long-lived keys |
| Secrets | Railway env vars | AWS Secrets Manager |
| Networking | Public Railway URL | Private VPC + WAF |
| Always-on | No — sleeps after 30 min idle | Yes |
| Monthly cost | ~CAD $20–35 | ~CAD $150–300+ |
The cost-optimisation in the demo tier is not accidental — it is documented. ADR-004 maps every infrastructure component from Railway to its AWS equivalent, states the trigger that would prompt migration (a real client engagement or sustained load), and quantifies the cost delta. The architecture makes the transition path explicit before it is needed.
Architecture Decision Records
Five ADRs document the deliberate trade-offs in this system. Each captures the context, the options considered, the decision, and the consequences — the artefacts that distinguish an architecture engagement from a development project.
| ADR | Decision | The Trade-off |
|---|---|---|
| ADR-001 | LangGraph vs. Amazon Bedrock Agents | Self-orchestration keeps the graph portable and inspectable; managed runtime reduces operational overhead in production |
| ADR-002 | pgvector co-located vs. dedicated vector DB | One PostgreSQL instance for both retrieval and audit eliminates a service; at production scale, a separate vector store (Aurora pgvector or OpenSearch) splits retrieval from audit |
| ADR-003 | Bedrock Guardrails configuration for regulated industries | Application-level guardrails in demo tier; managed Bedrock Guardrails in production add PII redaction, denied topic enforcement, and grounding checks at the API boundary |
| ADR-004 | Railway Hobby vs. AWS App Runner / ECS Fargate | Railway costs ~CAD $20–35/month with zero infrastructure management; the AWS production path costs more but provides IAM roles, private networking, and an always-on SLA |
| ADR-005 | IAM access key vs. IAM role | IAM roles are unavailable outside AWS compute — Railway cannot assume a role. Access key scoped to Bedrock invoke only is the least-privilege mitigation; production on AWS compute uses a role with no long-lived credentials |