How we used an AI coding workflow to build, test, and validate a PaymentIntents API emulator, then proved behavioral parity by running a curated 250-case snapshot against both the emulator and Stripe's real API.
Third-party APIs like Stripe, Twilio, and Plaid are critical to production systems — but testing against them is expensive, rate-limited, and non-deterministic. AI coding agents change that equation entirely.
Payment APIs enforce rate limits, require live credentials, return non-deterministic IDs and timestamps, and charge real money in production mode. Teams resort to mocking — but mocks drift from reality and miss edge cases.
An AI coding agent reads the official API documentation, builds a behavioral emulator, generates hundreds of test cases with full traceability, and validates parity by running the same suite against both the emulator and the real API.
The technique demonstrated here — AI-driven emulation from documentation — is applicable to any API where live testing is impractical.
Stripe, PayPal, Adyen, Square — Complex state machines (authorize → capture → refund), idempotency requirements, webhook-driven flows. Testing requires sandbox credentials and has rate limits.
Twilio, SendGrid, Vonage — Sending real SMS/emails in tests is costly and non-deterministic. Delivery status callbacks are hard to simulate without a local emulator.
Plaid, MX, Yodlee — Account linking flows, balance verification, transaction history. Production sandboxes have limited scenarios and stale data.
AWS, GCP, Azure APIs — Provisioning resources for tests is slow and expensive. LocalStack exists for AWS, but its coverage lags behind the full AWS API surface. AI agents can fill the gap.
FedEx, UPS, DHL — Rate calculation, label generation, tracking webhooks. Sandbox environments are poorly maintained and often offline.
Onfido, Jumio, Persona — Document verification flows require real documents in sandbox. An emulator can model the state transitions and response shapes without sensitive data.
A lightweight Python HTTP server that implements a subset of Stripe's PaymentIntents API, with a dual-target test harness for behavioral parity verification.
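As a rough illustration of how little such a server needs, here is a minimal sketch built only on the standard library's `ThreadingHTTPServer`. It covers just `/health`, create, and retrieve. The handler name, the in-memory `INTENTS` store, the default `capture_method`, and port 8000 are assumptions made for this sketch, not the project's actual `emulator/app.py`.

```python
import json
import secrets
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs

# In-memory store shared by all handler threads (sketch only; a fuller
# emulator would guard mutations with a lock).
INTENTS: dict = {}


class EmulatorHandler(BaseHTTPRequestHandler):
    def _send_json(self, status: int, body: dict) -> None:
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def _error(self, status: int, message: str, code: str = "") -> None:
        # Mirrors the general shape of Stripe error bodies.
        error = {"type": "invalid_request_error", "message": message}
        if code:
            error["code"] = code
        self._send_json(status, {"error": error})

    def do_GET(self) -> None:
        if self.path == "/health":
            self._send_json(200, {"status": "ok"})
            return
        prefix = "/v1/payment_intents/"
        intent = INTENTS.get(self.path[len(prefix):]) if self.path.startswith(prefix) else None
        if intent is None:
            self._error(404, "No such resource", "resource_missing")
        else:
            self._send_json(200, intent)

    def do_POST(self) -> None:
        if self.path != "/v1/payment_intents":
            self._error(404, "Unrecognized request URL")
            return
        length = int(self.headers.get("Content-Length", 0))
        # Stripe clients send application/x-www-form-urlencoded bodies.
        form = {k: v[0] for k, v in parse_qs(self.rfile.read(length).decode()).items()}
        amount, currency = form.get("amount", ""), form.get("currency", "")
        if not amount.isdigit() or not currency:
            self._error(400, "amount (integer) and currency are required")
            return
        intent = {
            "id": "pi_" + secrets.token_hex(12),
            "object": "payment_intent",
            "amount": int(amount),
            "currency": currency.lower(),
            "capture_method": form.get("capture_method", "automatic"),  # assumed default
            "status": "requires_payment_method",
            "created": int(time.time()),
        }
        INTENTS[intent["id"]] = intent
        self._send_json(200, intent)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging


if __name__ == "__main__":
    ThreadingHTTPServer(("127.0.0.1", 8000), EmulatorHandler).serve_forever()
```

Keeping to the standard library means the emulator starts instantly in CI with nothing to install; the trade-off is hand-written routing and parsing.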
A PaymentIntent moves from creation through confirmation to capture, and can be canceled before it completes.
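The lifecycle rules can be expressed as a small transition function. This is a simplified sketch: it omits Stripe's `requires_action` and `processing` states, and `next_status` and `InvalidTransition` are hypothetical names, not taken from the emulator.

```python
class InvalidTransition(Exception):
    """Raised when the current status does not allow the requested action.

    Stripe reports such cases as HTTP 400 errors with the code
    payment_intent_unexpected_state; this class itself is hypothetical.
    """


# Simplified subset of statuses; requires_action and processing are not modeled.
CONFIRMABLE = {"requires_payment_method", "requires_confirmation"}
CANCELABLE = {"requires_payment_method", "requires_confirmation", "requires_capture"}


def next_status(status: str, action: str, capture_method: str = "automatic") -> str:
    """Return the status after `action`, or raise InvalidTransition."""
    if action == "confirm" and status in CONFIRMABLE:
        # Assumes the confirm request supplies a valid payment method.
        return "requires_capture" if capture_method == "manual" else "succeeded"
    if action == "capture" and status == "requires_capture":
        return "succeeded"  # full capture only in this sketch
    if action == "cancel" and status in CANCELABLE:
        return "canceled"
    raise InvalidTransition(f"cannot {action} a PaymentIntent in status {status!r}")
```

A double cancel, for example, fails because `canceled` is not in `CANCELABLE`.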
Requirements were distilled from 1,891 raw documentation sentences down to 577 core-scope requirements, 559 of which are covered by tests.
8 endpoints implementing the core PaymentIntents lifecycle.

| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Health check |
| GET | `/v1/payment_intents` | List intents with pagination |
| GET | `/v1/payment_intents/{id}` | Retrieve a specific intent |
| POST | `/v1/payment_intents` | Create a new intent |
| POST | `/v1/payment_intents/{id}/confirm` | Confirm with payment method |
| POST | `/v1/payment_intents/{id}/cancel` | Cancel with optional reason |
| POST | `/v1/payment_intents/{id}/capture` | Full or partial capture |
| POST | `/v1/refunds` | Create full or partial refund |

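One way to dispatch these eight endpoints without a framework is an ordered table of method and regex pairs. The handler names below are hypothetical placeholders, and the project may route requests differently.

```python
import re
from typing import Optional

# (method, path pattern, handler name); handler names are hypothetical.
ROUTES = [
    ("GET", r"/health", "health"),
    ("GET", r"/v1/payment_intents", "list_intents"),
    ("GET", r"/v1/payment_intents/(?P<id>[^/]+)", "retrieve_intent"),
    ("POST", r"/v1/payment_intents", "create_intent"),
    ("POST", r"/v1/payment_intents/(?P<id>[^/]+)/confirm", "confirm_intent"),
    ("POST", r"/v1/payment_intents/(?P<id>[^/]+)/cancel", "cancel_intent"),
    ("POST", r"/v1/payment_intents/(?P<id>[^/]+)/capture", "capture_intent"),
    ("POST", r"/v1/refunds", "create_refund"),
]
# Allow an optional trailing slash on every route.
COMPILED = [(method, re.compile(pattern + r"/?"), name) for method, pattern, name in ROUTES]


def resolve(method: str, raw_path: str) -> Optional[tuple]:
    """Return (handler_name, path_params) or None if no route matches."""
    path = raw_path.split("?", 1)[0]  # list requests carry query params like ?limit=
    for route_method, pattern, name in COMPILED:
        if route_method != method:
            continue
        match = pattern.fullmatch(path)
        if match:
            return name, match.groupdict()
    return None
```

Because `[^/]+` cannot span a slash, `/v1/payment_intents/pi_1/capture` never matches the retrieve route, so pattern order only matters between identical shapes.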
The same 250 test cases run against both targets.
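Running one suite against two targets works when the client is the only thing that changes. Below is a hedged sketch of such a client with retries and exponential backoff. `TARGET` and `STRIPE_API_KEY` come from the quick-start commands. The field names, the retryable status set, the `EMULATOR_URL` variable and its default address, and the assumption that the emulator needs no auth are all illustrative choices.

```python
import json
import os
import time
import urllib.error
import urllib.parse
import urllib.request
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Statuses treated as transient in this sketch.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


@dataclass
class ApiClient:
    base_url: str
    api_key: Optional[str] = None  # assumed: the emulator needs no auth
    max_retries: int = 3
    base_delay: float = 0.5  # seconds; doubles after each failed attempt
    sleep: Callable[[float], None] = time.sleep

    @classmethod
    def from_env(cls) -> "ApiClient":
        # EMULATOR_URL and its default are hypothetical.
        if os.environ.get("TARGET", "emulator") == "stripe":
            return cls("https://api.stripe.com", os.environ["STRIPE_API_KEY"])
        return cls(os.environ.get("EMULATOR_URL", "http://127.0.0.1:8000"))

    def _send(self, req: urllib.request.Request) -> tuple:
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                return resp.status, json.load(resp)
        except urllib.error.HTTPError as err:  # assumes JSON error bodies
            return err.code, json.load(err)

    def request(self, method: str, path: str, params: Optional[dict] = None,
                idempotency_key: Optional[str] = None) -> tuple:
        url, data = self.base_url + path, None
        encoded = urllib.parse.urlencode(params or {})
        if method == "GET":
            url += f"?{encoded}" if encoded else ""
        else:
            data = encoded.encode()  # Stripe expects form-encoded bodies
        headers = {}
        if self.api_key:
            headers["Authorization"] = f"Bearer {self.api_key}"
        if idempotency_key:
            headers["Idempotency-Key"] = idempotency_key
        for attempt in range(self.max_retries):
            req = urllib.request.Request(url, data=data, headers=headers, method=method)
            try:
                status, body = self._send(req)
            except urllib.error.URLError:
                pass  # connection-level failure: retry after backoff
            else:
                if status not in RETRYABLE_STATUSES:
                    return status, body
            self.sleep(self.base_delay * 2 ** attempt)
        # Final attempt: return (or raise) whatever happens.
        return self._send(urllib.request.Request(url, data=data, headers=headers, method=method))
```

Injecting `sleep` keeps retry behavior testable without real delays. Assertions and logging stay identical across targets because both receive the same `(status, body)` tuple.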
An 8-phase plan executed by an AI coding agent, from documentation ingestion to a fully validated emulator.
Project layout: `emulator/`, `test-cases/`, `scripts/`, `artifacts/`, `docs/`. Configured environment handling for Stripe credentials, deterministic seeding, and retry logic.
A single `ApiClient` class handles both targets; only the base URL and auth differ. Both targets share identical assertion logic, JSONL logging, and automatic retries with exponential backoff for transient failures.
The emulator is a `ThreadingHTTPServer` with no frameworks and no dependencies. It implements the full PaymentIntent lifecycle (create, confirm, cancel, capture, refund), plus idempotency key support, currency/amount validation, and state machine enforcement.
One parity gap surfaced during validation: the emulator set the status to `succeeded` after any capture, but Stripe keeps `requires_capture` when `final_capture=false` and an amount remains. Fixed by adding `final_capture` flag logic, then validated on a clean final snapshot.
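The `final_capture` rule can be sketched as a pure function over a simplified intent record. `Intent`, `capture`, and `CaptureError` are hypothetical names, and rejecting non-positive amounts is an assumption of this sketch. The status logic follows the behavior described above: a non-final capture that leaves an amount capturable keeps the intent in `requires_capture`.

```python
from dataclasses import dataclass
from typing import Optional


class CaptureError(ValueError):
    """Hypothetical error type; an emulator would turn it into an HTTP 400."""


@dataclass
class Intent:
    amount: int
    amount_capturable: int
    amount_received: int = 0
    status: str = "requires_capture"


def capture(intent: Intent, amount_to_capture: Optional[int] = None,
            final_capture: bool = True) -> Intent:
    if intent.status != "requires_capture":
        raise CaptureError(f"cannot capture an intent in status {intent.status!r}")
    # Omitting amount_to_capture captures everything still capturable.
    amount = intent.amount_capturable if amount_to_capture is None else amount_to_capture
    if not 0 < amount <= intent.amount_capturable:
        raise CaptureError("amount_to_capture is out of range")
    intent.amount_received += amount
    remaining = intent.amount_capturable - amount
    if final_capture or remaining == 0:
        # Final capture: any uncaptured remainder is released.
        intent.amount_capturable = 0
        intent.status = "succeeded"
    else:
        # Non-final capture with an amount left: stay in requires_capture.
        # Setting "succeeded" unconditionally here was the parity bug.
        intent.amount_capturable = remaining
    return intent
```

A two-part sequence such as `capture(i, 400, final_capture=False)` followed by `capture(i, 600, final_capture=False)` on a 1,000-unit intent ends in `succeeded` only after the second call.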
A curated final snapshot of 248 generated test cases plus 2 smoke tests ran against both targets. Here are the results.
On the curated final snapshot, every case passed on both targets: 250 of 250 on Stripe and 250 of 250 on the emulator.
559 of 577 documentation sentences are exercised by at least one test case.
1,512 total test-to-doc mappings across the traceability matrix.
18 uncovered sentences relate to advanced edge cases outside the PaymentIntents core flow.
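Coverage figures like these can be recomputed from the traceability matrix. This is a minimal sketch that assumes one CSV row per test-to-doc mapping with hypothetical column names `test_id` and `doc_sentence_id`. The real file's columns may differ.

```python
import csv
from typing import Iterable, TextIO


def coverage_report(mapping_csv: TextIO, requirement_ids: Iterable[str]) -> dict:
    """Summarize test-to-doc coverage from a traceability CSV.

    Assumes hypothetical columns test_id and doc_sentence_id, one row per mapping.
    """
    required = set(requirement_ids)
    rows = list(csv.DictReader(mapping_csv))
    # A sentence counts as covered if at least one mapping points at it.
    covered = {row["doc_sentence_id"] for row in rows} & required
    return {
        "mappings": len(rows),
        "covered": len(covered),
        "total": len(required),
        "uncovered": sorted(required - covered),
        "coverage_pct": round(100 * len(covered) / len(required), 1),
    }
```

With the figures reported here (559 of 577 sentences covered), `coverage_pct` would come out to 96.9, and `mappings` would be 1,512.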
Traceability data: `test_case_traceability.csv`.
Tests span the full PaymentIntents lifecycle — from happy-path creation to idempotency conflicts, partial captures, and refund edge cases.

| Category | Count | Description |
|---|---|---|
| `CREATE_OK` | 40 | Happy-path creates with varying amounts, currencies (USD/EUR/GBP/CAD/AUD/SGD/JPY), and capture methods |
| `CREATE_ERR` | 35 | Validation errors: missing amount, invalid currency, out-of-range values, invalid `capture_method` |
| `CONFIRM_AUTO` | 20 | Automatic-capture confirmation flows (create → confirm → `status=succeeded`) |
| `CONFIRM_MANUAL` | 20 | Manual-capture confirmation flows (create → confirm → `status=requires_capture`) |
| `CAPTURE_PARTIAL` | 20 | Two-part partial capture sequences with `amount_to_capture` |
| `CAPTURE_FULL` | 15 | Full capture of manual-capture intents |
| `CAPTURE_ERR` | 10 | Capture errors: over-limit amounts, invalid state transitions |
| `CANCEL_REASON` | 10 | Cancellation with reason values: `requested_by_customer`, `abandoned`, `duplicate`, `fraudulent` |
| `CANCEL_CAPTURE` | 10 | Cancel from `requires_capture` after a manual-capture confirm |
| `CANCEL_ERR` | 10 | Double-cancel error handling (intent already canceled) |
| `REFUND_FULL` | 10 | Full refund by `payment_intent` reference |
| `REFUND_PARTIAL` | 10 | Three-part partial refund sequences |
| `REFUND_ERR` | 15 | Over-refund and invalid-reference errors |
| `IDEMPOTENCY_SAME` | 10 | Same `Idempotency-Key` with same payload → cached response |
| `IDEMPOTENCY_ERR` | 10 | Same `Idempotency-Key` with different payload → 400 conflict |
| `LIST_RETRIEVE` | 5 | List/retrieve consistency checks |

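The two idempotency categories reduce to one rule: a reused key with the same payload replays the cached response, and a reused key with a different payload is rejected. Below is a minimal sketch of that rule. `IdempotencyStore` and its fingerprinting scheme are hypothetical. The `idempotency_error` type matches Stripe's error type for this case, but the message text here is invented.

```python
import hashlib
import json
import threading
from typing import Any, Callable, Optional

Response = tuple  # (status_code, body)


class IdempotencyStore:
    """Hypothetical helper that replays responses per Idempotency-Key."""

    def __init__(self) -> None:
        self._entries: dict = {}  # key -> (fingerprint, response)
        self._lock = threading.Lock()

    @staticmethod
    def _fingerprint(method: str, path: str, params: dict) -> str:
        # Canonical JSON so that parameter order does not matter.
        canonical = json.dumps([method, path, params], sort_keys=True)
        return hashlib.sha256(canonical.encode()).hexdigest()

    def handle(self, key: Optional[str], method: str, path: str, params: dict,
               execute: Callable[[], Response]) -> Response:
        if key is None:
            return execute()
        fingerprint = self._fingerprint(method, path, params)
        # One coarse lock keeps the sketch simple but serializes keyed requests.
        with self._lock:
            if key not in self._entries:
                self._entries[key] = (fingerprint, execute())
            stored_fingerprint, response = self._entries[key]
        if stored_fingerprint != fingerprint:
            return 400, {"error": {"type": "idempotency_error",
                                   "message": "Idempotency-Key reused with different parameters"}}
        return response
```

Fingerprinting the canonicalized request, rather than comparing raw bodies, treats `amount=100&currency=usd` and `currency=usd&amount=100` as the same payload.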
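
Assertion operators used by the test cases:

| Operator | Check |
|---|---|
| `status_code` | Exact HTTP status match |
| `equals` | Exact field value match |
| `in` | Value in allowed set |
| `prefix` | String prefix (e.g. `pi_`) |
| `gte` / `lte` | Numeric range checks |
| `context_equals` | Cross-step variable match |
| `exists` | Field presence check |
| `status_code_in` | Status in allowed list |
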
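The operators above could be evaluated by a small dispatcher like the one below. The assertion dict shape (`op`, `path`, `value`, `var`) and the dotted-path lookup are assumptions for illustration, not the harness's actual schema.

```python
from typing import Any

_MISSING = object()  # sentinel for "field not present"


def get_path(body: Any, dotted: str) -> Any:
    """Follow a dotted path such as 'error.code' through nested dicts."""
    node = body
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def check(assertion: dict, status: int, body: Any, context: dict) -> bool:
    """Evaluate one assertion; the dict shape here is hypothetical."""
    op, expected = assertion["op"], assertion.get("value")
    if op == "status_code":
        return status == expected
    if op == "status_code_in":
        return status in expected
    actual = get_path(body, assertion["path"])
    if op == "exists":
        return actual is not _MISSING
    if actual is _MISSING:
        return False  # every remaining operator needs the field to exist
    if op == "equals":
        return actual == expected
    if op == "in":
        return actual in expected
    if op == "prefix":
        return isinstance(actual, str) and actual.startswith(expected)
    if op == "gte":
        return actual >= expected
    if op == "lte":
        return actual <= expected
    if op == "context_equals":
        # Compare against a value captured by an earlier step, e.g. an intent ID.
        return actual == context.get(assertion["var"], _MISSING)
    raise ValueError(f"unknown assertion operator: {op}")
```

Operators like `prefix` and `context_equals` let assertions hold on both targets even though IDs and timestamps differ between Stripe and the emulator.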
Clone the repo and run the full dual-target suite in under 3 minutes.
```bash
# Clone and install
git clone https://github.com/your-org/stripe-payment-simulator.git
cd stripe-payment-simulator
pip install -r requirements-dev.txt

# Quick smoke test (starts emulator automatically)
python scripts/phase0_smoke.py

# Full dual-target run (Stripe + Emulator)
export STRIPE_API_KEY=sk_test_...
python scripts/run_dual_target_suite.py

# Or run targets individually
python emulator/app.py &  # Start emulator
TARGET=emulator python -m pytest test-cases/harness -q
TARGET=stripe python -m pytest test-cases/harness -q
```