Case Study: Clerk JWT Integration
What happens when you give the same task to PAI and Claude Code? We ran the experiment so you don't have to.
Both tools were given identical instructions to add Clerk JWT authentication support to an existing Go/AWS Lambda service. The results reveal when each approach shines.
The Setup
To understand what PAI actually delivers, we ran a controlled experiment: PAI and Claude Code each completed the same integration task from identical instructions.
The Task: Add Clerk JWT authentication support alongside existing Autheory JWT support in penny-token-service, a Go/AWS Lambda service that handles token exchange and validation.
Shared Inputs:
Design Document: A 575-line specification at docs/CLERK_INTEGRATION_DESIGN.md detailing requirements, JWT claims mapping, error handling, and test scenarios
Same LLM: Both used Claude Opus 4.5 as the underlying model
Same Knowledge Base: Both had access to PAI MCP for domain context about the existing codebase and Pay Theory patterns
Same Codebase: penny-token-service with existing Autheory JWT validation, claims mapping, and middleware patterns
This is as close to an apples-to-apples comparison as you can get. Same task, same model, same context. The only difference was how the work was structured: PAI's phased workflow versus Claude Code's single-session approach.
The Results
Claude Code got to "working" in 11 minutes for $3. PAI got to "shippable" in 71 minutes for $11. The difference is what happens next.
Claude Code: "working" in 11 minutes for $2.93
PAI: "shippable" in 71 minutes for $11.40
What PAI's Phases Produced
The extra cost and time went to upfront structure, not rework.
Scope Phase
Defined the exact boundaries of the integration:
What's in scope: Clerk JWT validation, claims mapping, middleware for user extraction
What's out of scope: Autheory modifications, database schema changes
Integration points: Existing token exchange handler, rate limiting middleware
Clear boundaries prevent scope creep and ensure the assistant stays focused on the actual task.
Specification Phase
Generated detailed module specs with:
Typed errors: ErrClerkJWKSFetch, ErrClerkTokenExpired, ErrClerkInvalidAudience
Constants: JWKS URLs, claim field names, issuer patterns
Test cases: 42 scenarios covering happy paths, edge cases, and error conditions
Specifications become the contract between what you asked for and what gets built.
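As a rough sketch, the typed errors and constants from the spec might land in Go like this. The three error names come from the spec above; the constant names and values are illustrative placeholders, not the generated code:

```go
package clerk

import "errors"

// Typed sentinel errors named in the specification. Callers match
// them with errors.Is instead of parsing error strings.
var (
	ErrClerkJWKSFetch       = errors.New("clerk: failed to fetch JWKS")
	ErrClerkTokenExpired    = errors.New("clerk: token expired")
	ErrClerkInvalidAudience = errors.New("clerk: invalid audience")
)

// Spec-pinned constants so claim field names and issuer patterns are
// defined exactly once. Values here are illustrative placeholders.
const (
	ClaimSessionID   = "sid"                           // Clerk session claim
	IssuerURLPattern = "https://%s.clerk.accounts.dev" // per-instance issuer
)
```

Typed errors like these are what make the later validation passes mechanical: compliance is an errors.Is check, not a string comparison.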
Implementation Phase
Code generated to match spec exactly:
clerk_validator.go: JWKS handling, token verification, typed errors
claims_mapper.go: Clerk-specific claim extraction with constants
clerk_user_extraction.go: Middleware for rate limiting integration
3 documentation files: Setup guide, testing guide, troubleshooting
Implementation follows the spec, not the other way around.
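The verification path at the heart of clerk_validator.go reduces to a small core. Here is a minimal sketch assuming the widely used github.com/golang-jwt/jwt/v5 package (the write-up doesn't say which JWT library was used, and the JWKS fetch/cache logic is hidden behind keyFunc; ValidateToken and ClerkClaims are illustrative names):

```go
package clerk

import (
	"errors"

	"github.com/golang-jwt/jwt/v5"
)

// ClerkClaims is the subset of Clerk's JWT claims the service maps
// into its internal user model.
type ClerkClaims struct {
	jwt.RegisteredClaims
}

// ValidateToken verifies a Clerk JWT's signature, expiry, and
// audience. keyFunc resolves the signing key, typically from a
// cached copy of Clerk's JWKS endpoint.
func ValidateToken(raw string, keyFunc jwt.Keyfunc, audience string) (*ClerkClaims, error) {
	claims := &ClerkClaims{}
	_, err := jwt.ParseWithClaims(raw, claims, keyFunc,
		jwt.WithAudience(audience),
		jwt.WithValidMethods([]string{"RS256"}),
	)
	switch {
	case err == nil:
		return claims, nil
	// Map library errors onto the spec's typed errors so callers
	// can branch with errors.Is.
	case errors.Is(err, jwt.ErrTokenExpired):
		return nil, ErrClerkTokenExpired
	case errors.Is(err, jwt.ErrTokenInvalidAudience):
		return nil, ErrClerkInvalidAudience
	default:
		return nil, err
	}
}
```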
Validation Phase
22 validation passes confirmed spec compliance:
Type safety: All error types match specification
Test coverage: Every specified scenario has a corresponding test
Documentation: All public APIs documented with examples
Integration: Middleware correctly wired to existing handlers
Validation catches gaps before they become production bugs.
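Spec-to-test traceability is what makes those passes checkable. A hedged sketch of the pattern: each subtest names the design-document scenario it covers, so a validation pass can diff spec scenarios against test names. The scenario IDs and the helpers expiredToken, wrongAudienceToken, and testKeyFunc are hypothetical stand-ins:

```go
package clerk

import (
	"errors"
	"testing"
)

// Hypothetical helpers assumed here: expiredToken and
// wrongAudienceToken mint signed test tokens; testKeyFunc resolves
// the matching test signing key.
func TestValidateToken_SpecScenarios(t *testing.T) {
	cases := []struct {
		name    string
		token   string
		wantErr error
	}{
		{"scenario-07/expired-token", expiredToken(t), ErrClerkTokenExpired},
		{"scenario-12/wrong-audience", wrongAudienceToken(t), ErrClerkInvalidAudience},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			_, err := ValidateToken(tc.token, testKeyFunc, "expected-aud")
			if !errors.Is(err, tc.wantErr) {
				t.Fatalf("got %v, want %v", err, tc.wantErr)
			}
		})
	}
}
```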
What Claude Code Produced
Claude Code excels at rapid iteration. In just 11 minutes, it delivered working code that passed tests and integrated with the existing codebase. For exploration and prototyping, this speed is invaluable.
What Claude Code delivered well:
Core validator implementation: clerk_validator.go with JWKS handling and token verification
Claims mapping extension: Clerk-specific claim extraction added to existing mapper
Token exchange handler: Updated to support Clerk JWT alongside Autheory
Test coverage: 31 test functions covering core functionality
Config and CDK updates: Infrastructure changes for deployment
What remained for developers to complete:
Middleware for user extraction: Rate limiting integration required manual implementation
Operational documentation: Setup guides, testing guides, and troubleshooting docs
Edge case coverage: 11 fewer test scenarios than the specification called for
Typed error constants: Generic errors instead of domain-specific types like ErrClerkJWKSFetch
Claude Code's output was working: it compiled, tests passed, and the core flow functioned. The question is whether "working" is the same as "ready to ship."
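For a sense of scale, the user-extraction middleware that remained is roughly this shape. The sketch below assumes standard net/http middleware for readability (the actual service is a Lambda handler, so the real wiring differs), and WithClerkUser and UserID are illustrative names:

```go
package clerk

import (
	"context"
	"net/http"
	"strings"

	"github.com/golang-jwt/jwt/v5"
)

type ctxKey struct{}

// WithClerkUser validates the bearer token and stashes the subject
// in the request context so downstream rate limiting can key on it.
// Sketched as net/http middleware; the real service is a Lambda.
func WithClerkUser(next http.Handler, keyFunc jwt.Keyfunc, audience string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		raw := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
		claims, err := ValidateToken(raw, keyFunc, audience)
		if err != nil {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		ctx := context.WithValue(r.Context(), ctxKey{}, claims.Subject)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// UserID retrieves the Clerk subject placed by WithClerkUser.
func UserID(ctx context.Context) (string, bool) {
	id, ok := ctx.Value(ctxKey{}).(string)
	return id, ok
}
```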
When to Use Each
Different tools for different contexts. Here's how to choose.
Use PAI When...
Shipping to production
When code needs to be complete, documented, and audit-ready
Complex integrations
Multi-service changes with middleware, error handling, and edge cases
Documentation matters
When setup guides, testing docs, and troubleshooting are required
Compliance and audit needs
Traceable specifications, validation passes, and typed error handling
Use Claude Code When...
Prototyping ideas
Exploring approaches before committing to a direction
Quick fixes
Small, well-scoped changes that don't need extensive documentation
Exploring approaches
Testing multiple solutions to find the right architecture
Time-sensitive spikes
When speed matters more than completeness
Many teams use both: Claude Code for rapid exploration, then PAI when it's time to ship.
The Math
Developer time cost analysis:
Claude Code + Developer Finishing
- Developer cost (30 min @ $100/hr): $50.00
- Claude Code cost: $2.93
- Total: $52.93
PAI (Review Only)
- PAI cost: $11.40
- Total: $11.40
Savings: $52.93 - $11.40 = $41.53 per feature
"Claude Code got to 'working' in 11 minutes for $3. PAI got to 'shippable' in 71 minutes for $11. The difference is what happens next: with Claude Code, you spend developer time finishing; with PAI, you spend it reviewing."
PAI doesn't replace your coding assistant
It makes sure you ship what you meant to build.