Case Study: Clerk JWT Integration
What happens when you give the same task to PAI and Claude Code? We ran the experiment so you don't have to.
Both tools were given identical instructions to add Clerk JWT authentication support to an existing Go/AWS Lambda service. The results reveal when each approach shines.
The Setup
To understand what PAI actually delivers, we ran a controlled experiment: PAI and Claude Code each completed the same integration task from identical instructions.
The Task: Add Clerk JWT authentication support alongside existing Autheory JWT support in penny-token-service, a Go/AWS Lambda service that handles token exchange and validation.
Shared Inputs:
Design Document: A 575-line specification at docs/CLERK_INTEGRATION_DESIGN.md detailing requirements, JWT claims mapping, error handling, and test scenarios
Same LLM: Both used Claude Opus 4.5 as the underlying model
Same Knowledge Base: Both had access to PAI MCP for domain context about the existing codebase and Pay Theory patterns
Same Codebase: penny-token-service with existing Autheory JWT validation, claims mapping, and middleware patterns
This is as close to an apples-to-apples comparison as you can get. Same task, same model, same context. The only difference was how the work was structured: PAI's phased workflow versus Claude Code's single-session approach.
The Results
Claude Code got to "working" in 11 minutes for $3. PAI got to "shippable" in 71 minutes for $11. The difference is what happens next.
Claude Code: "working" in 11 minutes for $2.93
PAI: "shippable" in 71 minutes for $11.40
What PAI's Phases Produced
The extra cost and time went to upfront structure, not rework.
Scope Phase
Defined the exact boundaries of the integration:
What's in scope: Clerk JWT validation, claims mapping, middleware for user extraction
What's out of scope: Autheory modifications, database schema changes
Integration points: Existing token exchange handler, rate limiting middleware
Clear boundaries prevent scope creep and ensure the assistant stays focused on the actual task.
Specification Phase
Generated detailed module specs with:
Typed errors: ErrClerkJWKSFetch, ErrClerkTokenExpired, ErrClerkInvalidAudience
Constants: JWKS URLs, claim field names, issuer patterns
Test cases: 42 scenarios covering happy paths, edge cases, and error conditions
Specifications become the contract between what you asked for and what gets built.
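As a rough sketch, the typed errors and constants from the spec might land in Go like this. The three error names come from the spec above; the constant names and values are illustrative placeholders, not the generated code:

```go
package clerk

import "errors"

// Typed sentinel errors named in the specification. Callers match
// them with errors.Is instead of parsing error strings.
var (
	ErrClerkJWKSFetch       = errors.New("clerk: failed to fetch JWKS")
	ErrClerkTokenExpired    = errors.New("clerk: token expired")
	ErrClerkInvalidAudience = errors.New("clerk: invalid audience")
)

// Spec-pinned constants so claim field names and issuer patterns are
// defined exactly once. Values here are illustrative placeholders.
const (
	ClaimSessionID   = "sid"                           // Clerk session claim
	IssuerURLPattern = "https://%s.clerk.accounts.dev" // per-instance issuer
)
```

Typed errors like these are what make the later validation passes mechanical: compliance is an errors.Is check, not a string comparison.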
Implementation Phase
Code generated to match spec exactly:
clerk_validator.go: JWKS handling, token verification, typed errors
claims_mapper.go: Clerk-specific claim extraction with constants
clerk_user_extraction.go: Middleware for rate limiting integration
3 documentation files: Setup guide, testing guide, troubleshooting
Implementation follows the spec, not the other way around.
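The verification path at the heart of clerk_validator.go reduces to a small core. Here is a minimal sketch assuming the widely used github.com/golang-jwt/jwt/v5 package (the write-up doesn't say which JWT library was used, and the JWKS fetch/cache logic is hidden behind keyFunc; ValidateToken and ClerkClaims are illustrative names):

```go
package clerk

import (
	"errors"

	"github.com/golang-jwt/jwt/v5"
)

// ClerkClaims is the subset of Clerk's JWT claims the service maps
// into its internal user model.
type ClerkClaims struct {
	jwt.RegisteredClaims
}

// ValidateToken verifies a Clerk JWT's signature, expiry, and
// audience. keyFunc resolves the signing key, typically from a
// cached copy of Clerk's JWKS endpoint.
func ValidateToken(raw string, keyFunc jwt.Keyfunc, audience string) (*ClerkClaims, error) {
	claims := &ClerkClaims{}
	_, err := jwt.ParseWithClaims(raw, claims, keyFunc,
		jwt.WithAudience(audience),
		jwt.WithValidMethods([]string{"RS256"}),
	)
	switch {
	case err == nil:
		return claims, nil
	// Map library errors onto the spec's typed errors so callers
	// can branch with errors.Is.
	case errors.Is(err, jwt.ErrTokenExpired):
		return nil, ErrClerkTokenExpired
	case errors.Is(err, jwt.ErrTokenInvalidAudience):
		return nil, ErrClerkInvalidAudience
	default:
		return nil, err
	}
}
```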
Validation Phase
22 validation passes confirmed spec compliance:
Type safety: All error types match specification
Test coverage: Every specified scenario has a corresponding test
Documentation: All public APIs documented with examples
Integration: Middleware correctly wired to existing handlers
Validation catches gaps before they become production bugs.
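Spec-to-test traceability is what makes those passes checkable. A hedged sketch of the pattern: each subtest names the design-document scenario it covers, so a validation pass can diff spec scenarios against test names. The scenario IDs and the helpers expiredToken, wrongAudienceToken, and testKeyFunc are hypothetical stand-ins:

```go
package clerk

import (
	"errors"
	"testing"
)

// Hypothetical helpers assumed here: expiredToken and
// wrongAudienceToken mint signed test tokens; testKeyFunc resolves
// the matching test signing key.
func TestValidateToken_SpecScenarios(t *testing.T) {
	cases := []struct {
		name    string
		token   string
		wantErr error
	}{
		{"scenario-07/expired-token", expiredToken(t), ErrClerkTokenExpired},
		{"scenario-12/wrong-audience", wrongAudienceToken(t), ErrClerkInvalidAudience},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			_, err := ValidateToken(tc.token, testKeyFunc, "expected-aud")
			if !errors.Is(err, tc.wantErr) {
				t.Fatalf("got %v, want %v", err, tc.wantErr)
			}
		})
	}
}
```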
What Claude Code Produced
Claude Code excels at rapid iteration. In just 11 minutes, it delivered working code that passed tests and integrated with the existing codebase. For exploration and prototyping, this speed is invaluable.
What Claude Code delivered well:
Core validator implementation: clerk_validator.go with JWKS handling and token verification
Claims mapping extension: Clerk-specific claim extraction added to existing mapper
Token exchange handler: Updated to support Clerk JWT alongside Autheory
Test coverage: 31 test functions covering core functionality
Config and CDK updates: Infrastructure changes for deployment
What remained for developers to complete:
Middleware for user extraction: Rate limiting integration required manual implementation
Operational documentation: Setup guides, testing guides, and troubleshooting docs
Edge case coverage: 11 fewer test scenarios than the specification called for
Typed error constants: Generic errors instead of domain-specific types like ErrClerkJWKSFetch
Claude Code's output was working: it compiled, tests passed, and the core flow functioned. The question is whether "working" is the same as "ready to ship."
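For a sense of scale, the user-extraction middleware that remained is roughly this shape. The sketch below assumes standard net/http middleware for readability (the actual service is a Lambda handler, so the real wiring differs), and WithClerkUser and UserID are illustrative names:

```go
package clerk

import (
	"context"
	"net/http"
	"strings"

	"github.com/golang-jwt/jwt/v5"
)

type ctxKey struct{}

// WithClerkUser validates the bearer token and stashes the subject
// in the request context so downstream rate limiting can key on it.
// Sketched as net/http middleware; the real service is a Lambda.
func WithClerkUser(next http.Handler, keyFunc jwt.Keyfunc, audience string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		raw := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
		claims, err := ValidateToken(raw, keyFunc, audience)
		if err != nil {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		ctx := context.WithValue(r.Context(), ctxKey{}, claims.Subject)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// UserID retrieves the Clerk subject placed by WithClerkUser.
func UserID(ctx context.Context) (string, bool) {
	id, ok := ctx.Value(ctxKey{}).(string)
	return id, ok
}
```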
When to Use Each
Different tools for different contexts. Here's how to choose.
Use PAI When...
Shipping to production
When code needs to be complete, documented, and audit-ready
Complex integrations
Multi-service changes with middleware, error handling, and edge cases
Documentation matters
When setup guides, testing docs, and troubleshooting are required
Compliance and audit needs
Traceable specifications, validation passes, and typed error handling
Use Claude Code When...
Prototyping ideas
Exploring approaches before committing to a direction
Quick fixes
Small, well-scoped changes that don't need extensive documentation
Exploring approaches
Testing multiple solutions to find the right architecture
Time-sensitive spikes
When speed matters more than completeness
Many teams use both: Claude Code for rapid exploration, then PAI when it's time to ship.
The Math
Developer time cost analysis:
Claude Code + Developer Finishing
- Developer cost (30 min @ $100/hr): $50.00
- Claude Code cost: $2.93
- Total: $52.93
PAI (Review Only)
- PAI cost: $11.40
- Total: $11.40
Savings: $52.93 - $11.40 = $41.53 per feature
"Claude Code got to 'working' in 11 minutes for $3. PAI got to 'shippable' in 71 minutes for $11. The difference is what happens next: with Claude Code, you spend developer time finishing; with PAI, you spend it reviewing."
PAI doesn't replace your coding assistant
It makes sure you ship what you meant to build.