Niraj Shah
Co-Founder & CTO of TwinsAI
March 4, 2026

Bringing Claude Code Intelligence to Your SaaS

TL;DR: We built Tuplet, a TypeScript framework for embedding Claude Code-like AI agents in your own applications. One dependency, stateless, serverless-ready. MIT licensed.


The moment we realized we needed to build this

Six months ago, we were adding AI features to a Next.js SaaS product. Nothing fancy - we wanted users to be able to ask questions about their data, generate reports, maybe automate some workflows.

We started with the obvious approach: call the OpenAI API, stream the response, done. It worked for simple Q&A. But the moment we tried anything more complex - "analyze my last quarter's data and create a summary report" - everything fell apart.

The AI would hallucinate file paths. It would start executing before understanding what the user actually wanted. It would get stuck in loops. It had no concept of breaking a complex task into steps.

Then we tried Claude Code.

We opened a terminal, pointed it at a codebase, and asked it to refactor a module. It didn't just start typing. It explored first. It asked clarifying questions. It made a plan. It tracked its own progress. When it hit an obstacle, it reasoned through alternatives.

We looked at each other and said: "Why can't we have this in our app?"


Why existing solutions didn't work for us

LangChain: too much abstraction

We tried LangChain.js first. It's the obvious choice - huge ecosystem, lots of integrations, active community.

But we kept fighting the abstractions. Chains, Runnables, LCEL, Memory, Agents - every concept required learning a new mental model. Our simple "analyze data and make a report" task turned into a sprawling dependency graph of components.

The runtime dependency count was also concerning. We counted 11+ packages just for core functionality, plus additional packages for each provider. For a serverless function with cold start sensitivity, this matters.

More importantly, LangChain is optimized for a different use case. It's fantastic for RAG pipelines, for chaining together multiple data sources, for complex retrieval workflows. But we didn't need retrieval. We needed an agent - something that could think, plan, and execute autonomously.

Building from scratch: too much work

We considered rolling our own. How hard could it be?

Turns out, pretty hard. Here's an incomplete list of what we'd need to build:

  • Planning logic (how does the AI break down complex tasks?)
  • Task tracking (how does it know what's done and what's next?)
  • Clarifying questions (how does it ask for more info without being annoying?)
  • Tool execution (how does it call functions and handle errors?)
  • Context management (how does it stay within token limits?)
  • Cost tracking (how do we know what we're spending?)
  • Multi-provider support (what if we want to switch models?)
  • History management (how do we persist conversations?)
  • Interruption handling (what if the user wants to stop or redirect?)

Each of these is a rabbit hole. Planning alone took us two weeks to get right. We kept finding edge cases: What if the plan is too ambitious? What if the AI gets stuck? What if the user's request is ambiguous?

We realized we were building a framework, not a feature.

What we actually built

Tuplet is the framework we extracted from that work. Here's the mental model:

import { Tuplet, ClaudeProvider } from 'tuplet'

const agent = new Tuplet({
  role: 'a helpful data analyst',
  tools: [queryDatabase, generateChart, exportPDF],
  llm: new ClaudeProvider({ apiKey: process.env.ANTHROPIC_API_KEY })
})

const result = await agent.run('Analyze Q3 sales and create a summary report')


That's it. No chains, no runnables, no graph definitions. One object, one method call.

But under the hood, a lot is happening:

  1. Planning sub-agent: Before executing anything, Tuplet spawns a planning agent that analyzes the request, identifies ambiguities, and creates a task list.
  2. Clarifying questions: If the planner identifies missing information ("Which region's sales? All products or specific categories?"), it asks before proceeding.
  3. Task tracking: As the agent works, it updates task status in real-time. You can stream these updates to your UI.
  4. Tool execution: When the agent needs to call a tool, it does so with proper error handling, retries, and timeout management.
  5. Context management: Long conversations get automatically summarized to stay within token limits.
  6. Cost tracing: Every request generates a detailed cost breakdown - per model, per sub-agent, per tool call.
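To make point 4 concrete, here is a minimal retry-plus-timeout wrapper of the kind an agent runtime puts around tool calls. The `withRetry` helper and its options are our sketch for this post, not Tuplet's actual API:

```typescript
// Illustrative only: a generic retry + timeout wrapper around a tool call.
// Not Tuplet's public API.
async function withRetry<T>(
  fn: () => Promise<T>,
  { retries = 2, timeoutMs = 10_000 }: { retries?: number; timeoutMs?: number } = {}
): Promise<T> {
  let lastError: unknown
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      // Race the tool call against a timeout so a hung tool can't stall the agent
      return await Promise.race([
        fn(),
        new Promise<never>((_, reject) =>
          setTimeout(() => reject(new Error(`tool timed out after ${timeoutMs}ms`)), timeoutMs)
        )
      ])
    } catch (err) {
      lastError = err
    }
  }
  throw lastError
}
```

The point is that a flaky or slow tool produces a retried call or a clean error the agent can reason about, instead of a silently hung request.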

The stateless design

This was a hard requirement for us. We deploy to Vercel and Firebase Functions. We can't assume persistent memory between requests.

Tuplet is stateless by design. All conversation state is externalized to a pluggable history repository:

const agent = new Tuplet({
  role: 'customer support agent',
  tools: [searchKnowledgeBase, createTicket],
  historyRepo: new FirestoreHistoryRepo(db, conversationId),
  llm: new ClaudeProvider({ apiKey })
})

The agent loads history at the start of each request and saves it at the end. No in-memory state, no session reconstruction, no Redis required (unless you want it).

This also means you can run Tuplet agents in parallel. Each request is independent.
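The repository contract is deliberately small. A hedged sketch of the shape (the interface and method names here are illustrative; check Tuplet's exported types for the real ones), with an in-memory implementation of the kind you might use in tests:

```typescript
// Illustrative sketch of a pluggable history repository.
// The real Tuplet interface may differ.
interface Message { role: 'user' | 'assistant' | 'tool'; content: string }

interface HistoryRepo {
  load(): Promise<Message[]>                // called at the start of each request
  save(messages: Message[]): Promise<void>  // called at the end
}

// Trivial in-memory implementation, useful for unit tests
class InMemoryHistoryRepo implements HistoryRepo {
  private store: Message[] = []
  async load() { return [...this.store] }
  async save(messages: Message[]) { this.store = messages }
}
```

Anything that can load and save an array of messages - Firestore, Postgres, DynamoDB - can back the agent's memory.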

The workspace abstraction

Claude Code works with your local filesystem. But our users' data lives in Supabase, S3, and various APIs.

We abstracted this into a "workspace" concept:


// Supabase-backed workspace
const workspace = new SupabaseWorkspace({
  client: supabase,
  bucket: 'user-documents',
  userId: session.user.id
})

const agent = new Tuplet({
  role: 'document analyst',
  workspace,
  tools: [summarize, compare, extract],
  llm: new ClaudeProvider({ apiKey })
})

// The agent now works with Supabase storage as if it were local files
await agent.run('Compare the Q1 and Q2 reports')

You can implement your own workspace by conforming to a simple interface. We've seen people build workspaces backed by:

  • S3 / R2 / GCS
  • PostgreSQL (storing documents as BLOBs)
  • GitHub repos (via the API)
  • Google Drive
  • Notion

The agent doesn't know or care where the files live. It just sees a filesystem.
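What that interface roughly looks like (the method names below are our assumption for illustration, not Tuplet's published types), with an in-memory backing store as the smallest possible example:

```typescript
// Illustrative workspace interface: the agent sees a filesystem-like API
// regardless of where the bytes live. Names are assumptions, not Tuplet's API.
interface Workspace {
  list(prefix?: string): Promise<string[]>
  read(path: string): Promise<string>
  write(path: string, content: string): Promise<void>
}

// A trivial in-memory implementation
class MemoryWorkspace implements Workspace {
  private files = new Map<string, string>()
  async list(prefix = '') {
    return [...this.files.keys()].filter(p => p.startsWith(prefix))
  }
  async read(path: string) {
    const content = this.files.get(path)
    if (content === undefined) throw new Error(`not found: ${path}`)
    return content
  }
  async write(path: string, content: string) {
    this.files.set(path, content)
  }
}
```

An S3 or Supabase workspace is the same shape with network calls behind each method.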

The hard problems we solved (and how)

Problem 1: How do you make an AI plan before acting?

Our first approach was prompt engineering. We'd tell the model: "Before doing anything, create a plan."

This worked... sometimes. Other times, the model would acknowledge that it should plan, then immediately start executing anyway. Classic instruction-following failure.

The solution was architectural. We don't ask the main agent to plan. We spawn a separate planning agent with a different system prompt optimized for analysis and decomposition. This agent's only job is to produce a plan. It has no tools, no ability to execute.

Once the plan is approved (either automatically or by the user), we pass it to the execution agent as a structured task list.

This separation of concerns made a huge difference. The planning agent is calm and thorough because it knows it won't be executing. The execution agent is focused because it has clear instructions.


// Simplified version of what happens internally
const planningAgent = new SubAgent({
  role: 'task planner',
  systemPrompt: PLANNING_PROMPT,
  tools: [] // no tools, planning only
})

const plan = await planningAgent.run(userRequest)

const executionAgent = new SubAgent({
  role: 'task executor', 
  systemPrompt: EXECUTION_PROMPT,
  tools: availableTools,
  tasks: plan.tasks
})

return executionAgent.run()


Problem 2: How do you handle clarifying questions without being annoying?

Nobody wants an AI that asks 10 questions before doing anything. But nobody wants an AI that makes assumptions and gets things wrong either.

We found a balance through a "confidence threshold" approach. During planning, the agent evaluates its confidence in understanding the request. If confidence is high, it proceeds. If confidence is low, it asks - but it asks efficiently.

Instead of:

"What time period would you like me to analyze?" "Which metrics are you interested in?" "Should I include visualizations?"

It asks:

"I'll analyze Q3 2024 sales across all regions, focusing on revenue and growth metrics, with charts. Should I adjust any of this?"

One question. The user can say "yes" or specify changes. This pattern - "here's what I understood, confirm or correct" - works much better than open-ended questions.


Problem 3: How do you stay within context limits?

Claude has a 200K token context window, which sounds like a lot until you're processing a codebase or a long conversation history.

We implemented automatic summarization with a twist: we don't just summarize the whole conversation. We keep recent messages intact (they're most relevant) and summarize older ones progressively.

[SUMMARY: User asked about Q3 sales. Agent analyzed data and found 15% growth...]
[Recent messages preserved verbatim]
[Current request]

This preserves the detail where it matters (recent context) while compressing ancient history.

For large files, we borrowed Claude Code's chunking approach. Files over 256KB are read in chunks, with the agent explicitly requesting specific sections as needed rather than loading everything upfront.
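The split itself is simple; the interesting decision is what stays verbatim. A minimal sketch of the compaction step under our own assumptions (the threshold and summarizer hook are not Tuplet's internals):

```typescript
// Illustrative: keep the most recent messages verbatim and hand older
// ones to a summarizer. Not Tuplet's actual implementation.
interface Msg { role: string; content: string }

function compactHistory(
  history: Msg[],
  keepRecent: number,
  summarize: (older: Msg[]) => string
): Msg[] {
  if (history.length <= keepRecent) return history
  const older = history.slice(0, history.length - keepRecent)
  const recent = history.slice(history.length - keepRecent)
  // Older messages collapse into a single summary message
  return [{ role: 'system', content: `[SUMMARY: ${summarize(older)}]` }, ...recent]
}
```

In practice `summarize` is itself an LLM call, and the summary message is re-summarized as it grows, so context cost stays roughly constant no matter how long the conversation runs.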

Problem 4: How do you make non-Claude models work well?

Tuplet supports OpenAI, OpenRouter (100+ models), and custom providers. But here's the thing: models behave differently. A prompt optimized for Claude might confuse GPT-4. A tool schema that works with Claude might fail with Mixtral.

We solved this with provider-specific prompt adapters. When you use the OpenAI provider, prompts are automatically adjusted for GPT's preferences. When you use OpenRouter with Llama, we adapt again.

This isn't perfect - you'll always get best results with Claude - but it means you can prototype with cheaper models and upgrade to Claude for production.
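The shape of an adapter is easy to show, even if the real adjustments are the product of a lot of trial and error. Everything below - the provider IDs, the interface, and the example tweaks - is our illustration, not Tuplet's internals:

```typescript
// Illustrative: a per-provider prompt adapter. The specific adjustments
// shown are placeholders, not Tuplet's actual prompt engineering.
type ProviderId = 'anthropic' | 'openai' | 'openrouter'

interface PromptAdapter {
  adaptSystemPrompt(prompt: string): string
}

const adapters: Record<ProviderId, PromptAdapter> = {
  // Prompts are written Claude-first, so this adapter is the identity
  anthropic: { adaptSystemPrompt: p => p },
  // Hypothetical example: prepend a firmer instruction preamble
  openai: { adaptSystemPrompt: p => `Follow these instructions exactly.\n${p}` },
  // Hypothetical example: nudge open models toward disciplined tool use
  openrouter: { adaptSystemPrompt: p => `${p}\nUse a tool only when one is needed.` }
}
```

The win is that the adjustment lives in one place per provider, instead of leaking `if (model === ...)` branches through your application code.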

Real examples from production

Example 1: AI coding assistant in a web IDE

A company building a browser-based IDE used Tuplet to add an AI assistant. Users can highlight code and ask questions or request refactors.

const agent = new Tuplet({
  role: 'senior software engineer helping with code',
  workspace: new BrowserFSWorkspace(fileSystem),
  tools: [
    readFile,
    writeFile,
    runTests,
    searchCodebase,
    explainCode
  ],
  llm: new ClaudeProvider({ apiKey, caching: true })
})


The caching is important here. The codebase context gets cached, so subsequent requests are 90% cheaper. Over a month, they estimated $12K in savings compared to uncached requests.

Example 2: Customer support agent

A B2B SaaS used Tuplet for tier-1 support automation. The agent can search their knowledge base, look up customer accounts, create tickets, and escalate to humans.

const agent = new Tuplet({
  role: 'friendly customer support agent for [Product]',
  tools: [
    searchKnowledgeBase,
    getCustomerAccount,
    createSupportTicket,
    escalateToHuman,
    checkServiceStatus
  ],
  secrets: {
    // API keys for internal services; the agent can use them but can't see values
    INTERNAL_API_KEY: process.env.INTERNAL_API_KEY
  },
  historyRepo: new PostgresHistoryRepo(db),
  llm: new ClaudeProvider({ apiKey })
})


Key insight: the secrets parameter. The agent can use API keys to call internal services, but the actual values are never exposed in the conversation or logs. This is important for security audits.
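One way to picture the mechanism: secrets are referenced by name, substituted only at call time, and scrubbed from anything that gets logged or added to the transcript. The `redact` helper below is our sketch of the scrubbing half, not Tuplet's implementation:

```typescript
// Illustrative: replace any occurrence of a secret value with its
// placeholder name before text reaches logs or conversation history.
function redact(text: string, secrets: Record<string, string>): string {
  let out = text
  for (const [name, value] of Object.entries(secrets)) {
    if (value) out = out.split(value).join(`{{${name}}}`)
  }
  return out
}
```

So even if a tool echoes a key back in an error message, the transcript only ever contains `{{INTERNAL_API_KEY}}`.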

Example 3: Data analysis pipeline

A fintech used Tuplet for ad-hoc data analysis. Users ask questions in natural language; the agent writes SQL, executes it, and visualizes results.

const agent = new Tuplet({
  role: 'data analyst with access to our data warehouse',
  tools: [
    executeSQLQuery,
    createVisualization,
    exportToCSV,
    explainQueryPlan
  ],
  workspace: new S3Workspace({
    bucket: 'analysis-outputs',
    prefix: `users/${userId}/`
  }),
  onProgress: (event) => {
    // Stream task progress to the UI
    websocket.send(JSON.stringify(event))
  },
  llm: new ClaudeProvider({ apiKey })
})


The onProgress callback is how they built a real-time UI showing the agent's thinking process, current task, and execution progress.

What Tuplet is NOT good for

Honesty time. Tuplet isn't the right choice for everything:

RAG pipelines: If your primary need is retrieval-augmented generation with vector stores and embeddings, LangChain is probably better. Tuplet doesn't have built-in vector store integrations.

Complex multi-model orchestration: If you need to chain 5 different models together with complex routing logic, you might want something more flexible. Tuplet is opinionated about its architecture.

Non-agentic use cases: If you just need simple prompt → response without planning, task tracking, or tools, Tuplet is overkill. Just use the API directly.

Python shops: Tuplet is TypeScript-only. If your backend is Python, look at the original Claude Code architecture or frameworks like AutoGen.

Performance and cost

Some real numbers from production deployments:

Cold start: ~150ms on Vercel Edge Functions, ~300ms on AWS Lambda (Node.js 20).

Typical request latency: 2-15 seconds depending on task complexity. Most of this is LLM inference time, not framework overhead.

Cost with caching: For repeated similar requests (like IDE assistance where the codebase context is stable), we see 80-90% cost reduction with Claude's prompt caching.

Memory usage: ~50MB baseline. Scales with conversation history and workspace size.

The technical decisions we're most proud of

1. Single dependency: Tuplet has exactly one runtime dependency (the Anthropic SDK, which is optional if you use a custom provider). We aggressively avoided dependency creep.

2. TypeScript-first with strict typing: All tool parameters are typed. The compiler catches mistakes before runtime.


import { defineTool } from 'tuplet'
import { z } from 'zod'

const searchTool = defineTool({
  name: 'search',
  description: 'Search the knowledge base',
  parameters: z.object({
    query: z.string(),
    limit: z.number().optional().default(10)
  }),
  execute: async ({ query, limit }) => {
    // TypeScript knows query is a string and limit a number
  }
})


3. Streaming by default: All responses stream. You can show users the agent's thinking in real-time.

4. Observable everything: Planning, execution, tool calls, token usage - all exposed through typed events. Build whatever UI you want.

5. Interruptible: Users can stop or redirect the agent mid-execution. Partial results are preserved.
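To give a feel for points 3-5, here is the kind of typed event union such a stream exposes. The event names and shapes below are our assumptions for illustration; consult Tuplet's type definitions for the real ones:

```typescript
// Illustrative: a discriminated union of agent events. Shapes are
// assumptions, not Tuplet's documented API.
type AgentEvent =
  | { type: 'plan'; tasks: string[] }
  | { type: 'task-update'; task: string; status: 'running' | 'done' }
  | { type: 'token-usage'; inputTokens: number; outputTokens: number }

// A UI can switch on the discriminant and render each event differently
function describeEvent(event: AgentEvent): string {
  switch (event.type) {
    case 'plan':
      return `planned ${event.tasks.length} tasks`
    case 'task-update':
      return `${event.task}: ${event.status}`
    case 'token-usage':
      return `used ${event.inputTokens + event.outputTokens} tokens`
  }
}
```

Because the union is typed, the compiler forces your UI to handle every event kind, or tells you when a new one appears after an upgrade.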


What's next

We're actively working on:

  • MCP (Model Context Protocol) support: Use Tuplet agents as MCP servers or connect to external MCP tools.
  • Better memory: Long-term memory that persists across conversations.
  • Agent-to-agent communication: Let agents spawn and coordinate with other agents.
  • Eval framework: Built-in tools for testing agent behavior.

We're also exploring tighter integrations with specific platforms (Supabase, Firebase, Vercel) to make setup even simpler.

Try it


npm install tuplet


import { Tuplet, ClaudeProvider } from 'tuplet'

const agent = new Tuplet({
  role: 'a helpful assistant',
  tools: [/* your tools */],
  llm: new ClaudeProvider({ apiKey: process.env.ANTHROPIC_API_KEY })
})

const result = await agent.run('Hello!')
console.log(result.response)


MIT licensed. Use it however you want.

FAQ

Q: How does this compare to the OpenAI Agents SDK?

OpenAI's SDK is great if you're committed to their ecosystem. Tuplet is provider-agnostic - start with Claude, switch to GPT, use open-source models via OpenRouter. Same code, different provider.

Q: Can I use this with my existing Express/Fastify/Next.js app?

Yes. Tuplet is just a library. Import it, instantiate an agent, call .run(). It doesn't take over your server or require specific middleware.

Q: How do you handle rate limits?

We implement exponential backoff automatically. You can also configure custom retry strategies.
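For reference, exponential backoff with jitter is a few lines. The base delay, cap, and "equal jitter" strategy below are our illustrative defaults, not Tuplet's configured values:

```typescript
// Illustrative: exponential backoff delay with "equal jitter".
// Half the delay is deterministic, half is random, which spreads out
// retries from many concurrent clients.
function backoffDelay(attempt: number, baseMs = 500, maxMs = 30_000): number {
  const exp = Math.min(maxMs, baseMs * 2 ** attempt)
  return exp / 2 + Math.random() * (exp / 2)
}
```

Attempt 0 waits roughly 250-500ms, attempt 3 roughly 2-4s, and the cap keeps long retry chains from waiting minutes.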

Q: Is the conversation history secure?

History is stored wherever you configure - your database, your security model. We don't send anything to external services beyond the LLM provider you choose.

Q: What's with the name?

A tuplet in music is a rhythm that doesn't fit neatly into the standard beat division - like a triplet in 4/4 time. It's also the root of the words for twin multiples: quintuplet, sextuplet, and so on. We liked the idea of something that works outside the expected structure. Importantly, it was also available on npm.
