Karan Vaidya, CTO of Composio, explains how their "smart tool" platform lets AI agents access over 50,000 tools across 1,000+ apps through a single interface. He details how Composio handles tool discovery, authentication, sandboxes, and logging, and how an AI-powered feedback loop continuously improves tools in real time. The conversation explores avoiding model lock-in through robust skills and instructions, translating capabilities across model providers, and why the best agent use cases look more like full jobs than discrete tasks.
Hello and welcome back to the Cognitive Revolution. Today my guest is Karan Vaidya, CTO of Composio, a platform that allows AI agents to access more than 50,000 tools spanning more than 1,000 apps, all through a single interface, and which is one of the best examples of the smart tool pattern that I've been watching out for since the MCP paradigm was introduced. This is a sponsored episode, but as always, I have played around with the tool for the last couple of weeks, and it's clear to me that Composio does address real problems. For starters, core platforms like Gmail, Google Drive, and Slack do not make it easy for do-it-yourselfers to grant access to AI agents. The number of clicks required to get started is a serious barrier for casual users. Most other tools are simpler to connect, but many aren't popular or well documented enough for AIs to know how best to use them from the start, and sometimes quite a bit of iteration is required to get things working well. Looking ahead, as people delegate larger projects to agents, those agents will often need tools that the humans never anticipated. All of this is indeed made much easier by simply giving your agent access to Composio, which allows agents to express high-level intent, identifies the right tools for the job, and provides authentication, execution sandboxes, and logging infrastructure that few developers really want to build on their own. In this conversation we get into the details of how Composio works and how they're delivering on the smart tool promise by using an AI-powered continuous improvement process that can detect when a tool isn't working for an agent, generate a new version in real time, and then swap the upgrade into the agent's context, and which, over time, in the background, automatically identifies and diffuses successful patterns across the entire Composio customer base. One of the most interesting arguments that Karan makes is that excellence in tooling, and increasingly in skills, can help developers avoid model lock-in. The idea is that while models do have
different default behaviors, they are all very good at following instructions, which means that if you have very thorough instructions, you can probably get similar outputs from any frontier model. And for cases where that doesn't work, Karan and team are also working on meta-skills, which translate skills from one provider to another, reducing switching costs even further. Beyond that, we also hear about Karan's favorite agent use cases, which notably look more like full jobs than discrete tasks; his perspective on which technology companies are gaining strength from the AI wave, which are most threatened, and how sticky agent products like Intercom's Fin will prove to be over time; his thoughts on memory platforms, payment frameworks, and other tools built specifically for AI agents; and how Composio works today, which includes individual engineers who manage tens of AI agents and, for the team that manages Composio's own agentic pipeline, a token bill that exceeds human payroll. For me, this conversation couldn't be more timely. Over the last couple of months I've put in the work to curate the context that Claude Code needs to serve as a second brain and a capable assistant, and it's become my go-to interface for just about everything that I do on a computer. The next level up will be to get agents doing large-scale projects autonomously on my behalf, and as I enter this next phase, Composio will definitely be a part of my stack. I'll report back on how I'm doing, of course, but for now, I hope you enjoy this conversation about building smart tools for AI agents with Karan Vaidya, CTO of Composio. The Cognitive Revolution is brought to you in part by Google, makers of the Gemini family of models. Anyone who's followed the show for a while knows that we've been sporadic at best when it comes to posting clips on social media. The hit rate of AI clip makers in my experience isn't super high either, because the segments they highlight are kind of beside the point or because the edits end up awkward and are too much trouble to fix. I think a big part of the reason those
problems happen is that most models simply cannot watch the full video. And this is why I'm now sending full videos of the podcast directly to Gemini 3.1 Pro and asking it to select the best clip-worthy moments with full awareness of everything that happened in the entire conversation. Gemini is the only frontier model API that even accepts video inputs right now, so there is literally no other product that supports this workflow. And it doesn't stop there either. Once we select clip candidates, I apply a layer of agentic editing and then send the rendered clips back to Gemini for a 13-point evaluation. With its scores, I'm able to structure a mini social media campaign where I'm working with only genuinely good clips from the start. Honestly, it's getting so easy you can do it for just about anything: internal company meetings, long presentations, or even home videos. Visit Google's AI Studio to explore Gemini's native video understanding. Thank you to Google for supporting the Cognitive Revolution, and now on with the show. Karan Vaidya, CTO at Composio, welcome to the Cognitive Revolution. Hey Nathan, thanks for having me. I'm excited for this conversation. For folks who don't know what Composio is: I've been playing around with it a little bit over the last couple of weeks, and I've come to think of it essentially as a Swiss Army knife for AI agents. Obviously we're all developing agents for a wide range of use cases, some of which are, you know, ad hoc, minute-by-minute assistance, and others are built in much more intentional and structured ways into products. But all these agents need tools, and some of those tools we have the time and the luxury of building out in a really intentional, bespoke way, and then there's just a ton of other things that a lot of people have common needs for, and this is where I see Composio coming in, with a sort of ready-to-go thousand tools that you can plug into your AI agent and give it, you know, a much broader reach than it would have if you were building out every tool one by one. That's my takeaway from getting under the hood a little bit. How do you like that description, and what would you add to it for starters? So, you're on point in understanding the problem that we solve, that we provide a thousand-plus apps, 50,000-plus tools, to your agents, to anybody building agents. But that's not the final solution, because at this point in the LLM journey, if you provide a thousand tools to the agent, it will probably use the wrong blade and suicide via context overload. So that's where we are essentially building the whole agentic tool harness. What I call Composio is an agentic tool execution layer. So the whole harness for tool execution
that, while building agents, you would need to develop, or you would want to give your Claude Code or Codex, that's what we provide. That includes a few meta tools so that you don't have to put thousands of tools into your agent; it includes managed authentication and authorization, giving the right scopes to the LLM. Like the problem that I just mentioned: you can't give a thousand tools to the agent, so we do just-in-time tool discovery, that's one of the tools, and dynamic tool calling around it, so that only the right set of tools that the agent needs for a given use case gets loaded into the context. One of the other problems people face while building agents is that in a bunch of cases, direct function calling is not the best way to solve the use case. For example, if I want my agent to process 10,000 emails, it will probably context-overload within, let's say, a hundred of them. So that's where we provide sandboxes where the agent can do programmatic tool calling on top of our apps, where it can process 10,000 or even a million emails by writing code. And on the back of it, I think the strongest thing that we do is continual learning. All our integrations are built by our internal agentic pipeline, which goes through the agent first getting the developer app and all the credentials required, then creating the actions, then finding dependencies and testing them in real-world scenarios, a bunch of edge cases. That's the whole process it goes through, and what it gives us is that at runtime, when the agent is using us, if we figure out that a particular tool is not usable by the agent, for whatever reason, there's an error, or some failure, or it's not able to understand the tool, then in real time that agentic pipeline is invoked, a new version of that tool gets created, and the newer, improved tool is added into the LLM's context, the agent's context. We also have continual learning where, when we see that the agent is taking a zigzag trace to reach an outcome, because we have the whole end-to-end agentic trace of what it is executing, what the use case was, etc., we convert that zigzag trace into a set of skills, so the next time the agent does something similar, it will take a straight path, making it more reliable, robust, token-efficient, and time-efficient as well. We also learn from failures: do's and don'ts of using particular tools, use cases, pitfalls, etc. So that's the whole harness that we provide. We also have a notification system, what we call triggers, so the agent can be notified, let's say, when an email is received, or a Slack message is received, or a PR gets created. So the whole system or harness around the agent communicating with apps, the knowledge work apps, that's what Composio provides. Okay, cool. I want to go one by one
through, I think, all of those topics and go a level deeper on each. For sure. Before we do that, though, I would love to understand a little bit more who your users are. And maybe, you know, I'm sure this is changing, because obviously we have phenomena like OpenClaw popping up and whole new populations of users coming online, and I don't think that process has reached its endpoint by any means just yet. But as I was using the tool, I was kind of like, okay, I see two ways, or two sort of broad scenarios, in which I might use this. One is, it's become almost reflexive at this point for me to go to Claude Code first whenever I want to do almost anything on my computer. Even if that's something as generic as searching for an email, I'll go to Claude Code and ask it to search for the email rather than go into Gmail and search directly. So I guess I would call that the hobbyist market, or the individual user who has their individual assistant agent. And those folks I could see really needing, or really benefiting from, something that allows them to expand their toolkits really quickly. Just this morning, actually, I was onboarding a teammate who hasn't used Claude Code so far, and one of the questions she had for me was, how did you give it access to our Google Drive? And I was like, well, actually, that was kind of a pain in the butt. Claude actually talked me through the steps, but the steps were pretty gnarly. I had to go into the console and create an app and then click over here and grant this permission, whatever. I never remember all the steps. And so, you know, that wasn't super easy. And I can see a lot of people just stumbling over that, or just, for ease of use, thinking, hey, if I can get a thousand of those where I don't have to go through that. You know, Slack is another one, just an absolute nightmare of permission adding and all that kind of stuff. So, I see that persona. I am that persona. And then I also see you have an SDK which seems to be really geared more toward production apps. And for those folks, I'm like, hmm, that's interesting, because how many people want to sort of dynamically bring tools into their app? You know, it seems like it starts to make the app itself potentially kind of unwieldy. On the other hand, I do see a lot of value in, say, managing auth for a thousand apps. That doesn't sound like a lot of fun. So, if you can make that a simple process for developers, that sounds quite interesting. But I guess I see these profiles kind of going in somewhat different directions, or at least getting the bulk of the value from different parts of what you've built. So, I'm interested in how you segment the market and what you see the primary value drivers being for those different profiles. Yeah, that's a great question. So, as you rightly pointed out, we have a two-pronged product. One is for the prosumer market, which is people using Claude Code, OpenClaw, etc., and plugging in Composio Connect. That's what we call that product.
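To make the just-in-time tool discovery Karan described earlier a bit more concrete, here is a minimal sketch of the general pattern. Everything in it, the tool names, the tag-matching score, and the `search_tools` meta-tool, is a hypothetical illustration, not Composio's actual implementation:

```python
# Hypothetical sketch of just-in-time tool discovery: instead of loading
# tens of thousands of tool schemas into the model's context, the agent
# sees one "search_tools" meta-tool and pulls in only the tools it needs.
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    description: str
    tags: set = field(default_factory=set)


# A large catalog that the agent never sees directly.
CATALOG = [
    Tool("GMAIL_SEND_EMAIL", "Send an email via Gmail", {"gmail", "email", "send"}),
    Tool("GMAIL_SEARCH", "Search Gmail messages", {"gmail", "email", "search"}),
    Tool("SLACK_POST_MESSAGE", "Post a message to a Slack channel", {"slack", "message"}),
    Tool("DRIVE_LIST_FILES", "List files in Google Drive", {"drive", "files"}),
]


def search_tools(query: str, limit: int = 3) -> list[Tool]:
    """Meta-tool: score catalog entries against the query and return only
    the top matches, which are then loaded into the agent's context."""
    words = set(query.lower().split())
    scored = [(len(words & t.tags), t) for t in CATALOG]
    scored = [(score, t) for score, t in scored if score > 0]
    scored.sort(key=lambda pair: -pair[0])
    return [t for _, t in scored[:limit]]


# Only the matching Gmail tools enter the context, not the full catalog.
active_context = search_tools("search my gmail email")
print([t.name for t in active_context])  # ['GMAIL_SEARCH', 'GMAIL_SEND_EMAIL']
```

The point of the pattern is that the agent's context only ever holds the one meta-tool plus the handful of schemas it retrieves, rather than every catalog entry at once.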
It's a single MCP server inside all these agentic runtimes, Claude Code, Codex, etc., whatever they're using. And for them, the value prop is exactly what you pointed out. You don't need to go to this MCP server, plug it in, understand the instructions of Google Drive, then Zoom, then Datadog, etc. You just get one MCP server, which is connect.composio.dev/mcp. It's as simple as that. You put it inside your Claude Code, and then you can manage your authentication directly via Claude Code. If you ask it, I want to connect with a new app, it will give you the link. Or, if you like the GUI experience, you can just go to Composio's dashboard and do it all there, with managed permissions and managed scopes. So there, the value prop is simplicity and getting the power of almost anything at your fingertips. On the other hand, on the developer side, everybody is building agents at this point, from startups to bigger enterprises. And one of the biggest problem statements people face while building agents is that they want to give them actual real power to be able to do things. They want to connect their agents with actual knowledge work apps. And that's where we come in and solve it. The value that we provide there, other than auth and all the integrations, is managing the scopes. You can give whatever granularity of scopes you want via us, controlling it at the action level. And providing the whole harness, because at scale, everybody who's building agents wants to create the same kind of harness that we have seen works really well, the pattern works really well for a bunch of the agentic paradigms that everybody's building right now. So the whole harness and its bits and pieces are available in full modularity. If people want to use just our tool discovery, with Composio's tools as well as their own tools, they can use that. If they just want to use Workbench, which is our sandbox where the execution can happen and where auth and everything is controlled by us, they can just use that. So on the developer side of things, we have, on one hand, the whole harness, where you can plug in that single thing via MCP, via API, via SDK into your agentic system. We also have bits and pieces of modularity where, okay, if you just want to put in one piece, you can put in that piece. And if you just want to use our tools, where you manage the whole harness and we just provide the auth and actions, you can also do that. The idea there is that people want things like governance, observability, and auditability, and that all sits inside Composio's dashboard. So that's where, I think... and also, we have some amazing enterprise customers, which gives you trust, because this is very critical data that you wouldn't want to give to just any company. But at this point, we have AWS using and building their core agentic product on top of us. Zoom doing the same. Glean doing the same. Airtable. So a bunch of tech-first hyperscalers are trusting us. That gives you that trust level, because they've already evaluated us on all the things that you would want to evaluate us on. What are the big things that they want to evaluate you on that maybe I should be thinking harder about? Because I'm a pretty prolific tester of a lot of products, and I'm increasingly mindful of... you know, certainly I don't give full access to my Gmail or whatever to just anything that I happen to sign into. But I do,
you know, connect a lot of accounts to a lot of things over time. What are the most vulnerable attack surfaces, or the biggest risk factors, that the big companies have beat you up on already to make sure you're solid, that the rest of us can take to the bank? Yeah, I mean, first of all, I think providing least-privileged access control. I think we have done a pretty good job there, where you can define what actions you want to give the agent, and the agent will have access to only those actions. To start with, if you don't want to give write-email or send-email access to the agent, you can just give read. Same thing with Slack, same thing with all the work-related apps. So it's pretty important for you to have that granular access control. That's one. Then, the second level of control: we have a bunch of ways in which you can control what the agent can take action on, via hooks. Before the tool is called, you can check what the tool execution is doing and create guardrails around it, like human-in-the-loop. And we have patterns for all of these pre-built. After the tool is called, before the agent gets the response, you can have those hooks see what the agent is doing and what kind of data the agent is going to get. So if you want to have some guardrails around that, you can. All those types of guardrails are already present in the product. That's second. Then third, obviously, compliance is a big thing. So we have all the compliances that people want, like SOC 2, etc., which makes them somewhat more comfortable. And the fourth one, which is pretty valuable for enterprises, is that we also do self-hosting. In some cases, you wouldn't want to use our cloud. So we also self-host in the customer's VPC, which gives them a much greater sense of breathing room there. Like AWS, for example: in the case of AWS, we have self-hosted Composio inside AWS. Gotcha. Okay, cool. That's a good rundown. Hey, we'll continue our interview in a moment after a word from our sponsors. Everyone listening to this show knows that AI can answer questions. But there's a massive gap between here's how you could do it and here, I did it. Tasklet closes that gap. Tasklet is a general-purpose AI agent that connects to your tools and actually does the work. Describe what you want in plain English. Triage support emails and file tickets in Linear. Research 50 companies and draft personalized outreach. Build a live interactive dashboard pulling from Salesforce and Stripe on the fly. Whatever it is, Tasklet does it. It connects to over 3,000 apps, any API or MCP server, and can even spin up its own computer in the cloud for anything that doesn't have an API. Set up triggers and it runs autonomously,
watching your inbox, monitoring feeds, firing on a schedule, all 24/7, even while you sleep. Want to see it in action? We set something up just for Cognitive Revolution listeners. Click the link in the show notes and Tasklet will build you a personalized RSS monitor for this show. It will first ask about your interests and then notify you when relevant episodes drop. However you prefer, email or text, you choose. It takes just 2 minutes and then it runs in the background. Of course, that's just a small taste of what an always-on AI agent can do. But I think that once you try it, you'll start imagining a lot more. Listen to my full interview with Tasklet founder and CEO Andrew Lee. Try Tasklet for free at tasklet.ai and use code COGREV for 50% off your first month. The activation link is in the show notes, so give it a try at tasklet.ai. Support for the show comes from VCX, the public ticker for private tech. For generations, American companies have moved the world forward through their
ingenuity and determination. And for generations, everyday Americans could be a part of that journey through perhaps the greatest innovation of all, the US stock market. It didn't matter whether you were a factory worker in Detroit or a farmer in Omaha, anyone could own a piece of the great American companies. But now that's changed. Today, our most innovative companies are staying private rather than going public. The result is that everyday Americans are excluded from investing and getting left further behind while a select few reap all of the benefits. Until now. Introducing VCX, the public ticker for private tech. VCX by Fundrise gives everyone the opportunity to invest in the next generation of innovation, including the companies leading the AI revolution, space exploration, defense tech, and more. Visit getvcx.com for more info. That's getvcx.com. Carefully consider the investment material before investing, including objectives, risks, charges, and expenses. This and other information can
be found in the fund's prospectus at getvcx.com. This is a paid sponsorship. Let's talk about these sandboxes a little bit. The paradigm there is a little bit confusing to me. I'm not exactly sure how to think about what execution should and does happen in different places. Obviously, if I run, which I do, Claude Code on my local machine, it is mostly running things on my local machine, right? All the bash commands and stuff are literally happening on my system. There are also some tools, like their built-in search tool, that happen on their side, in their runtime, their infrastructure. And those can communicate back and forth, in the sense that the result of that search can get sent down to my system, and the result of commands run on my system can get sent up to the cloud to be part of inference. But I imagine that gets kind of fuzzy or weird when people have a mix of things. How do they decide, or how do you guide people on deciding, what should be run in your infrastructure and what they should run on their own infrastructure? I assume it wouldn't make sense for them to try to bring their whole app into your sandboxes, right? So, how should I unclutter my own mind to think about, in general, who should be running what? So, essentially, in our sandbox we provide a ton of utilities which make it easier for the LLM to write code on top of it. That same Docker image, of sorts, we are making available locally very soon. So if you want to use your own tools, internal tools, or local tools, and want them to be available inside the Composio sandbox, you can also do that. That's coming very, very soon. But the idea is that in the sandbox, the agent has to write very minimal code, and the auth side of things and a bunch of side concerns, mostly auth, and some abstractions around tool calling, are already taken care of in the primitives that the agent gets. So that's the benefit: the agent doesn't have to deal with a bunch of things it shouldn't, and it just writes very simplified code. And all the things around it, like deciding what auth to use and converting things from code to function calling, whatever it needs, are done by the tooling that we provide internally. Got you. So the biggest value driver you're highlighting there is actually making code writing
easier for the language model, by providing, as you said, mostly auth. What else is in that category of harness? What else besides auth do you see? And I have certainly experienced that, where, as we mentioned, it can be hard to get these things set up. So it's at least pretty intuitive to me to say, yeah, if you could provide solid auth code ready to go, so that the LLM doesn't have to recreate that all the time, that sounds like a clear win. What else is like that, where there is enough deterministic stuff that you've built out that it makes things a lot simpler for the agent? So, for example, file sharing is the other thing. Basically, we have mounted folders, and the LLM knows about everything in those folders because it's part of the harness, in the description of the function and the system prompt, etc. Everything put in those mounted folders is by default uploaded to S3, and we have shareable links available for them. So whenever agents want to share anything externally, it's very simple: they just move or copy files to that particular folder. Things like that, and more will be coming with newer use cases every now and then, right? File sharing is definitely one of the biggest ones: the LLM has gone through all 10,000 emails and generated a report, or gone through all the Stripe activity and generated a report, and it wants to share it with you, the user, and we make that very easy. The LLM can also write code which itself uses an LLM. We've made it very simple to do that. It's kind of like inception, the LLM writing code which uses an LLM, but it's needed for processing 10,000 emails. Otherwise, how would you do that? So, utilities around that. All of these are minor things individually, but overall they increase the efficacy and accuracy of agents to a big extent. Okay, cool. On the topic of discovery, or what I think you also call smart MCPs: I've been on the lookout for a while. Back when this MCP phenomenon first popped up, I was like, where are the smart MCPs? It seems like at first we just had this massive wave of people wrapping APIs in the MCP layer. And okay, that's fine. I was kind of like, okay, sure, but it seems like where this really gets helpful is if the MCP itself is smart in some way, so that it can take in not
a very specific command, where this MCP call could just as well have been an API call, but something that is higher-order, that maybe involves the composition of multiple tool calls, or even potentially multiple different APIs working together, to do some higher-level job. And honestly, I haven't seen a lot of that. I kind of asked around and looked in repos to try to figure out if anybody was doing this, and there wasn't much. Now, you guys are doing that, and it seems like it is a pretty big part of your value. So I'd love to unpack how that is working, and especially maybe get into the progressive disclosure of it, because this, I think, has also come to the fore in developer conversations recently, where it's like: first it was, MCPs are going to take over the world, and now we've heard a little bit of the trough of disillusionment of, well, it's a lot of context bloat, and maybe, you know, a CLI is better. I kind of always end up thinking that, well, one of my AI mantras is everything is isomorphic to everything else, meaning whether it's an MCP or a CLI or whatever, you can probably do progressive disclosure, and you can avoid crazy bloat if you think about things the right way. It doesn't seem like those decisions are as sharp as they used to be, because there is just so much room to create flexibility in the context of intelligent systems. But take me through the smart MCP paradigm that you're developing. What makes it smart? How are you building it to, again, make things as easy on the agent as possible, to do as much for the agent as possible, etc.? Yeah, for sure. So, as I mentioned earlier, if you give 1,000 MCPs to an LLM, it's obvious we are hitting the 1 million token context windows. Maybe in some time that will increase to five million or whatever, but attention is definitely not free. So the less you give, the better the performance will be, and that's where just-in-time tool discovery comes in: your LLM is not suiciding or getting overwhelmed by seeing 1,000 tools, and giving it the right set of tools is important. Given we have 50,000-plus tools, most of our users want access to all of them, because it's dynamic, right? Their users, specifically in the case of developers building products, they want to give their users the power of whatever Composio has. So the idea is that the LLM doesn't see all the tools. It sees a few tools, and then dynamically new tools get added to the context. So that's one part of the smartness. The other that I was mentioning was around learning, what we call background learning of sorts. So,
whenever we see that like a tool is not like comprehensible by the agent which means that LLM like the LLM is not able to understand what the tool does or it's trying a lot but it's always erroring out. Then in the background automatically a new version of the tool which we feel is kind of very valuable for this particular use case. On a general there are issues in the tools. So a new version of that tool gets created in real-time and gets added to the context. So that's another level of smartness like which is like okay, we can create multiple versions of tool really quickly and get a new version of the tool which might be more like like suited for the particular purpose that the agent is kind of picking at this point to the like to the agent. Then the another thing that happens in the background is skills have gotten really popular, right? And the reason for popularity is it kind of is an abstraction level above tools where you can have instructions to particular use cases baked into skills which
inherently use tools, but there are instructions there are scripts which make it more robust and more repeatable compared to just providing tools to the agent. So what we do is when we have the whole end-to-end agentic trees converting them to a set of skills which are reusable and provide them during the just-in-time tool discovery. So like it's also just-in-time skills discovery of sorts. So the agent already has how to do the tool, what tools to call for this particular but to achieve this particular outcome. Uh, what code to write in the workbench or sandbox to achieve this particular outcome. And then like it just has to maybe the use case is a bit different. So it has to use that skill but like do a bit of fine-tuning for that particular use case and reuse that code or skill for its purpose. So that's the smartness like like background learning and it also include failures. So if like there are some failures that we see happen again and again, we just tell that to the agent beforehand in context. Okay, these are the pitfalls, these are the do's and don'ts of like using particular
tool or achieving a particular type of outcome. So in the end, in my opinion, a harness is nothing but context. You have to engineer the context, and that's what we're doing, effective context engineering for our tools, which makes them smarter. Hey, we'll continue our interview in a moment after a word from our sponsors. One of the best pieces of advice I can give to anyone who wants to stay on top of AI capabilities is to develop your own personal private benchmarks. Challenging but familiar tasks that allow you to quickly evaluate new models. For me, drafting the intro essays for this podcast has long been such a test. I give models a PDF containing 50 intro essays that I previously wrote, plus a transcript of the current episode, and a simple prompt. And wouldn't you know it, Claude has held the number one spot on my personal leaderboard for 99% of the days over the last couple years, saving me countless hours. But as you've probably heard, Claude is the AI for minds that don't stop at good enough.
It's the collaborator that actually understands your entire workflow and thinks with you. Whether you're debugging code at midnight or strategizing your next business move, Claude extends your thinking to tackle the problems that matter. And with Claude Code I'm now taking writing support to a whole new level. Claude has coded up its own tools to export, store and index the last five years of my digital history from the podcast and from sources including Gmail, Slack and iMessage. And the result is that I can now ask Claude to draft just about anything for me. For the recent live show I gave it 20 names of possible guests and asked it to conduct research and write outlines of questions. Based on those I asked it to draft a dozen personalized email invitations. And to promote the show I asked it to draft a thread in my style featuring prominent tweets from the six guests that booked a slot. I do rewrite Claude's drafts, not because they're bad, but because it's important to me to be able to fully stand behind everything I publish. But still, this process, which took just a couple of prompts once I had the initial setup complete, easily saved me a full
day's worth of tedious information gathering work and allowed me to focus on understanding our guest's recent contributions and preparing for a meaningful conversation. Truly amazing stuff. Are you ready to tackle bigger problems? Get started with Claude today at claude.ai/tcr. That's claude.ai/tcr, and check out Claude Pro, which includes access to all of the features mentioned in today's episode. Once more, that's claude.ai/tcr. So, double-clicking first on the discovery of tools. What do those requests typically look like? Maybe you can answer this with specific examples from specific customers, or however you want, but I'm sort of imagining sometimes I might just say, I want to connect to my Google Drive, and then it's like, okay, great. What are we going to do? We're going to get the Google Drive tool. Pretty
straightforward. Definitely still very nice to be able to have that as a natural language interface. Other times, though, I might imagine I don't even know what I want. Like, is there a tool for this, or how would I go about doing something like that? So I'm interested in the breakdown of the kinds of requests that you get today. And I think there's another version of this question too, which is, as we actually get into doing the work, how much of the work is happening in ways where the user sort of specified it: I want to go call Linear and, you know, get some details and then put it over here. You might think of that as sort of automated copying and pasting, right? Stuff exists and we're kind of moving it around. Versus really figuring out higher-order stuff: I don't really know how to do this, but this is
kind of, you know, probably what I'm trying to accomplish, and then the system figuring out what are the tools, what are the steps to make that happen. So I guess that exists both at the level of discovery and at the level of execution, but the key question is, where are we in terms of people defining what they want their agent to do and the agent following those instructions, versus people describing intent and the system really figuring out how to serve that intent even if the person doesn't have it all mapped out? Sure. I just want to clarify one thing there. We have a pretty intelligent mediator in between, so the direct user request is not what we get. We have Claude Code sitting in between, and that sends the request. [laughter] It's a very intelligent mediator: it navigates the user request and sends the right level of intent request to the tools. So we don't get that raw request. In
most cases, Claude Code already knows the power of what Composio can do. So it figures out, okay, if the user asks how do I connect to Google Drive, it will directly call manage connections, which is our auth management tool, to do that for the user. It won't send a search intent to our tool discovery tool. So that's where the intelligent mediator, which is the LLM or agentic runtime, comes in between. But to answer your question, I think it's becoming more and more intentful. We have all seen how people are using the OpenClaws of the world. I think December was a big shift, where people realized that these models have gotten to a level where much more is possible, and I can trust it much more compared to before. And that has happened across all domains, in my opinion. Engineering, obviously, software
was the first one to bite the bullet. But I think it's happening more and more across knowledge work, where people are becoming more intentful with these agents, and in a lot of cases they just give the outcome they want to drive to the agent and let the agent figure it out. And then we are kind of the harness which provides all the right tools for it to be able to do that, but the agent is smart enough to throw the right intent at the right tool. Can you maybe give some examples of that? Like, instances of people with intent that then gets mapped onto the tools. Intuitively it seems like a lot of them would come from individual Claude Code or OpenClaw users. But I'd be really interested too if there are examples where that is actually happening in an app that an app developer has running in a production environment. I don't know if
that's happening yet, or even if we want it to. I mean, with a lot of these things I do feel like, what's the app at that point, you know? That's kind of an interesting question. But yeah, give me some examples of intent that you have seen resolved into actual successful execution. So at this point, I think people are so confident that they give their whole Gmail access to the agent and ask it to, okay, go through the last month of my email and archive all the emails that don't seem useful. So the agent goes and writes code to do that, which uses an LLM, etc., to figure out which emails are not useful. Then the other use cases: I use my OpenClaw for hiring. So my OpenClaw goes through a lot of GitHub repositories, finds good commits from individuals, and creates a pipeline, specifically for engineering hiring for me, from good open source agentic repos, Python repos, TypeScript repos, etc. It
will go figure out the best contributors, specifically figure out all the enriched data of where they're located, like SF or outside, their emails, LinkedIn, all the social data. And I've also given it its own email, so it also reaches out on my behalf to them. So it's end-to-end hiring, a recruiter job I've literally offloaded. It has emailed thousands of really good folks, and in the last week or two I've gotten maybe 30 or 40 calls set up. That whole thing is fully done by my agent and Composio. And that's the idea, right? A lot of exploratory to actually end-to-end knowledge work is getting offloaded to these agents. If that answers your question to a certain extent. Yeah, I mean, as it gets up to sort of job-scale things, it starts to be both
pretty intuitive, I guess, what that ultimately looks like, and also potentially quite transformative for many aspects of life. I feel like a salesperson is doing something similar for sales. Obviously they're not directly emailing people via the agent; the agent has drafts ready which they just need to press the send button on. So that's the level where we are. It does the whole sales figuring out, the people it needs to reach out to, the right people who are building, let's say, agents in different companies, ready with their data, email, LinkedIn, etc. And then the salesperson just has to press the send button. So how do you make sure that those agents have the right context? Because this is something I'm also thinking about right now.
So one of my favorite things that I've set up, and I kind of can't stop talking about this, so I'll keep it brief because I've probably talked about it on a few episodes already: I have exported to a local database basically the last five years of my communications. All email exported out of Gmail, all Slack, basically every DM platform that I use, all of my calls, which I've been recording for the last three years, transcribed. Everything's organized into threads. So a Gmail thread is a thread, but also a single call would be a thread, and then each statement back and forth between the people is a message, which corresponds to the emails back and forth. Even the podcasts get broken down with the transcript and put in there the same way. So probably the majority of the communication that I've had is now in this database, and it is extremely helpful for getting the context that is needed
to understand, like, who is this person? What is my relationship with them? Do we have projects that are ongoing? If I started a project, who was involved with it? How did that evolve over time? Whatever. I am a little reluctant to throw that into some random cloud container that I'm just testing out, right? So that still only lives on my local machine. So this kind of brings me back a little bit to the containers and the context and what should sit where. Because I assume, just like me as an individual, you as a company have a lot of information about what makes a good candidate, like who are the good candidates that we had before, good examples, bad examples you would compare them to. It's endless, right? How should people think about sending these agents off to do long-running things, making sure they have the context that they need, but not putting themselves too much at risk?
My personal database is already a gigabyte, you know, my whole life is in there, right? So I do need to be a little mindful of where I send that around, or what ports I open up to access it. And I don't even know what all is in there. I'm sure I've emailed myself credit card numbers, passwords, and probably recovery codes, all sorts of stupid things which could come back to bite me. How do you think about that balance between making sure the context is there so they can succeed, but managing the associated risks? So yeah, that's where I think the managed access and least-privilege access control comes into the picture. And the idea that we at Composio have is, in the future you'll have not just a single agent but multitudes of agents, and you'll have different profiles and access control for each agent. Some agents will probably have read
only access to your data, so they can't do anything malicious like sending or executing anything, but they need all the data, and you will want them to be very self-contained because they are sort of research agents. So they have a lot of data, but they don't have the permissions to divulge that data by mistake. So you'll have very tight access control on what they can do, but they have read-only access to everything. That's a profile that you can create inside Composio and manage: what access permissions you want. There can be another profile where you have given that agent a lot of write permissions, but then that agent has very limited personal or company-wide information, just because, what if that agent emails some secret token by mistake? So there the granular access control is, okay, you want to give a lot
of write control, but then you have things like human-in-the-loop, etc., to guardrail what the agent is doing, and keep it very tightly controlled. So these are the types of different profiles that will exist in the future, and you would create different OpenClaws, different Claude Codes, different Codexes to do a mix, or all of it. Let's talk about the continual learning aspect. Mhm. It sounds like a huge value driver, and probably a necessary one for Composio's success. I don't know if you would go that far, but what I see is that the barrier to spinning up a new tool is certainly dropping, but the value of having seen a ton of uses of that tool, and being able to figure out what the actual effective pattern is, that's something that's not going to be easy for people to recreate on their
own. So if you can make a step-change difference in the results people can get, even if they spin their own tool up real quick, versus, okay, we've seen 10,000 uses of this and know what has actually been effective, that strikes me as the moat, and we're all searching for the increasingly elusive moats in the AI space. You described in some detail already how it works. The philosophy part of it I'm wondering about is, how does it work across users? How does it work across apps? There's obviously a sort of depersonalization or anonymization aspect to it that I'm sure is critical. But then also, if you find an upgrade, does everybody automatically get that upgrade, or do you have to subscribe to upgrades? Or maybe the upgrades you're doing are in some cases very specific to an individual or, you
know, a particular app, such that you feel you can upgrade it for them without having to change how it is for everybody else. I feel like I both want those upgrades as a user, but then, when I do have something that's working well, I'm a little bit afraid of those upgrades, just like I'm afraid of new models. Obviously the model makers are leapfrogging each other all the time. Similar thing there, right? If a new model comes out, it might be better, but is it going to be better on the thing that I already dialed in to my satisfaction? Maybe, maybe not. So that all seems quite fraught. What's the philosophy that guides you in figuring out who gets what upgrades? That's a great question. We think about it a lot internally, by the way, and that's where we have designed our infra to make it very easy. As I was describing, we have multitudes of versions of tools. Of the same tool, you might have tens of
thousands of versions. So the idea there is, there are some personalized upgrades: we see that you are using some particular tool in a particular way and the tool can be better specifically for your use case, so we'll have the upgrade only for you. But I don't fully buy the fear argument, just because I think we all know the models are changing, the model behavior is changing every other day. The way you can control the behavior of the model, even when the model changes or some of the tools change, is by having skills, and that's where we're very, very thorough about when we change the skills that we have developed for users. That doesn't change often. That's the fixed level of repeatable behavior: you like something the way it is, so that's ingrained into your skills, the personalized skills that are created in the back end by us. But the tools themselves keep on
getting better and better. So, you were talking about moat, right? We have a gazillion instances where we have seen the docs are totally wrong for a bunch of tools. And because we have gone through so many agents using our tools, and agents use tools in insanely different ways compared to the previous generation, where humans were using these APIs slash tools, you hit a new edge case every now and then. And that just makes our tools better and better. Just last week we found a bunch of cases in Google Calendar where our tools are much better than what the docs propose. It happened autonomously; in some cases we didn't even get to know. And that's not true for just one app, it's true across apps. So I think that's a moat which we have developed, because we have gone through so many agents poking holes
in different apps and making our tools better. And those upgrades are available all across, right? Why would we want those upgrades to be limited to a particular user? If our tool is getting better and better, that should be available all across. So that's how I think about it. Basically, there are some improvements which should happen all across because they are very generic, and some skills should be available across because they capture how agents should use particular tools, and then some are very personalized to your use case. You said something I thought was quite interesting, and I'm not sure everyone would agree with it, but it was that skills basically tame the models, right? I forget exactly how you said it, but you sort of said, models are always changing, of course, but that's where the skills come in. Once you've really defined a skill, it sounds like you're of the opinion that you can kind of swap out models underneath and you'll get pretty consistent behavior, even across models. Obviously, I know there are caveats with that in the sense that, you know, you
can't massively downgrade to, say, a 1B local model and expect to get frontier performance. But if we take a narrower understanding of that statement and restrict ourselves to frontier models, or whatever reference class you want to use, that's still pretty interesting. I think a lot of people would say they don't feel confident in that. But how confident are you in that statement? I guess the implication would be that all the frontier models are good enough at following instructions that, if your instructions are really well built out, then they become kind of interchangeable. Is that a good summary of your view? Yeah, in most cases I've seen that hold true, because in most cases the skills are detailed enough, giving a decently granular view of what you want to achieve and the path, the trajectory, that the agent should take to reach a particular outcome,
such that if the model is good enough and follows instructions, in most cases the behavior, the trajectory, of the model remains consistent. And that's a known pattern that a lot of research and industry are also seeing, where you get Opus to create a skill and then swap to Sonnet for using that skill. The first time, Opus, being smarter, is better at navigating and figuring things out and reaching the outcome, but once you have the skill Opus created, you can swap to a cheaper model and achieve a similar outcome. Do you think that also holds true even swapping across frontier model providers? Sometimes not. I think there are behavioral patterns across different providers
which make them different. For example, specifically, I have some examples that I see day to day. Anthropic models are somewhat more agentic in terms of things like, okay, if there are some tools which require polling, then it will wait. It will write code to wait, to continuously poll. That agentic polling aspect is somehow much better ingrained in Anthropic models, and GPT just stops. It waits for user input after that. So those are some behavioral patterns that are different, which makes the way GPT will use or build those skills a bit different from the Anthropic side of models. But to a majority extent, except for those nuances, I think this holds true. Interesting. Do you in practice advise people to do
this? I guess, what would you say is the max-efficiency play, and do you recommend it? Like, if I go develop a bunch of skills with Opus, what's the cheapest model that I can trade down to that you would expect to work some large majority of the time? For sure Sonnet, because I do that regularly. I use skills a lot, so locally I'll write some skills for the first time via Opus, but then trade off for speed to Sonnet, and that works phenomenally well. I don't do it with Haiku, because I've seen that doesn't work really well. I've tried some with GPTs, and 90% of the skills just work. There are 10% of the times where there are some nuances present in the skills because of the model
that wrote the skill, how it operates, and they are not exactly plug and play, but in 90 to 95% of cases it just works out of the box. Interesting. Would you expect Gemini Flash to hit that level as well? I think so. I think Gemini Flash is decent enough that if not now, maybe by the next iteration, it should. I haven't tried it myself, so I can't comment directly, but I've used it in a production setting where it feels very smooth. Skills as the re-commoditization layer. I would say the prevailing narrative recently has been that the models are starting to diverge, not exactly in their capabilities in the macro benchmark sense, because they're all climbing the same curve, obviously, but in qualitative ways that are hard to wrap your head around, thus creating some
stickiness and some pricing power for companies once they get people using their stuff, and I think there's been the increasing sense that it's harder to switch off. But you're making a provocative, I mean, not deliberately provocative, but it's definitely provoking thoughts in me, argument that the boomerang might come back, due to all these skills getting so thoroughly defined that you don't necessarily need great judgment in a model to do it. You just need good instruction following. That's exactly it. Yeah, that's where we actually position Composio as a one-shot way to not being locked in to a model provider. So if you use Composio's harness, you can use it with Anthropic, with OpenAI, with open source models. So let's say today you are using Anthropic. OpenAI is
gradually improving on a bunch of these things. If you want to switch to OpenAI, you can have all your auth, all your skills in a single place, and make that switch and get 99% reliability. Then, after you decide, okay, open source models are becoming equally great and they are probably 10x cheaper, you want to go there, you can make that switch and still continue to work with 99% reliability. Something I'm still in the middle of doing right now is taking the article that, I think it was Thariq, I don't know the person personally, but a member of the Anthropic technical staff put out, a post just in the last 48 hours or so that was very well received, that was like, here's how to use skills, we've learned a lot, here's what we've learned. I didn't even read that. Instead, I just copied and pasted the whole thing, put it into Claude Code and said, here are some best practices that I just heard were popular and that people
say are giving good results. Can you go apply these? Or, you know, go into plan mode first, tell me how you would think about applying these to all the various skills that we've worked on together. And naturally, it has a lot of good ideas for what to do there. I wonder if you guys maybe already do this, but have you invested, or might you think about investing, in skills specifically for translating skills from provider to provider? Because I can imagine that 90-95% could easily become 99% if you applied another layer of transforming, compensating for the known quirks. Is this something you already do? Yeah, we are developing a bunch of metrics and benchmarks around this stuff, and we already do some of it. Nobody, specifically in enterprises, wants lock-in, because they want the optionality of moving across providers, because specifically in the current AI days you actually never know who the winner
is. It changes very often. Today it's Anthropic. Tomorrow it's OpenAI, Google, Chinese models, xAI. We want that optionality, because these skills specifically are, in my opinion, to a certain extent very addictive, and on the other hand, like you said, if the skills have those behavioral patterns ingrained, how the model works, they can be harder to change because they are not structured. And that's where I think the 90-95% is very easy, because most of the models are at that level now, but reaching that 100% mark with these skills, which are not structured, is actually a very hard problem, and that's something we are trying to solve already. We have made it somewhat better, but we want to solve it to probably 100%.
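The quirk-translation idea described here, rewriting a skill's provider-specific instructions when porting it between model families, can be sketched very roughly in Python. This is a toy illustration only, not Composio's actual implementation: the rewrite table, the provider quirk chosen (Anthropic models keep polling where GPT stops for user input, per the example above), and the function name are all assumptions for the sake of the sketch.

```python
# Toy sketch of a provider-quirk translation pass for skills.
# The quirk table is illustrative: in reality rewrites would be
# discovered and validated against benchmarks, not hand-maintained.

# Known instruction rewrites, keyed by (source_provider, target_provider).
QUIRK_REWRITES = {
    ("anthropic", "openai"): [
        # Anthropic-authored skills can assume the agent keeps polling;
        # GPT tends to stop and wait for user input, so spell it out.
        ("wait for the job to finish",
         "poll the job status in a loop until it reports done; "
         "do not stop to ask the user for input"),
    ],
}

def translate_skill(skill_text: str, source: str, target: str) -> str:
    """Apply every known quirk rewrite for this provider pair."""
    for old, new in QUIRK_REWRITES.get((source, target), []):
        skill_text = skill_text.replace(old, new)
    return skill_text
```

In practice this last few percent is the hard part Karan describes: the rewrites themselves would be generated and checked by models against metrics and benchmarks, since skills are unstructured text rather than a fixed schema.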
Yeah. Okay. That's really interesting, and I do feel my worldview changing a little bit in real time here, toward expecting a little more commoditization, less pricing power. That also brings to mind another angle of thinking about lock-in, or moat, which is memory. Although, again, I think there are ways to get around this. Claude recently did this thing where they said, go ask ChatGPT this, come paste the answer in over here, and we'll pick up right where you left off in terms of memory. So that too is not really a great moat, or at least it's debatable. But in looking at all the tools that you have and just browsing through them, I was struck that, of course, there are many tools that are relatively simple API wrappers, where these APIs have existed and now they can be used by agents. Okay, that's sweet. A lot of automation
power that way. But then there's another class of tool that is built for agents in the first place. Memory being one, where I saw you have Mem0 and Zep and probably some others. But those are things that are specifically designed to unhobble the AI, or enable the agents, if you will. There are also some things around allowing agents to transact, whether in fiat or cryptocurrency or whatever. Tell me, what does that landscape look like? What are the agent-enabler tools that are actually important, that are actually working, that are maybe even strategically important? Memory could be one, I can imagine, where decoupling memory from your model provider, if it works well enough, could be another way to insulate yourself from lock-in effects. I would love to hear your
survey of this new generation of tools built for AI specifically. Yeah, I think we are very bullish on all of these agent-enabler tools. Like you mentioned, memory-based ones like Mem0, Supermemory, Zep. Payment-based ones like Skyfire. We have traditional e-commerce ones like Shopify as well, which are almost agent enablers now in that sense; they are all gearing towards it. So we partner with all of them, and all of them partner with us, because we want the people building on top of Composio to have access to the best quality agent enablers, to improve the ecosystem. A bunch of good ones are already there, more are obviously coming day to day, and I think our position is just to give our customers access to all the best quality ones. So I don't think I
have any strong picks there per se; we like to give all of them to our users to make the choice. Are there any, and I understand that you can't pick favorites among your partners, but are there classes of these things that you find to be particularly powerful? I think all of them are getting decent usage. Memory is obviously a big one; everybody wants to use that. Even on the payment side, there are ones like Skyfire for different payment use cases, and a bunch of these things are getting increasingly better adoption, given the OpenClaw movement. Specifically, there are a lot of background agents running that want to do commerce stuff. And search, obviously, is one of the big use cases, where people use things like Exa, Firecrawl,
etc., to do stuff. So it's obviously broad. Depending on the use case and the problem statement that the builder is solving, or that people are using in their OpenClaw or Claude, all of them, or a mixture of them, are being used heavily. Are there any missing categories or missing tools where you're like, why has nobody built this yet? There are so many coming up that at this point I might have to think more to come up with an answer here. People are building each and every sort of thing, from whatever a human does and needs; there's agent-to-human delegation type stuff happening, a bunch of things happening, but
yeah, I don't have an answer here out of the box. But still, I would say most usage is still coming from the traditional software, because that's where all the data of the users sits. Things like Slack and Salesforce are still the major ones, because they are the systems of record. Yeah. Perfect transition. I'm interested in your take on those platforms: which ones are maybe even advantaged, or at least safe from major disruption by the AI wave, versus which are more likely to be under threat. People are of course tired of paying their Salesforce bills and their Slack bills. And there have been, in my mind, some interesting but kind of confused debates about this, where one person will say, I had my
coding agent spin up a Slack clone, and look what it did. And then another person will say, well, that's nowhere near what Slack really has to do; think about all the complexity Slack really has to handle. And then I always think in my head, sure, but for that one user and maybe their small company, all that Slack complexity was irrelevant anyway; it's just a bunch of stuff that they're paying for and never even using. I'm not really sure where that settles, though, and obviously there are a lot of different categories, from collaboration to task management to customer service platforms of all kinds. What's your take, with the obvious disclaimer that this is not investment advice: what are you buying and what are you selling, based on what you're seeing? Yeah, so there are twofold takes on that. One is that, in my opinion, the core infrastructure layer is getting much, much stronger.
Because, as you rightly said, making software is getting easy, so a lot of software will get built on the core infrastructure pieces, and your dependence on software is just going to increase more and more, because now you're chatting with your agent more than you're chatting with a human to do some task. So the core infrastructure that's driving it, things like AWS and Cloudflare, is getting massively stronger, because I think it's very hard for anybody to rebuild those infrastructure pieces. That's one. On the whole SaaS world that you're talking about, the way I'm at least thinking about it is: yes, in some places you can build a mini version of some bigger SaaS app for your particular small niche use case; that's there.
But in most cases, what will happen is that the interface through which you use a lot of these SaaS apps is going to change. And that's where startups will build those new interfaces and pose competition to the SaaS apps. But in this particular wave, the older SaaS companies like Salesforce, Slack, etc. are also pretty fast to catch up, and they are coming up with their own new agentic interfaces to operate on. So it's about who will be fast enough and do the innovation. And we are here to support both, honestly. At Composio, we are working with startups, and at the same time we are working with the incumbents building new agentic interfaces. So it's all about who is fast enough to build the new interface for their users to operate on. So
yeah, I've been back and forth on that myself a little bit. One paradigm I had bought into, and I think I still mostly do, is that things like Salesforce will probably not be disrupted by startup CRM competitors, because sure, you can go build an AI-first CRM or whatever, and it might be sweet, but it's still going to take a long time to sell to the big customers. And by the time you can win a lot of that business, they'll figure out how to clone your best features, close ranks, and prevent you from taking too much of their customer base away. So if I think about how much of the value of the AI wave accrues to incumbents versus AI-first challengers to those incumbents, I go mostly incumbents. But the flip side of that is, what about people bringing things in
house? Another example that I've been thinking about, because I actually had a chance to study this a little, is Intercom. Especially in the last week, there's been a bunch of praise going around for Intercom as one of the companies that has adapted the best. So I'm not picking on them by any means; in fact, they were a past guest on the show, by the way. They provide a really strong example of what it looks like for a company to catch the wave and ride it successfully. And one of the big things they've done, of course, is create this Fin agent, which has the ability to resolve, when I spoke to them it was approaching 70% of all customer service tickets, across everything they see from many thousands of customers. 99 cents each, flat pricing. Okay, cool. That's sweet. Sounds like it's driven a huge growth boom for them. And it's easy to see why, because if I'm paying humans to do this and the AI can respond 24/7 instantly.
I looked into our data, actually. My company is an Intercom user, and we've prided ourselves on customer service, but the speed to response that the Fin agent gives is just something we can't match with our staffing size. So okay, it has all these advantages, I should say. But then I looked at the Composio tools associated with Intercom: 133 tools just for Intercom. I'm not sure about this, but it sure looks to me like I could do literally anything I wanted or needed to do with the entire Intercom platform through those tools. And that got me thinking: well, maybe this Fin agent is actually going to become a skill for me. And rather than pay them 99 cents each, what I really need to do is dial in
exactly what I want. And I can probably do that better owning the skill on my side, versus trying to populate it on their side; do it all through the tools. And I probably save, I don't know, 90%, right? I imagine it would be like 10 cents in token cost. So what's your take on that? Do you think people will start to cannibalize? These agents are awesome, but it seems to me like it wouldn't necessarily be that hard for a lot of companies to say: Fin's doing well, but it's not doing some things quite so well, and I've got all these tools. So give me the 133 Composio tools, let's work on our own skill, and we'll get this thing better than Fin was at 10% of the cost. Is that realistic? Why wouldn't that be realistic? So, by the way, Fin is a great product. We used it early on, when we were just launching the prosumer side of Composio, because obviously
you have a ton of support that comes in when it's a prosumer product. So we used it early on, and it's really easy to get started. And that's what they're solving, honestly, in my opinion: a lot of companies don't want to spend a lot of time. They want to get started and offload this to someone else, and Fin is great for them. I think where you made the right point was on customizability. Cost is obviously a big reason, but the customizability, and governing your agent yourself, gives you the freedom to do what you want. You can create multiple skills. You can specifically limit your agent to particular tools. You can give access to other apps that Composio has while building those support agents. That customizability is what people prefer when making that build-versus-buy decision. And Fin is a great product for a lot of people, by the way, and that's why it's doing great. But
there will be some people who want to customize and make the agent more powerful, and that's where I think we come in and provide those 100-plus tools for Intercom, and you can do whatever you want. Yeah, I agree with you that they've done really well, just to reiterate that point. But I do see the world where the friction just keeps getting so low, in part because you've already got the 133 tools, in part because somebody can publish their skill, in part because I can give that skill to Claude Code and say, hey, interview me about my use cases and how I would change this skill to suit me, and then it's already plugged into all 133 tools. And then I'm just like, wow, the barriers are really crumbling everywhere I look; they're getting pretty easy to overcome. I guess, bottom line: if you're projecting, and nobody can project 5 years into the future, even 2 years is a
long time, but do you think that a year from now a lot of companies will actually be trying to realize these gains and say, "Hey, we were spending a million dollars a year on a million tickets with Fin, but let's do our own thing and try to save 80 to 90% with custom skills"? Do you think that will be a movement? I don't know if it will be a widespread movement; it will happen in bits and pieces. For some companies, spending that much time is probably not worth it; for some companies, it is worth it. But I agree, and we will make sure that from our side the friction keeps getting lower and lower. As you rightly pointed out, there will be skills around it. And on build versus buy: as these models get better and better, people will definitely inch towards build compared to buy in the future.
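The build-versus-buy arithmetic in this exchange can be sketched quickly. The per-ticket figures come from the conversation ($0.99 per Fin resolution, roughly $0.10 in token cost per ticket for a self-built skill); the engineering overhead for building and maintaining the skill is an invented placeholder, not a quoted number.

```python
# Rough build-vs-buy sketch using the figures floated in the conversation:
# ~$0.99 per resolved ticket for the managed agent vs. an assumed ~$0.10
# in token cost per ticket for a self-built skill. The engineering
# overhead for the "build" path is a made-up placeholder.

def yearly_cost_buy(tickets: int, per_ticket: float = 0.99) -> float:
    """Flat per-resolution pricing from the vendor."""
    return tickets * per_ticket

def yearly_cost_build(tickets: int, per_ticket_tokens: float = 0.10,
                      eng_overhead: float = 100_000.0) -> float:
    """Token cost per ticket plus a fixed build/maintain overhead."""
    return tickets * per_ticket_tokens + eng_overhead

tickets = 1_000_000
buy = yearly_cost_buy(tickets)
build = yearly_cost_build(tickets)
print(f"buy: ${buy:,.0f}  build: ${build:,.0f}  savings: {1 - build/buy:.0%}")
```

At a million tickets a year, even with a six-figure engineering overhead, the sketch lands near the 80% savings discussed above; at smaller volumes the fixed overhead dominates, which matches Karan's point that for some companies it simply isn't worth it.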
Let's talk a little bit about agent to agent. So far we've talked about agent to tools and then agent to smart tools. And I think smart tools start to look like agents in their own way, right? Everything is a little bit fuzzy; the boundaries between a tool, a smart tool, and an agent are not always super crisp. But when I think about agents, I guess I think about representing someone's interest. If I was going to venture a conceptual difference between a tool and an agent, it would be that the tool is exposed for me to use to serve my own interests, and the agent is something that may represent somebody else's interests, but I can still interact with that agent. You may offer your own definition. How are you planning for agent to agent as part of the future of Composio? Yeah, for sure. We think about it a lot, and I have some models for how I think about the
multi-agent world, or agent to agent. By the way, I just want to call out that we have a bunch of tools at Composio which are agentic. For example, there are tools where you can do almost anything on a particular app in a natural-language format; internally, that's an agent that does the thing for you. I think there are some places where agent delegation works really well, and some places where it doesn't. In a lot of places, the main agent has the full context about the use case and about what the user wants, and it has the full set of tools to figure out anything more that it needs. So the power of giving the right set of tools to the agent, if you're not
overloading the context, is generally better. To give you an example: let's say I have to book an appointment with someone, and I have a sub-agent which can do that for me. My main agent has all the right things, like my calendar, to figure out whether I have a collision at that time or not, but my sub-agent probably doesn't; it's just an appointment booker, say. If I just delegate the task to the sub-agent, it's possible that it books an appointment at a time where I have a collision with some other important meeting, a board meeting, for example. And then I'll be in a "what happened?" situation. That's where the main agent, because it has all the context, my calendar, my this, my that, will do a much better job if the capability is just given to it as a tool. So the way I look at it is: if it's in general a 1 to 2%
context task, one that doesn't overload your context a lot, then it's better to provide it as a tool. And if it's a more exploratory task where a lot of context will be used, for example deep research, we all know that in most companies parallel sub-agents run in different streams, do research on a particular topic, condense the results, and give them to the main agent. In those cases, where it's more exploratory and takes a lot of context, it's better to use sub-agents. In a lot of use cases, it's better to just bring it back to the main agent and give it the right set of smart tools. So there's no single answer here; it will be a mix and match based on the problem statement. Are you seeing anything meaningful today in the agent-to-agent space? Any examples you would highlight? It has felt largely to me so far like it's
still very much theoretical, right? There's been a lot more talk of agent to agent than actual agent to agent happening, from what I've seen. I think Claude's team of agents is a very interesting paradigm, right? There's a shared task list that all the sub-agents have, and they map onto that shared task list. They can do inter-agent communication via it, where they assign some task that they want another agent to do, etc. That's a pretty good shared-agent paradigm that I've seen. In the case of Composio, as I mentioned, we have some tools which are exposed as agents, where you can do a to-and-fro as well: you give the task to the sub-agent tool in pure natural language, "go and find this" or "go and do this on this particular app," and if it requires something, it will give that in the response, and then you can use that
session ID to control the conversation to and fro, again and again. So those are some paradigms. It's still very early, but it is in production. Claude's team of agents is available; they have an agents API, and a sub-agents API is coming. We have a preview version available where the agent can manage sub-agents. And we internally open-sourced this thing called agent orchestrator: a lot of our engineers use a single orchestrator agent to manage 20 to 30 Claude Code agents, and that single orchestrator figures out what the different agents are doing and whether any action is needed from the engineer to control those agents. So those are some interesting paradigms people are doing. What does your cost structure for running Composio look like in terms of
human cost versus token cost? And how is that shifting, or how do you expect it to shift over time? As I mentioned, all of our integrations are actually built by agents; the engineering team is building agents to build those integrations. Two to three years back, that was not the case. I think everybody, specifically the tool providers, had a big team to build these integrations. We have literally a three-member team which is doing it all, setting up the whole agentic pipeline. And over the last month we probably spent a hundred K on the pipeline that builds those tools. So our token cost is definitely much higher than our human cost right now, to answer your question. Wow. So you have a three-member engineering team? Three humans?
For the agentic side. Overall we have around 15 people, but the team that builds the whole end-to-end agentic pipeline, which builds, accesses, and improves tools over time, is just a three-member team. And do you see the engineering team growing substantially in the future? Is there anything that will require a lot more headcount, or is it just going to be a lot more tokens? We are definitely hiring at Composio. It's just that our bar is too high sometimes, I feel; that's why we are not able to hire faster. But we definitely need humans to control the agents. We are still not at the point where agents work fully autonomously without any supervision. But we're getting a lot of different predictions around what the future of the software labor market looks like.
It sounds like you think token cost will grow faster than human cost. But do you think the size of the human team levels out at some point? What's the scaling law of humans at Composio? Yeah, very honestly, we are a startup, so the human-versus-AI scaling law operates a bit differently, because we are already very AI-first in that sense. But we still need humans to make decisions in certain cases, and that's where we are hiring. We're all seeing what's happening across the board at the bigger techs. In our case, LLM usage is already a multiple of human capital; token spend, even for internal development, is multiples of
human capital. And that's why, in our case, we need more humans to spend more tokens; that's the idea. I see. That's not true for incumbents, where the human capital is much higher than the tokens spent. To answer your question, as models improve, that ratio will definitely shift towards more token usage compared to human. One other question I had for you in terms of business strategy, and this connects back to the agent concept as well, although it doesn't require the agent paradigm, is: why not resell more? Why not try to control the customer relationship more? Obviously in some cases, you know, I have a Slack account; that's my Slack account and I want to keep it my Slack account. I don't want to use Slack through you; that wouldn't make any sense. But then there are other things, and I'm thinking of things I tried with Composio: Brandfetch
to get logos of companies that I need, their color schemes; Perplexity, or the Brave Search API, or xAI, which you mentioned. For any of these generic utility-style APIs, as far as I saw, the only way I could connect those accounts was to provide an API key that I already have for them. And that got me wondering: why not just take the money yourself and have your own big Perplexity bill that your users pay their way through you? From my perspective, that would be to your advantage, and it might also be a nice friction reducer for users, because I was like, okay, I guess I've got to go get my Brave Search API key now. But if I already had a payment method on file with you, and it was like, oh sure, I'll enable
it. Brave Search, I think, is $5 per thousand calls or whatever; you could even charge me six for convenience, and it seems like it would work. Is that something you think you will do, or is there a reason you're not doing it today? Definitely, that's a place where we are moving. Right now we do have a bunch of services bundled into our paid plans, but we are pretty soon launching this thing called premium toolkits, which is exactly what you mentioned: a single wallet with Composio gives you access to whatever services you enable, all from a single dashboard. That's something we'll probably be launching by the time this episode comes out, or in the next couple of weeks. You can set up your credits at Composio's end and use all these services, so that you don't get overwhelmed by maintaining so many accounts and different billing at
different places. Yeah, okay. Cool. I look forward to that. I think these are pretty much all the angles that I wanted to cover. What have I not touched on that is on your mind, that I should have thought to ask about? I think one of the things a lot of people on Twitter are talking about is MCP versus CLI. That's a pretty heated debate right now, specifically with the GitHub CLI. I have my viewpoints there. It's interesting because it affects us a lot. As I mentioned, we are the reliable tool execution layer, and that's why we are launching a universal CLI next week, which is the last week of March; I don't know, it depends on when the episode comes out. With a single CLI you can access all the different apps, so you don't
need a GitHub CLI just for GitHub, a Vercel CLI for Vercel, this CLI for that, etc.; a single CLI can manage all your apps from a single point of usage. But on CLI versus MCP, I don't know what your view is there, but I personally think it will be a multipolar world, because you can't compete with the GitHub CLI, which has been used forever by me and other engineers and is baked into the pre-training data; obviously the models will use it well. But MCPs are also now in the post-training data, with the Claude Codes and Codexes of the world already using MCP so much. So in terms of accuracy, it's going to be a multipolar world. And I personally think MCP gives you a lot more control: MCPs are better for observability and traceability, and for that reason better for enterprise use cases. Is that fundamental? I mean, it feels to me like everything can sort of be patched;
couldn't you create hooks on the CLI? I sort of had a question on this in the outline, as I'm sure you originally saw, but as we were talking, I came to the conclusion that maybe this debate is much ado about nothing, in the sense that in the end they can both work. They may have some relative strengths and weaknesses now, but as they mature, it's kind of two sides of the same coin. That's where I think we're headed. Would you dispute that, or what edits would you make to that outlook? No, I agree; that's where I was going. It's not going to be a unipolar world; both of them will coexist. One of the deciding factors will be where more and more tokens are getting spent, because that will go into the agentic traces and get added more and more, and that traceability will improve over time. But I think it will be a bipolar world.
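The universal CLI described a moment ago — one entry point that routes `<app> <action>` pairs instead of shipping a separate CLI per app — can be sketched with a simple dispatch table. The app and action names below are hypothetical placeholders for illustration, not Composio's actual interface.

```python
# Sketch of a "universal CLI": a single entry point that dispatches
# "<app> <action>" pairs, instead of one CLI per app (gh, vercel, ...).
# The registered apps/actions and their outputs are invented examples.

import argparse
from typing import Callable

# Registry mapping (app, action) -> handler. A real tool would load
# this from an integration catalog rather than hard-coding it.
HANDLERS: dict[tuple[str, str], Callable[[argparse.Namespace], str]] = {
    ("github", "list-prs"): lambda args: "3 open PRs",
    ("vercel", "deploy"): lambda args: "deployed to preview",
}

def main(argv: list[str]) -> str:
    parser = argparse.ArgumentParser(prog="uni")
    parser.add_argument("app", help="target app, e.g. github")
    parser.add_argument("action", help="action to run, e.g. list-prs")
    args = parser.parse_args(argv)
    handler = HANDLERS.get((args.app, args.action))
    if handler is None:
        return f"unknown: {args.app} {args.action}"
    return handler(args)

print(main(["github", "list-prs"]))
```

One design consequence worth noting: because every app goes through the same entry point, logging and policy checks can live in `main` once, which is the observability/traceability argument made for MCP applied to the CLI side.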
Got you. Yeah, makes sense. Anything else we should touch on? Anything else you want to make sure people know about Composio before we break? No, I think, yeah, we are hiring in SF. People listening: if anybody is interested in building the future of agentic tool execution, I'd love to talk. Perfect. Karan, CTO of Composio, thanks for being part of the Cognitive Revolution. Thanks, Nathan. 50,000 blades, one that finds the cut.
50,000 blades, but only one will sing. The right one finds you, that's the beautiful thing. Lost in noise, now learn to hear the tone. Zigzag through the maze till the straight path is shown. Every stumble written, lessons carved in steel. Sharper every morning, that's the way we heal. They will know. Only eyes will turn. Fire in the engine through the night. Everything aligned, everything aligned. Tools that learn. Ha, tools that learn. Our edge gets sharper with the turn. Agents waking, agents rising, making light. Cognitive revolution burning bright. Tools that learn. Ha, tools that learn. Switch the hand that holds it, still returns. No chains, no walls. Tools that learn. The master writes, the student takes the stage. Faster, lighter, cleaner, turning every page. Giants lean on something small enough to trust. Power at your fingertips, the setup's never fuss. All good, connected, Harasta aligned. From the zigzag jungle to the open sky. It's all about the frame, son. How you see it determines what you want. Every crooked line becomes a highway home. Every agent eating never walks alone. Tools that learn. Ha, tools that learn. Our edge gets sharper with the turn. Agents waking, agents rising, making light. Cognitive revolution burning bright. Tools that learn. Ha, tools that learn. Switch the hand that holds it, still returns. No chains, no walls. Tools that learn. Cognitive Revolution.
>> If you're finding value in the show, we'd appreciate it if you take a moment to share with friends, post online, write a review on Apple Podcasts or Spotify, or just leave us a comment on YouTube. Of course, we always welcome your feedback, guest and topic suggestions, and sponsorship inquiries, either via our website cognitiverevolution.ai or by DMing me on your favorite social network. The Cognitive Revolution is part of the Turpentine Network, a network of podcasts which is now part of a16z, where experts talk technology, business, economics, geopolitics, culture, and more. We're produced by AI podcasting. If you're looking for podcast production help for everything from the moment you stop recording to the moment your audience starts listening, check them out and see my endorsement at aipodcast.ing. And thank you to everyone who listens for being part of the Cognitive Revolution.