Two Years Inside an AI Startup

April 3, 2026

In September '24 I spent about a month building out our custom reports feature from scratch. The idea was to let firms upload a template (an investment memo, a past report, whatever format they used internally) and have the system replicate it for them, long-form, populated with real data. Sounds straightforward. It wasn't. At the time, there was no clean native way to get a model to do what we needed it to do, so I built a series of loops through our LLM ops tool to mimic function calling by hand. No built-in tool support. Just logic chains doing what proper function calls would eventually do, strung together carefully enough that the whole thing held. It took about a month of real work, testing, breaking it, adjusting the architecture, testing again. When it finally ran the way it was supposed to, I was proud of it. Not in a performative way. In the way you feel when something hard finally works.
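For anyone who never had to do this, the pattern looked roughly like the sketch below: ask the model to emit a structured action, parse it, dispatch to a handler, feed the result back in, and loop. This is a minimal reconstruction of the idea, not our production code; the names (call_llm, TOOLS, the JSON protocol string) are all hypothetical.

```python
import json

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM client; returns the model's raw text."""
    # Canned reply so the sketch runs end to end without an API key.
    return '{"action": "finish", "args": {"output": "demo report"}}'

# Stubbed tool handlers; in the real thing these hit data pipelines.
TOOLS = {
    "fetch_fund_data": lambda args: {"nav": 104.2},
    "render_section": lambda args: "section text",
}

PROTOCOL = (
    "Reply with ONLY a JSON object: "
    '{"action": "<tool name or finish>", "args": {...}}'
)

def run(task: str, max_steps: int = 10) -> str:
    history = [f"Task: {task}", PROTOCOL]
    for _ in range(max_steps):
        reply = call_llm("\n".join(history))
        try:
            step = json.loads(reply)
        except json.JSONDecodeError:
            # The fragile part: models drifted off-format constantly,
            # so every loop needed a repair-and-retry branch.
            history.append("Invalid JSON. " + PROTOCOL)
            continue
        if step["action"] == "finish":
            return step["args"].get("output", "")
        result = TOOLS[step["action"]](step.get("args", {}))
        history.append(f"Result of {step['action']}: {json.dumps(result)}")
    return ""

print(run("Draft the Q3 memo"))
```

Native tool use eventually made the parse-and-dispatch half of this obsolete. The retry branch is the part I don't miss.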

Then two things happened fast. MCP (the Model Context Protocol) released in November. Claude's extended thinking dropped the following February. Neither of those things broke what we'd built. The code still ran. But the architecture was already a generation behind the direction everything was moving. The approach that had taken a month to get right was now the long way around a problem that had a much cleaner solution. We rebuilt it. That took three more weeks. Then in January 2026, as the product matured and the requirements got more demanding, we started over from scratch entirely.

Three builds. About two and a half months of cumulative work across different points in time, on the same core feature. And the honest thing to say about that is not that it was painful or frustrating, though it was sometimes both. The honest thing is that it was just the job. That's the pace. If you're not rebuilding something, you're probably not paying close enough attention.

How I Got In The Door...And Stayed In

My title when I joined was Prompt Engineer. I was 21, right out of college, and I want to be clear that at the time that title actually meant something. Models in 2023 and early 2024 were sensitive to how you talked to them in ways that are hard to explain now. The difference between a mediocre prompt and a well-constructed one wasn't marginal. It showed up clearly in the output. Specificity of instruction, how you framed the role, how you structured the context, what you put at the top versus the bottom, how you handled edge cases in the prompt itself. These were real skills. People who were good at it got noticeably better results than people who weren't, and companies were starting to figure that out. That's how I got in the door.

What I didn't expect was how fast that edge would compress. Not disappear. Prompting still matters, and anyone telling you it's completely dead is overcorrecting. But the gap between a good prompt and a bad one has narrowed considerably as models have gotten smarter and more capable of inferring intent. More importantly, the thing that actually moves the needle now isn't the prompt at all. It's everything around it. The data you feed in. How you structure the context window. How you manage memory across a long agentic loop. How your pipeline retrieves the right information at the right moment. The prompt is still there; it's just no longer the lever.

That shift happened slowly and then very fast, which is its own kind of disorienting. Two years in, my actual job looks nothing like my title. I run demos, help with hiring, lead product decisions, and spend a lot of time building (with Claude Code, of course). My prompting work has become minimal, focused mostly on architecture shifts and context management. The title hasn't changed. Honestly, I'm not sure what you'd even update it to. The role changes faster than any job description can keep up with, and I think that gap between title and reality is a pretty accurate summary of what this industry looks like from the inside.

The Prompt Is The Last Ten Percent

The model is not the product. The model is the ingredient. That distinction has gotten more important as the models have gotten better, not less.

When prompting was the edge, at least the skill was visible. You could see it in the output. You could study it, improve it, compare it. What replaced it is harder to see and harder to explain to someone who hasn't built it. It's architecture. Specifically, how you design the system around the model. How you process and index data before it ever reaches the model. How you store metadata so the right context surfaces at the right moment. How you manage memory across a long agentic run so nothing falls apart midway through. Every company building vertical AI has its own answers to those questions. The answers differ a lot more than the demos suggest. And that's where actual differentiation lives now, not in who wrote the cleverer prompt.
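To make "memory across a long agentic run" concrete, here's one common shape the answer takes: keep the newest turns verbatim and compress everything older into a rolling summary, all under a fixed token budget. A minimal sketch, with placeholder count_tokens() and summarize() helpers; a real version uses an actual tokenizer and a dedicated LLM call for the summary.

```python
def count_tokens(text: str) -> int:
    return len(text.split())  # crude proxy for a real tokenizer

def summarize(turns: list[str]) -> str:
    # In practice this is its own LLM call; here, a placeholder.
    return f"[summary of {len(turns)} earlier steps]"

def build_context(history: list[str], budget: int = 8_000) -> str:
    """Newest turns verbatim, older turns compressed, under budget."""
    recent, used = [], 0
    for turn in reversed(history):  # walk newest-first
        t = count_tokens(turn)
        if used + t > budget * 0.7:  # reserve ~30% for the summary
            break
        recent.append(turn)
        used += t
    older = history[: len(history) - len(recent)]
    parts = ([summarize(older)] if older else []) + list(reversed(recent))
    return "\n".join(parts)
```

The interesting decisions are all in the thresholds: how much to keep verbatim, what the summary is allowed to forget, and what gets pinned so it can never fall out of context.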

Allow me to use finance as an example, since that's the field I work in. The data problem alone is enormous. Financial firms don't just have data. They have decades of proprietary research, internal models, investment frameworks, and institutional views that took years to build. That is their edge. When we build AI for those firms, the architecture has to understand and preserve that edge. How we ingest a firm's documents, how we index and retrieve them, how we make sure the output reflects that firm's specific lens rather than some generic synthesis of the internet. That's the hard work. The prompt is maybe the last ten percent of the problem.
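As one illustration of what "preserving the firm's lens" means mechanically: every chunk carries metadata about whose document it is and what kind, and retrieval filters on that before it ranks anything. The sketch below uses a toy term-overlap ranker in place of real embeddings, and the field names are made up; it's the filter-then-rank shape that matters, not these specifics.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    meta: dict = field(default_factory=dict)

def ingest(doc_text: str, firm_id: str, doc_type: str) -> list[Chunk]:
    # Paragraph splitting is a placeholder; real chunking is
    # structure-aware (sections, tables, footnotes).
    return [
        Chunk(p, {"firm": firm_id, "type": doc_type})
        for p in doc_text.split("\n\n") if p.strip()
    ]

def retrieve(chunks: list[Chunk], query_terms: list[str],
             firm_id: str, doc_type: str | None = None, k: int = 5):
    # Filter first, so a firm's output is only ever grounded in its
    # own research; then rank what's left.
    pool = [c for c in chunks
            if c.meta["firm"] == firm_id
            and (doc_type is None or c.meta["type"] == doc_type)]
    score = lambda c: sum(t in c.text.lower() for t in query_terms)
    return sorted(pool, key=score, reverse=True)[:k]
```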

This is why I stopped thinking of my job as learning tools and started thinking of it as tracking the entire field. Papers, releases, new model capabilities, what competitors are shipping, what the labs are hinting at. Not because it's fun to be obsessive about it, though sometimes it is. Honestly, it's exhausting to try to keep up. But if you let your attention drift for a few months, you can miss a shift that makes your entire approach look like the long way around. I've seen it happen to people.

"The person who is the star of the previous era is often the last one to adapt to change, and tends to fall harder than most."

-- Andy Grove, Only the Paranoid Survive

This space runs on something close to natural selection. If you're not on top of your shit, you fall off. There's no slower version of that.

The Best The Product Has Ever Looked

There's a version of the AI space that exists entirely on X and in launch videos and it has almost nothing to do with what I see from the inside. The demo is almost always the best the product has ever looked. The launch video is marketing. And a lot of the people on your timeline posting about every new model release and telling you this changes everything are either getting paid to say it or just farming engagement. After two years of this I've learned to read past most of it.

Working in this space does something to your ability to read content. After a while you can spot AI-generated takes almost immediately. The phrasing, the cadence, the way certain sentences are structured. I use AI in my own writing process so I'm not making a purity argument here. The issue isn't using AI. It's when there's no actual person behind the output. No real perspective, no specific experience, just an LLM running on vibes. You can feel the absence of a person in it. And right now a lot of what gets shared about AI is exactly that.

Every AI company, including ours, shows you the product at its best. That's rational. But there's a real gap between a controlled demo and what the product does when it meets your actual data, your actual workflow, and your actual edge cases. The only way to know if a tool works for you is to use it yourself, in your real environment, on a real problem. Not a curated sample. Not a video. You. One honest review from someone who integrated a tool into their workflow for a month is worth more than a hundred launch tweets combined. Find those people and try to ignore the rest.

Moore's Law Didn't Show Up

In early 2024 there was a consensus in the industry that AI costs were going to keep dropping. The logic made sense. Token prices had been falling since the first GPT-4 API launched. More competition between labs meant more pressure to lower prices. Moore's Law applied to inference. Commoditization was coming. Everyone said so.

Then reasoning models dropped. Then function calling became standard across the major models. Then agentic workflows started becoming the norm, where instead of a single prompt and response you had a model running a chain of steps, calling tools, checking outputs, looping back. A task that used to take one API call now takes twenty. A response that used to take three seconds now takes two minutes. The token counts exploded. The bills went up.
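The arithmetic is blunt. Using made-up but plausible per-token prices (these are assumptions, not any vendor's actual rates), a twenty-step agentic run that re-sends context at every step costs an order of magnitude more than the single call it replaced:

```python
PRICE_IN = 3.00 / 1_000_000    # $/input token (assumed)
PRICE_OUT = 15.00 / 1_000_000  # $/output token (assumed)

def cost(calls: int, in_tok: int, out_tok: int) -> float:
    """Total dollars for `calls` API calls at the given per-call tokens."""
    return calls * (in_tok * PRICE_IN + out_tok * PRICE_OUT)

single_shot = cost(1, 4_000, 1_000)  # one prompt, one answer
agentic = cost(20, 12_000, 2_500)    # 20 steps; context re-sent each
                                     # step, plus reasoning tokens

print(f"single shot: ${single_shot:.3f}")  # $0.027
print(f"agentic run: ${agentic:.2f}")      # $1.47, roughly 50x
```

Per-token prices on comparable models can fall and the bill still goes up, because the call count and the tokens per call both exploded.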

We traded cheap and fast for expensive and smart. For our customers that tradeoff was easy. Finance firms are not optimizing for cheap. They're optimizing for accurate, thorough, and defensible. They'll wait ten minutes for a report that's right. They'll pay more for it. The quality jump from basic generation to reasoning-enabled agentic output was big enough that it wasn't a hard sell. But it quietly buried something the whole industry had been telling itself.

The expectation was Moore's Law. What we actually got was closer to the opposite. Each new generation of capable models has gotten more expensive to run. Gemini 2.5 to Gemini 3 was more expensive, not less. The reasoning tokens, the extended context, the tool calls across a long agentic run all add up in ways that early 2024 projections didn't account for. The people confidently predicting commoditized AI infrastructure by 2025 were wrong. Costs came down on the simple stuff and went up on the stuff that actually works. Everyone inside knows it, but nobody is saying it out loud. And it's hitting margins and upending the investment assumptions and cost structures these companies were originally built on.

Standing Still Was Never An Option

People call some AI companies or products wrappers. The implication is that there's no real work involved, that you're reselling someone else's intelligence with a nice UI on top. I understand where it comes from. If all you've seen is the surface, it's a reasonable assumption.

But there's something missing in that assumption.

Take us for example. We're a small team competing against companies with billions in funding, dedicated research divisions, and the ability to ship model improvements that make our existing architecture obsolete overnight. That's not a complaint. It's the environment. Every architectural decision we make has to account for the fact that the ground can shift at any moment. We've rebuilt core features from scratch multiple times, not because we made mistakes, but because the right way to build something in November 2024 was different from the right way in January 2026. Standing still was never an option.

The finance context makes this harder in specific ways. The firms we work with have spent years building their analytical edge. A proprietary framework. A research process. A way of thinking about markets that's specific to them. They're not looking for generic AI output. They need something that understands their context deeply enough to extend it. Building that requires decisions at every layer of the stack. How documents get processed, how metadata gets structured, how retrieval works, how the system maintains a firm's perspective across a long generation run. None of that shows up in a demo. None of it is visible from the outside. You only see it when you put the product next to the alternative and the output is noticeably different.

The firms that use our product chose it over going directly to the underlying models. That comparison happens regularly and the result holds. That doesn't happen because we wrote better prompts. It happens because of two years of decisions at the architecture level, built specifically for this domain, by a team that has been paying close enough attention to rebuild when necessary. There are plenty of products out there that are a nicer UI routing requests to Claude or GPT; those may well be wrappers. But there's a second level: teams building real products on top of the models, with genuine differentiation in the stack. I like to call those Orchestrators.

Comfortable Going Obsolete

Kind of off topic, but the marketing problem is one I haven't solved. We have a product that holds up in direct comparisons with the best tools in our space, but that doesn't automatically become distribution. Cursor had first-mover advantage in developer tools. Midjourney caught the AI image wave early. When you have that kind of momentum, the product and the growth reinforce each other and it gets hard to displace you. When you don't, when you came later or targeted a smaller market or didn't go viral on launch day, you're in a different fight. A better product doesn't guarantee you win it. I'm working on this right now. I don't have the answer. I think a lot of people building AI products are in the same spot and almost none of them will say so.

But recently I've started to worry that using AI constantly is making me slower. Not slower in output; I produce more than I ever have. Slower in the thinking underneath it. Alvin Toffler wrote about this in the 1970s, and it still holds today:

"The illiterate of the future will not be the person who cannot read. It will be the person who does not know how to learn, unlearn, and relearn."

There's a muscle you build from working through hard problems yourself, from writing something from scratch, from figuring something out without immediately reaching for a tool that can do it faster. I've caught myself leaning on AI for things I used to just do. So I've started deliberately writing things by hand, working through problems without help, not because I think AI is bad but because I don't want to lose the ability to think without it. I don't know how to fully balance that yet. I'm not sure anyone does.

Something I'll write about more in depth soon. But there's actual research on this now. "The Paradox of AI Assistance: Better Results, Worse Thinking" is worth reading if you use AI daily and have had the same feeling. Memory retention drops. Independent reasoning gets harder. Better output, weaker process. That's the tradeoff nobody in the AI space wants to talk about.

The job isn't mastering a skill. It's being comfortable watching the skill you mastered go obsolete, and building the next thing before it does. That's been true every month since I started. I expect it to stay true for a while.

-- Gabriel