
AI • Due Diligence

How to Evaluate AI Claims During Due Diligence

Every startup deck says "AI-powered" now. Most investors and founders cannot tell the difference between genuine machine learning and marketing theatre. Here is how to evaluate AI claims without needing a PhD.

Mike Tempest · 10 min read

In 2024, every startup became an "AI company". CRM tools added chatbots and called it "AI-powered sales". Scheduling apps wrapped GPT-4 and claimed "proprietary machine learning". Analytics platforms renamed their dashboards "AI insights". The term has become so diluted that it is now almost meaningless.

For non-technical founders and investors evaluating AI startups, this creates a real problem. You know AI matters. You know it can be transformative. But you also know that most "AI" claims are just marketing spin on top of conventional software. The question is: how do you tell the difference?

This guide gives you practical frameworks for evaluating AI claims during due diligence, without needing to understand neural networks or read academic papers. It is based on lessons from building data-driven products at Risika and RefME, and from helping non-technical founders develop AI strategies that serve their business rather than their pitch deck.

The AI Hype Problem

The problem is not that startups are lying about using AI. Most genuinely believe they are. The problem is that "AI" has become a catch-all term that means everything and therefore nothing.

A startup using if-else rules to route customer support tickets will say "our AI triages support queries". A company wrapping OpenAI's API in a nicer interface will claim "proprietary machine learning models". A business using off-the-shelf sentiment analysis will describe it as "advanced natural language processing".

None of this is technically false. But it is also not what most people think of when they hear "AI company". And the gap between perception and reality matters enormously when you are making investment decisions or strategic bets on technology.

The first step in evaluating AI claims is understanding that there is a spectrum from simple automation to genuinely novel machine learning -- and that different points on this spectrum have wildly different technical requirements, cost structures, and competitive moats.

The AI Maturity Spectrum

Understanding where a startup sits on this spectrum tells you everything about their technology risk, competitive moat, and capital requirements.

Level 1

Rules-Based Automation Marketed as AI

This is hardcoded logic dressed up in AI language. "Our AI detects fraud" actually means "we flag transactions over £10,000 from new accounts". "AI-powered recommendations" means "we show you products from the same category you just viewed".

Is this bad? Not necessarily. Rules-based systems can be highly effective, cheap to run, and easy to explain to regulators. But they are not AI, and they should not command AI valuations or be sold as defensible technical moats.

How to spot it: Ask how the system was trained. If the answer involves "business logic" or "expert rules", it is not machine learning. That is fine -- just price it accordingly.

Level 2

Off-the-Shelf LLM API Wrappers

The company sends user input to OpenAI, Anthropic, or Google, gets a response back, and presents it in their interface. This is by far the most common form of "AI" in 2024-2026 startups.

Is this bad? Absolutely not. Some of the best software businesses are built on top of foundational models. The value is in distribution, workflow design, domain expertise, or proprietary data -- not in building models from scratch.

How to spot it: Ask what happens if OpenAI raises prices or changes their API. If the answer is "we would be in trouble", you are looking at an API wrapper. Again, this is not inherently bad -- but the moat is not the AI.
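A reassuring answer to this question is that the provider sits behind an abstraction, so a price rise or API change is a configuration problem rather than a rewrite. As a minimal sketch (provider names, prices, and the stand-in completion function are all illustrative, not real quotes or real API calls):

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Provider:
    name: str
    cost_per_1k_tokens: float          # illustrative pricing, not a real quote
    complete: Callable[[str], str]     # prompt -> completion

def fake_completion(prompt: str) -> str:
    # Stand-in for a real vendor API call (OpenAI, Anthropic, etc.)
    return f"response to: {prompt}"

# All LLM calls go through one registry, so swapping vendors
# is a config change rather than a codebase-wide rewrite.
PROVIDERS: Dict[str, Provider] = {
    "primary": Provider("vendor-a", 0.010, fake_completion),
    "fallback": Provider("vendor-b", 0.004, fake_completion),
}

def complete(prompt: str, provider: str = "primary") -> str:
    return PROVIDERS[provider].complete(prompt)
```

A team with this kind of seam in their codebase has at least thought about vendor risk; a team whose product calls one vendor's SDK directly from fifty places has not.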

Level 3

Fine-Tuned or Customised Models

The company takes an existing model (GPT-4, Llama, Claude, etc.) and fine-tunes it on their own data to improve performance for their specific use case. This requires real ML engineering skill, proprietary training data, and ongoing iteration.

Is this good? Often, yes. Fine-tuning can create genuine competitive advantage if the training data is unique and valuable. It shows the team understands machine learning beyond just calling APIs.

How to spot it: Ask about their training data pipeline, evaluation metrics, and fine-tuning process. Teams doing this properly will have detailed answers about data quality, labelling workflows, and how they measure improvement over the base model.
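Concretely, a team fine-tuning properly can usually show an evaluation harness: a held-out labelled set, and the fine-tuned model's accuracy measured against the base model on that set. A toy sketch of what that looks like (the labels, examples, and both "models" here are stand-ins for illustration):

```python
from typing import Callable, List, Tuple

# Held-out labelled evaluation set: (input, expected label). Toy data.
EVAL_SET: List[Tuple[str, str]] = [
    ("invoice overdue 90 days", "high_risk"),
    ("payment received on time", "low_risk"),
    ("new account, large order", "high_risk"),
    ("repeat customer, small order", "low_risk"),
]

def accuracy(predict: Callable[[str], str]) -> float:
    correct = sum(1 for text, label in EVAL_SET if predict(text) == label)
    return correct / len(EVAL_SET)

# Stand-ins for real model calls.
def base_model(text: str) -> str:
    return "high_risk"  # naive baseline: flags everything

def fine_tuned_model(text: str) -> str:
    return "high_risk" if ("overdue" in text or "new account" in text) else "low_risk"

# The number a serious team can quote: improvement over the base model.
lift = accuracy(fine_tuned_model) - accuracy(base_model)
```

If a team claims fine-tuning but cannot produce something like this, measured on data the model never trained on, the claim is unverified.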

Level 4

Proprietary Trained Models

The company has built and trained their own models from scratch or heavily modified open-source architectures for their specific problem. This requires serious ML talent, significant compute resources, and unique training data.

Is this better? Sometimes. For certain problems -- especially narrow, domain-specific tasks -- a custom model can outperform general-purpose LLMs whilst being cheaper to run. But it is also expensive, risky, and often unnecessary.

How to spot it: Look for published research, patents, or deep technical expertise on the team. Ask about their compute costs and training infrastructure. Real model development shows up in the AWS bill and the hiring plan.

5 Questions That Expose AI Theatre

These questions work in pitch meetings, due diligence calls, and technical reviews. The answers (or lack thereof) tell you everything you need to know.

1

What happens if you remove the AI?

This is the single most revealing question. If the product still works without the AI component, then AI is a feature, not the foundation. That is perfectly fine -- many successful companies have AI features. But it changes the investment thesis dramatically.

Good answer: "The product would not work at all. We are automating a task that requires understanding unstructured data at scale, which is impossible without machine learning."

Bad answer: "It would still work, but users would have to do more manual work." Translation: AI is a nice-to-have productivity boost, not a fundamental capability.

2

What is your training data and where does it come from?

Machine learning is only as good as the data it learns from. If a startup claims proprietary AI but sources training data from public datasets everyone else uses, they have no moat. If they cannot articulate their data sources clearly, they probably do not understand their own system.

Good answer: "We have 3 years of proprietary transaction data from our customers, labelled by domain experts. We also augment this with licensed datasets from [credible source] and continuously improve it through user feedback."

Bad answer: "We trained it on publicly available data" or worse, "I would need to check with our engineer." If the founder or product lead does not know where the training data comes from, that is a red flag.

3

What does your model get wrong and how often?

Every machine learning model makes mistakes. Teams that understand their AI can tell you exactly what types of errors it makes, how often, and what they are doing about it. Teams that are just wrapping someone else's model cannot.

Good answer: "Our model currently has 94% accuracy on test data, but it struggles with ambiguous cases where even humans disagree. We are working on a human-in-the-loop system for edge cases and tracking error types in our production monitoring."

Bad answer: "It is extremely accurate" or "we have not really measured that yet." No serious ML team lacks error metrics. If they do not know their failure modes, they do not understand their system.
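The monitoring behind a good answer does not need to be sophisticated. At minimum it is a tally of predictions against eventual outcomes, broken down by error type. A minimal sketch (the log entries and labels are illustrative):

```python
from collections import Counter

# Toy production log: (predicted, actual). Real logs would come
# from user feedback, manual review, or eventual ground truth.
LOG = [
    ("fraud", "fraud"), ("ok", "ok"), ("fraud", "ok"),
    ("ok", "fraud"), ("ok", "ok"), ("fraud", "fraud"),
]

def error_breakdown(log):
    """Rate of correct calls, false positives, and false negatives."""
    counts = Counter()
    for predicted, actual in log:
        if predicted == actual:
            counts["correct"] += 1
        elif predicted == "fraud":
            counts["false_positive"] += 1   # flagged a legitimate case
        else:
            counts["false_negative"] += 1   # missed a real problem
    total = len(log)
    return {kind: n / total for kind, n in counts.items()}
```

A team that can quote these rates, and say which error type costs them more, understands their system. A team that can only say "it is accurate" does not.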

4

Can you show me the AI working on data it has never seen?

Demo fraud is rampant in AI startups. The demo works beautifully on cherry-picked examples the founders have practised 100 times. But real AI should generalise to new, unseen data. Ask to test it live with examples you provide.

Good answer: "Absolutely. Give me any [document/image/dataset] and I will run it through the system right now." Followed by a live demo that might not be perfect, but shows the system working on genuinely new input.

Bad answer: Nervousness, excuses about needing to prepare data first, or demos that only work on pre-loaded examples. If they will not test it live, assume the AI is less robust than claimed.

5

What is your AI cost per transaction?

AI is not free. Running inference on large language models, especially at scale, costs real money. A startup that does not track AI costs per transaction either has trivial usage or does not understand their own unit economics.

Good answer: "Currently about 8p per query using GPT-4. We are testing cheaper models for simple cases and fine-tuning to reduce costs as we scale. Our target is under 3p by end of year."

Bad answer: "We have not really calculated that" or "it is negligible." If they scale to millions of users and AI costs are genuinely negligible, they are probably not doing anything computationally interesting. If costs are not negligible and they do not know the number, they are flying blind.
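The arithmetic behind a good answer is simple, which is why not knowing it is so telling. Inference cost is just tokens in and out multiplied by the provider's per-token prices. A back-of-envelope sketch (all figures here are illustrative assumptions, not real vendor prices):

```python
def cost_per_query(prompt_tokens: int, completion_tokens: int,
                   price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Cost in pence for one query at given per-1k-token prices (illustrative)."""
    return ((prompt_tokens / 1000) * price_in_per_1k
            + (completion_tokens / 1000) * price_out_per_1k)

# e.g. 1,500 prompt tokens + 500 completion tokens,
# at an assumed 3p/1k input and 6p/1k output:
# 4.5p + 3.0p = 7.5p per query
pence_per_query = cost_per_query(1500, 500, 3.0, 6.0)

# At 100,000 queries a month that is 750,000p, i.e. £7,500/month,
# before retries, long conversations, or context stuffing.
monthly_pence = pence_per_query * 100_000
```

Any founder running LLMs in production should be able to do this sum from memory with their own numbers, and say how it changes as usage grows.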

When "Just an API Wrapper" Is Actually Fine

There is a dismissive phrase in tech circles: "It is just an API wrapper." As if building on top of OpenAI or Anthropic is inherently less valuable than training your own models. This is wrong.

Some of the most successful software companies in history were built on top of other people's infrastructure. Stripe is "just an API wrapper" around payment networks. Twilio is "just an API wrapper" around telecom carriers. Shopify is "just an API wrapper" around payment processors and logistics providers.

The question is not whether you are using someone else's AI. The question is whether your value proposition depends on having proprietary models or whether it comes from somewhere else -- distribution, workflow design, domain expertise, network effects, or proprietary data.

When LLM Wrappers Make Sense

  • Your moat is distribution, not technology
  • You have unique workflow or UX insight
  • Your data is proprietary and valuable
  • You are solving a business problem, not a research problem
  • Your valuation reflects software economics, not AI research

When It Is a Problem

  • You claim proprietary AI as your moat
  • Your pitch deck emphasises technical innovation
  • You are raising at AI company valuations
  • Anyone can replicate your product in a weekend
  • You have no plan if API costs rise or access is restricted

The bottom line: building on top of LLMs is a perfectly valid strategy. But be honest about where your value comes from, price the business accordingly, and make sure your pitch matches the reality of your technology stack.

Red Flags in AI Due Diligence

No ML Engineer on Staff

If a company claims AI is core to their product but has no one on the team with machine learning expertise, that is a red flag. You cannot build serious ML capabilities by outsourcing to an agency or hoping a backend developer will figure it out.

"Proprietary AI" With No Papers, Patents, or Unique Data

Real machine learning innovation shows up somewhere: published research, filed patents, unique training datasets, or deep technical expertise. If a company claims proprietary AI but has none of these, they are either overstating their capabilities or do not understand what makes AI defensible.

Demo Only Works on Cherry-Picked Examples

If the founders will not test their AI on examples you provide, assume it is more brittle than they claim. Real AI should generalise to new data. Demos that only work on pre-practised inputs are a massive red flag.

AI Costs Not Tracked or Understood

Running AI at scale costs money. If the team cannot tell you their cost per inference, their infrastructure spend, or how costs scale with usage, they do not understand their own economics. This becomes catastrophic when they hit real volume.

Claims That Violate Basic Technical Constraints

"Our model is 99.9% accurate." At what? On what data? Measured how? Accuracy without context is meaningless. Be wary of startups making grand claims about performance without being able to explain the methodology, test set, or error characteristics.

What Good AI Looks Like at Seed Stage

Good AI at early stage is not about perfection. It is about honest, thoughtful approaches to hard problems. Here is what to look for:

Realistic About Limitations

The team can articulate what their AI cannot do as clearly as what it can. They know their error rates, failure modes, and edge cases. They do not oversell.

Honest About the Approach

If they are using OpenAI's API, they say so. If they are fine-tuning an open-source model, they explain which one and why. No hand-waving about "proprietary algorithms" when it is just GPT-4 with a nice wrapper.

Clear Data Strategy

They know where their training data comes from, how it is labelled, and how they plan to improve it over time. Data is the moat in AI businesses -- good teams treat it accordingly.

Cost-Aware

They track AI costs, understand unit economics, and have a plan for how costs scale with usage. They are thinking about efficiency, not just capability.

Human-in-the-Loop Where Appropriate

They understand that for high-stakes decisions, AI should augment humans, not replace them. They have thoughtful approaches to when automation is appropriate and when human judgement is required.

These characteristics signal a team that understands AI as a tool for solving real problems, not as a buzzword for fundraising. That is the difference between AI theatre and AI strategy.

The Bottom Line

Evaluating AI claims during due diligence does not require a PhD in machine learning. It requires knowing the right questions to ask, understanding where value actually comes from, and being able to distinguish between genuine technical capability and marketing spin.

Most "AI companies" are actually software companies that use AI as a feature. That is perfectly fine -- many excellent businesses fit this description. But the valuation, risk profile, and capital requirements of an AI-feature company are fundamentally different from those of an AI-first company.

The five questions in this guide -- what happens if you remove the AI, where does your training data come from, what does your model get wrong, can you demo on unseen data, and what are your AI costs -- will expose more than any amount of hand-waving about "proprietary algorithms" or "cutting-edge machine learning".

Good AI teams are honest about their approach, realistic about limitations, and can articulate their moat clearly. Everyone else is hoping you will not ask the hard questions.

If you are evaluating AI startups and want an independent technical assessment, a Fractional CPTO can help separate the signal from the noise. For more on technical due diligence and AI strategy, see the related articles below.

Need help evaluating AI claims?

Get an independent technical assessment from a Fractional CPTO who has built data-driven products at scale. Honest evaluation, no jargon, practical recommendations.

Frequently Asked Questions

How can I tell if a startup is using real AI or just marketing hype?

Ask specific questions about training data, model accuracy, and what happens if you remove the AI component. Real AI teams can explain their data sources, quote error rates, and describe failure modes. Marketing AI cannot. Also check if they have actual ML talent on staff or if they are just using off-the-shelf APIs with no customisation.

Is it a bad sign if a startup is just using an LLM API wrapper?

Not necessarily. Many excellent businesses are built on top of OpenAI, Anthropic, or Google's APIs. The question is not whether they built the model themselves, but whether their value proposition depends on having a proprietary model. An LLM wrapper with brilliant distribution, unique data, or workflow innovation can be a fantastic business -- as long as the valuation reflects a software company, not a machine learning IP play.

What AI questions should I ask during technical due diligence?

Focus on five areas: What happens if you remove the AI? Where does your training data come from? What is your model's accuracy and how do you measure it? Can you show the AI working on data it has never seen? What is your AI cost per transaction? These questions expose whether the team truly understands their AI or if it is just a black box they hope works.

How much AI expertise should a seed-stage startup have?

It depends entirely on their value proposition. If AI is core to the product, they need at least one person who understands machine learning deeply -- either a founder or an early hire. If AI is a feature but not the moat, using off-the-shelf models with good engineering practices is perfectly reasonable. The red flag is claiming AI is your competitive advantage whilst having zero ML talent.

What are the biggest red flags in AI due diligence?

No ML engineer on staff whilst claiming proprietary AI. Demo only works on cherry-picked examples. AI costs not tracked or understood. Claims that violate basic technical constraints like 'our model is 99.9% accurate' with no context about what that means. And the classic: 'we use AI' without being able to articulate what problem the AI solves that simpler approaches could not.

Mike Tempest

Fractional CPTO

Mike builds data-driven products and helps non-technical founders evaluate AI claims. As CTO at Risika, he scaled fintech infrastructure processing millions of transactions. At RefME, he grew the platform from 0 to 2M users as Head of Engineering. He now provides fractional CPTO services to UK startups, bringing practical AI strategy without the hype.

Learn more about Mike