Where Data, Annotation, and Agents Collide

Freeman Lewin

May 19, 2025

Last week I wrote about how large language and diffusion models are optimized to produce plausible outputs, not necessarily accurate ones. As a result, the value these models ascribe to new, or “fresh,” data is often much lower than the value placed on it by finance and corporate buyers, who demand precision and reliability. I argued that this creates a delta between how AI companies and enterprises value data – a gap that needs to close for AI to reach its full potential.

But here’s the thing: that shift is already happening. We’re on the cusp of a new paradigm driven by task-specific, autonomous AI agents – systems designed to act on a user’s behalf, not just respond. These agents, which include everything from financial trading bots to autonomous customer support agents, are poised to become the primary use case for enterprise AI. Unlike LLMs, these systems can’t afford to be mostly right – they need to be precisely right, every time. And this changes everything about how we think about, package, and sell data.

In this short blog I explain how corporates and AI labs will need to rethink how they value, prepare, and sell data to agents.

Why Agentic AI Requires a Different ICP

In the world of traditional LLMs and diffusion models, the data game has been about volume, not specificity. These models thrive on breadth: they learn a little about almost everything from vast, unlabeled datasets, from Common Crawl web pages to internet images, aiming to form statistical generalizations. The result is a series of popular AI systems that are impressively versatile but often sloppy with specifics. They might generate text that sounds convincing but contains factual errors, or misidentify an object that falls outside their training distribution. This is often acceptable when the stakes are low, like writing a casual email or tagging a vacation photo, but unacceptable for meaningful business decisions. And that’s the thing: the goal is to sound correct or look correct, not to drive high-stakes decisions. This is why most users are willing to pay $20 a month for ChatGPT, but not $2,000.

Autonomous agents, by contrast, flip this script. They are not just generating content or making classifications – they are executing tasks and making decisions in real time. For instance, a financial agent managing part of a trading desk must have true and up-to-date information; a hallucinated number or a stale data feed could translate into billions lost or a compliance breach. Unlike a chatbot, a task-specific agent demands actionable specificity. Its data needs to be correct, contextual, and current. It’s not enough for these systems to be plausible – they must be accurate. And this changes the economics of data in a profound way.


Considering Agents within your ICP

For data providers, this means that agents must be defined as their own ICP (Ideal Customer Profile). They are fundamentally different buyers, with needs, behaviors, and pricing dynamics unlike those of human analysts or traditional procurement professionals. Agents consume data continuously, in small, precise slices, and often at high frequency. This is not a bulk dataset business – it’s a real-time, microtransaction-driven economy.

Consider the difference:

  • Corporate Buyer: Buys a specialized dataset for a specialized decision-making task, but spends 6-9 months acquiring and testing a sample dataset before purchasing. Finds data through pre-existing 1:1 relationships.

  • LLM Buyer: Buys a massive corpus once, trains a model, and then potentially doesn’t purchase data again for months or even years. Sources data through search, inbound, and 1:1 relationships.

  • Agentic AI Buyer: Continuously pulls fresh, context-rich data streams, potentially generating thousands of micro-purchases per day as it adjusts trading strategies, personalizes recommendations, or makes autonomous decisions. Finds data through programmatic search.

This shift has profound implications for data providers. It means moving from one-off, high-margin data sales and relationship building to continuous, recurring revenue streams – potentially transforming DaaS into a data utility where programmatic search and constant consumption drive long-term profitability. To capture this market, providers need to rethink how they package and sell data.
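The shape of that economic shift can be sketched with back-of-the-envelope arithmetic. All figures below are hypothetical, chosen only to illustrate how many small transactions compare to one large one, not to reflect real market prices:

```python
# Hypothetical comparison: one-off bulk dataset license vs. a
# pay-per-query stream consumed continuously by agents.

bulk_sale_price = 50_000        # one-off dataset license, sold once a year

price_per_query = 0.002         # fractions of a cent per data slice
queries_per_day = 25_000        # one agent polling continuously
agents = 40                     # number of agents subscribed to the feed

# Recurring revenue from many tiny purchases, accumulated over a year.
annual_streaming_revenue = price_per_query * queries_per_day * agents * 365

print(f"One-off bulk sale:       ${bulk_sale_price:,.0f}/year")
print(f"Pay-per-query streaming: ${annual_streaming_revenue:,.0f}/year")
```

The point is not the specific numbers but the structure: per-query revenue scales with consumption and with the number of agents, where a bulk sale is flat regardless of how intensively the data is used.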

What Agents Will Buy – and What They Won’t

To serve this new ICP, data providers need to rethink their offerings. Agents will not be buying massive, static datasets that require months of preprocessing. Nor will they pay for broad, unlabeled data dumps (unless their task is to structure those data dumps). Instead, data providers should prioritize:

  • High-Frequency, Real-Time Data: This one is a given. Agents need fresh, contextually accurate data to make real-time decisions. This means investing in low-latency, streaming data architectures.

  • Structured, Machine-Readable Formats: Unlike a human analyst, an agent can’t scroll through PDFs or parse free text. Data needs to be structured, annotated, and easily navigable.

  • Pre-Indexed and Pre-Processed Data: If an agent can’t quickly understand what’s in a dataset, it won’t buy it. This means investing in upfront annotation, metadata enrichment, and data cleaning.

  • Fine-Grained, Pay-Per-Query Pricing: Agents won't be buying in bulk – they'll be buying in microsegments. This means moving away from large, single-dataset contracts to pay-per-query or subscription models where data is sold in precise, actionable chunks, on an automated basis.
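The four requirements above can be sketched together in a few lines: structured records with machine-readable metadata an agent can inspect before buying, and a micro-charge logged per query. Every name, field, and price here is hypothetical, a minimal illustration rather than any real provider's API:

```python
import json
from datetime import datetime, timezone

# Hypothetical catalog entry: pre-indexed metadata an agent can read
# *before* purchasing (schema, freshness guarantee, per-query price).
CATALOG = {
    "eurusd_spot": {
        "schema": {"ts": "ISO-8601", "bid": "float", "ask": "float"},
        "freshness_sla_ms": 250,
        "price_per_query_usd": 0.002,
    }
}

LEDGER = []  # running tally of micro-purchases

def query(feed_id: str, record: dict) -> str:
    """Serve one structured, annotated record and bill one micro-purchase."""
    meta = CATALOG[feed_id]
    LEDGER.append(meta["price_per_query_usd"])
    payload = {
        "feed": feed_id,
        "served_at": datetime.now(timezone.utc).isoformat(),
        "data": record,
    }
    return json.dumps(payload)  # machine-readable: no PDFs, no free text

# An agent pulls three fresh slices instead of one bulk file.
for tick in [{"bid": 1.1201, "ask": 1.1203},
             {"bid": 1.1204, "ask": 1.1206},
             {"bid": 1.1199, "ask": 1.1201}]:
    query("eurusd_spot", tick)

print(f"micro-purchases: {len(LEDGER)}, total owed: ${sum(LEDGER):.3f}")
```

The design choice worth noting is that the metadata lives alongside the data: an agent decides whether a feed is worth paying for by reading the catalog entry, without a human sales conversation in the loop.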


The Economic Opportunity

The upside for those who get it right is substantial. By aligning their offerings with the needs of agentic AI, data providers can capture high-margin, repeat revenue streams that dwarf the traditional bulk-sale model and do away with business development and commissions. It’s a tangible shift from selling data as a product to selling data as a (programmatic) service – a subtle but critical difference that will separate the winners from the losers in this new AI economy.

However, this shift also presents challenges. Data providers who have traditionally relied on selling large, slow-moving datasets will need to rethink their entire business model. In an agent-driven world, simply dumping bulk data is not enough. The data must be pre-processed, structured, and richly annotated so that agents can quickly extract the specific insights they need without human intervention. This means investing in annotation, context enrichment, and real-time data delivery infrastructure.

Conclusion

The rise of agentic AI is creating both enormous opportunities and steep challenges for the DaaS industry. Providers that can adapt to the new economic realities – including microtransactions and micro-segmentation, high-frequency data streams, and low-touch sales processes – will be best positioned to thrive in this emerging market. Those that can’t may find themselves struggling to compete as agents demand ever-more precise, real-time data. For the world of DaaS, my message is clear: the future belongs to those who can deliver data at the speed and scale of machine-driven decisions.

Feel free to reach out to us at sales@emetresearch.ai