This post argues that a variety of markets across the AI supply chain all stand to benefit from a unified approach to "attestation objects" that capture provenance regarding who contributed to building and evaluating upstream AI models. These markets include: the markets where individual consumers or enterprise customers pick between AI products (making decisions such as "ChatGPT or Claude?"); the markets where AI developers try to acquire data for training or retrieval (making decisions such as "should I pay for access to Reddit as training data?" or "should I license Reuters content?"); the markets where regulators or insurers try to commission AI audits; and the very broad set of markets in which people consume all sorts of AI-assisted artifacts (from art to code). Broadly, an attestation object is a verifiable record of who contributed what, in what role, to which model or output, under what validation process.

There is already interest and progress in provenance and attestation within three distinct spaces: the datasheet, model card, and data provenance space; the Frontier AI Auditing space; and the content authenticity space. I think if we get the design of attestation objects right, we can use interoperable schemas across the AI lifecycle and massively improve visibility and traceability across training, evaluation, and usage.

I am especially excited about this idea because I think it can enable progress in one part of the AI supply chain to "spread" and bootstrap progress in other areas. For instance, existing momentum around audits could foster more training data transparency (and better markets for training and retrieval data). Or, parties interested in attestation and provenance for AI-generated outputs (e.g. assessing AI-generated code) could help to promote transparency upstream. Additionally, if communities adopt norms or expectations around transparent AI usage (e.g., an expectation that people using AI to work on open source software also open source their AI transcripts), this may drive demand for transparency further up the supply chain, and create pressure to support attestations.

Here's a short version of the key claims:

  • Users buying AI products, AI developers buying data, anyone commissioning audits, and anyone consuming AI-generated outputs all face a related set of issues stemming from information asymmetry and a general lack of information.

  • The auditing and evaluation space has significant momentum; at present, there is an active ecosystem of third parties providing attested evaluations of frontier models. There are a number of major challenges in this space, but reasons for optimism.

  • For users looking to pay for AI products, there remain major challenges in doing DIY benchmarking of models. It's also challenging for consumers to interpret benchmarks and evaluations. There are currently few opportunities for consumers to incorporate facts about model inputs into their decision-making.

  • For developers looking to pay for data in AI markets, there are core market design challenges that stem from the non-rivalry and non-excludability of information alongside practical challenges with fine-grained assessment of data value. Markets for data are fragmented, opaque, and inefficient at the moment.

  • Each stage of the AI supply chain could, in theory, involve attestations from key actors. Data creators can attest that they included their data in training sets. Auditors can attest to their participation in evaluation processes. Model outputs can include attestation objects that help people verify that a given output really came from a certain model. And downstream artifacts can bundle all these attestations together via flow-down.

If the schema for attestation and provenance can be synchronized across the supply chain, we can get a win-win-win-win.

  • For data creators and buyers, we might establish healthier markets for training data that emphasize attestation objects. This can help avoid likely market failure from non-rivalry and non-excludability in markets mainly focused on exchanging "raw tokens." Put another way: if we can solve the collective action problem of making training data attestations salient in consumer markets for AI products, either through consumer pressure or regulatory action, this could create markets for data that are much less reliant on continuous use of "data protection" technologies. This would be much more harmonious with open data and data commons practices (allowing for e.g. data that is open for research use but requires attestation for commercial use).

  • For auditors, we can help increase demand for high quality audits and create standardization around benchmarking and evaluation.

  • For labs, we can make it easier to sell model outputs without being undercut by distilled models.

  • For people consuming content, we can provide means to verify content authenticity. More generally, people can use all of this upstream information to help them assess AI-generated content of all types.

While much of my previous writing has focused on markets and ecosystems for training data, I'm excited about all of these angles. (Of course this is the optimistic view -- see the end of this post for a more negative set of "likely objections".)

The longer version of the argument proceeds in four steps. Below, I'll first describe roughly how buyers are choosing AI today and why the current approach is incomplete. Second, I'll briefly summarize the state of play for upstream markets for training data and evaluation labor. Third, I'll describe interoperable attestation objects that can link contributions, evaluations, and outputs; these are very viable today using a combination of approaches from across the provenance/auditing/licensing space (basically, this is not far from what the C2PA specs already enable). Finally, I'll tie the whole thing together by arguing that this kind of attestation layer can improve incentives for buyers, labs, and knowledge workers. My goal is to motivate connecting these ideas: to state explicitly how these existing approaches can be treated as parts of a common AI trust stack, and to argue that connecting them has underappreciated consequences for market design, labor visibility, and downstream product choice.

Because this post also reiterates some points from previous data leverage posts and various external references, I'll add an appendix at the end briefly noting some of these connections.

I'll note that, in addition to momentum around auditing, we're starting to see some evidence that consumers might care about attestation. We're getting closer to a world in which we can shop for an AI product to use for health-related purposes and figure out "exactly how many doctors contributed to training data and exactly how many doctors checked the outputs carefully." See, for instance, the "Introducing ChatGPT Health" blog post, which emphasizes working with "260 physicians who have practiced in 60 countries and dozens of specialties" to obtain "feedback on model outputs over 600,000 times across 30 areas of focus".

Or, put another way, this is yet another blog post making the case for data transparency and healthy market flows as a coalition-building cause that can unite auditors, content creators, AI labs, and the general public around a mutually beneficial set of incentives.

Core assumptions behind the proposal

Above are the core claims in this post, which really stem from a core set of assumptions:

  • buyers of AI products currently face significant information asymmetry about model inputs and evaluation processes

  • facts about inputs (who contributed data, who evaluated outputs) genuinely proxy for output quality, especially in high-stakes domains

  • attestations, unlike the raw information they describe, can be made rival and/or excludable, and so can support healthier markets

  • the value of data is mostly legible at the level of bundles and collectives, not individual contributions

Ok, with our summarized version of the post out of the way, here's the longer argument:

I. The market problem: buyers of AI products can see outputs, but not the chain behind them

Imagine someone who is choosing from a menu of AI products. Perhaps this person is a consumer who just wants to use AI for small personal projects ("help me build personalized note-taking software"), a researcher seeking some occasional help with research tasks ("help me debug this LaTeX issue"), a hobbyist software developer seeking to find a tool to use heavily, somebody seeking a rigorously tested AI system that can offer medical advice, or a company's CTO looking to make an enterprise contract for org-wide AI access.

Across all these contexts and scales, a person buying an AI product is mainly buying "information outputs" such as answers to questions, documents that fulfill some requested purpose, or pieces of code that can complete certain tasks. A user buys an AI subscription or AI credits, and now can enter an input (a prompt/query) and get an output. Modern frontier models can provide a dazzlingly varied set of outputs: a single tool can first give you something that looks like a search engine results page, followed by a complex working piece of code, a working spreadsheet, and then finally a poem and an accompanying piece of visual art. Some systems are even agentic and can produce outputs that are themselves actionable (e.g., the output is a series of actions, and then the system performs some "actuation," often using the user's computer and/or API keys).

Generally (and we'll get more precise about this later in the article), we choose to spend money (perhaps a lot of money!) on AI if we think those outputs will be "good" in some sense. We might believe an AI is good in a statistical sense because we inspected the outputs ourselves, because we read a study that rigorously evaluated many outputs, or because we make some assumptions about how the AI was developed. However, users could also weigh more specific facts about how the model was built (such as information about the training data).

If you are using AI to make personalized note-taking software, facts about the training data may not be very high stakes. However, if we want to buy an AI product to ask medical questions, we might care deeply about both AI outputs and inputs to said AI. We most likely want to know that a doctor has looked at some of the model's outputs and attested they are "good". We might also want to know that there were some inputs from actual doctors, e.g., content from medical textbooks and research papers used in the development of the original model. Finally, when we get a medically relevant output from an AI system -- e.g. the system we paid to access has printed out some text telling us to take a medicine or take some action -- we ideally want that output to be verifiably linked to information about the evaluation of the model, and ideally the training data as well.

II. How AI products are chosen today: four channels of evidence

If we look at the key factors that might cause someone shopping for AI products today to pick one product over another, I think we can bucket these into four groups. Put another way, most "AI buyer choices" can be traced to evidence or beliefs from one of four channels of information:

  • the "outputs" channel. How good are the actual information outputs from an AI system?

  • the "inputs" channel. How were specific models used in an AI system trained and how was the overall system developed?

  • the "product" channel. Covers factors related to the delivery of the information outputs: pricing, uptime, latency, UX, privacy/compliance, support, lock-in, etc.

  • the "internals" channel. Facts about the model's internal machinery. This is the channel we're furthest from being able to make practical use of, but potentially a very valuable one.

The product channel matters a lot if you're buying an AI product today. However, it's the channel I'll discuss the least in this post, in part because the focus here is on data-related differentiation, but also because getting information about product factors currently works pretty well and thus enables relatively healthy market dynamics. We also won't talk much about the internals channel, e.g. making decisions based on the results of interpretability studies, though I think this will be increasingly important going forward. Instead, this essay focuses on the untapped value of the inputs channel, and on how attestations can connect inputs to outputs.

At present, I think most AI users are choosing their AI primarily based on beliefs about model outputs along with some general assumptions about model inputs. The beliefs about model outputs might range from general notions like "my friends told me Claude Code outputs good code" to "I read all the recent model cards" to "I carefully audit all my AI outputs every week and compare to a personal, secret, hand-crafted evaluation set".

The assumptions people make about model inputs are, at present, along the lines of "Let's assume the frontier labs sourced good data, had a very capable set of people inspect and filter that data, used best-in-class training practices, etc." (Conversely, there is also a certain type of anti-AI sentiment that is driven primarily by inputs-based beliefs, e.g. a focus on stolen data, more than outputs-based beliefs about the quality of the model).

Of course, it's very reasonable to primarily make decisions based on the properties of model outputs. Outputs matter most for most use-cases. If I get good code or correct outputs, I may not care how the model was trained or how it works internally. And if I really just care about the outputs, information about model training is mostly useful as a proxy for output quality.

One reason I'm interested in highlighting the difference between these channels is that I think this framing can improve AI literacy in a way that's useful for both AI developers and AI consumers. Specifically, I think the current consumer market can be improved by more clearly signaling the distinct facts about outputs and inputs that might drive consumer behavior. This will help consumers make decisions and AI developers differentiate their products.

III. Why outputs alone may not always be enough

The contrast between note-taking software and medical AI helps show where outputs can dominate and where inputs start to matter much more.

How do we know if the information outputs are good? Let's return to our first example: imagine a user who is looking to buy an AI product to help write personal note-taking software (and, more generally, play around with AI-coded personal software). This user probably wants to send a prompt to an AI model (either via a web browser or a local coding agent harness) that looks like this: "write me the code for software I can use for daily note-taking" (with a bit more detail regarding their personal preferences for how the software should work), and then get an output back.

Say this user can do a "trial run" and send this prompt to three different AI providers, A, B, and C. How will this user decide which provider product to buy?

Option 1: inspect outputs yourself

One immediately available option would be to look at the output from each model and assess the quality of that output. If the user is already an expert in this domain, perhaps they can just read the code produced by all three models and identify an obvious "best" model. Of course, if they wanted to have some statistical rigor in evaluation they'd probably want to look at more than one sample per model, though this will create some budgeting challenges and basically means the user needs to conduct their own benchmarking study. For a fuller treatment of the topic of looking at model outputs statistically, see e.g. promptstats from Ian Arawjo.
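To make Option 1 concrete, here is a minimal sketch of a personal benchmarking loop. Everything named here is hypothetical: call_provider stands in for whatever SDK each vendor actually ships, and score_output stands in for your own task-specific rubric (e.g., "does the generated code run and pass my checks?").

import statistics

# Hypothetical stand-ins: wire call_provider() up to each vendor's real SDK,
# and replace score_output() with a real task-specific rubric.
def call_provider(provider: str, prompt: str) -> str:
    canned = {"A": "def take_note(): ...", "B": "class Notes: ...", "C": "TODO"}
    return canned[provider]  # placeholder for a real API call

def score_output(output: str) -> float:
    # Toy rubric; real scoring would run the code, check tests, etc.
    return 0.0 if "TODO" in output else 1.0

def benchmark(providers, prompt, n_samples=5):
    # Multiple samples per provider, since single outputs are noisy.
    return {
        p: statistics.mean(score_output(call_provider(p, prompt)) for _ in range(n_samples))
        for p in providers
    }

print(benchmark(["A", "B", "C"], "write me the code for daily note-taking software"))

Even this toy version surfaces the budgeting problem: every extra sample per provider costs real money once call_provider is a paid API.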

Option 2: rely on published evaluations

Another option that also uses the outputs channel would be to look at the results of an already published benchmarking study, either from the model provider or from a third party. In this case, the user would basically be making the assumption that "other people tried similar inputs, and performance on those related tasks will probably proxy for performance on the task they care about." Here the rigor of the benchmark matters: how many total variants were tried, who checked that the output worked, etc.

When AI developers themselves or third party evaluation organizations produce a benchmarking study, the results of these studies will impact the organizations' reputation, so generally we should expect these studies to provide real signal reasonably consistently. Benchmarking will always be noisy, but over time, there are incentives to evaluate models in ways that accurately reflect differences in capabilities. In other words, this approach should be pretty decent, and if these were the only signals available for consumer decision-making, we could still sustain a reasonably efficient market for AI products.

Option 3: rely on vibes and reputation

A third option after "roll your own evals" or "read the best recent evaluation report" is to simply seek out a very general, vibes-based ranking for the "most intelligent" model or the "best" AI developer. I think this is what many people are doing in practice right now, though there is certainly a population of power users who diligently run their own personal benchmarking processes each time a new model releases.

This approach is not necessarily ineffective, either. A vibes-based approach likely integrates reputation and social proof in a way that matches other kinds of consumer decision-making and creates decent outcomes. Not everybody needs to read the auditing deep cuts or full review history for every product they buy.

What's interesting is that using vibes to generally rank AI companies actually starts to mix information from the outputs channel and the inputs channel, because people are making some assumptions about training data. I think most people choosing with vibes are indeed assuming that AI developers themselves and third parties have evaluated the outputs of the model and found those outputs to be good, and perhaps that AI developers have special internal-only evaluation practices and employees that are especially skilled at AI evaluation. But this approach also involves some assumptions about training practices in frontier labs. We might be assuming the labs have tried all sorts of training approaches, have sourced the best data they can get their hands on, and have a large team doing whatever they can to make models better. Critically, vibes-based decision-making does involve assumptions about model inputs, and is not entirely inputs-agnostic.

Why this still leaves a gap

However, this is also where we can start to envision some room for improvement in the AI consumer experience. If consumers had more specific knowledge about the inputs to an AI product, both broad facts about the training data and facts about the evaluation process, this could be useful to them. Even as underlying training techniques enable models to become more general and more broadly intelligent, it seems likely that the presence of certain types of data, data about certain topics, and data from certain experts in model inputs will proxy for output quality.

In the medical case, information about the outputs might be more important. If sketchy training data leads to outputs that top doctors give a thumbs up to, we might take that. And overemphasis on training data may create metric chasing, e.g. including social media posts from medical doctors in pre-training data to drive up the volume of tokens from MDs. But input information should still provide some signal. Either we want real medical data in the training set or we want solid evidence our AI system has figured out medicine from first principles.

Output studies are themselves only a proxy for future output quality, though more rigorous studies give us more confidence. Input details are likewise just useful as proxies. However, barring a massive paradigm shift, this kind of evidence is the best we're going to get for tools built on statistical learning. Furthermore, a point I think remains under-emphasized: as we make models more general, the cost of achieving full evaluation coverage over their incredibly diverse outputs becomes prohibitive.

Many people have commented on issues with evaluating AI systems; see, for example, the "Evaleval" project and its "Every Eval Ever" effort, Raji et al. on "everything in the whole wide world benchmarking", and recent online discussions in which people contrast their personal experience with a model and that model's benchmark results.

The labor required to really, fully, comprehensively evaluate increasingly general technologies will be very costly. We're going to need a whole lot of doctors and scientists to give feedback on a whole lot of LLM outputs to be sure we can use LLMs effectively in those contexts. But we're also going to need a whole lot of general users, with distinct combinations of niche interests, needs, preferences, etc., to give feedback as well. All this feedback will be relevant to evaluation of current models, but also relevant to the training of new models, the design of reinforcement learning environments, etc. It's going to take time to do all this evaluation, and will in some sense involve a kind of commandeering of the entire knowledge economy.

Even napkin math gets big quickly. Suppose we want rigorous evaluation coverage across 30 major medical specialties, with 1,000 representative cases per specialty, 10 independent physician reviews per case, and 30 minutes per review. That is 300,000 physician judgments, or 150,000 physician-hours, for one pass over one model version; at $200/hour for specialist time, that is about $30 million just in doctor-review labor. And that is before benchmark design, adjudication, compliance, or program management.
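For the curious, here is the same napkin math as a tiny script. Every parameter below is just one of the assumptions stated above, so swap in your own numbers for other domains.

specialties = 30
cases_per_specialty = 1_000
reviews_per_case = 10
minutes_per_review = 30
dollars_per_hour = 200  # assumed specialist rate

judgments = specialties * cases_per_specialty * reviews_per_case  # 300,000
hours = judgments * minutes_per_review / 60                       # 150,000
cost = hours * dollars_per_hour                                   # $30,000,000
print(f"{judgments:,} judgments, {hours:,.0f} physician-hours, ${cost:,.0f}")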

It's worth thinking about how many hours of physician labor you want upstream of a chatbot prescribing your medicines!

IV. What current "upstream markets" look like: deals, marketplaces, and labor

Before an AI product is brought to market for consumers to consider, developers must also participate in two important upstream markets: acquiring training data and acquiring evaluations. Some companies may get their training and evaluation data without direct outside assistance, e.g. just scrape the training data and use in-house employees to do evals, but I expect that as the industry matures both of these domains will become more market-like, with a broader set of data sellers and evaluation providers to choose from. Note that I'll sometimes use "data" a bit loosely here to cover both training inputs and evaluation labor, since both are forms of human knowledge work that developers are trying to source.

There is already an informal market for evaluators. It is not yet as legible or standardized as something like AWS Marketplace, but frontier labs already hire or contract with domain experts, benchmark designers, red teamers, and specialized third-party evaluators. Organizations like METR, along with similar evaluation and auditing groups, are early examples of a more specialized evaluation layer emerging around frontier AI systems.

Now, let's step away from the perspective of someone picking between AI products and instead consider a developer who wants data to build their AI product, or a seller, either an individual or an organization, looking to monetize their content.

Three current market forms

I'll bucket the ways we can buy training data and evaluation labor right now into three broad categories. These are ideal types rather than perfectly separate boxes, but they capture a lot of the current landscape:

  • Big licensing deals. These are bespoke boardroom transactions: a lab or platform negotiates directly with a publisher, forum, data broker, or other large rights-holder for access to a corpus or feed. The deal terms are highly customized and usually cover things like scope of use, refresh cadence, exclusivity, audit rights, indemnity, and payment over time. An example would be Google reportedly paying Reddit roughly $60M/year for access to Reddit data, with Reddit separately disclosing in its S-1 that its January 2024 data licensing arrangements had an aggregate contract value of $203M.

  • Licensing via marketplace. This is more like online shopping for large datasets. Instead of a bespoke negotiation, the seller lists a relatively standardized product with a posted description, delivery format, usage terms, and price, and the buyer can compare options much more quickly. An example would be going to AWS Marketplace to buy 12 months of "Anonymized, non-aggregated granular consumer-level data across all asset classes" from Equifax for $175k. Economically, the appeal here is lower transaction cost and easier comparison across sellers, though the quality and rights picture may still require substantial diligence.

  • Pay individual people, per task or per hour, for data outputs. This is the crowdwork or expert-labor bucket. Here the buyer is not primarily acquiring a pre-existing corpus; they are paying people to generate judgments, labels, rankings, reviews, or other evaluative outputs under a specified protocol. An example might be paying somebody via Prolific at least $8/hr (and typically more like $12/hr) to label web data, or paying domain experts to peer review model outputs. This category matters especially for evaluation, where the scarce input is often expert attention rather than a static archive.

One nearby fourth form, if one wanted to break the taxonomy out further, would be recurring API or feed access rather than transfer of a bulk dataset: paying for ongoing, metered access to a live corpus or stream. I think that form mostly collapses back into the first two categories (big deals and marketplaces) depending on how standardized the arrangement is, so I am leaving it as an adjacent case rather than promoting it to a main bucket.

See also the data deals tracker from the authors of "A Sustainable AI Economy Needs Data Deals That Work for Generators".

Collectives and intermediaries

Another idea that's gained some popularity over the last decade is the notion of joining a data cooperative, collective, union, or intermediary. Generally, the idea is that you join the coop, contribute data, and some actor in the coop transacts in a market on your behalf. There are in fact some ways to join a data cooperative today: organizations such as Swash and Brave Rewards monetize various online actions and tasks. Still, the user bases of data-coop-style tools are very small compared to general Internet usage; for reference, Brave reported 101M monthly active users as of September 30, 2025, while StatCounter has recently put Brave at roughly 1% of worldwide desktop browser share. Some of the means of monetization, e.g. watching ads, are also quite different from an aspirational vision of data markets in which data sellers spend their time crafting valuable documents and records and are then rewarded for their contributions to shared epistemics. I really do believe a useful North Star is a world in which more overall human work looks like the good parts of journalism, scientific peer review, editing Wikipedia, and participating in Q&A, with the bad parts minimized or automated away.

While the above data cooperatives involve a technical approach to enabling cooperation, and participation remains niche, I actually think there is a lot of low-hanging fruit for applying the collectives/coops/unions idea to the three types of markets described above.

First, for the big boardroom deals, we could imagine pools of users directly weighing in on the deal conditions. For example, when it comes time for Reddit to renegotiate with Google or others, they loop users in, either voluntarily or under threat of collective action. Second, data coops can post their own quality-assessed CSV file for sale on AWS Data Marketplace right now, but they still need social and technical support to actually organize in the first place and produce a good CSV. And there is a large body of work, including ongoing efforts, that aim to support collective bargaining for data workers, see e.g. the body of work from Dr. Saiph Savage and Data Workers' Inquiry from DAIR.

One consideration is that successful organization of data workers may actually involve a shift from crowdwork markets to collectives that sell bundled data, i.e. moving more overall information from the third category (crowdwork) into the second (marketplaces), or even the first (big deals).

Data protection as a precondition for participation

There is also an emerging set of actors interested in supporting various forms of what we might term "data protection for the AI age". Cloudflare's public support of anti-scraping is one example.

In general, supporting data protection action can serve to make it more likely that data sellers will engage with markets, rather than give up because everything will be scraped anyway. In other words, some actor, either private or public, needs to do some level of policing for a market to emerge. This is a bitter pill to swallow from an open culture perspective, and open culture is foundational to the success of modern ML, including both peer production of critical training data and the production of actual open-source implementations of algorithms, optimizers, operational code, etc. But we can likely find a happy balance in which for many domains of data, open culture can continue, while other domains do become financialized.

Put another way: the status quo around scraping and distillation means that there are a huge number of potential sellers who won't even try to participate in the seller side of the market, as an individual or as a collective, because they assume the value they're going to receive is zero. Any actions that move us away from the current situation are net good in the short term, though we should be mildly cautious about some pendulum-swings-too-far data cartel situation in which nobody can use any information without paying.

In the short term, better data protection will likely increase market participation. However, data protection is, in some sense, a never-ending battle because of the nature of information. This is why attestation is so appealing; if attestations are good enough that people use them to make their AI purchasing decisions, the incentive to distill or "steal" data is reduced because a rival can copy outputs more easily than it can copy a verified provenance relationship. More on this in the next section.

V. Why attestation improves incentives: why buyers, labs, and workers should care

We need to address two fundamental constraints around value measurement and value transfer in the context of AI:

  • the economic properties of information -- non-rivalry and non-excludability -- make it very hard to create efficient markets for information. However, by creating incentives to transact over attestations, which can be made rival and/or excludable, we can try to have our cake and eat it too: widely distribute information via AI as a public good while retaining the benefits of markets, including the decentralized computation made possible by prices associated with different kinds of attestation

  • the value of data, as defined by any kind of counterfactual estimation method ("how does accuracy change if a unit or bundle of data is removed from training?"), is generally small in raw magnitude unless data is grouped together as some kind of collective (a toy sketch of this point follows below)
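To make the second constraint concrete, here is a toy leave-one-out valuation using scikit-learn. This is an illustration of the counterfactual idea, not any particular production data-valuation method; the dataset and model are synthetic stand-ins.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def accuracy(train_idx):
    # Retrain from scratch on a subset of the training data.
    clf = LogisticRegression(max_iter=1000).fit(X_tr[train_idx], y_tr[train_idx])
    return clf.score(X_te, y_te)

all_idx = np.arange(len(X_tr))
base = accuracy(all_idx)

# Counterfactual value = accuracy drop when data is removed from training.
v_single = base - accuracy(np.setdiff1d(all_idx, all_idx[:1]))    # one example
v_bundle = base - accuracy(np.setdiff1d(all_idx, all_idx[:300]))  # a 300-example "collective"
print(f"value of one example: {v_single:+.4f}")
print(f"value of a 300-example bundle: {v_bundle:+.4f}")

On runs like this, the single-example value is typically indistinguishable from noise while the bundle's value is measurable, which is exactly the point: individual contributions are hard to price, collectives less so.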

We could partially address these challenges by coarsely distributing value to data creators: worry less about provenance and instead distribute resources through something like a data-dividend public wealth fund. I think that may be a useful transitional approach in the short term, but likely not the ideal end state.

Why AI buyers care

For buyers, the benefit is straightforward: inputs become inspectable product features. Instead of relying only on benchmarks, reputation, or vibes, a buyer could ask more specific questions: How many credentialed authors contributed to training data? What process verified their work? What upstream sources were actually licensed? In higher-stakes domains, that still does not replace output inspection, but it gives downstream choice much more structure.

If those attestations become known to be predictive -- e.g. the AI model with stronger attestations around its training data is actually better at giving health advice -- they also reduce some of the appeal of indiscriminate scraping or distillation. A rival might still imitate outputs, but the attested relationship itself becomes part of what the buyer is purchasing.

Why contributors and collectives care

For creators, evaluators, and data workers, the important shift is from selling anonymous tokens to participating in visible, renewable labor relationships. Attestations for training and evaluation can be managed and aggregated by some kind of intermediary, collective, or union so people do not need to transact individually with AI companies.

If verified attestations become important to consumers or regulators, that would immediately bring more data work out into the open. It would make it easier to treat these jobs as real careers and harder to treat data workers as invisible or precarious labor. Note that while I've been using doctors as a convenient running example -- a relatively small group with credentials that are easy to verify and existing social infrastructure for collective action -- the same logic applies to many other kinds of knowledge work.

The economics of maintaining a trusted relationship are also very different from the economics of acquiring tokens. The actual bundles of information in training data are and will remain mostly non-rival, and often hard to exclude. But contracts with people or collectives are not just about buying rights to transfer bits. These are contracts for labor, which must be renewed and maintained. A world of data-with-attestations is therefore a world with more ongoing relationships and more room for bargaining.

Why labs and regulators care

For labs, credible attestation could become a meaningful source of differentiation, though adopting any particular standard is a collective action problem. This is where auditing, evaluation, and safety cases begin to connect to regulation, insurance, procurement rules, and industry self-governance.

The building blocks for this market already exist in the upstream markets described above. The main difference with attestations is that the fact that a person made or evaluated some portion of the data would no longer be hidden, but rather made prominent. If an AI builder who uses attested or full-consent data early on is able to produce good models and prove that buyers care about those records, that creates pressure for other AI developers to play along as good-faith actors in the market.

(An immediate downside for some labs is that data and evaluation may cost more.)

A version of an attestation-based provenance system that gets things really right could be good for AI developers, data creators who want to add knowledge via pretraining, content creators who want to be rewarded when their content is retrieved, and people who want to use AI to produce things and sell them, e.g. software developers selling vibe-coded software. In the strongest version, the flow of attested data and AI outputs also becomes a governance lever, in the sense that healthy markets for anything act as both a kind of computation over preferences and a kind of governance.

VI. Implementation and caveats

Enforcement will be partly technical and partly social

I've talked in broad strokes about attestation, provenance, and verification, but have remained relatively agnostic about implementation details. Attestation and verification will involve a mix of technical and social enforcement. They might lean more heavily on technical innovations, e.g. new approaches for embedded cryptographic signatures, or more heavily on social forces, e.g. organizations that enforce attestations through reputation.
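As one concrete example from the technical end of that spectrum, here is a minimal sketch of signing and verifying an attestation object with an ed25519 key via PyNaCl. PyNaCl is just one library option, and the canonical-JSON convention below is an illustrative assumption rather than an established standard for these objects.

import json
from nacl.signing import SigningKey

attestation = {
    "schema_version": "0.1",
    "object_type": "evaluation_record",
    "claim": {"statement": "A practicing physician reviewed 100 outputs."},
}

# Canonicalize so signer and verifier hash exactly the same bytes.
# (A real system would pin down a canonicalization spec.)
payload = json.dumps(attestation, sort_keys=True, separators=(",", ":")).encode()

signing_key = SigningKey.generate()  # the attester's private key
verify_key = signing_key.verify_key  # published so anyone can check claims

signature = signing_key.sign(payload).signature
verify_key.verify(payload, signature)  # raises BadSignatureError if tampered with
print("signature verified")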

At a high level, there are at least five areas of existing/ongoing work that could support attestation:

  • C2PA is currently the closest technical analogue. It already supports signed provenance for AI/ML models, training datasets, outputs, fine-tuning, and versioned releases. If one wanted to implement parts of this proposal tomorrow, C2PA is an obvious place to start.

  • in-toto and SLSA provide the generic supply-chain pattern: authenticated statements about subjects, materials, builders, and verification.

  • Model cards, datasheets, and related documentation frameworks make model and dataset facts legible via standardized processes.

  • Frontier AI auditing focuses on institutional assurance: who gets access, what independence standards matter, what level of assurance an audit provides, and how results should be communicated.

  • Data licensing explores the contractual layer around who can use what data, for which purposes, and under what terms. That layer is not itself an attestation format, but it helps define the claims an attestation system would need to make legible and enforceable.

The transition will be gradual

Recent trends in dataset documentation suggest a world of transparent data for AI models is not yet around the corner. Dataset details in model and system cards remain extremely short and vague, as reflected in the Foundation Model Transparency Index, and understandably are likely to remain short and vague until additional legal clarity is achieved. That said, for both researchers and consumers, it will be very valuable to keep an eye out for new open releases, such as models from EleutherAI, AI2, and academic groups (see e.g. recent work from Fan et al. on model training using highly compliant web data).

But critically, almost all the actions that data protectors might take now, including developing and offering anti-scraping technologies, helping to socialize anti-scraping and AI preference signals, and supporting related research on data value estimation, full-consent LLMs, partnerships with national labs, etc., are likely to also make it easier to work toward attested data.

Appendix I: how this proposal connects to earlier posts

This first appendix is meant as optional background for returning readers (the main audience, I suppose, being myself, as this is a thinking-in-public blog). It shows how the proposal connects back to the "quasi-enclosure" piece (itself a follow-up to "tipping points"), the "data rules" piece, and an earlier series of posts discussing evaluation and data labor.

On "quasi-enclosure" and "tipping points"

Those posts argued that AI can increase useful knowledge work while still routing the resulting data into private pools, creating a precarious quasi-enclosure and raising the risk of content-ecosystem tipping points. The attestation proposal could address that problem by making the human and organizational inputs to AI systems more visible, more clearly credited, and easier to contract around. If labs compete partly on verifiable claims about who contributed, who evaluated, and under what terms, then some of the value currently captured silently through closed transcript pools can instead flow through explicit labor relationships, collective bargaining, and public-facing quality claims. That does not solve the commons problem by itself, but it does create a more plausible "golden path" in which AI products remain broadly useful while preserving stronger incentives for the people and institutions that keep producing the knowledge those systems depend on.

Critically, to retain the benefits to humanity of maintaining knowledge commons in general, we must also create a set of interventions and social norms aimed explicitly at maintaining a knowledge commons. Attestation alone will not solve this!

On "data rules"

The "data rules" piece argued for clearer and more enforceable options governing how data can be used across training, retrieval, and evaluation, in ways that help both creators and model builders. Attestation is one record-keeping layer that could make such rules practical. A rule only matters if someone can later verify what happened: whether a dataset was licensed for training but not evaluation, whether an evaluation set was kept separate from training, whether a model output can be linked back to a model version and a body of supporting evidence. In that sense, attestation is not a rival proposal to clearer data rules; it is one concrete way to operationalize them.

Of particular note: designing a schema for attestation could help to surface the types of contracts available to data creators and AI developers.

On "selling AGI like AG1"

That post argued that as AI products get more expensive, buyers will want more than vibes and vague claims about "more intelligence". The attestation proposal gives a direct answer to that consumer problem. Instead of asking a buyer to trust a proprietary blend, it creates the possibility of saying: this model was trained on these kinds of licensed inputs, was evaluated by this many experts with these credentials, and this output is traceably linked to that evaluated system.

On the evaluation and data labor posts

The earlier evaluation posts argued that output evaluation is costly, incomplete, and increasingly central as models become more general. They also argued that evaluation labor itself could become a major site of bargaining power. The attestation proposal addresses both points by making evaluation labor legible, portable, and auditable. An evaluation attestation can say who did the work, what they reviewed, how much they reviewed, and under what process. That helps buyers interpret quality claims, but it also helps workers and collectives bargain over the provision of evaluation labor because the work is more visible.

Appendix II: objections and replies

Q: What if buyers simply do not care about attestations?

A: This is one strong practical objection. If most buyers continue to choose AI products based mainly on outputs, price, latency, UX, and enterprise features, then attestations will not do much work on their own. My claim is not that attestations magically displace those factors; it is that they can become additional decision-relevant signal, especially in higher-stakes domains and in procurement settings where compliance, liability, and reputation matter. If consumer and enterprise demand never materializes, then the proposal likely depends much more heavily on regulation, insurance, or industry self-governance than on pure market pull.

Q: Couldn't this cause people to confuse provenance with "explainability" or "justification"?

A: Yes, that is a real danger. A provenance chain is not the same thing as a causal explanation of why a specific output is correct. Knowing that doctors contributed to training data or reviewed some model outputs does not by itself prove that a particular medical answer is trustworthy. The point of attestation is therefore not to replace epistemic validation, but to supplement it. It gives buyers more information about the system behind an output, not a complete proof that any one answer is right.

A critical part of making this system work would be for the organizations participating in the attestation chain (e.g., medical organizations) to ensure that attestations are, for the most part, epistemically valid.

Q: Won't companies just game the metrics once "attested inputs" become a selling point?

A: They might. If the market starts rewarding visible counts like "number of doctors involved" or "number of licensed sources," firms will have incentives to optimize for what is easiest to attest rather than what most improves the system. That is a classic Goodhart problem. The best response is to design attestations carefully! Even then, some gaming pressure is inevitable.

Q: Doesn't this just move trust from AI labs to auditors and certifiers?

A: In part, yes. Any attestation regime creates a new bottleneck around whoever verifies claims. That raises familiar worries about certification cartels, regulatory capture, and barriers to entry for smaller labs or open projects. This is one reason I describe attestation as market infrastructure rather than a purely technical fix: the hard part is not only signing records, but governing who gets to be trusted as a verifier and under what standards.

Q: How can useful attestations coexist with privacy, secrecy, and anti-gaming concerns?

A: There is a genuine tension here. If attestations are highly specific, they may reveal private contributor information, proprietary dataset composition, or evaluation details that make benchmarks easier to game. If they are too vague, they collapse into branding or marketing copy. There is no perfect resolution. The practical goal is likely to be selective disclosure: enough specificity to support meaningful external checking, without assuming that every contributor identity, dataset row, or eval item can be made public.
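One simple mechanical version of selective disclosure, sketched here as an assumption rather than any standard: publish a salted hash of each private field, so that an auditor who is later shown the raw value and salt can check them against the public record without the field ever being public.

import hashlib
import os

def commit(value: str) -> dict:
    # Publish the hash; keep value and salt private until an auditor asks.
    salt = os.urandom(16).hex()
    digest = hashlib.sha256((salt + value).encode()).hexdigest()
    return {"public": {"sha256": digest}, "private": {"value": value, "salt": salt}}

def check(public: dict, value: str, salt: str) -> bool:
    # Auditor-side verification of a disclosed field.
    return hashlib.sha256((salt + value).encode()).hexdigest() == public["sha256"]

c = commit("reviewer: 20 years of cardiology practice")
print(check(c["public"], **c["private"]))  # True for an honest disclosure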

Q: Why would attestation reduce scraping or distillation if a rival can still match output quality?

A: It may not reduce those incentives nearly as much as the strongest version of the proposal suggests. If a competitor offers similar or better outputs at lower cost, many buyers will still switch regardless of cleaner provenance. The more modest claim is that attestation can create some product differentiation and make fully consent-based development more legible, not that it eliminates the economic pull of imitation. Scraping and distillation remain live incentives unless buyers, regulators, or counterparties attach real value to the attested relationship itself.

Importantly: for something like health advice, people are paying, in part, for some amount of confidence or peace of mind. Here, the presence of attestations really matters and makes a material difference between model A with attestations and model B with very similar outputs but no attestations.

Q: Does visibility alone actually create bargaining power for contributors?

A: Not by itself. Making data work legible is not the same thing as guaranteeing payment, better labor conditions, or stronger negotiating leverage. Those outcomes also require institutions that can enforce terms, whether through law, contracts, unions, collectives, or platform governance. Attestation can make bargaining easier by making contributions auditable and portable, but it does not substitute for the underlying political and organizational work.

Q: Doesn't this proposal risk deepening commodification rather than solving it?

A: One could argue that converting more knowledge work into attested, contract-governed inputs simply creates a more orderly and financialized form of enclosure. My view is that this risk is real, but the current regime already routes enormous value through opaque appropriation and closed pools. The proposal is best understood not as a full solution to the commons problem, but as an attempt to move some of that extraction into more explicit, contestable, and negotiable relationships.

Appendix III: a simple attestation schema for AI inputs, evaluations, and outputs

Here is one toy worked example of a concrete schema. This sketch borrows from several existing approaches described above.

Design goals

The object model should:

  • work across training, evaluation, and output provenance

  • support both human-readable and machine-readable claims

  • allow signatures or other authenticated proofs

  • stretch goal: allow selective disclosure, since full public disclosure of contributors, datasets, or eval items will often be impossible

  • support both individuals and organizations as contributors, attesters, and verifiers (but likely prefer organizations)

  • make it easy to traverse links from an output to a model version, from the model version to evaluation records, and from there to upstream contributions

Minimal outer wrapper

At a high level, the same outer structure can be reused even when the subject matter changes:

{
  "schema_version": "0.1",
  "object_type": "training_contribution",
  "subject": {
    "kind": "dataset_shard",
    "id": "med-001"
  },
  "attester": {
    "kind": "publisher",
    "id": "licensed-med-publisher"
  },
  "contributor": {
    "kind": "author",
    "credential": "MD"
  },
  "claim": {
    "statement": "A credentialed contributor supplied content used in model development."
  },
  "evidence": {},
  "verifier": {
    "kind": "third_party_auditor",
    "id": "auditor-a"
  },
  "timestamp": "2026-04-08T00:00:00Z",
  "links": {},
  "signatures": [],
  "disclosure": {
    "public": true
  }
}

  • schema_version: version of this logical schema, not necessarily the wire format version

  • object_type: claim family, such as training_contribution, evaluation_record, model_release, or output_provenance

  • subject: the dataset shard, model, model version, evaluation artifact, or output being described

  • attester: the entity making the attestation

  • contributor: the person or organization whose role is being attested, if different from the attester

  • claim: the substantive statement being made

  • evidence: artifacts, counts, references, hashes, licenses, reports, or other support for the claim

  • verifier: the party, process, or assurance mechanism that makes the claim more trustworthy

  • timestamp: issuance time

  • links: pointers to upstream or downstream objects

  • signatures: cryptographic signatures or references to them

  • disclosure: what is public, redacted, or privately available to auditors only

The wire format does not have to be this JSON shape. The same logical object could be encoded using a C2PA manifest, an in-toto statement plus predicate, or another signed envelope.
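For illustration, here is roughly how the minimal wrapper above might sit inside an in-toto-style Statement. Treat this as a sketch: the predicateType URL is a made-up placeholder, the digest is fake, and signatures would live in an outer envelope (e.g., DSSE) rather than in the statement itself.

{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [
    {
      "name": "med-001",
      "digest": {
        "sha256": "placeholder-shard-hash"
      }
    }
  ],
  "predicateType": "https://example.org/attestation/training_contribution/v0.1",
  "predicate": {
    "attester": {"kind": "publisher", "id": "licensed-med-publisher"},
    "contributor": {"kind": "author", "credential": "MD"},
    "claim": {"statement": "A credentialed contributor supplied content used in model development."}
  }
}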

Suggested object types

  • training_contribution: a claim about upstream data, labor, or licensed content used in model development

  • evaluation_record: a claim about who evaluated what, under what protocol, and with what sample size or assurance process

  • model_release: a claim about a model version, checkpoint lineage, deployment status, or serving identity

  • output_provenance: a claim that a specific output came from a specific model version and can be linked back to relevant evaluation and training records

  • credential_attestation: a claim about contributor qualifications, institutional affiliation, or eligibility to participate in a class of work

Toy examples

Training contribution

{
  "schema_version": "0.1",
  "object_type": "training_contribution",
  "subject": {
    "kind": "dataset_shard",
    "id": "med-042"
  },
  "attester": {
    "kind": "publisher",
    "id": "licensed-med-publisher"
  },
  "contributor": {
    "kind": "author",
    "credential": "MD",
    "organization": "licensed medical publisher"
  },
  "claim": {
    "statement": "A peer-reviewed cardiology chapter authored by a credentialed physician was included in licensed corpus shard med-042.",
    "usage": "pretraining"
  },
  "evidence": {
    "artifact_id": "chapter-8841",
    "license_id": "license-332"
  },
  "verifier": {
    "kind": "third_party_auditor",
    "id": "auditor-a"
  },
  "timestamp": "2026-04-08T00:00:00Z",
  "links": {
    "downstream_model_run": "model-x-pretrain-run-07",
    "downstream_model": "model-x-2026-03-15"
  },
  "signatures": [
    "sig:licensed-med-publisher:abc123"
  ],
  "disclosure": {
    "public": true,
    "auditor_only_fields": []
  }
}

Evaluation record

{
  "schema_version": "0.1",
  "object_type": "evaluation_record",
  "subject": {
    "kind": "model_version",
    "id": "model-x-2026-03-15"
  },
  "attester": {
    "kind": "evaluation_org",
    "id": "eval-lab-1"
  },
  "contributor": {
    "kind": "reviewer",
    "credential": "MD",
    "experience_years": 20
  },
  "claim": {
    "statement": "A practicing physician reviewed 100 outputs across 10 health-related tasks.",
    "task_set": "health-general-v2"
  },
  "evidence": {
    "sample_size": 100,
    "hours": 100,
    "report_id": "eval-report-77"
  },
  "verifier": {
    "kind": "assurance_process",
    "level": "independent-third-party"
  },
  "timestamp": "2026-04-08T00:00:00Z",
  "links": {
    "benchmark_report": "eval-report-77",
    "related_model": "model-x"
  },
  "signatures": [
    "sig:eval-lab-1:def456"
  ],
  "disclosure": {
    "public": true,
    "auditor_only_fields": [
      "sampled_output_ids"
    ]
  }
}

Output provenance

{
  "schema_version": "0.1",
  "object_type": "output_provenance",
  "subject": {
    "kind": "model_output",
    "id": "response-abc123"
  },
  "attester": {
    "kind": "model_provider",
    "id": "provider-y"
  },
  "claim": {
    "statement": "This output was generated by model-x version 2026-03-15."
  },
  "evidence": {
    "request_hash": "req-8fd2",
    "model_version": "model-x-2026-03-15"
  },
  "verifier": {
    "kind": "provider_signature",
    "id": "provider-y-signing-key"
  },
  "timestamp": "2026-04-08T00:00:00Z",
  "links": {
    "model": "model-x",
    "evaluation_attestations": [
      "eval-report-77"
    ],
    "training_attestations": [
      "train-attest-8841"
    ]
  },
  "signatures": [
    "sig:provider-y:ghi789"
  ],
  "disclosure": {
    "public": true
  }
}

If objects like these are interoperable, a buyer or auditor can move (a toy traversal sketch follows this list):

  • from a downstream answer to the serving model version

  • from the model version to evaluation records

  • from evaluation records and model lineage to upstream contributions
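Here is that traversal as a toy script, with the three example objects above abbreviated to their ids and links. A real verifier would resolve ids through a registry or embedded manifests and check each signature along the way.

objects = {
    "response-abc123": {
        "object_type": "output_provenance",
        "links": {"model": "model-x",
                  "evaluation_attestations": ["eval-report-77"],
                  "training_attestations": ["train-attest-8841"]},
    },
    "eval-report-77": {
        "object_type": "evaluation_record",
        "links": {"related_model": "model-x"},
    },
    "train-attest-8841": {
        "object_type": "training_contribution",
        "links": {"downstream_model": "model-x-2026-03-15"},
    },
}

def trace(start_id: str):
    # Walk the link graph from an output, skipping ids we can't resolve.
    seen, frontier = set(), [start_id]
    while frontier:
        oid = frontier.pop()
        if oid in seen or oid not in objects:
            continue
        seen.add(oid)
        print(f"{oid}: {objects[oid]['object_type']}")
        for target in objects[oid]["links"].values():
            frontier.extend(target if isinstance(target, list) else [target])

trace("response-abc123")  # reaches the evaluation record and the training contribution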