The Era of Data Financialization

We all have an intuition in the digital age that data is among the world's most valuable resources.

Businesses heavily leverage data on consumer behavior, transactions, and demographics to inform decisions and fine-tune their product offerings. From social media interactions to purchasing habits and physical location logs, nearly every aspect of modern life produces digital exhaust that can be aggregated and monetized.

Powering this global trade of personal and organizational information is a massive data brokerage industry: a fragmented marketplace of intermediaries who collect, refine, and sell user data to clients across finance, retail, healthcare, advertising, and more. Estimates place the combined annual revenues of data brokers in the hundreds of billions of dollars, with projections of 400–600 billion USD worldwide within the decade. Major players like Experian, Acxiom, and Oracle’s Data Cloud hold large shares of the market, but thousands of smaller brokers cater to niche datasets and specialized analytics.

Despite its staggering size, this data brokerage market has chronic shortcomings. For starters, much of the data being sold belongs to individuals who generally see no upside from these transactions. Deals are struck behind closed doors, with limited transparency around sale prices or how the data is ultimately used. Brokers typically license the same data to multiple buyers, preserving their own position as gatekeepers while sidelining the individuals who originally produced the data. Moreover, pricing mechanisms remain opaque: brokers often negotiate bespoke contracts on a client-by-client basis, leaving little room for public price discovery. Regulations such as the GDPR and the California Consumer Privacy Act (CCPA) have introduced more disclosure requirements and opt-out procedures, yet the data brokerage industry continues to operate largely as a black box.

Recent innovations in deep learning are poised to fundamentally change the inner workings of the data brokerage industry, as an insatiable demand for data has begun to materialize. Fueled by ever more powerful GPUs, the training of large language models (LLMs) demonstrated that scaling neural networks by over an order of magnitude can yield not just incremental improvements but also new capabilities like meta-learning, where the model learns to adapt to new tasks from only a few examples. The so-called “scaling laws” revealed that increased compute, model size, and training data can unlock transformative performance. Traditional data brokers are not structured to serve this new buyer of data, who demands flexible pricing structures, better data delivery and, most importantly, more novel datasets. As models approach the limits of freely available web data (the so-called “data wall”), a new mechanism for exchanging data is urgently needed to accelerate AI development.

Over the past decade, a wave of crypto projects has envisioned the creation of a public, onchain marketplace that allows data to become an explicit tokenized asset. The genesis of this vision dates back to the 2017 ICO boom, when numerous ventures emerged to decentralize data trading. Projects like Streamr, IOTA, and DataBroker DAO began with IoT-centric data exchanges, capitalizing on the IoT hype cycle of the time. However, they struggled to attract real demand beyond early enthusiasts. Attempts grew more generalized over time, with Ocean Protocol broadening its scope to include autonomous driving data, financial data, AI training sets, and more. Although these projects showcased novel architectures, they encountered the same fundamental roadblock: too few buyers actually willing to pay for niche datasets.

From a purely economic perspective, markets are bound to fail when demand is ambiguous or absent. Many of these early onchain marketplaces fixated on supply-side factors, tokenizing as many datasets as possible without developing clear buyer hypotheses, which were considerably harder to formulate in the pre-AI age. Compounding the issue was the reliance on utility tokens as a means of exchange, which introduced volatility and speculative behavior that clouded the true price of a given dataset. The emergence of AI models and agents as new data buyers has made it substantially easier to form buyer hypotheses around even the most niche dataset. At the same time, crypto primitives have improved and matured considerably, providing a better infrastructural substrate on which to implement data marketplaces.

While previous attempts to financialize data using crypto primitives failed to gain traction, an important caveat must be highlighted. The emergence of DeFi provides an important case study in the direction the data economy is heading. As DeFi has shown, it is very likely that machines will become major buyers of data. DeFi protocols, especially lending markets, must leverage oracle services like Chainlink to programmatically purchase data. These protocols need price feeds to evaluate collateral and manage liquidations. Oracle services in crypto showcased, for the first time in history, the viability of machines (smart contracts) governing data purchases and access.
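To make the dependency on price feeds concrete, here is a minimal sketch of the kind of check a lending market runs each time an oracle delivers a new price. It is illustrative only: the `Position` type, the `health_factor` and `is_liquidatable` functions, and the 0.8 liquidation threshold are assumptions made for this example, not the logic or parameters of any specific protocol or oracle.

```python
from dataclasses import dataclass


@dataclass
class Position:
    """Hypothetical borrower position (names are illustrative)."""
    collateral_amount: float  # units of the collateral asset
    debt_usd: float           # outstanding debt, denominated in USD


def health_factor(position: Position, price_usd: float,
                  liquidation_threshold: float) -> float:
    """Risk-adjusted collateral value divided by debt.

    price_usd stands in for the latest value delivered by a price feed.
    """
    if position.debt_usd == 0:
        return float("inf")  # no debt, so nothing to liquidate
    collateral_value = position.collateral_amount * price_usd
    return collateral_value * liquidation_threshold / position.debt_usd


def is_liquidatable(position: Position, price_usd: float,
                    liquidation_threshold: float = 0.8) -> bool:
    """A position becomes liquidatable once its health factor drops below 1."""
    return health_factor(position, price_usd, liquidation_threshold) < 1.0


# Example: 10 units of collateral backing 12,000 USD of debt.
position = Position(collateral_amount=10, debt_usd=12_000)
safe_at_2000 = is_liquidatable(position, price_usd=2_000)   # False
risky_at_1400 = is_liquidatable(position, price_usd=1_400)  # True
```

The same position flips from safe to liquidatable purely because the delivered price changed, which is why these protocols cannot operate without a continuously purchased feed.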

This dynamic spawned a flourishing market for pricing data in crypto sourced from specialized sellers in the Data-as-a-Service segment. The emergence of oracle services onchain underscores the appetite for data that can be immediately operationalized and purchased by machines. However, current architectures in the oracle segment operate exclusively as delivery pipelines rather than true marketplaces, forcing oracle operators themselves to subscribe to data feeds before reselling them onchain.

Consequently, there is little incentive to commercialize non-price data products. Oracles sustain high network costs (posting fresh data onchain at short intervals) and must pre-fund data subscriptions with DaaS providers. With no direct mechanism for protocols to signal which products they want to purchase, operators are disincentivized from experimenting with new data categories, stifling innovation. As a result, pricing data has thrived while more complex or specialized datasets and use cases remain untapped. Nevertheless, the success of oracle operators in crypto is promising: it demonstrates, albeit with a constrained product set, how machines can leverage crypto to make data purchases.
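The cost pressure described above can be illustrated with some back-of-the-envelope arithmetic. Every figure in this sketch is a hypothetical placeholder (update interval, cost per onchain update, subscription price, and fee per consuming protocol), chosen only to show the shape of the calculation rather than to reflect real oracle economics.

```python
import math


def monthly_feed_cost(updates_per_day: int, cost_per_update_usd: float,
                      subscription_usd: float, days: int = 30) -> float:
    """Onchain posting costs plus the pre-funded upstream data subscription."""
    return updates_per_day * days * cost_per_update_usd + subscription_usd


def break_even_consumers(monthly_cost_usd: float,
                         fee_per_consumer_usd: float) -> int:
    """Smallest number of paying protocols needed to cover the monthly cost."""
    return math.ceil(monthly_cost_usd / fee_per_consumer_usd)


# Hypothetical feed: one update every 5 minutes (288 per day), 0.50 USD per
# onchain update, and a 2,000 USD monthly subscription with a DaaS provider.
cost = monthly_feed_cost(updates_per_day=288, cost_per_update_usd=0.50,
                         subscription_usd=2_000)

# With each consuming protocol paying a hypothetical 500 USD per month:
needed = break_even_consumers(cost, fee_per_consumer_usd=500)
```

Under these assumed numbers the operator commits 6,320 USD per month before a single buyer has signaled interest, and needs 13 paying protocols to break even. For an unproven data category, that upfront commitment is the disincentive the paragraph above describes.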

We have learned a great deal from operating in the DaaS segment and dissecting the current data brokerage industry. What we're building at Portex aims to address existing gaps by expanding the types of data offerings available and using crypto primitives constructively. Rather than perpetuating broker-like opacity or confining data to an oracle pipeline, we seek to expand the scope of data consumed onchain, enabling a broad range of AI, financial, and analytics applications to ramp up new use cases. By giving data buyers (whether protocols, companies, or people) an explicit marketplace where stable listing prices meet transparent discovery, we hope to offer a credible alternative to the status quo of closed-door data broker deals and underutilized data pipelines.
