Previous Attempts

There have been many attempts at building decentralized data marketplaces using public blockchain infrastructure, but most have failed to gain traction.

The genesis of the concept traces back to the 2017 ICO boom, when several projects were created to address the lack of a public marketplace for data products. Coincidentally, the ICO era took place at the pinnacle of the Internet-of-Things (IoT) hype-cycle. As such, those initial attempts overemphasized the segment, with projects like Streamr, IOTA, and DataBroker DAO building IoT data marketplaces with at times substantial data supply, but unclear buyer demand.

As the IoT hype-cycle waned, subsequent attempts to build decentralized data marketplaces generalized the types of data products offered. The Ocean Protocol, for example, was launched in 2019 to broaden the types of data offered beyond IoT. Ocean launched using ERC-721 tokens to store user metadata representing autonomous driving data, medical data, and financial data (Ocean whitepaper). As of 2024, the project has expanded into other data segments such as DeFi and AI, but once again a lack of immediate demand for the underlying data has limited the volume of sales on the platform.

From an elemental economic perspective, markets are bound to fail when no demand materializes for the product set offered. A commonality amongst projects that have previously attempted to build data marketplaces onchain is a lethal fixation on supply-side factors (increasing data supply) combined with a lack of understanding of demand-side factors (who will buy the data). Another challenge faced by these projects is the use of their own utility tokens, that is, medium-of-exchange tokens used to price the data products offered. There is strong evidence that the volatility of these tokens harms price discovery for the products sold, because speculative and transactional motives converge on the same token.

A cultural aspect that has also impacted the viability of such markets is the expectation prevalent in crypto that data should be free. This is a dynamic that also undermines analytics providers in the space and is driven by the perception that data in blockchains is immediately transparent and can be easily fetched. Often underappreciated are the non-trivial costs associated with running nodes and building the exporters and front ends that ultimately realize crypto's perceived transparency. Combined, these factors have historically impacted the marketability of data and its expected value in crypto.

The silver lining is that, in spite of the challenges associated with building data marketplaces, crypto has effectively created a new buyer of data: protocols.

Protocols are applications implemented onchain that frequently require data from exogenous, offchain systems. Protocols are the perfect data buyers: they are designed to automatically and at times perpetually purchase data products as users interact with their contracts.

Protocols are willing to pay for data because, by design, data can be operationalized immediately for mission-critical use cases. The biggest success story among data products that have been operationalized onchain is pricing data. Protocols need an outside view of the price of an asset in order to use it as collateral for loans since blockchains lack a robust, intrinsic mechanism to perform this assessment. As such, many DeFi protocols rely on oracle providers such as Chainlink to relay prices from data vendors offchain.

The Role of Professional Data Products

The overwhelming majority of price feeds brought onchain by oracles today are sourced from professional data sellers like Kaiko, Coin Metrics, Digital Asset Research, Kraken and CoinGecko. These companies run systems connected to all major cryptoasset exchanges that are built for low latency, data quality and uptime. Chainlink node operators individually purchase data from these providers under SLAs and simply relay the prices to a contract on the blockchain that further aggregates the data. These prices are then consumed predominantly by lending protocols, like Aave, to price loans or liquidate positions.
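The relay-then-aggregate flow described above can be sketched as follows. This is a minimal illustration, not Chainlink's actual contract logic: the text only says the contract "further aggregates" the data, so the use of a median, the `min_reports` threshold, the node names and the prices are all assumptions made for the example.

```python
from statistics import median


def aggregate_price(reports, min_reports=3):
    """Combine prices relayed by independent node operators into one value.

    `reports` maps a (hypothetical) node operator name to the price it relayed.
    A median is assumed here because it tolerates a minority of bad reports.
    """
    if len(reports) < min_reports:
        raise ValueError("not enough reports to aggregate")
    return median(reports.values())


# Hypothetical ETH/USD reports; node_d relays an outlier.
reports = {
    "node_a": 3001.2,
    "node_b": 2999.8,
    "node_c": 3000.5,
    "node_d": 3450.0,
}

aggregated = aggregate_price(reports)
print(f"aggregated price: {aggregated:.2f}")
```

With a median, the single outlier from `node_d` barely moves the aggregated value, which is one reason a consumer such as a lending protocol might prefer it to a simple average.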

The success of Chainlink under this model drove many oracle providers to follow the very same architecture. There are now over 16 oracles providing pricing data to protocols under Chainlink-inspired implementations. Chainlink, nevertheless, continues to dominate the segment, which has become highly competitive. New entrants have implemented more efficient architectures and helpful new features like price confidence intervals, but overall there is little product differentiation and a substantial first-mover advantage.

PMF Beyond Pricing Data

A natural question that has emerged regarding oracles and the role of data markets onchain is whether pricing data sourced offchain is the only data product to have Product-Market Fit (PMF) onchain. Oracles like Chainlink have experimented with other data types, like Verifiable Random Function (VRF) feeds, but 94% of sales still come from pricing data. Out of the 409 data products offered by Chainlink as of 2Q24, 321 are price feeds. The other 88 are predominantly public APIs with data covering things such as weather or sporting events.

At first glance, it might appear that there simply is no demand for products beyond pricing data. After all, Chainlink node operators collected 17x more revenue with price feeds than with all other products combined. However, there is a structural factor at play that disincentivizes the commercialization of more valuable data products for oracles as they are currently structured. The crux of the issue is that oracles today are structured as data delivery mechanisms, not marketplaces. Because oracle node operators must incur the costs associated with data purchases from professional sellers themselves, they are disincentivized from experimenting with paid datasets other than pricing data.
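As a quick consistency check on the figures quoted in this section, the sketch below compares the share of products that are price feeds with the share of revenue they generate. It assumes "17x more revenue" means price feeds earn 17 times the revenue of all other products combined; the function name is hypothetical.

```python
def share_from_multiple(multiple):
    """Convert 'X times the revenue of everything else' into a share of the total."""
    return multiple / (multiple + 1)


# Figures quoted in the text (Chainlink, as of 2Q24).
price_feeds = 321
total_products = 409
other_products = total_products - price_feeds

count_share = price_feeds / total_products   # share of products by count
revenue_share = share_from_multiple(17)      # share of revenue, under the assumption above

print(f"other products: {other_products}")
print(f"price feeds by count:   {count_share:.1%}")
print(f"price feeds by revenue: {revenue_share:.1%}")
```

Under that reading, price feeds make up roughly 78% of the catalog but about 94% of revenue, so the non-pricing products earn far less than their share of the catalog would suggest.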

Beyond having to incur the non-trivial costs of data subscriptions from professional data sellers with potentially more useful products, oracles must also incur the network fees associated with posting that data onchain. Under most oracle architectures today, data products for sale are made available onchain at frequent block intervals, with popular feeds like the price of ETH/USD added to nearly every block. For Chainlink, the gas cost of maintaining this storefront onchain at nearly every block has surpassed USD 120M over the past 3 years.

Coupled with the costs of paid data subscriptions incurred offchain, gas costs incurred onchain under existing architectures disincentivize oracles from pursuing product diversity and experimentation. Oracle operators must inherently take a hit on their profit margins when offering new products for sale, and in today's highly competitive oracle market they have little incentive to do so. Taking risks by offering new products can be costly because there is no market-driven mechanism for oracles to identify which data sets are in demand by protocols onchain. Ultimately, this dynamic stifles innovation and drives oracles to play it safe by predominantly focusing on pricing data.

We accounted for all of these factors and historical lessons when building Portex. We're on a mission to expand the types of data consumed onchain using an explicit marketplace. Rather than just offering data, we're focused instead on solving problems, starting with Sybil detection.
