Upload datasets

How to upload datasets to the PortexAI Datalab

Adding a new dataset

Users on the PortexAI Datalab with data seller permissions can upload datasets through the data studio or link to an external source.

Uploading Datasets on the Datalab

To upload data to the Datalab, pick the format that matches your data:

  • Tabular data → upload a .parquet file for faster queries and built-in compression.

  • Multi-modal data (for example, images + text + labels) → bundle everything into a single .tar.gz or .gz archive to keep files together.
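As an illustration of the multi-modal case, the sketch below packs a local folder into one .tar.gz archive using Python's standard tarfile module. The function name bundle_dataset and the folder layout are hypothetical and not part of the Datalab; this only prepares the file locally before you upload it through the data studio.

```python
import tarfile
from pathlib import Path


def bundle_dataset(source_dir: str, archive_path: str) -> list[str]:
    """Pack every file under source_dir into one gzip-compressed tar archive.

    Returns the member names written, relative to source_dir.
    (Hypothetical helper for illustration; not a Datalab API.)
    """
    source = Path(source_dir)
    target = Path(archive_path).resolve()

    # Collect the files first so the archive never tries to include itself.
    files = sorted(
        p for p in source.rglob("*")
        if p.is_file() and p.resolve() != target
    )

    members = []
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in files:
            # Store paths relative to the dataset root so the archive
            # unpacks with the same folder structure.
            name = path.relative_to(source).as_posix()
            archive.add(path, arcname=name)
            members.append(name)
    return members
```

For example, bundle_dataset("my_dataset", "my_dataset.tar.gz") would keep images, text, and labels together in a single file with their relative paths preserved.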

For the Init Commit message, enter something like "initial upload". Every future update gets its own commit, giving you a transparent change log buyers can trust. Note that upload speed depends on your local connection.

By default, you can host multiple datasets up to a combined 20 GB. If you're working with something larger, reach out to us and we can work with you to provision more storage for your account. To list a “collection” (several related datasets in one listing), bundle them into a single .tar.gz archive first.
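Before uploading, it can help to check locally whether your files fit within the default limit. The sketch below sums file sizes and compares them to a quota. It assumes "20 GB" means 20 × 10^9 bytes, which the Datalab documentation does not specify, and the function names are hypothetical.

```python
from pathlib import Path

# Assumption: "20 GB" is read as 20 * 10**9 bytes. The source does not say
# whether the quota is decimal (GB) or binary (GiB).
DEFAULT_QUOTA_BYTES = 20 * 10**9


def total_size_bytes(paths):
    """Sum the on-disk size of the given files, in bytes."""
    return sum(Path(p).stat().st_size for p in paths)


def fits_quota(paths, quota_bytes=DEFAULT_QUOTA_BYTES):
    """Return True if the files together stay within the quota."""
    return total_size_bytes(paths) <= quota_bytes
```

Calling fits_quota(["sales.parquet", "images.tar.gz"]) on the files you plan to host gives a quick local estimate; the size the Datalab actually counts against your account may differ.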

Linking an External Dataset to the Datalab

Data owners with very large datasets (e.g. many terabytes) may prefer to link to an external source instead. To create an external dataset, select External Dataset when creating a new dataset. Optionally include the connection type (e.g. S3 bucket) and the estimated size (in GB). Sellers can then create a listing just as they would for a dataset uploaded to the Datalab.

Sales of external datasets require sellers to share dataset credentials with buyers at the time of purchase.

Removing a dataset

To remove a dataset, enter the data studio and navigate to the datasets tab. From here, you can select an existing dataset and delete it. Because deletion cannot be undone, you will be asked to enter your password to confirm.
