Issue an RFP
How to issue a request for data on the PortexAI Datalab.
An RFP allows data acquirers to tap into our network of experts and data owners to commission datasets. This feature is ideal for sourcing proprietary datasets for your models or apps that are not yet offered on the Datalab for purchase. By signaling demand in the form of a reverse auction, RFPs incentivize data owners on Portex to create datasets for your project.
To request access to RFP privileges, submit a request within the data studio.
Creating a new RFP
You can create a new RFP under the RFPs tab in the data studio.

From here, you can specify the details of the RFP.
Title: a clear title for the request, like "Monthly Passenger Counts as of June 2025".
Modality: the expected dataset modality (e.g. time series, images).
Amount: how much you're willing to pay for this data.
License: the dataset license that accompanies the request (by default, PSDLA).
Expiration: an optional expiration date. Buyers can use this to signal urgency for the request.
Description: the RFP description should be as detailed as possible and provide responders with enough background to fulfill the request. An example of a good RFP description is at the bottom of this page.

Once you have specified your RFP, click the Create button at the bottom right to publish it.
Your listing will then be publicly visible on the Datalab for data sellers to view and respond to.


Reviewing requests from data sellers
Responses to your RFP will be displayed in the data studio under the RFPs tab. Each response from a data owner will include:
the owner of that dataset (the respondent on Portex)
the file size and dataset schema
the date of the response
a message from the prospective seller providing more context on their dataset and why it sufficiently answers your RFP

How to tell if the dataset meets your requirements?
There are a few steps we undertake to ensure that you only see high-quality responses.
All responses undergo internal review and quality checks, some automatic and some manual, before being passed along to RFP issuers.
We are currently testing an AI agent that can reduce information asymmetry and allow RFP issuers to ask questions about RFP responses without seeing the dataset in its entirety.
After reviewing a response, you can choose to accept or reject it.

After accepting an RFP response, you will be able to purchase the dataset.
Purchasing accepted RFPs
Once you've accepted an RFP response, your RFP listing will be converted to a dataset listing where you can proceed with checkout and downloading the dataset.

Example of a good RFP
Here is an example of an effective RFP for a dataset of rent-stabilized apartment counts in New York City. Providing details about the time range, data sources, and (if possible) expectations on the dataset's values can help responders answer your RFP.
Data Specification
NYC Rent‑Stabilized Apartment Counts — Q1 2025 Snapshot (Statement Date 2025‑06‑07)
Summary
Produce a one‑time, city‑wide dataset of rent‑stabilized apartment counts as of the Q1 2025 tax‑bill cycle. Counts must be extracted exclusively from NYC Department of Finance (DOF) Statement of Account (SOA) PDFs dated 2025‑06‑07 (stmtDate=20250607). The property universe is derived from the PLUTO tax‑lot dataset after filtering to residential building classes. SOA data is the preferred data source.
Background & Motivation
NYC levies an annual per‑unit rent‑stabilization fee (US $10 until 1 Jul 2019, US $20 thereafter) codified in Administrative Code § 26‑517. Each quarterly SOA PDF contains a table row:
Rent Stabilization‑ Chg # Apts Activity Date Fee Identifier Amount
That row is the city’s official count of stabilized units for that cycle. An example is presented below (source).

1. Source Data & Filtering
1.2 Select residential lots
Keep rows whose bldgclass is one of: C1 C2 C3 C4 C5 C6 C7 D1 D2 D3 D4 D5 D6 D7 D8 D9.
1.3 Generate Tax Lot list
Select 10‑digit Borough Block Lot (BBL) string from PLUTO fields.
1.4 Fetch SOA PDFs
For each BBL, call:
https://a836-edms.nyc.gov/dctm-rest/repositories/dofedmspts/StatementSearch?bbl={BBL}&stmtDate=20250607&stmtType=SOA
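Steps 1.2 through 1.4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: it assumes each PLUTO row is available as a dict with `borocode`, `block`, `lot`, and `bldgclass` keys (check the field names in your PLUTO release), and that the BBL is the 1-digit borough code, 5-digit block, and 4-digit lot concatenated. The helper names `make_bbl` and `soa_urls` are illustrative only.

```python
# Residential building classes listed in step 1.2: C1..C7 and D1..D9.
RESIDENTIAL_CLASSES = {f"C{i}" for i in range(1, 8)} | {f"D{i}" for i in range(1, 10)}

STMT_DATE = "20250607"
SOA_URL = (
    "https://a836-edms.nyc.gov/dctm-rest/repositories/dofedmspts/"
    "StatementSearch?bbl={bbl}&stmtDate={stmt_date}&stmtType=SOA"
)


def make_bbl(borocode, block, lot) -> str:
    """Build a 10-digit BBL: 1-digit borough + 5-digit block + 4-digit lot."""
    return f"{int(borocode)}{int(block):05d}{int(lot):04d}"


def soa_urls(pluto_rows) -> dict:
    """Map each residential BBL to its SOA request URL.

    `pluto_rows` is an iterable of dicts; the key names are assumptions.
    """
    urls = {}
    for row in pluto_rows:
        if row["bldgclass"] in RESIDENTIAL_CLASSES:
            bbl = make_bbl(row["borocode"], row["block"], row["lot"])
            urls[bbl] = SOA_URL.format(bbl=bbl, stmt_date=STMT_DATE)
    return urls
```

Downloading each URL and extracting the PDF text is left out here, since the choice of HTTP client and PDF library is up to the respondent.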
2. Parsing Rules & Guidelines
Date filter – retain the line whose Activity Date is the latest on the statement.
Scalar return – output exactly one unit count per BBL (or null).
A starting point for the Regex is:
Rent\\s+Stabilization[^\\d$]*?Chg[^\\d$]*?(?<!\\$)(\\d{1,6})(?=\\s*(?:\\d{2}/\\d{2}/\\d{4}|\\d{4,9}))
Note: not every building will have rent-stabilized units. We expect roughly 50% of residential buildings to contain none.
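The parsing rules above can be sketched in Python as follows. This is a hedged illustration rather than a definitive parser. It assumes the doubled backslashes in the pattern are string escaping, so the raw-string form uses single backslashes, and it assumes Activity Date is printed as MM/DD/YYYY. The function name `parse_unit_count` and the sample text are invented for the example.

```python
import re
from datetime import datetime
from typing import Optional

# The starting-point regex from this section, written as a raw string.
ROW_RE = re.compile(
    r"Rent\s+Stabilization[^\d$]*?Chg[^\d$]*?(?<!\$)(\d{1,6})"
    r"(?=\s*(?:\d{2}/\d{2}/\d{4}|\d{4,9}))"
)
# Activity Date immediately after the unit count (format is an assumption).
DATE_RE = re.compile(r"\s*(\d{2}/\d{2}/\d{4})")


def parse_unit_count(pdf_text: str) -> Optional[int]:
    """Return the unit count from the row with the latest Activity Date, or None."""
    best = None  # (activity_date, units)
    for m in ROW_RE.finditer(pdf_text):
        units = int(m.group(1))
        d = DATE_RE.match(pdf_text, m.end())
        # Rows without a parseable date sort before any dated row.
        date = datetime.strptime(d.group(1), "%m/%d/%Y") if d else datetime.min
        if best is None or date > best[0]:
            best = (date, units)
    return best[1] if best else None


# Invented sample text with two fee rows; the later Activity Date wins.
sample = (
    "Rent Stabilization- Chg 10 01/01/2025 40012345 $200.00\n"
    "Rent Stabilization- Chg 12 04/01/2025 40012346 $240.00\n"
)
print(parse_unit_count(sample))  # 12
```

Returning `None` when no row matches keeps the output at exactly one scalar per BBL, as the scalar-return rule requires.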
3. Schema
Schema([
('bbl', String),
('address', String),
('bldgclass', String),
('stmt_date', String), # constant '20250607'
('rent_stabilized_units', Float64),
('pdf_text', String)
])
bbl – 10‑digit Borough‑Block‑Lot (from PLUTO).
address – PLUTO address field.
bldgclass – PLUTO bldgclass.
stmt_date – always "20250607".
rent_stabilized_units – parsed unit count (float to allow null).
pdf_text – full text extracted from the SOA PDF.
An example is below:

4. Deliverables
Dataset: Parquet file.
Methodology: plain text. Short description of the methodology.
5. Checking your work
The following heuristics can be used to double check your work.
We expect that a large portion of buildings (roughly 50%) will contain no rent stabilized units.
We expect the total number of rent stabilized units to be close to 1M units as per city data.
Search for outliers in the data using sensible heuristics (no single building should have thousands of rent-stabilized units), as well as more tailored checks with the official NYC building class codes (e.g. walkup apartments, which are Class C, should not contain hundreds of rent-stabilized units unless the building was incorrectly classified). An anomaly rate of less than 1% of buildings, to account for human data-input error, is acceptable.
Respondents can also use HCR registration records to cross check here.
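The heuristics above can be summarized with a short Python check. This is a sketch under assumptions: rows are dicts using the schema's `bbl`, `bldgclass`, and `rent_stabilized_units` fields, and the two thresholds (`max_units`, `max_walkup_units`) are placeholder values chosen for illustration, not figures from this RFP. The function name `sanity_check` is hypothetical.

```python
def sanity_check(rows, max_units=5000, max_walkup_units=100):
    """Summarize the section 5 heuristics for a list of row dicts.

    Both thresholds are illustrative assumptions; tune them to your data.
    """
    n = len(rows)
    without_units = 0
    total_units = 0.0
    outliers = []
    for row in rows:
        units = row["rent_stabilized_units"]
        if not units:  # None and 0 are both counted as "no stabilized units"
            without_units += 1
            continue
        total_units += units
        is_walkup = row["bldgclass"].startswith("C")  # Class C = walkup
        if units > max_units or (is_walkup and units >= max_walkup_units):
            outliers.append(row["bbl"])
    return {
        "share_without_units": without_units / n if n else 0.0,
        "total_units": total_units,
        "outlier_bbls": outliers,
        "outlier_rate": len(outliers) / n if n else 0.0,
    }
```

On a full dataset, `share_without_units` can be compared against the expected 50%, `total_units` against the roughly 1M city-wide figure, and `outlier_rate` against the 1% tolerance.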