In pfeed, the term dtype (data type) refers to the granularity of the data, indicating the level of detail at which the data is recorded or aggregated.
pfeed supports a variety of data types, categorized into two main groups:
- Raw data types: ‘raw_tick’, ‘raw_second’, ‘raw_minute’, ‘raw_hour’, ‘raw_daily’
- Aggregated data types: ‘tick’, ‘second’, ‘minute’, ‘hour’, ‘daily’
Raw Data Types
Data types prefixed with raw_ are raw data types. They are the closest to the data obtained directly from the data source. Only column renaming and value mapping (e.g. mapping 'buy' to 1 and 'sell' to -1 for calculation convenience) are applied, to standardize the naming conventions of different data sources; everything else is left intact.
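As a rough illustration of that standardization step, the sketch below renames columns and maps trade sides on two made-up records. The source column names, the target names and the helper `standardize_record` are assumptions invented for this example; they are not pfeed's actual schema or internals.

```python
# Hypothetical column names and mapping, for illustration only;
# pfeed's real schema and internals may differ.
COLUMN_RENAMES = {"timestamp": "ts", "size": "volume"}
SIDE_MAPPING = {"buy": 1, "sell": -1}


def standardize_record(record: dict) -> dict:
    """Rename columns and map side values; leave everything else untouched."""
    out = {COLUMN_RENAMES.get(key, key): value for key, value in record.items()}
    if "side" in out:
        out["side"] = SIDE_MAPPING[str(out["side"]).lower()]
    return out


# Made-up sample records, not real market data.
raw_records = [
    {"timestamp": 1709251200.0, "side": "buy", "size": 0.5, "price": 61000.0},
    {"timestamp": 1709251200.3, "side": "sell", "size": 0.2, "price": 60999.5},
]
standardized = [standardize_record(r) for r in raw_records]
print(standardized[0])
```

Prices and any unmapped columns pass through unchanged, which matches the idea that raw data types stay as close to the source as possible.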
For example, [Bybit Data] is provided at tick granularity (tick-level trade data), so the corresponding raw data type for Bybit data in pfeed is raw_tick. Attempting to fetch data with a raw data type that the data source does not support will result in an error. Here is an example of fetching Bybit raw data:
Raw Tick Data
# `feed` is assumed to be an already-created pfeed feed object for Bybit
df = feed.get_historical_data(
    'BTC_USDT_PERP',
    resolution='raw_tick',  # or use 'raw' implicitly
    start_date='2024-03-01',
    end_date='2024-03-01',
    data_tool='polars',  # or 'pandas'
)
print(df.head())
For convenience, you can use raw as an implicit raw data type. In the context of [Bybit Data], raw will be implicitly converted to raw_tick in pfeed. If there are multiple raw data types available in the data source, the most granular one will be used.
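The resolution rule for raw, together with the error raised for unsupported raw data types, could be sketched as follows. The function `resolve_raw_dtype` and the way supported types are passed in are hypothetical. The sketch only mirrors the behavior described on this page and is not pfeed's implementation.

```python
# Raw data types as listed on this page, ordered from most to least granular.
# The ordering and this helper are assumptions, not pfeed's internal code.
RAW_DTYPES = ["raw_tick", "raw_second", "raw_minute", "raw_hour", "raw_daily"]


def resolve_raw_dtype(requested: str, supported: list[str]) -> str:
    """Resolve 'raw' to the most granular raw dtype the source supports.

    Raises ValueError when an explicit raw dtype is not supported.
    """
    if requested == "raw":
        for dtype in RAW_DTYPES:
            if dtype in supported:
                return dtype
        raise ValueError("data source offers no raw data types")
    if requested not in supported:
        raise ValueError(f"{requested!r} is not supported; choose from {supported}")
    return requested


print(resolve_raw_dtype("raw", ["raw_tick"]))
print(resolve_raw_dtype("raw", ["raw_daily", "raw_minute"]))
```

With a source that only offers tick-level data, raw resolves to raw_tick; with a source offering both minute and daily raw data, it resolves to raw_minute.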
Aggregated Data Types
Aggregated data types are derived from raw data types. The aggregation process includes data cleaning, data transformation, and data aggregation. The following are some examples:
Tick Data
df = feed.get_historical_data(
    'BTC_USDT_PERP',
    # resolution = period + timeframe, e.g. 1t (1tick), 2s (2second), 3m (3minute) etc.
    resolution='1tick',
    start_date='2024-03-01',
    end_date='2024-03-01',
    data_tool='polars',  # or 'pandas'
)
print(df.head())
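To make the "period + timeframe" convention in the comment above concrete, here is a minimal sketch of how such a string could be split. The helper `parse_resolution` and its alias table are hypothetical. In particular, the short forms for hour and daily are guesses, and pfeed's own parser may accept different spellings.

```python
import re

# Hypothetical alias table; only 't', 's', 'm' and 'tick' appear on this page.
TIMEFRAME_ALIASES = {
    "t": "tick", "tick": "tick",
    "s": "second", "second": "second",
    "m": "minute", "minute": "minute",
    "h": "hour", "hour": "hour",
    "d": "daily", "daily": "daily",
}


def parse_resolution(resolution: str) -> tuple[int, str]:
    """Split a resolution such as '2m' or '1tick' into (period, timeframe)."""
    match = re.fullmatch(r"(\d+)([a-z]+)", resolution.strip().lower())
    if match is None or match.group(2) not in TIMEFRAME_ALIASES:
        raise ValueError(f"unrecognized resolution: {resolution!r}")
    return int(match.group(1)), TIMEFRAME_ALIASES[match.group(2)]


print(parse_resolution("1tick"))
print(parse_resolution("2m"))
```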
Minute Data
df = feed.get_historical_data(
    'BTC_USDT_PERP',
    resolution='2m',  # 2-minute data
    start_date='2024-03-01',
    end_date='2024-03-01',
    data_tool='polars',  # or 'pandas'
)
print(df.head())
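The aggregation step behind a request like the one above can be pictured with a small pure-Python sketch that groups ticks into fixed-width time buckets. It assumes the aggregated output is OHLCV bars, which this page does not state. The function `aggregate_to_minute_bars`, the tick tuple layout and the sample values are all made up for illustration and do not reflect pfeed's implementation.

```python
def aggregate_to_minute_bars(ticks, period=2):
    """Group (ts_seconds, price, volume) ticks into OHLCV bars of `period` minutes."""
    bucket_seconds = period * 60
    bars = {}  # dicts preserve insertion order, so bars stay chronological
    # Sort by timestamp only; the sort is stable, so same-timestamp ticks keep their order.
    for ts, price, volume in sorted(ticks, key=lambda tick: tick[0]):
        bucket_start = int(ts // bucket_seconds) * bucket_seconds
        bar = bars.get(bucket_start)
        if bar is None:
            # First tick in the bucket sets open, high, low and close.
            bars[bucket_start] = {
                "ts": bucket_start, "open": price, "high": price,
                "low": price, "close": price, "volume": volume,
            }
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
            bar["volume"] += volume
    return list(bars.values())


# Made-up ticks: three fall in the first 2-minute bucket, one in the second.
ticks = [(0, 100.0, 1.0), (30, 101.0, 2.0), (119, 99.0, 1.0), (120, 102.0, 3.0)]
bars = aggregate_to_minute_bars(ticks, period=2)
for bar in bars:
    print(bar)
```

In practice a dataframe library would do this grouping far more efficiently. The loop is only meant to show what "aggregated from raw data" means at the level of individual ticks.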