Supported Pandas Operations¶
Below is the list of the Pandas operators that HPAT supports. Optional arguments are not supported unless specified. Since Numba doesn’t support Pandas, only these operations can be used for both large and small datasets.
In addition:
- Accessing columns using both getitem (e.g. df['A']) and attribute (e.g. df.A) is supported.
- Using columns similar to Numpy arrays and performing data-parallel operations listed previously is supported.
- Filtering data frames using boolean arrays is supported (e.g. df[df.A > .5]).
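The three access patterns above can be sketched in plain Pandas as follows. This is a minimal illustration, not HPAT-specific code: the data frame and its column names are made up, and the HPAT decorator is omitted so the snippet runs with Pandas alone.

```python
import numpy as np
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({"A": [0.1, 0.7, 0.9], "B": [1, 2, 3]})

# getitem and attribute access refer to the same column
col_getitem = df["A"]
col_attr = df.A

# columns can be used like NumPy arrays in element-wise expressions
scaled = df.A * 2 + df.B

# a boolean array keeps only the rows where the mask is True
filtered = df[df.A > .5]
```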
Integer NaN Issue¶
DataFrame columns with integer data need special care. Pandas dynamically converts integer columns to floating point when NaN values are needed. This is because Numpy does not support NaN values for integers. HPAT does not perform this conversion unless enough information is available at compilation time. Hence, the user is responsible for manual conversion of integer data to floating point data if needed.
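A small sketch of the behaviour described above, using plain Pandas (the HPAT decorator is omitted and the values are made up). It shows the implicit promotion Pandas performs, and the explicit conversion a user could apply up front when NaN values are expected.

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 2, 3])

# Pandas promotes the result to float64 because shift() introduces a NaN
shifted = ints.shift(1)

# Converting manually beforehand makes the floating point dtype explicit
floats = ints.astype(np.float64)
shifted_explicit = floats.shift(1)
```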
Input/Output¶
pandas.read_csv()
- Arguments filepath_or_buffer, sep, delimiter, names, usecols, dtype, and parse_dates are supported.
- filepath_or_buffer, names and dtype arguments are required.
- names, usecols, parse_dates should be constant lists.
- dtype should be a constant dictionary of strings and types.
pandas.read_parquet()
- If filename is constant, HPAT finds the schema from file at compilation time. Otherwise, schema should be provided.
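As an illustration of the read_csv() requirements, the call below passes names and dtype explicitly, with literal lists and a literal dictionary. It is a sketch in plain Pandas: the column names and data are hypothetical, and an in-memory buffer stands in for a file path only to keep the example self-contained.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content without a header row
csv_text = "1,2.5,2019-01-01\n2,3.5,2019-01-02\n"

df = pd.read_csv(
    io.StringIO(csv_text),                        # stand-in for a file path
    names=["id", "value", "day"],                 # constant list of names
    dtype={"id": np.int64, "value": np.float64},  # constant dict of types
    parse_dates=["day"],                          # constant list
)
```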
General functions¶
pandas.merge()
- Arguments left, right, as_of, how, on, left_on and right_on are supported.
- on, left_on and right_on should be constant strings or constant list of strings.
pandas.concat()
- Input list or tuple of dataframes or series is supported.
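A minimal sketch of both functions in plain Pandas, with made-up frames. The join key is written as a string literal, which is the form the constant-string requirement above suggests.

```python
import pandas as pd

# Hypothetical inputs
left = pd.DataFrame({"key": [1, 2, 3], "x": [10, 20, 30]})
right = pd.DataFrame({"key": [2, 3, 4], "y": [0.2, 0.3, 0.4]})

# `on` is a literal string rather than a value computed at run time
joined = pd.merge(left, right, how="inner", on="key")

# concat takes a list (or tuple) of data frames or series
stacked = pd.concat([left, left])
```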
Series¶
pandas.Series()
- Argument data can be a list or array.
Attributes:
Series.values
Series.shape
Series.ndim
Series.size
Methods:
Series.copy()
Indexing, iteration:
Series.iat()
Series.iloc()
Binary operator functions:
Series.add()
Series.sub()
Series.mul()
Series.div()
Series.truediv()
Series.floordiv()
Series.mod()
Series.pow()
Series.combine()
Series.lt()
Series.gt()
Series.le()
Series.ge()
Series.ne()
Function application, GroupBy & Window:
Series.apply()
Series.map()
Series.rolling()
Computations / Descriptive Stats:
Series.abs()
Series.corr()
Series.count()
Series.cov()
Series.cumsum()
Series.describe() currently returns a string instead of a Series object.
Series.max()
Series.mean()
Series.median()
Series.min()
Series.nlargest()
Series.nsmallest()
Series.pct_change()
Series.prod()
Series.quantile()
Series.std()
Series.sum()
Series.var()
Series.unique()
Series.nunique()
Reindexing / Selection / Label manipulation:
Series.head()
Series.idxmax()
Series.idxmin()
Series.take()
Missing data handling:
Series.isna()
Series.notna()
Series.dropna()
Series.fillna()
Reshaping, sorting:
Series.argsort()
Series.sort_values()
Series.append()
Time series-related:
Series.shift()
String handling:
Series.str.contains()
Series.str.len()
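A few of the Series operations listed above, combined in one short plain-Pandas sketch. The values are hypothetical and the HPAT decorator is omitted. Only required arguments are passed, in line with the note that optional arguments are generally unsupported.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, 4.0])

missing = s.isna()                  # boolean mask of NaN positions
filled = s.fillna(0.0)              # replace NaN with a constant
total = filled.add(filled).sum()    # binary operator followed by a reduction
top = filled.nlargest(2)            # the two largest values
lengths = pd.Series(["a", "bcd"]).str.len()  # string handling
```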
DataFrame¶
pandas.DataFrame()
- Only the data argument with a dictionary input is supported.
Attributes and underlying data:
DataFrame.values
Indexing, iteration:
DataFrame.head()
DataFrame.iat()
DataFrame.iloc()
DataFrame.isin()
Function application, GroupBy & Window:
DataFrame.apply()
DataFrame.groupby()
DataFrame.rolling()
Computations / Descriptive Stats:
DataFrame.describe()
Missing data handling:
DataFrame.dropna()
DataFrame.fillna()
Reshaping, sorting, transposing:
DataFrame.pivot_table()
- Arguments values, index, columns and aggfunc are supported.
- Annotation of pivot values is required. For example, @hpat.jit(pivots={'pt': ['small', 'large']}) declares that the output pivot table pt will have columns called small and large.
DataFrame.sort_values()
- by argument should be a constant string or a constant list of strings.
DataFrame.append()
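To make the pivot annotation concrete, the sketch below builds a pivot table in plain Pandas whose output columns are small and large, which are the names the pivots annotation would have to declare for pt. The data frame is hypothetical, and the HPAT decorator is left out so the snippet runs without HPAT.

```python
import pandas as pd

# Hypothetical data: column C holds the values that become pivot columns
df = pd.DataFrame({
    "A": ["foo", "foo", "bar", "bar"],
    "C": ["small", "large", "small", "large"],
    "D": [3, 1, 4, 2],
})

# Output columns come from the distinct values of C: large and small
pt = df.pivot_table(values="D", index="A", columns="C", aggfunc="sum")

# `by` is a literal string
ordered = df.sort_values(by="D")
```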
DatetimeIndex¶
DatetimeIndex.year
DatetimeIndex.month
DatetimeIndex.day
DatetimeIndex.hour
DatetimeIndex.minute
DatetimeIndex.second
DatetimeIndex.microsecond
DatetimeIndex.nanosecond
DatetimeIndex.date
DatetimeIndex.min()
DatetimeIndex.max()
TimedeltaIndex¶
TimedeltaIndex.days
TimedeltaIndex.seconds
TimedeltaIndex.microseconds
TimedeltaIndex.nanoseconds
Timestamp¶
Timestamp.day
Timestamp.hour
Timestamp.microsecond
Timestamp.month
Timestamp.nanosecond
Timestamp.second
Timestamp.year
Timestamp.date()
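The date and time accessors can be illustrated with a short plain-Pandas sketch. The timestamps are made up; DatetimeIndex fields return one value per element, while min() yields a single Timestamp that exposes the scalar fields.

```python
import datetime

import pandas as pd

# Hypothetical timestamps
idx = pd.DatetimeIndex(["2019-03-01 10:30:00", "2018-12-31 23:59:59"])

years = idx.year        # one value per element
hours = idx.hour

earliest = idx.min()    # a single Timestamp
earliest_day = earliest.date()
```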
Window¶
Rolling.count()
Rolling.sum()
Rolling.mean()
Rolling.median()
Rolling.var()
Rolling.std()
Rolling.min()
Rolling.max()
Rolling.corr()
Rolling.cov()
Rolling.apply()
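A brief sketch of window operations in plain Pandas, with a fixed window of three elements over made-up data. The function passed to apply() is a hypothetical example that computes the range of each window.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
roll = s.rolling(3)

# The first two positions have incomplete windows and are NaN
sums = roll.sum()
spans = roll.apply(lambda w: w.max() - w.min())
```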
GroupBy¶
GroupBy.apply()
GroupBy.count()
GroupBy.max()
GroupBy.mean()
GroupBy.median()
GroupBy.min()
GroupBy.prod()
GroupBy.std()
GroupBy.sum()
GroupBy.var()
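The aggregations above follow the usual Pandas pattern of grouping by a key column and then reducing each group. The sketch below uses plain Pandas with a hypothetical two-column frame, and a literal string as the grouping key.

```python
import pandas as pd

# Hypothetical data: A is the grouping key, B holds the values
df = pd.DataFrame({"A": ["x", "y", "x", "y"], "B": [1.0, 2.0, 3.0, 4.0]})

sums = df.groupby("A").sum()
means = df.groupby("A").mean()
counts = df.groupby("A").count()
```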