Supported Pandas Operations

Below is the list of the Pandas operators that HPAT supports. Optional arguments are not supported unless explicitly listed. Because Numba doesn’t support Pandas, only these operations can be used, for both large and small datasets.

In addition:

  • Accessing columns using both getitem (e.g. df['A']) and attribute (e.g. df.A) is supported.
  • Using columns similar to Numpy arrays and performing data-parallel operations listed previously is supported.
  • Filtering data frames using boolean arrays is supported (e.g. df[df.A > .5]).
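
As a minimal sketch of the three points above, the following plain-Pandas function uses both column access styles and a boolean filter. The function name and data are illustrative assumptions; under HPAT the function would presumably carry the @hpat.jit decorator, which is left out here so the snippet runs with Pandas alone.

```python
import numpy as np
import pandas as pd

def column_access_and_filter(n):
    # Hypothetical example function; with HPAT it would be decorated with @hpat.jit
    df = pd.DataFrame({'A': np.arange(n) / n, 'B': np.ones(n)})
    # Column access through getitem (df['A']) and through attribute (df.B)
    total = df['A'].sum() + df.B.sum()
    # Filtering the data frame with a boolean array
    filtered = df[df.A > .5]
    return total, len(filtered)

total, kept = column_access_and_filter(10)
```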

Integer NaN Issue

DataFrame columns with integer data need special care. Pandas dynamically converts integer columns to floating point when NaN values are needed. This is because Numpy does not support NaN values for integers. HPAT does not perform this conversion unless enough information is available at compilation time. Hence, the user is responsible for manual conversion of integer data to floating point data if needed.
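
The conversion described above can be observed in plain Pandas, and the manual workaround is a single astype call. This is only a sketch of the idea with made-up data; it does not show how HPAT itself behaves on the unconverted column.

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 2, 3])

# shift() needs a NaN in the first slot, so Pandas silently
# converts the integer column to floating point
shifted = ints.shift(1)

# Manual conversion up front, so the column is already floating point
# before any operation that may introduce NaN values
floats = ints.astype(np.float64)
shifted_floats = floats.shift(1)
```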

Input/Output

  • pandas.read_csv()

    • Arguments filepath_or_buffer, sep, delimiter, names, usecols, dtype, and parse_dates are supported.
    • filepath_or_buffer, names and dtype arguments are required.
    • names, usecols, parse_dates should be constant lists.
    • dtype should be a constant dictionary of strings and types.
  • pandas.read_parquet()

    • If filename is constant, HPAT finds the schema from the file at compilation time. Otherwise, the schema should be provided.
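
The read_csv() constraints above can be illustrated with a call that passes names and parse_dates as literal lists and dtype as a literal dictionary. The file contents, column names and the load helper are assumptions made for this sketch, and the code runs with plain Pandas; whether HPAT expects an entry in dtype for the date column as well is not stated above.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Write a small header-less CSV file so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(path, "w") as f:
    f.write("1,2.5,2019-01-01\n2,3.5,2019-01-02\n")

def load(fname):
    # Hypothetical helper; names and parse_dates are literal lists,
    # dtype is a literal dictionary of column names and types
    return pd.read_csv(
        fname,
        names=['id', 'value', 'day'],
        dtype={'id': np.int64, 'value': np.float64},
        parse_dates=['day'],
    )

df = load(path)
```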

General functions

  • pandas.merge()

    • Arguments left, right, as_of, how, on, left_on and right_on are supported.
    • on, left_on and right_on should be constant strings or constant lists of strings.
  • pandas.concat()

    • Input list or tuple of dataframes or series is supported.
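
A short sketch of both functions with made-up data frames follows. The join key is passed as a literal string, in line with the requirement on on, and concat() receives a list of data frames.

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2, 3], 'A': [10.0, 20.0, 30.0]})
right = pd.DataFrame({'key': [2, 3, 4], 'B': [0.2, 0.3, 0.4]})

# 'on' is a constant string; 'how' selects the join type
joined = pd.merge(left, right, on='key', how='inner')

# concat() with a list of data frames stacks them row-wise
stacked = pd.concat([left, left])
```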

Series

  • pandas.Series()

    • Argument data can be a list or array.

Attributes:

  • Series.values
  • Series.shape
  • Series.ndim
  • Series.size

Methods:

  • Series.copy()

Indexing, iteration:

  • Series.iat
  • Series.iloc

Binary operator functions:

  • Series.add()
  • Series.sub()
  • Series.mul()
  • Series.div()
  • Series.truediv()
  • Series.floordiv()
  • Series.mod()
  • Series.pow()
  • Series.combine()
  • Series.lt()
  • Series.gt()
  • Series.le()
  • Series.ge()
  • Series.ne()

Function application, GroupBy & Window:

  • Series.apply()
  • Series.map()
  • Series.rolling()

Computations / Descriptive Stats:

  • Series.abs()
  • Series.corr()
  • Series.count()
  • Series.cov()
  • Series.cumsum()
  • Series.describe() currently returns a string instead of a Series object.
  • Series.max()
  • Series.mean()
  • Series.median()
  • Series.min()
  • Series.nlargest()
  • Series.nsmallest()
  • Series.pct_change()
  • Series.prod()
  • Series.quantile()
  • Series.std()
  • Series.sum()
  • Series.var()
  • Series.unique()
  • Series.nunique()

Reindexing / Selection / Label manipulation:

  • Series.head()
  • Series.idxmax()
  • Series.idxmin()
  • Series.take()

Missing data handling:

  • Series.isna()
  • Series.notna()
  • Series.dropna()
  • Series.fillna()

Reshaping, sorting:

  • Series.argsort()
  • Series.sort_values()
  • Series.append()

Time series-related:

  • Series.shift()

String handling:

  • Series.str.contains()
  • Series.str.len()

DataFrame

  • pandas.DataFrame()

    • Only the data argument with a dictionary input is supported.

Attributes and underlying data:

  • DataFrame.values

Indexing, iteration:

  • DataFrame.head()
  • DataFrame.iat
  • DataFrame.iloc
  • DataFrame.isin()

Function application, GroupBy & Window:

  • DataFrame.apply()
  • DataFrame.groupby()
  • DataFrame.rolling()

Computations / Descriptive Stats:

  • DataFrame.describe()

Missing data handling:

  • DataFrame.dropna()
  • DataFrame.fillna()

Reshaping, sorting, transposing:

  • DataFrame.pivot_table()

    • Arguments values, index, columns and aggfunc are supported.
    • Annotation of pivot values is required. For example, @hpat.jit(pivots={'pt': ['small', 'large']}) declares that the output pivot table pt will have columns called small and large.
  • DataFrame.sort_values()

    • The by argument should be a constant string or a constant list of strings.

  • DataFrame.append()
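
The pivot_table() and sort_values() rules above can be sketched as follows. The data and the function name pivot_example are assumptions for illustration, and the snippet runs with plain Pandas; under HPAT the function would additionally need the pivots annotation described above, naming the local variable pt.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'foo', 'bar', 'bar'],
    'C': ['small', 'large', 'small', 'large'],
    'D': [3, 1, 4, 2],
})

def pivot_example(df):
    # Under HPAT this function would be decorated with
    # @hpat.jit(pivots={'pt': ['small', 'large']}) so that the
    # columns of pt are known at compilation time
    pt = df.pivot_table(index='A', columns='C', values='D', aggfunc='sum')
    return pt

pt = pivot_example(df)

# 'by' is a constant string
ordered = df.sort_values(by='D')
```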

DatetimeIndex

  • DatetimeIndex.year
  • DatetimeIndex.month
  • DatetimeIndex.day
  • DatetimeIndex.hour
  • DatetimeIndex.minute
  • DatetimeIndex.second
  • DatetimeIndex.microsecond
  • DatetimeIndex.nanosecond
  • DatetimeIndex.date
  • DatetimeIndex.min()
  • DatetimeIndex.max()

TimedeltaIndex

  • TimedeltaIndex.days
  • TimedeltaIndex.seconds
  • TimedeltaIndex.microseconds
  • TimedeltaIndex.nanoseconds

Timestamp

  • Timestamp.day
  • Timestamp.hour
  • Timestamp.microsecond
  • Timestamp.month
  • Timestamp.nanosecond
  • Timestamp.second
  • Timestamp.year
  • Timestamp.date()

Window

  • Rolling.count()
  • Rolling.sum()
  • Rolling.mean()
  • Rolling.median()
  • Rolling.var()
  • Rolling.std()
  • Rolling.min()
  • Rolling.max()
  • Rolling.corr()
  • Rolling.cov()
  • Rolling.apply()
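
As a brief illustration of the window methods listed above, the sketch below applies a fixed window of three elements to a small Series. The data and the lambda are made up for this example, and the code is plain Pandas.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# The first two positions have fewer than three elements and yield NaN
window_sum = s.rolling(3).sum()
window_mean = s.rolling(3).mean()

# Rolling.apply() with a user function: range (max - min) of each window
window_range = s.rolling(3).apply(lambda w: w.max() - w.min())
```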

GroupBy

  • GroupBy.apply()
  • GroupBy.count()
  • GroupBy.max()
  • GroupBy.mean()
  • GroupBy.median()
  • GroupBy.min()
  • GroupBy.prod()
  • GroupBy.std()
  • GroupBy.sum()
  • GroupBy.var()
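
A minimal sketch of a few of the aggregations listed above, using a made-up data frame grouped by column A. The snippet is plain Pandas; HPAT-specific behavior is not shown.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 2, 2, 2],
    'B': [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Each aggregation returns one value of B per distinct key in A
sums = df.groupby('A')['B'].sum()
means = df.groupby('A')['B'].mean()
counts = df.groupby('A')['B'].count()
```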