.. _supported:

User Guide
==========

HPAT automatically parallelizes a subset of Python that is commonly used for
data analytics and machine learning. This section describes this subset and
how parallelization is performed.

HPAT compiles and parallelizes the functions annotated with the `@hpat.jit`
decorator. The decorated functions are replaced with generated parallel
binaries that run on bare metal. The supported data structures for large
datasets are Numpy arrays and Pandas dataframes.

Automatic Parallelization
-------------------------

HPAT parallelizes programs automatically based on the `map-reduce` parallel
pattern. Put simply, the compiler analyzes the program to determine whether
each array should be distributed or not. This analysis uses the semantics of
array operations, as the program below demonstrates::

    @hpat.jit
    def example_1D(n):
        f = h5py.File("data.h5", "r")
        A = f['A'][:]
        return np.sum(A)

This program reads a one-dimensional array called `A` from file and sums its
values. Array `A` is the output of an I/O operation and is the input of
`np.sum`. Based on the semantics of I/O and `np.sum`, HPAT determines that
`A` can be distributed, since I/O can output a distributed array and `np.sum`
can take a distributed array as input. In `map-reduce` terminology, `A` is
the output of a `map` operator and the input of a `reduce` operator. Hence,
HPAT distributes `A` and all operations associated with `A` (i.e., I/O and
`np.sum`) and generates a parallel binary. This binary replaces the
`example_1D` function in the Python program.

HPAT can only analyze and parallelize the supported data-parallel operations
of Numpy and Pandas (listed below). Hence, only the supported operations can
be used for distributed datasets and computations. The sequential computation
on small data can be any code that Numba supports.

Array Distribution
~~~~~~~~~~~~~~~~~~

Arrays are distributed in a one-dimensional block (`1D_Block`) manner among
processors: each processor owns an equal chunk of each distributed array,
except possibly the last processor. Multi-dimensional arrays are distributed
along their first dimension by default. For example, chunks of rows are
distributed for a 2D matrix. The figure below illustrates the distribution of
a 9-element one-dimensional Numpy array, as well as a 9 by 2 array, on three
processors:

.. image:: ../figs/dist.jpg
    :height: 500
    :width: 500
    :scale: 60
    :alt: distribution of 1D array
    :align: center

HPAT replicates the arrays that are not distributed. This is called `REP`
distribution for consistency.

Argument and Return Variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

HPAT assumes that argument and return variables of jitted functions are
replicated. However, the user can annotate these variables to indicate
distributed data. In this case, the user is responsible for handling the
distributed data chunks outside the HPAT scope. For example, the data can
come from other jitted functions::

    @hpat.jit(distributed={'A'})
    def example_return(n):
        A = np.arange(n)
        return A

    @hpat.jit(distributed={'B'})
    def example_arg(B):
        return B.sum()

    n = 100
    A = example_return(n)
    s = example_arg(A)

Distribution Report
~~~~~~~~~~~~~~~~~~~

The distributions found by HPAT can be printed using the
`hpat.distribution_report()` function. The distribution report for the above
example code is as follows::

    Array distributions:
       $A.23                1D_Block

    Parfor distributions:
       0                    1D_Block

This report indicates that the function has an array that is distributed in
`1D_Block` fashion. The variable is renamed from `A` to `$A.23` by the
optimization passes. The report also indicates that there is a `parfor`
(data-parallel for loop) that is `1D_Block` distributed.
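For instance, appending a call to `hpat.distribution_report()` after the code
of the previous section prints the report above (a minimal sketch; running
the jitted functions first ensures they have been compiled and analyzed)::

    n = 100
    A = example_return(n)
    s = example_arg(A)
    hpat.distribution_report()   # prints distributions for the compiled functions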
Numpy dot() Parallelization
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The `np.dot` function has different distribution rules based on the number of
dimensions and the distributions of its input arrays. The example below
demonstrates two cases::

    @hpat.jit
    def example_dot(N, D):
        X = np.random.ranf((N, D))
        Y = np.random.ranf(N)
        w = np.dot(Y, X)
        z = np.dot(X, w)
        return z.sum()

    example_dot(1024, 10)
    hpat.distribution_report()

Here is the output of `hpat.distribution_report()`::

    Array distributions:
       $X.43                1D_Block
       $Y.45                1D_Block
       $w.44                REP

    Parfor distributions:
       0                    1D_Block
       1                    1D_Block
       2                    1D_Block

The first `dot` has a 1D array with `1D_Block` distribution as its first
input (`Y`), while the second input is a 2D array with `1D_Block`
distribution (`X`). Hence, `dot` is a sum reduction across the distributed
datasets and, therefore, the output (`w`) is on the `reduce` side and is
assigned `REP` distribution.

The second `dot` has a 2D array with `1D_Block` distribution (`X`) as its
first input, while the second input is a `REP` array (`w`). Hence, the
computation is data-parallel across the rows of `X`, which implies a
`1D_Block` distribution for the output (`z`).

Variable `z` does not appear in the distribution report since the compiler
optimizations were able to eliminate it. Its values are generated and
consumed on-the-fly, without memory load/store overheads.

Supported Numpy Operations
--------------------------

Below is the list of the data-parallel Numpy operations that HPAT can
optimize and parallelize; a short example combining several of them follows
the list.

1. Numpy `element-wise` array operations:

   * Unary operators: ``+`` ``-`` ``~``
   * Binary operators: ``+`` ``-`` ``*`` ``/`` ``/?`` ``%`` ``|`` ``>>``
     ``^`` ``<<`` ``&`` ``**`` ``//``
   * Comparison operators: ``==`` ``!=`` ``<`` ``<=`` ``>`` ``>=``
   * Data-parallel math operations: ``add``, ``subtract``, ``multiply``,
     ``divide``, ``logaddexp``, ``logaddexp2``, ``true_divide``,
     ``floor_divide``, ``negative``, ``power``, ``remainder``, ``mod``,
     ``fmod``, ``abs``, ``absolute``, ``fabs``, ``rint``, ``sign``,
     ``conj``, ``exp``, ``exp2``, ``log``, ``log2``, ``log10``, ``expm1``,
     ``log1p``, ``sqrt``, ``square``, ``reciprocal``, ``conjugate``
   * Trigonometric functions: ``sin``, ``cos``, ``tan``, ``arcsin``,
     ``arccos``, ``arctan``, ``arctan2``, ``hypot``, ``sinh``, ``cosh``,
     ``tanh``, ``arcsinh``, ``arccosh``, ``arctanh``, ``deg2rad``,
     ``rad2deg``, ``degrees``, ``radians``
   * Bit manipulation functions: ``bitwise_and``, ``bitwise_or``,
     ``bitwise_xor``, ``bitwise_not``, ``invert``, ``left_shift``,
     ``right_shift``

2. Numpy reduction functions ``sum``, ``prod``, ``min``, ``max``, ``argmin``
   and ``argmax``. Currently, the `int64` data type is not supported for
   ``argmin`` and ``argmax``.

3. Numpy array creation functions ``empty``, ``zeros``, ``ones``,
   ``empty_like``, ``zeros_like``, ``ones_like``, ``full_like``, ``copy``,
   ``arange`` and ``linspace``.

4. Random number generator functions: ``rand``, ``randn``, ``ranf``,
   ``random_sample``, ``sample``, ``random``, ``standard_normal``,
   ``chisquare``, ``weibull``, ``power``, ``geometric``, ``exponential``,
   ``poisson``, ``rayleigh``, ``normal``, ``uniform``, ``beta``,
   ``binomial``, ``f``, ``gamma``, ``lognormal``, ``laplace``, ``randint``,
   ``triangular``.

5. Numpy ``dot`` function between a matrix and a vector, or two vectors.

6. Numpy array comprehensions, such as::

       A = np.array([i**2 for i in range(N)])

Optional arguments are not supported unless explicitly mentioned here. For
operations on multi-dimensional arrays, automatic broadcast of dimensions of
size 1 is not supported.
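As referenced above, here is a short example that combines a supported array
creation function, element-wise operations, and a reduction in one jitted
function (a minimal sketch; the function name and input size are
illustrative)::

    import numpy as np
    import hpat

    @hpat.jit
    def example_ops(n):
        A = np.arange(n)            # supported array creation, distributed
        B = np.sqrt(A) + 2 * A      # supported element-wise operations
        return B.sum()              # supported parallel reduction

    print(example_ops(1000))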
Explicit Parallel Loops
-----------------------

Sometimes explicit parallel loops are required, since a program cannot always
be written easily in terms of data-parallel operators. In this case, one can
use HPAT's ``prange`` in place of ``range`` to specify that a loop can be
parallelized. The user is required to make sure the loop does not have
cross-iteration dependencies, except for supported reductions. The example
below demonstrates a parallel loop with a reduction::

    import numpy as np
    from hpat import jit, prange

    @jit
    def prange_test(n):
        A = np.random.ranf(n)
        s = 0
        for i in prange(len(A)):
            s += A[i]
        return s

Currently, reductions using the ``+=``, ``*=``, ``min``, and ``max``
operators are supported.

File I/O
--------

Currently, HPAT supports I/O for the HDF5 and Parquet formats.

For HDF5, the syntax is the same as the `h5py` package. For example::

    @hpat.jit
    def example():
        f = h5py.File("lr.hdf5", "r")
        X = f['points'][:]
        Y = f['responses'][:]

For Parquet, the syntax is the same as `pyarrow`::

    import pyarrow.parquet as pq

    @hpat.jit
    def kde():
        t = pq.read_table('kde.parquet')
        df = t.to_pandas()
        X = df['points'].values

HPAT automatically parallelizes I/O across the different nodes in a
distributed setting without any code changes.

HPAT needs to know the types of the input arrays. If the file name is a
constant string, HPAT tries to look at the file at compile time and infer the
types. Otherwise, the user is responsible for providing the types, similar to
Numba's typing syntax. For example::

    @hpat.jit(locals={'X': hpat.float64[:,:], 'Y': hpat.float64[:]})
    def example(file_name):
        f = h5py.File(file_name, "r")
        X = f['points'][:]
        Y = f['responses'][:]

Print
-----

Using the ``print`` function is only supported for `REP` values. Print is
called on one processor only, since all processors have the same copy.

Strings
-------

Currently, HPAT provides basic ASCII string support. Constant strings,
equality comparison of strings (``==`` and ``!=``), the ``split`` function,
extracting characters (e.g. ``s[1]``), concatenation, and conversion to `int`
and `float` are supported. Here are some examples::

    s = 'test_str'
    flag = (s == 'test_str')
    flag = (s != 'test_str')
    s_list = s.split('_')
    c = s[1]
    s = s + '_test'
    a = int('12')
    b = float('1.2')

Dictionaries
------------

HPAT currently supports basic integer dictionaries. ``DictIntInt`` is the
type for dictionaries with 64-bit integer keys and values, while
``DictInt32Int32`` is for 32-bit integer ones. Getting and setting values,
the ``pop`` and ``get`` operators, as well as ``min`` and ``max`` of keys,
are supported. For example::

    d = DictIntInt()
    d[2] = 3
    a = d[2]
    b = d.get(3, 0)
    d.pop(2)
    d[3] = 4
    a = min(d.keys())
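For instance, a dictionary can be used inside a jitted function to accumulate
counts (a minimal sketch; importing ``DictIntInt`` from the top-level
``hpat`` package is an assumption)::

    import hpat
    from hpat import DictIntInt  # import path is an assumption

    @hpat.jit
    def count_parity(n):
        d = DictIntInt()
        for i in range(n):
            key = i % 2
            d[key] = d.get(key, 0) + 1  # get with a default, then set
        return d[0]

    print(count_parity(10))  # counts the even numbers in 0..9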