.. _development:

HPAT Development
================

Technology Overview and Architecture
------------------------------------

This `slide deck `_ provides an overview of HPAT technology and software architecture.

These papers provide a deeper dive into the technical ideas
(might not be necessary for many developers):

- `HPAT paper on automatic parallelization for distributed memory `_
- `HPAT paper on system architecture versus Spark `_
- `HPAT Dataframe DSL approach `_
- `ParallelAccelerator DSL approach `_

Numba Development
-----------------

HPAT sits on top of Numba and is heavily tied to many of its features.
Therefore, understanding Numba's internal details and being able to develop
Numba extensions is necessary.

- Start with `basic overview of Numba use `_ and try the examples.
- `User documentation `_ is generally helpful for an overview of features.
- | `ParallelAccelerator documentation `_
    provides an overview of parallel analysis and transformations in Numba
    (also used in HPAT).
- `Setting up Numba for development `_
- | `Numba architecture page `_
    is a good starting point for understanding the internals.
- | Learning Numba IR is crucial for understanding transformations.
    See the `IR classes `_.
    Setting `NUMBA_DEBUG_ARRAY_OPT=1` shows the IR at different stages
    of ParallelAccelerator and HPAT transformations.
    Run `a simple parallel example `_
    and make sure you understand the IR at different stages.
- | `Extending Numba page `_
    provides details on how to provide native implementations for data types and functions.
    The low-level API should be avoided as much as possible for ease of development and
    code readability. The `unicode support `_
    in Numba is an example of a modern extension for Numba (documentation planned).
- | A more complex extension is `the new dictionary implementation in Numba `_
    (documentation planned).
    It has examples of calling into C code, which is implemented as
    `a C extension library `_.
    For a simpler example of calling into a C library, see HPAT's I/O features like
    `get_file_size `_.
- | `Developer reference manual `_
    provides more details if necessary.

HPAT Development
----------------

HPAT implements the Pandas and NumPy APIs as a DSL.
Data structures are implemented as Numba extensions, and
compiler stages are responsible for different levels of abstraction.
For example, `Series data type support `_
and `Series transformations `_
implement the `Pandas Series API `_.
Follow the pipeline for a simple function like `Series.sum()`
for initial understanding of the transformations.
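
The advice in the Numba section to set `NUMBA_DEBUG_ARRAY_OPT=1` and run a simple
parallel example can be tried with a minimal sketch like the one below. It is not the
example linked above: the function name `parallel_sum` is hypothetical, and the
`ImportError` fallback is an assumption added only so that the script still runs
(without printing any IR) on a machine where Numba is not installed.

```python
import os

# Numba reads its configuration from the environment when it is imported,
# so the variable has to be set before the first `import numba`.
os.environ["NUMBA_DEBUG_ARRAY_OPT"] = "1"

try:
    from numba import njit, prange
except ImportError:
    # Assumed fallback: plain Python stand-ins, no compilation and no IR dump.
    prange = range

    def njit(*args, **kwargs):
        return lambda func: func


@njit(parallel=True)
def parallel_sum(n):
    """Sum the integers 0..n-1 as a float using a parallel reduction loop."""
    total = 0.0
    for i in prange(n):
        total += i
    return total


if __name__ == "__main__":
    # The first call triggers compilation, which is when the IR dumps are printed.
    print(parallel_sum(1000))
```

With Numba installed, the first call should print the IR before and after the
parallel transformations. The same environment variable could be set while
compiling a small HPAT function that calls `Series.sum()`, so that the two dumps
can be compared side by side.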