HPAT Development
Technology Overview and Architecture
This slide deck provides an overview of HPAT technology and software architecture.
These papers provide a deeper dive into the technical ideas (they might not be necessary for many developers):
Numba Development
HPAT sits on top of Numba and is heavily tied to many of its features. Therefore, understanding Numba’s internal details and being able to develop Numba extensions is necessary.
- Start with a basic overview of Numba use and try the examples.
- The user documentation is generally helpful for an overview of features.
- The ParallelAccelerator documentation provides an overview of parallel analysis and transformations in Numba (also used in HPAT).
- Numba architecture page is a good starting point for understanding the internals.
- Learning Numba IR is crucial for understanding transformations. See the IR classes. Setting NUMBA_DEBUG_ARRAY_OPT=1 shows the IR at different stages of ParallelAccelerator and HPAT transformations. Run a simple parallel example and make sure you understand the IR at each stage.
- The Extending Numba page provides details on how to provide native implementations for data types and functions. The low-level API should be avoided as much as possible for ease of development and code readability. The unicode support in Numba is an example of a modern Numba extension (documentation planned).
- A more complex extension is the new dictionary implementation in Numba (documentation planned). It has examples of calling into C code that is implemented as a C extension library. For a simpler example of calling into a C library, see HPAT’s I/O features like get_file_size.
- Developer reference manual provides more details if necessary.
HPAT Development
HPAT implements the Pandas and NumPy APIs as a DSL. Data structures are implemented as Numba extensions, and each compiler stage is responsible for a different level of abstraction. For example, Series data type support and Series transformations implement the Pandas Series API. To get an initial understanding of the transformations, follow the pipeline for a simple function like Series.sum().
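The idea of stages working at different levels of abstraction can be illustrated with a toy model in plain Python. This is not HPAT code: the `Call` node, the stage functions, and the operation names are all hypothetical, and HPAT's real passes operate on Numba IR. The sketch only shows how a Series-level call could be rewritten step by step into an array-level reduction.

```python
from dataclasses import dataclass
from functools import reduce
import operator


@dataclass(frozen=True)
class Call:
    """One node of a toy IR: operation `op` applied to `arg`."""
    op: str
    arg: object  # either a variable name (str) or another Call


def lower_series(node):
    """Hypothetical stage 1: rewrite a Series-level call into an
    array-level call on the Series' underlying data."""
    if node.op == "series.sum":
        return Call("array.sum", Call("series.data", node.arg))
    return node


def lower_array(node):
    """Hypothetical stage 2: rewrite an array-level sum into an
    explicit reduction."""
    if node.op == "array.sum":
        return Call("reduce.add", node.arg)
    return node


def evaluate(node, env):
    """Interpret the fully lowered toy IR."""
    if isinstance(node, str):
        return env[node]
    value = evaluate(node.arg, env)
    if node.op == "series.data":
        return value["data"]
    if node.op == "reduce.add":
        return reduce(operator.add, value, 0)
    raise NotImplementedError(node.op)


program = Call("series.sum", "S")
for stage in (lower_series, lower_array):
    program = stage(program)
    print(stage.__name__, "->", program)

# A toy "Series" is just a dict holding its data array.
env = {"S": {"data": [1, 2, 3, 4]}}
total = evaluate(program, env)
print(total)
```

Each stage only knows about one level: the first removes Series concepts, the second removes array-level calls. Keeping the levels separate is the design choice the toy is meant to illustrate.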