What is the ENHANCE project?

Heterogeneous accelerator systems have been in common use since the appearance of multi-core CPUs and general-purpose graphics processing units (GPUs). Multi-core CPUs combined with GPUs are even provided in standard personal computer configurations. Data centers are also setting up experimental cluster systems that combine multi-core processors, GPUs, and specialized co-processors such as ClearSpeed CSX or FPGAs. Such systems are particularly interesting for solving medium-sized scientific computing problems in a cost-effective and energy-efficient way, without access to large and expensive supercomputing systems. This gives nearly every scientific and industrial institution access to significant computing power for increasingly complex numerical simulations.

At present, the use of such systems is still limited, because most accelerators must be programmed with proprietary and unfamiliar programming languages and Application Programming Interfaces (APIs). Developing efficient software for these architectures is highly challenging for inexperienced developers and requires knowledge of the underlying hardware and software components. The research effort within the ENHANCE project therefore aims to ease the development and use of hardware-accelerated code.

General structure of the ENHANCE project.
To address the first challenge, simplifying software development for heterogeneous systems, we intend to develop a tool-flow that automatically parallelizes loops within an application. We aim to automate the source-to-source transformation in the following steps:
  • First, we use a source code parser to analyze the application and translate the code into a polyhedral representation.
  • This intermediate representation may then be optimized by index transformations to maximize the number of independent indices.
  • To map these optimized index paths to the available hardware architectures, we aim to extend the PLUTO project to support heterogeneous architectures. PLUTO can already map a polyhedral model to OpenMP. The result will be an internal representation of the index paths for a specific architecture.
  • To generate code for the target architecture, we intend to use the CLooG tool and extend it to the needs of heterogeneous target languages.
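
The effect of an index transformation can be illustrated with a small, hand-written sketch. It is not output of PLUTO or CLooG; the function names and the loop nest are assumptions chosen for illustration. In the original nest every cell depends on its upper and left neighbour, so neither loop is parallel. After re-indexing by the wavefront t = i + j, all iterations of the inner loop are independent of each other.

```python
def sweep_original(n, m):
    """Reference loop nest: each cell depends on its upper and left neighbour."""
    a = [[1] * m for _ in range(n)]
    for i in range(1, n):
        for j in range(1, m):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def sweep_skewed(n, m):
    """Same computation after the index change (i, j) -> (t, i) with t = i + j.

    Cells on wavefront t only read cells on wavefront t - 1, so the inner
    loop carries no dependence and could be distributed over compute units.
    """
    a = [[1] * m for _ in range(n)]
    for t in range(2, n + m - 1):          # sequential loop over wavefronts
        i_lo = max(1, t - (m - 1))         # keeps j = t - i <= m - 1
        i_hi = min(n - 1, t - 1)           # keeps j = t - i >= 1
        for i in range(i_lo, i_hi + 1):    # independent iterations
            j = t - i
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


if __name__ == "__main__":
    # Both versions must produce identical results.
    print(sweep_original(4, 5) == sweep_skewed(4, 5))
```

The sketch runs the inner loop sequentially; it only shows that the transformed iteration order is legal and exposes one loop without carried dependences.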

In the second part of the project, we approach the challenge of making scheduling decisions at runtime and treat hardware accelerators as peer computation units that the scheduler manages like CPU cores. The goal of scheduling tasks in heterogeneous systems is to assign tasks to compute units in a way that enables time-sharing of accelerators and provides fairness among tasks that compete for the same resources. We therefore aim to

  • include specific hardware characteristics and the status of the available heterogeneous compute units
  • obtain knowledge about the availability and suitability of a task for a particular hardware accelerator from the application
  • introduce a scheduling and programming model that allows preemption and migration for accelerators
  • evaluate scheduling policies to be used for the scheduling decision
  • implement an extension of the Linux Completely Fair Scheduler to support heterogeneous computing components.
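
As a rough illustration of treating accelerators as peer compute units, the following sketch assigns tasks to units with a greedy earliest-finish-time policy. This is a toy user-space model under stated assumptions, not the planned Linux scheduler extension: the classes ComputeUnit and Task, the per-kind runtime estimates, and the policy itself are hypothetical, and preemption and migration are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeUnit:
    name: str
    kind: str                      # hypothetical label, e.g. "cpu" or "gpu"
    busy_until: float = 0.0        # time at which the unit becomes free
    assigned: list = field(default_factory=list)


@dataclass
class Task:
    name: str
    # Estimated runtime per unit kind; a missing kind means the task has
    # no implementation for that kind of unit.
    est_runtime: dict


def schedule(tasks, units):
    """Assign each task to the suitable unit that would finish it earliest."""
    for task in tasks:
        candidates = [u for u in units if u.kind in task.est_runtime]
        if not candidates:
            raise ValueError(f"no suitable unit for task {task.name}")
        best = min(candidates,
                   key=lambda u: u.busy_until + task.est_runtime[u.kind])
        best.busy_until += task.est_runtime[best.kind]
        best.assigned.append(task.name)
    return {u.name: u.assigned for u in units}


if __name__ == "__main__":
    units = [ComputeUnit("cpu0", "cpu"), ComputeUnit("gpu0", "gpu")]
    tasks = [
        Task("A", {"cpu": 4.0, "gpu": 1.0}),
        Task("B", {"cpu": 4.0, "gpu": 1.0}),
        Task("C", {"cpu": 2.0}),               # CPU-only task
        Task("D", {"cpu": 3.0, "gpu": 2.0}),
    ]
    print(schedule(tasks, units))
    # prints {'cpu0': ['C'], 'gpu0': ['A', 'B', 'D']}
```

The policy uses both kinds of information named in the list above: the state of each unit (busy_until) and the suitability of a task for a unit (est_runtime).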

Both of these parts depend heavily on profiling data to optimize their results and need well-defined interfaces between application and operating system. In the third part of the ENHANCE project we therefore intend to develop a performance model and a fat binary mechanism. The performance model shall

  • precisely measure and provide runtimes, input and output data volumes, and operation counts of prototype functions, and later also of generated function implementations, using benchmarking tools
  • provide a model to describe operations and dependencies
The intended fat binary model shall include
  • a metadata model describing application parameters to be used for scheduling tasks
  • binaries for all target architectures.
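
One possible shape for such a fat binary is sketched below as a plain data structure. All names (Variant, FatBinary, the architecture labels, and the metadata fields) are assumptions made for illustration, and the byte strings are stand-ins for compiled code. The sketch stores one variant per target architecture together with profiled values of the kind the performance model is meant to provide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Variant:
    arch: str               # hypothetical architecture label, e.g. "x86_64"
    blob: bytes             # stand-in for the compiled code of this variant
    est_runtime_s: float    # profiled runtime (assumed to come from benchmarks)
    bytes_in: int           # input data volume
    bytes_out: int          # output data volume


@dataclass
class FatBinary:
    function: str
    variants: dict = field(default_factory=dict)   # arch -> Variant

    def add(self, variant):
        self.variants[variant.arch] = variant

    def pick(self, available_archs):
        """Return the fastest profiled variant among the available architectures."""
        usable = [v for a, v in self.variants.items() if a in available_archs]
        if not usable:
            raise LookupError(f"{self.function}: no variant available")
        return min(usable, key=lambda v: v.est_runtime_s)


if __name__ == "__main__":
    fb = FatBinary("stencil")
    fb.add(Variant("x86_64", b"\x00cpu", 4.0, 1024, 1024))
    fb.add(Variant("gpu", b"\x00gpu", 1.0, 1024, 1024))
    print(fb.pick({"x86_64", "gpu"}).arch)   # prints gpu
    print(fb.pick({"x86_64"}).arch)          # prints x86_64
```

A scheduler could query such metadata at runtime to decide where a task should run; whether the selection is made on runtime alone, as here, is a simplifying assumption.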

The partners in the ENHANCE project thus aim high: the goal is a complete framework that supports both automatic parallelization and scheduling of applications on heterogeneous hardware architectures, and that optimizes the results with an integrated performance model. While developing the tool-flow and the operating system extension, we evaluate our intermediate results iteratively on the industrial partners' challenging applications in bioinformatics, automotive computing, pollutant dispersion, and thermodynamics. In this way, theoretical research and practical application go hand in hand and make the project an exciting experience for all partners involved.