Lazy Skeletons

Lazy Skeletons is a C++ library for building high-performance, data-parallel applications.
Based on the concept of algorithmic skeletons, it provides a simple and expressive API for composing complex computations from a small set of high-level parallel patterns.

[Figure: code snippet]

Getting Started

Requirements

  • CUDA Toolkit 12.6
  • A compatible NVIDIA GPU
  • C++20 compiler

Installation

Download or clone the repository from GitHub:

git clone https://github.com/rafa-picao/LazySkeletons.git

Usage

Include the lskel.cuh header in your CUDA C++ project and use the lskel namespace to access the library's functionality.

[Figure: performance graph]

Examples and Benchmarks

See real-world code examples (like SAXPY) and detailed performance benchmarks demonstrating the benefits of kernel fusion against NVIDIA Thrust.
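
For readers unfamiliar with the benchmark, SAXPY computes y = a * x + y element-wise on single-precision vectors. The following is a plain host-side C++ reference for what the operation computes, written with the standard library only. It is not the Lazy Skeletons version and uses none of its API; the function name saxpy and its signature are chosen here for illustration.

```cpp
#include <algorithm>
#include <vector>

// Reference SAXPY on the host: y[i] = a * x[i] + y[i].
// Assumes y has at least as many elements as x.
inline void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    std::transform(x.begin(), x.end(), y.begin(), y.begin(),
                   [a](float xi, float yi) { return a * xi + yi; });
}
```

Calling saxpy(2.0f, x, y) with x = {1, 2, 3} and y = {10, 20, 30} leaves y holding {12, 24, 36}.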

Why Choose Lazy Skeletons?

Our API delivers optimized performance by automatically fusing multiple operations into a single kernel, which cuts kernel-launch overhead and avoids writing intermediate results back to GPU memory between steps.

Automatic Kernel Fusion

Chain multiple Map and Reduce operations together, and the framework compiles them into a single, highly efficient kernel.
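
The idea behind fusion can be sketched on the host in plain C++. The example below is a conceptual illustration only and does not use the Lazy Skeletons API: the names sum_unfused, compose, fused_map_reduce, and sum_fused are hypothetical. The unfused version makes one pass per stage and stores a temporary buffer between stages, much like launching one kernel per operation. The fused version composes the two map functions and folds them into the reduction, so the data is traversed once and no temporaries are allocated.

```cpp
#include <cstddef>
#include <vector>

// Unfused: three separate passes and two temporary buffers,
// analogous to one kernel launch per operation.
inline double sum_unfused(const std::vector<double>& x) {
    std::vector<double> scaled(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) scaled[i] = 2.0 * x[i];        // map 1
    std::vector<double> shifted(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) shifted[i] = scaled[i] + 1.0;  // map 2
    double acc = 0.0;
    for (double v : shifted) acc += v;                                        // reduce
    return acc;
}

// Compose two element-wise functions into one: v -> g(f(v)).
template <typename F, typename G>
auto compose(F f, G g) {
    return [f, g](auto v) { return g(f(v)); };
}

// Fused: apply the composed map while accumulating, in a single pass.
template <typename Map, typename T>
T fused_map_reduce(const std::vector<T>& x, Map map, T init) {
    T acc = init;
    for (const T& v : x) acc += map(v);
    return acc;
}

inline double sum_fused(const std::vector<double>& x) {
    auto scale = [](double v) { return 2.0 * v; };
    auto shift = [](double v) { return v + 1.0; };
    return fused_map_reduce(x, compose(scale, shift), 0.0);
}
```

Both functions return the same value for the same input; the difference is the number of passes over the data and the temporary storage, which is the cost that kernel fusion targets on a GPU.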

High-Level C++ API

Use expressive parallel patterns (skeletons) to build complex algorithms without writing explicit CUDA or memory boilerplate.

Minimal Overhead

Lazy evaluation ensures computation only happens when the result is needed, minimizing unnecessary execution and host-device synchronization.
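
Lazy evaluation can be illustrated with a small host-side sketch. The LazyPipeline class below is hypothetical and is not part of the lskel namespace: calling map only records a stage, and no work happens until evaluate is called, at which point all recorded stages are applied to each element in a single pass. The passes counter exists only to make the deferred behavior observable.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a lazy pipeline; not the Lazy Skeletons API.
class LazyPipeline {
public:
    explicit LazyPipeline(std::vector<int> input) : data_(std::move(input)) {}

    // Record a stage. Nothing is computed here.
    LazyPipeline& map(std::function<int(int)> f) {
        stages_.push_back(std::move(f));
        return *this;
    }

    // Force evaluation: one pass applies every recorded stage to each element.
    std::vector<int> evaluate() {
        std::vector<int> out;
        out.reserve(data_.size());
        for (int v : data_) {
            for (const auto& stage : stages_) v = stage(v);
            out.push_back(v);
        }
        ++passes_;
        return out;
    }

    std::size_t passes() const { return passes_; }          // passes executed so far
    std::size_t pending_stages() const { return stages_.size(); }

private:
    std::vector<int> data_;
    std::vector<std::function<int(int)>> stages_;
    std::size_t passes_ = 0;
};
```

After chaining two map calls, passes() is still 0; it becomes 1 only once evaluate() is called. On a GPU the same deferral is what allows a framework to see the whole chain before launching anything.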

Get in Touch

Have questions about the framework, potential contributions, or collaboration opportunities? Reach out via email or connect on social media.