NVIDIA CUB

Github repo

Reusable software components for every layer of the CUDA programming mode

When working with GPUs, CPU creates a queue of operations to be run on GPU

in JAX, XLA uses cub to fuse(compose) operations together, so overhead of changing operations in threads is reduced

Based on easyness of programming:

ptax < cuda <  cub < thrust