NVIDIA CUB

Reusable software components for every layer of the CUDA programming model.
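"Every layer" refers to the levels at which CUB's primitives operate: warp-wide (e.g. cub::WarpReduce), block-wide (e.g. cub::BlockReduce) and device-wide (e.g. cub::DeviceReduce). The sketch below is a plain-Python model of how a sum can be built up through those three layers. It is not CUB code and runs on the CPU. The function names and BLOCK_THREADS = 128 are hypothetical choices for this illustration. Only the warp size of 32 reflects real NVIDIA hardware.

```python
# Toy CPU model of CUB's layered reductions: warp -> block -> device.
# Hypothetical names; this imitates the structure, not CUB's implementation.

WARP_SIZE = 32       # threads per warp on NVIDIA GPUs
BLOCK_THREADS = 128  # hypothetical block size for this sketch (4 warps)


def warp_reduce(lane_values):
    """Tree reduction across one warp. Each step halves the number of
    active lanes, similar in spirit to a shuffle-based warp reduction."""
    vals = list(lane_values) + [0] * (WARP_SIZE - len(lane_values))
    offset = WARP_SIZE // 2
    while offset > 0:
        for lane in range(offset):
            vals[lane] += vals[lane + offset]
        offset //= 2
    return vals[0]  # lane 0 ends up holding the warp total


def block_reduce(thread_values):
    """Each warp reduces its own slice, then one warp reduces the
    per-warp partial sums (a common two-level block reduction)."""
    partials = [
        warp_reduce(thread_values[i:i + WARP_SIZE])
        for i in range(0, len(thread_values), WARP_SIZE)
    ]
    return warp_reduce(partials)


def device_reduce(data):
    """Split the input across blocks, reduce each block, then combine
    the per-block results, as a device-wide reduction would."""
    block_sums = [
        block_reduce(data[i:i + BLOCK_THREADS])
        for i in range(0, len(data), BLOCK_THREADS)
    ]
    return sum(block_sums)


print(device_reduce(list(range(1000))))  # 499500
```

The design point is that each layer reuses the one below it. In real CUB you can call the warp- and block-level primitives inside your own kernels, while the device-level algorithms handle the whole launch for you.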

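When working with GPUs, the CPU (host) enqueues operations such as kernel launches and memory copies into a queue (a CUDA stream), and the GPU executes them in order while the CPU carries on.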
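The host/device queue can be modelled in a few lines. In this sketch a Python thread stands in for the GPU and drains a FIFO queue. The ToyStream class and its launch/synchronize methods are hypothetical and only mimic the idea behind CUDA streams: a launch returns to the host immediately, operations in one stream run in order, and synchronize waits for the queue to empty (roughly the role of cudaStreamSynchronize). It is not the CUDA runtime API.

```python
# Toy model of a CUDA stream: the host enqueues work and returns at once;
# a worker thread (standing in for the GPU) runs operations in FIFO order.
import queue
import threading


class ToyStream:
    def __init__(self):
        self._ops = queue.Queue()
        self.log = []  # order in which the "GPU" actually ran operations
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            name, fn = self._ops.get()
            fn()
            self.log.append(name)
            self._ops.task_done()

    def launch(self, name, fn):
        """Asynchronous from the host's point of view: only enqueue."""
        self._ops.put((name, fn))

    def synchronize(self):
        """Block the host until every enqueued operation has finished."""
        self._ops.join()


data = [1, 2, 3]


def scale():
    data[:] = [2 * v for v in data]


def shift():
    data[:] = [v + 1 for v in data]


stream = ToyStream()
stream.launch("scale", scale)
stream.launch("shift", shift)
# The host is free to do other work here while the queue drains.
stream.synchronize()
print(stream.log, data)  # ['scale', 'shift'] [3, 5, 7]
```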

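In JAX, XLA calls into CUB for some GPU operations (for example, sorting) and fuses (composes) operations together, so fewer separate kernels are launched and the overhead of switching between operations on the GPU is reduced. CUB supports the same idea in hand-written kernels: its block- and warp-level primitives can be combined inside a single kernel instead of being run as separate launches.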
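Why fusion pays off can be shown with a toy cost model. The sketch computes relu(a * x + b) either as three separate passes (three "kernel launches", each writing an intermediate array) or as one fused pass. The launch helper and its counter are hypothetical stand-ins for per-kernel overhead. This does not use XLA or CUB, and real savings depend on the hardware and the operations involved.

```python
# Toy illustration of operation fusion: three elementwise passes versus one.
# `launch` is a hypothetical stand-in for a kernel launch; it only counts.

launches = {"count": 0}


def launch(kernel, *arrays):
    """Pretend kernel launch: count it, then apply `kernel` elementwise."""
    launches["count"] += 1
    return [kernel(*vals) for vals in zip(*arrays)]


def unfused(a, x, b):
    t1 = launch(lambda ai, xi: ai * xi, a, x)   # pass 1: writes t1
    t2 = launch(lambda ti, bi: ti + bi, t1, b)  # pass 2: reads t1, writes t2
    return launch(lambda ti: max(ti, 0.0), t2)  # pass 3: reads t2


def fused(a, x, b):
    # One pass; no intermediate arrays are materialized.
    return launch(lambda ai, xi, bi: max(ai * xi + bi, 0.0), a, x, b)


a, x, b = [1.0, -2.0, 3.0], [4.0, 5.0, -6.0], [0.5, 0.5, 0.5]

launches["count"] = 0
r1 = unfused(a, x, b)
n1 = launches["count"]

launches["count"] = 0
r2 = fused(a, x, b)
n2 = launches["count"]

print(r1 == r2, n1, n2)  # True 3 1
```

Both versions produce the same result, but the fused one does a single pass and never writes the intermediates t1 and t2 back to memory. On a GPU that saves both launch overhead and memory traffic.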

Ordered by ease of programming, from hardest (lowest level) to easiest (highest level):

PTX < CUDA < CUB < Thrust
