NVIDIA CUB
Reusable software components for every layer of the CUDA programming model
When working with GPUs, the CPU (host) creates a queue of operations (kernel launches, memory copies) to be run on the GPU (device)
In JAX, XLA fuses (composes) operations together into fewer GPU kernels and can call CUB for primitives such as sorting, so the overhead of launching a separate kernel for each operation is reduced
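
A minimal CUDA sketch of the two ideas above: the host queues work on a stream, and a fused kernel replaces two separate launches. This is not how XLA generates code; the kernels (`scale`, `add_bias`, `scale_add_bias`) and the helper `run` are hypothetical names made up for illustration, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Unfused: two kernels, with an intermediate buffer in global memory.
__global__ void scale(const float* in, float* out, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a * in[i];
}

__global__ void add_bias(const float* in, float* out, float b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + b;
}

// Fused: the same math in one launch, with no intermediate buffer.
__global__ void scale_add_bias(const float* in, float* out, float a, float b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a * in[i] + b;
}

// Hypothetical helper: computes 2*x + 3 over n ones, fused or unfused.
std::vector<float> run(bool fused, int n) {
    std::vector<float> h_in(n, 1.0f), h_out(n, 0.0f);
    const size_t bytes = n * sizeof(float);
    float *d_in = nullptr, *d_tmp = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_tmp, bytes);
    cudaMalloc(&d_out, bytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Each call below submits work to the stream's queue; the GPU runs the
    // queued operations in the order they were issued.
    cudaMemcpyAsync(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice, stream);
    if (fused) {
        scale_add_bias<<<blocks, threads, 0, stream>>>(d_in, d_out, 2.0f, 3.0f, n);
    } else {
        scale<<<blocks, threads, 0, stream>>>(d_in, d_tmp, 2.0f, n);
        add_bias<<<blocks, threads, 0, stream>>>(d_tmp, d_out, 3.0f, n);
    }
    cudaMemcpyAsync(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);  // wait until the queue has drained

    cudaStreamDestroy(stream);
    cudaFree(d_in);
    cudaFree(d_tmp);
    cudaFree(d_out);
    return h_out;
}

int main() {
    std::vector<float> a = run(false, 1 << 20);
    std::vector<float> b = run(true, 1 << 20);
    printf("unfused: %f, fused: %f\n", a[0], b[0]);
    return (a[0] == b[0]) ? 0 : 1;
}
```

Both paths give the same result (2*1 + 3 = 5 per element); the fused path does it with one kernel launch and one less round trip through global memory.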
Ordered by ease of programming (hardest to easiest):
PTX < CUDA < CUB < Thrust
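
To make the last two steps of that ordering concrete, here is a small sketch that sums the same device array with Thrust and with CUB. The helper names `sum_with_thrust` and `sum_with_cub` are hypothetical; the library calls (`thrust::reduce`, `cub::DeviceReduce::Sum`) are the real ones, and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cub/cub.cuh>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>

// Thrust: one call; temporary storage and kernel launches are handled for you.
int sum_with_thrust(const thrust::device_vector<int>& d_in) {
    return thrust::reduce(d_in.begin(), d_in.end(), 0);
}

// CUB: the caller manages temporary storage using the two-call pattern.
int sum_with_cub(const thrust::device_vector<int>& d_in) {
    const int* d_ptr = thrust::raw_pointer_cast(d_in.data());
    const int n = static_cast<int>(d_in.size());

    int* d_out = nullptr;
    cudaMalloc(&d_out, sizeof(int));

    void* d_temp = nullptr;
    size_t temp_bytes = 0;
    // First call with a null buffer only reports the required scratch size.
    cub::DeviceReduce::Sum(d_temp, temp_bytes, d_ptr, d_out, n);
    cudaMalloc(&d_temp, temp_bytes);
    // Second call performs the reduction.
    cub::DeviceReduce::Sum(d_temp, temp_bytes, d_ptr, d_out, n);

    int result = 0;
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_temp);
    cudaFree(d_out);
    return result;
}

int main() {
    thrust::device_vector<int> d_in(1000, 1);  // 1000 ones
    int t = sum_with_thrust(d_in);
    int c = sum_with_cub(d_in);
    printf("thrust: %d, cub: %d\n", t, c);
    return (t == 1000 && c == 1000) ? 0 : 1;
}
```

The CUB version is longer because it exposes the scratch allocation and output buffer explicitly; that extra control is the trade-off for being one step lower than Thrust.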