Performance Analysis
Specific to GCC-compiled programs
General
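- ALWAYS compile with -O3 enabled, even while debugging.
- Don't compile with -Ofast: it turns on -ffast-math, which lets the compiler assume that NaN and Inf never occur. Checks such as isnan() may be optimized away, and results that involve those values become unreliable.
- -g (debug symbols) DOES NOT add any runtime overhead to the program; it only makes the binary larger on disk.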
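As a rough illustration of the -Ofast point above, here is a minimal sketch of a function whose correctness depends on detecting NaN and Inf. The file name finite_mean.c and the helper mean_of_finite are hypothetical, made up for this example. The comments describe what GCC is permitted to do under -ffinite-math-only, not what any particular GCC version is guaranteed to do.

```c
/* finite_mean.c (hypothetical file name)
 *
 * Compile it both ways and compare the behavior on data containing NaN:
 *   gcc -O3    -g -c finite_mean.c
 *   gcc -Ofast -g -c finite_mean.c
 */
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: mean of the finite entries, skipping NaN and Inf.
 * Returns 0.0 when no entry is finite. */
double mean_of_finite(const double *values, size_t n)
{
    double sum = 0.0;
    size_t count = 0;

    for (size_t i = 0; i < n; ++i) {
        /* Under -ffinite-math-only (implied by -Ofast) the compiler may
         * treat this test as always true and remove it. */
        if (isfinite(values[i])) {
            sum += values[i];
            ++count;
        }
    }
    return count > 0 ? sum / (double)count : 0.0;
}
```

Built with -O3, calling mean_of_finite on {1.0, NAN, 3.0} skips the NaN and returns 2.0. Built with -Ofast, the filter may disappear, so the result is no longer something to rely on.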
Performance Analysis Tools
perf: a command-line performance analysis tool for Linux. See Brendan Gregg's blog for examples of how to use it
NVIDIA Nsight Systems: a system-wide performance analysis tool designed to visualize an application's algorithms, identify the largest opportunities to optimize, and tune to scale efficiently across any quantity or size of CPUs and GPUs
Intel Advisor: a design and analysis tool for developing performant code. The tool supports C, C++, Fortran, SYCL, OpenMP, OpenCL, and Python code