Performance Analysis

Specific to GCC-compiled programs

General

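  1. ALWAYS run with -O3 enabled, even while debugging, so that measurements reflect the optimized code
  2. Don’t run with -Ofast: it turns on -ffast-math, which lets the compiler assume NaN and inf never occur, so code that produces or checks for them can silently give wrong results
  3. -g i.e. debug symbols DO NOT add any runtime overhead to the program; they only make the binary larger on disk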

Performance Analysis Tools

perf: a command-line performance analysis tool for Linux. See Brendan Gregg’s blog for examples of how to use it

NVIDIA Nsight Systems: a system-wide performance analysis tool designed to visualize an application’s algorithms, identify the largest opportunities to optimize, and tune to scale efficiently across any quantity or size of CPUs and GPUs

Intel Advisor: a design and analysis tool for developing performant code. The tool supports C, C++, Fortran, SYCL, OpenMP, OpenCL code, and Python

Perf tools

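# `-fno-omit-frame-pointer`: keeps the frame pointer so perf can walk the call stack
# `-g`: adds debug symbols so perf can map samples to function names and source lines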
g++ -O3 -fno-omit-frame-pointer -g -o run app.cpp
 
 
# `-F99`: frequency of sampling, 99 Hz
# -g: record the call graph (stack trace) for each sample; unrelated to the compiler's -g
perf record -g -F99 ./run
# samples are written to perf.data in the current directory
 
 
# by default reads perf.data from the current directory
# or you can specify another file with the -i flag
perf report -g