Instrumentation

Manual Instrumentation of CVODE/SUNDIALS code

This page is about a patch for SUNDIALS that adds realtime profiling (manual instrumentation) to CVODE and the provided direct and iterative linear equation solvers.

Motivation

Profiling solvers built on CVODE/SUNDIALS integrators is easy with gprof, valgrind, or any other suitable tool. In some situations, however, it helps to have runtime statistics and timings at hand every time you run the solver.

Examples:

  1. while developing solvers for general problems with different selections of linear equation system (LES) solvers, Jacobian matrix generators, and preconditioners
  2. when using guided problem-specific selection of components
  3. when pinpointing slowdowns of simulations observed by users (who report to developers)

Usage example

Example output of the PrintTimings() function (available with the patch):

Integrator timings
  Function evaluations called from integrator  : FEVAL                     = 20.915 s
  Linear system setup                          : LS_SETUP                  = 5.64935 s
  Linear system solve                          : LS_SOLVE                  = 4.73593 s

Direct LES solver timings
  Composition of Jacobian matrix df/dy (DQ)    : JACOBIAN_GENERATION       = 2.97953 s
  Factorization of system Jacobian M           : JACOBIAN_FACTORIZATION    = 0 s
  Function evaluations for Jacobian generation : FEVAL_JACOBIAN_GENERATION = 2.83091 s

Iterative LES solver timings
  Matrix-vector multiplications                : ATIMES                    = 0.600932 s
  Function evaluations during iterative solve  : FEVAL_LS_SOLVE            = 0 s

Preconditioner timings
  Preconditioner setup/generation              : PRE_SETUP                 = 2.66936 s
  Function evaluations during setup            : FEVAL_PRE_SETUP           = 0 s
  Preconditioner solves                        : PRE_SOLVE                 = 2.60584 s

Example output that can be generated with counters/timers available from SUNDIALS (from my own code):

------------------------------------------------------------------------------
Wall clock time                            =   32.708 s
------------------------------------------------------------------------------
Integrator: Steps                          =                              1619
Integrator: Newton iterations              =                              2951
Integrator: Newton convergence failures    =                                 4
Integrator: Error test failures            =                               197
Integrator: Function evaluation (Newton)   =   20.915 s    (63.95 %)      2952
Integrator: LES setup                      =    5.649 s    (17.27 %)       503
Integrator: LES solve                      =    4.736 s    (14.48 %)      2951
LES: Linear iterations                     =                              3485
LES: Linear convergence failures           =                                 0
LES: Function evaluations (LS solve)       =    0.000 s    ( 0.00 %)
LES: Function evaluations (Jacobian gen.)  =    2.831 s    ( 8.66 %)       784
LES: Jacobian matrix assembly time         =    2.980 s    ( 9.11 %)
LES: Matrix-vector multiplications         =    0.601 s    ( 1.84 %)      3485
LES: Function evaluations (precond. setup) =    0.000 s    ( 0.00 %)         0
LES: Preconditioner setup                  =    2.669 s    ( 8.16 %)        56
LES: Preconditioner solves                 =    2.606 s    ( 7.97 %)      6359
------------------------------------------------------------------------------
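
The percentages in this table appear to be each timer sum relative to the wall clock time (for example 20.915 s of 32.708 s). A minimal sketch of how such a row could be produced is shown here; the function names timer_percentage() and print_timing_row() and the exact column widths are assumptions for illustration, not part of the patch:

```c
#include <stdio.h>

/* Share of the wall clock time spent in one timed section, in percent.
   Returns 0 if no wall clock time has been measured. */
double timer_percentage(double timer_seconds, double wall_seconds) {
  return wall_seconds > 0.0 ? 100.0 * timer_seconds / wall_seconds : 0.0;
}

/* Print one table row: label, seconds, percentage of wall time, call count.
   The layout only approximates the table above. */
void print_timing_row(const char *label, double seconds,
                      double wall_seconds, long calls) {
  printf("%-42s = %8.3f s    (%5.2f %%) %9ld\n", label, seconds,
         timer_percentage(seconds, wall_seconds), calls);
}
```

A call such as print_timing_row("Integrator: LES setup", 5.649, 32.708, 503) would then print one row; the call count itself comes from the regular SUNDIALS statistics, not from the timers.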

What the patch does

The patch adds a set of platform-specific timing functions (Windows, Mac, Unix/Linux) and provides wrapper macros for timing your own code blocks. To avoid cluttering the code, all timer-related calls are wrapped in defines, which keeps the original SUNDIALS code mostly unchanged.

How it is implemented

The performance-critical functions are wrapped by a define that expands to a call to StartTimer() before the wrapped function and a call to StopTimer() after it. To avoid local variables, the timer sums are stored in static arrays in which each timer is identified by an index, for example SUNDIALS_TIMER_FEVAL in the code snippet below.

/* original code */
retval = f(tn, zn[0], zn[1], user_data); 
 
/* instrumented code, using timer variable SUNDIALS_TIMER_FEVAL */
SUNDIALS_TIMED_FUNCTION(SUNDIALS_TIMER_FEVAL,
  retval = f(tn, zn[0], zn[1], user_data);
);

The collected sums can be accessed via the function TimerSum() or printed collectively with PrintTimings().

These functions are declared in include/sundials/sundials_timer.h.

The overhead of the instrumentation is minimal. If the SUNDIALS_USE_INSTRUMENTATION define in include/sundials/sundials_timer.h is commented out, the macro expands to the original code (no overhead). Comparing run times with and without the define shows the instrumentation overhead.
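
The mechanism described above can be sketched as follows. This is not the code from the patch: the array names, the index value, the value of SUNDIALS_TIMER_COUNT, the function signatures, and the use of the portable clock() (which measures CPU time, whereas the patch uses platform-specific timers) are all assumptions made to keep the example self-contained.

```c
#include <time.h>

/* Comment out this define to make the macro expand to the bare code. */
#define SUNDIALS_USE_INSTRUMENTATION

/* Assumed values; the real ones are defined in sundials_timer.h. */
#define SUNDIALS_TIMER_FEVAL 0
#define SUNDIALS_TIMER_COUNT 16

#ifdef SUNDIALS_USE_INSTRUMENTATION

/* Accumulated seconds and last start stamp, one slot per timer index. */
static double  timer_sums[SUNDIALS_TIMER_COUNT];
static clock_t timer_starts[SUNDIALS_TIMER_COUNT];

void StartTimer(int id) { timer_starts[id] = clock(); }

void StopTimer(int id) {
  timer_sums[id] += (double)(clock() - timer_starts[id]) / CLOCKS_PER_SEC;
}

double TimerSum(int id) { return timer_sums[id]; }

/* Run 'code' between a start and a stop call for timer 'id'. */
#define SUNDIALS_TIMED_FUNCTION(id, code) \
  do { StartTimer(id); code StopTimer(id); } while (0)

#else

/* Instrumentation disabled: only the original code remains. */
#define SUNDIALS_TIMED_FUNCTION(id, code) do { code } while (0)

#endif

/* Stand-in for a user-supplied right-hand-side function. */
static double rhs(double y) { return -2.0 * y; }

/* Usage in the same style as the instrumented CVODE call. */
double timed_rhs(double y) {
  double retval = 0.0;
  SUNDIALS_TIMED_FUNCTION(SUNDIALS_TIMER_FEVAL,
    retval = rhs(y);
  );
  return retval;
}
```

Because the timer state lives in file-scope arrays indexed by a constant, the wrapped call site needs no extra local variables, which is the design point made above.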

Instrumenting your own code

In addition to the SUNDIALS-specific timer variables, you can define your own timers. Every timer index must be below SUNDIALS_TIMER_COUNT (also defined in sundials_timer.h).

/*! define own timer index, last official Sundials timer index is 10 */
#define OWN_FUNCTION_TIMER 11 
 
/* instrument my own code */
 
SUNDIALS_TIMED_FUNCTION(OWN_FUNCTION_TIMER,
  lengthy_function_a();
  lengthy_function_b();
  for (i=0; i<1000; ++i)
    short_function_c();
);
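
Since an index at or above SUNDIALS_TIMER_COUNT would fall outside the timer arrays, it can help to check the range at compile time. The following is only a sketch: the value 16 for SUNDIALS_TIMER_COUNT is a stand-in (the real value comes from sundials_timer.h), and LAST_OFFICIAL_TIMER and timer_index_is_free() are hypothetical helper names.

```c
/* Stand-in for the value defined in sundials_timer.h (assumed here). */
#ifndef SUNDIALS_TIMER_COUNT
#define SUNDIALS_TIMER_COUNT 16
#endif

/* Last official SUNDIALS timer index, as stated in the comment above. */
#define LAST_OFFICIAL_TIMER 10   /* hypothetical helper name */

#define OWN_FUNCTION_TIMER 11

/* Compile-time checks: fail the build if the index is out of range. */
#if OWN_FUNCTION_TIMER <= LAST_OFFICIAL_TIMER
#error "OWN_FUNCTION_TIMER collides with a SUNDIALS timer index"
#endif
#if OWN_FUNCTION_TIMER >= SUNDIALS_TIMER_COUNT
#error "OWN_FUNCTION_TIMER must be below SUNDIALS_TIMER_COUNT"
#endif

/* Runtime variant of the same check, for indices chosen at run time. */
int timer_index_is_free(int id) {
  return id > LAST_OFFICIAL_TIMER && id < SUNDIALS_TIMER_COUNT;
}
```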

Patch file

sundials_timer_v262.diff (Patch for SUNDIALS Version 2.6.2)

How to apply:

  1. create a fresh copy of the SUNDIALS release version 2.6.2
  2. change into the sundials source-archive root directory
  3. enter the patch command:
patch -p0 -i /path/to/sundials_timer_v262.diff

(on Windows you may use a suitable patch tool)

Examples

The example files cvAdvDiff_bnd.c and cvDiurnal_kry.c show the final statistics generated by PrintTimings(). Note that some of the timers only accumulate if the corresponding callback functions are registered by the user. Also, when a direct solver is used, the timers for the iterative solver components remain zero, and vice versa.

Limitations

Currently only CVODE and the direct and iterative LES solvers are instrumented. Adding instrumentation to the other integrators/solvers is straightforward.

Contact and License

The patch is licensed under the same conditions as SUNDIALS itself (see the LICENSE document of SUNDIALS). For questions and suggestions, please contact the author:

Andreas Nicolai [andreas -dot- nicolai -at- tu-dresden -dot- de]

Revisions/Suggestions

It was suggested to wrap the entire code block in sundials_timer.* in an ifdef clause that can be controlled by the CMake configuration system, so that the feature can be enabled or disabled at build time.

Discussion points with this approach:

  • Software that explicitly calls and links to timer-related functions (e.g. PrintTimings) may need to be changed to check for the SUNDIALS_USE_INSTRUMENTATION define.
  • Alternatively, dummy functions could be defined whenever SUNDIALS_USE_INSTRUMENTATION is not defined. However, user code may then not work as expected, e.g. screen layout may be broken if PrintTimings() does not print anything at all.
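
The first option above could look like the following sketch in user code. It assumes that PrintTimings() takes no arguments and that the define is visible to the user's compiler (for example via the SUNDIALS configuration); the function name report_statistics() and the placeholder text are hypothetical.

```c
#include <stdio.h>

#ifdef SUNDIALS_USE_INSTRUMENTATION
#include <sundials/sundials_timer.h>
#endif

/* Print the timer statistics if available, otherwise a placeholder line
   so that the surrounding screen layout stays intact.
   Returns 1 if timings were printed, 0 otherwise. */
int report_statistics(void) {
#ifdef SUNDIALS_USE_INSTRUMENTATION
  PrintTimings();  /* assumed signature: no arguments */
  return 1;
#else
  printf("Timings not available (SUNDIALS built without instrumentation)\n");
  return 0;
#endif
}
```

With this guard the user code neither references nor links against timer functions when the feature is disabled, at the cost of one ifdef per call site.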