Manual Instrumentation of CVODE/SUNDIALS code
This page describes a patch for SUNDIALS that adds run-time profiling (manual instrumentation) to CVODE and to the direct and iterative linear equation system (LES) solvers provided with it.
Motivation
Solvers built on CVODE/SUNDIALS integrators can easily be profiled with gprof, valgrind, or any other suitable tool. In some situations, however, it is helpful to have runtime statistics and timings at hand every time you run the solver.
Examples:
- while developing solvers for general problems with different selections of LES solvers, Jacobian matrix generators, preconditioners
- when using guided problem-specific selection of components
- when pinpointing slowdowns of simulations observed by users (who report to developers)
Usage example
Example output of the PrintTimings() function (available with the patch):
Integrator timings
Function evaluations called from integrator : FEVAL = 20.915 s
Linear system setup : LS_SETUP = 5.64935 s
Linear system solve : LS_SOLVE = 4.73593 s
Direct LES solver timings
Composition of Jacobian matrix df/dy (DQ) : JACOBIAN_GENERATION = 2.97953 s
Factorization of system Jacobian M : JACOBIAN_FACTORIZATION = 0 s
Function evaluations for Jacobian generation : FEVAL_JACOBIAN_GENERATION = 2.83091 s
Iterative LES solver timings
Matrix-vector multiplications : ATIMES = 0.600932 s
Function evaluations during iterative solve : FEVAL_LS_SOLVE = 0 s
Preconditioner timings
Preconditioner setup/generation : PRE_SETUP = 2.66936 s
Function evaluations during setup : FEVAL_PRE_SETUP = 0 s
Preconditioner solves : PRE_SOLVE = 2.60584 s
Example output that can be generated with counters/timers available from SUNDIALS (from my own code):
------------------------------------------------------------------------------
Wall clock time = 32.708 s
------------------------------------------------------------------------------
Integrator: Steps = 1619
Integrator: Newton iterations = 2951
Integrator: Newton convergence failures = 4
Integrator: Error test failures = 197
Integrator: Function evaluation (Newton) = 20.915 s (63.95 %) 2952
Integrator: LES setup = 5.649 s (17.27 %) 503
Integrator: LES solve = 4.736 s (14.48 %) 2951
LES: Linear iterations = 3485
LES: Linear convergence failures = 0
LES: Function evaluations (LS solve) = 0.000 s ( 0.00 %)
LES: Function evaluations (Jacobian gen.) = 2.831 s ( 8.66 %) 784
LES: Jacobian matrix assembly time = 2.980 s ( 9.11 %)
LES: Matrix-vector multiplications = 0.601 s ( 1.84 %) 3485
LES: Function evaluations (precond. setup) = 0.000 s ( 0.00 %) 0
LES: Preconditioner setup = 2.669 s ( 8.16 %) 56
LES: Preconditioner solves = 2.606 s ( 7.97 %) 6359
------------------------------------------------------------------------------
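A row of such a report combines a timer sum, its share of the wall clock time, and a call counter. The following is a minimal sketch of that formatting, not the code that produced the listing above; the names demo_percent and demo_format_row and the column widths are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Share of total_seconds taken by part_seconds, in percent.
   Returns 0 when the total is not positive, to avoid dividing by zero. */
static double demo_percent(double part_seconds, double total_seconds) {
    if (total_seconds <= 0.0)
        return 0.0;
    return 100.0 * part_seconds / total_seconds;
}

/* Write one report row "label = time (share) count" into buffer.
   Hypothetical helper: the column widths are chosen for illustration only.
   Returns the snprintf result (characters needed, excluding the terminator). */
static int demo_format_row(char *buffer, size_t size, const char *label,
                           double seconds, double wall_clock, long count) {
    assert(buffer != NULL && size > 0);
    return snprintf(buffer, size, "%-45s = %8.3f s (%5.2f %%) %6ld",
                    label, seconds, demo_percent(seconds, wall_clock), count);
}
```

For the "LES setup" row, demo_percent(5.649, 32.708) evaluates to roughly 17.27, which matches the percentage shown in the listing.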
What the patch does
The patch adds a set of functions that call the platform-specific timing functions (Windows, Mac, Unix/Linux) and provides wrapper macros for timing your own code blocks. To avoid cluttering the code, all timer-related calls are wrapped in defines, which keeps the original SUNDIALS code mostly unchanged.
How it is implemented
The performance-critical functions are wrapped by a define that expands to a call to StartTimer() before the wrapped function and a call to StopTimer() after it. To avoid local variables, the timer sums are stored in static arrays in which each timer is identified by an index, for example SUNDIALS_TIMER_FEVAL in the code snippet below.
/* original code */
retval = f(tn, zn[0], zn[1], user_data);

/* instrumented code, using timer variable SUNDIALS_TIMER_FEVAL */
SUNDIALS_TIMED_FUNCTION(SUNDIALS_TIMER_FEVAL,
    retval = f(tn, zn[0], zn[1], user_data);
);
The collected sums can be accessed via the function TimerSum() or printed collectively with PrintTimings().
The functions are declared in the include file include/sundials/sundials_timer.h.
The instrumentation overhead is minimal. When the SUNDIALS_USE_INSTRUMENTATION define in include/sundials/sundials_timer.h is commented out, the macro expands to the original code (no overhead). This can be used to evaluate the instrumentation overhead.
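The mechanism described in this section can be sketched in a few lines of C. This is an illustration under stated assumptions, not the patch's implementation: all DEMO_/demo_ names are hypothetical stand-ins for the patch's identifiers, the function signatures are guesses, and the standard clock() function (CPU time) replaces the platform-specific timing functions the patch uses.

```c
#include <assert.h>
#include <time.h>

#define DEMO_USE_INSTRUMENTATION  /* comment out to compile the timers away */

#define DEMO_TIMER_FEVAL 0        /* hypothetical timer index */
#define DEMO_TIMER_COUNT 4        /* assumed number of timer slots */

/* Static arrays indexed by timer id, so call sites need no local variables. */
static double  demo_timer_sums[DEMO_TIMER_COUNT];
static clock_t demo_timer_starts[DEMO_TIMER_COUNT];

static void demo_start_timer(int index) {
    assert(index >= 0 && index < DEMO_TIMER_COUNT);
    demo_timer_starts[index] = clock();
}

static void demo_stop_timer(int index) {
    assert(index >= 0 && index < DEMO_TIMER_COUNT);
    demo_timer_sums[index] +=
        (double)(clock() - demo_timer_starts[index]) / CLOCKS_PER_SEC;
}

/* Accumulated seconds of one timer. */
static double demo_timer_sum(int index) {
    assert(index >= 0 && index < DEMO_TIMER_COUNT);
    return demo_timer_sums[index];
}

#ifdef DEMO_USE_INSTRUMENTATION
/* Instrumented build: bracket the wrapped statements with start/stop calls. */
#define DEMO_TIMED_FUNCTION(index, ...) \
    do { demo_start_timer(index); __VA_ARGS__ demo_stop_timer(index); } while (0)
#else
/* Plain build: the macro expands to the wrapped statements only. */
#define DEMO_TIMED_FUNCTION(index, ...) \
    do { __VA_ARGS__ } while (0)
#endif

/* Stand-in for a right-hand-side function; returns 0 on success. */
static int demo_rhs(double t, double *ydot) {
    *ydot = -2.0 * t;
    return 0;
}

/* The wrapped call is timed, and retval is still assigned as before. */
static int demo_timed_call(void) {
    double ydot = 0.0;
    int retval = -1;
    DEMO_TIMED_FUNCTION(DEMO_TIMER_FEVAL,
        retval = demo_rhs(1.0, &ydot);
    );
    return retval;
}
```

The variadic macro form is a design choice of this sketch: it lets the wrapped statements contain commas outside parentheses, which a fixed two-argument macro would split into extra arguments.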
Instrumenting your own code
In addition to the SUNDIALS-specific timer indices, you can define your own timers. Every timer index must be below SUNDIALS_TIMER_COUNT (also defined in sundials_timer.h).
/*! define own timer index, last official Sundials timer index is 10 */
#define OWN_FUNCTION_TIMER 11

/* instrument my own code */
SUNDIALS_TIMED_FUNCTION(OWN_FUNCTION_TIMER,
    lengthy_function_a();
    lengthy_function_b();
    for (i=0; i<1000; ++i)
        short_function_c();
);
Patch file
sundials_timer_v262.diff (Patch for SUNDIALS Version 2.6.2)
How to apply:
- create a fresh copy of SUNDIALS release version 2.6.2
- change into the sundials source-archive root directory
- enter the patch command:
patch -p0 -i /path/to/sundials_timer_v262.diff
(on Windows you may use a suitable patch tool)
Examples
The example files cvAdvDiff_bnd.c and cvDiurnal_kry.c show the final statistics generated by PrintTimings(). Note that some of the timers are only updated if the corresponding callback functions are registered by the user. Also, when a direct solver is used, the timers for the iterative solver components remain zero, and vice versa.
Limitations
Currently, only CVODE and the direct and iterative LES solvers are instrumented. Adding instrumentation to the other integrators/solvers is straightforward.
Contact and License
The patch is licensed under the same conditions as SUNDIALS itself (see the LICENSE document of SUNDIALS). If you have questions or suggestions, please contact the author:
Andreas Nicolai [andreas -dot- nicolai -at- tu-dresden -dot- de]
Revisions/Suggestions
It was suggested to wrap the entire code block in sundials_timer.* in an ifdef clause that can additionally be controlled by the CMake configuration system, so that the feature can be enabled or disabled at build time.
Discussion points with this approach:
- Software that explicitly calls and links to timer-related functions (e.g. PrintTimings) may need to be changed to check for the SUNDIALS_USE_INSTRUMENTATION define.
- Alternatively, dummy functions could be defined whenever SUNDIALS_USE_INSTRUMENTATION is not defined. However, user code may then not work as expected, e.g. screen layout may be broken if PrintTimings() does not print anything at all.
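The second discussion point can be illustrated with a small sketch. It assumes a hypothetical print function that returns the number of lines it wrote, so that calling code can keep its layout intact when only the dummy is compiled in. The names demo_print_timings and demo_print_report, the FILE* parameter, and the return value are assumptions for illustration; the signature of the patch's PrintTimings() may differ.

```c
#include <assert.h>
#include <stdio.h>

/* #define DEMO_USE_INSTRUMENTATION */  /* enable for the instrumented build */

#ifdef DEMO_USE_INSTRUMENTATION
/* Would be accumulated by the timers in a real instrumented build. */
static double demo_feval_seconds = 0.0;

/* Instrumented build: print the timer table, return the number of lines. */
static int demo_print_timings(FILE *out) {
    assert(out != NULL);
    fprintf(out, "Function evaluations : FEVAL = %g s\n", demo_feval_seconds);
    return 1;
}
#else
/* Non-instrumented build: dummy that prints nothing and reports 0 lines. */
static int demo_print_timings(FILE *out) {
    assert(out != NULL);
    return 0;
}
#endif

/* Caller that prints a placeholder line when no timings were written. */
static int demo_print_report(FILE *out) {
    int lines = demo_print_timings(out);
    if (lines == 0) {
        fprintf(out, "(timings not available in this build)\n");
        lines = 1;
    }
    return lines;
}
```

With this pattern, user code links and runs in both build variants, and the caller decides how to fill the gap instead of silently printing nothing.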