# Profiling Tools
To debug training inefficiencies, we support the following profiling tools:
## PyTorch Autograd Profiler

```bash
hannah-train trainer.max_epochs=1 profiler=pytorch
```
By default, profiles are logged to the training output directory and can be visualized using tensorboard. For the example run use:

```bash
tensorboard --logdir trained_models/test/tc-res8
```
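
These traces come from PyTorch's built-in profiler, exposed through PyTorch Lightning. For orientation, here is a minimal standalone sketch of the same mechanism, assuming PyTorch >= 1.8.1; the model, input shape, and log directory below are illustrative stand-ins, not hannah defaults:

```python
# Minimal standalone sketch of the kind of trace profiler=pytorch produces;
# the model, input shape, and log directory are illustrative assumptions.
import torch
from torch.profiler import ProfilerActivity, profile, tensorboard_trace_handler

model = torch.nn.Sequential(
    torch.nn.Linear(40, 64), torch.nn.ReLU(), torch.nn.Linear(64, 12)
)
inputs = torch.randn(32, 40)

with profile(
    activities=[ProfilerActivity.CPU],
    # Write the trace where tensorboard can pick it up.
    on_trace_ready=tensorboard_trace_handler("trained_models/test/tc-res8"),
) as prof:
    model(inputs).sum().backward()  # one forward/backward pass

# Alternatively, inspect the hottest operators directly on the console:
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```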
## Simple Profiler

```bash
hannah-train trainer.max_epochs=1 profiler=simple
```
Generates a summary of the durations of the individual training steps. The recorded actions nest (for example, `run_training_batch` runs inside `run_training_epoch`), so the percentages add up to more than 100%:
```
Action                               |  Mean duration (s)  |  Num calls  |  Total time (s)  |  Percentage %  |
--------------------------------------------------------------------------------------------------------------
Total                                |  -                  |  _          |  29.42           |  100 %         |
--------------------------------------------------------------------------------------------------------------
run_training_epoch                   |  21.692             |  1          |  21.692          |  73.733        |
run_training_batch                   |  0.030893           |  288        |  8.8971          |  30.241        |
optimizer_step_and_closure_0         |  0.030348           |  288        |  8.7402          |  29.708        |
get_train_batch                      |  0.022253           |  288        |  6.4087          |  21.783        |
evaluation_step_and_end              |  0.021021           |  277        |  5.8228          |  19.792        |
validation_step                      |  0.020913           |  277        |  5.7929          |  19.69         |
training_step_and_backward           |  0.019199           |  288        |  5.5292          |  18.794        |
model_forward                        |  0.010312           |  288        |  2.9697          |  10.094        |
training_step                        |  0.010076           |  288        |  2.9017          |  9.8631        |
model_backward                       |  0.0076062          |  288        |  2.1906          |  7.4458        |
on_validation_epoch_end              |  0.091247           |  2          |  0.18249         |  0.6203        |
on_train_batch_end                   |  0.0003854          |  288        |  0.111           |  0.37728       |
on_train_start                       |  0.09102            |  1          |  0.09102         |  0.30938       |
on_train_epoch_end                   |  0.055033           |  1          |  0.055033        |  0.18706       |
cache_result                         |  1.6399e-05         |  2290       |  0.037554        |  0.12765       |
on_validation_batch_end              |  9.3069e-05         |  277        |  0.02578         |  0.087628      |
on_validation_end                    |  0.008392           |  2          |  0.016784        |  0.05705       |
on_train_end                         |  0.0075968          |  1          |  0.0075968       |  0.025822      |
on_batch_start                       |  2.4213e-05         |  288        |  0.0069733       |  0.023702      |
on_after_backward                    |  1.5198e-05         |  288        |  0.004377        |  0.014877      |
on_before_zero_grad                  |  1.4302e-05         |  288        |  0.0041189       |  0.014         |
on_train_batch_start                 |  1.264e-05          |  288        |  0.0036403       |  0.012374      |
on_batch_end                         |  1.1853e-05         |  288        |  0.0034136       |  0.011603      |
training_step_end                    |  9.123e-06          |  288        |  0.0026274       |  0.0089307     |
on_validation_batch_start            |  8.5754e-06         |  277        |  0.0023754       |  0.008074      |
validation_step_end                  |  7.5117e-06         |  277        |  0.0020807       |  0.0070725     |
on_validation_start                  |  0.00019774         |  2          |  0.00039547      |  0.0013442     |
on_epoch_start                       |  0.00021634         |  1          |  0.00021634      |  0.00073533    |
on_epoch_end                         |  1.4326e-05         |  3          |  4.2978e-05      |  0.00014608    |
on_validation_epoch_start            |  1.0761e-05         |  2          |  2.1523e-05      |  7.3157e-05    |
on_fit_start                         |  2.0443e-05         |  1          |  2.0443e-05      |  6.9486e-05    |
on_train_epoch_start                 |  1.2379e-05         |  1          |  1.2379e-05      |  4.2077e-05    |
on_before_accelerator_backend_setup  |  1.1862e-05         |  1          |  1.1862e-05      |  4.0319e-05    |
```
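
The `profiler=simple` override corresponds to PyTorch Lightning's `SimpleProfiler`. As a point of reference, a minimal plain-Lightning sketch of the equivalent setup (the import path matches the Lightning 1.x releases this output was produced with; model and data module are left out):

```python
# Sketch of the plain PyTorch Lightning equivalent of profiler=simple;
# hannah normally wires this up through its configuration overrides.
import pytorch_lightning as pl
from pytorch_lightning.profiler import SimpleProfiler

profiler = SimpleProfiler()  # records wall-clock duration per hook/action
trainer = pl.Trainer(max_epochs=1, profiler=profiler)
# trainer.fit(model, datamodule)  # the summary table is printed after fit()
```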
## Advanced Profiler

```bash
hannah-train trainer.max_epochs=1 profiler=advanced
```
Creates Python-level performance summaries for the individual training steps:
```
Profile stats for: optimizer_step_and_closure_0
        8064 function calls in 8.897 seconds

  Ordered by: cumulative time
  List reduced from 25 to 20 due to restriction <20>

  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     288    0.001    0.000    8.897    0.031 accelerator.py:265(optimizer_step)
     288    0.000    0.000    8.896    0.031 accelerator.py:282(run_optimizer_step)
     288    0.000    0.000    8.896    0.031 training_type_plugin.py:159(optimizer_step)
     288    0.000    0.000    8.896    0.031 grad_mode.py:12(decorate_context)
     288    0.000    0.000    8.894    0.031 adamw.py:54(step)
     288    0.000    0.000    8.893    0.031 training_loop.py:648(train_step_and_backward_closure)
     288    0.000    0.000    8.893    0.031 training_loop.py:737(training_step_and_backward)
     288    0.000    0.000    8.892    0.031 contextlib.py:108(__enter__)
     288    0.000    0.000    8.892    0.031 {built-in method builtins.next}
     288    0.000    0.000    8.892    0.031 profilers.py:62(profile)
     288    0.000    0.000    8.892    0.031 profilers.py:250(start)
     288    8.892    0.031    8.892    0.031 {method 'enable' of '_lsprof.Profiler' objects}
     288    0.000    0.000    0.001    0.000 grad_mode.py:65(__enter__)
     288    0.000    0.000    0.000    0.000 accelerator.py:104(lightning_module)
     288    0.000    0.000    0.000    0.000 contextlib.py:238(helper)
     288    0.000    0.000    0.000    0.000 training_type_plugin.py:91(lightning_module)
     288    0.000    0.000    0.000    0.000 grad_mode.py:104(__enter__)
     288    0.000    0.000    0.000    0.000 contextlib.py:82(__init__)
     576    0.000    0.000    0.000    0.000 {built-in method torch._C.is_grad_enabled}
     288    0.000    0.000    0.000    0.000 base.py:82(unwrap_lightning_module)
```
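
The advanced profiler is backed by Python's built-in `cProfile` (visible above as `_lsprof.Profiler`). The same style of report can be reproduced with the standard library alone; `training_step` below is a hypothetical stand-in for real training work:

```python
# Reproducing the report format above with Python's standard library;
# training_step is a hypothetical stand-in for one unit of training work.
import cProfile
import pstats


def training_step():
    return sum(i * i for i in range(100_000))


profiler = cProfile.Profile()
profiler.enable()
for _ in range(288):
    training_step()
profiler.disable()

# Sort by cumulative time and print the 20 hottest entries, matching
# "Ordered by: cumulative time ... restriction <20>" in the report above.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
```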