Sometimes we write or use code that does not run as fast as we would like. If we want to improve execution speed, intuition alone is often not enough to locate where the time is actually spent. This is where profiling comes in handy: we can find the offending function call and check whether we can do something about it.

Assuming the code is written in Python and can be launched from the command line, we can use the built-in profiler to find which functions take the most cumulative time, without any modification to the code. This is good!

python -m cProfile -s cumulative zero_shot_pose.py outputs/experiment_synchflip_next --subset_size 200 > profile.txt

We get an output (profile.txt) listing every function that was called, sorted by cumulative time, together with call counts and per-call times.
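If running the whole script through the command line is inconvenient, the same profiler can also be driven from inside the code using the standard library. Here is a minimal sketch; the workload() function is a made-up stand-in for the actual hot loop, not code from the experiment:

import cProfile
import pstats
import numpy as np

def workload():
    # Stand-in for the expensive part: repeated matrix multiplications.
    a = np.random.rand(500, 500)
    b = np.random.rand(500, 500)
    for _ in range(50):
        a = a @ b

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Same report as the command-line version: sort by cumulative time,
# print the top 10 entries.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)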

Cool! Now we know that we spend most of the time multiplying matrices in NumPy (and sorting), and quite some time parsing the input file with Pandas. I don’t care about the parsing because it happens only once, but the matrix multiplication will be repeated on larger runs. So I’ll check whether I can minimize it somehow, or use an accelerated library that runs it with CUDA or OpenCL, such as CuPy, which is a drop-in replacement for NumPy that uses CUDA.
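As a rough sketch of what the swap looks like, assuming a CUDA-capable GPU and the cupy package are available (the array sizes here are arbitrary, not the ones from the experiment):

import numpy as np
import cupy as cp  # requires CUDA and the cupy package

# CPU version of the hot spot: matrix multiplication in NumPy.
a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)
c_cpu = a @ b

# GPU version: move the arrays to the device and use the same operators.
a_gpu = cp.asarray(a)
b_gpu = cp.asarray(b)
c_gpu = a_gpu @ b_gpu          # runs on the GPU via CUDA
c_back = cp.asnumpy(c_gpu)     # copy the result back to host memory if needed

Because the APIs mirror each other, the change is often limited to swapping the import and converting arrays at the boundaries; the main extra cost to watch is the transfer of data between host and device.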

Final word: in this case, using CuPy increased speed by 2.5x to 5x, depending on the test scenario. That is enough to save hours of running time, at the expense of a 2-3 hour investigation.
