Better parallelization for kprofefe: split gathering profiles from saving them to profefe

The parallelization introduced in #10 helped speed up the gathering of profiles, but the end solution should be two separate pipelines: one to gather profiles and one to save them.

Those two operations can be very different in terms of timing, so it would also be good to have two separate, configurable concurrency limits, one per pipeline.