My Quest: Sort 100 Million Integers With OpenMP
Over the past couple of weeks I have been teaching myself OpenMP. I wonted to see how much faster I could make my previous Quick Sort with Insertion Sort implementation. What is so great about OpenMP is it allows you to transform a single threaded code bit into a multi-threaded one with just a few lines of code. Here is my implementation:
void QuickSortImpl::Sort(vector< int >::iterator start, vector< int >::iterator end) { int numProcs = omp_get_num_procs(); omp_set_num_threads(numProcs); #pragma omp parallel { #pragma omp single nowait Sort_h(start, end-1); } } void QuickSortImpl::Sort_h(vector< int >::iterator start, vector< int >::iterator end) { if(start < end) { if(end - start < 20) { InsertionSort(start, end); } else { int q = Partition(start, end); #pragma omp task { Sort_h(start, start + (q - 1)); } Sort_h(start + (q + 1), end); } } } int QuickSortImpl::Partition(vector< int >::iterator start, vector< int >::iterator end) { int partitionIndex = (end - start)/2; int privotValue = start[partitionIndex]; int tmp = start[partitionIndex]; start[partitionIndex] = *end; *end = tmp; int swapIndex = 0; for (int i = 0; i < end - start; i++) { if(start[i] <= privotValue) { tmp = start[swapIndex]; start[swapIndex] = start[i]; start[i] = tmp; swapIndex++; } } tmp = start[swapIndex]; start[swapIndex] = *end; *end = tmp; return swapIndex; } void QuickSortImpl::InsertionSort(vector< int >::iterator start, vector< int >::iterator end) { for(int j = 1; j < = end - start; j++) { int key = start[j]; int i = j - 1; while(i >= 0 && start[i] > key) { start[i+1] = start[i]; i = i - 1; } start[i+1] = key; } }
I chose to only task off one of my recursive calls in the sort_h routine. This is because there is a small overhead associated with sending off a task to the queue of waiting threads to execute. The actual improvement to performance was fairly minimal at about 0.2-0.3 seconds. This trick would have been more noticeable if there was more iterations of sort_h.
This implementation sorted my random list of integers in roughly 5.5 seconds. That means it is roughly a factor of three speed up over the single threaded version. It also better utilizes my desktop's CPUs but spiking all four cores for a few seconds to complete the operation.
Besides the pretty impressive speed up, what strikes me the most interesting is how little code changed from my previous implementation. Literally 4 additional lines of code was required to turn a single threaded Quick Sort into a parallel Quick Sort that scales pretty well with the number of CPU's you have (on most desktops). This just goes to show you the power of OpenMP. I highly recommend anyone needing to write multi-threaded c++ code to look at OpenMP.
Download: OpenMPQuickSort
No comments yet.