My Quest: Sort 100 Million Integers With OpenMP

Over the past couple of weeks I have been teaching myself OpenMP. I wonted to see how much faster I could make my previous Quick Sort with Insertion Sort implementation. What is so great about OpenMP is it allows you to transform a single threaded code bit into a multi-threaded one with just a few lines of code. Here is my implementation:

 
void QuickSortImpl::Sort(vector< int >::iterator start, vector< int >::iterator end)
{
    int numProcs = omp_get_num_procs();
    omp_set_num_threads(numProcs);
    #pragma omp parallel
    {
        #pragma omp single nowait
        Sort_h(start, end-1);
    }
}
 
void QuickSortImpl::Sort_h(vector< int >::iterator start, vector< int >::iterator end)
{
    if(start < end)
    {
        if(end - start < 20)
        {
            InsertionSort(start, end);
        }
        else
        {
            int q = Partition(start, end);
            #pragma omp task
            {
                Sort_h(start, start + (q - 1));
            }
            Sort_h(start + (q + 1), end);
        }
    }
}
 
int QuickSortImpl::Partition(vector< int >::iterator start, vector< int >::iterator end)
{
    int partitionIndex =  (end - start)/2;
    int privotValue = start[partitionIndex];
    int tmp = start[partitionIndex];
    start[partitionIndex] = *end;
    *end = tmp;
    int swapIndex = 0;
    for (int i = 0; i < end - start; i++)
    {
        if(start[i] <= privotValue)
        {
            tmp = start[swapIndex];
            start[swapIndex] = start[i];
            start[i] = tmp;
            swapIndex++;
        }
    }
    tmp = start[swapIndex];
    start[swapIndex] = *end;
    *end = tmp;
    return swapIndex;
}
void QuickSortImpl::InsertionSort(vector< int >::iterator start, vector< int >::iterator end)
{
    for(int j = 1; j < = end - start; j++)
    {
        int key = start[j];
        int i = j - 1;
        while(i >= 0 && start[i] > key)
        {
            start[i+1] = start[i];
            i = i - 1;
        }
        start[i+1] = key;
    }
}
 

I chose to only task off one of my recursive calls in the sort_h routine. This is because there is a small overhead associated with sending off a task to the queue of waiting threads to execute. The actual improvement to performance was fairly minimal at about 0.2-0.3 seconds. This trick would have been more noticeable if there was more iterations of sort_h.

This implementation sorted my random list of integers in roughly 5.5 seconds. That means it is roughly a factor of three speed up over the single threaded version. It also better utilizes my desktop's CPUs but spiking all four cores for a few seconds to complete the operation.

Besides the pretty impressive speed up, what strikes me the most interesting is how little code changed from my previous implementation. Literally 4 additional lines of code was required to turn a single threaded Quick Sort into a parallel Quick Sort that scales pretty well with the number of CPU's you have (on most desktops). This just goes to show you the power of OpenMP. I highly recommend anyone needing to write multi-threaded c++ code to look at OpenMP.

Download: OpenMPQuickSort

  1. No comments yet.

  1. No trackbacks yet.

Please leave these two fields as-is:

Spam protection by WP Captcha-Free