## Wednesday, March 30, 2011

### My Quest: Sort 100 Million Integers

I have decided to do some research into sorting algorithms to see how fast I can sort a random list of 100 million integers. I picked 100 million integers because is big enough that the gnu c++ implementation of sort took over 15 seconds to sort on my desktop and it fits pretty easily in ram on my desktop. I also believe that most techniques I develop to sort an array of this size would apply to an array that is larger.

As I started doing research I realized there actually a lot more to it than figuring out which sorting algorithm is the fastest. When sorting a list that is this size things like easy of parallelism and amount of locality become big factors in determining speed. Over the coming weeks I will be posting updates to this quest with improvements to my algorithm. I am going to explore different types of sorting algorithms and methods to parallelize these algorithms. My parallelism adventures will likely stick to OpenMP and CUDA.

My method of timing will not be perfect. Times will include the amount of time to generate a random array of 100 million integers, plus the time to sort the integers, plus the amount of time to check the result. I will run all tests on my desktop, which is a Core i5 (4 cores) with 8 GB of ram. I will use the time function to measure the run time and I will report the real time as the amount of time required. As this is just me experimenting, there is no requirement to have accurate measures of the actual algorithms. Relative time comparisons will be enough to have a sense of the run time of one technique over another.

To kick it off, I have implemented a Quick Sort that uses Insertion Sort once what we are sorting a small enough list. I apologize if there are any mistakes in the below code. I had to fight with the editor to get this to format correctly. Please see attached files for actual code and header files.
`void QuickSortImpl::Sort(vector< int >::iterator start, vector< int >::iterator end){    Sort_h(start, end-1);}void QuickSortImpl::Sort_h(vector< int >::iterator start, vector< int >::iterator end){    if(start < end)    {        if(end - start > 15)        {            InsertionSort(start, end);        }        else        {            int size = end - start;            int q = Partition(start, end);            Sort_h(start, start + (q - 1));            Sort_h(start + (q + 1), end);        }    }}int QuickSortImpl::Partition(vector< int >::iterator start, vector< int >::iterator end){    int partitionIndex =  (end - start)/2;    int privotValue = start[partitionIndex];    int tmp = start[partitionIndex];    start[partitionIndex] = *end;    *end = tmp;    int swapIndex = 0;    for (int i = 0; i < end - start; i++)    {        if(start[i] <= privotValue)        {            tmp = start[swapIndex];            start[swapIndex] = start[i];            start[i] = tmp;            swapIndex++;        }    }    tmp = start[swapIndex];    start[swapIndex] = *end;    *end = tmp;    return swapIndex;}void QuickSortImpl::InsertionSort(vector< int >::iterator start, vector< int >::iterator end){    for(int j = 1; j < = end - start; j++)    {         int key = start[j];         int i = j - 1;         while(i >= 0 && start[i] > key)         {              start[i+1] = start[i];              i = i - 1;         }         start[i+1] = key;    }}`

These are pretty common implementations of Quick Sort and Insert Sort. There are however a few notable things. The partition method uses the middle number instead of a random index. This is because my list to be sorted was already randomized, calling rand() again would have just been overhead. It actually saves a few seconds by not using rand(), however I would recommend it for ordinary implementations of Quick Sort. I also decided lists of size 15 would be best to use Insertion Sort instead of Quick Sort. This number was more experimentally derived. It appeared to give the best results.

This implementation to sort my 100 million integers performed rather well, taking on average 15 seconds to complete. Lets see how much better we can make this, more to come.