Wednesday, March 23, 2011

Finding Median in (Almost) Linear Time

I sometimes find myself needing to find the median of a set of numbers. What I would do was sort the list of numbers and then take the value in the middle index, something like(in Java):
    public static int FindMedian(List <Integer> nums)
{
Collections.sort(nums);
if(nums.size() % 2 == 0)
{
return (nums.get(nums.size() / 2) + nums.get((nums.size() / 2) + 1)) / 2;
}
return nums.get(nums.size() / 2);
}

This solution is on the order of O(n lg n) where n is the number of elements. A faster method would be use a modified quick sort. The idea being that after each partition we gain some knowledge about how many numbers are less than or equal to a given number in the list. So we can be smart and only iterate on the side of the list where we know the selected number is. Here is my implementation of QuickSelect in Java:

public class QuickSelect {
public static < T extends Comparable > T quickSelect(List < T > values, int k)
{
int left = 0;
int right = values.size() - 1;
Random rand = new Random();
while(true)
{
int partionIndex = rand.nextInt(right - left + 1) + left;
int newIndex = partition(values, left, right, partionIndex);
int q = newIndex - left + 1;
if(k == q)
{
return values.get(newIndex);
}
else if(k < q)
{
right = newIndex - 1;
}
else
{
k -= q;
left = newIndex + 1;
}
}
}
private static < T extends Comparable> int partition(List < T > values, int left, int right, int partitionIndex)
{
T partionValue = values.get(partitionIndex);
int newIndex = left;
T temp = values.get(partitionIndex);
values.set(partitionIndex, values.get(right));
values.set(right, temp);
for(int i = left; i < right; i++)
{
if(values.get(i).compareTo(partionValue) < 0)
{
temp = values.get(i);
values.set(i, values.get(newIndex));
values.set(newIndex, temp);
newIndex++;
}
}
temp = values.get(right);
values.set(right, values.get(newIndex));
values.set(newIndex, temp);
return newIndex;
}
}

From here implementing FindMedian should be pretty straight forward. The nice thing about this algorithm is that it can find any kth largest number in a set. Plus on average this will run in O(n) with a worst case of O(n lg n). Enjoy!