Wednesday, March 12, 2008

Finding the median value in SQL

Most SQL dialects don't come with a median function. But one can be cobbled together using the procedural programming language that accompanies most databases.

They work like this:
1. Sort the list of values.
2. Count the number of items in the list.
3. Select the element at position n/2.

A function like this can't be expressed purely in SQL because SQL is not procedural.

I've come up with a median function that uses pure SQL:

select age
from person
where (select count(age) from person p1 where p1.age >= person.age) -
(select count(age) from person p2 where p2.age <= person.age) between 0 and 1;

This technique uses two correlated subqueries. The outer query works through the list of values (in this case, age). As it considers each age in turn, it counts the number of ages that are greater than the target age and the number of ages that are less. The difference between the two should be 0 or 1 (depending on whether there are an odd or even number of ages to consider).

When the difference is 0 or 1, you've found the median value.

This solution is O(n2). Expect it to take a long time on large data sets.

Sunday, March 9, 2008

I'm so excited, and I just can't hide it!

As you know from one of my previous posts, I have a list of things I'd like to do during my lifetime. My upcoming sabbatical is going to give me some elbow room to get a few of the items checked off!

During my spring break, I'm going to make a hologram. I've already got all the raw materials: laser, holographic plates, chemicals. After a quick trip to Home Depot, I'll have the pieces needed to build the table on which holograms are made.

Right after school lets out, in May, I'm heading to Florida to watch the next Space Shuttle liftoff. I've built into my schedule a few extra days in case it doesn't blast off on time. As long as liftoff is within three days of the scheduled date, I'll be OK.

2008 is going to be a great year!