Monthly Archives: September 2013

Bayesian calculations using MySQL 5.0.1+ Views (ratings)

I’ve been incredibly lax about posting, but I’m going to try to write a new article every week. I have more than enough material, what I’m short on is time. That being said:

One of the sites I manage allows users to cast ratings for other members. It’s always been sort of a kludge but it worked. However, it was far from perfect. I recently had the time to improve it substantially by using Bayesian calculations and a feature of MySQL 5.0.1+ called a view.

For all intents and purposes, a View is a Stored Procedure (for those familiar with Oracle, Sybase or Microsoft SQL).

Bayesian calculations are pretty simple but they can be very CPU intensive. Take a simple example. Say two users have all ratings of 10. User A has six votes and User B has 7 votes. Basically, we want to see which user is more popular over all.

What we have is essentially a table of all votes and we need to track the average number of votes multiply by the average rating. We then add the individuals total votes and multiply that by their rating. We then divide that by the average number of votes plus the number of votes this individual has. The equation looks like this:

((avg_num_votes * avg_rating) + (this_num_votes * this_rating)) / (avg_num_votes + this_num_votes)

I created the view like this: (this assumes we have an existing table in the dB to track ratings, where the columns is “user_account”, the person the vote was cast for. “user” which is the person who cast the vote, and “rating” which is the rating value left by “user” for “user_account”)

CREATE 
    ALGORITHM = UNDEFINED 
    DEFINER = `jon`@`%` 
    SQL SECURITY DEFINER
VIEW `ratings_bayesian` AS
    SELECT 
        `ratings`.`user_account` AS `user_account`,
        ((SELECT 
                COUNT(`ratings`.`user_account`)
            FROM
                `ratings`) / (SELECT 
                COUNT(DISTINCT `ratings`.`user_account`)
            FROM
                `ratings`)) AS `avg_num_votes`,
        (SELECT 
                avg(`ratings`.`rating`)
            FROM
                `ratings`) AS `avg_rating`,
        COUNT(`ratings`.`user_account`) AS `this_num_votes`,
        avg(`ratings`.`rating`) AS `this_rating`
    FROM
        `ratings`
    GROUP BY `ratings`.`user_account`

You call views in MySQL just like you would an ordinary table. The difference is that all of the heavy lifting is handle as a background process.

Now the Display Query is simply

SELECT User_account, ((avg_num_votes * avg_rating) + (this_num_votes * this_rating)) / (avg_num_votes + this_num_votes)
as real_rating FROM `urnawm`.`ratings_bayesian` ORDER BY real_rating DESC

We now have a weighted ratings system that processes in milliseconds and provides an incredibly accurate real time snap shot of your ratings system.

For more information on Bayesian methods and theory, check out http://en.wikipedia.org/wiki/Bayesian.