It's been a while since I've posted anything, as life has been very quiet because of college. However, I just had to get up on my high horse about Revenue's new Local Property Tax tool and its property valuations.
The Tool
First of all, I want to divorce the tool from the content and say that it is great to see them using maps and visualization to get their message across. This sort of thing has been common in the mash-up community for some time: overlaying statistics or information on a searchable map. I would question their choice of colours, as it makes it very difficult to associate a particular hue with its corresponding band, but there is an option to click on a zone and get the relevant information.
The Content
I really am jumping on the bandwagon with the criticism of the valuation process, but when I look at the area I grew up in, it seems to have a blanket band of €350,000. That is a boom-era price, not the reality we have five years on.
With resources like the Property Price Register (a publicly available legal record of the prices houses have sold for over the last three years), I'm surprised they didn't use this information to get a better sense of likely values. In addition, there is a wide degree of variation within the dwelling types.
From a quick check on the PPR, I see prices closer to the €250,000 mark being transacted in the recent past. I also notice that the boundary seems to be defined by the voting district, which to my mind doesn't adequately describe or cluster similar properties together. I suspect that if clustering were done properly we would wind up with many more, smaller zones.
What could be done to improve the valuations
We know the closing prices of recently transacted properties, and there should be some record somewhere (Land Commission, planning permission, county council) of each property. This could be cross-referenced with key variables such as property type, number of rooms, number of storeys and square footage to construct a more granular model of the accommodation stock. A clustering algorithm could then break areas down into smaller, more similar subgroups, which would make the valuations more relevant; a rough sketch of the idea is below.
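As a rough illustration of what I mean (certainly not Revenue's method, and with made-up features and toy data), a simple k-means clustering over sale prices and basic property characteristics might look something like this in Java:

import java.util.Arrays;
import java.util.Random;

// A minimal k-means sketch: cluster properties on price, rooms and floor area.
// The feature set and figures are invented purely for illustration.
public class PropertyClusters {

    // Squared Euclidean distance; fine for comparing which centroid is nearest.
    static double dist(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    static int[] kMeans(double[][] data, int k, int iterations) {
        Random rnd = new Random(42);
        double[][] centroids = new double[k][];
        for (int i = 0; i < k; i++) {
            centroids[i] = data[rnd.nextInt(data.length)].clone();
        }
        int[] assignment = new int[data.length];
        for (int iter = 0; iter < iterations; iter++) {
            // Assignment step: each property joins its nearest centroid.
            for (int i = 0; i < data.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (dist(data[i], centroids[c]) < dist(data[i], centroids[best])) {
                        best = c;
                    }
                }
                assignment[i] = best;
            }
            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][data[0].length];
            int[] counts = new int[k];
            for (int i = 0; i < data.length; i++) {
                counts[assignment[i]]++;
                for (int j = 0; j < data[i].length; j++) {
                    sums[assignment[i]][j] += data[i][j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // leave an empty cluster's centroid alone
                for (int j = 0; j < sums[c].length; j++) {
                    centroids[c][j] = sums[c][j] / counts[c];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // price (thousands), rooms, square metres; scale features in real use
        double[][] properties = {
            {250, 3, 95}, {255, 3, 100}, {340, 4, 140},
            {150, 2, 60}, {360, 4, 150}, {145, 2, 55}
        };
        int[] zones = kMeans(properties, 3, 20);
        System.out.println(Arrays.toString(zones));
    }
}

Each resulting cluster would be a candidate "zone" of genuinely comparable properties, rather than a voting district.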
What homeowners can do to argue their case
If your house was transacted in the last 2-3 years then you have a very strong basis on which to argue your band, unless you feel that there has been some serious negative equity.
For others, the most obvious thing is to look at the Property Price Register over the last three years and spot properties that are nearby. It would also be helpful to work out how those houses are similar to or different from your own.
In addition, checking out the prices on DAFT for nearby properties will also help you work out a fairer market price.
There seems to be significant scope for people to argue against the Revenue Commissioners' valuations, but this also means a lot of work for the Commissioners to investigate and respond.
Hopefully they will publish an updated set of valuations before next year so that we can see a more realistic picture of property price declines.
Sunday, February 19, 2012
Welford's One-Pass Algorithm
Howdy!
I've continued to be quiet recently as I am back at college, so a lot of my time is taken up trying to understand the material.
This semester I am taking Simulation and Data Mining. The maths is a lot heavier than last semester; most of it is covered in the form of proofs in the Simulation course. In addition, I am re-learning Java programming pretty much from scratch, so as I understand the maths bits I then try to code them up in Java or implement them in an assignment solution.
We learned about Welford's One-Pass Algorithm during the week, and it turns out that it has to be implemented in an assignment. The algorithm is an efficient way of calculating the mean, variance and standard deviation of a sample on the fly. Normally, if we were working on paper or in Excel, we would store the data first and then run back through it, calculating the individual parts of the equation and then adding those up to get the final answers. This algorithm allows us to compute the mean and variance as we iterate through a loop.
The real value, from a computing-resources point of view, is that we don't have the memory overhead of storing the individual parts of the formula first and calculating afterwards.
I wasted a lot of time searching the net yesterday, in a hungover state, trying to find a way to code it up, as I thought that one of the variables was dependent on the other two and so would not compute properly. Hopefully this blog entry will help anyone searching for it in plain English.
Standard equations
The formula for the mean is:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

That is, the sum of the individual observations divided by n, the number of observations.
The formula for the variance is:

$$s^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

(some texts divide by n − 1 to get the unbiased sample variance). Standard deviation is simply the square root of s² above.
As you can imagine, with the above you would have to store the values first, then send them through a loop to compute the individual differences, and finally do the aggregate calculation. Very costly in terms of memory and code.
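For contrast, here is what that naive two-pass approach looks like (my own sketch in Java, not from the course notes): the whole sample has to sit in memory before anything can be computed.

// Naive two-pass mean and variance: the entire sample is stored first.
public class TwoPass {
    static double[] meanAndVariance(double[] data) {
        int n = data.length;
        double sum = 0.0;
        for (double x : data) {   // first pass: the mean
            sum += x;
        }
        double mean = sum / n;
        double sumSq = 0.0;
        for (double x : data) {   // second pass: squared deviations
            double d = x - mean;
            sumSq += d * d;
        }
        return new double[] { mean, sumSq / n }; // divide by (n - 1) for the sample variance
    }

    public static void main(String[] args) {
        double[] result = meanAndVariance(new double[] { 2, 4, 4, 4, 5, 5, 7, 9 });
        System.out.println("mean=" + result[0] + ", variance=" + result[1]);
    }
}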
Welford's One-Pass Algorithm
Welford's algorithm gets around this by applying small incremental updates on the fly. This removes the need to store the data; it simply looks at the deltas.
The algorithm is below; I have used a general notation which you can apply to whatever language you need.
int n = 0;          // number of observations seen so far
double xbar = 0.0;  // running mean
double v = 0.0;     // running sum of squared deviations (the variance numerator)
double x = 0.0;     // next observation
double d = 0.0;     // difference between x and the old mean

For every observation:
{
    x = getNextData();
    n++;
    d = x - xbar;           // deviation from the old mean
    xbar = xbar + d / n;    // update the mean first
    v = v + d * (x - xbar); // accumulate using the old and the new deviation
}
v = v / n; ***

*** I made this correction after trying to run this algorithm a second time on a different machine. Two further gotchas: the mean has to be updated before the accumulation step, and in Java ^ is bitwise XOR rather than a power operator, so the product must be written out as d * (x - xbar) rather than d^2.
One problem I had was that if you are working through an array and use the array index as n, the first index is 0, so the d/n step divides by zero and the entire calculation returns NaN. Adding 1 to the index to make it a 1-based count (or incrementing n before the division, as above) sorts it out.
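Putting it all together, here is a small self-contained Java version run over an array (my own wording of the same algorithm; note that n = i + 1 is the 1-based count that avoids the zero-index problem):

// Welford's one-pass mean, variance and standard deviation over an array.
public class Welford {
    public static void main(String[] args) {
        double[] data = { 2, 4, 4, 4, 5, 5, 7, 9 };
        double xbar = 0.0; // running mean
        double v = 0.0;    // running sum of squared deviations
        for (int i = 0; i < data.length; i++) {
            int n = i + 1;                // 1-based count: no division by zero
            double d = data[i] - xbar;    // deviation from the old mean
            xbar = xbar + d / n;
            v = v + d * (data[i] - xbar); // uses the updated mean
        }
        double variance = v / data.length; // population variance
        double stdDev = Math.sqrt(variance);
        // Expect mean ≈ 5.0, variance ≈ 4.0, standard deviation ≈ 2.0
        System.out.println(xbar + " " + variance + " " + stdDev);
    }
}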
You can also calculate the covariance quite easily. The sample covariance is:

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{n - 1}$$

You can collect the running sums of x, y and x·y as you iterate through the loop; the final arithmetic can be done after the fact.
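As a quick sketch of that (my own illustration, with toy data), the three running sums are all you need to keep:

// One-pass covariance via running sums of x, y and x*y.
public class Covariance {
    static double covariance(double[] x, double[] y) {
        int n = x.length; // assumes x and y have the same length
        double sumX = 0.0, sumY = 0.0, sumXY = 0.0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
        }
        return (sumXY - (sumX * sumY) / n) / (n - 1); // sample covariance
    }

    public static void main(String[] args) {
        double[] x = { 1, 2, 3, 4 };
        double[] y = { 2, 4, 6, 8 };
        System.out.println(covariance(x, y)); // perfectly correlated toy data
    }
}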