Tuesday, January 27, 2015

New Mortgage rules and the salary multiple cap

According to RTÉ, the Central Bank rules for new buyers would require a deposit of 10% on the first 220k and 20% thereafter, so on a 300k house (not unrealistic in Dublin) you'd need a deposit of 38k, which equates to 12.7%.
However, there was also mention of a borrowing cap of 3.5 times salary, so if you want to buy that 300k house you will need a combined income of nearly 75k.
Hopefully house prices will reduce, but even with the relaxed deposit rule for first-time buyers the income multiple threshold may cut them out of the market.
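
For anyone who wants to try other house prices, here is a rough Python sketch of the arithmetic above. The function names and default parameters are my own, and the rules are only as reported (10% up to 220k, 20% above it, a 3.5 times income cap), so treat it as an illustration rather than the official calculation.

```python
def min_deposit(price, threshold=220_000, low_rate=0.10, high_rate=0.20):
    """Deposit under the tiered rule: low_rate up to threshold, high_rate above it."""
    lower = min(price, threshold)
    upper = max(price - threshold, 0)
    return lower * low_rate + upper * high_rate


def required_income(price, deposit, income_multiple=3.5):
    """Combined income needed so that the loan stays within the income multiple cap."""
    return (price - deposit) / income_multiple


price = 300_000
deposit = min_deposit(price)
income = required_income(price, deposit)
print(f"Deposit: {deposit:,.0f} ({deposit / price:.1%} of price)")
print(f"Income needed: {income:,.0f}")
```

For the 300k example this gives a deposit of about 38k and a required income of just under 75k, matching the figures above.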

Friday, October 17, 2014

Analyzing a cross section of the Irish secondhand car market - Part 1

I've been planning to learn a bit more about R and I'm also in the market for a new car, so what better time to play around with the data and look at what's available in the Irish car market.

Getting the data
I used Python and the Beautiful Soup package to gather some high level details about the cars available in Ireland. The details that the script pulled were limited to:

  • Make
  • Model
  • Engine
  • Seller Type
  • County
  • Mileage
  • Year
  • Price
Selecting the data
I reduced the list to cars 20 years old and newer. This produced a list of 12990 cars on offer.
The histogram of cars by year was produced in R with: hist(irl_cars$Year)

I focused on a range of makes and models which excluded high-end vehicles. I acknowledge that this was partly to reduce the skewness of the distribution, but it was also because this analysis is focused on cars one might see regularly.

  • BMW
  • Fiat
  • Ford
  • Honda
  • Hyundai
  • Mercedes-Benz
  • Nissan
  • Opel
  • Peugeot
  • Renault
  • Seat
  • Skoda
  • Toyota
  • Volkswagen
  • Volvo

lablist <- as.vector(unique(modCars$Maker))
counts <- table(modCars$Maker)
plot(counts, xaxt="n", main="Qty of Cars by Make")
text(1:15, par("usr")[3], labels=lablist, srt=90, pos=2, xpd=TRUE)

Looking at the mileage, it looks like there are a number of cars above 500,000 miles.
While this is not impossible, it seems unlikely that cars are doing so many miles, so I plotted out the number of miles by year for any cars above 100k:

plot(limCar$Year, limCar$Mileage, main="Mileage plot by Year", xlab="Year", ylab="Miles")
There is a very clear separation (which I have highlighted with a black line) where the bulk of observations lie below the 400,000 mile mark. There is still a great deal of dispersion above this line, with some values looking high but not necessarily unreasonable. At this point some common sense might help. The maximum mileage is 2,500,002 miles for an 11 year old car. This implies that the car did 622 miles a day, every day, for 11 years; 60 miles an hour for 10 hours a day? That seems very unlikely, so it's probably safe enough to discard this as a typo.
What about the ones of 1,000,000 miles and over? The earliest one of those is 13 years old, so applying the same logic this would imply 210 miles a day, every day, for 13 years, which seems equally unlikely.
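
The sanity check above is just a division, but here it is as a small Python sketch in case you want to try it on other listings. The function name is mine, and it ignores leap years.

```python
def miles_per_day(total_miles, age_years, days_per_year=365):
    """Average daily mileage implied by an odometer reading and the car's age."""
    return total_miles / (age_years * days_per_year)


# The two cases discussed above: (odometer reading, age in years)
for miles, age in [(2_500_002, 11), (1_000_000, 13)]:
    print(f"{miles:,} miles over {age} years = {miles_per_day(miles, age):.1f} miles a day")
```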

If the above seems too much like intuition, we can apply something a little more reasoned, and I can try out 2 different methods of outlier detection to boot:

  1. Mean and standard deviation
  2. Median and median absolute deviation
Method 1 is more common, but method 2 is more robust when there are large outliers.
In both cases I am using a k-factor of 3 as the threshold to detect outliers.
1) mean(modCars$Mileage) = 75841.97; sd(modCars$Mileage) = 53810.45
75841.97 + (3*53810.45) = 237,273.3 miles
2) median(modCars$Mileage) = 74936; mad(modCars$Mileage) = 43471.31
74936 + (3*43471.31) = 205,349.9 miles
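
To show why the second method copes better with big outliers, here is a minimal Python sketch of both thresholds. The mileage list is made up for illustration (it is not the scraped data), the function names are mine, and the 1.4826 scaling constant is the default that R's mad() applies.

```python
import statistics

MAD_SCALE = 1.4826  # default scaling constant used by R's mad()


def mean_sd_threshold(values, k=3):
    """Upper cut-off: mean plus k sample standard deviations."""
    return statistics.mean(values) + k * statistics.stdev(values)


def median_mad_threshold(values, k=3):
    """Upper cut-off: median plus k scaled median absolute deviations."""
    med = statistics.median(values)
    mad = MAD_SCALE * statistics.median([abs(v - med) for v in values])
    return med + k * mad


# Hypothetical mileages with one obvious typo at the end
mileages = [30000, 45000, 60000, 75000, 80000, 95000, 110000, 130000, 2500002]

flagged_by_mean = [m for m in mileages if m > mean_sd_threshold(mileages)]
flagged_by_median = [m for m in mileages if m > median_mad_threshold(mileages)]
print("Mean/SD flags:", flagged_by_mean)
print("Median/MAD flags:", flagged_by_median)
```

In this toy sample the single huge value inflates both the mean and the standard deviation so much that method 1 flags nothing, while method 2 still catches it.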

Method 1 exposes 32 cars and method 2 exposes 78 cars; my "intuition" exposed 13 cars though it was based on a simple visual inspection. I'll create 3 data sets for the follow up posts.

Sunday, August 3, 2014

The Birthday paradox at a wedding

I know I've been very quiet of late, but I started a new job 4 months ago and have been pouring myself into it to learn as much as I can. More importantly, I've been preparing for my wedding to the lovely Lelly Ann, though to be honest she has been doing most of the tough work so she deserves all the glory!

The big day is only 5 days away now. I had great hopes of writing a cool optimizer that would seat guests based on their affinity but unfortunately laziness and an over-estimation of my programming skills got in the way. However I have been thinking about the birthday paradox recently and since there will be over 100 people in the room next Friday I thought it was a nice anecdote.

The Birthday paradox arises from the chances of two or more people in a group having the same birthday. Given that there are 365 days in a year (ignoring leap years for simplicity's sake), you would think that the chances that any 2 people might have the same birthday would be extremely low; however, this is not the case.
Wikipedia has a great page on it so I won't reproduce their excellent explanation but it turns out that at 23 people the odds tip over 50%, which is better than a coin toss. In terms of our day we are due to have 106 guests.
The formula is:
1 - (Permutation(365,n)/(365^n)) Where n is the number of people involved.

So 106 guests works out at 99.9999957%, or as close to 100% as makes no difference.
We also have tables of 8, 10 & 11, which work out at 7.43%, 11.69% & 14.11% respectively. I wonder would it be worth sampling each table to see how many times this actually comes through. With 10 tables we should probably see this at least once!
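
If you want to check the numbers yourself, here is a short Python sketch of the formula above. Rather than computing the permutation and the power separately (both get enormous), it multiplies the per-person probabilities one at a time, which is the same thing. The function name is my own.

```python
def shared_birthday_probability(n, days=365):
    """Chance that at least two of n people share a birthday (leap years ignored)."""
    if n > days:
        return 1.0
    p_all_distinct = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


for group_size in (8, 10, 11, 23, 106):
    print(f"{group_size:>3} people: {shared_birthday_probability(group_size):.4%}")
```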

Wednesday, December 4, 2013

Understanding Monty Hall, and learning Python

Finally back with a post after a long hiatus!

I've finished the Masters that has been sucking the hours out of my life for the last 2 years and suddenly realized that I needed to add some better programming skills to my CV. I started doing the excellent R and interactive Python courses on Coursera. They give a good introduction and provide challenging assignments to help students build their knowledge.

I decided to code up some of the algorithms and concepts I've learned over the last 2 years to improve my understanding of them and to develop my skills at coding. The first concept I decided to tackle was that of variable change, best captured in the Monty Hall problem.

The Monty Hall problem is a great example of where intuition is at odds with statistics. The player has to guess which of 3 doors a prize is behind. Once you have made your selection, the game show host, who knows where the prize is, opens one of the doors you haven't selected to show that it is a losing door. He then offers you the choice to stick with the door you chose originally or switch to the remaining door.

The most common response is that it shouldn't make any difference; you chose 1 door out of 3 (33% chance) and now there are only 2 doors left, so the odds are now 50/50. You may even feel that your odds have improved. So sticking with your initial choice should be as good a choice as switching to the other door.
This is actually wrong! It took me a long time to understand it, and the reason is that we get very tied up thinking about winning. Instead let's think about losing. When the game started you had a 2 in 3 chance of picking the wrong door, which means you probably did. Since the other losing door has now been removed, it makes the most sense to switch away from the initial (and likely wrong) door you chose in the first place.
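
The losing argument can also be checked without any randomness by listing every equally likely combination of prize door and first choice. This is a small sketch of my own, separate from the simulation, and it assumes the standard 3 door game where the host always opens a losing door.

```python
from fractions import Fraction
from itertools import product

DOORS = 3  # the reasoning below only holds for the classic 3 door game


def exact_win_probabilities():
    """Return (P(win by sticking), P(win by switching)) by enumerating all cases."""
    cases = list(product(range(DOORS), repeat=2))  # (prize door, first choice)
    stick_wins = switch_wins = 0
    for prize, choice in cases:
        if choice == prize:
            stick_wins += 1   # first pick was right, so switching would lose
        else:
            switch_wins += 1  # host removes the other losing door, so switching wins
    return Fraction(stick_wins, len(cases)), Fraction(switch_wins, len(cases))


stick, switch = exact_win_probabilities()
print("Stick and win:", stick)
print("Switch and win:", switch)
```

Of the 9 combinations, only the 3 where the first pick was already right are wins for sticking; the other 6 are wins for switching.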

Statistical evidence shows that contestants who switch from their initial choice win 2 out of every 3 times. This is what I wanted to code up and demonstrate with Python. The table below demonstrates the results I collected over several runs with the Python code I developed.

I've included the code snippet below, which you can paste into Python or save as a .py file and execute.
I'd appreciate any feedback, whether the post was helpful to you or you have improvements to suggest.

import random

def setPrize():
    # Put the prize (1) behind a random door; the other two doors hold 0
    tmp = [0, 0, 0]
    door = random.randint(0, 2)
    tmp[door] = 1
    tmp[((door+1) % 3)] = 0
    tmp[((door+2) % 3)] = 0
    return tmp

n = 100000  # number of games to simulate (assumed value; change as needed)
finalDoorz = [0, 0]
sticknwin = switchnwin = sticknlose = switchnlose = 0

for i in range(0, n):
    prize = setPrize()
    # Player chooses their first door
    choice = random.randint(0, 2)

    # We put the player's choice in position 0 and the door left
    # after the host removes a losing door in position 1
    if prize[choice] == 1:
        finalDoorz = [1, 0]
    elif prize[choice] == 0:
        finalDoorz = [0, 1]

    # Second part; stick or switch; 0 stays with the user's choice, 1 switches it.
    choiceb = random.randint(0, 1)

    if finalDoorz[choiceb] == 1 and choiceb == 0:
        sticknwin += 1
    elif finalDoorz[choiceb] == 1 and choiceb == 1:
        switchnwin += 1
    elif finalDoorz[choiceb] == 0 and choiceb == 1:
        switchnlose += 1
    elif finalDoorz[choiceb] == 0 and choiceb == 0:
        sticknlose += 1

# Each figure is a share of all n games, so the four values sum to 1
print("Number of Observations: " + str(n))
print("Switch and Win: " + str(switchnwin / n))
print("Stick and Win: " + str(sticknwin / n))
print("Switch and Lose: " + str(switchnlose / n))
print("Stick and lose: " + str(sticknlose / n))

Tuesday, March 12, 2013

Revenue's Property Evaluations

It's been a while since I've posted anything as life has been very quiet because of college. However, I just had to get up on my high horse about Revenue's new Local Property Tax tool and property evaluations.

The Tool
First of all I want to divorce the tool from the content and say that it is great to see them using maps and visualization to get their message across. This sort of thing has been common in the mash up community for some time; overlaying statistics or information on a searchable map. I would question their choice of colours as it makes it very difficult to easily associate a particular hue with its corresponding band, but there is an option to click on a zone and get the relevant information.

The Content
I really am jumping on the bandwagon with the criticism of the valuation process, but when I look at the area I grew up in it seems to have a blanket band of €350,000, which is actually boom prices, not the reality we have 5 years on.
With resources like the property price register (a publicly available legal record of the prices at which houses were sold over the last 3 years), I'm surprised that they didn't use this information to get a better sense of likely values. In addition there is a wide degree of variation within the dwelling types.

From a quick check on the PPR I see prices of closer to the €250,000 mark being transacted in the recent past. I also notice that the band seems to be defined by the voting district, which to my mind doesn't adequately describe or cluster similar properties together. I suspect that if this was done we would wind up with many more, smaller zones.

What could be done to improve the valuations
We know the closing prices on recently transacted properties, and there should be some record somewhere (land commission, planning permission, county council) of each property. These could be cross referenced with some key variables of property type, like number of rooms, storeys and square footage, to construct a more granular model of the accommodation stock. A clustering algorithm could break down areas into smaller, more similar subgroups, which would make valuations more relevant.
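
To give a flavour of the clustering idea, here is a bare-bones k-means sketch in Python. Everything in it is an assumption for illustration: the six properties are invented (sale price in thousands of euro, floor area in square metres), the function is my own, and a real model would need scaled features, more variables and a proper library.

```python
import math


def kmeans(points, k, iterations=20):
    """Tiny k-means: returns a cluster index for each point."""
    centroids = [list(p) for p in points[:k]]  # naive start: the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the average of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical properties: (sale price in thousands of euro, floor area in square metres)
homes = [(240, 85), (350, 140), (255, 90), (360, 150), (245, 80), (340, 135)]
labels = kmeans(homes, k=2)
for home, label in zip(homes, labels):
    print(home, "-> group", label)
```

With this toy data the smaller, cheaper houses end up in one group and the larger, dearer ones in the other, which is the kind of split that a single blanket band hides.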

What homeowners can do to argue their case
If your house was transacted in the last 2-3 years then you have a very strong basis on which to argue your band, unless you feel that there has been some serious negative equity.
For others the most obvious thing is to look at the property price register over the last 3 years and spot properties that are nearby. It would also be helpful to work out how those houses are similar to or different from your own.
In addition, checking out the prices on DAFT for nearby properties will also help you work out a fairer market price.
There seems to be significant scope for people to argue the valuation of the Revenue Commissioners, but this also means a lot of work for the commissioners to investigate and respond.
Hopefully they will publish an updated set of valuations before next year so that we can see a more realistic picture of property price declines.

Friday, July 13, 2012

Breaking the ice with Twitter

I was at a tech meetup recently and, it being my first time there, I didn't know anyone.
I'm not great at networking and I was sitting in a pub, not knowing where the group was.
I sent out some tweets with the right hashtags, and voila; several people responded almost immediately!
Twitter as an icebreaker for shy people!

Saturday, May 19, 2012

The Facebook IPO

After all the hype Facebook IPO'd yesterday at $38 a share.

Despite the lackluster price increase, this was technically a successful IPO. In fact it might have been one of the best run IPOs in a long time. This sounds like a strange statement, and to be honest I had thought "what a wash out!", but an article on CNET this morning reminded me that the purpose of an IPO is not to immediately raise the wealth of the people that are buying into the company. It is to raise capital for the company.

The company wants to raise as much money as possible by offering its shares. Thus it needs to offer them at a price that investors think is reasonable. If the price is too low then people will buy the IPO shares and then sell them very shortly afterwards at a profit, which is money the company could have raised for itself.

If the price is too high investors will be reluctant to buy at all, which means the company won't raise the capital it needs, and the share price on the day will collapse below its IPO price - quite embarrassing!

Facebook launched at $38 and closed at $38.23, which means that the company forecast the amount it could raise to about 99% accuracy. Thus this was a success for them.

The next few days and weeks will tell the true story as to whether the IPO price was good for investors, and this could have repercussions for the reputation of Facebook as a vehicle for investment. If the price begins to sag then people will feel duped and may try to sell off their shares to avoid further losses, further depressing the price for the stock. 
The CNET article also mentions that underwriters intervened to keep the stock at $38; however, their pockets aren't limitless and as soon as they stop supporting Facebook we could see a rapid decline in price.

My guess (and this is not based on any maths or deep investment analysis) is that the underwriters will stop supporting the price late next week. The price will tip forward and back between 37 and 39 over the week and then begin a gradual decline to some lower equilibrium price (probably somewhere in the mid-20s) until the first set of quarterly trading statements. At that point we'll start to get a picture of what is really going on.

For me I don't really see the value. 1bn users who don't pay a thing for the base product.
I heard before (source unknown) that they reckoned that each user was worth $6 annually, but that means each user has to generate $6 of revenue each year.
I can't imagine that clicking on one ad generates $6 in one go so they would probably have to click on at least 10 ads. I've never clicked on one ad.
Another source of revenue might be sharing user data to analytics companies but this is fraught with privacy issues. Upsetting your users might drive them away, or certainly reduce the amount of data that they share.

Bottom line: Facebook are going to have to do a lot to justify that $38 a share.