Friday, July 13, 2012

Breaking the ice with Twitter

I was at a tech meetup recently and, it being my first time there, I didn't know anyone.
I'm not great at networking, and I was sitting in a pub not knowing where the group was.
I sent out some tweets with the right hashtags and, voilà, several people responded almost immediately!
Twitter as an icebreaker for shy people!

Saturday, May 19, 2012

The Facebook IPO

After all the hype, Facebook IPO'd yesterday at $38 a share.

Despite the lackluster price increase, this was technically a successful IPO; in fact it might have been one of the best-run IPOs in a long time. That sounds like a strange statement, and to be honest my first thought was "what a washout!", but an article on CNET this morning reminded me that the purpose of an IPO is not to immediately raise the wealth of the people buying into the company. It is to raise capital for the company.

The company wants to raise as much money as possible by offering its shares, so it needs to offer them at a price that investors think is reasonable. If the price is too low, people will buy the IPO shares and then sell them very shortly afterwards at a profit.

If the price is too high, investors will be reluctant to buy at all, which means the company won't raise the capital it needs; and the share price on the day will collapse below its IPO price, which is quite embarrassing!

Facebook launched at $38 and closed at $38.23, which means the company forecast the amount it could raise with about 99% accuracy. So this was a success for them.

The next few days and weeks will tell the true story as to whether the IPO price was good for investors, and this could have repercussions for Facebook's reputation as a vehicle for investment. If the price begins to sag, people will feel duped and may sell off their shares to avoid further losses, further depressing the stock's price.
The CNET article also mentions that underwriters intervened to keep the stock at $38; however, their pockets aren't limitless, and as soon as they stop supporting Facebook we could see a rapid decline in the price.

My guess (and this is not based on any maths or deep investment analysis) is that the underwriters will stop supporting the stock late next week. The price will tip back and forth between $37 and $39 over the week and then begin a gradual decline to some lower equilibrium price (probably somewhere around the mid-$20s) until the first set of quarterly trading statements. At that point we'll start to get a picture of what is really going on.

For me, I don't really see the value: 1bn users who don't pay a thing for the base product.
I heard before (source unknown) that they reckoned each user was worth $6 annually, but that means each user has to generate $6 of revenue each year.
I can't imagine that clicking on one ad generates $6 in one go, so each user would probably have to click on at least ten ads a year. I've never clicked on a single one.
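A rough back-of-the-envelope check on that figure (my own arithmetic, not from any source):

$$10^9 \text{ users} \times \$6 \text{ per user per year} = \$6\text{bn of revenue a year}$$

That is the scale of advertising and data revenue they would need just to live up to that per-user valuation.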
Another source of revenue might be sharing user data with analytics companies, but this is fraught with privacy issues. Upsetting your users might drive them away, or certainly reduce the amount of data that they share.

Bottom line: Facebook are going to have to do a lot to justify that $38 a share.



Saturday, May 12, 2012

Exams - That Electric Feeling!

Well I'm back! My exams are over so hopefully I'll be doing more posting!!

One odd thing I have noticed: in the two weeks running up to the exams I was getting regular static shocks; when I stood up, when I got into or out of the car, when I kissed the LAK (to her annoyance!). But all of a sudden they have just disappeared!

Anyone any ideas?


Saturday, March 31, 2012

Colour Blindness strikes again!!

So part of data mining is clustering, and one of the ways to visualize clusters in the application (RapidMiner) is through colour coding, as in the screenshot below.
To me, clusters 0, 4 and 6 appear the same. On closer inspection cluster_0 is different, but I have to zoom in.
I really hope that the application has some way of showing these as symbols!

Sunday, February 19, 2012

Welford's One-Pass Algorithm

Howdy!
I've continued to be quiet recently as I am back at college, so a lot of my time is taken up trying to understand the material.

This semester I am taking Simulation and Data Mining. The maths is a lot heavier than last semester; most of it is covered in the form of proofs in the Simulation course. In addition I am re-learning Java programming pretty much from scratch, so as I understand the maths I then try to code it up in Java or implement it in an assignment solution.

We learned about Welford's One-Pass Algorithm during the week, and it turns out that it has to be implemented in an assignment. The algorithm is an efficient way of calculating the mean, variance and standard deviation of a sample on the fly. Normally, if we were working on paper or in Excel, we would store the data first and then run back through it, calculating the individual parts of the equation first and then adding those up to get the final answers. This algorithm allows us to compute the mean and variance as we iterate through a loop.
The real value, from a computing-resources point of view, is that we don't have the memory overhead of storing the individual bits of the formula first for calculation afterwards.

I wasted a lot of time searching the net yesterday, in a hungover state, trying to find a way to code it up, as I thought that one of the variables was dependent on the other two and so would not compute properly. Hopefully this blog entry will help anyone searching for it in plain English.

Standard equations
The formula for the mean is:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

i.e. the sum of the individual observations divided by n, the number of observations.

The formula for the variance is:

$$s^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

(this is the population form; divide by n − 1 instead for the sample variance). Standard deviation is simply the square root of the variance s² above.


As you can imagine, with the above you would have to store the values first, then send them through a loop to store the individual differences, and finally do the aggregate calculation. Very costly in terms of memory and code.

Welford's One-Pass Algorithm
Welford's algorithm gets around this by calculating the small incremental changes on the fly. This removes the need to store the data; it simply looks at the deltas.
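
In recurrence form (my own notation, matching the code below), the running mean and the running sum of squared deviations M update as:

$$\bar{x}_n = \bar{x}_{n-1} + \frac{x_n - \bar{x}_{n-1}}{n}, \qquad M_n = M_{n-1} + (x_n - \bar{x}_{n-1})(x_n - \bar{x}_n)$$

After all n observations, the variance is M_n / n (population) or M_n / (n − 1) (sample), with no need to keep the observations themselves.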

The algorithm is below; I have used Java-style notation, which you can adapt to whatever language you need.
int n = 0;
double xbar = 0.0; // running mean
double v = 0.0;    // running sum of squared deviations
double x = 0.0;    // next observation
double d = 0.0;    // difference from the old mean

// for every observation (hasNextData()/getNextData() stand in for your data source):
while (hasNextData()) {
    x = getNextData();
    n++;
    d = x - xbar;           // deviation from the old mean
    xbar = xbar + d / n;    // update the mean first
    v = v + d * (x - xbar); // then accumulate, using the deviation from the new mean
}

v = v / n; // population variance; use v/(n-1) for the sample variance ***

*** I made this correction after trying to run this algorithm a second time on a different machine.

One problem I had was that I was using the array index as n: the first array index is 0, so the d/n step divided by zero and the whole calculation returned NaN. Adding 1 to the index (i.e. counting observations from 1, as in the loop above) fixed it.
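
As a quick sanity check (a toy example of my own, not from the course notes), here is the loop wrapped in a small runnable Java program:

public class WelfordDemo {
    public static void main(String[] args) {
        double[] data = {2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0};
        int n = 0;
        double xbar = 0.0;
        double v = 0.0;

        for (double x : data) {
            n++;
            double d = x - xbar;    // deviation from the old mean
            xbar += d / n;          // update the mean
            v += d * (x - xbar);    // accumulate squared deviations
        }

        System.out.println("mean     = " + xbar);                 // 5.0
        System.out.println("variance = " + (v / n));              // 4.0 (population)
        System.out.println("std dev  = " + Math.sqrt(v / n));     // 2.0
    }
}

This data set is handy for testing because the population mean, variance and standard deviation come out to the round numbers 5, 4 and 2.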


You can also calculate the covariance quite easily. The sample covariance is:

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{n - 1}$$

You can collect the sum of x·y as you iterate through the loop; the sum of x times the sum of y can be calculated after the fact!
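
Here is a minimal sketch of that covariance calculation in Java (the arrays xs and ys are my own placeholder names for the paired observations):

double[] xs = {1.0, 2.0, 3.0, 4.0};
double[] ys = {2.0, 4.0, 6.0, 8.0};
double sumX = 0.0, sumY = 0.0, sumXY = 0.0;
int n = xs.length; // assumes xs and ys are the same length

for (int i = 0; i < n; i++) {
    sumX  += xs[i];
    sumY  += ys[i];
    sumXY += xs[i] * ys[i]; // collect x*y as we iterate
}

// sample covariance: (sum(x*y) - sum(x)*sum(y)/n) / (n - 1)
double cov = (sumXY - (sumX * sumY) / n) / (n - 1);

For this toy data the result is 10/3 ≈ 3.33, which matches the textbook two-pass calculation.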