Friday, April 14, 2023

The deceptive game: Inbetweenies

In our teens and early twenties, we are often exposed to card games, and so it was that I discovered inbetweenies (also known as Acey Deucey, Yablon, and Red Dog). It's a deceptively simple card game that you will almost certainly lose your shirt on! Having watched, won, and lost, I am fascinated by the probabilities, which should be relatively easy to calculate.

The Premise
The game is best played by 5 or 6 players as it becomes difficult to track the cards, and the pot can grow in leaps and bounds. Face cards are valued at 10 and an Ace is usually considered low.
All players contribute a cash amount to a pot. The dealer then deals you 2 cards face up. Based on the size of the gap between them, you decide whether to proceed and bet some amount up to the value of the pot, at which point the dealer deals a third card face up. If that card's value falls between your first two cards, you win the amount that you bet from the pot. If it falls outside those first 2 cards, you lose and have to put that amount into the pot. The real fun starts if you "hit the post": the third card has the same value as one of your first 2 cards, and you have to pay double the amount that you bet. This means that a particularly ballsy player might bet the pot and have to pay in double. I personally saw pots leap to €400 from a modest start of €5 early on. It also means that if you place a bet, you'd better be able to cover the worst outcome!
I've created a simple table below to explain the outcomes.

Outcome | Third card                      | Result
Win     | strictly between your two cards | you collect your bet from the pot
Lose    | outside your two cards          | you pay your bet into the pot
Post    | matches one of your two cards   | you pay double your bet into the pot


The Probabilities
As the dealer returns used cards to the bottom of the deck and periodically re-shuffles them, it's possible to calculate some very simple odds in your head at each play. If you were gifted at card counting you could probably do even better, but almost every time I played this game beer was involved, so that advantage would probably be lost pretty quickly.

I am going to approach this in two ways: through logical reasoning and then through experimentation using Python.

Logical approach
Knowing that there are 52 cards in a deck, we can calculate the basic probability of any hand by looking at how many cards fall between the lowest and highest cards. For example, if the dealer gives a player a 2 and a 10, we know that there are 7 ranks in the winning "zone" (3, 4, 5, 6, 7, 8, 9). There are 4 suits of each of those ranks, so 7 * 4 = 28 ways (out of the 50 cards left once 2 have been dealt) that you can win. The corollary of this is that there are 22 (50 - 28) ways that you can lose. Hitting the post is a different way to lose though, so let's separate that out; 2 ranks (low and high) times the 3 remaining cards of each = 6 ways to lose big.
We subtract that from our total ways to lose to get the less painful ways to lose (22 - 6): 16 out of 50.

Win: 28/50 = 56%
Post: 6/50 = 12%
Lose: 16/50 = 32%
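
As a quick sketch of the same calculation in Python (treating all 13 ranks as distinct values, exactly as the worked example above and the simulation code at the end of this post do), the helper below counts the winning, post, and losing cards for any pair; the function name is my own:

def hand_odds(low, high, ranks=13, suits=4):
    #Odds for the third card, drawn from the 50 cards left after the first two
    remaining = ranks * suits - 2
    win = (high - low - 1) * suits      #ranks strictly between the two dealt cards
    post = 2 * (suits - 1)              #the 3 other cards of each dealt rank
    lose = remaining - win - post       #everything else, outside the two dealt cards
    return win / remaining, post / remaining, lose / remaining

#The 2-and-10 example from above
win, post, lose = hand_odds(2, 10)
print(f"win {win:.0%}, post {post:.0%}, lose {lose:.0%}")   #win 56%, post 12%, lose 32%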


One variation of the game allows the Jack, Queen, and King to keep their natural values (11, 12, 13). The same logic applies, and for simplicity's sake I've visualized the resulting probabilities using a heat map.
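
If you want to reproduce that kind of chart yourself, here is a rough sketch of how such a heat map could be built with matplotlib, reusing the hand_odds helper from the snippet above (the colour map and layout choices are just illustrative):

import numpy as np
import matplotlib.pyplot as plt

ranks = np.arange(1, 14)                 #Ace low (1) up to King (13)
grid = np.full((13, 13), np.nan)         #rows: low card, columns: high card

for low in ranks:
    for high in ranks:
        if high > low:
            grid[low - 1, high - 1] = hand_odds(low, high)[0]   #win probability

plt.imshow(grid, origin='lower', extent=[0.5, 13.5, 0.5, 13.5], cmap='viridis')
plt.colorbar(label='win probability')
plt.xlabel('high card value')
plt.ylabel('low card value')
plt.title('Win probability by dealt pair')
plt.show()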


Experimentation approach
I used Python to simulate the game and visualize the outcomes. The great benefit of the simulation is that I can run a large number of games to test the outcomes.
After running 1 million games I was able to confirm that the chance of hitting the post is about 12% for every hand.
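
As a quick check, once the simulation results are in a dataframe (playDF, built by the code at the end of this post), the post rate can be read straight off it:

#Share of simulated hands where the third card matched one of the first two
post_rate = (playDF['outcome'] == 'post').mean()
print(f"post rate: {post_rate:.1%}")   #comes out at roughly 12%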

I can also introduce player behaviour. In this case, I can assume that players won't play every hand and will set themselves some threshold, i.e. "I'll only play if there is a difference of more than X between the lowest and highest cards dealt". I was surprised to find that introducing this behaviour resulted in no change in the outcomes.


What can you take from this?
The major takeaway for players is to favour hands with large differences. This should be obvious, but knowing the probabilities shows that only hands with a difference of at least 8 between the two cards give you better than 50/50 odds (7 winning ranks, 28/50 = 56%); a difference of 7 leaves you just under even at 48%.
Knowing that the odds of hitting the post are 12% might encourage you to think about how much you are willing to bet. One downfall of the game is that the payoffs for different spreads are the same. An innovative variation would be to offer better payoffs on lower-probability spreads.
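
To put that in betting terms, here is a rough sketch of the expected return per euro bet under the payouts described above (win your bet, lose your bet, or pay double on the post), again reusing the hand_odds helper and ignoring any pot-size limits:

def expected_return(low, high):
    #Expected profit per 1 unit bet: +1 on a win, -1 on a loss, -2 on the post
    win, post, lose = hand_odds(low, high)
    return win - lose - 2 * post

for diff in range(2, 13):
    print(f"difference {diff:2d}: EV per unit bet = {expected_return(1, 1 + diff):+.2f}")

Under these rules the expected value only turns positive at a difference of 9 or more, which lines up with the advice to fold all but the widest hands.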


Python Code used for research
import random
from copy import deepcopy

import pandas as pd
import matplotlib.pyplot as plt


#This run keeps only hands where the two face-up cards differ by more than 1 (see the threshold in the loop below)
print('Rejecting hands with a difference of 1 or less')

#Create deck: each card is [face, value, alternate value, suit]
#The Ace carries both 1 and 13 so it can be treated as low or high
faceValues = [('Ace', 1, 13)] + [(str(v), v, v) for v in range(2, 11)] + [('Jack', 11, 11), ('Queen', 12, 12), ('King', 13, 13)]
cardList = [[face, val, altVal, suit]
            for suit in ['hearts', 'diamonds', 'spades', 'clubs']
            for face, val, altVal in faceValues]




#Create column names for data to track
cols = ['observation','lowCardFace','lowCardSuit','lowCardVal', 'highCardFace','highCardSuit', 'highCardVal', 'testCardFace', 'testCardSuit','testCardVal']



##################################################################################
#Functions

def shuffleDeck():
    #Returns a freshly shuffled copy of the master deck
    #(unused helper - the main loop below does this inline)
    tmpDeck = deepcopy(cardList)
    random.shuffle(tmpDeck)
    return tmpDeck


#Returns 3 cards for a player, with the first two ordered low then high
#(unused helper - the main loop below inlines this logic)
def getCards():
    handEvent=[]
    draw1 = shuffledDeck.pop(0)
    draw2 = shuffledDeck.pop(0)
    draw3 = shuffledDeck.pop(0)
    if draw2[1] > draw1[1]:
        handEvent.append([draw1, draw2, draw3])
    else:
        handEvent.append([draw2, draw1, draw3])
    return handEvent
    
    
##################################################################################
#Evaluate a hand
def textValue(lowVal, highVal, dealtVal):
    if dealtVal > lowVal and dealtVal < highVal:
        tmp='win'
    elif dealtVal < lowVal or dealtVal > highVal:
        tmp='lose'
    elif dealtVal == lowVal or dealtVal == highVal:
        tmp = 'post'
    return tmp
        
def diffVal(lowVal, highVal):
    tmp = highVal - lowVal
    return tmp
    
    
##################################################################################
#Run the script by creating an initial deck
shuffledDeck = deepcopy(cardList)
random.shuffle(shuffledDeck)

#I used deepcopy when creating a newly shuffled deck. This was needed once I had offered all of the cards in the current deck to the players and had none left.



testRun=[]

#Simulate 1 million offered hands
for i in range(0,1000000):
    #Start a fresh, shuffled deck when there are too few cards left to deal a hand
    if len(shuffledDeck) < 3:
        shuffledDeck = deepcopy(cardList)
        random.shuffle(shuffledDeck)
    handEvent=[]
    draw1 = shuffledDeck.pop(0)
    draw2 = shuffledDeck.pop(0)
    #Player behaviour threshold: only play hands where the two cards differ by more than 1
    if abs(draw2[1] - draw1[1])>1:
        draw3 = shuffledDeck.pop(0)
        #Order the first two cards low then high before recording the hand
        if draw2[1] > draw1[1]:
            handEvent.append([draw1, draw2, draw3])
        else:
            handEvent.append([draw2, draw1, draw3])
        cardOut = handEvent
        testRun.append([i, cardOut[0][0][0], cardOut[0][0][3], cardOut[0][0][1],cardOut[0][1][0], cardOut[0][1][3], cardOut[0][1][1], cardOut[0][2][0], cardOut[0][2][3], cardOut[0][2][1] ])

playDF = pd.DataFrame(testRun,columns=cols)

#Calculate the outcome and add a column to the dataframe to record that
playDF['outcome'] = playDF.apply( lambda row: textValue(row['lowCardVal'], row['highCardVal'], row['testCardVal']),axis=1)
playDF['diff'] = playDF.apply( lambda row: diffVal(row['lowCardVal'], row['highCardVal']),axis=1)


table = pd.pivot_table(playDF, values='observation', index=['diff'], columns=['outcome'], aggfunc='count', fill_value=0)
print(table)

table = pd.pivot_table(playDF, values='observation', index=['diff'], columns=['outcome'], aggfunc='count', fill_value=0, margins=True, margins_name='Total').pipe(lambda d: d.div(d['Total'], axis='index')).applymap('{:.000%}'.format)
print(table)


playDF.to_csv('C:\\temp\\inbetweenies_output_reject_01.csv', index=False)
print('Execution complete')
    
#Plot a stacked bar chart of outcomes by difference between the two dealt cards
flippedCount = playDF.groupby(['diff','outcome'])['observation'].count().unstack('outcome').fillna(0)
flippedCount[['lose', 'post', 'win']].plot(kind='bar', stacked=True)
plt.show()




Tuesday, January 27, 2015

New Mortgage rules and the salary multiple cap

According to RTÉ, the Central Bank rules for new buyers would require a deposit of 10% on the first €220k and 20% thereafter, so on a €300k house (not unrealistic in Dublin) you'd need a deposit of €38k, which equates to roughly 12.7%.
However, there was also mention of a borrowing cap of 3.5 times salary, so if you want to buy that €300k house you will need a combined income of nearly €75k.
Hopefully house prices will come down, but even with the relaxed deposit rule for first-time buyers, the income multiple threshold may cut them out of the market.
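
A quick back-of-the-envelope check of those figures in Python (the 10%/20% split at €220k and the 3.5 times cap are as reported; the €300k price is just the example above):

#Deposit under the reported rules: 10% on the first 220k, 20% on the balance
price = 300000
deposit = 0.10 * min(price, 220000) + 0.20 * max(price - 220000, 0)
loan = price - deposit
income_needed = loan / 3.5   #reported cap of 3.5 times gross income

print(f"deposit: {deposit:,.0f} ({deposit / price:.1%} of the price)")
print(f"income needed at 3.5x: {income_needed:,.0f}")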

Friday, October 17, 2014

Analyzing a cross section of the Irish secondhand car market - Part 1

I've been planning to learn a bit more about R, and I'm also in the market for a new car, so what better time to play around with the data and look at what's available in the Irish car market.

Getting the data
I used Python and the Beautiful Soup package to gather some high-level details about the cars available in Ireland. The details that the script pulled were limited to:

  • Make
  • Model
  • Engine
  • Seller Type
  • County
  • Mileage
  • Year
  • Price
Selecting the data
I reduced the list to cars 20 years old and newer. This produced a list of 12990 cars on offer.
The chart above is produced in R with: hist(irl_cars$Year)


I focused on a range of makes and models which excluded high-end vehicles. I acknowledge that this was partly to reduce the skewness of the distribution, but it was also on the basis that this analysis should focus on cars one might see regularly.

  • BMW
  • Fiat
  • Ford
  • Honda
  • Hyundai
  • Mercedes-Benz
  • Nissan
  • Opel
  • Peugeot
  • Renault
  • Seat
  • Skoda
  • Toyota
  • Volkswagen
  • Volvo


lablist <- as.vector(unique(modCars$Maker))
counts <- table(modCars$Maker)
plot(counts, xaxt="n", main="Qty of Cars by Make")
text(1:15, par("usr")[3], labels=lablist, srt=90, pos=2, xpd=TRUE)

Mileage
Looking at the mileage, it looks like there are a number of cars above 500,000 miles.
While this is not impossible, it seems unlikely that cars are doing so many miles, so I plotted the mileage by year for any cars above 100k:

plot(limCar$Year, limCar$Mileage, main="Mileage plot by Year", xlab="Year", ylab="Miles")
abline(h=400000)
There is a very clear separation (which I have highlighted with a black line) where the bulk of observations lie below the 400,000 mile mark. There is still a great deal of dispersion above this line, with some values looking high but not necessarily unreasonable. At this point some common sense might help. The maximum mileage is 2,500,002 miles for an 11 year old car. This implies that the car did 622 miles a day, every day, for 11 years; 60 miles an hour for 10 hours a day? That seems very unlikely, so it's probably safe enough to discard this as a typo.
What about the ones at 1,000,000 miles and over? The earliest of those is 13 years old, so applying the same logic this would imply 210 miles a day, every day, for 13 years, which seems equally unlikely.

If the above seems too much like intuition, we can apply something a little more reasoned, and I can try out 2 different methods of outlier detection to boot:

  1. Mean and standard deviation
  2. Median and median absolute deviation
Method 1 is more common, but method 2 is better when there are large outliers.
In both cases I am using a k-factor of 3 as the threshold to detect outliers.
1) mean(modCars$Mileage) = 75841.97; sd(modCars$Mileage) = 53810.45
75841.97 + (3 * 53810.45) = 237,273.3 miles
2) median(modCars$Mileage) = 74936; mad(modCars$Mileage) = 43471.31
74936 + (3 * 43471.31) = 205,349.9 miles


Method 1 exposes 32 cars and method 2 exposes 78 cars; my "intuition" exposed 13 cars, though it was based on a simple visual inspection. I'll create 3 data sets for the follow-up posts.
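
For reference, here is a rough Python/pandas equivalent of those two thresholds (the original analysis was done in R; the dataframe and column names below are purely illustrative, and the MAD is scaled by 1.4826 to be consistent with R's mad() default):

import pandas as pd

def outlier_thresholds(mileage: pd.Series, k: float = 3.0):
    #Method 1: mean plus k standard deviations
    mean_sd = mileage.mean() + k * mileage.std()
    #Method 2: median plus k median absolute deviations (scaled as in R's mad())
    mad = 1.4826 * (mileage - mileage.median()).abs().median()
    median_mad = mileage.median() + k * mad
    return mean_sd, median_mad

#Illustrative usage, assuming a dataframe mod_cars with a Mileage column:
#hi_sd, hi_mad = outlier_thresholds(mod_cars['Mileage'])
#print(mod_cars[mod_cars['Mileage'] > hi_mad])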




Sunday, August 3, 2014

The Birthday paradox at a wedding

I know I've been very quiet of late, but I started a new job 4 months ago and have been pouring myself into it to learn as much as I can. More importantly, I've been preparing for my wedding to the lovely Lelly Ann, though to be honest she has been doing most of the tough work, so she deserves all the glory!

The big day is only 5 days away now. I had great hopes of writing a cool optimizer that would seat guests based on their affinity, but unfortunately laziness and an over-estimation of my programming skills got in the way. However, I have been thinking about the birthday paradox recently, and since there will be over 100 people in the room next Friday, I thought it would make a nice anecdote.

The Birthday paradox arises from the chances of two or more people in a group having the same birthday. Given that there are 365 days in a year (ignoring leap years for simplicity's sake), you would think that the chances of any 2 people having the same birthday would be extremely low; however, this is not the case.
Wikipedia has a great page on it, so I won't reproduce their excellent explanation, but it turns out that at 23 people the odds tip over 50%, which is better than a coin toss. In terms of our day, we are due to have 106 guests.
The formula is:
1 - (Permutation(365, n) / 365^n), where n is the number of people involved.

So 106 guests works out at 99.9999957%, or as close to 100% as makes no difference.
We also have tables of 8, 10 & 11, which work out at 7.43%, 11.69% & 14.11% respectively. I wonder whether it would be worth checking each table to see how often this actually comes through; with 10 tables we should probably see it at least once!
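
If you want to check these numbers yourself, here is a small Python sketch of the formula above (the function name is my own; it simply multiplies out the probability that all n birthdays are distinct):

def shared_birthday_probability(n, days=365):
    #1 - P(days, n) / days**n, computed as a running product
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (days - k) / days
    return 1 - p_all_distinct

for group in (8, 10, 11, 23, 106):
    print(f"{group:3d} people: {shared_birthday_probability(group):.4%}")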

Wednesday, December 4, 2013

Understanding Monty Hall, and learning Python

Finally back with a post after a long hiatus!

I've finished the Masters that has been sucking the hours out of my life for the last 2 years and suddenly realized that I needed to add some better programming skills to my CV. I started doing the excellent R and interactive Python courses on Coursera. They give a good introduction and provide challenging assignments to help students build their knowledge.

I decided to code up some of the algorithms and concepts I've learned over the last 2 years to improve my understanding of them and to develop my skills at coding. The first concept I decided to tackle was that of variable change, best captured in the Monty Hall problem.

The Monty Hall problem is a great example of where intuition is at odds with statistics. The player has to guess which of 3 doors a prize is behind. Once you have made your selection, the game show host opens one of the doors you haven't selected, always revealing no prize behind it. He then offers you the choice to stick with the door you chose originally or switch to the remaining door.

The most common response is that it shouldn't make any difference; you chose 1 door out of 3 (a 33% chance) and now there are only 2 doors left, so the odds are now 50/50. You may even feel that your odds have improved, so sticking with your initial choice should be as good a choice as switching to the other door.
This is actually wrong! It took me a long time to understand it, and the reason is that we get very tied up thinking about winning. Instead, let's think about losing. When the game started you had a 2 in 3 chance of picking the wrong door, which means you probably did. Since the other losing door has now been removed, it makes the most sense to switch away from the initial (and likely wrong) door you chose in the first place.

Statistical evidence shows that contestants who switch from their initial choice win 2 out of every 3 times. This is what I wanted to code up and demonstrate with Python. The table below shows the results I collected over several runs of the Python code I developed.


I've included the code snippet below, which you can paste into Python or save as a .py file and execute.
I'd appreciate any feedback, whether the post was helpful to you or you have any improvements to suggest.

<CODE>
import random
import math

def setPrize():
    #Randomly place the prize behind one of the 3 doors
    tmp = [0, 0, 0]
    door = random.randrange(0, 3)
    tmp[door] = 1
    return tmp
prize = [0, 0, 0]
finalDoorz = [0, 0]
n = 100

#Counters for each stick/switch and win/lose combination
switchnwin = 0.0
switchnlose = 0.0
sticknwin = 0.0
sticknlose = 0.0

for i in range(0,n):
    prize = setPrize()
    #Player chooses their first door
    choice = random.randint(0,2)
   

    #We put the player's choice in position 0
    if prize[choice]==1:
       finalDoorz[0]=1  
       finalDoorz[1]=0        
    elif prize[choice]==0:
       finalDoorz[0]=0  
       finalDoorz[1]=1


    #Second part; stick or switch: 0 stays with the user's choice, 1 switches it.
    choiceb= random.randint(0,1)

    if finalDoorz[choiceb]==1:
        status="WIN"
    else:
        status="LOSE"

    if choiceb==0:
        switched="NO"
    else:
        switched="YES"

    if finalDoorz[choiceb]==1 and choiceb==0:
        sticknwin +=1
    elif finalDoorz[choiceb]==1 and choiceb==1:
        switchnwin +=1
    elif finalDoorz[choiceb]==0 and choiceb==1:
        switchnlose +=1
    elif finalDoorz[choiceb]==0 and choiceb==0:
        sticknlose +=1

print("Number of Observations: " + str(n))
print("Switch and Win: " + str(switchnwin/n))
print("Stick and Win: " + str(sticknwin/n))
print("Switch and Lose: " + str(switchnlose/n))
print("Stick and Lose: " + str(sticknlose/n))
</CODE>


Tuesday, March 12, 2013

Revenue's Property Valuations

It's been a while since I've posted anything, as life has been very quiet because of college. However, I just had to get up on my high horse about Revenue's new Local Property Tax tool and its property valuations.

The Tool
First of all, I want to divorce the tool from the content and say that it is great to see them using maps and visualization to get their message across. This sort of thing has been common in the mash-up community for some time: overlaying statistics or information on a searchable map. I would question their choice of colours, as it makes it very difficult to associate a particular hue with its corresponding band, but there is an option to click on a zone and get the relevant information.

The Content
I really am jumping on the bandwagon with the criticism of the valuation process, but when I look at the area I grew up in, it seems to have a blanket band of €350,000, which reflects boom-era prices, not the reality we have 5 years on.
With resources like the Property Price Register (a publicly available legal record of the prices houses have been transacted at over the last 3 years), I'm surprised that they didn't use this information to get a better sense of likely values. In addition, there is a wide degree of variation within the dwelling types.


From a quick check on the PPR I see prices closer to the €250,000 mark being transacted in the recent past. I also notice that the boundary seems to be defined by the voting district, which to my mind doesn't adequately describe or cluster similar properties together. I suspect that if this were done properly we would wind up with many more, smaller zones.

What could be done to improve the valuations?
We know the closing prices of recently transacted properties, and there should be some record somewhere (Land Commission, planning permission, county council) of each property. This could be cross-referenced with key variables like property type, number of rooms, storeys, and square footage to construct a more granular model of the housing stock. A clustering algorithm could then break areas down into smaller, more similar subgroups, which would make valuations more relevant; a rough sketch of what that might look like is below.
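
Purely as an illustration of that idea (this is a sketch, not anything Revenue actually does; the column names and the choice of scikit-learn's KMeans are my own assumptions):

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def cluster_properties(properties: pd.DataFrame, features: list, n_zones: int = 50) -> pd.Series:
    #Standardise the features so location and size are weighted comparably,
    #then group the housing stock into n_zones clusters of similar properties
    scaled = StandardScaler().fit_transform(properties[features])
    labels = KMeans(n_clusters=n_zones, random_state=0).fit_predict(scaled)
    return pd.Series(labels, index=properties.index, name='zone')

#Illustrative usage with hypothetical column names:
#properties['zone'] = cluster_properties(
#    properties, ['latitude', 'longitude', 'bedrooms', 'storeys', 'floor_area_sqm'])
#print(properties.groupby('zone')['sale_price'].median())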

What homeowners can do to argue their case
If your house was transacted in the last 2-3 years, then you have a very strong basis on which to argue your band, unless you feel that there has been some serious negative equity since.
For others, the most obvious thing is to look at the Property Price Register over the last 3 years and spot properties that are nearby. It would also be helpful to work out how those houses are similar or different to your own.
In addition, checking out the prices on Daft for nearby properties will also help you work out a fairer market price.
There seems to be significant scope for people to argue the Revenue Commissioners' valuations, but this also means a lot of work for the Commissioners to investigate and respond.
Hopefully they will publish an updated set of valuations before next year so that we can see a more realistic picture of property price declines.


Friday, July 13, 2012

Breaking the ice with Twitter

I was at a tech meet-up recently and, it being my first time there, I didn't know anyone.
I'm not great at networking, and I was sitting in a pub not knowing where the group was.
I sent out some tweets with the right hashtags and, voilà, several people responded almost immediately!
Twitter as an icebreaker for shy people!