Thursday, November 07, 2019

Encoding the zeitgeist for predicting the outcome of a cricket match

TLDR: Here's the code.

This happened a few months ago...

World Cup cricket mania was gripping the nation, and I was feeling left out because of my own ignorance of the sport. I decided to fight the battle on my own battlefield and tried to make the game a bit more interesting for myself. I wanted to take part in the company pool without putting in a lot of effort to follow the sport, but I also didn't want to be the person with nothing to talk about when all that people talk about is cricket.
Given all the buzz surrounding AI/ML and the work that we are doing at Versa, I decided to just build a minimum viable solution to help me win the pool.

The constraints I had for myself were simple: 
  • Don't spend a lot of time. Maximum of 1 day to implement an end-to-end solution. This ruled out any massive model building using play-by-play stats, player stats, team dynamics etc. 
  • Encode the zeitgeist and perception of the sports fans and enthusiasts rather than focus on micro indicators. 
Here's how I went about winning the pool (shared it with two other folks).

Home team advantage:

I don't claim to be an expert in any sport. However, I do claim to be an expert listener of sports blabber, and one thing that everyone seems to agree on is home team advantage. To factor this into the model I needed to figure out whether the teams are playing in their home country. An ML algorithm can easily make the connection between a victory and the location of the game, so I scraped a random website found through Google to get the data:

0 | 2000-01-02 | Eden Park | New Zealand | West Indies | New Zealand | Auckland | New Zealand
37 | 2000-01-04 | Owen Delany Park | New Zealand | West Indies | New Zealand | Taupo | New Zealand
39 | 2000-01-06 | McLean Park | New Zealand | West Indies | New Zealand | Napier | New Zealand
67 | 2000-01-08 | Westpac Stadium | New Zealand | West Indies | New Zealand | Wellington | New Zealand
98 | 2000-01-09 | Brisbane Cricket Ground | Australia | Pakistan | Pakistan | Brisbane | Australia
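As a rough illustration of how a home-team flag could be derived from rows like these, here is a minimal Python sketch. The field names are my own guesses at what the scraped columns mean (date, venue, the two teams, winner, city, host country); the original scraper and its column names are not shown in this post.

```python
from typing import NamedTuple, Tuple


class Match(NamedTuple):
    # Field names are assumptions about the scraped columns.
    date: str
    venue: str
    team_a: str
    team_b: str
    winner: str
    city: str
    host: str


def home_flags(match: Match) -> Tuple[bool, bool]:
    """Return (team_a_is_home, team_b_is_home) for one match."""
    return (match.team_a == match.host, match.team_b == match.host)


matches = [
    Match("2000-01-02", "Eden Park", "New Zealand", "West Indies",
          "New Zealand", "Auckland", "New Zealand"),
    Match("2000-01-09", "Brisbane Cricket Ground", "Australia", "Pakistan",
          "Pakistan", "Brisbane", "Australia"),
]

for m in matches:
    a_home, b_home = home_flags(m)
    print(m.date, m.team_a, a_home, m.team_b, b_home, "winner:", m.winner)
```

A match on neutral ground (for example a World Cup game where neither side is the host) simply gets two False flags.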

Encoding gaming form advantage:

Every sports fan I have spoken to talks about a team or a player underperforming or outperforming because they are in bad/good form.

How do we quantify "good" form? If team A is a vastly better team than team B, but team A is in its worst form and team B is in its best form, then who has a better chance? Surely there is an upper bound on the benefit of good form!!

Based on these questions, I formulated a few axioms. These are not true "facts", but universally accepted ideas.

Sridhar's AXIOMS of team sports (SATS):

Axiom 1: A previous win against any specific team can improve the winning probability by 𝛼
Axiom 2: A previous loss against any specific team reduces the winning probability by some 𝛽, where 𝛼 < 𝛽
This is based on the observation that sports fans always say that a team is in form because it has won quite a few matches in the recent past, but are quick to retort that the form is broken if the team loses a single game. This suggests that a loss is a heavier blow to the psyche than the uplift provided by a win.

Axiom 3: Specific form against the competitor, 𝛼' and 𝛽', is also a factor
If team A has always won against team B, then it'll have a higher probability of winning against team B, even if team B has been winning a recent string of matches. This effect can be seen in World Cup cricket matches between India and Pakistan.

Axiom 4: The contribution of form to the winning probability is capped to some number
Axiom 5: The form is a function of the winning streak, i.e. a team will have "good form" if it has been winning more matches in the recent past.
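To make the axioms a little more concrete, here is a small Python sketch. It is only an illustration: the numeric values for 𝛼, 𝛽, 𝛼', 𝛽' and the cap are hypothetical placeholders I made up for the example, and in the actual solution the weighting is left to the learning algorithm rather than hard-coded.

```python
# All constants below are hypothetical values chosen for illustration only.
ALPHA = 0.02      # boost per past win (Axiom 1)
BETA = 0.05       # penalty per past loss, with ALPHA < BETA (Axiom 2)
FORM_CAP = 0.15   # cap on the total contribution of form (Axiom 4)


def form_adjustment(results, alpha=ALPHA, beta=BETA, cap=FORM_CAP):
    """Sum win boosts and loss penalties, then clip to [-cap, cap].

    results is a list of booleans, True for a win and False for a loss.
    """
    raw = sum(alpha if won else -beta for won in results)
    return max(-cap, min(cap, raw))


def combined_form(general_results, head_to_head_results):
    """Axiom 3: track head-to-head form separately, with its own alpha', beta'."""
    general = form_adjustment(general_results)
    specific = form_adjustment(head_to_head_results, alpha=0.03, beta=0.06)
    return general, specific


print(form_adjustment([True, True, False]))      # one loss outweighs two wins
print(form_adjustment([True] * 100))             # clipped at the cap
print(combined_form([True, False], [True, True]))
```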

From Axiom 5, we know that 𝛼 = f(streak). So, let's just define the streak and its growth function. I'm arbitrarily choosing a decay of 0.8 for the winning streak.

Let's take a look at how the streak function grows:

Here the maximum streak is 5, which is approached after around 30 consecutive wins. However, I'm going to linearly decrease the streak when a game is lost. Note the arbitrary nature of the decay function and of the streak increase/decrease. I don't need to get the function exactly right.
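One way to get a streak with these properties is a geometric series: each additional win adds 0.8 times what the previous one added, so n straight wins give 1 + 0.8 + 0.8² + ... = 5(1 − 0.8ⁿ), which approaches 5. The Python sketch below is my reconstruction under that assumption, not necessarily the exact function in the repository; the size of the linear penalty per loss (1.0) is also an assumed value.

```python
DECAY = 0.8          # decay factor mentioned in the text
LOSS_PENALTY = 1.0   # assumed size of the linear decrease per loss


def update_streak(streak: float, won: bool) -> float:
    """Update the streak after one match."""
    if won:
        # Geometric growth: n straight wins from zero give 5 * (1 - 0.8**n).
        return 1.0 + DECAY * streak
    # Linear decrease on a loss, never dropping below zero.
    return max(0.0, streak - LOSS_PENALTY)


def streak_after(results) -> float:
    """Fold a list of results (True = win) into a final streak value."""
    streak = 0.0
    for won in results:
        streak = update_streak(streak, won)
    return streak


for n in (1, 5, 10, 30):
    print(n, round(streak_after([True] * n), 4))
```

Because the streak is just an input feature, the exact shape matters less than the fact that it saturates and that losses pull it down.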

The machine learning algorithm will figure out the importance to give to the streak.

After running a gradient boosted tree classifier on the test set, I get an accuracy of 67%. I don't expect a high accuracy on this model because the inputs are highly subjective. It's not great, but acceptable for a few hours of work.

Once all the i's were dotted and t's crossed, I was able to run the prediction on the actual matches. Here are a few test predictions:
print_update_prediction('Australia','Afghanistan', host = 'England')
Australia, 0.9750151038169861

print_update_prediction('India','Australia', host='India') 
India, 0.6103534698486328

print_update_prediction('India','Australia', host='Australia') 
India, 0.5572055578231812

Thanks to the untimely rains in the World Cup, 4-5 games were rained out. A few of those games might have been mispredicted by my model. All in all, I was able to predict 87% of the games correctly.

Check out the code on GitHub. I suspect that this would work for any team sport where the teams don't change. It would work for soccer, field hockey etc., but not for IPL cricket or NFL/NBA, where the teams change every year.

Thursday, September 07, 2017

Prophesying AAPL stock movement

Disclaimer: Lots of statistics and technical jargon ahead. Jump directly to the summary section for a quick walk-through. I'm not a statistician, nor do I claim to be knowledgeable about the stock market in general. Use the information presented here at your own risk.

At Versa, we have been doing research into crunching network related data and providing actionable insight into network issues and network behavior in general. My wife works with financial models and a discussion on various tools used in econometrics led me into the rabbit hole of statistics and modelling (I had some experience with it back in school, but tech has come a long way in the past decade).

Although I am fluent in Python, nothing came close to the rapid prototyping power of R, so I had to pick it up along the way. After a few DataCamp courses, I had been trying to come up with a generalized statistical model that could work across a wide variety of similar data sets. There is a lot of literature that deals with model fitting and there are a lot of libraries for number crunching (both in Python & R). However, most of them were never designed for the deluge of data that we have now, nor do they deal well with multiple seasonalities (how the data looks per hour, per day, per week, per month, per year). After trying out stationary model fits and regressions, I moved on to time series models. These include a barrage of models, such as Holt-Winters, ARIMA, ETS and TBATS. I have yet to try an LSTM to fit the data.

While browsing the interwebs, I stumbled across a generalized time series model fitter by Facebook's data science team: Prophet. It is fairly simple for an amateur like me to use and yields surprisingly good results. I decided to run this on Apple's daily opening price and tried to predict the motion of the stock for the next year. There was no assumption on my part that this would yield any information on the stock price (else statisticians would have been billionaires). What I wanted to figure out was the general direction of the stock's motion.

The Prophet library allows accounting for events affecting the stock price, mainly holidays and release announcements (and the days leading up to them). The following R code did the heavy lifting:

Here are some of the graphs generated by this model:

This is the graph of log(stock price) per day (this is done to normalize the variance of the graph). As you can see, even though we have a decent fit, the exact forecast is very kludgy. We can't rely on the point forecast, but it does predict that after a small dip in early 2018, the stock will continue to grow. Now let's look at each of the components and their effects:

The first graph shows that the stock will be on the rise in 2018, but the curve shows signs of plateauing. The second component shows that the holidays and events have a fairly small effect on the stock price in general. The last graph shows that stock prices have been at their lowest near the end of January/early February (over the past decade).

Another analysis we could do is to predict the growth of the stock and, by deduction, its volatility. We repeat the same exercise for date vs gain in opening price:
Even though there are a few small spikes (which represent the upward movement of the stock), there are no downward spikes. Barring the temporal volatility of the market, AAPL is a solid stock to bank on. Doing the component analysis on the gain, we get:

Now, the first graph here doesn't mean that the stock is going down; it points to a decrease in volatility. There seems to be a more pronounced effect of holidays and announcements on the gain observed, and the stock dip is typically seen at the end of January/early February (as the earlier graph predicted) on a Monday. The end of September and the end of June/early July seem like a good time to sell the stocks (on a Thursday).
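For readers who want to see what the two input series look like, here is a minimal Python sketch of how log(price) and gain can be derived from a list of opening prices. The prices are made-up numbers, not real AAPL data, and I am assuming "gain" means the day-over-day difference in opening price; this only prepares the series and does not include the Prophet fit itself.

```python
import math


def log_prices(opens):
    """Log-transform prices to make the variance more uniform over time."""
    return [math.log(p) for p in opens]


def daily_gains(opens):
    """Day-over-day change in opening price (assumed definition of 'gain')."""
    return [later - earlier for earlier, later in zip(opens, opens[1:])]


# Made-up opening prices for four consecutive trading days.
opens = [150.0, 153.0, 151.5, 156.0]

print(log_prices(opens))
print(daily_gains(opens))
```

Note that the gain series is one element shorter than the price series, since the first day has no previous day to compare against.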

What does it all mean?
  • It is not a good time to buy AAPL stocks right now. They are at an all-time high.
  • The end of January (on a Monday) might be a good time to pick up AAPL stocks.
  • The end of June (on a Thursday) might be a good time to sell these stocks.
  • Since the stock is getting more stable, do not expect meteoric returns on investment.
  • The author of this blog is not a statistician or a stock market guru, so ignore the first four points.

Thursday, August 31, 2017

Language, Personality and Google

"Dude, pass me the book" or "Could you please give me that book?". Which of the two phrases do you prefer, and which one would make you feel awkward? To me, both of them would be awkward if uttered by the wrong person, because they would convey the wrong intent. If my co-worker (who I'm not friendly with) uses the first phrase, that would signal a lack of respect. If a friend uses the second one, that would signal aloofness.

However, this analysis of intent is often wrong. Both my friend and the co-worker might be preoccupied with something else. In all honesty, I don't care about which phrase is used, and they might not care either, but there are people who do. Your friend circle is a small group of people who don't misread your intent, and then later start harboring good feelings towards you. As mentioned in my earlier blog post, differences in societal stratification and economic status play a big role in the perception of malcontent in the language. If a billionaire friend (Bill Gates is not my friend... yet) were to suggest that we get a chartered plane to some remote island, I'd have to decline and be sad about it. If someone from my circle suggested it, we'd start talking numbers and try to come up with a plan to either make it happen, or laugh at the idea as a joke.

Language, the words we utter and the grammar we use, plays a big role in how we are perceived by other human beings. The rule of the game is to use the language according to the setting and the person/people you want to impress or, at the very least, not alienate. There is a reason Steve Jobs was the face of Apple Inc., even though Steve Woz designed and developed the computers. Enthusiasm is contagious; it is a positive life force that gets everyone around you excited. Conveying enthusiasm with words is an art form. Engineers (like me) are typically succinct and to the point, which is not a great quality to have in a social setup. However, being Spock-like, logical and terse, is an occupational hazard. This leads to the perception of mal-intent by others who do not communicate in a similar way (which is why most friend circles have people with one dominant occupation).

Enter Google Mail's Smart Reply. Smart Reply, in its current form and with the planned future upgrades, is going to change the way we (or rather I) communicate.

This is what most of my Gmail group threads look like:
Alice> Hey Guys, We were thinking of doing a potluck this Saturday. Let me know if you guys are in.
Bob> Sure.
Carol> +2
Dave> +2, might be a bit late

With Smart Reply, here's what it'd look like:
Alice> Hey Guys, We were thinking of doing a potluck this Saturday. Let me know if you guys are in.
Bob> That sounds awesome. I'm in.
Carol> Sure thing. Do you want me to pick something up on the way?
Dave> Sounds good. We'll be there. Might be a bit late.

If you were the fifth person (Eve), which meetup would you go to? The people are the same, the event is the same, the intent is the same. Google's Smart Reply just made Bob, Carol and Dave sound a lot more affable and enthusiastic. In the absence of body language, written words are more likely to be misconstrued. Google's Smart Reply takes the intent of Bob/Carol/Dave, which is just "yes", and dresses it up in a way that is palatable to a majority of people. It changes a conversation with a neutral tone into a positive one. Pretty soon one would have to work to convey apathy and antipathy in their conversation, rather than have it creep in because of a lack of time or the device in use (replying to a mail on a phone).

One could also argue that as conversations get more positive, the net morale of the human race might improve (or at least amongst the people using Gmail). I don't think I should extrapolate this any further and guess the emergent properties of this cultural change. Small apps and features have changed human interactions in unpredictable ways (Twitter/Facebook have very well toppled/made governments, Uber/Lyft have caused a dip in car ownership). Let's hope for the best.

Monday, April 17, 2017

A case for and against God

A recent death in my family has caused me to re-examine my relationship with God. Being an atheist (will get back to this), every condolence worded in clichés like "She's with God" or "She's in a better place" triggers a very emotional and existential whirlwind of a crisis within me. What is the purpose of life if being with God is the end goal? What good are your degrees, patents, accolades, your unnoticed sacrifices, your selfless acts, if you cease to exist? What is the point of writing a book that nobody will read in its entirety and then burning the book? Should there be a point?

I've been on both sides of the aisle. I was a devout Hindu while growing up and have been a complete Dawkinian atheist for most of my adult years. However, I find myself trying to go to the temple and think about God in times of joy and sorrow. And when someone asks about my allegiance, I give them a wide range of answers, ranging from the obtuse: "What is God?", to the vague: "mostly atheist", to the random: "Yes, I'm a pastafarian".

I can't call myself a pure atheist, because, well... I like going to the temple & praying. I can't call myself an agnostic, because I'm not ambivalent: I do not have any doubts about the existence of God. And I'm not religious, as I don't think that reciting certain texts can give me an edge over others. So, what am I? I am all three, and I'm none of those at the same time. All three groups approach the concept of God differently.

Credit: Andrea Baldwin

Life is an uphill journey that starts with birth and ends with death. That is the undisputed, cold fact. Imagine a life, of just climbing up a steep set of endless stairs, with no guard railing on the side. The sole purpose of life is to thrive and propagate, but how does an animal, who, by the freak of nature, questions the futility of the exercise & the purpose of existence, survive? The first thing, most of us, who are afraid of heights, would do is imagine a guard railing. Is the railing really there? No. Does it help me climb the endless stairs towards death? Yes, of course. That guard railing we imagine is God.

This is where the factions come in. A general atheist with a mouthpiece would yell that there is no rail. He doesn't care if the person relying on this mental construct is scared of heights. He doesn't care if he could help pull someone up the stairs. If he is not scared of heights, he insists that no one else should be either. A garden variety religious person, on the other hand, tries to convince a scared person that the railing is real. He rallies hard to convince others that not only is the railing real, but his stainless steel railing is much better than the plastic railings others have. He convinces others that it is perfectly alright to lean on this railing. He convinces others to fight for his cause. Of course, we need a stronger railing to stop us from falling into the bottomless abyss, don't we?

Not everyone is bad though. An enlightened atheist might distract you from the deep abyss and show you the wonders that lie ahead of you. You can always create other mental constructs to alleviate your fears. Even most people who identify themselves as religious fall into this category. Why else would they believe in the power of small sheets of paper with numbers printed on them? (Money... in case you didn't get that.) Similarly, there are a lot of religious people I know who would just point to the railing to comfort the scared climber. Nothing wrong with that.

An agnostic would just give an ambivalent answer. He would look straight ahead and say that he's not sure if the railing exists. He is not sure, either because he doesn't know whether the person is referring to an actual railing, or because he just doesn't care about the answer.

So what am I? I'm whatever the person who needs help climbing the stairs of life needs me to be. My own relationship with my God, or any other mental construct, is my own, and varies with the ebbs and flows of life. Defining any relationship with a single word, will take the depth out of it. Love without devotion is nothing. Devotion without conviction is nothing. Conviction without awareness is nothing. We rise from nothingness and we go back into it. Nothing encapsulates everything.

Humans have always had the need to box off the uncertainties in their lives and attribute them to a higher power, or a foreign country, or a sect, or anyone but themselves. This disassociation helps them achieve extraordinary feats in life, and has definitely helped the human race progress at an accelerating pace. As long as one knows what God is, and uses it to imbue positivity in their own life and in the lives of others, God is and will remain the most redeeming feature of the human race.

Wednesday, December 07, 2016

Interoperability of AI assistants..

I don't think that it'd be presumptuous of me to assume that most of you have used an AI assistant of some sort, be it Siri on iPhone/MacBooks, Google Now on Android, Cortana on Microsoft Windows or Alexa on Amazon Echo. Even if you have only used the "speak to type" option on your phone, you have used one.

Most of the AI assistants have limited functionality and do certain things quite well. I don't care if Alexa is not as smart as Google Assistant. It makes up for that with an extensive skill/rule set. Google Assistant would shine in situations where the queries are more free form. Siri/Cortana/Google Assistant/Alexa all depend on the information you are sharing with them. This inherently limits the efficacy of each. Google Express is not going to replace Amazon for my shopping needs, Microsoft Live mail is not going to replace Gmail and I'm not going to exchange my MacBook or Windows desktop for a Chromebook. So what is the solution?!

The browser wars in the 90s and early 2000s have shown us how this is going to go down. Couldn't the industry leaders pool their resources to come up with something like a W3C standard and provide a standard browser-like shim layer (Reactor/Proactor design patterns) to dispatch requests to individual subsystems (AI assistants)? This could be a simple rule engine in the first phase that punts my email queries to Google Assistant, my shopping inquiries to Alexa, and my file related queries to Siri/Cortana etc.
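To give a feel for how small that first-phase rule engine could be, here is a toy Python sketch. Everything in it is hypothetical: the keyword lists, the route() function and the choice of Google Assistant as the fallback for free-form queries are my own illustration, and it only returns the name of an assistant rather than calling any real assistant API.

```python
import re

# Hypothetical keyword rules: (keywords, assistant that should handle them).
ROUTES = [
    (("email", "mail", "inbox"), "Google Assistant"),
    (("buy", "order", "shopping"), "Alexa"),
    (("file", "folder", "document"), "Siri/Cortana"),
]

# Assumed fallback for free-form queries that match no rule.
DEFAULT_ASSISTANT = "Google Assistant"


def route(query: str) -> str:
    """Return the name of the assistant that should handle the query."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    for keywords, assistant in ROUTES:
        if any(keyword in words for keyword in keywords):
            return assistant
    return DEFAULT_ASSISTANT


for q in ("Any new email from Alice?",
          "Order more coffee filters",
          "Find the file I edited yesterday"):
    print(q, "->", route(q))
```

A real shim layer would need a shared request/response format between vendors, which is exactly the part that would need a standard.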

Taking the browser parlance a bit further, I would want to theme my AI assistants too. I don't want to call them Google/Cortana/Alexa; I want to call them Mr Chekov or Mr Sulu, and I want them to refer to me as Captain, always with a sense of impending doom (Star Trek joke for the uninitiated).

A unified AI assistant might just be a pipe dream. Hopefully, it doesn't take the AI assistants a decade to mature (like it did for the browsers).