McCarthy’s research project reflected the goal of Artificial Intelligence - but not learning. Estimates vary, but in 1950 the American mathematician Claude Shannon estimated the number of possible games of chess at around 10^120. That’s a huge number of potential games - but a computer could theoretically be programmed with a response to every position. Does knowing how to respond to every possible chess move in advance constitute intelligence? Probably. Does it constitute learning? Probably not. IBM’s Deep Blue famously beat chess world champion Garry Kasparov in 1997. This supercomputer was really an example of an Expert System - its power came from brute-force search and hand-crafted chess knowledge programmed in from the outset - it was not really ‘learning’.
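Where does a number like 10^120 come from? Shannon’s estimate was a back-of-the-envelope calculation: roughly 10^3 plausible possibilities per pair of moves, over a typical game of around 40 move pairs. A quick sketch in Python (the figures are Shannon’s rough assumptions, not exact counts) reproduces it:

```python
# A rough reconstruction of Shannon's back-of-the-envelope estimate
# (illustrative figures, not exact counts): about 10^3 plausible
# possibilities per pair of moves, over a typical 40-move-pair game.
possibilities_per_move_pair = 10 ** 3
move_pairs_per_game = 40
game_tree_size = possibilities_per_move_pair ** move_pairs_per_game
print(f"~10^{len(str(game_tree_size)) - 1} possible games")  # ~10^120
```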
Expert Systems had their benefits. ‘Mycin’, for example, was an Expert System from the early 1970s that could outperform human experts at identifying the bacteria behind serious blood infections. That’s great, but how could we answer questions like ‘does this picture contain a cat?’ or ‘how likely is this customer to churn?’ with an Expert System? We simply could not!
The key to answering these types of questions is ‘Learning’. Current chess engines typically ‘learn’ how to play chess - they are not pre-programmed with all potential moves. But how do they learn?
We discussed Machine Learning being a sub-branch of Artificial Intelligence in the first post in this series. For the sake of simplicity, we can group Machine Learning approaches into three groups: Supervised Learning, Unsupervised Learning and Reinforcement Learning.
In this approach, the ‘machine’ learns from a large set of labelled examples. The learning is ‘supervised’ in that we actively feed in historic data where the right answer is already known. We train on one portion of that data (known as ‘training data’) and retain the remainder (known as ‘test data’). We then get the ‘machine’ to make predictions on the test data and, because we know the right answers there too, we can measure the accuracy of the predictions prior to deployment.
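Here’s what that workflow looks like in practice - a minimal sketch using Python and scikit-learn (the iris dataset here is just a convenient stand-in for whatever labelled data you have):

```python
# A minimal sketch of the train/test workflow with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# A small labelled dataset: features X, known answers (labels) y.
X, y = load_iris(return_X_y=True)

# Retain 20% of the examples as unseen 'test data'.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Train ('fit') the model on the training data only.
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Measure accuracy on data the model has never seen.
predictions = model.predict(X_test)
print(f"Accuracy on unseen data: {accuracy_score(y_test, predictions):.2f}")
```

The held-out test set is the whole point: a model can score perfectly on data it memorised, so only its performance on unseen examples tells us whether it has genuinely ‘learnt’.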
Supervised Learning is split into two main categories: Regression and Classification.
Think about a human infant. We show the infant dozens of photos of different cats, point out the cat in each and say ‘cat’. We are essentially labelling each image with the word ‘cat’ in the infant’s mind. If we then show the infant another, previously unseen, photo we can ask ‘is this a cat?’. The infant should be able to recognise the typical features that a cat possesses and, even though it has not seen this particular photo before, establish that a cat is present. The features present match its understanding of the label ‘cat’. The infant has ‘learnt’ what it is to be a cat.
In the cat example above, we identified an instance of a class of object (a cat in this case) - that’s an example of Classification.
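A real cat detector would learn from raw pixels, but the same idea works on any labelled features. Here’s a toy sketch - the features and labels are entirely made up for illustration:

```python
# A toy 'is this a cat?' classifier. The features and labels below are
# made up for illustration - a real model would learn from pixels.
from sklearn.tree import DecisionTreeClassifier

# Each example: [has_whiskers, has_four_legs, says_meow, weight_kg]
X = [[1, 1, 1, 4], [1, 1, 1, 5], [0, 1, 0, 30], [0, 0, 0, 2], [1, 1, 0, 4]]
y = ["cat", "cat", "dog", "bird", "cat"]  # the labels we 'point out'

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# A previously unseen example: whiskers, four legs, meows, 3 kg.
print(model.predict([[1, 1, 1, 3]]))  # -> ['cat']
```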
Now let’s imagine that we had the particulars of hundreds of houses from an estate agent. These properties have a wide range of attributes - number of bedrooms, number of bathrooms, garage space, garden size, location, etc. If we see enough property particulars with their associated prices, it’s reasonable to assume we could make an educated guess at the price of a previously unseen property based on its attributes.
In the property example above we are predicting a value on a continuous scale (the property price) - that’s an example of Regression.
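In code, the idea is much the same as the classifier above - fit on known examples, predict on something unseen. A sketch with invented house data:

```python
# A regression sketch with invented data. Each row of X is a property:
# [bedrooms, bathrooms, garage_spaces]; y holds prices in £1,000s.
from sklearn.linear_model import LinearRegression

X = [[2, 1, 0], [3, 1, 1], [3, 2, 1], [4, 2, 2], [5, 3, 2]]
y = [180, 240, 265, 330, 400]  # hypothetical observed sale prices

model = LinearRegression()
model.fit(X, y)

# Predict the price of a previously unseen 4-bed, 1-bath, 1-garage house.
print(model.predict([[4, 1, 1]]))  # a value on a continuous scale
```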
A key ingredient in Supervised Learning is a large set of labelled data with which to train the model. There is an ever-increasing volume of data that can be used for this purpose. For example, you may have been to a web page that challenged you to prove you are not a robot. You may not be aware that, by solving that 4x4 grid of blurry images, you are helping to label data for someone to later use in Supervised Learning models.
In the world of Salesforce, Regression could be used to predict how likely a customer is to churn (a percentage risk), while Classification could be used to determine whether or not a support case needs to be escalated.
As you might be able to guess, in Unsupervised Learning we don’t ‘train’ the ‘machine’ with large quantities of labelled data (the labels are the ‘Supervised’ bit). In this approach we are essentially looking for patterns that form in the data on their own.
Think about your online shopping experience. When you are looking at a product, you often get recommendations for other products. These recommendations need not be predefined - they can be established over time by looking at customer buying patterns: ‘customers who bought that also bought this’.
The recommendation scenario above works by grouping customers and products with similar buying patterns - that grouping is clustering, the key use case for Unsupervised Learning.
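Real recommendation engines use more sophisticated techniques, but a simple co-occurrence count over some invented shopping baskets captures the ‘bought that, also bought this’ idea:

```python
# A toy 'customers who bought that also bought this', built by counting
# how often pairs of products appear in the same (invented) baskets.
from collections import Counter
from itertools import combinations

baskets = [
    {"kettle", "toaster", "mugs"},
    {"kettle", "mugs"},
    {"toaster", "bread bin"},
    {"kettle", "toaster"},
]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Recommend whatever co-occurs most often with the product being viewed.
viewing = "kettle"
related = {product: count for pair, count in pair_counts.items()
           if viewing in pair for product in pair if product != viewing}
print(related)  # e.g. {'mugs': 2, 'toaster': 2}
```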
In the world of Salesforce, an example of Unsupervised Learning is ‘smart bucketing’. For example, ‘smart bucketing’ can bucket demographic data into automatically determined age buckets. This differs from Supervised Learning Classification because the class values (age buckets in this example) are not pre-determined - they are derived from the data.
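Deriving buckets from the data is a natural fit for clustering. A sketch with k-means (the ages and the choice of three buckets are purely illustrative):

```python
# Deriving age 'buckets' from the data itself with k-means clustering.
# The ages and the choice of three clusters are purely illustrative.
import numpy as np
from sklearn.cluster import KMeans

ages = np.array([[18], [21], [23], [35], [38], [41], [62], [66], [70]])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(ages)

# The buckets were never pre-defined - their boundaries come from the data.
for label, centre in enumerate(kmeans.cluster_centers_):
    members = ages[kmeans.labels_ == label].flatten().tolist()
    print(f"Bucket {label}: centred near {centre[0]:.0f}, ages {members}")
```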
This approach to Machine Learning differs from Supervised and Unsupervised Learning, as we will see. However, Reinforcement Learning can be viewed as sitting between the two: it needs some feedback from its environment (where Unsupervised Learning essentially has none), but nowhere near as much as the fully labelled data that Supervised Learning requires.
In Reinforcement Learning there are three main components: an Agent, an Environment and a set of Actions. The Agent can perform Actions from the defined set in the Environment.
Think about a blindfolded friend trying to navigate a complex maze. The maze is the Environment, the blindfolded friend is the Agent and the Actions our friend can take are things like ‘step forward’, ‘turn 90 degrees left’, ‘turn 90 degrees right’, etc.
In the above scenario something important is missing - the goal. The goal for our maze example might be to get to the centre (and then out again). Without a defined goal our blindfolded friend could be wandering the maze indefinitely.
There’s another important ingredient - the Reward. The reward in the maze example might be the satisfaction of making it to the centre - but without a reward, it could be tempting for our friend to simply not bother trying.

Remember, however, that our friend is blindfolded. How do they know that any Action they take is getting them closer to their Goal? Imagine we were standing in the centre of the maze and could see our friend and the entire maze. We are now part of the Environment in which our friend is an Agent. Now imagine that we could issue two simple commands - ‘good’ and ‘bad’. After each Action our friend takes, we respond with ‘good’ or ‘bad’ based on whether they are closer to or further from the Goal. Receiving a ‘good’ from us is, in effect, our friend’s Reward - and accumulating a series of ‘good’ responses is what leads them to their Goal.
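Here is that maze idea as a minimal tabular Q-learning sketch - one classic Reinforcement Learning algorithm. The grid size, reward values and learning parameters are all invented for illustration; the ‘good’/‘bad’ signals become small numeric rewards:

```python
# A minimal tabular Q-learning sketch of the maze idea: a 4x4 grid
# Environment, an Agent starting at (0, 0) and a Goal at (3, 3). The
# 'good'/'bad' feedback becomes numeric rewards; all figures here
# (grid size, rewards, learning parameters) are invented.
import random

SIZE, GOAL = 4, (3, 3)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
Q = {((r, c), a): 0.0 for r in range(SIZE) for c in range(SIZE)
     for a in range(len(ACTIONS))}
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration

def step(state, action):
    dr, dc = ACTIONS[action]
    row = min(max(state[0] + dr, 0), SIZE - 1)  # walls: stay on the grid
    col = min(max(state[1] + dc, 0), SIZE - 1)
    new_state = (row, col)
    reward = 1.0 if new_state == GOAL else -0.01  # 'good' at the goal
    return new_state, reward

for episode in range(500):
    state = (0, 0)
    while state != GOAL:
        # Mostly exploit what has been learned; sometimes explore.
        if random.random() < epsilon:
            action = random.randrange(len(ACTIONS))
        else:
            action = max(range(len(ACTIONS)), key=lambda a: Q[(state, a)])
        new_state, reward = step(state, action)
        best_next = max(Q[(new_state, a)] for a in range(len(ACTIONS)))
        # Nudge the estimate towards the reward plus discounted future value.
        Q[(state, action)] += alpha * (reward + gamma * best_next
                                       - Q[(state, action)])
        state = new_state

# After training, following the highest-valued Action at each state
# walks straight to the Goal.
state, path = (0, 0), [(0, 0)]
for _ in range(20):  # safety cap in case the policy is imperfect
    if state == GOAL:
        break
    action = max(range(len(ACTIONS)), key=lambda a: Q[(state, a)])
    state, _ = step(state, action)
    path.append(state)
print(path)
```

Notice that nobody tells the Agent the route - the small step penalty and the reward at the Goal (our ‘bad’ and ‘good’) are enough for a sensible policy to emerge from trial and error.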
We have mentioned chess a lot - so let’s stick with that theme. Where Deep Blue was essentially an Expert System (knowledgeable but not really intelligent as such), there have been more recent chess-playing solutions. Google DeepMind’s AlphaZero was able to ‘learn’ to play chess without any explicit human input apart from the rules of the game, and without having ‘seen’ any previous games (historic data). AlphaZero reached superhuman levels in just 9 hours - totally on its own. How did it do this? You guessed it - with Reinforcement Learning. AlphaZero played against itself, using the outcome of each game as its Reward, and the stronger version of its play progressed to the next game. This process repeated across some 44 million games, with the strongest play surviving each round in a kind of evolutionary survival of the fittest.