Let's begin by first understanding how our brain processes information:
In our brain, there are billions of cells called neurons, which process information in the form of electrical signals. External information/stimuli is received by the dendrites of a neuron, processed in the neuron's cell body, converted into an output and passed through the axon to the next neuron. The next neuron can choose to either accept or reject the signal depending on its strength.
Now, let's try to understand how an ANN (Artificial Neural Network) works:
Here, w1, w2 and w3 give the strengths of the input signals
As you can see from the above, an ANN is a very simplistic representation of how a brain neuron works.
To make things clearer, let's understand ANNs using a simple example: a bank wants to assess whether to approve a customer's loan application, so it wants to predict whether the customer is likely to default on the loan. It has data like below:
So, we have to predict Column X. A prediction closer to 1 indicates that the customer has more chances to default.
Let's try to create an Artificial Neural Network architecture loosely based on the structure of a neuron using this example:
In general, a simple ANN architecture for the above example could be:
Key Points related to the architecture:
1. The network architecture has an input layer, a hidden layer (there can be more than one) and an output layer. It is also called an MLP (Multi-Layer Perceptron) because of the multiple layers.
2. The hidden layer can be seen as a “distillation layer” that distills some of the important patterns from the inputs and passes them on to the next layer. It makes the network faster and more efficient by identifying only the important information from the inputs and leaving out the redundant information.
3. The activation function serves two notable purposes: it captures non-linear relationships between the inputs, and it converts the weighted sum of inputs into an output signal (here, a value between 0 and 1).
In the above example, the activation function used is the sigmoid:
O1 = 1 / (1 + e^(-F))
Where F = W1*X1 + W2*X2 + W3*X3
The sigmoid activation function produces output values between 0 and 1. There are other activation functions such as tanh, softmax and ReLU.
4. Similarly, the hidden layer leads to the final prediction at the output layer:
O3 = 1 / (1 + e^(-F1))
Where F1 = W7*H1 + W8*H2
Here, the output value (O3) is between 0 and 1. A value closer to 1 (e.g. 0.75) indicates a higher likelihood of the customer defaulting.
5. The weights W represent the importance attached to the inputs. If W1 is 0.56 and W2 is 0.92, then higher importance is attached to X2: Debt Ratio than to X1: Age in predicting H1.
6. The above network architecture is called a “feed-forward network”, as you can see that input signals flow in only one direction (from inputs to outputs). We can also create “feedback” networks, where signals flow in both directions.
7. A good model with high accuracy gives predictions that are very close to the actual values. So, in the table above, the Column X values should be very close to the Column W values. The prediction error is the difference between Column W and Column X:
8. The key to getting a good model with accurate predictions is to find the “optimal values of W – weights” that minimize the prediction error. This is where the “back-propagation algorithm” comes in, and it is what makes an ANN a learning algorithm: by learning from its errors, the model improves.
9. The most common method used for this is “gradient descent”, where different values of W are tried iteratively and the prediction errors assessed. The values of W are changed in small steps, the impact on the prediction error is assessed, and those values of W are chosen as optimal beyond which further changes no longer reduce the error. To get a more detailed understanding of gradient descent, please refer to:
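To make the forward pass concrete, here is a minimal Python sketch of the network above. The input values and most of the weights are made-up numbers for illustration only (W1 and W2 reuse the values from point 5):

```python
import math

def sigmoid(f):
    # Sigmoid squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-f))

# Hypothetical inputs for one customer (illustrative values only):
X1, X2, X3 = 0.3, 0.8, 0.5

# Hypothetical weights -- in a trained network these come from back-propagation
W1, W2, W3 = 0.56, 0.92, 0.11   # inputs -> H1
W4, W5, W6 = 0.20, 0.45, 0.70   # inputs -> H2
W7, W8     = 0.35, 0.85         # hidden -> output

# Hidden layer: each hidden node applies sigmoid to its weighted input sum
H1 = sigmoid(W1*X1 + W2*X2 + W3*X3)   # this is O1 in the text
H2 = sigmoid(W4*X1 + W5*X2 + W6*X3)   # a second hidden node, O2

# Output layer: O3 is the predicted chance of default, between 0 and 1
O3 = sigmoid(W7*H1 + W8*H2)
print(round(O3, 3))
```

Every intermediate value stays between 0 and 1 because the sigmoid is applied at each node; the final O3 is read as the default prediction for this customer.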
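And here is a rough sketch of gradient descent itself, reduced to a single sigmoid neuron (no hidden layer) so the gradient stays simple. The dataset, starting weights and learning rate are all invented for illustration:

```python
import math

def sigmoid(f):
    return 1.0 / (1.0 + math.exp(-f))

# Tiny illustrative dataset: 3 input features per customer and the
# actual outcome (1 = defaulted, 0 = did not). Values are made up.
data = [
    ([0.3, 0.8, 0.5], 1.0),
    ([0.7, 0.2, 0.9], 0.0),
    ([0.5, 0.9, 0.1], 1.0),
    ([0.9, 0.1, 0.8], 0.0),
]

# Single-neuron model: prediction = sigmoid(W1*X1 + W2*X2 + W3*X3)
W = [0.1, 0.1, 0.1]
learning_rate = 0.5

for step in range(2000):
    for X, actual in data:
        pred = sigmoid(sum(w * x for w, x in zip(W, X)))
        error = pred - actual
        # Gradient of the squared error w.r.t. each weight, using the
        # sigmoid derivative pred*(1 - pred); nudge each weight a small
        # step against the gradient so the error shrinks.
        for i in range(3):
            W[i] -= learning_rate * error * pred * (1 - pred) * X[i]

# After training, the predictions should sit close to the actual labels
for X, actual in data:
    pred = sigmoid(sum(w * x for w, x in zip(W, X)))
    print(f"actual={actual:.0f}  predicted={pred:.2f}")
```

The loop is exactly the process described in point 9: change W in small amounts, assess the effect on the error, and stop once further changes no longer help (here approximated by a fixed number of iterations).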
Key advantages of Neural Networks:
ANNs have some key advantages that make them well suited to certain problems and situations:
ANNs can learn and model non-linear and complex relationships, which is really important because, in real life, many of the relationships between inputs and outputs are non-linear as well as complex.
ANNs can generalize - after learning from the initial inputs and their relationships, they can infer unseen relationships from unseen data as well, allowing the model to generalize and predict on data it has never encountered.
Unlike many other prediction techniques, ANNs do not impose any restrictions on the input variables (such as how they should be distributed). Additionally, many studies have shown that ANNs can better model heteroskedasticity, i.e. data with high volatility and non-constant variance, given their ability to learn hidden relationships in the data without imposing any fixed relationships. This is very useful in financial time-series forecasting (e.g. stock prices), where data volatility is very high.
A few applications:
Image Processing and Character Recognition: given ANNs' ability to take in many inputs and process them to infer hidden as well as complex, non-linear relationships, ANNs play a big role in image and character recognition. Character recognition, such as handwriting recognition, has many applications in fraud detection (e.g. bank fraud) and even national security assessments. Image recognition is an ever-growing field with widespread applications, from facial recognition in social media and cancer detection in medicine to satellite imagery processing for agricultural and defence use. Research on ANNs has paved the way for deep neural networks, which form the basis of “deep learning” and which have opened up exciting and transformational innovations in computer vision, speech recognition and natural language processing – a famous example being self-driving cars.
Forecasting: forecasting is required extensively in everyday business decisions (e.g. sales, financial allocation between products, capacity utilization), in economic and monetary policy, and in finance and the stock market. Forecasting problems are often complex; for example, predicting stock prices is a complex problem with many underlying factors (some known, some unseen). Traditional forecasting models have limitations in accounting for these complex, non-linear relationships. ANNs, applied in the right way, can provide a robust alternative, given their ability to model and extract unseen features and relationships. Also, unlike these traditional models, ANNs do not impose any restrictions on the input and residual distributions. More research is ongoing in this field – for example, recent advances in the use of LSTMs and Recurrent Neural Networks for forecasting.
ANNs are powerful models with a wide range of applications. Above, I have listed a few prominent ones, but they have far-reaching uses across many different fields, including medicine, security, banking/finance, government, agriculture and defence.