In this post on building an Artificial Neural Network (ANN) model using Scikit-Learn, I will demonstrate how to use Python's scikit-learn library to build an Artificial Neural Network. Before proceeding further, let us first discuss what an Artificial Neural Network is.
Artificial Neural Network (ANN)
An Artificial Neural Network (ANN) comprises an input layer of neurons, an output layer, and one or more hidden layers in between. A fully connected feedforward ANN is known as a Multi-layer Perceptron.
What is a Multi-layer Perceptron?
While a perceptron has only an input layer and an output layer of neurons, a Multi-layer Perceptron also includes one or more hidden layers between them. A Multi-layer Perceptron works as follows.
First, each neuron in the first hidden layer computes the dot product of the input values and the weights on its incoming connections, and adds a bias term. The result then goes through an activation function, and the output of the activation function becomes the output of that neuron.
Consequently, the outputs of one hidden layer act as the inputs to the neurons of the next layer, and the same process repeats until the output layer is reached.
Finally, the neurons at the output layer apply their own activation function to produce the output of the network. During training, this output is compared with the target and the error is used in backpropagation to update the weights. If the model is already trained, the output is used for making predictions.
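To make the forward pass described above concrete, here is a minimal sketch in plain Python. The network size, the weights, the biases, and the choice of ReLU and sigmoid activations are all made-up values for illustration; this is not how scikit-learn implements it internally.

```python
import math


def relu(values):
    # ReLU activation: negative values become 0.
    return [max(0.0, v) for v in values]


def sigmoid(values):
    # Logistic activation: squashes each value into (0, 1).
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def layer_forward(inputs, weights, biases, activation):
    # weights[j] holds the incoming weights of neuron j in this layer.
    pre_activations = [
        sum(x * w for x, w in zip(inputs, neuron_weights)) + b
        for neuron_weights, b in zip(weights, biases)
    ]
    return activation(pre_activations)


# Hypothetical network: 2 inputs -> 2 hidden neurons (ReLU) -> 1 output (sigmoid).
hidden_weights = [[0.5, -1.0], [1.0, 1.0]]
hidden_biases = [0.0, -0.5]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]

x = [1.0, 2.0]
hidden_out = layer_forward(x, hidden_weights, hidden_biases, relu)
output = layer_forward(hidden_out, output_weights, output_biases, sigmoid)
print(hidden_out, output)
```

Each layer repeats the same two steps: a weighted sum plus bias, followed by an activation function. The output of one layer is the input of the next.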
Creating an Artificial Neural Network (ANN) Model using Scikit-Learn
The scikit-learn library of Python provides a classifier known as MLPClassifier (in the sklearn.neural_network module) that we can use to build a Multi-layer Perceptron model. MLPClassifier trains the network using backpropagation. The following line lists the parameters of the MLPClassifier constructor.
MLPClassifier(hidden_layer_sizes, activation, solver, alpha, batch_size, learning_rate, learning_rate_init, power_t, max_iter, shuffle, random_state, tol, verbose, warm_start, momentum, nesterovs_momentum, early_stopping, validation_fraction, beta_1, beta_2, epsilon, n_iter_no_change, max_fun)
The parameters are described briefly below.
hidden_layer_sizes is a tuple in which the i-th element is the number of neurons in the i-th hidden layer, so the length of the tuple sets the number of hidden layers. For example, the following MLPClassifier has four hidden layers with the given sizes.
MLPClassifier(hidden_layer_sizes=(12, 13, 10, 8), ......)
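As a rough check of how hidden_layer_sizes shapes the network, the sketch below fits such a classifier and prints the shape of each weight matrix. The synthetic dataset from make_classification, the max_iter value, and the random_state are assumptions chosen only for illustration.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Synthetic binary-classification data with 6 features (illustrative only).
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=4, n_redundant=0, random_state=0
)

clf = MLPClassifier(hidden_layer_sizes=(12, 13, 10, 8), max_iter=200, random_state=0)

with warnings.catch_warnings():
    # The optimizer may not fully converge in 200 iterations; that is fine here.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    clf.fit(X, y)

# One weight matrix per pair of adjacent layers, input layer included.
layer_shapes = [w.shape for w in clf.coefs_]
print(layer_shapes)

# n_layers_ counts the input layer, the hidden layers, and the output layer.
print(clf.n_layers_)
```

The first matrix has one row per input feature, and each following matrix connects one hidden layer to the next, ending at the output layer.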
Similarly, we can specify the activation function that the hidden layers use with the activation parameter (identity, logistic, tanh, or relu). For weight optimization we use a solver, which is the optimization algorithm that updates the weights: lbfgs, sgd, or adam. The default solver is adam. To reduce overfitting, we use the parameter alpha, which sets the strength of the L2 regularization term.
Parameters Related to Learning
The next parameter, batch_size, refers to the size of the mini-batches used by the stochastic solvers (sgd and adam). Likewise, the learning_rate parameter selects the learning rate schedule for weight updates: constant, invscaling (inverse scaling), or adaptive. It is used only with the sgd solver. Further, learning_rate_init sets the initial learning rate and is used with the sgd and adam solvers.
If you use the inverse scaling learning rate by setting learning_rate to invscaling, you can also specify its exponent with the parameter power_t. Further, you can specify the maximum number of iterations using the max_iter parameter. For the stochastic solvers (sgd and adam), this is the number of epochs, that is, passes over the training data. Also, the boolean parameter shuffle specifies whether the samples should be shuffled in each iteration.
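To see what the inverse scaling schedule does, here is a small sketch of the formula given in the scikit-learn documentation, effective_learning_rate = learning_rate_init / pow(t, power_t). The helper name invscaling_learning_rate is hypothetical, and the library's internal bookkeeping of the time step t may differ from this simplified version.

```python
def invscaling_learning_rate(learning_rate_init, t, power_t=0.5):
    # Documented schedule: the rate shrinks as the time step t grows.
    # Assumes t >= 1; 0.5 is the default value of power_t.
    return learning_rate_init / (t ** power_t)


# 0.001 is the default value of learning_rate_init.
for t in (1, 4, 16, 100):
    print(t, invscaling_learning_rate(0.001, t))
```

A larger power_t makes the learning rate decay faster, while power_t=0 would keep it constant.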
Parameters for Stopping the Learning Process
The random_state parameter seeds the random number generation used for weight and bias initialization, so that results are reproducible. The MLPClassifier also has a parameter known as tol for specifying the tolerance for the optimization. In other words, when the loss does not improve by at least tol for n_iter_no_change consecutive iterations, the network is considered to have converged and training stops (unless learning_rate is set to adaptive). The parameter n_iter_no_change is therefore the maximum number of consecutive iterations that may fail to improve the loss by tol.
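The interaction between tol and n_iter_no_change can be sketched in plain Python as follows. This is a simplified illustration of the stopping rule described above, not scikit-learn's actual code; the function name stopping_epoch and the loss values are made up.

```python
def stopping_epoch(losses, tol=1e-4, n_iter_no_change=10):
    """Return the 1-based epoch at which training would stop, or None."""
    best_loss = float("inf")
    stalled = 0  # consecutive epochs without an improvement of at least tol
    for epoch, loss in enumerate(losses, start=1):
        if loss > best_loss - tol:
            stalled += 1
        else:
            stalled = 0
        best_loss = min(best_loss, loss)
        if stalled >= n_iter_no_change:
            return epoch
    return None


# The loss drops quickly, then improves by less than tol per epoch.
plateau = [1.0, 0.5, 0.4, 0.39995, 0.39994, 0.39993, 0.3]
print(stopping_epoch(plateau, tol=1e-4, n_iter_no_change=3))

# A steadily improving loss never triggers the rule.
print(stopping_epoch([1.0, 0.8, 0.6, 0.4], tol=1e-4, n_iter_no_change=3))
```

In the first run the last three improvements before the stop are each smaller than tol, so the counter reaches n_iter_no_change and training would end before the final loss value is ever seen.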
The boolean parameter verbose controls whether progress messages are printed, while the boolean parameter warm_start reuses the solution of the previous call to fit as the initialization instead of starting from scratch. Similarly, momentum sets the momentum for gradient descent updates, and nesterovs_momentum indicates whether to use Nesterov's momentum. Both are used only with the sgd solver.
If the validation score stops improving, we can terminate training early by setting the boolean parameter early_stopping to True. In that case, the classifier sets aside the fraction of the training data given by the validation_fraction parameter and uses it for validation. Early stopping applies only to the sgd and adam solvers.
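Here is a minimal sketch of early stopping in practice. The synthetic dataset, the single hidden layer of 16 neurons, and the specific parameter values are assumptions for illustration, so the numbers printed will vary with the data and the scikit-learn version.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Synthetic data (illustrative only).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_redundant=0, random_state=1
)

clf = MLPClassifier(
    hidden_layer_sizes=(16,),
    early_stopping=True,      # hold out part of the training data for validation
    validation_fraction=0.2,  # 20% of the training data is used for validation
    n_iter_no_change=5,       # stop after 5 epochs without enough improvement
    max_iter=300,
    random_state=1,
)

with warnings.catch_warnings():
    # Ignore the warning raised if max_iter is reached first.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    clf.fit(X, y)

print("epochs run:", clf.n_iter_)
print("validation scores recorded:", len(clf.validation_scores_))
print("best validation score:", clf.best_validation_score_)
```

With early_stopping enabled, the fitted classifier exposes validation_scores_ and best_validation_score_, which can help judge whether training stopped at a sensible point.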
The parameters beta_1 and beta_2 are the exponential decay rates for the estimates of the first moment vector and the second moment vector in adam, respectively. Further, the parameter epsilon is a small value for numerical stability. Like beta_1 and beta_2, it is used only with the adam solver.
Finally, we can specify the maximum number of loss function calls using the parameter max_fun, which is used only with the lbfgs solver.
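The solver-specific parameters can be seen side by side in the sketch below, which trains one classifier with adam and one with lbfgs. The values passed for beta_1, beta_2, epsilon, and max_fun are the library defaults written out explicitly. The synthetic dataset and the train/test split are assumptions for illustration.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic data (illustrative only).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# beta_1, beta_2 and epsilon only take effect with the adam solver.
adam_clf = MLPClassifier(
    solver="adam", beta_1=0.9, beta_2=0.999, epsilon=1e-8,
    max_iter=300, random_state=0,
)

# max_fun only takes effect with the lbfgs solver.
lbfgs_clf = MLPClassifier(
    solver="lbfgs", max_fun=15000, max_iter=300, random_state=0,
)

with warnings.catch_warnings():
    # Either solver may hit max_iter before converging on this toy problem.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    adam_clf.fit(X_train, y_train)
    lbfgs_clf.fit(X_train, y_train)

adam_score = adam_clf.score(X_test, y_test)
lbfgs_score = lbfgs_clf.score(X_test, y_test)
print("adam accuracy:", adam_score)
print("lbfgs accuracy:", lbfgs_score)
```

Parameters that belong to a different solver are accepted by the constructor but have no effect on training, so it is worth checking that the ones being tuned match the chosen solver.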