An activation function is used to introduce non-linearity in an artificial neural network. It allows us to model a class label or score that varies non-linearly with independent variables. Non-linearity means that the output cannot be replicated from a linear combination of inputs; this allows the model to learn complex mappings from the available data, and thus the network becomes a universal approximator. On the other hand, a model which uses a linear function (i.e. no activation function) is unable to make sense of complicated data, such as, speech, videos, etc. and is effective only up to a single layer.
To allow backpropagation through the network, the selected activation function should be differentiable. This property is required to compute …