This article aims to give an overview of what a neural network is in the context of computing. As a computer obviously contains no neurons, the goal is to demystify the concept. Since many artificial intelligence (AI) and machine learning (ML) concepts are largely inspired by biology, we will first give a quick introduction to what a neuron is and how it works. Then we will define what perceptron neurons and sigmoid neurons are.
A neuron is a cell made of a body, an axon and dendrites. While neurons have the same organelles as all other cells (nucleus, mitochondria, endoplasmic reticulum…), these are distributed in a specific way among the three parts of the cell. Although neurons are similar to other cells, they are specialized in communication: their axon membrane is built for electrical signal propagation, and their dendrites are meant to receive signals from upstream neurons.
Figure 1: Two connected neurons.
Figure 1 shows how the first neuron on the left (the upstream, or "ascendant", neuron) sends a signal through its axon, via the synaptic terminals, to the second (downstream) one. The second neuron can then pass information on to the next, and so on. But you may wonder how complex information arises from such a simple interaction, and rightly so. Despite the simplicity shown here, each neuron has many (possibly thousands of) inputs, each with a different weight depending on the location of the synaptic terminal and the intensity of the signal. All of this is integrated in the cell body, and the resulting signal is sent along the axon toward several downstream neurons. And so on.
How does the body make electricity?
Yes, it is not obvious that body cells can produce electricity, right? Let's keep it simple. Our cells contain ions of different kinds; ions are charged molecules such as Na+ (sodium) and K+ (potassium). It is the difference in the concentrations of these ions that creates electricity, as in a battery (battery makers use Li+, lithium). Indeed, the membrane of the cell is made of lipids that do not allow ions to pass, so ion concentrations on the two sides of the membrane can differ, creating a difference in electric potential. This permeability can change thanks to channels which let ions go through the membrane and open under certain conditions (the input signal from an upstream neuron, for example). This has been studied by Hodgkin and Huxley: it is the conductance of the membrane that drives trans-membrane currents, and thus the electric current of the neuron.
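To make the battery analogy concrete, the equilibrium potential of a single ion species can be computed with the Nernst equation. A minimal sketch, using illustrative textbook concentrations rather than measurements from any particular cell:

```python
import math

# Nernst equation: equilibrium potential of one ion species,
# E = (R*T)/(z*F) * ln([ion]_out / [ion]_in).
R = 8.314      # gas constant, J/(mol*K)
F = 96485.0    # Faraday constant, C/mol
T = 310.0      # body temperature, K (~37 degrees C)

def nernst_potential_mV(z, conc_out, conc_in):
    """Equilibrium potential in millivolts for an ion of valence z."""
    return 1000.0 * (R * T) / (z * F) * math.log(conc_out / conc_in)

# K+ : ~5 mM outside, ~140 mM inside -> strongly negative potential (~ -89 mV)
print(nernst_potential_mV(1, 5.0, 140.0))
# Na+: ~145 mM outside, ~15 mM inside -> positive potential (~ +61 mV)
print(nernst_potential_mV(1, 145.0, 15.0))
```

Because the concentration gradients differ per ion, each ion species "pulls" the membrane toward a different voltage, and the channels that are open at a given moment decide which pull wins.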
What do these signals look like?
Yes, I said "signals", because having a single signal would have been too easy, right? Let's keep it simple.
At resting state, the membrane potential of a neuron sits at a given level. The intracellular compartment of a neuron is negatively charged relative to the extracellular space, and this charge, as well as current flow, is produced by ions. From the perspective of charged ions, the lipid bilayer of the neuronal membrane acts as a capacitor, and transmembrane glycoprotein pores or channels act as resistors. When information arrives at the neuron, it changes the membrane potential because the ions we were talking about earlier move across the membrane. These movements are allowed by different kinds of proteins that let ions cross the membrane.
To study this, there is a kind of experiment called the voltage clamp. I will not go deep into it; if you wish to, here is a useful reference. Basically, the more activation signals there are, the more the membrane potential rises. The tricky thing in biology is that everything has a lifetime, so activation signals have to be strong enough, and they can only stack up over a short period of time. Imagine a queue that has to reach a certain length while shrinking as people reach the counter. When the queue gets long enough, a strike breaks out and everybody goes mad. Basically, the same thing happens in the neural cell. As long as the activation level is not high enough, currents are proportional to activation. As soon as a threshold is reached, a maximum activation level is hit in an instant; this is a point of no return, and the neuron fires.
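The queue analogy above can be sketched as a toy leaky integrate-and-fire model: input stacks up, leaks away over time, and triggers a spike once a threshold is crossed. All parameters here are illustrative, not fitted to any real neuron:

```python
# Toy leaky integrate-and-fire neuron: accumulated input decays each
# step, and crossing the threshold produces an all-or-nothing spike.
def simulate(inputs, threshold=1.0, leak=0.9):
    """Return the time steps at which the neuron fires."""
    v = 0.0
    spikes = []
    for t, x in enumerate(inputs):
        v = v * leak + x      # previous charge decays, new input stacks up
        if v >= threshold:    # point of no return: the neuron fires
            spikes.append(t)
            v = 0.0           # reset after the spike
    return spikes

# Sparse weak inputs leak away before reaching the threshold...
print(simulate([0.3, 0.0, 0.0, 0.0, 0.3, 0.0]))   # -> []
# ...while the same-sized inputs arriving close together stack up and fire.
print(simulate([0.4, 0.4, 0.4, 0.0, 0.0, 0.0]))   # -> [2]
```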
There are many ways of making ions go through the membrane; one of them is gated or active transport, which involves proteins acting as pumps. In voltage-dependent channels, the probability Po that the gate (the protein) is open depends on the membrane potential. Knowing the number of channels present, it is possible to estimate the voltage dependence of Po.
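The voltage dependence of Po is often summarized by a Boltzmann-style sigmoid curve. A minimal sketch, where the half-activation voltage `v_half` and slope factor `k` are assumed illustrative values, not measurements:

```python
import math

# Boltzmann sketch of a voltage-dependent open probability Po(V):
# the fraction of channels open rises smoothly from 0 to 1 with voltage.
def open_probability(v_mV, v_half=-40.0, k=5.0):
    """Po at membrane potential v_mV (v_half, k are illustrative)."""
    return 1.0 / (1.0 + math.exp(-(v_mV - v_half) / k))

for v in (-80, -40, 0):
    print(v, open_probability(v))  # nearly 0, exactly 0.5, nearly 1
```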
Figure 2: Membrane protein in closed (c) and open (d) state.
A transposition to the world of IT
Now that we've seen the biological part, let's try to understand what a neural network is from a computational point of view.
In the 1950s and 1960s, a scientist called Frank Rosenblatt developed a type of artificial neuron called the perceptron (not a proper neuron, but a computational model of one). His work was inspired by that of Warren McCulloch and Walter Pitts. A perceptron takes several binary (0 or 1) inputs x1, x2, …, xn and produces a single binary output.
But how does the neuron compute these inputs? Rosenblatt suggested that a weight be given to each input, reflecting its importance to the output. The neuron's output value (0 or 1) then depends on whether the weighted sum is greater than a given threshold, so we can drastically change the model by acting on the weights and the threshold. This model is obviously not perfect and does not reflect the overall decision-making process of a human being, but it is suitable for fairly simple decision-making.
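A perceptron is short enough to write down directly. A minimal sketch, with hand-picked weights and threshold (here chosen so the neuron computes a logical AND):

```python
# A minimal perceptron: binary inputs, per-input weights, a hard threshold.
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

# With both weights at 1.0 and a threshold of 1.5, the neuron only fires
# when both inputs are 1 -- a logical AND.
weights, threshold = [1.0, 1.0], 1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, perceptron(x, weights, threshold))  # 0, 0, 0, 1
```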
This perceptron can be thought of as part of a decision process in which the inputs are the different factors that can affect the decision, and the weights represent the strength of each factor. If the weighted sum of the factors is bigger than a threshold, you take an action; if not, you stay in the same state. Now imagine a more complex tree in which you can chain such decisions: this can lead to more complex actions and decision-making.
In this new model, we can think of each column of neurons as a layer. Each neuron of the first layer makes a simple decision depending on the weights it gives to the incoming inputs. It then sends a single output, but towards several neurons of the second layer. Depending on the second-layer neuron, the first-layer outputs will carry different weights, and so on and so forth until the end.
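Chaining such layers can be sketched as follows. The weights and thresholds are hand-picked illustrative values, arranged so that two perceptron layers compute an exclusive OR (XOR), something a single perceptron cannot do:

```python
# Chained perceptron layers: the outputs of layer 1 feed layer 2.
def perceptron(inputs, weights, threshold):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0

def layer(inputs, neurons):
    """neurons is a list of (weights, threshold) pairs, one per neuron."""
    return [perceptron(inputs, w, t) for w, t in neurons]

# Two binary inputs -> a hidden layer of two perceptrons -> one output neuron.
hidden = [([1.0, 1.0], 0.5),    # fires if at least one input is 1 (OR)
          ([1.0, 1.0], 1.5)]    # fires only if both inputs are 1 (AND)
output = [([1.0, -2.0], 0.5)]   # "OR and not AND" -> exclusive OR (XOR)

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, layer(layer(x, hidden), output))  # [0], [1], [1], [0]
```

The second-layer neuron weighs the two first-layer outputs very differently (1.0 versus -2.0), which is exactly the "different weights per downstream neuron" idea described above.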
Now let's apply this to a network of perceptrons. Assume we want the network to perform a simple classification task such as digit classification (one of the most common exercises in ML basics). Imagine that after giving weights and biases to our network, it outputs a 6 where we expect a 9. This is not very far off: each digit is a loop attached to a tail, but in one case the tail sits above the loop and in the other it hangs below it. Because we are so close, we decide to make some slight changes to our model and expect that this will do the trick. The issue is that changing one weight does not cause a slight change in the output: it can push another neuron past its threshold and drastically change the overall response.
The idea here is to get rid of the binary inputs and allow any intermediate value between 0 and 1. Instead of all-or-nothing inputs, we have several values, each a fraction of 1, each with its own weight, and the sigmoid neuron computes these inputs together with its bias.
As you can see in the figure above, the difference between the two neurons (the step function is the perceptron's) hardly matters for extreme values, where they tend to be the same. The difference lies around 0: the sigmoid is simply a smoothed perceptron. This is precisely what makes the big difference (who said that math should be hard?). Now, applying a small change to either the weights or the bias changes the output accordingly. Thus, we can feed a sigmoid neuron any real numbers between 0 and 1, and the neuron will not "fire" 0 or 1 but another real value in between.
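A sigmoid neuron differs from the perceptron only in replacing the hard threshold with the smooth function 1/(1 + e^(-z)). A minimal sketch with illustrative weights and bias, showing that a small weight change now yields a small output change:

```python
import math

# Sigmoid neuron: same weighted sum as a perceptron, but the hard
# threshold is replaced by the smooth sigmoid 1 / (1 + exp(-z)).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_neuron(inputs, weights, bias):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Nudging one weight from 1.0 to 1.1 shifts the output only slightly,
# instead of the all-or-nothing jump of the perceptron.
print(sigmoid_neuron([0.5, 0.8], [1.0, 1.0], -1.0))  # ~0.574
print(sigmoid_neuron([0.5, 0.8], [1.1, 1.0], -1.0))  # ~0.587
```

This smoothness is what makes gradual learning possible: you can tune weights in small steps and observe small, predictable changes in the output.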
Take home message
Neural networks are now far more complex than this, but the goal of this article is to show how biology can help IT. Indeed, Hodgkin and Huxley's paper was released in the 1950s, whereas perceptron neurons appeared in the 1960s and sigmoid ones only in the late 1980s. Why such a latency? When talking about neurons, why wasn't it obvious for developers to draw inspiration from biology?
This is changing: more and more projects now include transdisciplinary teams made of developers, neuroscientists, cell biologists and mathematicians. This is a good thing, as progress may come from both directions, and I have no doubt that someday digital neural networks will help us understand what is happening in our brains.