There are no “grandma’s cells”
We often try to imagine the brain as a collection of neurons configured to detect certain images. Colloquially, this concept is referred to as the "grandmother cell": when the image of a grandmother appears, the associated neuron starts to respond. This idea is strongly criticized these days.
There are several ideas of neural coding, i.e., how information could be represented on a neural level. We assume that the basis is not the activity of a single neuron, but the code formed by a group of neurons (so-called population coding). It has been experimentally shown that the same set of neurons can be activated in different scenarios, so just by looking at the activity of a single neuron it is impossible to tell what is really going on.
According to the concept of the "grandmother cell," 100 neurons using rate or sparse coding can encode 100 distinct "grandmas." With population coding, on the other hand, 100 neurons may encode an almost unimaginable number of concepts: 2¹⁰⁰ ≈ 1.3 × 10³⁰, far more than the number of stars in the observable universe.
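The capacity gap between the two coding schemes can be checked with a line of arithmetic. This is purely illustrative (the neuron count comes from the text; nothing here models biology):

```python
# Illustrative arithmetic for the capacity claim above (not a biological model).

n_neurons = 100

# One-concept-per-neuron ("grandmother cell") coding: capacity grows linearly.
grandmother_capacity = n_neurons  # 100 distinct concepts

# Population coding: every distinct subset of active neurons is a code,
# so capacity grows exponentially with the number of neurons.
population_capacity = 2 ** n_neurons

print(population_capacity)  # 1267650600228229401496703205376 (~1.3e30)
```

Even a modest group of neurons thus spans a code space that no one-neuron-per-concept scheme could approach.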
Brain codes are complex and have a semantic nature
It seems that the coding of information occurs not only due to the creation of an immediate picture of neuronal activity but also due to the time sequence of such pictures combined into a single information packet. This allows the brain to create elaborate descriptions. In computer science, the need to construct universal and powerful descriptions has led to the creation of formats like XML, JSON, and similar. We believe that it is possible to draw rather strong analogies between these formats and the coding methods used by the real brain.
Speaking about semantic nature of codes, we suppose that in the same way as we use words in our speech, the brain uses a rich system of codes, which could be treated as words of the internal language.
Neurons do not “scream,” they “whisper”
We suppose that neuron spikes (with amplitudes of around 100 mV) are only the tip of the information iceberg. Most of the work takes place at much smaller amplitudes of several millivolts. It looks like the main brain activity is represented by tiny postsynaptic potentials. We think that these potentials are not random, as one can often hear, but are the very essence of the brain's work.
At the same time, we show that tiny postsynaptic potentials and the currents they induce not only reflect the work of neurons but may also be used to broadcast and propagate information across the cortex.
Brain codes may propagate like waves through the cortex
We have implemented a model showing that codes and the information they carry can be transmitted through the cortex in the form of a propagating pattern of neural activity. To imagine this, consider a small volume of the cortex, about 100 micrometers along its surface, where a particular pattern of neural activity has appeared. This activity may induce other patterns of neural activity in the neighboring regions of the cortex. These new patterns, in their turn, may induce yet other patterns, and so on. Eventually, patterns of neural activity propagate in all directions, forming a specific pattern distributed over the entire surface of the cortex.
Elements of such a pattern could be represented, for example, by the activity of dendritic segments (small sections or branches of a neuron's dendritic tree) where the corresponding currents are induced.
A notable characteristic of such a pattern is its deterministic nature. That is, an information wave that starts propagating at one point of the cortex will eventually reach any other point. The patterns formed at those points, as well as anywhere along the way, would look different and seemingly random, but in fact all of them would be strictly correlated with the information they carry. We believe that two parts of the cortex can communicate even if each part uses its own "encoding," as long as that encoding stays consistent.
Each region of the cortex can act both as a receiver and as a transmitter of information. Information is broadcast over the surface of a particular cortex area, meaning that it becomes available everywhere on that surface.
Aside from the information broadcast, which is supposedly local to a cortex area, there is a way to send information to distant regions of the brain using the system of projections. Projections between distant areas of the cerebral cortex consist of thin bundles of nerve fibers that travel from one small region of the cortex (about 300 μm across) to another. This allows us to assume that those thin projection bundles transmit brain codes rather than raw signals of individual neurons. After the corresponding cortex area receives the signal, it is broadcast locally as a propagating pattern of dendritic activity.
One can draw an analogy with transmitting the data over the Internet. Over the long distances, the data is transmitted over fiber optic cables. However, when the data arrives at a certain point, it is then distributed to the customers via wireless technologies like LTE or Wi-Fi.
Memory is the result of the interference of information waves
We have shown that if two patterns of activity are successively induced in a small volume of the cortex, then this event can be firmly memorized. Here we speak not only about the activity of whole neurons but also about the pattern of activity of dendritic segments. The first pattern can be considered a key: it indicates which dendritic segments should participate in the memorization process. The second pattern represents the actual information to be memorized: it indicates which combination of neural activity should be captured by the segments selected by the first pattern.
To retrieve the information, it is necessary to invoke the appropriate pattern (code) in the local cortex area, which will, in turn, induce the propagation of the information wave.
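The key/information pairing described above can be sketched as a toy associative memory. Patterns are modeled as plain sets of active elements; the class name, the overlap threshold, and the partial-key retrieval rule are all illustrative assumptions, not claims about the biological mechanism:

```python
# A minimal sketch of the two-pattern memory scheme: a "key" pattern selects
# where to store, an "information" pattern is imprinted, and re-presenting a
# sufficiently similar key retrieves the imprint. All names are illustrative.

class InterferenceMemory:
    def __init__(self):
        self._traces = []  # list of (key_pattern, info_pattern) imprints

    def memorize(self, key, info):
        # The key indicates which "segments" participate; here we simply
        # imprint the (key, info) pair.
        self._traces.append((frozenset(key), frozenset(info)))

    def recall(self, key, threshold=0.7):
        # Retrieval: a key overlapping a stored key strongly enough
        # re-activates the corresponding trace.
        key = set(key)
        best, best_score = None, 0.0
        for trace_key, info in self._traces:
            overlap = len(key & trace_key) / len(trace_key)
            if overlap >= threshold and overlap > best_score:
                best, best_score = info, overlap
        return best

mem = InterferenceMemory()
mem.memorize(key={1, 4, 7, 9}, info={"grandma", "smile"})
print(mem.recall({1, 4, 7}))  # a partial key still retrieves the trace
```

Note that retrieval tolerates an incomplete key, which loosely mirrors invoking a memory from a partial cue.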
Cortical minicolumns serve as a space for memorization
It is well known that neurons in the neocortex form regular structures. These structures are called cortical minicolumns (or cortical modules), because the neurons in them are mostly organized vertically, one on top of another. Inside such columns, neurons are interconnected much more strongly and densely than with neurons from neighboring columns.
Depending on the particular cortex area a minicolumn usually contains around 100 neurons.
The axons of a minicolumn's neurons are thickly intertwined and form a dense network of multiple random intersections. Because of that density, near any axon of a minicolumn there are always plenty of other axons. Axons branch out, forming collaterals that terminate in synapses. Typically, within one micrometer of any point in the minicolumn, there are about 30 synapses belonging to different neurons.
When some neurons in a minicolumn get activated, they massively release their neurotransmitters into the minicolumn space. Some neurotransmitters released from synapses “spill” into the surrounding volume (this is known as spillover effect).
Upon activation, each neuron releases its own unique “cocktail” of neurotransmitters and neuromodulators, sometimes augmented by astrocytes (see the concept of a “tripartite synapse”).
After neural activation, the places where axons of currently-active neurons intersect will be flooded by a very specific “cocktail” of neurotransmitters and neuromodulators from participating neurons. The ingredients of such “cocktail” will uniquely identify the combination of active neurons which created it. Different neural activities will result in different cocktails in different places.
For every signal and every dendritic segment, it is always possible to find a place with a strong interaction of active neurons, and hence with a rich neurotransmitter "cocktail." This allows defining memory as the ability of selected dendritic segments to retain an imprint of the "cocktail" associated with the corresponding signal.
The origins of memory are receptor clusters on the neuron’s dendritic surface
The surface of a neuron's dendritic membrane is populated with large quantities of receptors that float freely on it. The structure of receptors allows them to aggregate into clusters. A receptor cluster as a whole can act as a detector of a particular combination of signaling molecules and can respond to this combination later. This ability allows body cells to adjust to environmental changes. We suppose that neurons use the same mechanism of creating receptor clusters: a cluster tunes to the particular "cocktail" that was present at the moment of memorization. Should this cocktail appear again, the cluster will detect it and respond.
It is very likely that the diversity of neurotransmitters and their simultaneous usage is not a quirk of nature, but a deliberate design aimed at creating a rich variety of chemical information keys in a small volume. At the same time, not only the composition of receptors is essential for the “key,” but also the place where such “key” is located.
The hippocampus is not a temporary storage of memories, but a dispatcher that assigns identifiers
In order to form a memory, two patterns are needed. One pattern carries the information to store; the other carries the identifier (descriptor) of this information. We assume that the hippocampus itself does not store any memories. Instead, its responsibility is to generate unique identifiers of memories. These identifiers propagate to the corresponding areas of the cortex, where they interfere with the memorized information. This interference forms the memory, not in the hippocampus, but directly where needed, right in the cortex.
To reliably retain something in memory, the receptor clusters created during memorization should be stabilized (by processes of adhesion and polymerization). Not all created memories become long-term memories. The process of transition to a long-term state is called memory consolidation. We suppose that memory consolidation has nothing to do with memory being moved from or into the hippocampus; rather, it is a process of changing receptor conformation directly in the cortex.
For quick access to data, it is often convenient to tag it with a timestamp, place, and relevant keywords. No wonder the hippocampus contains structures responsible for such details. It seems that when forming an identifier, the hippocampus includes everything potentially useful for a later search.
Context and meaning are the essences of understanding
When presenting information, we use systems of concepts suitable for a particular purpose. Some systems (like maths and programming) are deliberately designed so that each term has exactly one possible meaning. Such systems allow us to formulate statements that can be unambiguously interpreted. However, it turns out that when it comes to quick and effective transfer of knowledge, such systems appear bulky and inconvenient. Moreover, they are not suitable for effective knowledge generalization.
In contrast, natural languages tend to use a more flexible system where each concept has many different meanings. In order to extract the appropriate meaning, the listener needs to know the context.
Contexts define certain entities in which words change their meanings jointly and consistently.
To understand the meaning of a particular phrase, it is necessary to take all possible contexts and check how the phrase looks in each of them. To do so, every word in the phrase should be substituted with the interpretation of that word in the specific context. Only in the right context will the phrase look plausible and non-contradictory.
The only way to check whether an interpretation obtained in a particular context is correct is to compare it with all previous experience and see if we have met a similar interpretation before.
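The substitute-then-check procedure above can be sketched in a few lines. The vocabulary, the two contexts, and the contents of "memory" are all invented for illustration; the point is only the mechanism of trying every context and keeping those whose interpretation matches prior experience:

```python
# Toy sketch of context-based interpretation: substitute each word's meaning
# per context, then keep the contexts whose interpretation is found in memory.
# Contexts, words, and memory contents are invented for illustration.

contexts = {
    "finance": {"bank": "credit institution", "interest": "loan fee"},
    "river":   {"bank": "riverside", "interest": "curiosity"},
}

# Memory stores interpretations (not raw phrases) encountered before.
memory = {("credit institution", "loan fee")}

def interpret(phrase, context):
    # Replace every word with its meaning in this context (unknown words pass through).
    return tuple(context.get(word, word) for word in phrase)

def find_meaning(phrase):
    # A context "makes sense" if its interpretation matches previous experience.
    return {name: interp
            for name, ctx in contexts.items()
            if (interp := interpret(phrase, ctx)) in memory}

print(find_meaning(("bank", "interest")))
# → {'finance': ('credit institution', 'loan fee')}
```

Only the "finance" context survives the check, so the phrase acquires its meaning together with the context that produced it.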
Our experience consists of memories of past events. These memories are not stored in the form of original descriptions, but instead as our interpretations of those descriptions.
It is crucial that all interpretations obtained in different contexts are compared against the same shared memory. This leads to a series of essentially important conclusions. For example, it means that we can recognize a phenomenon we have never encountered before if it is similar to something we have encountered in an entirely different context.
When one of the contexts provides an interpretation that is plausible and "makes sense," i.e., fits nicely with our experience, we may say that we have just found the actual meaning of that information. At the same time, we now know how to interpret that information later.
The ideas mentioned above generalize the concept of frames introduced by Marvin Minsky, the idea of convolutional neural networks, and the theory of formal concept analysis.
We have shown that by using the same system of concepts and contexts, we can describe and operate on any kind of information the brain usually deals with. It is equally applicable to visual, auditory, haptic, motor, and semantic information.
Cortical minicolumns are independent context processors
We believe that minicolumns, which comprise the cortex, are, in fact, independent computing modules.
Each minicolumn is associated with a certain context and learns by mapping input to its interpretation within that context. When combined, minicolumns form a context space that is capable of processing information in all possible contexts specific to that cortex area.
Cortex area as a whole works by assigning a different context to each minicolumn, evolving this context space by simultaneously feeding the same input to all contexts, and selecting contexts that found meaningful interpretations of that input.
To determine whether the interpretation is acceptable in its context, each minicolumn needs to compare its interpretation to something that is already present in memory.
Memory is not local or distributed. Memory is total
It may seem very surprising and counterintuitive, but we clearly state that each cortical minicolumn contains its own copy of the whole memory domain specific to that cortex area.
We estimate that each minicolumn is capable of storing around half a gigabyte of semantic information, assuming that such memory is based on the mechanism of interfering information waves. That is enough to store the fairly detailed memories accumulated over a human life.
Friedrich Nietzsche once said: "There are no facts, only interpretations." Indeed, incoming information stays meaningless until we choose its interpretation. This is done by querying the context space for possible interpretations of the input. The best interpretations are then broadcast across the whole cortex area and stored by all minicolumns in their local memory. That way, minicolumns constantly synchronize their knowledge.
Minicolumns can learn, generalize, and resist combinatorial explosion
When analyzing data encoded with long enough codes, we often face the "curse of dimensionality." For example, iterative learning algorithms based on gradient descent are very sensitive to the initial approximation and the order in which samples are fed. Unfortunately, even a huge number of training samples cannot guarantee that we would eventually find "good" solutions.
One of the most successful approaches to this problem was found in the field of machine learning. It is based on the idea of random subspaces. Suppose we have a 100-bit vector, using which we are able to encode 2¹⁰⁰ distinct concepts. Let’s assume that we need to find patterns of at most 15 bits in a set of such vectors.
To do that, let's create a significant number of subspaces of the incoming data. Each subspace will contain only a limited mask of input bits covering, say, 30 bits. The masks are generated randomly. Given a sufficiently large number of subspaces, it is very likely that for any pattern present in the incoming data we would find a subspace revealing this pattern, i.e., one that includes most of the bits present in the pattern.
This approach allows us to search for global patterns in local subspaces, which dramatically simplifies the task. It may require hundreds of thousands of subspaces, but even so, a multitude of simple tasks is still much better than one unsolvable problem.
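The random-subspace argument can be checked empirically. The sketch below uses the numbers from the text (100-bit vectors, 30-bit masks, a 15-bit pattern); the subspace count is scaled down to 10,000 to keep the demo fast, and the seed and scoring are illustrative choices:

```python
# Sketch of the random-subspace idea: generate many random 30-bit masks over a
# 100-bit space and check that some mask covers a hidden 15-bit pattern almost
# entirely. Dimensions follow the text; the rest is an illustrative choice.

import random

random.seed(0)

N_BITS, MASK_SIZE, N_SUBSPACES = 100, 30, 10_000
pattern = set(random.sample(range(N_BITS), 15))  # a hidden 15-bit pattern

# Each subspace sees the input only through its own random 30-bit mask.
masks = [frozenset(random.sample(range(N_BITS), MASK_SIZE))
         for _ in range(N_SUBSPACES)]

# Find the subspace that reveals the pattern best.
best = max(masks, key=lambda m: len(m & pattern))
print(len(best & pattern))  # how many of the 15 pattern bits the best mask covers
```

With 10,000 masks the best one typically covers the large majority of the 15 pattern bits, even though each mask sees only 30 of the 100 input bits; more masks push the coverage closer to complete.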
One cortical minicolumn contains about a million synapses. Within the spillover radius of each synapse, there are around 30 neighboring synapses from other neurons. Taking that into account, each synapse can be treated as a random subspace that receives signals from 20 to 30 of the roughly 100 neurons of the cortical minicolumn.
In our research, we have formulated and modeled a new approach to machine learning and generalization based on similar principles. We have modeled the creation of receptor clusters as a response to a pair of input and output vectors. Each created cluster represents a hypothesis that some input bits and some output bits are correlated. By evolving such clusters over time, we can effectively filter out noise bits that are statistically proven to be "wrong," i.e., that do not fit the correlation hypothesis.
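The cluster-filtering idea can be shown with a toy experiment. A "cluster" starts as a hypothesis pairing the active input bits with the active output bits of one sample, and noise bits are then pruned because they fail to co-occur in later samples. The data, bit ranges, and sample counts are all invented for illustration:

```python
# Toy sketch of noise filtering by a correlation hypothesis: bits that do not
# consistently co-occur with the output are pruned from the cluster over time.
# The "true" correlated bits and all sizes are invented for illustration.

import random

random.seed(1)

TRUE_IN, TRUE_OUT = {2, 5, 9}, {1, 7}  # the genuinely correlated bits

def sample():
    # Each sample shows the true pattern plus random noise bits.
    noise_in = set(random.sample(range(20), 4))
    noise_out = set(random.sample(range(20), 3))
    return TRUE_IN | noise_in, TRUE_OUT | noise_out

# Create a cluster from the first sample, then keep only the bits that
# keep appearing whenever input and output co-occur.
cluster_in, cluster_out = sample()
for _ in range(50):
    x, y = sample()
    cluster_in &= x
    cluster_out &= y

print(cluster_in, cluster_out)  # the noise bits are statistically filtered out
```

After a few dozen samples only the consistently co-occurring bits survive; a noise bit would have to be drawn in every single sample to remain, which is vanishingly unlikely.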
Contexts are generalizations
We define contexts as sets of interpretation rules. When an interpretation observed in a particular context appears to be similar to the one stored in memory, we say that it is now possible to interpret the incoming information in this context.
The interpretation rules evolve as a result of observing how the input description gets transformed in the presence of a certain phenomenon, which is, in fact, the context.
Initially, a context is formed based on a set of notable features. Over time, interpretation patterns get generalized by observing different phenomena that share the same sets of features. Eventually, the context learns to recognize a phenomenon not by its features, but by recognizing the "essence of the phenomenon."
For example, initially we perceive tables by their features, as "something that has a tabletop and legs," and we create the context "table." Eventually, we discover that all tables can be characterized as "a surface that allows one to conveniently manipulate objects." From then on, we perceive more things as tables: a stump in the forest convenient to put a basket on, a medical operating table, or a computer desktop. Most importantly, from then on we can recognize a table not by its features (which may be completely absent), but by the behavior of other objects in its presence. If, during interpretation, they behave as if there were a table, then in the context of "table" we obtain the right interpretation and can therefore declare that we have indeed just recognized a table.
Virtually any concept could behave as a context. The context space represents the space of generalizations available to us. Moreover, when an interpretation in a particular context turns out to be correct, this gives us not only the interpretation itself but also the context as a successfully recognized generalized characteristic.
To select an optimal description is to think
When input data gets projected onto a cortex area, some of the minicolumns may produce plausible interpretations according to their contexts. If several minicolumns are activated, the input information can be seen from several different perspectives. In that case, the output would comprise several interpretations coupled with their contexts. However, in some scenarios there could be so many possible interpretations that such an output would be overloaded with insignificant details.
We all know that brevity is the soul of wit. When talking to other people, we value the ability to communicate by expressing concepts and ideas in brief and effective form, by leaving only the most important and omitting redundant or insignificant details.
We believe that this is not just a feature of human speech, but a fundamental property of thinking. In other words, when working with information, each cortex area yields not all possible contexts and interpretations, but only those that are important in the given situation.
Thinking and intelligent behavior are the results of reinforcement learning
Reinforcement learning is a set of algorithms that allow an agent to adapt to a dynamic environment instead of following a predefined set of rules. The idea is that the agent tries to maximize its expected reinforcement by matching the current situation against previous experience and taking into account the positive or negative reinforcement obtained earlier in similar situations.
We state that both thinking and behavior are formed as a result of trial-and-error learning.
Emotional estimations measure the quality of a situation
The ability to estimate the quality of a situation is crucial to reinforcement learning. Importantly, reinforcement can be delayed long after an action is taken or a thought is formed. So instead of maximizing a particular reinforcement, it is reasonable to maximize the probability of receiving positive reinforcement in the future. The estimate of this probability is called the estimate of the quality of a situation. By taking actions that maximize this estimate, we maximize the probability of obtaining the maximum positive reinforcement.
Notably, the quality estimate itself can act as a reinforcement. That is, we learn to maximize not only our chances to succeed in a particular situation but also our chances to end up in a potentially beneficial situation.
All our emotions, i.e., feeling “good” or “bad,” are, in fact, the measures of the quality of the situation. “Bad” emotions always represent the fear that something unpleasant will happen. “Good” means that we anticipate something positive.
Context space is key to best prediction
As stated before, the classic approach to reinforcement learning is to act according to previous experience and the reinforcement obtained in similar situations. However, the real challenge is not to act similarly, but to infer that situations are, in fact, similar. The slightest detail may turn the meaning of a situation upside down. Unfortunately, even significant similarity between the feature sets describing two situations is still not enough to treat them as similar. Moreover, two situations that must be treated as equal may have completely different descriptions.
It looks like the only proper way to conduct reinforcement learning is to use a space of contexts. First, it allows us to recognize phenomena and situations not by a set of features, but by the very essence of things.
Second, the context space allows us to compare the current situation to those stored in memory and to recognize identical situations even if they occurred in different contexts and are seemingly unrelated. Betrayal is always a betrayal, regardless of the form in which it is expressed.
Each minicolumn that was able to find a plausible interpretation may also provide its prediction of the quality of the situation, based on its previous experience. To find the optimal output of a cortex area as a whole means to select those contexts and interpretations that maximize the predicted quality of the situation.
For example, in the case of the visual cortex, from the list of all available interpretations we usually want to select the most important ones, i.e., to describe the most important objects being seen. This process is usually referred to as the focus of attention. Such a reduced description should contain no more than 5 to 7 elements. This is known as the volume of attention.
Choosing the optimal action
A minicolumn can model how a situation will be changed upon performing a particular action and predict the estimation of the quality of the obtained situation.
Minicolumns can learn to transform information. For example, they can learn to translate one description into another. This change of description corresponds to how the information picture is altered in the presence of the key concept of a certain context.
For concepts that denote actions, the changed description shows what will happen when the corresponding action is taken. In general, this is an extremely challenging task, but it simplifies greatly when the descriptions contain only the central meaning of the information and are filtered from insignificant details.
We suppose that this approach helps the brain implement a reinforcement learning model similar to the concept of an adaptive V-critic. This requires two essential parts: a Model and a V-critic. The Model is a speculative model of the real world that simulates the consequences of actions. The V-critic estimates the quality of the situation obtained after an action has been simulated. By iterating over all available actions, the brain can select the one with the best predicted quality of the situation.
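The Model/V-critic loop described above fits in a few lines. The state space, the additive "world model," and the value table below are all invented for illustration; only the simulate-then-score-then-pick structure corresponds to the text:

```python
# Toy sketch of the Model/V-critic scheme: simulate each available action with
# a world model, score the resulting situation with a learned value estimate,
# and pick the action with the best prediction. All values are illustrative.

def model(state, action):
    # Speculative world model: predicts the situation after the action.
    return state + action

# V-critic: a learned estimate of the quality of each situation.
v_estimates = {0: 0.1, 1: 0.4, 2: 0.9, 3: 0.3}

def choose_action(state, actions):
    # Iterate over all actions and select the one whose simulated
    # outcome has the best predicted quality.
    return max(actions, key=lambda a: v_estimates.get(model(state, a), 0.0))

print(choose_action(state=0, actions=[1, 2, 3]))  # → 2
```

In a full adaptive-critic setup both parts would be learned from experience; here they are fixed tables just to make the selection loop concrete.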
The context space, where every context is an action, makes it possible to model and compactly describe the outcome of every possible action applied to the current situation.
The memory of each context stores the experience of estimating past situations. This allows each context to make its prediction of future quality of the situation. As a result, it is possible to select an action that predicts the best quality of the situation. Also, this approach may be used to implement step-by-step prediction.
When predicting the quality of a situation, we can obtain not only the quality estimate itself but also an estimate of how reliable that prediction is. Similarly, confidence in the analogies used can be estimated when determining the meaning of a situation. Such estimates can be used in further processing.
For example, exploratory behavior can be induced by intentionally selecting contexts that yield higher quality but lower confidence estimation.
Cortical minicolumns are organized in space optimally
In some sense, the process of active thinking or action planning may require selecting non-correlating contexts. For example, we can move a hand and a leg simultaneously, but we cannot simultaneously make several useful but mutually exclusive actions with the same hand: we need to choose only one.
We suppose that the context space is organized in such a way that similar contexts are mapped to adjacent minicolumns. This allows selecting active non-conflicting contexts simply by finding local maxima on the surface of the cortex area.
Aside from that, such a spatial organization allows generating a higher-level generalized code of a concept by mapping groups of active contexts. Importantly, similar concepts would have similar codes.