From the archive, originally posted by: [ spectre ]

BERKELEY– Scientists at the University of California, Berkeley, have
recorded signals from deep in the brain of a cat to capture movies of
how it views the world around it.

The images they reconstructed from the recordings were fuzzy but
recognizable versions of the scenes that played out before the cat’s
eyes.

The team recorded signals from a total of 177 cells in the lateral
geniculate nucleus – a part of the brain’s thalamus that processes
visual signals from the eye – as they played a digitized movie of
indoor and outdoor scenes for the cat. Using simple mathematical
filters, the researchers decoded the signals to generate a movie of
what the cat actually saw. The reconstructed movie turned out to be
amazingly faithful to the original.

“This work demonstrates that we have a reasonable understanding of how
visual information is encoded in the thalamus,” said Yang Dan,
assistant professor of neurobiology at UC Berkeley.

Theoretically, if someone could record from many more cells – the
lateral geniculate nucleus contains several hundred thousand nerve
cells in all – it should be possible to reconstruct exactly what the
animal sees, she said.

The results were reported in the Sept. 15, 1999, issue of the “Journal
of Neuroscience” by Dan; former postdoctoral fellow Garrett B. Stanley,
now an assistant professor at Harvard University; and Princeton
University undergraduate Fei Fei Li, who will be a graduate student
next year at the California Institute of Technology.

Dan sees the demonstration not only as confirmation of the current
understanding of how thalamic cells process signals from the retina,
but also as a step toward a larger goal of understanding how the entire
brain works. Such understanding is critical to discovering the causes
of brain diseases and mental illness.

“Fundamental understanding of brain processes is crucial to
understanding illness and eventually could help us come up with
treatments,” she said.

The current understanding of how cells in this part of the brain
respond to visual stimuli has been pieced together over decades by many
researchers working with animals. The results show that this approach

“Our goal is to understand how information is processed in the brain,
how it is encoded,” Dan said. “By working backward, using the firing of
nerve cells to reconstruct the original scene, we can see where we have
been successful and where we haven’t.

“We aren’t the first to use this decoding technique, but instead of
decoding the signals one at a time, we did it simultaneously to get a
movie image of what the cat saw.”

The lateral geniculate nucleus is only the first stop for visual
signals on their way to the cortex. Higher areas of the brain, in the
cortex, do much more processing of signals. Much work is still
necessary to understand the details of such processing, Dan said. This
is the main subject of study in her lab.

Dan and her colleagues digitized eight short (16-second)
black-and-white movies of scenes ranging from a forest and tree trunks
to a face. They then played these in front of an anesthetized cat while
recording from cells in the lateral geniculate nucleus. Cats were
chosen because they have excellent vision. Although cats do have some
color vision, albeit primitive, the group used low-resolution (64 by 64
pixels) black-and-white images to simplify the experiment.

Since the researchers could record from a maximum of eight to 10 cells
at once, they replayed the video numerous times to record responses
from a total of 177 cells.

The specific cells from which they recorded are called X cells, which
respond to slower motion than other cells in the lateral geniculate
nucleus. In all there are some 120,000 X cells representing each eye in
this region of the thalamus.

Based on previous experiments by other researchers, Dan knew that each
point in the cat’s visual field should generate a spiking signal in
20-30 cells clustered together in the lateral geniculate nucleus. So,
she pooled the responses of between seven and 20 on and off cells to
reconstruct what the cat saw at each point in its field of view.

The reconstructions of the scenes were fuzzy and low in contrast, but

“We have provided a first demonstration that spatiotemporal natural
scenes can be reconstructed from the ensemble responses of visual
neurons,” the researchers concluded in their journal article.

The research was supported by the National Institutes of Health and an
Alfred P. Sloan Research Fellowship, a Beckman Young Investigator Award
and a Hellman Faculty Award.

Yang Dan
Professor of Neurobiology

E-mail: ydan [at] berkeley [dot] edu
Phone: (510) 643-2833
Lab Phone: (510) 643-3935


A major challenge in studying sensory processing is to understand the
meaning of the neural messages encoded in the spiking activity of
neurons. From the recorded responses in a sensory circuit, what
information can we extract about the outside world? Here we used a
linear decoding technique to reconstruct spatiotemporal visual inputs
from ensemble responses in the lateral geniculate nucleus (LGN) of the
cat. From the activity of 177 cells, we have reconstructed natural
scenes with recognizable moving objects. The quality of reconstruction
depends on the number of cells. For each point in space, the quality of
reconstruction begins to saturate at six to eight pairs of on and off
cells, approaching the estimated coverage factor in the LGN of the cat.
Thus, complex visual inputs can be reconstructed with a simple decoding
algorithm, and these analyses provide a basis for understanding
ensemble coding in the early visual pathway.
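As a rough illustration of the linear decoding idea summarized above, the following Python sketch fits a reconstruction filter by ordinary least squares, which gives the minimum-mean-square-error estimate among linear decoders, mapping time-lagged firing rates of several cells onto pixel intensities. Everything here is an assumption for illustration, not the authors' code or data: the smoothed-noise stimulus, the rectified-Poisson on/off cell model, the eight-bin filter length, and the helper name `lagged_design` are hypothetical. Because all cells enter a single regression, the fitted weights account for correlations between cells and not only for each cell's own response properties.

```python
import numpy as np

def lagged_design(rates, n_lags):
    """Design matrix for a linear reconstruction filter (hypothetical helper).

    rates: (T, n_cells) binned firing rates. Row t holds every cell's rate at
    t, t+1, ..., t+n_lags-1 (responses follow the stimulus), plus a constant 1.
    """
    T, n_cells = rates.shape
    X = np.zeros((T, n_cells * n_lags + 1))
    for lag in range(n_lags):
        X[: T - lag, lag * n_cells:(lag + 1) * n_cells] = rates[lag:]
    X[:, -1] = 1.0  # intercept absorbs the mean luminance
    return X

rng = np.random.default_rng(0)
T, n_pix, n_lags = 4000, 4, 8

# Hypothetical temporally correlated stimulus: white noise smoothed over 5 bins.
raw = rng.standard_normal((T + 4, n_pix))
stim = np.stack(
    [np.convolve(raw[:, p], np.ones(5) / 5, mode="valid") for p in range(n_pix)],
    axis=1,
)

# Hypothetical encoder: one on and one off cell per pixel, each a causal
# temporal filter followed by rectification and Poisson spike counts.
temporal = np.array([0.0, 0.5, 1.0, 0.5])
cells = []
for p in range(n_pix):
    drive = np.convolve(stim[:, p], temporal)[:T]  # causal filtering
    for sign in (1.0, -1.0):  # +1: on cell, -1: off cell
        cells.append(rng.poisson(5.0 * np.maximum(0.0, sign * drive)))
rates = np.stack(cells, axis=1).astype(float)

# Fit filters jointly over all cells on training bins, then reconstruct
# held-out bins and score each pixel by its correlation coefficient.
X = lagged_design(rates, n_lags)
train, test = slice(0, 3000), slice(3000, T)
W, *_ = np.linalg.lstsq(X[train], stim[train], rcond=None)
recon = X[test] @ W
quality = [np.corrcoef(recon[:, p], stim[test, p])[0, 1] for p in range(n_pix)]
print(np.round(quality, 2))
```

Scoring by the correlation coefficient on held-out bins mirrors the quality measure used in the discussion; rerunning with fewer cells per pixel is one way to explore how quality depends on cell number.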


The current study is motivated by the following fundamental question:
when neurons fire action potentials, what do they tell the brain about
the visual world? As a first step to address this question, we
reconstructed spatiotemporal natural scenes from ensemble responses in
the LGN. Responses of visual neurons to natural images have been
studied in the past. Creutzfeldt and Nothdurft (1978) studied the
spatial patterns of the responses by moving natural images over the
receptive fields of single neurons and recording the responses at
corresponding positions. They generated “neural transforms” of static
natural scenes that revealed important features of the neurons in
coding visual signals. In contrast to this “forward” approach to
studying neural coding, we have taken a “reverse” approach, which is to
decode information from the neural responses. Given the known
properties of X cells in the LGN, significant information can in
principle be extracted from their responses with a linear technique.
Here we have presented the first direct demonstration that
spatiotemporal natural scenes can be reconstructed from experimentally
recorded spike trains. The results from the linear technique also
provide a benchmark for future decoding studies with nonlinear

In this study, we extracted information from the responses of a
population of neurons. The reconstruction filters not only reflect the
response properties of individual neurons, but also take into
consideration the correlation between neighboring cells. This is
crucial for decoding information from ensemble responses. Not
surprisingly, an important factor affecting the quality of
reconstruction is the density of cells (Fig. 4). In Figure 2, visual
signals within an area of 6.4° × 6.4° (1024 pixels) were reconstructed
from the responses of 177 cells, corresponding to an average tiling of
9 × 9 on/off pairs over a 32 × 32 array of pixels. As shown in Figure
4a, some areas of the scenes were covered with lower densities of
cells, resulting in lower correlation coefficients. A better coverage
of these areas could potentially improve the reconstruction. For
natural scenes, the quality of reconstruction begins to saturate at
12-16 cells, which appears to be related to the spatiotemporal
correlation in the inputs. A previous study using a similar technique
has shown that the saturation of the reconstruction quality occurs at
approximately three pairs of on/off retinal ganglion cells in the
salamander (Warland et al., 1997), which is significantly lower than
the number that we have observed. This discrepancy may be caused by the
difference in the visual inputs. In the earlier study the input was
full-field white noise with no spatial variation, whereas in our study
the natural scenes contain considerable spatial variation. The more
complex natural input ensemble presumably contains more information
that is carried by a larger number of cells. On the other hand,
compared to spatiotemporal white noise, natural scenes contain more
spatiotemporal correlation and therefore less information. This results
in the difference in the saturation for white noise and for natural
scenes (Fig. 4). Based on previous anatomical studies in the retina and
the LGN of the cat (Peichl and Wässle, 1979; Wässle and Boycott,
1991; Peters and Payne, 1993), we estimated that every point in visual
space is covered by the receptive fields of 20-30 geniculate X cells
(see Data Analyses). The density of cells required for optimal
reconstruction of natural scenes approaches this coverage factor,
supporting the notion that the early visual pathway is well adapted for
information processing in natural environments.
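The coverage figures in this paragraph can be cross-checked with simple arithmetic. The per-pixel angle and the square-root step below are derived from the stated numbers rather than quoted from the text; the 9 × 9 figure is an average, since 9 × 9 = 81 pairs would be 162 cells.

```latex
\begin{aligned}
\text{pixel size} &= \frac{6.4^\circ}{32} = 0.2^\circ, &\quad 32 \times 32 &= 1024 \text{ pixels},\\
\text{on/off pairs} &\approx \frac{177}{2} = 88.5, &\quad \sqrt{88.5} &\approx 9.4 \;\Rightarrow\; \text{about a } 9 \times 9 \text{ tiling},\\
\text{saturation} &= 6\text{--}8 \text{ pairs} = 12\text{--}16 \text{ cells}, &\quad \text{coverage} &= 20\text{--}30 \text{ X cells per point}.
\end{aligned}
```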

Several factors contribute to the error in the reconstruction,
including noise, nonlinearity, and nonstationarity in the neuronal
responses. Assuming linear encoding, we derived the theoretical error
of the reconstruction based on noise in the neuronal responses (Fig.
3b). The quality of the actual reconstruction reached this theoretical
limit between 3 and 16 Hz, suggesting that noise in the responses is
the major source of reconstruction error over this frequency range. We
would like to emphasize, however, that the definition of noise is
directly related to the assumed mechanism of encoding. Here, our
conclusion is based on the assumption of rate coding, under which noise
in the responses is defined as the difference between the firing rate
of each repeat and the average firing rate from multiple repeats of the
same movie. The disagreement between the actual and the theoretical
reconstruction errors below 3 Hz is likely to reflect deviation of the
neuronal responses from the linear model in this frequency range. To
further confirm this hypothesis, we predicted the neuronal responses
based on the linear model. The prediction was computed by convolving
the visual stimuli and the linear spatiotemporal receptive fields of
the cells, followed by a rectification (Dan et al., 1996). The
predicted and the actual firing rates were then Fourier transformed and
compared in the frequency domain. A significant difference was observed
only below 3 Hz (data not shown), supporting the notion that at these
low frequencies, the neuronal responses significantly deviate from the
linear model. Such deviation is not surprising because certain
nonlinearities, such as light adaptation and contrast gain control, are
known to occur at slower time scales (Shapley and Enroth-Cugell, 1985).
Also, some of the geniculate cells used in our study may be in the
bursting firing mode (Sherman and Koch, 1986; Mukherjee and Kaplan,
1995). Cells in the bursting mode are known to exhibit nonlinear
responses, which could contribute to the errors in the reconstruction.
In addition, there may be nonstationarities in the responses
unaccounted for by the model, such as those caused by slow drifts in the
general responsiveness of the visual circuit. Future studies
incorporating these mechanisms may further improve the reconstruction.
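Two computations described in this paragraph, the linear-model prediction (convolution of the stimulus with a spatiotemporal receptive field, followed by rectification) and the rate-coding definition of noise (each repeat's rate minus the mean over repeats), can be sketched in Python as below, together with one simple way to compare predicted and actual rates in the frequency domain. The function names, the 10-ms bin width, the amplitude-spectrum comparison, and the synthetic 0.5-Hz drift standing in for slow nonlinearities are all assumptions for illustration; the paper's exact frequency-domain comparison is not reproduced here.

```python
import numpy as np

def predict_rate(stimulus, rf):
    """Linear prediction followed by half-wave rectification.

    stimulus: (T, n_pixels) movie frames; rf: (n_lags, n_pixels), where rf[d]
    weights the frame shown d time bins earlier.
    """
    T = stimulus.shape[0]
    drive = np.zeros(T)
    for d in range(rf.shape[0]):
        drive[d:] += stimulus[: T - d] @ rf[d]  # causal spatiotemporal convolution
    return np.maximum(drive, 0.0)  # rectification: rates cannot be negative

def rate_code_noise(repeats):
    """Rate-coding noise: each repeat's rate minus the mean rate across repeats (axis 0)."""
    return repeats - repeats.mean(axis=0, keepdims=True)

def spectral_mismatch(predicted, actual, dt):
    """Frequencies and |difference| of the amplitude spectra of two rate traces."""
    freqs = np.fft.rfftfreq(len(actual), d=dt)
    pred_amp = np.abs(np.fft.rfft(predicted - predicted.mean()))
    act_amp = np.abs(np.fft.rfft(actual - actual.mean()))
    return freqs, np.abs(pred_amp - act_amp)

# Synthetic demo (all numbers hypothetical): a 16-s movie in 10-ms bins.
rng = np.random.default_rng(1)
dt, T, n_pix = 0.01, 1600, 16
stimulus = rng.standard_normal((T, n_pix))
rf = 0.1 * rng.standard_normal((5, n_pix))
predicted = predict_rate(stimulus, rf)

# "Actual" rates on 10 repeats: the linear prediction plus a slow 0.5-Hz drift
# (a stand-in for slow nonlinearity) plus independent noise on each repeat.
t = np.arange(T) * dt
repeats = predicted + 2.0 * np.sin(2 * np.pi * 0.5 * t) + 0.1 * rng.standard_normal((10, T))
noise = rate_code_noise(repeats)  # per-repeat noise under the rate-coding definition
freqs, mismatch = spectral_mismatch(predicted, repeats.mean(axis=0), dt)
print(f"largest mismatch at {freqs[np.argmax(mismatch)]:.2f} Hz")  # → largest mismatch at 0.50 Hz
```

In this toy setup the mismatch concentrates at the drift frequency, which is the kind of low-frequency deviation from the linear model the paragraph describes below 3 Hz.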

Our current decoding method assumes that all information is coded in
the firing rates of neurons. It is optimal only in the sense that it
minimizes the mean-square error under the linear constraint. Although
this technique proved to be effective in the present study, more
complex, nonlinear decoding techniques (de Ruyter van Steveninck and
Bialek, 1988; Churchland and Sejnowski, 1992; Abbott, 1994; Warland et
al., 1997; Zhang et al., 1998) may further improve the reconstruction
from ensemble thalamic responses. Furthermore, recent studies have
shown that neighboring geniculate cells exhibit precisely correlated
spiking (Alonso et al., 1996). With white-noise stimuli, up to 20% more
information can be extracted if correlated spikes are considered
separately (Dan et al., 1998). Such correlated spiking may also
contribute to coding of natural scenes. Ultimately, the success of any
reconstruction algorithm is related to the underlying model of the
neural code. The decoding approach therefore provides a critical
measure of our understanding of sensory processing in the nervous
system.
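One way a rate-based linear decoder could "consider correlated spikes separately" is to split each cell's spike train into spikes that have a near-coincident partner in a neighboring cell and spikes that do not, then bin the two groups as separate input channels. The Python sketch below shows that preprocessing step only; it is not the procedure of Dan et al. (1998), and the function names, the 1-ms synchrony window, and the toy spike times are hypothetical.

```python
import numpy as np

def split_synchronous(spikes_a, spikes_b, window):
    """Split cell A's spike times into those with a cell-B spike within
    +/- window (synchronous) and the remainder (asynchronous)."""
    spikes_a = np.asarray(spikes_a, dtype=float)
    spikes_b = np.sort(np.asarray(spikes_b, dtype=float))
    if spikes_b.size == 0:
        return spikes_a[:0], spikes_a
    idx = np.searchsorted(spikes_b, spikes_a)
    before = spikes_b[np.clip(idx - 1, 0, spikes_b.size - 1)]  # nearest B spike before
    after = spikes_b[np.clip(idx, 0, spikes_b.size - 1)]  # nearest B spike at or after
    nearest = np.minimum(np.abs(spikes_a - before), np.abs(after - spikes_a))
    sync = nearest <= window
    return spikes_a[sync], spikes_a[~sync]

def bin_counts(spike_times, duration, dt):
    """Spike counts in consecutive bins of width dt covering [0, duration]."""
    edges = np.arange(0.0, duration + dt / 2, dt)
    return np.histogram(spike_times, bins=edges)[0]

# Toy spike times in seconds (hypothetical); the 1-ms window is an assumption.
a = [0.010, 0.050, 0.100, 0.200]
b = [0.0105, 0.1008, 0.300]
sync, rest = split_synchronous(a, b, window=0.001)

# Two input channels for cell A, e.g. extra columns for a linear decoder.
channels = np.stack([bin_counts(sync, 0.3, 0.01), bin_counts(rest, 0.3, 0.01)], axis=1)
```

Here the spikes at 0.010 s and 0.100 s fall within 1 ms of a partner spike and land in the synchronous channel, while those at 0.050 s and 0.200 s go to the asynchronous channel.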