Neural properties of the 3D reference frame transformation for reaching as predicted and explained by an artificial network

Gunnar Blohm, Gerald P. Keith, J. Douglas Crawford

CESAME, Université catholique de Louvain, Louvain-la-Neuve, Belgium
CVR, York University, Toronto, Canada

Reaching requires a transformation of visual signals related to initial hand and target positions into motor commands to move the arm. This visuomotor transformation involves extraretinal eye and head position signals and is computationally complex, in particular because of the 3D rotational and translational geometry involved (Blohm & Crawford, J Vis 2007). It is currently under debate where in the brain this visuomotor transformation occurs and what signals we would expect to find in areas involved in this process. Despite several general theoretical descriptions, no one has ever trained a neural net to perform the 3D transformation and analyzed its properties to develop a theoretical framework for the actual physiology. Here we trained a physiologically realistic artificial neural network to perform the visuomotor transformation and demonstrate how 3D reference frame transformations can be performed in distributed processing.
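The geometry referred to above can be illustrated with a minimal sketch. It is not the model from this study. It assumes a coordinate convention (x forward, y left, z up), rigid rotations for eye-in-head and head-on-body orientation, and hypothetical translation offsets (`EYE_IN_HEAD`, `HEAD_TO_SHOULDER`) chosen only for illustration.

```python
import numpy as np

def rot_z(deg):
    """Rotation about the vertical (z) axis, e.g. a horizontal head turn."""
    a = np.deg2rad(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])

def rot_y(deg):
    """Rotation about the interaural (y) axis, e.g. a vertical head tilt."""
    a = np.deg2rad(deg)
    return np.array([[ np.cos(a), 0.0, np.sin(a)],
                     [ 0.0,       1.0, 0.0],
                     [-np.sin(a), 0.0, np.cos(a)]])

# Hypothetical offsets in cm (assumed values, not taken from the study).
EYE_IN_HEAD = np.array([8.0, 0.0, 5.0])
HEAD_TO_SHOULDER = np.array([0.0, 15.0, -20.0])

def gaze_to_shoulder(p_gaze, R_eye_in_head, R_head_on_body):
    """Map a gaze-centered 3D position into shoulder-centered coordinates."""
    p_head = R_eye_in_head @ p_gaze + EYE_IN_HEAD
    return R_head_on_body @ p_head + HEAD_TO_SHOULDER

def movement_vector(target_gaze, hand_gaze, R_eye, R_head):
    """Shoulder-centered vector from the initial hand position to the target."""
    return (gaze_to_shoulder(target_gaze, R_eye, R_head)
            - gaze_to_shoulder(hand_gaze, R_eye, R_head))

target = np.array([40.0, 10.0, 0.0])   # gaze-centered target position
hand = np.array([30.0, -10.0, -20.0])  # gaze-centered initial hand position

# Identical gaze-centered input, two different head orientations.
straight = movement_vector(target, hand, np.eye(3), np.eye(3))
head_turned = movement_vector(target, hand, np.eye(3), rot_z(30.0))

# 3D rotations do not commute, so the order of the two rotations matters.
order_matters = not np.allclose(rot_z(30.0) @ rot_y(20.0),
                                rot_y(20.0) @ rot_z(30.0))
```

In this sketch the translations cancel in the movement vector while the rotations do not, so the same retinal input calls for a different shoulder-centered movement once the head is turned.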

We used a fully connected, 4-layer feed-forward artificial neural network. The input layer consisted of 2D retinal position maps of the target and initial hand position, 2D maps of retinal disparity of the target and initial hand position, extraretinal 3D eye and 3D head position signals, and a 1D vergence input. These inputs all projected to the second (hidden) layer, which in turn projected to a population output (3rd) layer. We reconstructed the resulting movement vector in the behavioral read-out (4th) layer, which assumed cosine tuning in the population output and used an optimal linear estimator. The network was trained with resilient back-propagation on the theoretically computed exact relationship between a random gaze-centered input and its corresponding shoulder-centered movement vector output.
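The read-out stage can be sketched as follows. This is a minimal, noise-free illustration rather than the implementation used here: the unit count, baseline, gain, and movement range are assumed values, and an ordinary least-squares fit stands in for the optimal linear estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_unit_vectors(n):
    """Draw n random preferred directions on the unit sphere."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def cosine_responses(movements, preferred, baseline=0.5, gain=0.01):
    """Cosine tuning: activity varies with the projection of the movement
    vector onto each unit's preferred direction (assumed baseline and gain)."""
    return baseline + gain * (movements @ preferred.T)

def with_bias(responses):
    """Append a constant column so the decoder can absorb the baseline."""
    return np.hstack([responses, np.ones((len(responses), 1))])

def fit_linear_estimator(responses, movements):
    """Least-squares linear decoder from population activity to 3D movement."""
    weights, *_ = np.linalg.lstsq(with_bias(responses), movements, rcond=None)
    return weights

def decode(responses, weights):
    """Reconstruct movement vectors from population activity."""
    return with_bias(responses) @ weights

preferred = random_unit_vectors(50)                    # hypothetical 50 output units
train_moves = rng.uniform(-30.0, 30.0, size=(500, 3))  # movement vectors in cm
weights = fit_linear_estimator(cosine_responses(train_moves, preferred), train_moves)

test_moves = rng.uniform(-30.0, 30.0, size=(100, 3))
decoded = decode(cosine_responses(test_moves, preferred), weights)
max_error = np.abs(decoded - test_moves).max()
```

Because the simulated responses are noise-free and linear in the movement vector, the decoder recovers held-out movements up to numerical precision. With noisy units the same fit would trade off bias and variance.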

We analyzed the network properties using an electrophysiological approach. To investigate how the complete reference frame transformation could be computed at the single-unit level, we characterized each unit's input and output reference frames for the hidden (2nd) and population output (3rd) layers. The input reference frame was assessed by observing how visual receptive fields were modulated by extraretinal eye-head-vergence position. Hidden layer units were purely gaze-centered and showed gain modulation, whereas the population output layer had mixed reference frames (between gaze- and shoulder-centered). We used two techniques to find the output reference frame for each unit, namely motor fields and simulated micro-stimulation, each combined with eye-head-vergence modulation. The motor fields showed mixed reference frames for both the hidden and population output layers. However, simulated micro-stimulation showed mixed reference frames in the hidden layer and purely shoulder-centered coordinates in the population output layer.
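The logic of the receptive-field test can be sketched in one dimension. This is a simplified illustration, not the analysis code used here: the two toy units, their parameters, and the `rf_shift_ratio` index are assumptions introduced for the example. A gaze-centered unit keeps its receptive-field peak at the same retinal location while eye position scales its response (gain modulation). A body-centered unit shifts its peak on the retina by the amount the eye moves.

```python
import numpy as np

def gaze_centered_unit(retinal_pos, eye_pos, pref=5.0, width=10.0, gain_slope=0.02):
    """Retinal receptive field whose amplitude is gain-modulated by eye position."""
    rf = np.exp(-0.5 * ((retinal_pos - pref) / width) ** 2)
    return rf * (1.0 + gain_slope * eye_pos)

def body_centered_unit(retinal_pos, eye_pos, pref=5.0, width=10.0):
    """Receptive field fixed in body coordinates (retinal + eye position)."""
    return np.exp(-0.5 * ((retinal_pos + eye_pos - pref) / width) ** 2)

def rf_shift_ratio(unit, eye_positions, retinal_grid):
    """Retinal shift of the receptive-field peak per degree of eye rotation.
    About 0 indicates a gaze-centered field, about 1 a body-centered field."""
    peaks = [retinal_grid[np.argmax(unit(retinal_grid, e))] for e in eye_positions]
    slope = np.polyfit(eye_positions, peaks, 1)[0]
    return -slope

eye_positions = np.array([-20.0, -10.0, 0.0, 10.0, 20.0])  # deg
retinal_grid = np.linspace(-60.0, 60.0, 241)                # deg, 0.5 deg steps

ratio_gaze = rf_shift_ratio(gaze_centered_unit, eye_positions, retinal_grid)
ratio_body = rf_shift_ratio(body_centered_unit, eye_positions, retinal_grid)

# Peak amplitude of the gaze-centered unit at each eye position (gain effect).
peak_gain = [gaze_centered_unit(retinal_grid, e).max() for e in eye_positions]
```

Intermediate values of such an index would correspond to the mixed reference frames described above.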

To summarize, the complete reference frame transformation from gaze- to shoulder-centered coordinates was performed gradually at the single-unit level. Partial transformations in single units were weighted by extraretinal eye-head-vergence signals in a gain-like fashion and combined at the population level. We also demonstrate how relative distance is gradually transformed into absolute distance. Importantly, different reference frames can be found in the same area if assessed by different techniques. This work reconciles many apparently contradictory findings in which different reference frames were reported for the same area when it was investigated using different electrophysiological techniques. The model also makes new predictions about which signals we should find in areas involved in the 3D reference frame transformation for reaching, and about how such an area could be identified based on its neural properties.
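The idea that gain-modulated, gaze-centered units can be combined at the population level to yield a different reference frame can be sketched in one dimension. This is an illustration under strong assumptions, not the trained network: the hidden units are hand-built products of Gaussian retinal fields and sigmoidal eye-position gains, all parameters are hypothetical, and a least-squares linear read-out replaces resilient back-propagation.

```python
import numpy as np

def retinal_fields(retinal_pos, rf_centers, rf_width=8.0):
    """Gaussian receptive fields in gaze-centered (retinal) coordinates."""
    return np.exp(-0.5 * ((retinal_pos[:, None] - rf_centers[None, :]) / rf_width) ** 2)

def eye_gains(eye_pos, eye_centers, eye_slope=8.0):
    """Sigmoidal gain factors driven by eye position."""
    return 1.0 / (1.0 + np.exp(-(eye_pos[:, None] - eye_centers[None, :]) / eye_slope))

def hidden_layer(retinal_pos, eye_pos, rf_centers, eye_centers):
    """Gain-modulated units: every retinal field multiplied by every gain."""
    rf = retinal_fields(retinal_pos, rf_centers)
    gain = eye_gains(eye_pos, eye_centers)
    return (rf[:, :, None] * gain[:, None, :]).reshape(len(retinal_pos), -1)

rf_centers = np.linspace(-40.0, 40.0, 17)   # deg
eye_centers = np.linspace(-40.0, 40.0, 17)  # deg

# Training set: a grid of retinal target positions and eye positions.
grid = np.linspace(-20.0, 20.0, 41)
ret_train, eye_train = [a.ravel() for a in np.meshgrid(grid, grid)]
body_train = ret_train + eye_train          # 1D body-centered target position

H_train = hidden_layer(ret_train, eye_train, rf_centers, eye_centers)
# rcond truncates tiny singular values to keep the read-out weights moderate.
readout, *_ = np.linalg.lstsq(H_train, body_train, rcond=1e-8)

# Control: the same retinal fields without any eye-position gain.
R_train = retinal_fields(ret_train, rf_centers)
readout_no_gain, *_ = np.linalg.lstsq(R_train, body_train, rcond=1e-8)

rng = np.random.default_rng(1)
ret_test = rng.uniform(-20.0, 20.0, 500)
eye_test = rng.uniform(-20.0, 20.0, 500)
body_test = ret_test + eye_test

pred = hidden_layer(ret_test, eye_test, rf_centers, eye_centers) @ readout
pred_no_gain = retinal_fields(ret_test, rf_centers) @ readout_no_gain

rms_error = np.sqrt(np.mean((pred - body_test) ** 2))
rms_error_no_gain = np.sqrt(np.mean((pred_no_gain - body_test) ** 2))
```

In this toy setting every hidden unit remains gaze-centered, yet a weighted sum over the gain-modulated population approximates the body-centered position. The control without gain modulation cannot account for eye position and leaves a large residual error.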

Supported by Marie Curie (EU) and CIHR (Canada)