The representation and perception of visual motion: to
integrate or not to integrate
James H. Hedges
A dissertation submitted in partial fulﬁllment
of the requirements for the degree of
Doctor of Philosophy
Center for Neural Science
New York University
Eero P. Simoncelli
J. Anthony Movshon
UMI Number: 3380197
© James H. Hedges, All Rights Reserved, 2009

Acknowledgements

I would like to thank my advisors: Eero Simoncelli and Tony Movshon. I have long admired the incredible ingenuity and commitment that they bring to bear on answering difficult questions. I am grateful for having had the opportunity to work with them during my graduate studies.
I would like to thank my committee members: Nava Rubin, Mike Landy, John Rinzel and Lynne Kiorpes.
I would like to thank my examiners: Norma Graham and Bart Krekelberg.
I would like to thank my collaborators: Jenny Gartshteyn, Adam Kohn, Bill Newsome, Nicole Rust, Tim Saint, Mike Shadlen and Alan Stocker. Adam, Nicole, Tim, Mike and Bill contributed to the work described in chapter 1. Jenny collected psychophysical data related to the work described in chapter 1. Tim helped develop the motion energy model described in chapter 1. Alan helped develop the Bayesian transparency model described in chapter 2.
I would like to thank the Center for Neural Science (CNS) class of 2003: Tarimotimi Awipi, Mitch Day, Yasmine El-Shamayleh and Riju Srimal.
I would like to thank some present and former members of the Laboratory for Computational Vision: Jose Acosta, Eizaburo Doi, Rob Dotson, Chaitu Ekanadham, Rosa Figueras, Jeremy Freeman, Deep Ganguli, Jose Antonio Guerrero-Colon, David Hammond, Yan Karklin, Misha Katkov, Siwei Lyu, Josh McDermott, Jonathan Pillow, Umesh Rajashekar, Martin Raphan, Jon Schlens, Brett Vintch and Rob Young.
I would like to thank some present and former members of the Visual Neuroscience Laboratory and Pete Lennie’s lab: Neel Dhruv, Mike Gorman, Arnulf Graf, Mehrdad Jazayeri, Romesh Kumbhani, Matt Smith, Sach Sokol and Chris Tailby.
I would like to thank some of the CNS/NYU faculty: Paul Glimcher, Mike Hawken, David Heeger, Sam Feldman, Pete Lennie, Larry Maloney, Dan Sanes, Mal Semple, Bob Shapley and Dan Tranchina.
I would like to thank some other people who are (or were) in, or around, CNS:
Eric DeWitt, Anita Disney, Jeﬀ Erlich, Andy Henrie, Chris Henry, Trent Jerde, Siddhartha Joshi, Brian Lau, Gabriel Lazaro-Munoz, Shani Oﬀen, Hysell Oviedo, Lana Roisis, Robb Rutledge, Scott Schafer, Max Schiﬀ, Abraham Schneider, Pascal Wallisch and Dajun Xing.
I would like to thank some present and former members of the CNS staﬀ:
Ken Anderson, Amala Ankolekar, Joey Azevedo, Krista Davies, Paul Fan, Erick Howard, Vic Keenan, Stu Greenstein, Joanne Rodriguez, Hillary Webb and Amy Yochum.
I would like to thank my family: Chuck and Carole Hedges; and Aug and Carolyn Firth. I would also like to thank Jerry and Mary Lou Barnes. I would like to thank two other New York-based neuroscientists: Annegret Falkner and James Herman.
There are many others whom I will not name. Their contribution is, in any case, largely unquantifiable: girlfriends, friends, roommates, neighbors, musicians, painters, architects, designers, chefs, doormen, clerks, cab drivers, train conductors and janitors.
Abstract

I have addressed the physiological mechanisms for, and perceptual consequences of, integrating visual motion. Where possible, I have tried to determine the rules by which the visual system decides whether or not to integrate. My first set of experiments was motivated by the following observations. Humans and other primates can see motion at both small and large scales. Also, neurons in area MT, an area known to play a role in the perception of visual motion, have large receptive fields.
I conducted a series of electrophysiological experiments to determine whether MT neurons compute global motion, which was defined in terms of widely separated apparent motion. I used stimuli in which there could be opposing local and global motion. I found that MT neurons are unaffected by global motion, and that their responses are entirely determined by local motion. My control experiments suggest that they do not compute global motion even in the absence of local motion. My second set of experiments concerned how the visual system decides whether to integrate or segment motions. I presented drifting square-wave plaids and asked subjects to indicate whether they appeared to move coherently, as a single object, or transparently, as two objects moving in different directions. I found that a plaid’s component and pattern speed affected how it was perceived: plaids appeared transparent more often at faster pattern speeds and appeared coherent otherwise. I developed a Bayesian model that can explain these results. Key components of the model are based on the system’s preferences for slow and singular motion. My final set of questions was motivated by the idea that adaptation causes repulsion by reducing the gain of mechanisms that encode properties of a stimulus. In psychophysical experiments, I measured the pattern of biases in perceived direction that result from adapting to coherent and transparent drifting square-wave plaids. My results
3.1 Hypotheses for the perceptual consequences of adapting to a square-wave plaid
3.2 Method of adjustment task for measuring adaptation-induced biases in perceived direction of motion
3.3 Selecting coherent and transparent plaid adaptors
3.4 Circle-shaped plot for showing adaptation-induced biases in perceived direction of motion
3.5 Adaptation-induced biases from a coherent plaid
3.6 Adaptation-induced biases from a transparent plaid
3.7 A comparison of the performance of component and pattern predictions for coherent and transparent plaids
3.8 Adaptation-induced biases from transparent random dots
This thesis is presented as three self-contained papers. They are not part of a single line of study, but are connected in that they address aspects of the representation and perception of visual motion. My focus has been the perceptual and physiological consequences of integrating visual motion information. Where possible, I have tried to make clear the rules by which the system decides whether or not to integrate. I have also tried to identify the mechanisms associated with the perceptual phenomena in which I was interested.
In this introduction, I briefly state what visual motion is, how representations of it are formed, and what those representations convey to an organism. I summarize sets of observations on how it is represented and perceived. At the end of this chapter, I link the questions that I address in the chapters that follow to the background presented here, and I briefly summarize my results. In many cases, I have omitted some alternative views of the background I describe. I likewise have not included all known pieces of evidence for or against the conclusions I discuss.
My aim is not to provide a comprehensive summary of these issues, but to lay out the threads of knowledge that are foundational to the questions that I explored.
0.1 Basic observations
In simple terms, visual motion may be deﬁned as changes in the position of light over time. To represent these changes, a system can ﬁrst transduce the brightness within many diﬀerent small cone-shaped regions in the surrounding three-dimensional world. The retina does this by estimating the amount of light that lands on its surface at many diﬀerent positions in a given amount of time.
When there is motion, such as when an object moves in a scene, the pattern of the projected retinal image shifts relative to the previous image that it formed. It is the job of higher areas to estimate the motions in a scene from these changes in the retinal representation. I should point out that this is true for luminance-defined motion, but visual motion can be defined in other ways, an issue to which I will return. Changes in luminance are sufficient but not necessary for visual motion: they may or may not evoke a sense of motion when the system has access to them, and a sense of motion can arise without them.
A representation of motion provides a wealth of information to an organism.
There is ample evidence, for example, that humans use information about motion to interpret the world. Among the functions it serves are: sensing the real motion of objects; sensing the depth and relative distances to points in the environment;
sensing self-motion, such as when walking through an environment; estimating the time to collision of visual targets; segregating different objects in a scene; distinguishing figure from ground; and driving eye movements. For many organisms, these functions are essential for survival. In a sense this must be the case: the mechanisms that represent motion consume considerable resources, even after being optimized for efficiency.
0.1.1 The aperture problem and a ‘solution’
The ‘aperture problem’ is an inherent ambiguity that arises when local measurements of motion are made [1, 2, 3, 4]. The solution to this ambiguity shapes the form and processes of the system that represents motion. Consider an extended contour, such as a line segment, moving within an aperture. There is not enough information to determine the true motion of the surface of which the contour is part; many different physical motions produce the same stimulus within the aperture. The key point is that any motion parallel to a one-dimensional (1D) pattern is invisible, and therefore only motion normal to the pattern’s orientation can be detected.
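This ambiguity can be stated compactly. In the notation below (mine, not the text’s), a local 1D measurement fixes only the normal component of velocity:

```latex
% Aperture constraint: a local 1D measurement determines only the
% component of velocity normal to the contour's orientation.
\[
  \mathbf{v} \cdot \hat{\mathbf{n}} = s_{\perp},
\]
% where $\hat{\mathbf{n}}$ is the unit normal to the contour and
% $s_{\perp}$ is the measured normal speed. Every velocity of the form
\[
  \mathbf{v} = s_{\perp}\,\hat{\mathbf{n}} + \alpha\,\hat{\mathbf{t}},
  \qquad \alpha \in \mathbb{R},
\]
% with $\hat{\mathbf{t}}$ the unit tangent, satisfies the constraint:
% the tangential component $\alpha$ is invisible through the aperture.
```

The one-parameter family over $\alpha$ is exactly the constraint line in velocity space discussed below.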
The visual system faces this problem, since it makes local measurements (i.e., through an aperture), and many structures in the world are 1D when measured locally. But it can, and does, determine the true motion of objects when more than one unique 1D motion exists. The basic idea is to pool 1D motions that are consistent with the same pattern of motion, as would result from a single translating object. Since these estimates can be assumed to relate to each other in a specific way, a unique solution emerges. It is, nonetheless, not obvious which signals to pool: the idea is to pool those consistent with the pattern motion, yet the pattern motion is unknown until the local motions are pooled [5, 6].
It is easiest to consider the solution in the context of a few classic visual patterns. The set of 1D stimuli to consider includes extended gratings, edges and bars. None of these is truly one-dimensional in the real world, which is to say their extent is not infinite, but they are effectively so if they extend beyond the edge of the region of visual space that is represented or perceived. The set of two-dimensional (2D) stimuli to consider includes plaids, which are the superposition of two component gratings in overlapping visual space, and random dots, which can be thought of as the superposition of a set of many overlapped drifting gratings.
In velocity space, each of these moving stimuli has a corresponding vector or pattern of vectors, whose length relates to the stimulus’ speed and whose orientation relates to its direction. A drifting grating, for example, maps to a line of possible velocities, a constraint line. The constraint line is parallel to the stimulus’ orientation and orthogonal to the vector representing its primary motion. For a plaid there are two such constraint lines, one for each component, and they intersect at a unique point. By integrating the two local motions, the intersection of constraints (IOC) solution can be determined; this corresponds to the true motion of the plaid [3, 7, 8, 9]. Plaids are a simple case that illustrates the IOC solution, but IOC is not limited to them: it provides a solution for more complicated motions as well.
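The IOC computation itself is just the solution of simultaneous linear constraints. The following is a minimal sketch (the function name and the example plaid are my own, not from the text): each component grating contributes one constraint, v · nᵢ = sᵢ, and for a two-component plaid the constraint lines intersect at a unique point.

```python
import numpy as np

def ioc_velocity(normals, speeds):
    """Solve v . n_i = s_i for the 2D pattern velocity v.

    normals: (N, 2) unit normal vectors of the component gratings
    speeds:  (N,) normal (component) speeds
    Least squares reduces to the exact intersection of constraints
    when N == 2 and the normals are not parallel.
    """
    normals = np.asarray(normals, dtype=float)
    speeds = np.asarray(speeds, dtype=float)
    v, *_ = np.linalg.lstsq(normals, speeds, rcond=None)
    return v

# Example: a plaid translating rightward at speed 1, built from two
# gratings with normals at +45 and -45 degrees; each component's
# normal speed is the projection of the pattern velocity, cos(45 deg).
n = np.array([[np.cos(np.pi / 4),  np.sin(np.pi / 4)],
              [np.cos(np.pi / 4), -np.sin(np.pi / 4)]])
s = np.array([np.cos(np.pi / 4), np.cos(np.pi / 4)])
print(ioc_velocity(n, s))  # ~ [1. 0.]
```

Note that each component alone is consistent with an entire line of velocities; only their combination pins down the single point (1, 0).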
But IOC is not the only way to combine motion vectors; other options include summing or averaging them. For some stimuli, there may be no difference between the IOC solution and those given by these other rules. One case for which the solutions differ is a ‘type II’ plaid: a plaid for which the direction of the constraint-line intersection lies outside the narrowest sector spanned by the directions of the two component motions. There is some evidence that the perceived direction of motion (DoM) of type II plaids is biased towards the vector-average direction [10, 11]. Results like these remind us that which combination rule best matches perception, in different circumstances, has not been fully worked out. That said, IOC is a good approximation for many stimuli.
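A toy calculation makes the divergence concrete. The particular component directions below are my own illustrative choice, not taken from the text: both component normals lie on the same side of the true pattern direction, so the configuration is type II and the vector-average direction lands far from the IOC direction.

```python
import numpy as np

def ioc_velocity(normals, speeds):
    """Exact intersection-of-constraints solution for two components."""
    return np.linalg.solve(np.asarray(normals, dtype=float),
                           np.asarray(speeds, dtype=float))

# Type II configuration: component normals at 60 and 80 degrees,
# both on the same side of the true pattern direction (0 degrees).
angles = np.deg2rad([60.0, 80.0])
normals = np.stack([np.cos(angles), np.sin(angles)], axis=1)
v_true = np.array([1.0, 0.0])        # pattern moves rightward
speeds = normals @ v_true            # normal speeds implied by v_true

v_ioc = ioc_velocity(normals, speeds)
v_avg = (speeds[:, None] * normals).mean(axis=0)  # vector average

ioc_dir = np.degrees(np.arctan2(v_ioc[1], v_ioc[0]))
avg_dir = np.degrees(np.arctan2(v_avg[1], v_avg[0]))
print(ioc_dir, avg_dir)  # IOC ~ 0 deg; vector average ~ 65 deg
```

For a type I plaid (normals straddling the pattern direction), the two rules give much closer answers, which is why such stimuli cannot distinguish them.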
A two-stage model for motion perception emerges from these observations about integrating motions. The system ﬁrst estimates local motions and then combines them [3, 7, 9]. The ﬁrst step is the decomposition of the image into 1D spatial components of varying orientations. The speeds in the direction orthogonal to the component orientations are computed. The directions and speeds of many diﬀerent components at a given spatial location are recombined to ﬁnd the IOC solution.
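The two stages can be sketched numerically. This is a schematic of the computation only, not of any specific published model; the component count, noise level, and variable names are my own assumptions. Stage one produces many noisy 1D normal-speed measurements; stage two pools them by solving all the constraints v · nᵢ = sᵢ jointly, which generalizes the two-component IOC to many components.

```python
import numpy as np

rng = np.random.default_rng(0)
v_true = np.array([2.0, 1.0])  # hypothetical 2D pattern velocity

# Stage 1: decompose into many oriented 1D components, each yielding
# a (noisy) normal-speed measurement through its own local 'aperture'.
angles = rng.uniform(0.0, np.pi, size=50)
normals = np.stack([np.cos(angles), np.sin(angles)], axis=1)
normal_speeds = normals @ v_true + rng.normal(0.0, 0.05, size=50)

# Stage 2: recombine by solving all constraints v . n_i = s_i jointly;
# least squares generalizes the IOC solution to N > 2 components.
v_hat, *_ = np.linalg.lstsq(normals, normal_speeds, rcond=None)
print(v_hat)  # close to [2. 1.]
```

Pooling over many orientations also illustrates why the estimate is robust: no single 1D measurement determines the velocity, but jointly they overconstrain it.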