Mar 14 (Tue) @ 2:30pm: "Incorporating Human Visual Properties into Neural Network Models," Aditya Jonnalagadda, ECE PhD Defense
Humans and many other animals process the visual field with varying spatial resolution (foveated vision) and use peripheral processing to make intelligent eye movements that point the fovea at objects of interest to acquire high-resolution information about them. A foveated architecture enables computationally efficient, rapid scene exploration and can yield energy savings for computer vision. A foveated model can also serve as a proxy to identify circumstances in which humans might make an error and as a tool to understand human vision. Foveated architectures have been implemented in previous computer vision models. However, they have not been explored with the latest computer vision architecture, transformer networks, which offer better robustness against adversarial attacks and better representation of spatial relationships across the entire image.
We propose foveated computational modules for object classification (FoveaTer) and object detection (FST) integrated into the vision transformer architecture. We evaluate FoveaTer’s computational savings and gains in robustness to adversarial attacks relative to a full-resolution model. We use the self-attention weights to optimize the guidance of the model's eye movements. We have also investigated using FoveaTer to predict various human behavioral effects. We performed a psychophysics experiment for the scene categorization task and used the FoveaTer model to predict how human categorization performance varies across fixations. Using two additional psychophysics experiments, a forced-fixation experiment to detect a mouse in the visual periphery and a visual search experiment to detect a mouse using a limited number of fixations, we have also evaluated how the FST model uses contextual information to guide eye movements as humans do.
In addition, we trained anthropomorphic CNN models to detect simulated tumors in simulated 3D Digital Breast Tomosynthesis phantoms and compared their performance and errors against those of radiologists. We provide preliminary results on extending the FST model for tumor search in virtual mammograms generated using 1/f noise.
Thus, the contributions of the dissertation are to further the implementation of computational cost savings for computer vision, to predict perceptual errors of humans, and to provide a computational tool to study human vision and cognitive science in the wild.
Hosted by: Prof. Miguel Eckstein
Submitted by: Aditya Jonnalagadda <email@example.com>