Preliminary Program at a Glance (pdf version)

Please note: gentlemen are expected to wear a coat and tie in the public areas of the Cosmos Club, and ladies are expected to wear comparable attire.


Wednesday 15 October 2008

Start Event
0800 Continental breakfast
0830 Workshop Kickoff
0840 Keynote 1: Gary Adams, daVinci Systems, speaking on "Who Moved my Pixels? - or - The Effective Use of Motion Estimation in the Repair and Restoration of Motion Picture Film"
0910 Session 1: Co-locating Multiple Images
1150 Lunch
1300 Session 1 continued
1400 Session 2: Data Techniques
1540 Session 3: Full Body Tracking, Biometrics, and Surveillance
1730 Poster Session and Reception

Thursday 16 October 2008

Start Event
0800 Continental breakfast
0830 Keynote 2: Prof. Joe Mundy, Brown University, speaking on "Change detection in the 21st century: exploiting vast image collection resources"
0900 Session 4: Biometrics - Head, Face, and Eyes
1200 Lunch
1330 Keynote 3: Jonathon Phillips: NIST, discussing the Multiple Biometrics Grand Challenge
1400 Session 4 continued
1500 Session 3 continued
1730 Pre-Banquet Reception
1830 Evening Banquet and Speaker: John Delaney, National Gallery of Art, DC, "Visible and Infrared Imaging Spectroscopy of Paintings"

Friday 17 October 2008

Start Event
0800 Continental breakfast
0830 Keynote 4: Barbara O'Kane: Night Vision Lab, RDECOM, Principal Scientist for Human Performance, speaking on "Challenges in Human Sensing"
0900 Session 5: Medical - Imaging as a Biomarker
1020 Session 6: ATR
1200 Lunch
1300 Session 6 continued
1440 Closing Remarks



Detailed Program

Wednesday 15 October 2008

Start

Event

0800

Continental breakfast

0830

Workshop Kickoff
C. Cohen, General Chair

0840

Keynote 1: Gary Adams, daVinci Systems, speaking on "Who Moved my Pixels? - or - The Effective Use of Motion Estimation in the Repair and Restoration of Motion Picture Film"

0910

Session 1: Co-locating Multiple Images
Chairs: A. Williams (Harsh Environment Applied Technologies) and H. Rhody (Rochester Institute of Technology)

0910

Bandwidth Efficient Sensor Architectures with Multiple Feature Extraction
J. Caulfield (Cyan Systems), J. Elliot (Nova Sensors), J. Curzan (Nova Sensors), M. Massie (Nova Sensors), and P. McCarley (Air Force Research Lab, Eglin AFB)

Current-generation sensors have very high data rates, which require high-speed image processors, can cause data bottlenecks, and increase the size of the support system electronics to the point where high-resolution sensors are hard to integrate on compact platforms. Cyan Systems is developing image processing techniques that reside near the FPA sensor. This compact image processor provides localized processing suitable for large-format sensors in the visible and IR regions.
We will demonstrate the ability to spatially downsample the output bandwidth and detect targets of interest. The algorithm has been developed so that the near-FPA electronics can automatically adapt the imaging and detection parameters to extract targets without losing sensitivity or altering the false alarm or missed detection rate of existing off-focal-plane processing systems. Advanced FPAs with near-FPA algorithms for pre-cueing have the potential to minimize the data throughput bottlenecks from very large format IRFPAs of 1024 x 1024 pixels and larger.
Improvements in Acuity processing concepts will be presented to include lower SNR detection and preservation of edges made possible by improvements in the dim target feature extraction.

0930

A spatial feature enhanced MMI algorithm for multi-modal wild-fire image registration
X. Fan and H. Rhody (Rochester Institute of Technology)

The integration of multispectral airborne imagery and geographic data for wildfire research and emergency response requires 3D multiple-view registration. Registration of maps, visible imagery and IR imagery, especially LWIR, is challenging because of the differences in brightness, color and features that are available in the different modalities. We have developed a semi-automated workflow for the registration and exploitation of this imagery and data that can produce quick-turnaround products for research and wildfire management. The techniques are based upon an enhancement of the conventional maximization of mutual information (MMI) algorithm through an efficient utilization of feature information. This technique largely overcomes the problems that arise from uncorrelated variations in pixel intensity between visible sensors, LWIR sensors that respond to temperature variations, and artificial colorations present in maps. A measure of registration confidence based upon the kurtosis of the search space has been developed so that operators can be cued to examine suspicious results produced by the semi-automated workflow algorithms. Experiments on real wild-fire imagery demonstrate the performance of the techniques.
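
As a rough illustration of the conventional MMI baseline that this abstract builds on (not the authors' feature-enhanced variant or their kurtosis-based confidence measure), the sketch below estimates mutual information from a joint histogram and searches a few horizontal shifts for the best alignment. The function names, the 8-bin histogram, the circular shift, and the small search range are all assumptions made for the example.

```python
import math
from collections import Counter


def mutual_information(img_a, img_b, bins=8, max_value=256):
    """Mutual information in bits between two equally sized grayscale images.

    Images are lists of rows of intensities in [0, max_value).
    """
    if len(img_a) != len(img_b) or any(
        len(ra) != len(rb) for ra, rb in zip(img_a, img_b)
    ):
        raise ValueError("images must have the same shape")
    width = max_value / bins
    # Quantize each co-located pixel pair into histogram bins.
    pairs = [
        (int(a // width), int(b // width))
        for row_a, row_b in zip(img_a, img_b)
        for a, b in zip(row_a, row_b)
    ]
    n = len(pairs)
    joint = Counter(pairs)
    marg_a = Counter(a for a, _ in pairs)
    marg_b = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts.
        mi += p_ab * math.log2(p_ab * n * n / (marg_a[a] * marg_b[b]))
    return mi


def shift_columns(img, dx):
    """Circularly shift image content dx columns to the right (toy transform)."""
    w = len(img[0])
    return [[row[(j - dx) % w] for j in range(w)] for row in img]


def best_shift(fixed, moving, max_shift=2):
    """Return the horizontal shift of `moving` that maximizes MI with `fixed`."""
    return max(
        range(-max_shift, max_shift + 1),
        key=lambda dx: mutual_information(fixed, shift_columns(moving, dx)),
    )
```

Because mutual information depends only on the statistical relationship between intensities, it can still peak at the correct alignment when one modality is, for example, an intensity-inverted version of the other, which is the property that makes it attractive for visible-to-LWIR registration.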

0950

Coffee Break

1010

Real-time, Multiple Hot-Target Identification and Multi-Spectral Fusion
M. Khadaria, M. Pusateri (Pennsylvania State University), and D. Siviter (Harsh Environment Applied Technologies)

Night vision systems, consisting of intensified visible light imagers and various infrared band imagers, are widely deployed. Long-wave infrared imagers are useful due to their ability to detect the thermal emissions associated with human body temperature. However, long-wave infrared images provide an unnatural view of the world that causes the user to spend additional time interpreting the image, slowing their response.

New digital night vision systems are available with multiple apertures providing both intensified visible and long-wave infrared imagers. While color fusion algorithms have been used to combine both images, reviews of the resulting fused imagery have been mixed. An alternative fusion approach is to display the intensified image and provide graphic cues in the imagery as to the location of hot-targets.

In this paper we report on our hot-target detection algorithm and the method of cueing we use to identify locations of hot-targets. Multiple hot-targets are detected using edge and intensity information from the long-wave imagery. Detected blobs are then classified as hot-targets based on blob extent to minimize false alarms. The algorithm was targeted for use in head-mounted multi-spectral night vision systems that have stringent storage, computational and power requirements. The detection algorithm is independent of inter-frame information. Most operations in the algorithm are binary to minimize the computational, memory, and power requirements for real-time video processing. We will present test videos of the algorithm applied to various long-wave scenes; the scenes were captured with a 320x240 sensor, using a 40° horizontal field of view. Acceptable detection results were obtained for multiple hot-targets within a 100 meter range.
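
The general shape of a single-frame, mostly binary detector of this kind can be sketched as follows. This is an illustrative simplification, not the authors' algorithm: it uses intensity thresholding only (the abstract also uses edge information), and the threshold and extent limits are hypothetical values chosen for the example.

```python
from collections import deque


def detect_hot_targets(frame, threshold=200, min_extent=2, max_extent=40):
    """Return bounding boxes (top, left, bottom, right) of hot blobs in one frame.

    frame: list of rows of long-wave intensities. Only the current frame is
    used, so the detector carries no inter-frame state.
    """
    rows, cols = len(frame), len(frame[0])
    # Binary mask of pixels at or above the (assumed) hot threshold.
    hot = [[value >= threshold for value in row] for row in frame]
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not hot[r][c] or seen[r][c]:
                continue
            # Flood-fill one 4-connected blob, tracking its bounding box.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and hot[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            height, width = bottom - top + 1, right - left + 1
            # Classify by blob extent to reject isolated hot pixels and huge regions.
            if (min_extent <= height <= max_extent
                    and min_extent <= width <= max_extent):
                boxes.append((top, left, bottom, right))
    return boxes
```

The returned boxes could then be drawn as graphic cues over the intensified visible image, which is the cueing style the abstract describes.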

1030

Fused Exploitation of LIDAR Point Clouds and Hyperspectral for Improved Target Detection
D. Messinger, M. Foster, and J. Schott (Rochester Institute of Technology)

Recent work has demonstrated the utility of performing rare point target detection in hyperspectral imagery in the calibrated radiance domain, instead of the (atmospherically compensated) estimated surface reflectance domain. This is achieved by using a physics-based forward model to predict how the target signature (reflectance) will be manifested in the radiance image under a variety of partially known atmospheric conditions. Typical methods use MODTRAN to model the atmospheric contributions to the measured signal, but ignore or oversimplify any local geometric impacts on the signal. In this work, we present a methodology that utilizes a co-temporal but low-resolution LIDAR point cloud to improve target detection by constraining the geometric terms in the forward model. The three-dimensional LIDAR point cloud is used to estimate, on a per-pixel basis in the hyperspectral image, fractional solar illumination, exposure to the full sky, surface normal, and subpixel presence of a target based on three-dimensional target detection of a geometric target model. The method employed for the spatial target detection is subject to high false alarm rates, but when combined with the spectral information it can be used to handle challenging scenarios such as targets in hard canopy shadows. The method has been demonstrated on both real and synthetic data, and examples will be presented for each.

1050

Image Fusion with Multiband Linear Arrays
M. Michelizzi, C. Maraviglia (Naval Research Laboratory), and K. Cox (Space/Ground Systems Solutions)

The Naval Research Lab (NRL) has been studying the application of multiband linear array sensors to develop image understanding algorithms. There are a number of applications for this work including collision avoidance of ships in a maritime environment. The scanning linear array sensors used for this study include bands in the infrared (IR) and a visible band line scanner.

The system architecture for the multiband linear array image fusion system will be described including the sensors and a custom gimbal that scans the area around the ship. Actual sensor imagery will be shown describing the image registration and scan to scan issues from multiple linear array sensors.

Algorithms for searching, acquiring, and tracking targets while continuing to scan using the multiple scanned linear arrays will be described. Imagery from the individual modes of operation will be shown. An alternate approach to this problem is the use of multiband two-dimensional focal plane arrays (FPAs). The advantages and disadvantages of using linear arrays versus a two-dimensional FPA for multiband image fusion and understanding will be discussed in this paper.

1110

Real-time mapping and navigation by fusion of multiple electro-optic sensors
R. Sandoval, M. Pusateri, J. Fry, D. Lesutis (Pennsylvania State University), and J. Siviter (Harsh Environment Applied Technologies)

Autonomous vehicle control can benefit from the abstraction of data from multiple sensors into an information stream containing only relevant aspects of the environment. We denote this information stream as a virtual environment; the virtual environment differs from the real environment in that only objects relevant to the vehicle are identified and mapped in the virtual environment. We are jointly developing an autonomous surface vehicle (ASV). The ASV will compete in the 2009 ASV challenge sponsored by the Association for Unmanned Vehicle Systems International and Office of Naval Research.

The ASV is equipped with a sensor suite including two forward-looking color CCD cameras. By applying image processing and computer vision techniques, such as edge and blob detection and stereo image disparity, the ASV fuses its electro-optical data with its other sensor data to generate a virtual reconstruction of the elements of the real world relevant to the competition tasks. The virtual environment provides all information used in task and objective completion, including waypoint and buoy navigation, target identification and elimination, friendly identification and recovery, docking, and obstacle avoidance. The fusion of sensor data to form a virtual environment is a prime goal in the ASV's design, which allows for the simplification of task, objective and behavioral programming. In this paper, we will report on the design of the virtual environment fusion algorithm for waypoint and buoy navigation. We will discuss the results and techniques used in the virtual reconstruction.

1130

Comparison of 2D Median Filter Hardware Implementations for Real-Time Stereo Video
J. Scott, M. Mushtaq, M. Pusateri, and H. Dhawan (Pennsylvania State University)

The two-dimensional spatial median filter is a core algorithm for impulse noise removal in digital image processing and computer vision. While the literature presents several analyses of median filters optimized for a standard 3x3 pixel neighborhood configuration, a 5x5 neighborhood, useful for imagery exhibiting noise not conforming to the classic "salt and pepper" formation, has received little analysis. Research efforts on hardware implementations of median filters have been devoted primarily toward implementations with low latency and high throughput. In the application we are investigating, the stereo visible near infrared sensors will both require a 5x5 median filter. Since the system is a battery powered unit, optimal power usage is a critical requirement. However, optimal power usage for median filtering has received little attention in the literature.

In this paper, we focus on investigating four selected hardware implementations of a 5x5 median filter on the basis of power efficiency. Power efficiency is extremely important when designing image fusion algorithms for night vision goggles; battery weight must be minimized without compromising operation time. We also analyze the latency, maximum clock rates, and resource utilization for these implementations. The designs include implementations of merge sort and radix sort-based elimination algorithms, common in software implementation of median filters, and a systolic sorting array and a Batcher sorting network, common hardware sorting techniques. All designs were created in the Altera Quartus-II environment for Stratix-II field programmable gate arrays (FPGAs), and were designed to be fully pipelined, accepting input sets and generating median filter output values every pixel clock pulse. Of the four considered designs, the Batcher network is a clear winner in power efficiency. Also, the Batcher network exceeds the functional and performance requirements for resource usage, latency, and clock rate.
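
For readers unfamiliar with sorting networks, the following software sketch shows how a Batcher odd-even merge sort network can produce a 5x5 median with a fixed sequence of compare-and-swap steps. It is not the authors' FPGA design: padding the 25 inputs to 32 lanes with infinity, and the names `NETWORK_32` and `median_5x5`, are assumptions made for the illustration.

```python
def oddeven_merge(lo, hi, r):
    """Yield comparators that merge two sorted halves of lanes lo..hi (inclusive)."""
    step = r * 2
    if step < hi - lo:
        yield from oddeven_merge(lo, hi, step)
        yield from oddeven_merge(lo + r, hi, step)
        yield from ((i, i + r) for i in range(lo + r, hi - r, step))
    else:
        yield (lo, lo + r)


def oddeven_merge_sort_range(lo, hi):
    """Yield comparators that sort lanes lo..hi (inclusive); size must be a power of 2."""
    if hi - lo >= 1:
        mid = lo + (hi - lo) // 2
        yield from oddeven_merge_sort_range(lo, mid)
        yield from oddeven_merge_sort_range(mid + 1, hi)
        yield from oddeven_merge(lo, hi, 1)


# The comparator list is fixed and data independent, which is what lets
# hardware lay it out as a pipeline of compare-and-swap stages.
NETWORK_32 = list(oddeven_merge_sort_range(0, 31))


def median_5x5(window):
    """Median of 25 pixel values using the fixed 32-lane comparator network."""
    if len(window) != 25:
        raise ValueError("expected the 25 values of a 5x5 neighborhood")
    # Pad to a power-of-two lane count; the padding sorts to the top lanes.
    lanes = list(window) + [float("inf")] * 7
    for a, b in NETWORK_32:
        if lanes[a] > lanes[b]:
            lanes[a], lanes[b] = lanes[b], lanes[a]
    return lanes[12]  # 13th smallest of the 25 real values
```

A hardware implementation would typically drop comparators that cannot influence the median lane; that pruning is omitted here for clarity.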

1150

Lunch

1300

Session 1: Co-locating Multiple Images (continued)
Chairs: A. Williams (Harsh Environment Applied Technologies) and H. Rhody (Rochester Institute of Technology)

1300

Hyper-Spectral Content Aware Resizing
J. Scott, R. Tutwiler, R. Collins, and M. Pusateri (Pennsylvania State University)

Image resizing is performed for many reasons in image processing. Often, it is done to reduce or enlarge an image for display. It is also done to reduce the bandwidth needed to transmit an image. Most image resizing algorithms work based on principles of spatial or spatial frequency interpolation. One drawback to these algorithms is that they are not image content aware and can fail to preserve relevant features in an image, especially during size reduction. Recently, a content aware image resizing algorithm, called seam carving, was presented by Avidan and Shamir.

In this paper we discuss an extension of the seam carving algorithm to hyper-spectral imagery. For a hyper-spectral image with an MxN field of view and P spectral layers, our algorithm identifies a one-pixel-wide path through the image field of view containing a minimum of information and then removes it. This process is repeated until the image size is reduced to the desired dimension. Information content is assessed using energy and power metrics; several such metrics have been tested with varying results. The resulting carved hyper-spectral image has the minimum reduction in information for the resizing, based upon the energy and power metrics used to quantify information. We will present the results of seam carving applied to two imagery sets: three-spectra imagery captured with VNIR, SWIR and LWIR cameras, and ten-spectra imagery generated synthetically.
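
One way to picture the extension is to sum a per-band energy over all P layers and then run the usual dynamic-programming seam search on the combined map. The sketch below does this with a simple gradient-magnitude energy, which is an assumption; the abstract does not say which energy and power metrics were tested. The cube layout `cube[band][row][col]` and the function names are likewise hypothetical.

```python
def energy_map(cube):
    """Sum of absolute horizontal and vertical differences over all bands."""
    rows, cols = len(cube[0]), len(cube[0][0])
    energy = [[0.0] * cols for _ in range(rows)]
    for band in cube:
        for r in range(rows):
            for c in range(cols):
                left, right = band[r][max(c - 1, 0)], band[r][min(c + 1, cols - 1)]
                up, down = band[max(r - 1, 0)][c], band[min(r + 1, rows - 1)][c]
                energy[r][c] += abs(right - left) + abs(down - up)
    return energy


def find_vertical_seam(energy):
    """Return one column index per row forming the minimum-energy connected seam."""
    rows, cols = len(energy), len(energy[0])
    cost = [energy[0][:]]
    for r in range(1, rows):
        prev = cost[-1]
        cost.append([
            energy[r][c] + min(prev[max(c - 1, 0):min(c + 2, cols)])
            for c in range(cols)
        ])
    # Backtrack from the cheapest bottom pixel through the cheapest parents.
    c = min(range(cols), key=cost[-1].__getitem__)
    seam = [c]
    for r in range(rows - 1, 0, -1):
        lo, hi = max(c - 1, 0), min(c + 2, cols)
        c = min(range(lo, hi), key=cost[r - 1].__getitem__)
        seam.append(c)
    seam.reverse()
    return seam


def remove_vertical_seam(cube, seam):
    """Delete the same seam pixel from every band so the layers stay aligned."""
    return [
        [row[:c] + row[c + 1:] for row, c in zip(band, seam)]
        for band in cube
    ]
```

Calling `remove_vertical_seam(cube, find_vertical_seam(energy_map(cube)))` repeatedly narrows the cube one column at a time; removing the same path from every layer keeps the spectral layers registered.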

1320

Remote Sensing Data Assimilation in Environmental Models
A. Vodacek, A. Spivey (Rochester Institute of Technology), Y. Li (Rapiscan Systems), and A. Garrett (Savannah River National Laboratory)

Remote sensing images typically provide a two dimensional snapshot of a three dimensional and time varying world. Numerical physics-based models of the environment can provide time varying predictions of processes in three spatial dimensions, but these models are subject to increasing error as time progresses. Data assimilation is the term used to describe various numerical techniques for incorporating new data over time into an executing model and thereby reducing prediction errors. We describe an example of remote sensing data assimilation using the Ensemble Kalman Filter, to illustrate some of the general procedures and requirements of this approach.
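
To make the assimilation step concrete, here is a minimal sketch of a stochastic Ensemble Kalman Filter analysis update for a single scalar state that is observed directly. The scalar state, the identity observation operator, and the function name `enkf_update` are simplifying assumptions for illustration; the environmental models discussed in the talk are far larger.

```python
import random
import statistics


def enkf_update(ensemble, observation, obs_variance, rng):
    """Stochastic EnKF analysis step for a directly observed scalar state.

    ensemble: forecast values of the state, one per ensemble member.
    observation: the new measurement (for example, from a remote sensing image).
    obs_variance: assumed measurement error variance.
    rng: a random.Random instance used to perturb the observation per member.
    """
    forecast_var = statistics.variance(ensemble)  # sample spread of the forecast
    gain = forecast_var / (forecast_var + obs_variance)  # scalar Kalman gain
    obs_sd = obs_variance ** 0.5
    # Each member is nudged toward its own perturbed copy of the observation.
    return [
        x + gain * (observation + rng.gauss(0.0, obs_sd) - x)
        for x in ensemble
    ]
```

When the forecast spread is large relative to the observation error, the gain approaches 1 and the ensemble is pulled strongly toward the measurement; between observations the model would propagate each member forward again.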

1340

Automated Image Registration to 3-D Building Models
K. Walli and H. Rhody (Rochester Institute of Technology)

This paper develops a technique for the registration of multisensor/multimodality images to 3-D models utilizing the KML encoding standard. Given the recent advancements in 3-D online visualization of geographic information utilizing such software tools as Google Earth, Microsoft Virtual Earth, and NASA's World Wind, it is becoming increasingly necessary to understand and utilize the capabilities of these impressive software tools in the field of remote sensing.

Because 2-D registration will always be limited by the effects of viewing geometry and occlusion, this approach orients the scene-based model to the same viewing perspective as the remotely sensed image to enable traditional 2-D registration techniques. Once this is accomplished, the 3-D ambiguity between the model and the image can be removed and the image can be utilized as a texture map on the model.

This approach relies on either a priori sensor information or scene-based content to estimate the proper 3-D scene orientation relative to the remotely sensed image. Once the initial model orientation is estimated, an iterative approach to improve the model-to-image projection can be implemented. Multiple methods will be discussed to test the accuracy of the resulting model-to-image registration.

1400

Session 2: Data Techniques
Chair: Jim Aanstoos (Mississippi State University)

1400

Evaluation of Compression Techniques for Wide-area Video
A. Perera (Kitware)

Very large aerial video collectors, such as Constant Hawk, Angel Fire, and the planned ARGUS-IS, are of increasing interest. These sensors have very large effective focal plane arrays and can generate a tremendous amount of data. For example, the ARGUS-IS system will generate about 425 Gigabits/second of image data. This presents a significant challenge for onboard storage, and also a significant challenge for transmitting the data down for real-time access. One way to address this challenge is through compression of the imagery. However, one must be confident that the compression does not cause any loss of intelligence in the imagery. This paper presents the results of an evaluation of the performance of different compression algorithms. We evaluated both single image compression algorithms (such as JPEG2000) and video compression algorithms (such as MPEG-4 AVC). In general, we found that video compression produces 3 to 5 times more compression than single image compression at equivalent quality. The quality was measured using the Structural Similarity metric. We also found that the stream can be compressed about 100 times without a perceptual loss in quality. This is an insufficient amount of compression to transmit the entire image down in real-time. For this, we would need to obtain 1000 to 2000 times compression, which will necessarily cause a loss in image quality. The loss in image quality does not necessarily mean loss of intelligence, and we argue that the quality of the compressed stream must be evaluated using a task-based metric. We also present some approaches to achieving the 1000 to 2000 compression factors. (Approved for public release, distribution unlimited.)
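
The arithmetic behind these figures is easy to check. The short sketch below divides the 425 Gigabits/second raw rate quoted in the abstract by the compression factors it mentions; the helper name is hypothetical, and the resulting link rates are simply what the quoted numbers imply, not values stated by the author.

```python
RAW_RATE_GBPS = 425.0  # ARGUS-IS raw data rate quoted in the abstract


def compressed_rate_gbps(raw_rate_gbps, compression_factor):
    """Data rate remaining after compressing by the given factor."""
    if compression_factor <= 0:
        raise ValueError("compression factor must be positive")
    return raw_rate_gbps / compression_factor


if __name__ == "__main__":
    # 100x is the perceptually lossless figure; 1000x to 2000x is the stated need.
    for factor in (100, 1000, 2000):
        rate = compressed_rate_gbps(RAW_RATE_GBPS, factor)
        print(f"{factor:>5}x compression leaves {rate:.4f} Gbit/s")
```

At 100 times compression the stream is still about 4.25 Gbit/s, whereas the 1000 to 2000 times range brings it down to roughly 0.21 to 0.43 Gbit/s, which illustrates why the perceptually lossless setting is described as insufficient for real-time downlink.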

 

1420

Fault Tolerant Integrated Information Management Support for Physically Constrained Iterative Deconvolution
S. Spetka and G. Ramseyer (State University of New York Institute of Technology)

Multiple image processing algorithms are often required to process computer vision inputs. The rapid processing of complex image streams requires more computing power than is found in a typical PC-based computer or workstation, and the processing power of high-performance computers (HPCs) and Linux clusters has been required for this type of rapid, massive processing. Emerging multicore processors offer the possibility of doing this type of processing at the PC level in real time. The Physically Constrained Iterative Deconvolution (PCID) algorithm is a multi-frame blind deconvolution (MFBD) parallel algorithm that allows the extraction of simple and complex information from multiple images. Massive computing power is required to use this algorithm in real time. Message Passing Interface (MPI) is normally used with PCID for communications between processors in multiprocessor systems. However, MPI has fault-tolerance issues. A tool to replace MPI for multiprocessor communications has been developed that supports a high degree of fault tolerance and facilitates multiple image processing by integration with a publication/subscription infrastructure. This tool is demonstrated here for the PCID algorithm. Other attributes of MPI and this tool's publication/subscription information management support for PCID are compared and contrasted.

1440

A Nonlinear Manifold Learning Framework for Real-Time Motion Estimation Using Low-Cost Sensors
L. Xie, Y. Cao, and F. Quek (Virginia Polytechnic Institute and State University)

We propose a motion synthesis framework to control the animation of a 3D avatar in real time. Instead of relying on a motion capture device for the control signal, we use low-cost and ubiquitously available 3D accelerometer sensors. The framework is developed in a data-driven fashion and includes two steps: model learning from an existing high-quality motion database, and motion synthesis from the control signal. In the model learning step, we apply a non-linear manifold learning method to establish a high-dimensional motion model learned from a large motion capture database. Then, taking the 3D accelerometer sensor signal as input, we synthesize high-quality motion from the motion model learned in the previous step. The system performs in real time, which makes it suitable for a wide range of interactive applications, such as character control in 3D virtual environments and occupational training.

1500

Coffee Break

1520

An Edge Detection Technique in Images
A. Awad and H. Man (Stevens Institute of Technology)

A novel edge detection method based on a neighborhood similarity criterion is presented in this paper. In this algorithm, the pixels in the original image that have the fewest similar pixels among their neighbors in the filtering window are labeled as edge pixels. Simulation results show that this approach performs well in noise-free images and is superior to other methods in images corrupted by additive white Gaussian noise (AWGN). Moreover, the algorithm is fast and has low computational complexity.
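
A neighborhood-similarity rule of this general kind might look like the sketch below. The abstract does not give the window size, the similarity test, or the decision threshold, so the 3x3 window, the intensity `tolerance`, and the `max_similar` cutoff are all hypothetical choices for illustration rather than the authors' method.

```python
def similarity_edges(image, tolerance=10, max_similar=5):
    """Label interior pixels with few similar 3x3 neighbors as edge pixels.

    image: list of rows of grayscale intensities.
    tolerance: a neighbor counts as similar if it differs by at most this much.
    max_similar: pixels with this many similar neighbors or fewer are edges.
    Border pixels are left unlabeled (False) because their window is incomplete.
    """
    rows, cols = len(image), len(image[0])
    edges = [[False] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            similar = sum(
                1
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr or dc) and abs(image[r + dr][c + dc] - center) <= tolerance
            )
            edges[r][c] = similar <= max_similar
    return edges
```

Only comparisons and a small counter are needed per pixel, which is consistent with the low computational complexity the abstract claims for this family of methods.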



1540

Session 3: Full Body Tracking, Biometrics, and Surveillance
Chair: John Irvine (Draper Laboratory)

1540

Modeling and Managing Spatiotemporal Information in Distributed Video Surveillance
P. Agouris, G. Cervone, P. Franzese, J. Radzikowski, and M. Sonwalkar (George Mason University)

In this paper we address the issue of object tracking and sensor management in distributed networks of video sensors. We consider persistent surveillance applications and two types of sensor deployment:
* fixed video sensors distributed in a large urban environment (e.g. located on rooftops), tracking moving targets (individuals or cars)
* video cameras on-board UAVs, tracking moving targets in a suburban or rural environment.

In this paper we present our approaches to:
* Model the spatiotemporal activities of moving targets in order to support intelligent analysis (e.g. comparing patterns of behavior), and address relevant database issues. Towards this goal we use the model of spatiotemporal helixes as a concise description of spatiotemporal activities, and present relevant solutions to support spatiotemporal helix analysis (e.g. similarity assessment).

* Support the distribution and dynamic repositioning of a fleet of moving sensors (e.g. on-board UAVs) in order to optimize coverage of the monitored area. We use innovative information landscapes to describe current and predicted target positions, and present techniques that allow the repositioning of sensors.

In the paper we present theoretical models and early experimental results.


1600

Behavior Recognition for Surveillance Applications
C. Cohen, K. Scott, M. Huber, S. Rowe (Cybernet Systems Corporation), and F. Morelli (Army Research Lab)

Differentiating between normal human activity and aberrant behavior via closed circuit television cameras is a difficult and fatiguing task. The vigilance required of human observers when engaged in such tasks must remain constant, yet attention falls off dramatically over time. In this paper we propose an architecture for capturing data and creating a test and evaluation system to monitor video sensors and tag aberrant human activities for immediate review by human monitors. A psychological perspective provides the inspiration of depicting isolated human motion by point-light walker (PLW) displays, as they have been shown to be salient for recognition of action. Low level intent detection features are used to provide an initial evaluation of actionable behaviors. This relies on strong tracking algorithms that can function in an unstructured environment under a variety of environmental conditions. Critical to this is creating a description of “suspicious behavior” that can be used by the automated system. The resulting confidence value assessments are useful for monitoring human activities and could potentially provide early warning of IED emplacement activities.

1620

A Hybrid Scoring Approach for Human Candidate Selection in IR Image Sequences
K. Byrd and M. Chouikha (Howard University)

This paper presents a new hybrid scoring approach for accurately determining and characterizing human candidates for Infrared (IR) Human Detection Systems. The scoring approach is a combination of binary and fuzzy set classification, motivated by the need to precisely evaluate the performance of detection and classification algorithms given partial candidate selections due to body-part occlusion, isolated hot spots, and portions of selected regions of interest (ROI) no longer in the sensor's field of view (FOV). The goal of this paper is to centralize the ideas and viewpoints of scientists as they formulate thoughts and calculate statistics related to human candidate misses and human candidate selections. We examine not only the sensitivity, specificity and accuracy of the candidate selection system, but also the area under the ROC curve (AUC), F-Measure and Matthews Correlation Coefficient (MCC). This new approach will hopefully lead to a standardized evaluation metric for computer vision/detection tasks.
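
For reference, the binary-classification statistics named in the abstract can be computed from a confusion matrix as in the sketch below. It covers only the standard crisp definitions, not the authors' hybrid binary/fuzzy scoring, and it omits AUC, which requires scores across thresholds rather than a single set of counts. The function name and the convention of returning 0.0 for an undefined ratio are assumptions.

```python
import math


def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive and negative counts."""
    sensitivity = _ratio(tp, tp + fn)   # fraction of humans that were selected
    specificity = _ratio(tn, tn + fp)   # fraction of non-humans that were rejected
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    precision = _ratio(tp, tp + fp)
    f_measure = _ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc = _ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
        "mcc": mcc,
    }
```

Unlike accuracy, MCC uses all four cells of the confusion matrix symmetrically, so it remains informative when human candidates are rare relative to background regions.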

1730

Poster Session and Reception
Chair: C. Maraviglia (NRL)

W-1

Identity Dominance: Using fingerprints to associate an individual to a larger social structure.
M. Loew and D. Herdegen (George Washington University and MITRE)

Fingerprint pattern and ridge count analysis of two separate population groups was used to associate an individual to a group through qualitative and quantitative comparison. The fingerprint data from the two groups was analyzed using a Classification and Regression Tree algorithm. Four separate trees were produced. The first tree separated the two populations using only finger number and pattern. Subsequent trees separated the two populations using finger number, pattern, and ridge count. Including ridge counts improved the per-finger classification from 56.4% to 73.9% and 79.5% for right and left loop patterns respectively. Whorls with both ridge counts improved the classification accuracy to 83.3%. The classification accuracies provided the basis for determining the probability of correctly associating a person to one of the two groups. For each finger, the probability of correctly associating the finger to the group is binomially distributed based upon the classification probabilities. Association is based upon a majority vote. In the worst case, with only finger pattern and finger number available, the expected probability of correctly associating the individual is 54.1% using all ten fingers. Adding ridge counts raises the lower bound to 90.8%. The upper bound using whorls with two ridge counts is 98.4%. Between these two extremes are cases in which the patterns vary among the fingers. However, since the probability of correctly associating the individual to the group depends on the data available, cases where the fingerprint patterns or the deltas are not discernible reduce the probability of correct association accordingly.
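
The majority-vote idea can be illustrated with a plain binomial calculation. The sketch below assumes that every finger is classified independently with the same accuracy and that a tie counts as a failure; the abstract does not state these details, so the function is not expected to reproduce the reported percentages, and its name is hypothetical.

```python
from math import comb


def majority_vote_probability(per_finger_accuracy, fingers=10):
    """Probability that more than half of the fingers vote for the correct group.

    Assumes independent fingers with identical accuracy; ties count as failures.
    """
    p = per_finger_accuracy
    needed = fingers // 2 + 1  # strict majority
    return sum(
        comb(fingers, k) * p ** k * (1 - p) ** (fingers - k)
        for k in range(needed, fingers + 1)
    )
```

Under these assumptions the vote amplifies per-finger accuracy: the further the per-finger accuracy is above 50%, the more quickly the combined probability approaches 1 as fingers are added.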

W-2

Tear Duct Detector for Identifying Left versus Right Iris Images
R. Abiantun and M. Savvides (Carnegie Mellon University)

In this paper, we present different pattern recognition approaches for automatically detecting tear ducts in iris acquired eye images for enhancing iris recognition and detecting mislabeling in datasets. Detecting the tear duct in an image will tell an iris recognition system whether the presented eye image is that of a left or a right eye. This will enable the iris matcher to match the enrolled image against images in the database belonging to the same side, thus reducing the error rates by eliminating the chance of matching a left iris to a right iris or vice-versa. This is a major problem in many single iris imaging acquisition devices currently deployed in the field where the data recorded is mislabeled due to human error. We present several techniques of detecting tear ducts, including boosted Haar features, support vector machines (SVM), and more traditional approaches like PCA and LDA. Finally, we show that tear duct detection improves the detection of left/right iris recognition over previous approaches.

W-3

MirrorTrack - A Real-Time Multiple Camera Approach for Multi-touch Interactions on Glossy Display Surfaces
P. Chung, B. Fang, and F. Quek (Virginia Polytechnic Institute and State University)

This paper presents a real-time multiple camera approach for a multi-touch interaction system that takes advantage of a specular display surface (such as a conventional LCD display) and the mirror effect at a low-azimuth camera angle to detect and track fingers and their reflections simultaneously. Building on our prior work: 1. We use multi-resolution processing to greatly improve the runtime performance of the system; 2. We employ different edge detection and pattern recognition algorithms at different processing resolutions to help detect fingers more accurately and efficiently; 3. We track both the location of a fingertip and its pointing direction so it can be identified more effectively; 4. We use a full stereo algorithm to compute finger locations in 3D space more accurately. Our system has many advantages: 1. It works with any glossy flat panel display; 2. It avoids the clumsy set-up of a top-down camera with its concomitant screen glare problems; 3. It supports both touch and hover operation; 4. It can work with large vertical displays without the usual occlusion problems. We describe our approach and implementation in detail and present our experimental results.

W-4

Automatic Pain Recognition from Video Sequences using SVM
M. Monwar and S. Rezaei (University of Calgary)

In recent years, a number of studies have begun to investigate the neural substrates for perceiving facial expressions, using neuroimaging and other modalities. Specific expressions that have been studied include those of fear, anger, sadness, happiness, surprise and disgust. However, one basic category of facial expression that has not yet been investigated is that of pain. Facial expressions of pain have been the focus of considerable behavioral research. Such work has documented that pain expressions, like other affective facial expressions, play an important role in social communication. In this paper, we present an efficient video analysis technique for recognition of a specific expression, pain, from human faces. We employ an automatic face detector which detects the face in the stored video frame using a skin color modeling technique. The pain-affected portions of the face are obtained by using a mask image. Then we identify 30 points on the eye and mouth regions, since for almost all people these portions of the face are affected by pain (especially brow lowering, orbit tightening, and raising of the upper lip), and calculate the displacement of these regions from the normal image to the painful image. These displacement vectors are used as input to a Support Vector Machine classifier, which uses statistical learning theory instead of heuristics or analogies with natural learning systems. We employ an incrementally trained approach for the support vector machine classifier, which speeds up matching and reduces the overhead involved in the training phase. It also enables us to reuse previously stored data. In addition, it allows for convenient combination of training data from multiple individuals to accomplish person-independent classification. We evaluate our method in terms of recognition performance for a variety of classification scenarios and compare the results with neural network based and eigenimage based automatic pain recognition systems. The experimental results indicate that using a support vector machine as the classifier can improve the performance of an automatic pain recognition system and can be used effectively in the health care sector.

W-5

Multi Features Hybrid Active Shape Models for Lips Contour Tracking in Video Sequences.
Q. Nguyen and M. Milgram (University Pierre and Marie Curie)

Lip tracking has been extensively studied in recent years because it can significantly improve the performance of automatic speech recognition and face recognition systems, especially in a noisy environment. In this paper, we propose and evaluate a novel method for enhancing the performance of lip contour tracking, which is based on the concept of active shape models (ASM) proposed by Tim Cootes and on multiple features. On the first image of the video sequence, the lip region is detected using Bayes' rule, in which lip colour information is modelled using a Gaussian Mixture Model (GMM). The GMM is trained by the Expectation-Maximisation (EM) algorithm. The lip shape model is initialized in the detected region and then converges upon the lip contours. A single feature-based ASM performs well only in particular conditions and gets stuck in local minima in noisy conditions (beard, wrinkles, poor texture, low contrast between lip and skin, etc.). To enhance the convergence, we propose to use 3 features (normal profile, grey level patches and Gabor wavelets) and combine them using a voting approach. The ASM is not able to take into account temporal information from previous frames; therefore the lip contours are tracked by replacing the standard ASM with a hybrid active shape model (MF-HASM), which is capable of taking advantage of the temporal information. Initial experimental results on video sequences show that MF-HASM is more robust to the local minimum problem and gives higher accuracy than the traditional single feature-based method in the lip tracking problem.

W-6

Low-Cost, High-Speed Computer Vision Using NVIDIA's CUDA Architecture
S. Park, S. Ponce, J. Huang, Y. Cao, and F. Quek (Virginia Polytechnic Institute and State University)

In this paper, we introduce real-time image processing techniques using a modern programmable Graphics Processing Unit (GPU). The GPU is a SIMD (Single Instruction, Multiple Data) implementation and is therefore an inherently data-parallel computing device. By utilizing NVIDIA's new GPU programming framework, "Compute Unified Device Architecture" (CUDA), as a computational resource, we realize significant acceleration of image processing computations. We show that a range of computer vision algorithms map readily to CUDA with significant performance gains. Specifically, we present parallelization and optimization strategies for Canny's edge detection algorithm, and demonstrate the efficiency of our approach by applying it to a computation- and data-intensive video motion tracking algorithm known as "Vector Coherence Mapping" (VCM). Our results show the promise of using such common low-cost processors for intensive computer vision tasks.

W-7

Exploitation of Massive Numbers of Simple Events
R. Rimey and D. Keefe (Lockheed Martin)

Emerging image-based sensor systems can observe a relatively large area (e.g., the size of an urban neighborhood) for long time intervals either continually or with high revisit rates. This type of sensor data makes new types of exploitation possible, but only with the assistance of automated exploitation aids because of the massive volume of data that must be studied as a whole. Automated methods to extract the simplest events from image sequences are often fairly robust (e.g., change events derived from EO or SAR imagery or from video-derived tracks). Massive numbers of such events (available from emerging wide-area persistent surveillance sensor systems) can contain information of high intelligence value. This paper examines this general-purpose problem: how massive numbers of the simplest sensor-derived events can be exploited. We summarize the basic functionality an intelligence analyst needs for studying this type of event data, in short to understand the spatial structure, temporal structure and event-pair structure within an area of regard. Then we present several algorithms for automated exploitation of such data. The first set of algorithms detects activities in lower-level space- and time-varying features, which we also show can be exploited with the aid of visualization tools. The second set of algorithms, a variant of probabilistic Latent Semantic Analysis (pLSA), describes and exploits local temporal structure in the observed events. Our techniques are experimentally validated using one simulated and two real datasets of sensor-derived events, which contain all change events in a suburban Iraqi neighborhood over many weeks, in an outdoor marketplace over several months, and inside a building over several weeks. These three seemingly different problem domains share some characteristics in terms of the spatial, temporal and event-pair structure of normal activity patterns.

W-8

Multicamera-Multispectral Video Library - An Algorithm Development Tool
E. Williams (Harsh Environment Applied Technologies), M. Pusateri (Pennsylvania State University), and D. Siviter (Harsh Environment Applied Technologies)

HEAT has developed a ground-based forward-looking multispectral data collection system mounted on a rugged All Terrain Vehicle (ATV) to allow recording imagery while moving over rough terrain. The image data collected from multiple bands of the Electro-Optical/Infrared (EO/IR) spectrum is used to aid image fusion algorithm development for applications such as night vision goggles. The existing system consists of VNIR, SWIR, and LWIR cameras mounted on a ruggedized Pan/Tilt, a rack-mount PC with frame grabbers to capture digital images, and a 4 TB RAID for real-time image storage. The system can also record meteorological data and GPS information synchronized with the imagery.

HEAT has developed a methodology for algorithm development using imagery and other important parameters about the scene of interest. The imagery collected by the data collection system during field exercises is stored in a database of imagery; the imagery can then be replayed into a model running in MATLAB on a desktop PC in the lab. The synchronized raw imagery and meteorological data would be provided as inputs to the model. The model is used to develop image fusion algorithms to display the best possible fused image to the human eyes. Also, target identification algorithms are developed and optimized for best probability of detection with lowest false alarm rate for a computer vision system. The optimized algorithms for both displaying to the human eyes and to computer aided target tracking can then be ported to a rugged FPGA-based system to deploy in the real world environment. Sample raw imagery input from the cameras into the data collection system will be shown. Examples of fused imagery created by the fusion algorithms will also be shown.


Thursday 16 October 2008

Start

Event

0800

Continental breakfast

0830

Keynote 2: Prof. Joe Mundy, Brown University, speaking on "Change detection in the 21st century: exploiting vast image collection resources"

0900

Session 4: Biometrics - Head, Face, and Eyes
Chair: R. Vorder Bruegge (FBI)

0900

Boosted Multi Image Features for Improved Face Detection
R. Abiantun and M. Savvides (Carnegie Mellon University)

In this paper, we present novel approaches of automatically detecting human faces in images which is extremely important for any face recognition system. This paper expands on the traditional Viola-Jones approach by proposing to boost a plethora of mixed feature sets for face detection; we do this by adding non-Haar-like elements to a large pool of mixed features in an Adaboost framework. We show how to generate discriminative Support Vector Machine (SVM) type features and Gabor-type features (in various orientations and frequencies and central locations) and use this whole pool as possible discriminative candidate feature sets in modeling the patterns of a frontal view human face. This general and large-diversity pool of features is used to build a strong classifier and we show we can improve the generalization performance of the AdaBoost approach, and as a result improving the robustness of the face detector. We report performance on the MIT+CMU face database and compare the results with other published face detection algorithms. We also discuss processing times and speeding up methods to offset the increase in complexity in order to achieve face detection in real time.

0920

A Robust Segmentation Approach to Iris Recognition Based on Video
Y. Chen (?)

(Abstract available soon...)

 

0940
Coffee Break
1000

Tracking and Recognizing Multiple Faces using Kalman Filter and Modular PCA-based Methods
J. Foytik, P. Sankaran, and K. Asari (Old Dominion University)

Tracking and recognizing multiple faces in complex environments has the ability to provide efficient security automation to large areas, such as ports. Such a system could provide real-time analysis of important individuals or notification of unwanted people. Previous research has shown that Kalman filter techniques paired with the Viola-Jones face detection algorithm can be used to successfully track one or more faces in a viewing region. However, these methods have relied on basic template face matching techniques and even cloth analysis of the shirt region for each tracked person. These techniques provide reasonable results in certain scenarios, but fail to reliably distinguish between tracked people under variant conditions. A real-time face tracking and recognition system, capable of processing multiple faces simultaneously, is presented. As in previous related work, the system performs face detection using the Viola-Jones algorithm, which is input as measurement values in a Kalman visual tracking framework. Modular Principal Component Analysis (MPCA) is used to quickly create a basic feature subspace, trained only using face images obtained during on-line processing, to distinguish the difference between currently tracked faces. These low-level face recognition and Kalman systems allow multiple people to be tracked and thoroughly analyzed by a higher-level face recognition subspace. This subspace is created using face images from a large database of people and processed off-line using Adaptively Weighted Modular Principal Component Analysis (AWMPCA). The overall system is shown to provide reliable tracking of more than one person and obtain a more accurate recognition rate due to the ability to create a time-average of the recognized faces.
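
As an illustration of the tracking step described in this abstract, the following is a minimal sketch of a constant-velocity Kalman filter applied to one coordinate of a detected face centre. It is not the authors' implementation: the function name `kalman_track`, the constant-velocity motion model, and the noise values `q` and `r` are assumptions chosen for the example, and the detector output is simulated by a plain list of numbers.

```python
def kalman_track(measurements, dt=1.0, q=1e-2, r=4.0):
    """Smooth one coordinate of a face centre with a constant-velocity Kalman filter.

    measurements: per-frame detector outputs (hypothetical values).
    q, r: assumed process and measurement noise variances.
    """
    x, v = measurements[0], 0.0              # state: position and velocity
    p00, p01, p10, p11 = r, 0.0, 0.0, 1.0    # state covariance P (assumed start)
    estimates = [x]
    for z in measurements[1:]:
        # Predict: x' = F x and P' = F P F^T + Q, with F = [[1, dt], [0, 1]]
        x = x + dt * v
        a00 = p00 + dt * (p10 + p01) + dt * dt * p11 + q
        a01 = p01 + dt * p11
        a10 = p10 + dt * p11
        a11 = p11 + q
        # Update with a position-only measurement (H = [1, 0])
        s = a00 + r                          # innovation variance
        k0, k1 = a00 / s, a10 / s            # Kalman gain
        y = z - x                            # innovation
        x, v = x + k0 * y, v + k1 * y
        p00, p01 = (1 - k0) * a00, (1 - k0) * a01
        p10, p11 = a10 - k1 * a00, a11 - k1 * a01
        estimates.append(x)
    return estimates


# Simulated detections of a face moving about one pixel per frame.
detections = [0.0, 1.2, 1.9, 3.1, 3.9, 5.2, 6.0, 6.9, 8.1, 9.0]
smoothed = kalman_track(detections)
```

In a multi-face setting one such filter would typically be kept per tracked face and per coordinate, with each new detection assigned to the filter whose prediction it lies closest to; that association step is not shown here.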

1020

Investigating Useful & Distinguishing Features Around the Eyelash Region
H. Lai, M. Savvides, and T. Chen (Carnegie Mellon University)

Biometric identification is very important for national security. Traditionally, biometric identification is based on single modalities, like face, iris or fingerprint. However, when only face or iris is available and only partial data is present, one has to look at additional cues to be able to infer whether a match occurs. This paper explores soft biometrics by finding additional cues around the eye-lash region, such as analyzing eye-lashes and their direction as possible discriminators for matches. It can be observed that many cultures have different types of eye-lashes and eye soft biometrics (like the eye-fold). When looking at possible matches, these features can be used to declare or dismiss a match. In fact, one can also infer different populations based on the type of eye-lashes: it is observed that East Asian (e.g. Chinese) subjects typically have eye-lashes that point downward, whereas for Western populations this is not the case. The same can be said for detecting eye-folds. We present an automatic algorithm for image processing and for segmenting eye-lashes and their direction, for fast biometric binning and for extracting distinguishing features that can be used to declare a match or the exclusion thereof. We show results on the performance of this automatic feature selection using the Iris Challenge Evaluation (ICE) dataset from NIST and the CASIA dataset.

1040

Integrating Mono-Modal Biometric Matchers through Logistic Regression Rank Aggregation Approach
M. Monwar (University of Calgary)

A biometric system relies on a person's behavioral and/or physiological characteristics as an alternative means of person authentication (traditional means being password, smart card, ID etc.). However, a biometric system based solely on a single biometric may not always meet security requirements. Thus multibiometric systems are emerging as a trend; they help overcome limitations of single-biometric solutions, such as when a user does not have a quality sample to present to the system (an individual with a cold attempts to authenticate to a voice recognition system), and they reduce the likelihood that the system can be tricked fraudulently. A reliable and successful multibiometric system needs an effective fusion scheme to integrate the information presented by multiple matchers. In this research, I integrate the results of three mono-modal biometric matchers (face, ear and iris) with the Logistic Regression approach to rank-level fusion. The face matcher uses an improved Bayesian approach in which the difference between two face images is modeled by three components: intrinsic difference (I) that discriminates different individuals; transformation difference (T) caused by such transformations as lighting or expression changes; and random noise (N). The ear matcher uses Active Shape Models (ASMs) to model the shape and local appearance of the ear in a statistical manner. In addition, steerable features, which encode rich discriminant information of the local structural texture and provide accurate guidance for shape location, are also extracted from the ear image ahead of the ASMs. Steerable features Eigenearshape is used for final classification. The iris matcher uses support vector machines. Canny's edge detection and the Hough transform are used to find the iris/pupil boundary, and a simple thresholding method is employed for eyelash detection. The Gabor wavelet technique is deployed to extract the deterministic features in the transformed iris of a person in the form of a template. The extracted iris features are then fed into a support vector machine (SVM) for classification. The novelty of my research lies in the consolidation of the outputs generated by these three matchers using the Logistic Regression approach to rank-level fusion. Experimental results indicate that the Logistic Regression method outperforms the Borda count method and the Highest rank method. The system can be a contribution to homeland security.
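
The three rank-level fusion schemes named in this abstract can be sketched as follows. This is an illustrative sketch only: the identities, ranks and weights are made up, and the Logistic Regression method is represented here, as it is commonly described in the rank-fusion literature, by a weighted Borda count. In a real system the weights would be estimated by logistic regression on training data; here they are simply assumed.

```python
def fuse_ranks(rankings, weights=None, use_min=False):
    """Fuse per-matcher rank lists into a single consensus ordering.

    rankings: list of dicts mapping identity -> rank (1 is the best rank).
    weights:  optional per-matcher weights (weighted Borda count).
    use_min:  if True, apply the highest rank method instead of a sum.
    """
    if weights is None:
        weights = [1.0] * len(rankings)      # plain Borda count
    fused = {}
    for identity in rankings[0]:
        ranks = [r[identity] for r in rankings]
        if use_min:
            fused[identity] = min(ranks)     # best rank from any matcher
        else:
            fused[identity] = sum(w * k for w, k in zip(weights, ranks))
    # Lower fused score is better; ties are broken alphabetically.
    return sorted(fused, key=lambda name: (fused[name], name))


# Hypothetical outputs of the face, ear and iris matchers.
face = {"alice": 1, "bob": 2, "carol": 3}
ear = {"alice": 2, "bob": 1, "carol": 3}
iris = {"alice": 3, "bob": 1, "carol": 2}
matchers = [face, ear, iris]

borda = fuse_ranks(matchers)
highest = fuse_ranks(matchers, use_min=True)
weighted = fuse_ranks(matchers, weights=[0.7, 0.15, 0.15])  # assumed weights
```

With these made-up ranks, the plain Borda count places "bob" first, while trusting the face matcher more heavily moves "alice" to the top, which shows how matcher weights can change the consensus.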

 

1100

Multiple Image Information Extraction
M. Pushpalatha (Srijayachamarajendra College of Engineering)

A general and efficient design approach using a wavelet basis function neural classifier to cope with small training sets of high dimension, which is a problem frequently encountered in face recognition is presented. In order to avoid over fitting and reduce the computational burden, face features are extracted by the principal component analysis (PCA) method. A hybrid learning algorithm is proposed which is used to train the wavelet neural networks so that the dimension of the search space is drastically reduced in the gradient paradigm. Although many algorithms have been proposed to configure conventional neural networks like RBF NN and Incremental learning radial basis function neural networks (RAN) for various applications including face recognition, here we would like to provide more insights into these algorithms and compare their performances with our method. Simulation results conducted on the ORL database show that the system achieves improved performance both in terms of error rates of classification and learning efficiency.

1120

Nonlinear Manifold Embedding of a Virtual 2D Face Database For Face Recognition
P. Sankaran, Q. Wang, and V. Asari (Old Dominion University)

A nonlinear manifold approach on modeling face images with multiple orientations for face recognition is presented in this paper. Face images are treated as manifolds in state space instead of classical representation as fixed points. This approach helps to identify the underlying low dimensional pattern of the face set under consideration in a better way and place a new test point to a more accurate position in the state space when compared to a fixed point approach.

One of the requirements of such an approach is a large database of training faces covering all these multiple orientations so that a smooth manifold could be modeled. For this we propose the creation of a large 2D face database based on an analytical 3D face surface (Prototypical Face Model) using a few 2D images to start with. The model is now adjusted for illumination and color followed by a registration process to account for 3D to 2D conversion distortions. This is followed by shape and texture fitting resulting in a sequence of virtual 2D face images. These faces are now used to train the nonlinear manifold.

Multiple patterns are trained by modeling nonlinear manifolds for each pattern. A test face image can now be projected onto these trained manifolds and a minimum distance to manifold approach can be used to recognize the test case.

1140

Human Recognition Using Visible Iris and Face Images from a Single Source
R. Tompkins and K. Asari (Old Dominion University)

Here we present an algorithm which uses high-resolution visible images and extracts face and iris regions in order to establish identity. The system uses component-based Viola-Jones face detection to locate potential subjects. Face recognition is performed using a subspace of face images from a large database of people, processed off-line using Adaptively Weighted Modular Principal Component Analysis (AWMPCA). Component locations indicate the approximate location of the iris, and the Hough transform is used to estimate the location and boundary of the iris region. This location is input as a measurement value into a conditional density propagation visual tracking framework, which is used to position the camera in order to maintain a view of the iris. After a perspective estimation and correction step, the segmented iris is transformed into a coordinateless rectangular region. The recognition step consists of a two-dimensional PCA technique performed on overlapping patches of the transformed iris region. Identity is established using a weighted fusion of the iris and face recognition steps.

1200

Lunch

1330
Keynote 3: Jonathon Phillips: NIST, discussing the Multiple Biometrics Grand Challenge
1400
Session 4: Biometrics - Head, Face, and Eyes (continued)
Chair: R. Vorder Bruegge (FBI)
1400

Fractal Encoding of Low Resolution Iris Imagery for Improved Matching
T. Trebaol and M. Savvides (Carnegie Mellon University)

Fractals are geometric shapes that are composed of many parts, each of which is self-similar to the rest. A fractal can be constructed from one simple shape, or a set of simple shapes, by iteratively re-applying a series of affine transformations to each shape. The stochastically self-similar nature of the eye suggests that it may be appropriate to model iris images as fractals. Images encoded as fractals without compression can be decoded and enlarged with higher spatial resolution than images resized with pixel interpolation (such as bilinear, cubic or spline). In this paper we compare the iris matching performance of images encoded using fractal methods to the iris matching performance achieved by traditional image resizing with different types of interpolation. The results are based on the Iris Challenge Evaluation (ICE) dataset from NIST, and we compare commercial-grade fractal encoding methods using Adobe plug-ins as well as fractal encoding methods found in the literature. While this approach is shown to work in the domain of iris biometrics (particularly in low-resolution, at-a-distance iris image acquisition), we also propose it as a potential future tool for improving low-resolution facial images (single-image super-resolution).

1420

A Comparative Study of the Multi-linear PCA for Face Recognition
J. Wang (Florida International University)

This is a comparative study among different methods for face recognition application. The purpose of this paper is to determine if multi-linear PCA method (also known as nD PCA) can provide higher accuracy and comparable processing time. The recognition accuracy and average running time among the methods (such as ICA, PCA, KPCA and 2D PCA) are compared. Also, the mathematical foundation for evaluating the computational complexity and the memory requirements for feature bases for each method are discussed.

Unfolding is an important concept in multi-linear PCA as it distinguishes the technique from 1D PCA for face recognition. In this proposed application, a 3D tensor is unfolded into two 2D matrices and the eigenvectors of these two 2D matrices are computed. The recognition process is reduced to evaluating the differences in norms between the image projections from the training images and the image projection of the testing image. The smallest norm difference identifies the training image that most resembles the testing image. In this application, the unfolding method is different from traditional methods such as Backward Cyclic and Forward Cyclic.
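
To make the notion of unfolding concrete, the sketch below flattens a small 3D tensor into a 2D matrix along a chosen mode. The function name `unfold` and the column ordering are assumptions for illustration; the abstract states that the author's unfolding differs from the Backward Cyclic and Forward Cyclic conventions but does not specify it, so this should not be read as the paper's scheme.

```python
def unfold(tensor, mode):
    """Unfold a 3D nested-list tensor into a 2D matrix along `mode` (0, 1 or 2).

    Rows are indexed by the chosen mode; the two remaining indices are
    flattened into columns in their original order (an assumed convention).
    """
    shape = (len(tensor), len(tensor[0]), len(tensor[0][0]))
    others = [axis for axis in range(3) if axis != mode]
    matrix = []
    for r in range(shape[mode]):
        row = []
        for a in range(shape[others[0]]):
            for b in range(shape[others[1]]):
                idx = [0, 0, 0]
                idx[mode], idx[others[0]], idx[others[1]] = r, a, b
                row.append(tensor[idx[0]][idx[1]][idx[2]])
        matrix.append(row)
    return matrix


# A toy 2 x 2 x 2 tensor, e.g. two 2 x 2 "images" stacked together.
t = [[[1, 2], [3, 4]],
     [[5, 6], [7, 8]]]

by_image = unfold(t, 0)    # one row per image
by_row = unfold(t, 1)      # one row per image row index
by_column = unfold(t, 2)   # one row per image column index
```

Each unfolded matrix can then be passed to an ordinary 2D eigen-decomposition, which is the sense in which the abstract speaks of computing the eigenvectors of the 2D matrices.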

The AT&T database is used to test all of the above mentioned methods. The database consists of 40 subjects with 10 images each. Given the highest recognition accuracy among those methods, the multi-linear PCA has shown to hold the best or similar accuracies for a faster running time in contrast to all other methods. It should be noted that although multi-linear PCA has a slightly slower run time than the 2D PCA, it nonetheless provides a more accurate face recognition process.

1440

A Video-based Face Detection and Recognition System using Cascade Face Verification Modules
P. Zhang (Alcorn State University)

Face detection and recognition in a video is a challenging research topic as overall processes must be done timely and efficiently. In this paper, a novel face detection and recognition system using three cascade fast face verification modules and an ensemble classifier is presented. Firstly, the head of the tester is serially verified by our proposed three verification modules: face skin verification module, face symmetry verification module, and eye template verification module. The three verification modules can eliminate the tilted faces, the backs of the head, and any other non-face moving objects. Only the frontal face images are sent to face recognition engine. This verification strategy can facilitate the workload of face detection and recognition in a video process. In addition, the frontal face detection reliability can be adjusted by simply setting the verification threshold coefficients in the verification modules so that this mechanism is suitable for different applications. Secondly, three hybrid feature sets are applied to face recognition. A novel ensemble classifier scheme is proposed to congregate three individual Artificial Neural Network (ANN) classifiers trained by the three hybrid feature sets. A computationally efficient fitness function of genetic algorithms is proposed to evolve the best weights for the proposed ensemble classifier. Experiments demonstrated that the frontal face detection rate can be achieved as high as 95% in the low quality video images. The overall face recognition rate and reliability are increased at the same time using the proposed ensemble classifier in the system.

1500

Session 3: Full Body Tracking, Biometrics, and Surveillance (continued)
Chair: John Irvine (Draper Laboratory)

1500

A Survey on Behavior Analysis in Video Surveillance for Homeland Security Applications
T. Ko (Raytheon)

Surveillance cameras are inexpensive and everywhere these days but the manpower required to monitor and analyze them is expensive. Consequently the videos from these cameras are usually monitored sparingly or not at all; they are often used merely as archives, to refer back to once an incident is known to have taken place. Surveillance cameras can be a far more useful tool if instead of passively recording footage they can be used to detect events requiring attention as they happen, and take action in real time. This is the goal of automated visual surveillance: to obtain a description of what is happening in a monitored area, and then to take appropriate action based on that interpretation. Video surveillance for humans is one of the most active research topics in computer vision. It has a wide spectrum of promising homeland security applications. Video management and interpretation systems have become quite capable in recent years. This paper looks into how hardware and software can be put together to solve surveillance problems in an age of increased concern with public safety and security. In general, the framework of a video surveillance system includes the following stages: modeling of environments, detection of motion, classification of moving objects, tracking, behavior understanding and description, and fusion of information from multiple cameras. Despite recent progress in computer vision and other related areas, there are still major technical challenges to be overcome before reliable automated video surveillance can be realized. This paper reviews developments and general strategies of stages involved in video surveillance, and it analyzes the feasibility and challenges for combining motion analysis, behavior analysis, and standoff biometrics for identification of known suspects, anomaly detection, and behavior understanding.

1520

Human Gesture Tracking Using an Agent-based Tracking System
B. Fang, P. Chung, and F. Quek (Virginia Polytechnic Institute and State University)

We present an agent-based motion tracking and gesture recognition system to generate motion data using stereo calibrated cameras. The novelty of our approach is that agents are bound to the body parts (bone structure) being tracked. These agents are autonomous, self-aware entities that are capable of communicating with other agents to perform tracking within agent coalitions. Each agent seeks "evidence" for its existence both from low-level features (e.g. motion vector fields, color blobs) and from its peers (other agents representing body parts with which it is compatible). Multiple agents may represent different "candidates" for a body part, and compete for a place within a coalition that constitutes the tracking of an articulated human body. The power of our approach is the flexibility with which domain information may be encoded within each agent to produce an overall tracking solution. We demonstrate the effectiveness of the tracking system by testing two actions (random moving and walking).

1540
Coffee Break
1600

Fast Classification of Indecent Video by Low Complexity Repetitive Motion Detection
T. Endeshaw, J. Garcia, and A. Jakobsson (Karlstad University)

This paper proposes a fast method for detection of indecent video content using repetitive movement analysis. Unlike skin detection, motion provides invariant features irrespective of race and color. The video material to be evaluated is divided into short fixed-length sections. By filtering different combinations of B-frame motion vectors using adjacency in time and space, one dominant motion vector is constructed for each frame. The power spectral density estimate of this dominant motion vector over the short sections of video frames is then computed using a periodogram with a Hamming window. The resulting power spectrum is then subjected to a selection window to restrict the spectrum to a limited frequency range typical of indecent movement, as empirically derived by us. A threshold detector is then applied to detect repetitive motion in video sections. However, there are many instances where repetitive motion occurs in these shorter sections without the video as a whole being indecent. As a second step, an additional detector is employed to determine whether the sections over a longer period of time can be classified as having indecent material. The proposed method is resource efficient, as it does not require the IDCT step of video decoding. Further, the computationally expensive spectral estimation calculations are done on only one value per frame. Evaluations performed using a restricted set of videos with different amounts of texture, lighting conditions and complex backgrounds show very promising results, with high true positive probability (>85%) for a low false positive probability (<10%) for the intermediate repetitive motion detection. After the second, longer-sequence estimator the results were, for the limited testing set, close to ideal. As a third step, additional selectivity that is complementary but more resource demanding (for example color or audio analysis) could be employed to further decrease the probability of false positives.
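
The spectral stage described in this abstract (Hamming-windowed periodogram, frequency selection window, threshold detector) can be sketched as below. This is a hedged illustration rather than the authors' detector: the function names, the mean removal, the use of a band-to-total power ratio as the detection statistic, and all numeric values (frame rate, band limits, threshold) are assumptions, since the abstract does not give them.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def periodogram(signal):
    """One-sided periodogram of a mean-removed, Hamming-windowed signal."""
    n = len(signal)
    mean = sum(signal) / n
    window = hamming(n)
    x = [(s - mean) * w for s, w in zip(signal, window)]
    norm = sum(w * w for w in window)
    power = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(acc) ** 2 / norm)
    return power


def is_repetitive(signal, frame_rate, f_lo, f_hi, threshold):
    """Flag a section whose spectral power is concentrated in [f_lo, f_hi] Hz."""
    power = periodogram(signal)
    n = len(signal)
    total = sum(power[1:])                   # ignore the DC bin
    if total == 0:
        return False                         # no motion variation at all
    band = sum(p for k, p in enumerate(power)
               if k > 0 and f_lo <= k * frame_rate / n <= f_hi)
    return band / total >= threshold


# Hypothetical dominant motion vector component over one 64-frame section.
frame_rate = 25.0
slow = [math.sin(2 * math.pi * 2.0 * t / frame_rate) for t in range(64)]
fast = [math.sin(2 * math.pi * 10.0 * t / frame_rate) for t in range(64)]
still = [3.0] * 64
```

Only one scalar per frame enters the spectral estimate, which matches the abstract's point that the expensive computation is done on a single value per frame. The second-stage detector over longer periods is not sketched here.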

1620

Dual IR Spectral Video Inspection of a Concealed Live Animal
M. Hsu (George Washington University), K. Byrd (Howard University), C. Hsu (George Washington University), and H. Szu (NRL/ONR)

Multiple spectral videos have been widely used in fields such as inspection, image synthesis, 3D object modeling, obstacle detection, collision avoidance, artificial intelligence navigation, and medical applications. Almost all research on multiple-image restoration and registration has focused on 3D rigid-body objects. In this paper, we present a scheme for a 3D deformable live animal whose passive ID recognition is generated from a fusion, going beyond the traditional affine registration via 3 control points, of a pair of long-wave (8~12 µm wavelength) and mid-wave (3~5 µm) infrared (IR) video cameras, used similarly in early passive breast cancer detection [Szu et al. Patent 20040181375 "Nonlinear blind demixing of single pixel..."]. In the experiment, the living animal, a hamster, was concealed in a nighttime environment. We could observe it only by means of cryogenic infrared spectral video cameras made by FLIR with variable frame rates (10~100). Because the thermal transparency coefficient is a function of the spectral density, the traditional night vision camera used for airport inspection may not be suited to a hamster concealed inside an airport carry-on box made by Petland. For example, we could deduce only the center position of the hamster from the LIR video camera in a nighttime plastic container, whereas detailed motion, e.g. the salient tail feature of the hamster, had to be deduced by means of the MIR video camera. The adaptive neighborhood histogram modification method is equivalent to a blurred low-pass, but statistically correct, averaged result of the overall animal center. Therefore, it can be used as a local reference image segment for the local restoration and registration of the MIR imagery sequence. Such a pair of spectral videos generalizes the original "self-reference local matched filters" video restoration technique, originally proposed by Szu in 1980 for day-video harbor surveillance imaging through water waves.
In this paper we provide a fast generalization of local instances of good seeing, using a local shift-add-reject image registration algorithm, for a live animal in a concealed environment.
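
Illustrative sketch (not taken from the paper): the "traditional affine registration via 3 control points" that the abstract uses as its baseline can be written as two small linear solves, one per output coordinate. The function names and the toy control points below are hypothetical, and the paper's own fusion method goes beyond this baseline.

```python
def solve3(a, b):
    """Solve the 3x3 linear system a * x = b by Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(a)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    out = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        out.append(det(m) / d)
    return out

def affine_from_control_points(src, dst):
    """Return ((a, b, tx), (c, d, ty)) with x' = a*x + b*y + tx and y' = c*x + d*y + ty."""
    design = [[x, y, 1.0] for x, y in src]
    row_x = solve3(design, [p[0] for p in dst])
    row_y = solve3(design, [p[1] for p in dst])
    return tuple(row_x), tuple(row_y)

def apply_affine(t, point):
    (a, b, tx), (c, d, ty) = t
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)

# Hypothetical control points: the second set is the first scaled by 2 and shifted by (2, 3).
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst = [(2.0, 3.0), (4.0, 3.0), (2.0, 5.0)]
T = affine_from_control_points(src, dst)
mapped = apply_affine(T, (1.0, 1.0))
```

Three non-collinear point pairs fix all six affine parameters exactly, which is why a single global affine map cannot follow a deformable subject.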

1730

Pre-Banquet Reception

1830

Evening Banquet and Speaker

Visible and Infrared Imaging Spectroscopy of Paintings
John K. Delaney*, Jason G. Zeibel**, and Roy Littleton**
*Andrew W. Mellon Senior Imaging Scientist, Scientific Research Department, National Gallery of Art, DC.
**Night Vision & Electronic Sensors Directorate, US Army Research, Development & Engineering Command.

Imaging spectroscopy has been developed primarily for remote sensing of the Earth. In this talk we present our findings on its application to paintings in order to non-destructively identify and map artists' materials as well as improve the visibility of under-drawings and preparatory sketches. Diffuse reflectance hyper-spectral images (0.4 to 1.65 microns) of paintings by P. Picasso, Giorgione, A. Derain and Leonardo da Vinci's Ginevra de' Benci were collected using novel cameras from the Night Vision & Electronic Sensors Directorate. The resulting image cubes were analyzed using the hyper-spectral tools from ENVI as well as Kubelka-Munk spectral fitting models in order to map and identify the major pigments. Comparisons of the imaging spectroscopy results with those from methods such as X-ray fluorescence spectrometry (XRF) and electron microscopy have been used both to validate these methods and to obtain a more complete understanding of the materials used in these paintings. Examination of the infrared images revealed improved visualization of original sketches and paint changes compared to that obtained with the broad spectral band infrared imaging typically used by art conservators. To date, these results suggest that imaging spectroscopy can be an important in situ tool for the identification of materials and/or can serve as a guide for the selection of sites for further chemical analysis.

John Delaney is the Andrew W. Mellon Senior Imaging Scientist at the National Gallery of Art, where his research focuses on the development of in situ imaging methods for art conservation and understanding of the optical properties of varnishes. He received his PhD from The Rockefeller University and completed post-doctoral studies at the University of Arizona and The Johns Hopkins University School of Medicine. Prior to joining the National Gallery of Art he was the Chief Scientist and Systems Engineering Lead for the U-2 Business Unit of Airborne ISR Systems at Goodrich Corporation. Dr. Delaney has consulted with many museums in the area of infrared imaging for over 15 years. He has published 23 papers in the areas of imaging and spectroscopy.


Friday 17 October 2008

Start

Event

0800

Continental breakfast

0830

Keynote 4: Barbara O'Kane: Night Vision Lab, RDECOM, Principal Scientist for Human Performance, speaking on "Challenges in Human Sensing"

0900

Session 5: Medical - Imaging as a Biomarker
Chair: M. Loew (George Washington University)

0900

Non-Gaussian Models in Biomedical Imaging
R. Mangoubi and M. Desai (Draper Laboratory)

Most statistical models in applications rely on the Gaussian assumption. Yet, in many realistic situations, the underlying variation or uncertainty is essentially non-Gaussian. In detection problems, the Gaussian assumption leads to excess false alarms when the true density has fatter tails, as is the case with the Laplace density function. In classification problems, the Gaussian model for variability may be too restrictive, and other models, such as the generalized Gaussian density function, are more appropriate. We will present examples of such models applied to problems with multiple images, and show why they improve detection and classification performance in two applications: functional magnetic resonance imaging and stem cell classification.
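
Illustrative sketch (not taken from the paper): the false-alarm effect of a fatter tail can be seen by comparing two-sided tail probabilities of a Gaussian and a Laplace density with the same standard deviation. The threshold of three standard deviations is an arbitrary choice made for illustration.

```python
import math

def gaussian_two_sided_tail(t, sigma=1.0):
    """P(|X| > t) for X ~ N(0, sigma^2)."""
    return math.erfc(t / (sigma * math.sqrt(2.0)))

def laplace_two_sided_tail(t, sigma=1.0):
    """P(|X| > t) for a zero-mean Laplace variable with standard deviation sigma."""
    b = sigma / math.sqrt(2.0)  # Laplace scale parameter; the variance is 2 * b**2
    return math.exp(-t / b)

threshold = 3.0  # detection threshold in units of the standard deviation (assumed)
p_gauss = gaussian_two_sided_tail(threshold)
p_laplace = laplace_two_sided_tail(threshold)
ratio = p_laplace / p_gauss
```

A threshold set from the Gaussian tail therefore lets through several times more false alarms than intended when the noise is actually Laplace-distributed.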

0920

Localization of Fiducial Skin Markers in MR Images using Correlation Pattern Recognition for PET/MRI Nonrigid Breast Image Registration
D. Walvoord, K. Baum, M. Helguera (Rochester Institute of Technology), A. Krol (SUNY Upstate Medical University), and R. Easton Jr. (Rochester Institute of Technology)

In most instances, multiple-modality visualization of pathologies will present advantages over single-modality studies. For many medical imaging procedures, it is desirable to produce a "fused" output that simultaneously exhibits characteristics of the data from each individual modality to reduce the difficulty of the decision-making process for radiologists. Preprocessing for most data fusion algorithms typically provides the necessary registration of the input data (from each modality). Fiducial markers may be used to show common locations between the imaging modalities when the methods of image capture produce outputs with very different spatial structure, as is the case with MRI and PET imagery. Automating the detection of these markers has seen limited research in the medical field, and detection often requires manual selection throughout the 3-dimensional image stack by a human observer. The objective is to detect each marker (and locate its centroid) in a noisy background containing additional objects with a large range of intensity values. The correlation methods employed must exhibit some "normalizing" characteristic to accommodate changes in the input image, such that regions of high intensity that do not share similar spatial structure with the reference pattern are assigned low values in the output correlation plane, effectively reducing the false positive rate. The filter should also accommodate within-class distortion, as the size and shape of the fiducial marker will vary through the image stack. For this work, a mean-subtracted MACH filter was constructed and applied to data that are mean-subtracted locally. The locations of marker centroids in the output stack of correlation planes were determined by applying grayscale-morphology operations to extract regions of interest. The results indicate that a relatively high probability of detection is obtained over a wide range of thresholds at an acceptable false positive rate.
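
Illustrative sketch (not the paper's MACH filter): the "normalizing" behavior described above can be shown with plain zero-mean normalized cross-correlation, used here as a simpler stand-in. Bright regions without the template's spatial structure score low, while a matching pattern scores near 1. The template and toy image are hypothetical.

```python
import math

def zncc(patch, template):
    """Zero-mean normalized cross-correlation of two equally sized 2-D lists."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    dp = [v - mp for v in p]
    dt = [v - mt for v in t]
    denom = math.sqrt(sum(v * v for v in dp) * sum(v * v for v in dt))
    if denom == 0.0:
        return 0.0  # a flat patch or flat template carries no spatial structure
    return sum(a * b for a, b in zip(dp, dt)) / denom

def correlation_plane(image, template):
    """Slide the template over every valid position and record the score."""
    th, tw = len(template), len(template[0])
    rows, cols = len(image) - th + 1, len(image[0]) - tw + 1
    return [[zncc([r[j:j + tw] for r in image[i:i + th]], template)
             for j in range(cols)] for i in range(rows)]

template = [[0, 1, 0],
            [1, 2, 1],
            [0, 1, 0]]
# Bright flat background (value 9) with one dim marker in the lower right corner.
image = [[9, 9, 9, 9, 9, 9],
         [9, 9, 9, 9, 9, 9],
         [9, 9, 9, 0, 1, 0],
         [9, 9, 9, 1, 2, 1],
         [9, 9, 9, 0, 1, 0]]
plane = correlation_plane(image, template)
peak = max((v, i, j) for i, row in enumerate(plane) for j, v in enumerate(row))
```

Here `peak` holds the best score and its row and column in the correlation plane; the bright flat background scores 0 even though its raw intensity is far higher than the marker's.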

0940

Extra: Biometrics - Head, Face, and Eyes
Chair: John Irvine (Draper Laboratory)

Image Enhancement for Minutiae-Based Fingerprint Identification
M. Sepasian, W. Balachandran, C. Mares, and M. Azimi (affiliation available soon)

The purpose of this paper is to investigate the performance of a three-step procedure for fingerprint enhancement and identification, using CLAHE (contrast limited adaptive histogram equalization) with a clip limit, standard deviation, and sliding neighborhood operations as the processing stages. First, CLAHE with a clip limit is applied to enhance the contrast of the small tiles in the fingerprint image and to combine neighboring tiles using bilinear interpolation in order to eliminate artificially induced boundaries. In the second step, the image is decomposed into an array of distinct blocks, and the blocks are discriminated by computing the standard deviation of their matrix elements, which removes the image background and yields the boundaries of the region of interest. Finally, sliding neighborhood processing enhances the image by clarifying the minutiae (endpoints and bifurcations) at each pixel, a process known as thinning. The paper presents the motivation for developing this method, its phases, and its possible advantages, as shown through simulation.
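
Illustrative sketch (not taken from the paper): the second step, separating ridge regions from background by per-block standard deviation, can be written in a few lines. The block size, threshold, and toy image are assumptions chosen only to make the behavior visible.

```python
import math

def block_std_mask(image, block, threshold):
    """Mark each block x block tile as foreground when its standard deviation exceeds threshold."""
    rows, cols = len(image) // block, len(image[0]) // block
    mask = []
    for bi in range(rows):
        mask_row = []
        for bj in range(cols):
            vals = [image[bi * block + r][bj * block + c]
                    for r in range(block) for c in range(block)]
            mean = sum(vals) / len(vals)
            std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            mask_row.append(std > threshold)
        mask.append(mask_row)
    return mask

# Toy 4x8 image: the left half is flat background, the right half alternates like ridges.
image = [[200, 200, 200, 200, 30, 220, 30, 220] for _ in range(4)]
mask = block_std_mask(image, block=4, threshold=20.0)
```

Flat background tiles have near-zero spread, while tiles crossing ridges and valleys have a large one, so a single threshold on the block standard deviation outlines the region of interest.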

1000

Coffee Break

1020

Session 6: ATR
Chair: J. Kretsch (National Geospatial-Intelligence Agency)

1020


Rapid Training of Image Classifiers Through Adaptive, Multi-frame Sampling Methods
R. Eaton, J. Lowell, M. Snorrason (Charles River Analytics), John M. Irvine (Draper Laboratory) and Jonathan Mills (AMRDEC)

Computer vision methods, such as automated target recognition (ATR) techniques, have the potential to improve the accuracy of military systems for weapon deployment and targeting, resulting in greater utility and reduced collateral damage. A major challenge, however, is training the ATR algorithm to the specific environment and mission. Because of the wide range of operating conditions encountered in practice, advanced training based on a pre-selected training set may not provide the robust performance needed. Training on a mission-specific image set is a promising approach, but requires rapid selection of a small, but highly representative training set to support time-critical operations. To remedy these problems and make short-notice seeker missions a reality, we propose Learning and Mining using Bagged Augmented Decision Trees (LAMBAST). LAMBAST examines large databases and extracts sparse, representative subsets of target and clutter samples of interest. For data mining, LAMBAST uses a variant of decision trees, called random decision trees (RDTs). This approach guards against overfitting and can incorporate novel, mission-specific data after initial training via perpetual learning. We augment these trees with a distribution modeling component that eliminates redundant information, ignores misrepresentative class distributions in the database, and stops training when decision boundaries are sufficiently sampled. These augmented random decision trees enable fast investigation of multiple images to train a reliable, mission-specific ATR. This paper presents the random decision tree framework, develops the sampling procedure for efficient construction of the sample, and illustrates the procedure using relevant examples.


1040

Efficient Real Time Object Detection and Localization in Images
O. Andrushchenko, F. Lure, and T. Ramsay (Guardian Technologies International Inc.)

An approach to automated recognition and localization of objects of interest in images recorded in real time is described. The models generated through this process are very compact, and they require and generate fewer features and support vectors than models created by the standard SVM technique. This approach can be implemented in two ways: with or without a preliminary analysis of entire images. The former involves feature extraction from the entire images and a follow-up classification; the localization is conducted as a second step. The latter deals only with sub-images of the entire images, where the sub-images are obtained from a manual or automatic segmentation procedure. This real-time learning involves the determination of highly discriminative features between sub-images of the object and other non-object image areas using large-scale feature extraction. Next, a 3-stage feature selection procedure combined with a fast Support Vector Machine (SVM) classifier is employed to develop small models applicable in real-time settings. The efficiency of the proposed approach is demonstrated with real-world examples including mammography and other images.


1100

Understanding Computer Vision Challenges
C. Oertel (MITRE)

After nearly half a century of computer vision research, application-specific systems are common but the goal of developing a robust, general-purpose computer vision system remains out of reach. Rather than focus on the strengths and weaknesses of current computer vision approaches, this paper will enumerate and investigate the challenges that must be overcome before this goal can be achieved. Key challenges include handling variations in environment or acquisition parameters such as lighting, view angle, distance, and image quality; recognizing naturally occurring as well as intentionally deceptive variations in object appearance; providing robust general-purpose image segmentation and co-registration; generating 3-D representations from 2-D images; developing useful object representations; providing required knowledge that is not represented in the image itself; and managing computational complexity. Each of these challenges, along with their relevance to solving the vision problem, will be discussed. Understanding these challenges as a whole may provide insight into underlying mechanisms that will provide the backbone of a robust general-purpose computer vision system.

1120


Temporal Structure Methods for Image-Based Change Analysis
R. Rimey and D. Keefe (Lockheed Martin)

This paper addresses change analysis, the exploitation of massive numbers of image-derived change detections. We use the term "change analysis" to emphasize the intelligence value contained within large numbers of change detections, rather than the emphasis by most researchers to date on "change detection" and the intelligence value of isolated change detections. The work reported here addresses change detections from regularly collected images over long time intervals, such as an image each hour for several weeks, or an image each day for several months. Our methods emphasize local temporal descriptions and include minimal spatial information about activities. Our three methods adapt and extend: (1) the Hamid techniques [1]; (2) Latent Semantic Analysis (LSA); and (3) probabilistic LSA (pLSA). These methods allow us to: (a) Detect an activity as a deviation from normal activity, and describe each anomaly; (b) Discover categories of activity, describe a category of activity, and assign an activity to a category; (c) Find the most similar activity in a historical database. Our experiments utilize a webcam of an outdoor marketplace, measuring 100x200 feet, with images collected every few minutes over 74 days. We present experimental results that compare our methods (1)-(3) for performing functions (a)-(c). We discuss how our techniques are equally applicable for change analysis using wide-area imaging sensors. Reference: [1] R. Hamid, et al., "A Novel Sequence Representation for Unsupervised Analysis of Human Activities," To appear in AI Journal, http://www-static.cc.gatech.edu/~raffay/publications.htm.


1140

Lunch

1300

Precision Interpolation and Resampling for Multiple-Image Analysis
A. Schaum (Naval Research Laboratory)

The sub-pixel analysis of large volumes of multiple digital images requires precise methods of resampling. An analysis of current interpolators, especially commercial products, reveals an emphasis on cosmetics at the expense of accuracy. Some of the techniques are designed specifically to be misleading. Furthermore, noncommercial methods concerned with fidelity to the underlying data have been found wanting, even in univariate applications. We describe a new set of principles useful in designing a set of robust interpolators that are purely local, and hence massively parallelizable, and are more accurate than current ones. These principles also extend the current state of the art methods used on uniform grids to apply to arbitrarily sampled ones. They show, furthermore, why such methods have been unnecessarily constrained in the past and what the real constraints are. Some new issues arise for multivariate interpolation, but the new principles and methods are readily extended.

1320

Dominant Component Suppression for Background Suppression and Spectral Characterization of Image Anomalies
J. Sweet (McClendon)

This paper describes a novel approach to background suppression termed Dominant Component Suppression (DCS) that extends the basic concept in two ways. First, DCS adapts (via unsupervised clustering) to the major backgrounds or dominant components in the scene and significantly reduces the spectral signal from these interferers. This process brings the dataset closer to the often assumed multivariate normality and the simple data model of target and white noise. Second, the DCS algorithm produces residual spectra that have the same spectral dimensions and radiometric units as the input data, thus facilitating spectroscopic exploitation of the data. This experiment uses an AVIRIS hyperspectral image corrected to reflectance, in which a spectral target material is linearly embedded at a range of sub-pixel abundances. While artificial, this technique provides accurate ground truth for this exploratory experiment. Spectral Angle Mapper was applied to the reflectance and residual images, and the results indicate that exploitation can be dramatically enhanced when conducted in residual space by reducing false alarms.
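
Illustrative sketch (not taken from the paper): Spectral Angle Mapper scores a pixel by the angle between its spectrum and a reference spectrum, and a linearly embedded sub-pixel target is a weighted sum of target and background spectra. The three-band spectra and abundances below are hypothetical, and the DCS step itself is not shown.

```python
import math

def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("zero spectrum")
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def mix(target, background, abundance):
    """Linear mixture: abundance of target plus (1 - abundance) of background."""
    return [abundance * t + (1.0 - abundance) * b
            for t, b in zip(target, background)]

target = [0.10, 0.40, 0.30]      # hypothetical target reflectance
background = [0.40, 0.10, 0.30]  # hypothetical background reflectance
brighter = [2.0 * v for v in target]  # same spectral shape, doubled brightness

angle_same_shape = spectral_angle(target, brighter)
angles = [spectral_angle(target, mix(target, background, f))
          for f in (0.1, 0.5, 0.9)]
```

The angle ignores overall brightness and shrinks as the target abundance grows, so low-abundance pixels sit close to the background in angle, which is where suppressing the dominant background can help.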

1340

Detection of Ephemeral Changes in Sequences of Images
J. Theiler (Los Alamos National Laboratory) and S. Adler-Golden (Spectral Sciences)

While the identification of "interesting" features in a single image is an almost hopelessly open-ended task, the detection of interesting _changes_ in a pair of co-registered images is a more feasible undertaking. The change detection problem is nonetheless confounded by pervasive differences (in illumination, calibration, registration, etc.) that are inevitable between images, but the anomalous change detection paradigm treats these differences as something that can be learned from the images themselves, and those changes that do not fit the pervasive pattern are identified as anomalous. It is these anomalous changes that are candidates for interesting features; ultimately, a human analyst decides what is _truly_ interesting, but the algorithm's job is to identify a short list of candidates for the analyst to investigate.

A recently developed machine learning framework extends the existing change detection methodology to arbitrary data distributions, and even for Gaussian distributions has been shown to exhibit improved performance. But anomalous change detection algorithms have so far considered only the problem of finding pairwise changes; when more than two images are available, there is an opportunity to exploit multiple correlations and produce a more effective change detector. One such global approach has been demonstrated using an RX anomaly detector. In our paper, we will show how to extend the machine learning approach to multiple images, enabling detection of an ephemeral change in one or a few of the images by exploiting the information in all of the images. This can be done in spite of the combinatorial number of ways a change might be present in a series of images.

1400

Overhead Image Statistics
V. Vijayaraj, A. Cheriyadat (Oak Ridge National Laboratory), P. Sallee (Booz Allen Hamilton), B. Colder (Colder Scientific Solutions), R. Vatsavai, E. Bright, and B. Bhaduri (Oak Ridge National Laboratory)

In this paper we study statistical properties of high-resolution overhead images for different land use categories. Various local and global statistical image properties based on the shape of the power spectrum, image gradient distributions, edge co-occurrence, and inter-scale wavelet coefficient distributions were computed and analyzed. Our analysis was performed on a database of high-resolution (1 meter) overhead images collected from downtown, suburban, commercial, agricultural, and wooded categories. We discuss how various statistical properties relate to these image categories and highlight their relationship. The variations in power spectrum contour shapes for different categories, the unique gradient distribution characteristics of wooded categories, the similarity in edge co-occurrence statistics for overhead and natural images, and the unique edge co-occurrence statistics of downtown categories are presented in this work. Though previous work on natural image statistics has shown some of the unique characteristics of different categories, such relationships for overhead images are not well understood. The statistical properties of natural images were used in previous studies to develop prior image models, to predict and index objects in a scene, and to improve computer vision models. We envision that our research findings can be used to augment and adapt computer vision algorithms that rely on prior image statistics to process overhead images, to calibrate the performance of overhead image analysis algorithms, and to derive features that give better discrimination among overhead image categories.

1420

Intelligent Multimodal Sensors Design with Hyperspectral Imaging for Tracking Moving Target in Real Time
T. Wang and Z. Zhu (City College of New York)

Real-time tracking and identification of moving targets with hyperspectral imagery is very challenging with current devices and algorithms. The increased information content of hyperspectral imaging has enabled improved classification and quantification of targets of interest. However, recording hyperspectral data for target classification is very time consuming. We design a multimodal sensor platform that consists of a dual-panoramic (or omnidirectional) peripheral vision system and a narrow field-of-view hyperspectral fovea. Thus, we need to capture hyperspectral images only in regions of interest. This design is inspired by the biology of the human vision system, in which the periphery of the retina is used to detect motion and the center (or fovea) of the retina is used to distinguish color and objects. The proposed intelligent sensor works as follows. Two panchromatic images with a 360-degree field of view are generated by rotating two line scanners, pointing in two directions, around a common axis. Regions of interest containing moving targets can be determined easily and quickly by applying background subtraction. The next position and time of a moving target can be roughly estimated from the difference between the two regions of the target in the images of the two scanners. Then, a high-resolution hyperspectral fovea is directed only to each small region of interest that contains a potentially interesting target. A least-squares difference method with variation normalization is used to classify the target spectra against training spectra. Finally, the signatures of targets can be determined efficiently and effectively in real time. We will evaluate our design and the proposed algorithm under different scenarios involving different targets and backgrounds using a hyperspectral scene simulation tool.
Important issues such as multimodal component integration, region of interest extraction, target tracking, hyperspectral image analysis, and target signature identification will be discussed in detail. This work is sponsored by the AFOSR for Integrated Multi-Modal Sensing, Processing and Exploitation under the Discovery Challenge Thrusts (DCTs) Program.
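
Illustrative sketch (not taken from the paper): the least-squares classification step can be approximated by normalizing each spectrum and choosing the training spectrum with the smallest squared difference. The abstract does not specify the normalization, so unit-norm scaling is assumed here, and the class names and three-band signatures are hypothetical.

```python
import math

def normalize(spectrum):
    """Scale a spectrum to unit Euclidean norm (an assumed form of normalization)."""
    n = math.sqrt(sum(v * v for v in spectrum))
    if n == 0.0:
        raise ValueError("zero spectrum")
    return [v / n for v in spectrum]

def classify(spectrum, training):
    """Return the label whose normalized training spectrum has the smallest squared difference."""
    s = normalize(spectrum)

    def cost(label):
        return sum((a - b) ** 2 for a, b in zip(s, normalize(training[label])))

    return min(training, key=cost)

training = {  # hypothetical three-band training signatures
    "vehicle": [0.30, 0.35, 0.40],
    "vegetation": [0.05, 0.10, 0.60],
}
observed = [0.08, 0.18, 1.10]  # vegetation-like shape under brighter illumination
label = classify(observed, training)
```

Normalizing before differencing keeps a brightness change from being mistaken for a change of material, which matters when the fovea views the same target under varying illumination.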

1440

Closing remarks & preview of AIPR-2009
J. Irvine, General Chair