| Start |
Event
|
| 0800 |
Continental breakfast
|
| 0830 |
Workshop Kickoff
C. Cohen, General Chair
|
| 0840 |
Keynote 1: Gary Adams, daVinci Systems, speaking on "Who
Moved my Pixels? - or - The Effective Use of Motion Estimation in the
Repair and Restoration of Motion Picture Film"
|
| 0910 |
Session 1: Co-locating Multiple Images
Chairs: A. Williams (Harsh Environment Applied Technologies) and H.
Rhody (Rochester Institute of Technology)
|
| 0910 |
Bandwidth Efficient Sensor Architectures with Multiple Feature
Extraction
J. Caulfield (Cyan Systems), J. Elliot (Nova Sensors), J. Curzan (Nova
Sensors), M. Massie (Nova Sensors), and P. McCarley (Air Force Research
Lab, Eglin AFB)
Current-generation sensors have very high data rates that depend on
high-speed image processors. They can cause data bottlenecks and enlarge
the supporting electronics to the point where high-resolution sensors
are hard to integrate on compact platforms. Cyan Systems is developing
image processing techniques that reside near the FPA sensor. This
compact image processor provides localized processing suitable for
large-format sensors in the visible and IR regions.
We will demonstrate the ability to reduce the output bandwidth through
spatial downsampling while detecting targets of interest. The algorithm
has been developed so that the near-FPA electronics can automatically
adapt the imaging and detection parameters to extract targets without
losing sensitivity or altering the false alarm or missed detection rates
of existing off-focal-plane processing systems. Advanced FPAs with
near-FPA algorithms for pre-cueing have the potential to minimize the data
throughput bottlenecks of very large format IRFPAs of 1024 x 1024
pixels and larger.
Improvements in Acuity processing concepts will also be presented,
including lower-SNR detection and edge preservation made possible by
improved dim-target feature extraction.
|
| 0930 |
A spatial feature enhanced MMI algorithm for multi-modal wildfire
image registration
X. Fan and H. Rhody (Rochester Institute of Technology)
The integration of multispectral airborne imagery and geographic data
for wildfire research and emergency response requires 3D multiple-view
registration. Registration of maps, visible imagery, and IR imagery,
especially LWIR, is challenging because of the differences in brightness,
color and features that are available in the different modalities. We
have developed a semi-automated workflow for the registration and exploitation
of this imagery and data that can produce quick-turnaround products
for research and wildfire management. The techniques are based upon
an enhancement of the conventional maximization of mutual information
(MMI) algorithm through an efficient utilization of feature information.
This technique largely overcomes the problems that arise from uncorrelated
variations in pixel intensity between visible sensors, LWIR sensors
that respond to temperature variations, and artificial colorations present
in maps. A measure of registration confidence based upon the kurtosis
of the search space has been developed to cue operators to examine
suspicious results produced by the semi-automated workflow. Experiments
on real wildfire imagery demonstrate the performance of the techniques.
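As a rough illustration of the conventional MMI baseline that the paper enhances, the sketch below scores integer shifts between two quantized 1-D signals by mutual information, then computes the excess kurtosis of the scores across the search space, in the spirit of the confidence measure described above. It is a minimal sketch under stated assumptions (1-D signals, integer shifts only, histogram-based MI, plain excess kurtosis); the authors' feature enhancement and their exact confidence formula are not reproduced, and all function names, including the `synthetic_pair` test-data helper, are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(a, b):
    """Mutual information (bits) between two equal-length sequences of
    quantized gray levels, computed from their joint histogram."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    joint = Counter(zip(a, b))        # joint histogram of co-located samples
    pa, pb = Counter(a), Counter(b)   # marginal histograms
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((pa[x] / n) * (pb[y] / n)))
    return mi


def best_shift(ref, moving, max_shift):
    """Exhaustive search over integer shifts s in [-max_shift, max_shift].
    At shift s, moving[i] is compared with ref[i + s] on the overlap.
    Returns the MI-maximizing shift and the score for every shift."""
    scores = {}
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            r, m = ref[s:], moving[:len(moving) - s]
        else:
            r, m = ref[:s], moving[-s:]
        scores[s] = mutual_information(r, m)
    return max(scores, key=scores.get), scores


def excess_kurtosis(values):
    """Excess kurtosis of the scores over the search space: one sharp
    peak gives a large positive value, a flat surface a low or negative one."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0 if m2 > 0 else float("nan")


def synthetic_pair(n, shift, seed=0):
    """Hypothetical test data: a random 4-level signal and a copy shifted
    by `shift` >= 0 samples whose gray levels are permuted, mimicking a
    change of modality (MI is unaffected by such a relabeling)."""
    rng = random.Random(seed)
    ref = [rng.randrange(4) for _ in range(n)]
    remap = {0: 3, 1: 0, 2: 2, 3: 1}
    moving = [remap[v] for v in ref[shift:]] + [rng.randrange(4) for _ in range(shift)]
    return ref, moving
```

With `ref, moving = synthetic_pair(200, 3)`, the call `best_shift(ref, moving, 8)` should recover the shift of 3 even though the gray levels were permuted, which is the property that makes MI attractive for visible/LWIR/map registration.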
|
| 0950 |
Coffee Break
|
| 1010 |
Real-time, Multiple Hot-Target Identification and Multi-Spectral
Fusion
M. Khadaria, M. Pusateri (Pennsylvania State University), and D. Siviter
(Harsh Environment Applied Technologies)
Night vision systems, consisting of intensified visible light imagers
and various infrared band imagers, are widely deployed. Long-wave infrared
imagers are useful due to their ability to detect the thermal emissions
associated with human body temperature. However, long-wave infrared
images provide an unnatural view of the world that causes the user to
spend additional time interpreting the image, slowing their response.
New digital night vision systems are available with multiple apertures
providing both intensified visible and long-wave infrared imagers. While
color fusion algorithms have been used to combine both images, reviews
of the resulting fused imagery have been mixed. An alternative fusion
approach is to display the intensified image and provide graphic cues
in the imagery as to the location of hot-targets.
In this paper we report on our hot-target detection algorithm and the
method of cueing we use to identify locations of hot-targets. Multiple
hot-targets are detected using edge and intensity information from the
long-wave imagery. Detected blobs are then classified as hot-targets
based on blob extent to minimize false alarms. The algorithm was designed
for use in head-mounted multi-spectral night vision systems, which have
stringent storage, computational, and power requirements. The detection
algorithm is independent of inter-frame information. Most operations
in the algorithm are binary to minimize the computational, memory, and
power requirements for real-time video processing. We will present test
videos of the algorithm applied to various long wave scenes; the scenes
were captured with a 320x240 sensor, using a 40° horizontal field
of view. Acceptable detection results were obtained for multiple hot-targets
within a 100-meter range.
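The detect-then-filter-by-extent idea can be sketched on a single frame, with no inter-frame information, as the abstract describes. This is a minimal sketch, not the authors' algorithm: it assumes a plain intensity threshold for the binary mask (the edge cue mentioned above is omitted), 4-connected blobs, and a bounding-box extent test; `detect_hot_targets`, `t_hot`, `min_extent`, and `max_extent` are hypothetical names and parameters.

```python
from collections import deque


def detect_hot_targets(frame, t_hot, min_extent, max_extent):
    """Return bounding boxes (top, left, bottom, right) of hot blobs whose
    height and width both lie within [min_extent, max_extent] pixels."""
    rows, cols = len(frame), len(frame[0])
    # Binary mask: 1 where the pixel is hotter than the threshold.
    mask = [[1 if frame[r][c] > t_hot else 0 for c in range(cols)]
            for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search over the 4-connected blob starting here.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            height, width = bottom - top + 1, right - left + 1
            # Extent test: reject isolated specks and oversized warm regions.
            if (min_extent <= height <= max_extent
                    and min_extent <= width <= max_extent):
                boxes.append((top, left, bottom, right))
    return boxes
```

Apart from the final box bookkeeping, the per-pixel work is on a binary mask, which is the property the abstract relies on to keep memory and power low.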
|
| 1030 |
Fused Exploitation of LIDAR Point Clouds and Hyperspectral
for Improved Target Detection
D. Messinger, M. Foster, and J. Schott (Rochester Institute of Technology)
Recent work has demonstrated the utility of performing rare point target
detection in hyperspectral imagery in the calibrated radiance domain,
instead of the (atmospherically compensated) estimated surface reflectance
domain. This is achieved by using a physics-based forward model to predict
how the target signature (reflectance) will be manifested in the radiance
image under a variety of partially known atmospheric conditions. Typical
methods use MODTRAN to model the atmospheric contributions to the measured
signal but ignore or oversimplify any local geometric effects on the
signal. In this work, we present a methodology that utilizes a co-temporal,
but low resolution LIDAR point cloud to improve target detection by
constraining the geometric terms in the forward model. The three dimensional
LIDAR point cloud is used to estimate, on a per-pixel basis in the hyperspectral
image, fractional solar illumination, exposure to the full sky, surface
normal, and subpixel presence of a target based on three dimensional
target detection of a geometric target model. The method employed for
the spatial target detection is subject to high false alarm rates, but
when combined with the spectral information it can be used to handle
challenging scenarios such as targets in hard canopy shadows.
The method has been demonstrated on both real and synthetic data and
examples will be presented for each.
|
| 1050 |
Image Fusion with Multiband Linear Arrays
M. Michelizzi, C. Maraviglia (Naval Research Laboratory), and K. Cox
(Space/Ground Systems Solutions)
The Naval Research Lab (NRL) has been studying the application of multiband
linear array sensors to develop image understanding algorithms. There
are a number of applications for this work including collision avoidance
of ships in a maritime environment. The scanning linear array sensors
used for this study include bands in the infrared (IR) and a visible
band line scanner.
The system architecture for the multiband linear array image fusion
system will be described including the sensors and a custom gimbal that
scans the area around the ship. Actual sensor imagery will be shown
illustrating image registration and scan-to-scan issues with multiple
linear array sensors.
Algorithms for searching, acquiring, and tracking targets while continuing
to scan using the multiple scanned linear arrays will be described.
Imagery from the individual modes of operation will be shown. An alternate
approach to this problem is the use of multiband two-dimensional focal
plane arrays (FPAs). The advantages and disadvantages of using linear
arrays versus a two-dimensional FPA for multiband image fusion and understanding
will be discussed in this paper.
|
| 1110 |
Real-time mapping and navigation by fusion of multiple electro-optic
sensors
R. Sandoval, M. Pusateri, J. Fry, D. Lesutis (Pennsylvania State University),
and J. Siviter (Harsh Environment Applied Technologies)
Autonomous vehicle control can benefit from the abstraction of data
from multiple sensors into an information stream containing only relevant
aspects of the environment. We denote this information stream as a virtual
environment; the virtual environment differs from the real environment
in that only objects relevant to the vehicle are identified and mapped
in the virtual environment. We are jointly developing an autonomous
surface vehicle (ASV). The ASV will compete in the 2009 ASV challenge
sponsored by the Association for Unmanned Vehicle Systems International
and Office of Naval Research.
The ASV is equipped with a sensor suite including two forward-looking
color CCD cameras. By applying image processing and computer vision
techniques such as edge and blob detection and stereo image disparity,
the ASV fuses its electro-optical data with its other sensor data to
generate a virtual reconstruction of the elements of the real world
relevant to the competition tasks. The virtual environment provides
all information used in task and objective completion, including waypoint
and buoy navigation, target identification and elimination, friendly
identification and recovery, docking, and obstacle avoidance. The fusion
of sensor data to form a virtual environment is a prime goal in the
ASV's design, which allows for the simplification of task, objective,
and behavioral programming. In this paper, we will report on the design
of the virtual environment fusion algorithm for waypoint and buoy navigation.
We will discuss the results and techniques used in the virtual reconstruction.
|
| 1130 |
Comparison of 2D Median Filter Hardware Implementations for
Real-Time Stereo Video
J. Scott, M. Mushtaq, M. Pusateri, and H. Dhawan (Pennsylvania State
University)
The two-dimensional spatial median filter is a core algorithm for impulse
noise removal in digital image processing and computer vision. While
the literature presents several analyses of median filters optimized
for a standard 3x3 pixel neighborhood configuration, a 5x5 neighborhood,
useful for imagery exhibiting noise not conforming to the classic "salt
and pepper" formation, has received little analysis. Research efforts
on hardware implementations of median filters have been devoted primarily
toward implementations with low latency and high throughput. In the
application we are investigating, both stereo visible/near-infrared sensors
require a 5x5 median filter. Since the system is a battery-powered
unit, optimal power usage is a critical requirement. However,
optimal power usage for median filtering has received little attention
in the literature.
In this paper, we focus on investigating four selected hardware implementations
of a 5x5 median filter on the basis of power efficiency. Power efficiency
is extremely important when designing image fusion algorithms for night
vision goggles; battery weight must be minimized without compromising
operation time. We also analyze the latency, maximum clock rates, and
resource utilization for these implementations. The designs include
implementations of merge sort and radix sort-based elimination algorithms,
common in software implementations of median filters, and a systolic
sorting array and a Batcher sorting network, common hardware sorting
techniques. All designs were created in the Altera Quartus-II environment
for Stratix-II field programmable gate arrays (FPGAs), and were designed
to be fully pipelined, accepting input sets and generating median filter
output values every pixel clock pulse. Of the four considered designs,
the Batcher network is a clear winner in power efficiency. Also, the
Batcher network exceeds the functional and performance requirements
for resource usage, latency, and clock rate.
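One reason a Batcher network maps well onto a fully pipelined FPGA design is that its compare-and-swap sequence is fixed and data-independent, so every comparator layer can become a pipeline stage. The software model below is a sketch under assumptions, not the authors' design: it generates Batcher's odd-even merge sort comparators for 32 inputs, pads the 25 window values with +inf, and reads the median at index 12, with edge replication at image borders. A hardware version would typically prune comparators that cannot affect the median; the function names are hypothetical.

```python
def batcher_pairs(n):
    """Comparator list (i, j), i < j, for Batcher's odd-even merge sort on
    n inputs; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    pairs = []
    p = 1
    while p < n:
        k = p
        while k >= 1:
            for j in range(k % p, n - k, 2 * k):
                for i in range(min(k, n - j - k)):
                    # Only compare elements inside the same 2p-sized block.
                    if (i + j) // (2 * p) == (i + j + k) // (2 * p):
                        pairs.append((i + j, i + j + k))
            k //= 2
        p *= 2
    return pairs


# Built once and reused for every pixel, like a fixed hardware network.
NETWORK_32 = batcher_pairs(32)


def median25(window):
    """Median of a 5x5 window (25 values) via the fixed comparator network."""
    if len(window) != 25:
        raise ValueError("expected 25 values")
    # Padding with +inf sends the 7 dummy inputs to the top, so index 12
    # holds the 13th-smallest real value, i.e. the median.
    v = list(window) + [float("inf")] * 7
    for a, b in NETWORK_32:
        if v[a] > v[b]:              # compare-and-swap element
            v[a], v[b] = v[b], v[a]
    return v[12]


def median_filter_5x5(img):
    """5x5 median filter with edge replication at the image borders."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [img[min(max(r + dr, 0), rows - 1)][min(max(c + dc, 0), cols - 1)]
                      for dr in range(-2, 3) for dc in range(-2, 3)]
            out[r][c] = median25(window)
    return out
```

A single bright impulse in a flat region is removed by `median_filter_5x5`, which is the impulse-noise behavior described at the start of the abstract.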
|
| 1150 |
Lunch
|
| 1300 |
Session 1: Co-locating Multiple Images (continued)
Chairs: A. Williams (Harsh Environment Applied Technologies) and H.
Rhody (Rochester Institute of Technology)
|
| 1300 |
Hyper-Spectral Content Aware Resizing
J. Scott, R. Tutwiler, R. Collins, and M. Pusateri (Pennsylvania State
University)
Image resizing is performed for many reasons in image processing. Often,
it is done to reduce or enlarge an image for display. It is also done
to reduce the bandwidth needed to transmit an image. Most image resizing
algorithms work based on principles of spatial or spatial frequency
interpolation. One drawback to these algorithms is that they are not
image content aware and can fail to preserve relevant features in an
image, especially during size reduction. Recently, a content aware image
resizing algorithm, called seam carving, was presented by Avidan and
Shamir.
In this paper we discuss an extension of the seam carving algorithm
to hyper-spectral imagery. For a hyper-spectral image with an MxN field
of view and P spectral layers, our algorithm identifies a one-pixel-wide
path through the image field of view containing a minimum of information
and then removes it. This process is repeated until the image is reduced
to the desired size. Information content is assessed using energy and
power metrics; several such metrics have been tested, with varying
results. The resulting carved hyper-spectral image has the minimum
reduction in information for the resizing, as quantified by the energy
and power metrics used. We will present the results of seam carving
applied to two imagery sets: three-band imagery captured with VNIR, SWIR,
and LWIR cameras, and ten-band imagery generated synthetically.
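The core loop of seam carving (Avidan and Shamir), computing an energy map, finding the minimum-energy connected seam by dynamic programming, and removing it, can be sketched for P bands as follows. This is a minimal sketch under an assumed energy metric: the sum of absolute differences to the four neighbors, summed over all bands. The abstract does not say which energy and power metrics the authors used, and the function names are hypothetical.

```python
def energy_map(img):
    """Per-pixel energy of an MxN image whose pixels are lists of P band
    values: sum of absolute differences to the 4 neighbours over all bands
    (an assumed metric; borders are replicated)."""
    rows, cols = len(img), len(img[0])
    e = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            px = img[r][c]
            neighbours = (img[r][max(c - 1, 0)], img[r][min(c + 1, cols - 1)],
                          img[max(r - 1, 0)][c], img[min(r + 1, rows - 1)][c])
            e[r][c] = sum(abs(a - b) for nb in neighbours for a, b in zip(px, nb))
    return e


def find_vertical_seam(e):
    """Dynamic programming: one column index per row for the minimum-energy,
    8-connected top-to-bottom seam."""
    rows, cols = len(e), len(e[0])
    cost = [row[:] for row in e]
    back = [[0] * cols for _ in range(rows)]
    for r in range(1, rows):
        for c in range(cols):
            # Cheapest of the (up to) three neighbours in the row above.
            best = min(range(max(c - 1, 0), min(c + 2, cols)),
                       key=lambda k: cost[r - 1][k])
            back[r][c] = best
            cost[r][c] += cost[r - 1][best]
    c = min(range(cols), key=lambda k: cost[rows - 1][k])
    seam = [0] * rows
    for r in range(rows - 1, -1, -1):
        seam[r] = c
        c = back[r][c]
    return seam


def carve_width(img, target_cols):
    """Remove vertical seams one at a time until the width is target_cols."""
    while len(img[0]) > target_cols:
        seam = find_vertical_seam(energy_map(img))
        img = [row[:c] + row[c + 1:] for row, c in zip(img, seam)]
    return img
```

Because the energy is pooled across bands before the seam search, the same seam is removed from every spectral layer, keeping the bands co-registered after resizing.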
|
| 1320 |
Remote Sensing Data Assimilation in Environmental Models
A. Vodacek, A. Spivey (Rochester Institute of Technology), Y. Li (Rapiscan
Systems), and A. Garrett (Savannah River National Laboratory)
Remote sensing images typically provide a two dimensional snapshot
of a three dimensional and time varying world. Numerical physics-based
models of the environment can provide time varying predictions of processes
in three spatial dimensions, but these models are subject to increasing
error as time progresses. Data assimilation is the term used to describe
various numerical techniques for incorporating new data over time into
an executing model and thereby reducing prediction errors. We describe
an example of remote sensing data assimilation using the Ensemble Kalman
Filter to illustrate some of the general procedures and requirements
of this approach.
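For readers unfamiliar with the Ensemble Kalman Filter, a single analysis step in its textbook perturbed-observation form is sketched below for a scalar state that is observed directly (observation operator H = 1). This is a generic illustration under those simplifying assumptions, not the authors' environmental model or assimilation setup; `enkf_update` is a hypothetical name, and the forecast step that propagates each member through the physics model between observations is not shown.

```python
import random
import statistics


def enkf_update(ensemble, obs, obs_var, rng):
    """One stochastic (perturbed-observation) EnKF analysis step for a
    scalar state observed directly, i.e. observation operator H = 1."""
    # Forecast error variance estimated from the ensemble spread.
    pf = statistics.variance(ensemble)
    # Kalman gain K = P H^T (H P H^T + R)^-1 reduces to P / (P + R) here.
    gain = pf / (pf + obs_var)
    analysis = []
    for x in ensemble:
        # Each member assimilates its own noisy copy of the observation.
        perturbed = obs + rng.gauss(0.0, obs_var ** 0.5)
        analysis.append(x + gain * (perturbed - x))
    return analysis
```

With a prior ensemble drawn from N(0, 1), an observation of 2.0, and an observation variance of 1.0, the gain is about 0.5, so the analysis mean should land near 1.0 and the ensemble spread should shrink: the model state is pulled toward the new data in proportion to their relative uncertainties.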
|
| 1340 |
Automated Image Registration to 3-D Building Models
K. Walli and H. Rhody (Rochester Institute of Technology)
This paper develops a technique for the registration of multisensor/multimodality
images to 3-D models utilizing the KML encoding standard. Given the
recent advancements in 3-D online visualization of geographic information
utilizing such software tools as Google Earth, Microsoft Virtual Earth,
and NASA's World Wind, it is becoming increasingly necessary to understand
and utilize the capabilities of these impressive software tools in the
field of remote sensing.
Because 2-D registration will always be limited by the effects of viewing
geometry and occlusion, this approach orients the scene-based model
to the same viewing perspective as the remotely sensed image to enable
traditional 2-D registration techniques. Once this is accomplished, the
3-D ambiguity between the model and the image can be removed and the
image can be used as a texture map on the model.
This approach relies on either a priori sensor information or scene-based
content to estimate the proper 3-D scene orientation relative
to the remotely sensed image. Once the initial model orientation is
estimated, an iterative approach to improve the model-to-image projection
can be implemented. Multiple methods will be discussed to test the accuracy
of the resulting model-to-image registration.
|
| 1400 |
Session 2: Data Techniques
Chair: Jim Aanstoos (Mississippi State University)
|
| 1400 |
Evaluation of Compression Techniques for Wide-area Video
A. Perera (Kitware)
Very large aerial video collectors, such as Constant Hawk, Angel Fire,
and the planned ARGUS-IS, are of increasing interest. These sensors
have very large effective focal plane arrays, and can generate a tremendous
amount of data. For example, the ARGUS-IS system will generate about
425 gigabits/second of image data. This presents a significant challenge
for onboard storage, and also a significant challenge for transmitting
the data down for real-time access. One way to address
this challenge is through compression of the imagery. However, one must
be confident that the compression does not cause any loss of intelligence
in the imagery. This paper presents the results of an evaluation of
the performance of different compression algorithms. We evaluated both
single image compression algorithms (such as JPEG2000) and video compression
algorithms (such as MPEG-4 AVC). In general, we found that video compression
produces 3 to 5 times more compression than single-image compression
at equivalent quality, as measured by the Structural Similarity (SSIM)
metric. We also found that the stream can be compressed about 100 times
without a perceptual loss in quality. This is not enough compression to
transmit the entire image down in real time; that would require 1000 to
2000 times compression, which will necessarily cause a loss in image
quality. The loss in image quality does not necessarily mean a loss of
intelligence, and we argue that the quality of the compressed stream must
be evaluated using a task-based metric. We also present some approaches
to achieving compression factors of 1000 to 2000. (Approved for public
release, distribution unlimited.)
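The numbers above can be made concrete with a little arithmetic, and the quality metric with a short formula. The sketch below divides the 425 Gbit/s figure from the abstract by a compression ratio (100 leaves about 4.25 Gbit/s; 1000 to 2000 leaves about 0.43 to 0.21 Gbit/s), and computes SSIM over a whole signal treated as a single window. That single-window form is a simplification assumed here: standard SSIM averages the same expression over local windows (commonly 11x11 Gaussian), and the abstract does not give the authors' settings. The constants K1 = 0.01 and K2 = 0.03 are the usual defaults; the function names are hypothetical.

```python
RAW_RATE_GBPS = 425.0  # ARGUS-IS raw data rate quoted in the abstract


def downlink_gbps(compression_ratio):
    """Downlink rate (Gbit/s) left after compressing by the given ratio."""
    return RAW_RATE_GBPS / compression_ratio


def global_ssim(x, y, data_range=255.0):
    """SSIM of two equal-length signals treated as one window, with the
    usual constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2."""
    n = len(x)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = sum(x) / n, sum(y) / n
    # Unbiased (n - 1) variance and covariance estimates.
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

SSIM equals 1 only for identical signals and falls as luminance, contrast, or structure diverge. That is why it suits a perceptual-quality check but, as the abstract argues, is not by itself a task-based measure of retained intelligence.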
|
| 1420 |
Fault Tolerant Integrated Information Management Support for
Physically Constrained Iterative Deconvolution
S. Spetka and G. Ramseyer (State University of New York Institute of
Technology)
Multiple image processing algorithms are often required to process
computer vision inputs. The rapid processing of complex image streams
requires more computing power than a typical PC or workstation provides,
and high-performance computers (HPCs) and Linux clusters have been required
for this type of rapid, massive processing. Emerging multicore processors offer the possibility
of doing these types of processing at the PC level in real time. The
Physically Constrained Iterative Deconvolution (PCID) algorithm is a
multi-frame blind deconvolution (MFBD) parallel algorithm that allows
the extraction of simple and complex information from multiple images.
Massive computing power is required to use this algorithm in real time.
Message Passing Interface (MPI) is normally used with PCID for communications
between processors in multiprocessor systems. However, MPI has limited fault
tolerance. A tool to replace MPI for multiprocessor communications
has been developed that supports a high degree of fault tolerance and
facilitates multiple image processing by integration with a publication/subscription
infrastructure. This tool is demonstrated here for the PCID algorithm.
Other attributes of MPI and this tool's publication/subscription information
management support for PCID are compared and contrasted.
|
| 1440 |
A Nonlinear Manifold Learning Framework for Real-Time Motion
Estimation Using Low-Cost Sensors
L. Xie, Y. Cao, and F. Quek (Virginia Polytechnic Institute and State
University)
We propose a real-time motion synthesis framework to control the animation
of 3D avatars. Instead of relying on motion capture devices for the
control signal, we use low-cost and ubiquitously available 3D
accelerometer sensors. The framework is data-driven and includes two
steps: model learning from an existing high-quality motion database, and
motion synthesis from the control signal. In the model learning step, we
apply a nonlinear manifold learning method to establish a high-dimensional
motion model learned from a large motion capture database. Then, taking
the 3D accelerometer sensor signal as input, we synthesize high-quality
motion from the learned motion model. The system runs in real time,
which makes it suitable for a wide range of interactive applications,
such as character control in 3D virtual environments and occupational
training.
|
| 1500 |
Coffee Break
|
| 1520 |
An Edge Detection Technique in Images
A. Awad and H. Man (Stevens Institute of Technology)
A novel edge detection method based on a neighborhood similarity criterion
is presented in this paper. In this algorithm, pixels in the original
image that have the fewest similar pixels among their neighbors in the
filtering window are labeled as edge pixels. Simulation results show
that this approach performs well on noise-free images and is superior
to other methods on images corrupted by additive white Gaussian noise
(AWGN). Moreover, the algorithm is fast and has low computational complexity.
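The neighborhood-similarity criterion can be sketched in a few lines: count how many pixels in the 3x3 window are "similar" to the center and flag the center as an edge when that count is small. The similarity tolerance `t`, the cutoff `max_similar`, the 3x3 window, and skipping border pixels are all assumptions made for illustration; the abstract gives none of these details, and this sketch makes no claim about the noise robustness reported above.

```python
def similarity_edges(img, t, max_similar):
    """Mark a pixel as an edge when at most `max_similar` of its 8
    neighbours (3x3 window) lie within `t` gray levels of it.
    Border pixels are left unmarked in this sketch."""
    rows, cols = len(img), len(img[0])
    edges = [[False] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            similar = sum(
                1
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr or dc) and abs(img[r + dr][c + dc] - img[r][c]) <= t
            )
            edges[r][c] = similar <= max_similar
    return edges
```

On a vertical step between two flat regions, pixels on either side of the step see only five similar neighbors, while pixels inside a flat region see all eight, so a cutoff of five separates the two cases. The cost is one comparison per neighbor, consistent with the low complexity claimed.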
|
| 1540 |
Session 3: Full Body Tracking, Biometrics, and Surveillance
Chair: John Irvine (Draper Laboratory)
|
| 1540 |
Modeling and Managing Spatiotemporal Information in Distributed
Video Surveillance
P. Agouris, G. Cervone, P. Franzese, J. Radzikowski, and M. Sonwalkar
(George Mason University)
In this paper we address the issue of object tracking and sensor management
in distributed networks of video sensors. We consider persistent surveillance
applications and two types of sensor deployment:
* fixed video sensors distributed in a large urban environment (e.g.
located on rooftops), tracking moving targets (individuals or cars)
* video cameras on-board UAVs, tracking moving targets in a suburban
or rural environment.
In this paper we present our approaches to:
* Model the spatiotemporal activities of moving targets, in order to
support intelligent analysis (e.g. comparing patterns of behavior),
and address relevant database issues. Towards this goal we use the model of
the spatiotemporal helixes as a concise description of spatiotemporal
activities, and relevant solutions to support spatiotemporal helix analysis
(e.g. similarity assessment).
* Support the distribution and dynamic repositioning of a fleet of moving
sensors (e.g. on-board UAVs) to optimize coverage of the area of interest.
We use information landscapes to describe current and predicted target
positions, and present techniques that allow the repositioning of sensors.
In the paper we present theoretical models and early experimental results.
|
| 1600 |
Behavior Recognition for Surveillance Applications
C. Cohen, K. Scott, M. Huber, S. Rowe (Cybernet Systems Corporation),
and F. Morelli (Army Research Lab)
Differentiating between normal human activity and aberrant behavior
via closed circuit television cameras is a difficult and fatiguing task.
The vigilance required of human observers when engaged in such tasks
must remain constant, yet attention falls off dramatically over time.
In this paper we propose an architecture for capturing data and creating
a test and evaluation system to monitor video sensors and tag aberrant
human activities for immediate review by human monitors. A psychological
perspective provides the inspiration for depicting isolated human motion
by point-light walker (PLW) displays, which have been shown to be
salient for action recognition. Low-level intent detection features
are used to provide an initial evaluation of actionable behaviors. This
relies on robust tracking algorithms that can function in an unstructured
environment under a variety of environmental conditions. Critical to
this is creating a description of “suspicious behavior” that can be
used by the automated system. The resulting confidence value assessments
are useful for monitoring human activities and could potentially provide
early warning of IED emplacement activities.
|
| 1620 |
A Hybrid Scoring Approach for Human Candidate Selection in
IR Image Sequences
K. Byrd and M. Chouikha (Howard University)
This paper presents a new hybrid scoring approach for accurately determining
and characterizing human candidates for Infrared (IR) Human Detection
Systems. The scoring approach combines binary and fuzzy set
classification. The work is motivated by the need to precisely evaluate the
performance of detection and classification algorithms given partial
candidate selections caused by body-part occlusion, isolated hot spots,
and portions of selected regions of interest (ROI) no longer in the
sensor's field of view (FOV). The goal of this paper is to consolidate
the ideas and viewpoints of scientists as they formulate thoughts and
calculate statistics related to human candidate misses and human candidate
selections. We examine not only the sensitivity, specificity, and accuracy
of the candidate selection system, but also the area under the ROC curve
(AUC), the F-measure, and the Matthews correlation coefficient (MCC). This
new approach will hopefully lead to a standardized evaluation metric for
computer vision/detection tasks.
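The crisp (binary) side of the metrics listed above follows directly from a confusion matrix, as the minimal sketch below shows. It assumes plain true/false positive and negative counts. The paper's fuzzy scoring of partial detections is not reproduced, nor is the AUC, which needs ranked detection scores. Returning 0 for undefined ratios is a convention chosen here, and `detection_metrics` is a hypothetical name.

```python
import math


def detection_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, accuracy, F-measure (F1) and Matthews
    correlation coefficient from binary confusion-matrix counts.
    Undefined ratios (zero denominators) are reported as 0.0."""
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / (tp + fp + fn + tn)
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sens, "specificity": spec, "accuracy": acc,
            "f_measure": f1, "mcc": mcc}
```

For example, 6 true positives, 2 false positives, 1 miss and 11 true negatives give an accuracy of 0.85 and an F-measure of 0.8, but an MCC of only about 0.68. MCC uses all four cells of the matrix, which is why it is often preferred when human candidates are rare compared with background regions.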
|
| 1730 |
Poster Session and Reception
Chair: C. Maraviglia (NRL)
|
| W-1 |
Identity Dominance: Using fingerprints to associate an individual
to a larger social structure.
M. Loew and D. Herdegen (George Washington University and MITRE)
Fingerprint pattern and ridge count analysis of two separate population
groups were used to associate an individual to a group through qualitative
and quantitative comparison. The fingerprint data from the two groups
were analyzed using a Classification and Regression Tree (CART) algorithm. Four
separate trees were produced. The first tree separated the two populations
using only finger number and pattern. Subsequent trees separated the
two populations using finger number, pattern, and ridge count. Including
ridge counts improved the per-finger classification from 56.4% to 73.9%
and 79.5% for right and left loop patterns respectively. Whorls with
both ridge counts improved the classification accuracy to 83.3%. The
classification accuracies provided the basis for determining the probability
of correctly associating a person to one of the two groups. For each
finger, the probability of correctly associating the finger to the group
is binomially distributed based upon the classification probabilities.
Association is based upon a majority vote. In the worst case with only
finger pattern and finger number available, the expected probability
of correctly associating the individual is 54.1% using all ten fingers.
Adding ridge counts raises the lower bound to 90.8%. The upper bound
using whorls with two ridge counts is 98.4%. Between these two extremes
are cases in which the patterns vary among the fingers. However, since
the probability of correctly associating the individual with the city
depends on the data available, cases where the fingerprint patterns
or the deltas are not discernible reduce the probability of correct
association accordingly.
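The association step can be modeled as a majority vote over per-finger decisions. Because per-finger accuracy depends on the pattern and so can differ from finger to finger, the sketch below builds the distribution of the number of correctly classified fingers by dynamic programming (a Poisson-binomial distribution, which reduces to the binomial when all probabilities are equal). It assumes that fingers are independent and that a 5-5 tie is resolved by a coin flip. These assumptions are made for illustration only, the sketch is not claimed to reproduce the percentages quoted above, and `majority_vote_prob` is a hypothetical name.

```python
def majority_vote_prob(probs, tie_credit=0.5):
    """P(the majority vote is correct) when finger i is classified
    correctly with probability probs[i], independently of the others.
    An exact tie counts as correct with probability `tie_credit`."""
    # dist[k] = P(exactly k correct among the fingers processed so far).
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1 - p)      # this finger misclassified
            nxt[k + 1] += q * p        # this finger classified correctly
        dist = nxt
    n = len(probs)
    win = sum(q for k, q in enumerate(dist) if 2 * k > n)
    tie = sum(q for k, q in enumerate(dist) if 2 * k == n)
    return win + tie_credit * tie
```

Passing ten per-finger accuracies (one per finger, taken from the classification results for each finger's pattern) gives the probability of associating the individual correctly. Per-finger accuracy only a little above 50% yields a modest gain from voting, while higher per-finger accuracy compounds quickly, which matches the direction of the lower and upper bounds described above.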
|
| W-2 |
Tear Duct Detector for Identifying Left versus Right Iris Images
R. Abiantun and M. Savvides (Carnegie Mellon University)
In this paper, we present different pattern recognition approaches
for automatically detecting tear ducts in eye images acquired for iris
recognition, in order to enhance iris recognition and detect mislabeling
in datasets. Detecting the tear duct in an image tells an iris recognition
system whether the presented eye image is that of a left or a right eye.
This enables the iris matcher to match the enrolled image only against
images in the database from the same side, reducing error rates by
eliminating the chance of matching a left iris to a right iris or
vice versa. Mislabeling due to human error is a major problem in many
single-iris acquisition devices currently deployed in the field. We
present several techniques for detecting tear ducts, including boosted
Haar features, support vector machines (SVM), and more traditional
approaches such as PCA and LDA. Finally, we show that tear duct detection
improves left/right iris classification over previous approaches.
|
| W-3 |
MirrorTrack - A Real-Time Multiple Camera Approach for Multi-touch
Interactions on Glossy Display Surfaces
P. Chung, B. Fang, F. Quek (Virginia Polytechnic Institute and State
University)
This paper presents a real-time multiple camera approach for a multi-touch
interaction system that takes advantage of specular display surfaces
(such as conventional LCD displays) and the mirror effect at a low-azimuth
camera angle to detect and track fingers and their reflections simultaneously.
Building on our prior work: 1. We use multi-resolution processing to
greatly improve runtime performance of the system; 2. We employ different
edge detection and pattern recognition algorithms at different processing
resolutions to help detect fingers more accurately and efficiently; 3.
We track both the location of a fingertip and its pointing direction
so it can be identified more effectively; 4. We use a full stereo algorithm
to compute finger locations in 3D space more accurately. Our system
has many advantages: 1. It works with any glossy flat panel display;
2. It avoids the clumsy set-up of a top-down camera and its concomitant
screen glare problems; 3. It supports both touch and hover operation;
4. It can work with large vertical displays without the usual occlusion
problems. We describe our approach and implementation in detail and
present our experimental results.
|
| W-4 |
Automatic Pain Recognition from Video Sequences using SVM
M. Monwar and S. Rezaei (University of Calgary)
In recent years, a number of studies have begun to investigate the
neural substrates for perceiving facial expressions, using neuroimaging
and other modalities. Specific expressions that have been studied include
those of fear, anger, sadness, happiness, surprise and disgust. However,
one basic category of facial expression that has not yet been investigated
is that of pain. Facial expressions of pain have been the focus of considerable
behavioral research. Such work has documented that pain expressions,
like other affective facial expressions, play an important role in social
communication. In this paper, we present an efficient video analysis
technique for recognition of a specific expression, pain, from human
faces. We employ an automatic face detector that detects faces in the
stored video frames using a skin color modeling technique. The pain-affected
portions of the face are obtained using a mask image. We then
identify 30 points on the eye and mouth regions because, for almost all
people, these portions of the face are affected by pain (especially
brow lowering, orbit tightening, and raising of the upper lip), and
calculate the displacement of these points from the normal image to
the image captured during pain. These displacement vectors are used as input to a
Support Vector Machine classifier, which relies on statistical learning theory
instead of heuristics or analogies with natural learning systems. We
train the support vector machine classifier incrementally,
which speeds up matching and reduces the overhead involved
in the training phase. It also allows previously stored data to be
reused later. In addition, it allows for convenient combination
of training data from multiple individuals to accomplish person-independent
classification. We evaluate our method in terms of recognition performance
for a variety of classification scenarios and compare the results with
neural network based and eigenimage based automatic pain recognition
systems. The experimental results indicate that using a support vector machine
as the classifier can improve the performance of an automatic pain
recognition system, which can be used effectively in the health care sector.
|
| W-5 |
Multi Features Hybrid Active Shape Models for Lips Contour
Tracking in Video Sequences
Q. Nguyen and M. Milgram (University Pierre and Marie Curie)
Lip tracking has been extensively studied in recent years because it
can significantly improve the performance of automatic speech recognition
and face recognition systems, especially in noisy environments. In this
paper, we propose and evaluate a novel method for enhancing the performance
of lip contour tracking, based on the concept of active shape
models (ASM) proposed by Tim Cootes combined with multiple features. On the first
image of the video sequence, the lip region is detected using Bayes'
rule, with lip colour information modelled by a Gaussian
Mixture Model (GMM) trained with the Expectation-Maximisation
(EM) algorithm. A lip shape model is initialized in the detected region
and then converges on the lip contours. A single feature-based ASM
performs well only under particular conditions and gets stuck
in local minima under noisy conditions (e.g., beards, wrinkles, poor texture,
or low contrast between lip and skin). To improve convergence,
we propose using three features (normal profile, grey-level patches, and
Gabor wavelets) and combining them with a voting approach. Because the ASM
cannot take into account temporal information from previous
frames, the lip contours are tracked by replacing the standard
ASM with a multi-feature hybrid active shape model (MF-HASM), which is able to take
advantage of temporal information. Initial experimental results
on video sequences show that MF-HASM is more robust to the local-minimum
problem and achieves higher accuracy than the traditional single feature-based
method for lip tracking.
|
| W-6 |
Low-Cost, High-Speed Computer Vision Using NVIDIA's CUDA Architecture
S. Park, S. Ponce, J. Huang, Y. Cao, and F. Quek (Virginia Polytechnic
Institute and State University)
In this paper, we introduce real-time image processing techniques using
a modern programmable Graphics Processing Unit (GPU). The GPU is a SIMD (Single
Instruction, Multiple Data), inherently data-parallel
computing device. By using NVIDIA's new GPU programming framework,
the Compute Unified Device Architecture (CUDA), as a computational
resource, we achieve significant acceleration of image processing algorithms.
We show that a range of computer vision algorithms map
readily to CUDA with significant performance gains. Specifically, we
present parallelization and optimization strategies for Canny's edge
detection algorithm, and demonstrate the efficiency of our approach
by applying it to a computation- and data-intensive video motion tracking
algorithm known as Vector Coherence Mapping (VCM).
Our results show the promise of using such common low-cost processors
for intensive computer vision tasks.
|
| W-7 |
Exploitation of Massive Numbers of Simple Events
R. Rimey and D. Keefe (Lockheed Martin)
Emerging image-based sensor systems can observe a relatively large
area (e.g., the size of an urban neighborhood) for long time intervals
either continually or with high revisit rates. This type of sensor data
makes new types of exploitation possible, but only with the assistance
of automated exploitation aids because of the massive volume of data
that must be studied as a whole. Automated methods to extract the simplest
events from image sequences are often fairly robust (e.g., change events
derived from EO or SAR imagery or from video-derived tracks). Massive
numbers of such events (available from emerging wide-area persistent
surveillance sensor systems) can contain information of high intelligence
value. This paper examines this general-purpose problem: how massive
numbers of the simplest sensor-derived events can be exploited. We summarize
the basic functionality an intelligence analyst needs for studying this
type of event data, in short, to understand the spatial structure, temporal
structure, and event-pair structure within an area of regard. We then
present several algorithms for automated exploitation of such data.
The first set of algorithms detects activities in lower-level space-
and time-varying features, which we also show can be exploited with
the aid of visualization tools. The second set of algorithms, a variant
of probabilistic Latent Semantic Analysis (pLSA), describes and exploits
local temporal structure in the observed events. Our techniques are
experimentally validated using one simulated and two real datasets of
sensor-derived events, which contain all change events in a suburban
Iraqi neighborhood over many weeks, in an outdoor marketplace over several
months, and inside a building over several weeks. These three seemingly
different problem domains share some characteristics in terms of the
spatial-, temporal- and event-pair structure of normal activity patterns.
|
| W-8 |
Multicamera-Multispectral Video Library - An Algorithm Development
Tool
E. Williams (Harsh Environment Applied Technologies), M. Pusateri (Pennsylvania
State University), and D. Siviter (Harsh Environment Applied Technologies)
HEAT has developed a ground-based forward-looking multispectral data
collection system mounted on a rugged All Terrain Vehicle (ATV) to allow
recording imagery while moving over rough terrain. The image data collected
from multiple bands of the Electro-Optical/Infrared (EO/IR) spectrum
is used to aid image fusion algorithm development for applications such
as night vision goggles. The existing system consists of VNIR, SWIR,
and LWIR cameras mounted on a ruggedized pan/tilt unit, a rack-mount PC with
frame grabbers to capture digital images, and a 4 TB RAID for real-time
image storage. The system can also record meteorological data and GPS
information synchronized with the imagery.
HEAT has developed a methodology for algorithm development using imagery
and other important parameters about the scene of interest. The imagery
collected by the data collection system during field exercises is stored
in an image database and can then be replayed into a model
running in MATLAB on a desktop PC in the lab. The synchronized raw imagery
and meteorological data are provided as inputs to the model. The
model is used to develop image fusion algorithms that display the best
possible fused image to the human eye. Target identification
algorithms are also developed and optimized for the best probability of detection
with the lowest false alarm rate for a computer vision system. The optimized
algorithms, both for human display and for computer-aided
target tracking, can then be ported to a rugged FPGA-based system for
deployment in the real-world environment. Sample raw imagery input from
the cameras into the data collection system will be shown. Examples
of fused imagery created by the fusion algorithms will also be shown.
|
| Start |
Event
|
| 0800 |
Continental breakfast
|
| 0830 |
Keynote 2: Prof. Joe Mundy, Brown University, speaking on "Change
detection in the 21st century: exploiting vast image collection resources"
|
| 0900 |
Session 4: Biometrics - Head, Face, and Eyes
Chair: R. Vorder Bruegge (FBI)
|
| 0900 |
Boosted Multi Image Features for Improved Face Detection
R. Abiantun and M. Savvides (Carnegie Mellon University)
In this paper, we present novel approaches for automatically detecting
human faces in images, a capability that is critical for any face recognition
system. This paper expands on the traditional Viola-Jones approach by
proposing to boost a plethora of mixed feature sets for face detection;
we do this by adding non-Haar-like elements to a large pool of mixed
features in an AdaBoost framework. We show how to generate discriminative
Support Vector Machine (SVM)-type features and Gabor-type features (at
various orientations, frequencies, and center locations) and use
this whole pool as candidate discriminative feature sets in
modeling the patterns of a frontal-view human face. This general,
highly diverse pool of features is used to build a strong classifier,
and we show that it improves the generalization performance of the AdaBoost
approach and, as a result, the robustness of the face detector.
We report performance on the MIT+CMU face database and compare the results
with other published face detection algorithms. We also discuss processing
times and speed-up methods that offset the increase in complexity in
order to achieve face detection in real time.
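Boosting a mixed feature pool can be illustrated with discrete AdaBoost over decision stumps, which treats every candidate feature the same way regardless of its type. This is a minimal sketch, not the authors' detector: it assumes each column of a hypothetical matrix `X` holds one candidate feature response (Haar-like, Gabor-type, or SVM-type) per training window, with labels `y` in {-1, +1}; the function names and the exhaustive stump search are illustrative assumptions.

```python
import numpy as np


def stump_predict(x, thr, pol):
    """Weak classifier: +1 (face) when pol * (x - thr) >= 0, else -1."""
    return np.where(pol * (x - thr) >= 0, 1, -1)


def train_stump(X, y, w):
    """Pick the (feature, threshold, polarity) with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):              # every feature in the mixed pool
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                err = w[stump_predict(X[:, j], thr, pol) != y].sum()
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost; returns a list of (alpha, feature, threshold, polarity)."""
    w = np.full(len(y), 1.0 / len(y))
    model = []
    for _ in range(rounds):
        err, j, thr, pol = train_stump(X, y, w)
        if err >= 0.5:                       # no weak learner better than chance
            break
        err = max(err, 1e-10)                # avoid division by zero for a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X[:, j], thr, pol)
        w = w * np.exp(-alpha * y * pred)    # up-weight misclassified windows
        w = w / w.sum()
        model.append((alpha, j, thr, pol))
    return model


def predict(model, X):
    """Sign of the weighted vote of the selected stumps."""
    score = np.zeros(X.shape[0])
    for alpha, j, thr, pol in model:
        score += alpha * stump_predict(X[:, j], thr, pol)
    return np.where(score >= 0, 1, -1)


if __name__ == "__main__":
    X = np.array([[0.0, 5.0], [1.0, 3.0], [2.0, 8.0], [3.0, 1.0]])
    y = np.array([-1, -1, 1, 1])
    model = adaboost(X, y, rounds=3)
    print(predict(model, X).tolist())  # [-1, -1, 1, 1]
```

Because the stump search only sees numbers, adding Gabor or SVM-type responses to the pool amounts to adding columns to `X`; the cost is a longer search per round, which is the complexity increase the abstract mentions.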
|
| 0920 |
A Robust Segmentation Approach to Iris Recognition Based on
Video
Y. Chen (?)
(Abstract available soon...)
|
| 0940 |
Coffee Break
|
| 1000 |
Tracking and Recognizing Multiple Faces using Kalman Filter
and Modular PCA-based Methods
J. Foytik, P. Sankaran, and K. Asari (Old Dominion University)
Tracking and recognizing multiple faces in complex environments can
provide efficient security automation for large areas,
such as ports. Such a system could provide real-time analysis of important
individuals or notification of unwanted people. Previous research has
shown that Kalman filter techniques paired with the Viola-Jones face
detection algorithm can be used to successfully track one or more faces
in a viewing region. However, these methods have relied on basic template
face matching techniques and even clothing analysis of the shirt region
for each tracked person. These techniques provide reasonable results
in certain scenarios, but fail to reliably distinguish between tracked
people under varying conditions. A real-time face tracking and recognition
system, capable of processing multiple faces simultaneously, is presented.
As in previous related work, the system performs face detection using
the Viola-Jones algorithm, whose detections serve as measurements in a
Kalman filter visual tracking framework. Modular Principal Component Analysis
(MPCA) is used to quickly create a basic feature subspace, trained only
on face images obtained during on-line processing, to distinguish
between the currently tracked faces. These low-level face
recognition and Kalman systems allow multiple people to be tracked and
thoroughly analyzed by a higher-level face recognition subspace. This
subspace is created off-line from the face images of a large database of people
using Adaptively Weighted Modular Principal Component
Analysis (AWMPCA). The overall system is shown to provide reliable tracking
of more than one person and to achieve a higher recognition rate
owing to its ability to time-average the recognized faces.
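A minimal sketch of the Kalman tracking step, assuming a constant-velocity model on face-box centers and using Viola-Jones detections only as (x, y) measurements. The state layout, the noise values `q` and `r`, the greedy nearest-neighbour `associate` helper, and its `gate` distance are hypothetical choices for illustration; the paper's own motion model and data association are not specified in the abstract, and the MPCA/AWMPCA recognition stages are omitted.

```python
import numpy as np


def constant_velocity_model(dt=1.0, q=1e-2, r=4.0):
    """Matrices for state (x, y, vx, vy) observed through a detected face-box center (x, y)."""
    F = np.array([[1.0, 0.0, dt, 0.0],
                  [0.0, 1.0, 0.0, dt],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    H = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])
    return F, H, q * np.eye(4), r * np.eye(2)


class FaceTrack:
    def __init__(self, center, p0=100.0):
        self.x = np.array([center[0], center[1], 0.0, 0.0])  # velocity starts unknown (0)
        self.P = p0 * np.eye(4)                               # large initial uncertainty

    def predict(self, F, Q):
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + Q
        return self.x[:2]

    def update(self, z, H, R):
        innovation = np.asarray(z, dtype=float) - H @ self.x
        S = H @ self.P @ H.T + R
        K = self.P @ H.T @ np.linalg.inv(S)                   # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ H) @ self.P
        return self.x[:2]


def associate(tracks, detections, gate=50.0):
    """Greedy nearest-neighbour pairing of track positions with detection centers."""
    pairs, used = [], set()
    for ti, track in enumerate(tracks):
        best, best_dist = None, gate
        for di, det in enumerate(detections):
            dist = float(np.hypot(*(np.asarray(det, dtype=float) - track.x[:2])))
            if di not in used and dist < best_dist:
                best, best_dist = di, dist
        if best is not None:
            used.add(best)
            pairs.append((ti, best))
    return pairs


if __name__ == "__main__":
    F, H, Q, R = constant_velocity_model()
    track = FaceTrack((100.0, 50.0))
    for k in range(1, 30):                    # a face moving 2 px per frame in x
        track.predict(F, Q)
        track.update((100.0 + 2.0 * k, 50.0), H, R)
    print(np.round(track.x[2:], 2))           # velocity estimate close to (2, 0)
```

In a multi-face loop, each frame would call `predict` on every track, pair tracks with detections via `associate`, and call `update` only on the matched pairs.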
|
| 1020 |
Investigating Useful & Distinguishing Features Around the
Eyelash Region
H. Lai, M. Savvides, and T. Chen (Carnegie Mellon University)
Biometric identification is very important for national security. Traditionally,
biometric identification is based on single modalities, such as face, iris,
or fingerprint. However, when only the face or iris is available and the
data are partial, additional cues are needed to infer
whether a match occurs. This paper explores soft biometrics by finding additional
cues around the eyelash region, such as eyelashes and their direction,
as possible discriminators for matches. Different
populations have different types of eyelashes and eye
soft biometrics (such as the eye fold). When looking at possible matches, these
features can be used to declare or dismiss a match. In fact, one can
also infer the population group from the type of eyelashes: it
is observed that East Asian (e.g., Chinese) subjects typically have eyelashes
that point downward, whereas this is not the case for Western populations.
The same can be said for detecting
eye folds. We present an automatic algorithm for segmenting
eyelashes and estimating their direction, for fast biometric binning and for extracting distinguishing
features that can be used to declare a match or exclude one.
We show results on the NIST Iris Challenge Evaluation (ICE) and CASIA datasets
for the performance of this automatic feature selection.
|
| 1040 |
Integrating Mono-Modal Biometric Matchers through Logistic
Regression Rank Aggregation Approach
M. Monwar (University of Calgary)
A biometric system relies on a person's behavioral and/or physiological
characteristics as an alternative means of authentication (traditional
means being passwords, smart cards, ID cards, etc.). However, a biometric system
based solely on a single biometric may not always meet security requirements.
Multibiometric systems are therefore emerging as a way to overcome
limitations of single-biometric solutions, such as when a user does
not have a quality sample to present to the system (e.g., an individual with
a cold attempting to authenticate to a voice recognition system), and
to make the system harder to trick fraudulently. A reliable
and successful multibiometric system needs an effective fusion scheme
to integrate the information presented by multiple matchers. In this
research, I integrate the results of three mono-modal biometric matchers
(face, ear, and iris) using the Logistic Regression approach to rank-level
fusion. The face matcher uses an improved Bayesian approach in
which the difference between two face images is modeled by three components:
intrinsic difference (I) that discriminates different individuals; transformation
difference (T) caused by transformations such as lighting or expression
changes; and random noise (N). The ear matcher uses Active Shape Models
(ASMs) to model the shape and local appearance of the ear in a statistical
manner. In addition, steerable features, which encode rich discriminant
information about the local structural texture and provide accurate guidance
for shape location, are extracted from the ear image ahead of the
ASMs. Eigenearshape, computed from the steerable features, is used for final classification.
The iris matcher uses support vector machines. Canny's edge detection
and the Hough transform are used to find the iris/pupil boundary, and
a simple thresholding method is employed for eyelash detection. The
Gabor wavelet technique is deployed to extract the deterministic
features of a person's transformed iris in the form of a template.
The extracted iris features are then fed into a support vector machine (SVM)
for classification. The novelty of my research lies in the consolidation
of the outputs generated by these three matchers using the Logistic
Regression approach to rank-level fusion. Experimental results indicate
that the Logistic Regression method outperforms the Borda count and highest
rank methods. The system can contribute to homeland security.
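The three rank-level fusion rules compared here can be sketched in a few lines, assuming each matcher returns a rank for every enrolled identity (1 = best match). Logistic-regression rank fusion is commonly described as a weighted Borda count whose weights are estimated by logistic regression on training data; in this sketch the weights are simply passed in, and the values in the example are hypothetical, not ones reported by the author.

```python
from typing import Dict, List, Sequence

# One ranking per matcher: identity -> rank, where 1 is the best match.
Ranking = Dict[str, int]


def borda_count(rankings: Sequence[Ranking]) -> List[str]:
    """Fused order by the plain sum of ranks (lower is better); ties stay alphabetical."""
    ids = sorted(rankings[0])
    return sorted(ids, key=lambda i: sum(r[i] for r in rankings))


def highest_rank(rankings: Sequence[Ranking]) -> List[str]:
    """Fused order by each identity's best (minimum) rank; ties broken by rank sum here."""
    ids = sorted(rankings[0])
    return sorted(ids, key=lambda i: (min(r[i] for r in rankings),
                                      sum(r[i] for r in rankings)))


def weighted_borda(rankings: Sequence[Ranking], weights: Sequence[float]) -> List[str]:
    """Fused order by a weighted rank sum; in the logistic-regression variant the
    weights would be learned from training data, here they are supplied directly."""
    ids = sorted(rankings[0])
    return sorted(ids, key=lambda i: sum(w * r[i] for w, r in zip(weights, rankings)))


if __name__ == "__main__":
    face = {"A": 1, "B": 2, "C": 3}
    ear = {"A": 3, "B": 1, "C": 2}
    iris = {"A": 1, "B": 3, "C": 2}
    print(borda_count([face, ear, iris]))                      # ['A', 'B', 'C']
    print(highest_rank([face, ear, iris]))                     # ['A', 'B', 'C']
    print(weighted_borda([face, ear, iris], [0.2, 1.0, 0.3]))  # ['B', 'C', 'A']
```

The example shows why learned weights matter: when one matcher is trusted more (here the ear matcher, by hypothetical choice), the fused top candidate can differ from the unweighted Borda result.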
|
| 1100 |
Multiple Image Information Extraction
M. Pushpalatha (Sri Jayachamarajendra College of Engineering)
This paper presents a general and efficient design approach that uses a wavelet basis function
neural classifier to cope with small, high-dimensional training sets,
a problem frequently encountered in face recognition.
To avoid overfitting and reduce the computational burden,
face features are extracted by the principal component analysis (PCA)
method. A hybrid learning algorithm is proposed to train
the wavelet neural networks so that the dimension of the search space
is drastically reduced in the gradient paradigm. Although many algorithms
have been proposed to configure conventional neural networks, such as RBF
neural networks and incremental-learning radial basis function networks (RAN),
for various applications including face recognition, here we
provide more insight into these algorithms and compare their performance
with our method. Simulation results on the ORL database show
that the system achieves improved performance in terms of both classification error
rate and learning efficiency.
|
| 1120 |
Nonlinear Manifold Embedding of a Virtual 2D Face Database
For Face Recognition
P. Sankaran, Q. Wang, and V. Asari (Old Dominion University)
A nonlinear manifold approach to modeling face images with multiple
orientations for face recognition is presented in this paper. Face images
are treated as manifolds in state space instead of the classical representation
as fixed points. Compared with a fixed-point approach, this better identifies
the underlying low-dimensional pattern of the face set under consideration
and places a new test point at a more accurate position in the state
space.
One requirement of such an approach is a large database of
training faces covering all these orientations so that a smooth
manifold can be modeled. For this, we propose creating a large
2D face database based on an analytical 3D face surface (Prototypical
Face Model), starting from a few 2D images. The model is then adjusted
for illumination and color, followed by a registration process to account
for 3D-to-2D conversion distortions. This is followed by shape and texture
fitting, resulting in a sequence of virtual 2D face images. These faces
are then used to train the nonlinear manifold.
Multiple patterns are trained by modeling a nonlinear manifold for each
pattern. A test face image can then be projected onto these trained manifolds,
and a minimum-distance-to-manifold approach can be used to recognize
the test case.
|
| 1140 |
Human Recognition Using Visible Iris and Face Images from a
Single Source
R. Tompkins and K. Asari (Old Dominion University)
Here we present an algorithm that uses high-resolution visible
images and extracts face and iris regions in order to establish identity.
The system uses component-based Viola-Jones face detection to locate
potential subjects. Face recognition is performed using a subspace
created off-line from the face images of a large database of people using
Adaptively Weighted Modular Principal Component Analysis (AWMPCA). Component
locations indicate the approximate location of the iris, and the Hough
transform is used to estimate the location and boundary of the iris
region. This location is input as a measurement value into a conditional
density propagation visual tracking framework, which is used to position
the camera in order to maintain a view of the iris. After a perspective
estimation and correction step, the segmented iris is transformed into
a coordinateless rectangular region. The recognition step consists of
a two-dimensional PCA technique performed on overlapping patches of
the transformed iris region. Identity is established using a weighted
fusion of the iris and face recognition steps.
|
| 1200 |
Lunch
|
| 1330 |
Keynote 3: Jonathon Phillips, NIST, discussing the
Multiple Biometrics Grand Challenge
|
| 1400 |
Session 4: Biometrics - Head, Face, and Eyes
(continued)
Chair: R. Vorder Bruegge (FBI)
|
| 1400 |
Fractal Encoding of Low Resolution Iris Imagery for Improved
Matching
T. Trebaol and M. Savvides (Carnegie Mellon University)
Fractals are geometric shapes composed of many parts, each
of which is similar to the whole. A fractal can be constructed from
one simple shape, or a set of simple shapes, by iteratively re-applying a series
of affine transformations to each shape. The stochastically self-similar
nature of the eye suggests that it may be appropriate to model iris
images as fractals. Images encoded as fractals without compression can
be decoded and enlarged to higher spatial resolution than images resized
with pixel interpolation (such as bilinear, cubic, or spline). In this
paper, we compare the iris matching performance of images encoded using
fractal methods with the iris matching performance achieved by traditional
image resizing with different types of interpolation. The results are
based on the Iris Challenge Evaluation (ICE) dataset from NIST, and we
compare with commercial-grade fractal encoding methods using Adobe plug-ins
as well as fractal encoding methods found in the literature. While this
approach is shown to work in the domain of iris biometrics (particularly
for low-resolution, at-a-distance iris image acquisition), we also
propose it as a potential future tool for improving low-resolution
facial images (single-image super-resolution).
|
| 1420 |
A Comparative Study of the Multi-linear PCA for Face Recognition
J. Wang (Florida International University)
This is a comparative study of different methods for face recognition.
The purpose of this paper is to determine whether the multi-linear
PCA method (also known as nD PCA) can provide higher accuracy with comparable
processing time. The recognition accuracy and average running time of
the methods (such as ICA, PCA, KPCA, and 2D PCA) are compared. Also,
the mathematical foundation for evaluating the computational complexity
and the memory requirements of the feature bases of each method is discussed.
Unfolding is an important concept in multi-linear PCA, as it distinguishes
the technique from 1D PCA for face recognition. In the proposed application,
a 3D tensor is unfolded into two 2D matrices, and the eigenvectors of
these two 2D matrices are computed. The recognition process is reduced
to evaluating the norms of the differences between the projections of
the training images and the projection of the test image.
The smallest norm difference identifies the training image that most resembles
the test image. In this application,
the unfolding method differs from traditional methods such as Backward
Cyclic and Forward Cyclic unfolding.
The AT&T database is used to test all of the above-mentioned methods.
The database consists of 40 subjects with 10 images each. Among
these methods, multi-linear PCA achieved the best or comparable recognition
accuracy, with a faster running time than all other methods except 2D PCA.
Although multi-linear PCA runs slightly slower than 2D PCA, it
nonetheless provides more accurate face recognition.
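A minimal sketch of the unfold, eigendecompose, project, and nearest-norm steps described above, using one simple non-iterative bilinear variant (closely related to 2D-SVD). It is not the author's specific unfolding scheme, which the abstract says differs from the cyclic ones; the function names, the subspace sizes `k_rows` and `k_cols`, and the synthetic data are assumptions for illustration.

```python
import numpy as np


def fit_bilinear_pca(train: np.ndarray, k_rows: int, k_cols: int):
    """train: (N, h, w) tensor of N face images.
    Returns the mean image, U (h x k_rows), and V (w x k_cols)."""
    mean = train.mean(axis=0)
    A = train - mean
    # Mode-1 unfolding: images side by side, shape (h, N*w); X1 @ X1.T = sum_n A_n A_n^T.
    X1 = np.concatenate(list(A), axis=1)
    # Mode-2 unfolding: transposed images side by side, shape (w, N*h).
    X2 = np.concatenate([a.T for a in A], axis=1)
    _, U = np.linalg.eigh(X1 @ X1.T)   # eigh returns eigenvalues in ascending order
    _, V = np.linalg.eigh(X2 @ X2.T)
    return mean, U[:, ::-1][:, :k_rows], V[:, ::-1][:, :k_cols]


def project(img, mean, U, V):
    """Core matrix U^T (img - mean) V of shape (k_rows, k_cols)."""
    return U.T @ (img - mean) @ V


def classify(test, train, labels, mean, U, V):
    """Label of the training image whose projection is closest in Frobenius norm."""
    p = project(test, mean, U, V)
    dists = [np.linalg.norm(p - project(t, mean, U, V)) for t in train]
    return labels[int(np.argmin(dists))]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bases = rng.normal(size=(3, 8, 6))               # three synthetic "subjects"
    train = np.stack([b + 0.01 * rng.normal(size=b.shape) for b in bases for _ in range(4)])
    labels = [s for s in range(3) for _ in range(4)]
    mean, U, V = fit_bilinear_pca(train, k_rows=4, k_cols=4)
    test = bases[2] + 0.01 * rng.normal(size=(8, 6))
    print(classify(test, train, labels, mean, U, V))  # 2
```

Keeping each image as a matrix means the eigenproblems are only h x h and w x w, rather than (h*w) x (h*w) as in 1D PCA, which is where the memory and run-time advantages discussed in the abstract come from.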
|
| 1440 |
A Video-based Face Detection and Recognition System using Cascade
Face Verification Modules
P. Zhang (Alcorn State University)
Face detection and recognition in video is a challenging research
topic because the overall process must be performed quickly and efficiently. In this
paper, a novel face detection and recognition system using three cascaded
fast face verification modules and an ensemble classifier is presented.
First, the head of the test subject is serially verified by our proposed
three verification modules: a face skin verification module, a face symmetry
verification module, and an eye template verification module. The three
verification modules can eliminate tilted faces, the backs of
heads, and any other non-face moving objects. Only frontal face images
are sent to the face recognition engine. This verification strategy can
reduce the workload of face detection and recognition in video
processing. In addition, the frontal face detection reliability can be
adjusted by simply setting the verification threshold coefficients in
the verification modules, so this mechanism is suitable for different
applications. Second, three hybrid feature sets are applied to face
recognition. A novel ensemble classifier scheme is proposed to combine
three individual Artificial Neural Network (ANN) classifiers trained
on the three hybrid feature sets. A computationally efficient fitness
function for genetic algorithms is proposed to evolve the best weights
for the proposed ensemble classifier. Experiments demonstrated that
a frontal face detection rate as high as 95% can be achieved on
low-quality video images. The overall face recognition rate and reliability
are increased at the same time by the proposed ensemble classifier
in the system.
|
| 1500 |
Session 3: Full Body Tracking, Biometrics, and Surveillance
(continued)
Chair: John Irvine (Draper Laboratory)
|
| 1500 |
A Survey on Behavior Analysis in Video Surveillance for Homeland
Security Applications
T. Ko (Raytheon)
Surveillance cameras are inexpensive and everywhere these days, but
the manpower required to monitor and analyze them is expensive. Consequently,
the videos from these cameras are usually monitored sparingly or not
at all; they are often used merely as archives, to refer back to once
an incident is known to have taken place. Surveillance cameras can be
a far more useful tool if, instead of passively recording footage, they
are used to detect events requiring attention as they happen and to
take action in real time. This is the goal of automated visual surveillance:
to obtain a description of what is happening in a monitored area, and
then to take appropriate action based on that interpretation. Video
surveillance of humans is one of the most active research topics in
computer vision. It has a wide spectrum of promising homeland security
applications. Video management and interpretation systems have become
quite capable in recent years. This paper looks into how hardware and
software can be put together to solve surveillance problems in an age
of increased concern with public safety and security. In general, the
framework of a video surveillance system includes the following stages:
modeling of environments, detection of motion, classification of moving
objects, tracking, behavior understanding and description, and fusion
of information from multiple cameras. Despite recent progress in computer
vision and other related areas, there are still major technical challenges
to be overcome before reliable automated video surveillance can be realized.
This paper reviews developments and general strategies for the stages involved
in video surveillance, and it analyzes the feasibility and challenges
for combining motion analysis, behavior analysis, and standoff biometrics
for identification of known suspects, anomaly detection, and behavior
understanding.
|
| 1520 |
Human Gesture Tracking Using an Agent-based Tracking System
B. Fang, P. Chung, and F. Quek (Virginia Polytechnic Institute and State
University)
We present an agent-based motion tracking and gesture recognition system
to generate motion data using stereo calibrated cameras. The novelty
of our approach is that agents are bound to body-parts (bone structure)
being tracked. These agents are autonomous, self-aware entities that
are capable of communicating with other agents to perform tracking within
agent coalitions. Each agent seeks "evidence" for its
existence both from low-level features (e.g. motion vector fields, color
blobs) as well as from its peers (other agents representing body-parts
with which it is compatible). Multiple agents may represent different
"candidates" for a body-part, and compete for a place within
a coalition that constitutes the tracking of an articulated human body.
The power of our approach lies in the flexibility with which domain information
may be encoded within each agent to produce an overall tracking solution.
We demonstrate the effectiveness of the tracking system on test actions
(random movement and walking).
|
| 1540 |
Coffee Break
|
| 1600 |
Fast Classification of Indecent Video by Low Complexity Repetitive
Motion Detection
T. Endeshaw, J. Garcia, and A. Jakobsson (Karlstad University)
This paper proposes a fast method for detecting indecent video content
using repetitive movement analysis. Unlike skin detection, motion
provides features that are invariant to race and skin color. The video
material to be evaluated is divided into short fixed-length sections.
By filtering different combinations of B-frame motion vectors using
adjacency in time and space, one dominant motion vector is constructed
for each frame. The power spectral density estimate of this dominant
motion vector over the short sections of video frames is then computed
using a periodogram with a Hamming window. The resulting power spectrum
is then subjected to a selection window to restrict the spectrum to
a limited frequency range typical of indecent movement, as empirically
derived by us. A threshold detector is then applied to detect repetitive
motion in video sections. However, there are many instances where repetitive
motion occurs in these shorter sections without the video as a whole
being indecent. As a second step, an additional detector is employed
to determine whether the sections over a longer period of time can be classified
as having indecent material. The proposed method is resource-efficient,
as it does not require the IDCT step of video decoding. Further, the computationally
expensive spectral estimation calculations are done on only one value
per frame. Evaluations performed using a restricted set of videos with
different amounts of texture, lighting conditions, and complex backgrounds
show very promising results, with a high true positive probability (>85%)
at a low false positive probability (<10%) for the intermediate
repetitive motion detection. After the second, longer-sequence estimator,
the results were close to ideal for the limited testing set. As a
third step, additional selectivity that is complementary but more resource-demanding
(for example, color or audio analysis) could be employed to
further decrease the probability of false positives.
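The per-section spectral test and the longer-period second stage can be sketched as follows. The sketch assumes the dominant motion vector has already been reduced to one scalar per frame (for example, one of its components), and the selection band, both thresholds, and the frame rate are hypothetical placeholders: the paper's empirically derived frequency range and detector thresholds are not given in the abstract, and the in-band energy fraction is an assumed choice of detection statistic.

```python
import numpy as np


def hamming_periodogram(x, fs):
    """Periodogram of one section of per-frame dominant-motion values (Hamming window)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                           # remove the DC component
    w = np.hamming(len(x))
    spectrum = np.fft.rfft(x * w)
    # Absolute scaling is irrelevant to the ratio test below.
    psd = np.abs(spectrum) ** 2 / (fs * np.sum(w ** 2))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs, psd


def section_is_repetitive(x, fs, band, min_band_fraction):
    """First stage: flag a section whose power is concentrated in the selection band."""
    freqs, psd = hamming_periodogram(x, fs)
    total = psd[1:].sum()                      # ignore the DC bin
    if total == 0.0:
        return False                           # perfectly static section
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return psd[in_band].sum() / total >= min_band_fraction


def video_is_flagged(sections, fs, band, min_band_fraction, min_flagged_fraction):
    """Second stage: flag the video when enough short sections are repetitive."""
    flags = [section_is_repetitive(s, fs, band, min_band_fraction) for s in sections]
    return sum(flags) / len(flags) >= min_flagged_fraction


if __name__ == "__main__":
    fs = 25.0                                  # frames per second (assumed)
    t = np.arange(100) / fs                    # one 4 s section
    slow = np.sin(2 * np.pi * 2.0 * t)         # 2 Hz periodic motion
    fast = np.sin(2 * np.pi * 8.0 * t)         # periodic motion outside the band
    band = (1.0, 3.0)                          # hypothetical selection window
    print(section_is_repetitive(slow, fs, band, 0.5))  # True
    print(section_is_repetitive(fast, fs, band, 0.5))  # False
```

Since only one value per frame enters the FFT, the spectral cost per section is a single short transform, which is consistent with the abstract's point that the expensive step runs on very little data.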
|
| 1620 |
Dual IR Spectral Video Inspection of a Concealed Live Animal
M. Hsu (George Washington University), K. Byrd (Howard University), C.
Hsu (George Washington University), and H. Szu (NRL/ONR)
Multiple spectral videos have been widely used in fields such as
inspection, image synthesis, 3D object modeling, obstacle detection,
collision avoidance, artificial intelligence navigation, and medical
applications. Almost all research on multiple-image restoration and
registration has focused on 3D rigid-body objects. In this paper, we
present a scheme for a 3D deformable live animal whose passive
identification is generated by fusing, beyond the traditional affine
registration via 3 control points, a pair of long-wave (8-12 μm) and
mid-wave (3-5 μm) infrared (IR) video cameras, used similarly in early
passive breast cancer detection [Szu et al., Patent 20040181375, "Nonlinear
blind demixing of single pixel..."]. In the experiment, the live
animal, a hamster, was concealed in a nighttime environment and could
only be observed by means of cryogenic infrared spectral video cameras
made by FLIR at variable frame rates (10-100). Because the thermal
transparency coefficient is a function of the spectral density, the
traditional night vision camera used for airport inspection may
not be suited for a hamster concealed inside an airport carry-on box
made by Petland. For example, we could deduce only the centered position
of the hamster with the LIR video camera in a nighttime plastic container,
while the detailed motion, e.g. the salient tail feature of the hamster,
must be deduced by means of the MIR video camera. The adaptive neighborhood
histogram modification method yields the equivalent of a blurred, low-pass,
but statistically correct averaged estimate of the overall animal center.
Therefore, it can be used as a local reference image segment for the
restoration and registration of the MIR image sequence locally. Such a
pair of spectral videos generalizes the original "self-reference local
matched filters" video restoration technique, originally proposed by Szu
in 1980 for daytime video harbor surveillance imaging through water waves.
In this paper we provide a fast generalization of local instances
of good seeing by a local shift-add-reject image registration algorithm
for a live animal in a concealed environment.
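The shift-add-reject idea can be illustrated with a minimal NumPy sketch. It assumes that a reference frame is already available, standing in for the LIR-derived local reference described above. The sketch estimates integer translations by FFT cross-correlation. It then rejects any frame whose aligned correlation with the reference falls below a hypothetical `min_corr` threshold and averages the remaining frames. This is a generic shift-and-add scheme, not the authors' algorithm, and it ignores deformation and sub-pixel motion.

```python
import numpy as np

def estimate_shift(ref, frame):
    """Integer (dy, dx) such that np.roll(frame, (dy, dx), axis=(0, 1)) best matches ref."""
    R = np.fft.fft2(ref - ref.mean())
    F = np.fft.fft2(frame - frame.mean())
    corr = np.real(np.fft.ifft2(R * np.conj(F)))  # circular cross-correlation
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape
    # Map wrap-around peak positions to signed shifts
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

def normalized_corr(a, b):
    """Zero-mean normalized correlation coefficient of two images."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def shift_add_reject(frames, ref, min_corr=0.5):
    """Shift each frame onto ref, reject poorly matching frames, average the rest."""
    kept = []
    for frame in frames:
        dy, dx = estimate_shift(ref, frame)
        aligned = np.roll(frame, (dy, dx), axis=(0, 1))
        if normalized_corr(aligned, ref) >= min_corr:  # the "reject" step
            kept.append(aligned)
    if not kept:
        return ref.copy(), 0
    return np.mean(kept, axis=0), len(kept)
```

Averaging only the frames that pass the correlation test is what makes the output a composite of "good seeing" instances rather than a blur of all frames.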
|
| 1730 |
Pre-Banquet Reception
|
| 1830 |
Evening Banquet and Speaker
Visible and Infrared Imaging Spectroscopy of Paintings
John K. Delaney*, Jason G. Zeibel**, and Roy Littleton**
*Andrew W. Mellon Senior Imaging Scientist, Scientific Research Department,
National Gallery of Art, DC.
**Night Vision & Electronic Sensors Directorate, US Army Research,
Development & Engineering Command.
Imaging spectroscopy has been developed primarily for remote sensing
of the Earth. In this talk we present our findings on its application
to paintings, to non-destructively identify and map artists' materials
as well as improve the visibility of under-drawings and preparatory
sketches. Diffuse reflectance hyper-spectral images (0.4 to 1.65 microns)
of paintings by P. Picasso, Giorgione, A. Derain and Leonardo da Vinci's
Ginevra de' Benci were collected using novel cameras from the Night
Vision & Electronic Sensors Directorate. The resulting image cubes
were analyzed using the hyper-spectral tools from ENVI as well as Kubelka-Munk
spectral fitting models in order to map and identify the major pigments.
Comparisons of the imaging spectroscopy results with those from
methods such as x-ray fluorescence spectrometry (XRF) and electron
microscopy have been used both to validate these methods and
to obtain a more complete understanding of the materials used in these
paintings. Examination of the infrared images revealed improved visualization
of original sketches and paint changes compared with those obtained by the
broad-band infrared imaging typically used by art conservators.
To date, these results suggest that imaging spectroscopy can be an important
in situ tool for the identification of materials and/or can serve as
a guide for the selection of sites for further chemical analysis.
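The following sketch gives a rough illustration of Kubelka-Munk spectral fitting. It uses the single-constant Kubelka-Munk relation for an opaque layer, K/S = (1 - R)^2 / (2R). It also assumes that the K/S of a mixture is a concentration-weighted sum of the pure-pigment K/S curves. The pigment spectra, the function names, and the projected-gradient non-negative least-squares solver are all illustrative assumptions, not the speakers' processing chain.

```python
import numpy as np

def ks_from_reflectance(R):
    """Single-constant Kubelka-Munk K/S for an opaque layer of reflectance R (0 < R <= 1)."""
    R = np.asarray(R, dtype=float)
    return (1.0 - R) ** 2 / (2.0 * R)

def reflectance_from_ks(ks):
    """Inverse of ks_from_reflectance (the root of the quadratic with R <= 1)."""
    ks = np.asarray(ks, dtype=float)
    return 1.0 + ks - np.sqrt(ks ** 2 + 2.0 * ks)

def nnls_projected_gradient(A, b, iters=20000):
    """Minimize ||A x - b||^2 subject to x >= 0 by projected gradient descent."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient A^T (A x - b)
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = np.maximum(0.0, x - A.T @ (A @ x - b) / L)
    return x

def unmix_pigments(R_mixture, R_pigments):
    """Estimate pigment concentrations, assuming K/S_mix = sum_i c_i * K/S_i."""
    A = ks_from_reflectance(R_pigments).T  # (bands, pigments)
    b = ks_from_reflectance(R_mixture)     # (bands,)
    return nnls_projected_gradient(A, b)
```

The fit works in K/S space because, under this model, mixing is linear there and not in reflectance.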
John Delaney is the Andrew W. Mellon Senior Imaging Scientist at the
National Gallery of Art, where his research focuses on the development
of in situ imaging methods for art conservation and understanding of
the optical properties of varnishes. He received his PhD from The Rockefeller
University and completed post-doctoral studies at the University of
Arizona and The Johns Hopkins University School of Medicine. Prior to
joining the National Gallery of Art he was the Chief Scientist and Systems
Engineering Lead for the U-2 Business Unit of Airborne ISR Systems at
Goodrich Corporation. Dr. Delaney has consulted with many museums in
the area of infrared imaging for over 15 years. He has published 23
papers in the areas of imaging and spectroscopy.
|
| Start |
Event
|
| 0800 |
Continental breakfast
|
| 0830 |
Keynote 4: Barbara O'Kane, Night Vision Lab, RDECOM, Principal
Scientist for Human Performance, speaking on "Challenges in Human
Sensing"
|
| 0900 |
Session 5: Medical - Imaging as a Biomarker
Chair: M. Loew (George Washington University)
|
| 0900 |
Non-Gaussian Models in Biomedical Imaging
R. Mangoubi and M. Desai (Draper Laboratory)
Most statistical models in applications rely on the Gaussian assumption.
Yet, in many realistic situations, the underlying variation or uncertainty
is essentially non-Gaussian. In detection problems, the Gaussian assumption
leads to false alarms when the true distribution has fatter tails, such
as the Laplace density function. In classification problems,
the Gaussian model for variability may be too restrictive, and other
models, such as the generalized Gaussian density function, are more
appropriate. We will present examples of such models applied to problems
with multiple images, and show why they improve detection and classification
performance in two applications: functional magnetic resonance imaging
and stem cell classification.
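The following standard-library sketch makes the false-alarm argument concrete. It writes out the generalized Gaussian density and sets a two-sided detection threshold under a unit-variance Gaussian model for a target false-alarm rate. It then evaluates the false-alarm rate that the same threshold actually produces when the data are Laplacian with the same variance. The 1e-3 design rate is an arbitrary choice for illustration, not a value from the paper.

```python
import math

def generalized_gaussian_pdf(x, alpha, beta):
    """beta = 2: Gaussian (alpha = sqrt(2) * sigma); beta = 1: Laplace (alpha = scale b)."""
    return beta / (2.0 * alpha * math.gamma(1.0 / beta)) * math.exp(-(abs(x) / alpha) ** beta)

def gaussian_tail(t, sigma=1.0):
    """Two-sided P(|X| > t) for X ~ N(0, sigma^2)."""
    return math.erfc(t / (sigma * math.sqrt(2.0)))

def laplace_tail(t, sigma=1.0):
    """Two-sided P(|X| > t) for a Laplace variable with variance sigma^2 (scale b = sigma / sqrt(2))."""
    b = sigma / math.sqrt(2.0)
    return math.exp(-t / b)

def gaussian_threshold(pfa, sigma=1.0):
    """Bisection for the threshold t with gaussian_tail(t) == pfa (the tail decreases in t)."""
    lo, hi = 0.0, 40.0 * sigma
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gaussian_tail(mid, sigma) > pfa:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With a design rate of 1e-3, the threshold is about 3.29σ. The Laplace tail at that threshold is close to 1%, roughly an order of magnitude more false alarms than the Gaussian model predicts.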
|
| 0920 |
Localization of Fiducial Skin Markers in MR Images using Correlation
Pattern Recognition for PET/MRI Nonrigid Breast Image Registration
D. Walvoord, K. Baum, M. Helguera (Rochester Institute of Technology),
A. Krol (SUNY Upstate Medical University), and R. Easton Jr. (Rochester
Institute of Technology)
In most instances, multiple-modality visualization of pathologies offers
advantages over single-modality studies. For many medical imaging
procedures, it is desirable to produce a "fused" output that
simultaneously exhibits characteristics of the data from each individual
modality to reduce the difficulty of the decision-making process for
radiologists. Preprocessing for most data fusion algorithms typically
provides the necessary registration of the input data (from each modality).
Fiducial markers may be used to show common locations between the imaging
modalities when the methods of image capture produce outputs with very
different spatial structure, as is the case with MRI and PET imagery.
Automated detection of these markers has received limited research
attention in the medical field, and detection often requires manual selection
throughout the 3-dimensional image stack by a human observer. The objective is
to detect each marker (and locate its centroid) in a noisy
background containing additional objects with a large range of intensity
values. Correlation methods employed must exhibit some "normalizing"
characteristic to accommodate changes in the input image such that regions
of high intensity that do not share similar spatial structure with the
reference pattern are assigned low values in the output correlation
plane, effectively reducing the false positive rate. The filter should
accommodate within-class distortion, as the size and shape of the fiducial
marker will vary through the image stack. For this work, a mean-subtracted
MACH filter was constructed and applied to data that are mean-subtracted
locally. The location of marker centroids in the output stack of correlation
planes was determined by applying grayscale-morphology operations to
extract regions-of-interest. The results show a relatively high probability
of detection over a wide range of thresholds at an acceptable
false positive rate.
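The following is a minimal sketch of a mean-subtracted MACH (maximum average correlation height) filter. It uses the common frequency-domain form H = m / (αC + βD + γS), where m is the mean training spectrum, D the average power spectrum, S the average similarity term, and C is taken as white noise (identity). The parameter values, the box-filter local mean, and the synthetic Gaussian markers are assumptions for illustration. The authors' filter design and their grayscale-morphology centroid step are not reproduced; the correlation peak is simply taken with argmax.

```python
import numpy as np

def gaussian_blob(shape, center, sigma):
    """Synthetic marker: a Gaussian blob with wrap-around (circular) coordinates."""
    h, w = shape
    y = np.arange(h)[:, None]
    x = np.arange(w)[None, :]
    dy = (y - center[0] + h // 2) % h - h // 2
    dx = (x - center[1] + w // 2) % w - w // 2
    return np.exp(-(dy ** 2 + dx ** 2) / (2.0 * sigma ** 2))

def mach_filter(train, alpha=1.0, beta=1.0, gamma=1.0):
    """Frequency-domain MACH filter from mean-subtracted training images (n, H, W)."""
    X = np.fft.fft2(train - train.mean(axis=(1, 2), keepdims=True))
    m = X.mean(axis=0)                     # mean training spectrum
    D = (np.abs(X) ** 2).mean(axis=0)      # average power spectrum
    S = (np.abs(X - m) ** 2).mean(axis=0)  # average similarity (distortion) term
    return m / (alpha + beta * D + gamma * S)  # C assumed white, i.e. identity

def local_mean(img, size):
    """Circular box-filter local mean computed via the FFT."""
    k = np.zeros(img.shape)
    r = size // 2
    rows = np.arange(-r, r + 1) % img.shape[0]
    cols = np.arange(-r, r + 1) % img.shape[1]
    k[np.ix_(rows, cols)] = 1.0 / size ** 2
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(k)))

def correlation_plane(img, H, size=9):
    """Correlate the locally mean-subtracted image with the filter."""
    x = img - local_mean(img, size)
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.conj(H)))
```

Training on markers of several sizes is how this sketch lets one filter tolerate the within-class size variation through the image stack.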
|
| 0940 |
Extra: Biometrics - Head, Face, and Eyes
Chair: John Irvine (Draper Laboratory)
Image Enhancement for Minutiae-Based Fingerprint Identification
M. Sepasian, W. Balachandran, C. Mares, and M. Azimi (affiliation available
soon)
The purpose of this paper is to investigate the performance of a three-step
procedure for fingerprint enhancement and identification, using
CLAHE (contrast limited adaptive histogram equalization) with a
'Clip Limit', standard deviation, and sliding neighborhood processing as
stages in the processing of the fingerprint image. First, CLAHE with a clip
limit is applied to enhance the contrast of small tiles in the fingerprint
image, and neighboring tiles are combined using bilinear interpolation to
eliminate artificially induced boundaries. In the second step, the image
is decomposed into an array of distinct blocks, and the blocks are
discriminated by computing the standard deviation of each block's elements
to remove the image background and obtain the boundaries of the region of
interest. Finally, sliding neighborhood processing enhances the image by
clarifying the minutiae (endpoints and bifurcations) at each pixel, a
process known as thinning. The paper presents the motivation for developing
this method, its phases, and its possible advantages through a simulation
study.
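The second step, and the minutiae that the final step is meant to expose, can be sketched with NumPy. The block size and threshold are hypothetical values. Minutiae are read off an already-thinned binary skeleton with the standard crossing-number rule (CN = 1 for an endpoint, CN = 3 for a bifurcation), a common technique rather than necessarily the authors' method. CLAHE itself is omitted.

```python
import numpy as np

def block_std_mask(img, block=16, thresh=0.1):
    """Mark blocks whose standard deviation exceeds thresh as foreground (ridge area)."""
    h, w = img.shape
    mask = np.zeros((h, w), dtype=bool)
    for r in range(0, h, block):
        for c in range(0, w, block):
            if img[r:r + block, c:c + block].std() > thresh:
                mask[r:r + block, c:c + block] = True
    return mask

# 8-neighbourhood visited in circular order, starting at the top-left pixel
_RING = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def find_minutiae(skel):
    """Crossing-number minutiae on a thinned binary skeleton (border pixels skipped)."""
    ends, bifurcations = [], []
    h, w = skel.shape
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            if not skel[r, c]:
                continue
            ring = [int(skel[r + dr, c + dc]) for dr, dc in _RING]
            # Half the number of 0/1 transitions around the ring
            cn = sum(abs(ring[i] - ring[(i + 1) % 8]) for i in range(8)) // 2
            if cn == 1:
                ends.append((r, c))
            elif cn == 3:
                bifurcations.append((r, c))
    return ends, bifurcations
```

Flat background blocks have near-zero variance, while ridge blocks do not, which is why a simple per-block standard deviation suffices for segmentation in this sketch.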
|
| 1000 |
Coffee Break
|
| 1020 |
Session 6: ATR
Chair: J. Kretsch (National Geospatial-Intelligence Agency)
|
| 1020 |
Rapid Training of Image Classifiers Through Adaptive, Multi-frame
Sampling Methods
R. Eaton, J. Lowell, M. Snorrason (Charles River Analytics), John M.
Irvine (Draper Laboratory) and Jonathan Mills (AMRDEC)
Computer vision methods, such as automated target recognition (ATR)
techniques, have the potential to improve the accuracy of military systems
for weapon deployment and targeting, resulting in greater utility and
reduced collateral damage. A major challenge, however, is training the
ATR algorithm to the specific environment and mission. Because of the
wide range of operating conditions encountered in practice, advanced
training based on a pre-selected training set may not provide the robust
performance needed. Training on a mission-specific image set is a promising
approach, but it requires rapid selection of a small but highly representative
training set to support time-critical operations. To remedy these problems
and make short-notice seeker missions a reality, we propose Learning
and Mining using Bagged Augmented Decision Trees (LAMBAST). LAMBAST
examines large databases and extracts sparse, representative subsets
of target and clutter samples of interest. For data mining, LAMBAST
uses a variant of decision trees, called random decision trees (RDTs).
This approach guards against overfitting and can incorporate novel,
mission-specific data after initial training via perpetual learning.
We augment these trees with a distribution modeling component that eliminates
redundant information, ignores misrepresentative class distributions
in the database, and stops training when decision boundaries are sufficiently
sampled. These augmented random decision trees enable fast investigation
of multiple images to train a reliable, mission-specific ATR. This paper
presents the random decision tree framework, develops the sampling procedure
for efficient construction of the sample, and illustrates the procedure
using relevant examples.
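As the term is usually used, the core of a random decision tree is that split features and thresholds are drawn at random rather than optimized. Class counts are stored at the leaves, and probabilities are averaged over the ensemble. The pure-Python sketch below shows that idea with bootstrap (bagged) training sets. The depth, tree count, and function names are illustrative assumptions. LAMBAST's distribution-modeling augmentation and its stopping rule are not reproduced.

```python
import random

def build_tree(X, y, n_classes, depth, rng):
    """Grow one tree with random feature and random threshold at each split."""
    counts = [0] * n_classes
    for label in y:
        counts[label] += 1
    node = {"counts": counts}
    if depth == 0 or max(counts) == len(y):  # depth limit reached or node is pure
        return node
    f = rng.randrange(len(X[0]))              # random feature, no impurity search
    lo, hi = min(x[f] for x in X), max(x[f] for x in X)
    if lo == hi:
        return node
    t = rng.uniform(lo, hi)                   # random threshold within the node's range
    left = [i for i, x in enumerate(X) if x[f] <= t]
    right = [i for i, x in enumerate(X) if x[f] > t]
    if not left or not right:
        return node
    node["split"] = (f, t)
    node["left"] = build_tree([X[i] for i in left], [y[i] for i in left], n_classes, depth - 1, rng)
    node["right"] = build_tree([X[i] for i in right], [y[i] for i in right], n_classes, depth - 1, rng)
    return node

def leaf_counts(node, x):
    """Follow the splits down to a leaf and return its class counts."""
    while "split" in node:
        f, t = node["split"]
        node = node["left"] if x[f] <= t else node["right"]
    return node["counts"]

def train_forest(X, y, n_classes, n_trees=25, depth=8, seed=0):
    """Bagging: each tree is grown on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], n_classes, depth, rng))
    return forest

def predict_proba(forest, x):
    """Average the leaf class frequencies over all trees."""
    n_classes = len(forest[0]["counts"])
    probs = [0.0] * n_classes
    for tree in forest:
        c = leaf_counts(tree, x)
        total = sum(c)
        for k in range(n_classes):
            probs[k] += c[k] / total / len(forest)
    return probs
```

Because no split is optimized, each tree is cheap to grow, and leaf counts can be updated as new mission-specific samples arrive.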
|
| 1040 |
Efficient Real Time Object Detection and Localization in Images
O. Andrushchenko, F. Lure, and T. Ramsay (Guardian Technologies International
Inc.)
An approach to automated recognition and localization of objects of
interest in real-time recorded images is described. The models generated
through this process are very compact and use fewer features and
support vectors than models created by the standard
SVM technique. This approach can be implemented in two ways: with and
without a preliminary analysis of entire images. The former involves
feature extraction from the entire images and a follow-up classification;
the localization is conducted as a second step. The latter deals only
with sub-images of the entire images, where the sub-images are obtained
from a manual or automatic segmentation procedure. This real-time
learning involves determining highly discriminative features
between sub-images of the object and other non-object image areas using
large-scale feature extraction. Next, a 3-stage feature selection
procedure combined with a fast Support Vector Machine (SVM) classifier
is employed to develop small models applicable in real-time settings.
The efficiency of the proposed approach is demonstrated with real-world
examples including mammography and other images.
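The abstract does not specify its three feature-selection stages or its SVM solver. The sketch below therefore substitutes two common, named choices purely for illustration. The first is a Fisher-score filter that keeps the top-k features. The second is a linear SVM trained with the Pegasos stochastic sub-gradient method, with the bias folded into a constant feature. Both are assumptions; they only show how pruning features keeps the final model small.

```python
import numpy as np

def fisher_scores(X, y):
    """Per-feature Fisher score for binary labels y in {0, 1}."""
    X0, X1 = X[y == 0], X[y == 1]
    return (X0.mean(axis=0) - X1.mean(axis=0)) ** 2 / (X0.var(axis=0) + X1.var(axis=0) + 1e-12)

def select_top_k(X, y, k):
    """Indices of the k features with the highest Fisher score."""
    return np.argsort(fisher_scores(X, y))[::-1][:k]

def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos: hinge loss plus L2 penalty, step size 1 / (lam * t)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # constant feature acts as the bias
    s = 2 * y - 1                                    # labels mapped to {-1, +1}
    rng = np.random.default_rng(seed)
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            violated = s[i] * (Xb[i] @ w) < 1        # margin check before the update
            w *= 1.0 - eta * lam
            if violated:
                w += eta * s[i] * Xb[i]
    return w

def predict(w, X):
    """Class 1 where the linear score is positive."""
    return (np.hstack([X, np.ones((X.shape[0], 1))]) @ w > 0).astype(int)
```

A linear model over k selected features needs only k + 1 weights at run time, which is one way to get the compact, real-time models the abstract describes.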
|
| 1100 |
Understanding Computer Vision Challenges
C. Oertel (MITRE)
After nearly half a century of computer vision research, application-specific
systems are common but the goal of developing a robust, general-purpose
computer vision system remains out of reach. Rather than focus on the
strengths and weaknesses of current computer vision approaches, this
paper will enumerate and investigate the challenges that must be overcome
before this goal can be achieved. Key challenges include handling variations
in environment or acquisition parameters such as lighting, view angle,
distance, and image quality; recognizing naturally occurring as well
as intentionally deceptive variations in object appearance; providing
robust general-purpose image segmentation and co-registration; generating
3-D representations from 2-D images; developing useful object representations;
providing required knowledge that is not represented in the image itself;
and managing computational complexity. Each of these challenges, along
with its relevance to solving the vision problem, will be discussed.
Understanding these challenges as a whole may provide insight into underlying
mechanisms that will provide the backbone of a robust general-purpose
computer vision system.
|
| 1120 |
Temporal Structure Methods for Image-Based Change Analysis
R. Rimey and D. Keefe (Lockheed Martin)
This paper addresses change analysis, the exploitation of massive numbers
of image-derived change detections. We use the term "change analysis"
to emphasize the intelligence value contained within large numbers of
change detections, in contrast to the emphasis of most research to date
on "change detection" and the intelligence value of isolated
change detections. The work reported here addresses change detections
from regularly collected images over long time intervals, such as an
image each hour for several weeks, or an image each day for several
months. Our methods emphasize local temporal descriptions and include
minimal spatial information about activities. Our three methods adapt
and extend: (1) the Hamid techniques [1]; (2) Latent Semantic Analysis
(LSA); and (3) probabilistic LSA (pLSA). These methods allow us to:
(a) Detect an activity as a deviation from normal activity, and describe
each anomaly; (b) Discover categories of activity, describe a category
of activity, and assign an activity to a category; (c) Find the most
similar activity in a historical database. Our experiments use a
webcam viewing an outdoor marketplace measuring 100 x 200 feet, with images
collected every few minutes over 74 days. We present experimental results
that compare our methods (1)-(3) for performing functions (a)-(c). We
discuss how our techniques are equally applicable for change analysis
using wide-area imaging sensors. Reference: [1] R. Hamid, et al., "A
Novel Sequence Representation for Unsupervised Analysis of Human Activities,"
To appear in AI Journal, http://www-static.cc.gatech.edu/~raffay/publications.htm.
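The following is a minimal sketch of the LSA variant. It assumes that each time window of change detections has already been encoded as a bag-of-words count vector; that encoding is hypothetical here. A truncated SVD defines a latent activity space. The residual outside that space serves as an anomaly score for function (a), and cosine similarity in the latent space finds the most similar historical window for function (c). The Hamid techniques and pLSA are not shown.

```python
import numpy as np

def lsa_fit(D, rank):
    """D: (n_windows, n_terms) count matrix; returns the top `rank` right singular vectors."""
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    return Vt[:rank]

def anomaly_scores(D, basis):
    """Norm of the part of each window that the latent space cannot explain."""
    proj = D @ basis.T @ basis
    return np.linalg.norm(D - proj, axis=1)

def most_similar(D, basis, q):
    """Index of the window most similar to window q (cosine similarity in latent space)."""
    Z = D @ basis.T
    Z = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12)
    sims = Z @ Z[q]
    sims[q] = -np.inf  # exclude the query itself
    return int(np.argmax(sims))
```

Windows built from the usual mix of activities are reconstructed almost exactly by the low-rank basis, so a large residual flags a window whose activity pattern has not been seen before.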
|
| 1140 |
Lunch
|
| 1300 |
Precision Interpolation and Resampling for Multiple-Image Analysis
A. Schaum (Naval Research Laboratory)
The sub-pixel analysis of large volumes of multiple digital images
requires precise methods of resampling. An analysis of current interpolators,
especially commercial products, reveals an emphasis on cosmetics at
the expense of accuracy. Some of the techniques are designed specifically
to be misleading. Furthermore, noncommercial methods concerned with
fidelity to the underlying data have been found wanting, even in univariate
applications. We describe a new set of principles useful in designing
a set of robust interpolators that are purely local, and hence massively
parallelizable, and are more accurate than current ones. These principles
also extend current state-of-the-art methods used on uniform grids
to arbitrarily sampled ones. They show, furthermore, why such
methods have been unnecessarily constrained in the past and what the
real constraints are. Some new issues arise for multivariate interpolation,
but the new principles and methods are readily extended.
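The abstract does not give the paper's new interpolation principles. As a point of comparison, here is a standard, purely local interpolator: Keys' cubic convolution kernel with a = -0.5, in one dimension. Each output depends on only four neighboring samples, which is what makes such schemes massively parallelizable. With a = -0.5, the kernel reproduces polynomials up to degree two. Edge handling by sample replication is an assumption of this sketch.

```python
import math

def keys_kernel(s, a=-0.5):
    """Keys cubic convolution kernel; zero outside |s| < 2."""
    s = abs(s)
    if s <= 1:
        return (a + 2) * s ** 3 - (a + 3) * s ** 2 + 1
    if s < 2:
        return a * s ** 3 - 5 * a * s ** 2 + 8 * a * s - 4 * a
    return 0.0

def interp1d_keys(samples, x, a=-0.5):
    """Interpolate uniformly spaced samples at position x using the 4 nearest samples."""
    n = len(samples)
    i = math.floor(x)
    total = 0.0
    for k in range(i - 1, i + 3):
        kk = min(max(k, 0), n - 1)  # replicate edge samples (assumed boundary rule)
        total += samples[kk] * keys_kernel(x - k, a)
    return total
```

For example, with samples of k^2 at k = 0..9, interpolating at x = 3.3 returns 10.89 up to rounding, because quadratics are reproduced exactly.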
|
| 1320 |
Dominant Component Suppression for Background Suppression and
Spectral Characterization of Image Anomalies
J. Sweet (McClendon)
This paper describes a novel approach to background suppression termed
Dominant Component Suppression (DCS) that extends the basic concept
in two ways. First, DCS adapts (via unsupervised clustering) to the
major backgrounds or dominant components in the scene and significantly
reduces the spectral signal from these interferers. This process brings
the dataset closer to the often assumed multivariate normality and the
simple data model of target and white noise. Second, the DCS algorithm
produces residual spectra that have the same spectral dimensions and
radiometric units as the input data thus facilitating spectroscopic
exploitation of the data. This experiment uses an AVIRIS hyperspectral
image corrected to reflectance, into which a spectral target material is
linearly embedded at a range of sub-pixel abundances. While artificial,
this technique provides accurate ground truth for this exploratory experiment.
The Spectral Angle Mapper was applied to the reflectance and residual images,
and the results indicate that exploitation conducted in residual space can be
dramatically enhanced through a reduction in false alarms.
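The sketch below is a toy version of the two ideas in this abstract: clustering-based background suppression, followed by SAM on the residuals. Subtracting each pixel's k-means cluster mean is a simplified stand-in for DCS, assumed here for illustration; the actual DCS algorithm may suppress dominant components differently. The residual keeps the input's band count and radiometric units, the property the abstract emphasizes.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def dominant_component_residual(X, k):
    """Subtract each pixel's cluster mean; residuals keep the input bands and units."""
    labels, centers = kmeans(X, k)
    return X - centers[labels]

def spectral_angle(X, target):
    """Spectral Angle Mapper: angle (radians) between each row of X and the target spectrum."""
    cos = X @ target / (np.linalg.norm(X, axis=1) * np.linalg.norm(target) + 1e-12)
    return np.arccos(np.clip(cos, -1.0, 1.0))
```

In residual space, background pixels reduce to noise with no preferred direction, so their angles to the target cluster near 90 degrees, while a sub-pixel target stays clearly closer.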
|
| 1340 |
Detection of Ephemeral Changes in Sequences of Images
J. Theiler (Los Alamos National Laboratory) and S. Adler-Golden
(Spectral Sciences)
While the identification of "interesting" features in a
single image is an almost hopelessly open-ended task, the detection
of interesting _changes_ in a pair of co-registered images is a more
feasible undertaking. The change detection problem is nonetheless confounded
by pervasive differences (in illumination, calibration, registration,
etc.) that are inevitable between images. The anomalous change detection
paradigm, however, treats these differences as something that can be learned from
the images themselves, and those changes that do not fit the pervasive
pattern are identified as anomalous. It is these anomalous changes that
are candidates for interesting features; ultimately, a human analyst
decides what is _truly_ interesting, but the algorithm's job is to identify
a short list of candidates for the analyst to investigate.
A recently developed machine learning framework extends the existing
change detection methodology to arbitrary data distributions and, even
for Gaussian distributions, has been shown to improve performance.
But anomalous change detection algorithms have so far considered only
the problem of finding pairwise changes; when more than two images are
available, there is an opportunity to exploit multiple correlations
and produce a more effective change detector. One such global approach
has been demonstrated using an RX anomaly detector. In our paper, we
will show how to extend the machine learning approach to multiple images,
enabling detection of an ephemeral change in one or a few of the images
by exploiting the information in all of the images. This can be done
in spite of the combinatorial number of ways a change might be present
in a series of images.
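For a concrete reference point, the sketch below implements two Gaussian baselines, assuming co-registered pixel vectors. The first is the pairwise hyperbolic anomalous change detector, which scores a pixel by its joint Mahalanobis distance minus the two marginal ones. The second is a "global" RX detector applied to all images stacked per pixel. Neither is the machine-learning extension described in the paper. They only illustrate how pervasive, learnable differences are discounted while inconsistent changes stand out.

```python
import numpy as np

def mahalanobis_sq(Z):
    """Squared Mahalanobis distance of each row of Z from the sample mean."""
    Zc = Z - Z.mean(axis=0)
    Ci = np.linalg.inv(np.cov(Zc, rowvar=False))
    return np.einsum("ij,jk,ik->i", Zc, Ci, Zc)

def hyperbolic_acd(X, Y):
    """Gaussian hyperbolic anomalous change score for paired pixels (rows of X and Y)."""
    return mahalanobis_sq(np.hstack([X, Y])) - mahalanobis_sq(X) - mahalanobis_sq(Y)

def stacked_rx(images):
    """RX anomaly score on per-pixel vectors stacked across all images."""
    return mahalanobis_sq(np.hstack(images))
```

Subtracting the marginal terms keeps pixels that are merely unusual in one image from scoring high; only pixels whose relationship between images breaks the learned pattern do.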
|
| 1400 |
Overhead Image Statistics
V. Vijayaraj, A. Cheriyadat (Oak Ridge National Laboratory), P. Sallee (Booz
Allen Hamilton), B. Colder (Colder Scientific Solutions), R. Vatsavai,
E. Bright, and B. Bhaduri (Oak Ridge National Laboratory)
In this paper we study statistical properties of high-resolution overhead
images for different land use categories. Various local and global statistical
image properties based on the shape of the power spectrum, image gradient
distributions, edge co-occurrence, and inter-scale wavelet coefficient
distributions were computed and analyzed. Our analysis was performed
on a database of high-resolution (1 meter) overhead images collected
from different downtown, suburban, commercial, agricultural and wooded
categories. We discuss how the various statistical properties relate to
these image categories. The variations
in power spectrum contour shapes for different categories, unique gradient
distribution characteristics of wooded categories, similarity in edge
co-occurrence statistics for overhead and natural images and unique
edge co-occurrence statistics of downtown categories are presented in
this work. Though previous work on natural image statistics has shown
some of the unique characteristics for different categories, such relationships
for overhead images are not well understood. The statistical properties
of natural images were used in previous studies to develop prior image
models, to predict and index objects in a scene and to improve computer
vision models. We envision that our research findings can be used to
augment and adapt computer vision algorithms that rely on prior image
statistics to process overhead images, calibrate the performance of
overhead image analysis algorithms and derive features that give better
discrimination among overhead image categories.
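One of the statistics named above, the shape of the power spectrum, is often summarized by a radially averaged spectrum and the slope of its log-log fit. Natural images are commonly reported to fall off roughly as 1/f^2 in power. The sketch below computes that slope with NumPy. Rounding frequencies to integer radii and weighting all radii equally in the fit are simplifying assumptions. The paper's own contour-shape analysis is not reproduced.

```python
import numpy as np

def radial_power_profile(img):
    """Radially averaged power spectrum over integer frequency radii 1 .. min(h, w)//2 - 1."""
    img = np.asarray(img, dtype=float)
    P = np.abs(np.fft.fft2(img - img.mean())) ** 2
    h, w = img.shape
    ky = np.fft.fftfreq(h) * h
    kx = np.fft.fftfreq(w) * w
    r = np.rint(np.hypot(ky[:, None], kx[None, :])).astype(int)
    sums = np.bincount(r.ravel(), weights=P.ravel())
    counts = np.bincount(r.ravel())
    radii = np.arange(1, min(h, w) // 2)
    return radii, sums[radii] / counts[radii]

def spectral_slope(img):
    """Slope of log(power) versus log(radius); about -2 for 1/f^2 power spectra."""
    radii, profile = radial_power_profile(img)
    return np.polyfit(np.log(radii), np.log(profile), 1)[0]
```

A single slope discards orientation. Comparing profiles computed along different directions would be one way to capture the contour-shape differences between categories that the abstract mentions.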
|
| 1420 |
Intelligent Multimodal Sensors Design with Hyperspectral Imaging
for Tracking Moving Target in Real Time
T. Wang and Z. Zhu (City College of New York)
Real-time moving target tracking and identification with hyperspectral
imagery is very challenging with current devices and algorithms. The
increased information content of hyperspectral imaging has enabled improved
classification and quantification of targets of interest. However, recording
hyperspectral data for target classification is very time consuming.
We design a multimodal sensor platform that consists of a dual-panoramic
(or omnidirectional) peripheral vision system and a narrow field-of-view
hyperspectral fovea. Thus, we only need to capture hyperspectral images
in regions of interest. This design is inspired by the human vision
system, in which the peripheral retina is used to detect motion and the
center (or fovea) of the retina is used to distinguish color and objects.
The proposed intelligent sensor works as follows.
Two panchromatic images with a 360-degree field of view are generated
by rotating two line scanners around a common rotating axis, pointing
in two directions. Regions of interest containing moving targets can be
easily and quickly determined by applying background subtraction. The
next position and time of a moving target can be roughly estimated from
the difference between the two regions of the target in the images of the
two scanners. Then, a high-resolution hyperspectral fovea is
directed only to each small region of interest that contains a potentially
interesting target. A least-squares difference method with variation
normalization is used to classify the target spectra against training spectra.
Finally, the signatures of targets can be efficiently and effectively
determined in real time. We will evaluate our design and the proposed
algorithm under different scenarios involving different targets and
backgrounds using a hyperspectral scene simulation tool. Important issues
such as multimodal component integration, region of interest extraction,
target tracking, hyperspectral image analysis, and target signature
identification will be discussed in detail. This work is sponsored by
the AFOSR for Integrated Multi-Modal Sensing, Processing and Exploitation
under the Discovery Challenge Thrusts (DCTs) Program.
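Two steps of the proposed pipeline can be sketched compactly under simple assumptions. The first is an exponential running-average background model whose thresholded difference yields a bounding box for the region of interest. The second is a spectral classifier that normalizes each spectrum to zero mean and unit variance before taking the least-squares difference to each training spectrum. This is one plausible reading of the normalized least-squares classification step, not the authors' code. The threshold, update rate, and library contents are hypothetical.

```python
import numpy as np

def update_background(bg, frame, rate=0.05):
    """Exponential running-average background model."""
    return (1.0 - rate) * bg + rate * frame

def motion_roi(bg, frame, thresh=0.2):
    """Bounding box (row0, row1, col0, col1) of pixels differing from the background, or None."""
    mask = np.abs(frame - bg) > thresh
    if not mask.any():
        return None
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    return int(rows[0]), int(rows[-1]), int(cols[0]), int(cols[-1])

def normalize_spectrum(s):
    """Zero-mean, unit-variance spectrum, removing gain and offset differences."""
    s = np.asarray(s, dtype=float)
    return (s - s.mean()) / (s.std() + 1e-12)

def classify_spectrum(spectrum, library):
    """Name of the library spectrum with the smallest normalized least-squares difference."""
    z = normalize_spectrum(spectrum)
    errors = {name: float(np.sum((z - normalize_spectrum(ref)) ** 2))
              for name, ref in library.items()}
    return min(errors, key=errors.get)
```

Normalizing first means that a target spectrum seen under different illumination (a gain change) or with an additive offset still matches its training spectrum.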
|
| 1440 |
Closing remarks & preview of AIPR-2009
J. Irvine, General Chair
|