Scientific publications

SparCLeS: Dynamic $\ell_{1}$ Sparse Classifiers With Level Sets for Robust Beard/Moustache Detection and Segmentation

IEEE Transactions on Image Processing - Thu, 01/08/2013 - 00:00
Robust facial hair detection and segmentation is a highly valued soft biometric attribute for carrying out forensic facial analysis. In this paper, we propose a novel and fully automatic system, called SparCLeS, for beard/moustache detection and segmentation in challenging facial images. SparCLeS uses the multiscale self-quotient (MSQ) algorithm to preprocess facial images and deal with illumination variation. Histogram of oriented gradients (HOG) features are extracted from the preprocessed images and a dynamic sparse classifier is built using these features to classify a facial region as either containing skin or facial hair. A level set based approach, which makes use of the advantages of both global and local information, is then used to segment the regions of a face containing facial hair. Experimental results demonstrate the effectiveness of our proposed system in detecting and segmenting facial hair regions in images drawn from three databases, i.e., the NIST Multiple Biometric Grand Challenge (MBGC) still face database, the NIST Color Facial Recognition Technology FERET database, and the Labeled Faces in the Wild (LFW) database.

Cross-Domain Object Recognition Via Input-Output Kernel Analysis

IEEE Transactions on Image Processing - Thu, 01/08/2013 - 00:00
It is of great importance to investigate the domain adaptation problem of image object recognition, because now image data is available from a variety of source domains. To understand the changes in data distributions across domains, we study both the input and output kernel spaces for cross-domain learning situations, where most labeled training images are from a source domain and testing images are from a different target domain. To address the feature distribution change issue in the reproducing kernel Hilbert space induced by vector-valued functions, we propose a domain adaptive input-output kernel learning (DA-IOKL) algorithm, which simultaneously learns both the input and output kernels with a discriminative vector-valued decision function by reducing the data mismatch and minimizing the structural error. We also extend the proposed method to the cases of having multiple source domains. We examine two cross-domain object recognition benchmark data sets, and the proposed method consistently outperforms the state-of-the-art domain adaptation and multiple kernel learning methods.

Regularized Feature Reconstruction for Spatio-Temporal Saliency Detection

IEEE Transactions on Image Processing - Thu, 01/08/2013 - 00:00
Multimedia applications such as image or video retrieval, copy detection, and so forth can benefit from saliency detection, which is essentially a method to identify areas in images and videos that capture the attention of the human visual system. In this paper, we propose a new spatio-temporal saliency detection framework on the basis of regularized feature reconstruction. Specifically, for video saliency detection, both the temporal and spatial saliency detection are considered. For temporal saliency, we model the movement of the target patch as a reconstruction process using the patches in neighboring frames. A Laplacian smoothing term is introduced to model the coherent motion trajectories. With psychological findings that abrupt stimulus could cause a rapid and involuntary deployment of attention, our temporal model combines the reconstruction error, regularizer, and local trajectory contrast to measure the temporal saliency. For spatial saliency, a similar sparse reconstruction process is adopted to capture the regions with high center-surround contrast. Finally, the temporal saliency and spatial saliency are combined together to favor salient regions with high confidence for video saliency detection. We also apply the spatial saliency part of the spatio-temporal model to image saliency detection. Experimental results on a human fixation video dataset and an image saliency detection dataset show that our method achieves the best performance over several state-of-the-art approaches.

Texture Enhanced Histogram Equalization Using TV-$\mathrm{L}^{1}$ Image Decomposition

IEEE Transactions on Image Processing - Thu, 01/08/2013 - 00:00
Histogram transformation defines a class of image processing operations that are widely applied in the implementation of data normalization algorithms. In this paper, we present a new variational approach for image enhancement that is constructed to alleviate the intensity saturation effects that are introduced by standard contrast enhancement (CE) methods based on histogram equalization. We initially apply total variation (TV) minimization with an $\mathrm{L}^{1}$ fidelity term to decompose the input image into cartoon and texture components. In contrast to previous work that relies solely on the distribution of intensity information, we also employ the texture information to emphasize the contribution of local textural features in the CE process. This is achieved by implementing a nonlinear histogram warping CE strategy that is able to maximize the information content in the transformed image. Our experimental study addresses the CE of a wide variety of image data and comparative evaluations are provided to illustrate that our method produces better results than conventional CE strategies.
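
For orientation, the baseline this abstract improves on can be sketched as follows. This is a minimal sketch of classic global histogram equalization only, not the TV-L1 texture-aware method the paper proposes; the function name `equalize_histogram` and the flat-list image representation are assumptions made for illustration.

```python
from collections import Counter


def equalize_histogram(pixels, levels=256):
    """Classic global histogram equalization on a flat list of integer intensities."""
    if not pixels:
        return []
    n = len(pixels)
    counts = Counter(pixels)
    # Cumulative distribution: number of pixels with intensity <= v.
    cdf = {}
    running = 0
    for v in sorted(counts):
        running += counts[v]
        cdf[v] = running
    cdf_min = min(cdf.values())
    if n == cdf_min:
        # Constant image: there is nothing to stretch.
        return list(pixels)
    # Multiply before dividing so exact ratios round predictably.
    return [round((cdf[p] - cdf_min) * (levels - 1) / (n - cdf_min)) for p in pixels]


print(equalize_histogram([52, 55, 61, 59]))
```

Because the mapping depends only on the intensity distribution, large flat regions can be pushed toward the ends of the range, which is the saturation effect the abstract refers to.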

Gaussian Blurring-Invariant Comparison of Signals and Images

IEEE Transactions on Image Processing - Thu, 01/08/2013 - 00:00
We present a Riemannian framework for analyzing signals and images in a manner that is invariant to their level of blurriness, under Gaussian blurring. Using a well known relation between Gaussian blurring and the heat equation, we establish an action of the blurring group on image space and define an orthogonal section of this action to represent and compare images at the same blur level. This comparison is based on geodesic distances on the section manifold which, in turn, are computed using a path-straightening algorithm. The actual implementations use coefficients of images under a truncated orthonormal basis and the blurring action corresponds to exponential decays of these coefficients. We demonstrate this framework using a number of experimental results, involving 1D signals and 2D images. As a specific application, we study the effect of blurring on the recognition performance when 2D facial images are used for recognizing people.
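
The statement that blurring corresponds to exponential decay of basis coefficients can be illustrated with a small sketch. It assumes a Fourier-type basis in which coefficient `k` belongs to frequency `k`, so that heat flow for time `t` scales it by exp(-k^2 t); the basis actually used in the paper, and the name `blur_coefficients`, are not given in the abstract and are assumptions here.

```python
import math


def blur_coefficients(coeffs, t):
    """Apply heat-flow blurring for time t to coefficients of a Fourier-type basis.

    Assumption: coeffs[k] multiplies a basis function of frequency k, so the
    heat equation u_t = u_xx damps it by exp(-k^2 * t).
    """
    return [c * math.exp(-(k ** 2) * t) for k, c in enumerate(coeffs)]


coeffs = [1.0, 0.5, -0.25, 0.125]
once = blur_coefficients(coeffs, 0.3)
twice = blur_coefficients(blur_coefficients(coeffs, 0.1), 0.2)
# Blurring by 0.1 and then by 0.2 should match blurring by 0.3 in one step.
print(all(math.isclose(a, b) for a, b in zip(once, twice)))
```

The check at the end reflects the group-action property the abstract relies on: successive blurs compose by adding their blur levels.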

Fast SIFT Design for Real-Time Visual Feature Extraction

IEEE Transactions on Image Processing - Thu, 01/08/2013 - 00:00
Visual feature extraction with scale invariant feature transform (SIFT) is widely used for object recognition. However, its real-time implementation suffers from long latency, heavy computation, and high memory storage because of its frame level computation with iterated Gaussian blur operations. Thus, this paper proposes a layer parallel SIFT (LPSIFT) with integral image, and its parallel hardware design with an on-the-fly feature extraction flow for real-time application needs. Compared with the original SIFT algorithm, the proposed approach reduces the computational amount by 90% and memory usage by 95%. The final implementation uses 580-K gate count with 90-nm CMOS technology, and offers 6000 feature points/frame for VGA images at 30 frames/s and $\sim$2000 feature points/frame for 1920$\times$1080 images at 30 frames/s at the clock rate of 100 MHz.
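
The integral image mentioned above is a standard structure, sketched below. The abstract does not say how LPSIFT uses it; a plausible reading, stated here as an assumption, is that constant-time box sums stand in for part of the iterated Gaussian blurring. The names `integral_image` and `box_sum` are illustrative.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) summed-area table with a zero top row and left column."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_sum(sat, top, left, bottom, right):
    """Sum of rows top..bottom-1 and columns left..right-1 using four lookups."""
    return sat[bottom][right] - sat[top][right] - sat[bottom][left] + sat[top][left]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
sat = integral_image(img)
print(box_sum(sat, 1, 1, 3, 3))  # sum of the lower-right 2x2 block
```

Once the table is built in a single pass, the cost of a box sum no longer depends on the box size.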

Artistic Image Analysis Using Graph-Based Learning Approaches

IEEE Transactions on Image Processing - Thu, 01/08/2013 - 00:00
We introduce a new methodology for the problem of artistic image analysis, which among other tasks, involves the automatic identification of visual classes present in an art work. In this paper, we advocate the idea that artistic image analysis must explore a graph that captures the network of artistic influences by computing the similarities in terms of appearance and manual annotation. One of the novelties of our methodology is the proposed formulation that is a principled way of combining these two similarities in a single graph. Using this graph, we show that an efficient random walk algorithm based on an inverted label propagation formulation produces more accurate annotation and retrieval results compared with the following baseline algorithms: bag of visual words, label propagation, matrix completion, and structural learning. We also show that the proposed approach leads to more efficient inference and training procedures. This experiment is run on a database containing 988 artistic images (with 49 visual classification problems divided into a multiclass problem with 27 classes and 48 binary problems), where we show the inference and training running times, and quantitative comparisons with respect to several retrieval and annotation performance measures.

Self-Supervised Online Metric Learning With Low Rank Constraint for Scene Categorization

IEEE Transactions on Image Processing - Thu, 01/08/2013 - 00:00
Conventional visual recognition systems usually train an image classifier in a batch mode with all training data provided in advance. However, in many practical applications, only a small number of training samples are available in the beginning and many more arrive sequentially during online recognition. Because the image data characteristics could change over time, it is important for the classifier to adapt to the new data incrementally. In this paper, we present an online metric learning method to address the online scene recognition problem via adaptive similarity measurement. Given a number of labeled data followed by a sequential input of unseen testing samples, the similarity metric is learned to maximize the margin of the distance among different classes of samples. By considering the low rank constraint, our online metric learning model not only can provide competitive performance compared with the state-of-the-art methods, but also guarantees convergence. A bi-linear graph is also defined to model the pair-wise similarity, and an unseen sample is labeled depending on the graph-based label propagation, while the model can also self-update using the more confident new samples. With its incremental self-updating, our methodology can handle large-scale streaming video data. We apply our model to online scene categorization; experiments on various benchmark datasets and comparisons with state-of-the-art methods demonstrate the effectiveness and efficiency of our algorithm.

Nonlocal Regularization of Inverse Problems: A Unified Variational Framework

IEEE Transactions on Image Processing - Thu, 01/08/2013 - 00:00
We introduce a unifying energy minimization framework for nonlocal regularization of inverse problems. In contrast to the weighted sum of square differences between image pixels used by current schemes, the proposed functional is an unweighted sum of inter-patch distances. We use robust distance metrics that promote the averaging of similar patches, while discouraging the averaging of dissimilar patches. We show that the first iteration of a majorize–minimize algorithm to minimize the proposed cost function is similar to current nonlocal methods. The reformulation thus provides a theoretical justification for the heuristic approach of iterating nonlocal schemes, which re-estimate the weights from the current image estimate. Thanks to the reformulation, we now understand that the widely reported alias amplification associated with iterative nonlocal methods is caused by convergence to a local minimum of the nonconvex penalty. We introduce an efficient continuation strategy to overcome this problem. The similarity of the proposed criterion to widely used nonquadratic penalties (e.g., total variation and $\ell_{p}$ semi-norms) opens the door to the adaptation of fast algorithms developed in the context of compressive sensing; we introduce several novel algorithms to solve the proposed nonlocal optimization problem. Thanks to the unifying framework, these fast algorithms are readily applicable for a large class of distance metrics.

Corner Detection and Classification Using Anisotropic Directional Derivative Representations

IEEE Transactions on Image Processing - Thu, 01/08/2013 - 00:00
This paper proposes a corner detector and classifier using anisotropic directional derivative (ANDD) representations. The ANDD representation at a pixel is a function of the oriented angle and characterizes the local directional grayscale variation around the pixel. The proposed corner detector fuses the ideas of contour- and intensity-based detection. It consists of three cascaded blocks. First, the edge map of an image is obtained by the Canny detector, and contours are extracted from it and patched. Next, the ANDD representation at each pixel on the contours is calculated and normalized by its maximal magnitude. The area enclosed by the normalized ANDD representation forms a new corner measure. Finally, nonmaximum suppression and thresholding are applied to each contour to find corners in terms of the corner measure. Moreover, a corner classifier based on the number of peaks of the ANDD representation is given. Experiments are conducted to evaluate the proposed detector and classifier. The proposed detector is competitive with two recent state-of-the-art corner detectors, the He & Yung detector and the CPDA detector, in detection capability and attains higher repeatability under affine transforms. The proposed classifier can effectively discriminate simple corners, Y-type corners, and higher order corners.

Classification of Time Series of Multispectral Images With Limited Training Data

IEEE Transactions on Image Processing - Thu, 01/08/2013 - 00:00
Image classification usually requires the availability of reliable reference data collected for the considered image to train supervised classifiers. Unfortunately, when time series of images are considered, this is seldom possible because of the costs associated with reference data collection. In most applications it is realistic to have reference data available for one or a few images of a time series acquired on the area of interest. In this paper, we present a novel system for automatically classifying image time series that takes advantage of image(s) with associated reference information (i.e., the source domain) to classify image(s) for which reference information is not available (i.e., the target domain). The proposed system exploits the already available knowledge on the source domain and, when possible, integrates it with a minimum amount of new labeled data for the target domain. In addition, it is able to handle possible significant differences between statistical distributions of the source and target domains. Here, the method is presented in the context of classification of remote sensing image time series, where ground reference data collection is a highly critical and demanding task. Experimental results show the effectiveness of the proposed technique. The method can work on multimodal (e.g., multispectral) images.

Fast $\ell_{1}$-Minimization Algorithms for Robust Face Recognition

IEEE Transactions on Image Processing - Thu, 01/08/2013 - 00:00
$\ell_{1}$-minimization refers to finding the minimum $\ell_{1}$-norm solution to an underdetermined linear system $\boldsymbol{b}=A\boldsymbol{x}$. Under certain conditions as described in compressive sensing theory, the minimum $\ell_{1}$-norm solution is also the sparsest solution. In this paper, we study the speed and scalability of its algorithms. In particular, we focus on the numerical implementation of a sparsity-based classification framework in robust face recognition, where sparse representation is sought to recover human identities from high-dimensional facial images that may be corrupted by illumination, facial disguise, and pose variation. Although the underlying numerical problem is a linear program, traditional algorithms are known to suffer poor scalability for large-scale applications. We investigate a new solution based on a classical convex optimization framework, known as augmented Lagrangian methods. We conduct extensive experiments to validate and compare its performance against several popular $\ell_{1}$-minimization solvers, including interior-point method, Homotopy, FISTA, SESOP-PCD, approximate message passing, and TFOCS. To aid peer evaluation, the code for all the algorithms has been made publicly available.
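
To make the idea of sparsity from an l1 penalty concrete, here is a minimal sketch of iterative soft-thresholding (ISTA), the unaccelerated relative of the FISTA solver named above. It is not the augmented Lagrangian method the paper investigates, and it solves the penalized form 0.5 * ||Ax - b||^2 + lam * ||x||_1 rather than the equality-constrained problem; the toy system, `lam`, `step`, and the function names are all assumptions chosen for illustration.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * |.|: shrink v toward zero by tau."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def ista(A, b, lam, step, iters=500):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 by iterative soft-thresholding.

    step should be at most 1 / (largest eigenvalue of A^T A).
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(n)]
    return x


# Underdetermined toy system: 2 equations, 3 unknowns.
A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
b = [1.0, 1.0]
x = ista(A, b, lam=0.1, step=0.3)
print([round(v, 3) for v in x])
```

Both (1, 1, 0) and (0, 0, 1) satisfy the toy system exactly, and the l1 penalty steers the iteration to the one with a single nonzero entry, slightly shrunk by the penalty.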

Robust Face Representation Using Hybrid Spatial Feature Interdependence Matrix

IEEE Transactions on Image Processing - Thu, 01/08/2013 - 00:00
A key issue in face recognition is to seek an effective descriptor for representing face appearance. In the context of considering the face image as a set of small facial regions, this paper presents a new face representation approach coined spatial feature interdependence matrix (SFIM). Unlike classical face descriptors which usually use a hierarchically organized or a sequentially concatenated structure to describe the spatial layout features extracted from local regions, SFIM is attributed to the exploitation of the underlying feature interdependences regarding local region pairs inside a class specific face. According to SFIM, the face image is projected onto an undirected connected graph in a manner that explicitly encodes feature interdependence-based relationships between local regions. We calculate the pair-wise interdependence strength as the weighted discrepancy between two feature sets extracted in a hybrid feature space fusing histograms of intensity, local binary pattern and oriented gradients. To achieve the goal of face recognition, our SFIM-based face descriptor is embedded in three different recognition frameworks, namely nearest neighbor search, subspace-based classification, and linear optimization-based classification. Extensive experimental results on four well-known face databases and comprehensive comparisons with the state-of-the-art results are provided to demonstrate the efficacy of the proposed SFIM-based descriptor.

Motion Estimation Using the Correlation Transform

IEEE Transactions on Image Processing - Thu, 01/08/2013 - 00:00
The zero-mean normalized cross-correlation is shown to improve the accuracy of optical flow, but its analytical form is quite complicated for the variational framework. This paper addresses this issue and presents a new direct approach to this matching measure. Our approach uses the correlation transform to define very discriminative descriptors that are pre-computed and that have to be matched in the target frame. It is equivalent to the computation of the optical flow for the correlation transforms of the images. The smoothness energy is non-local and uses a robust penalty in order to preserve motion discontinuities. The model is associated with a fast and parallelizable minimization procedure based on the projected-proximal point algorithm. The experiments confirm the strength of this model and implicitly demonstrate the correctness of our solution. The results demonstrate that the involved data term is very robust with respect to changes in illumination, especially where large illumination changes exist.
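
The matching measure named at the start of this abstract can be sketched directly. This is a minimal sketch of zero-mean normalized cross-correlation between two patches stored as flat lists; it does not reproduce the paper's correlation transform or its variational model, and the function name `zncc` is an assumption.

```python
import math


def zncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length patches."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    norm = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if norm == 0.0:
        return 0.0  # A constant patch has no texture to correlate.
    return sum(x * y for x, y in zip(da, db)) / norm


patch = [10.0, 20.0, 15.0, 40.0]
brighter = [2.0 * v + 30.0 for v in patch]  # gain and offset change
print(zncc(patch, brighter))
```

Subtracting the mean cancels an additive offset and the normalization cancels a positive gain, which is why the measure tolerates this kind of illumination change.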

Single Image Dehazing by Multi-Scale Fusion

IEEE Transactions on Image Processing - Thu, 01/08/2013 - 00:00
Haze is an atmospheric phenomenon that significantly degrades the visibility of outdoor scenes. This is mainly due to the atmospheric particles that absorb and scatter the light. This paper introduces a novel single image approach that enhances the visibility of such degraded images. Our method is a fusion-based strategy that derives two inputs from the original hazy image by applying a white balance and a contrast enhancing procedure. To blend effectively the information of the derived inputs to preserve the regions with good visibility, we filter their important features by computing three measures (weight maps): luminance, chromaticity, and saliency. To minimize artifacts introduced by the weight maps, our approach is designed in a multiscale fashion, using a Laplacian pyramid representation. We are the first to demonstrate the utility and effectiveness of a fusion-based technique for dehazing based on a single degraded image. The method performs in a per-pixel fashion, which is straightforward to implement. The experimental results demonstrate that the method yields results comparable to and even better than the more complex state-of-the-art techniques, having the advantage of being appropriate for real-time applications.

Joint Sparse Learning for 3-D Facial Expression Generation

IEEE Transactions on Image Processing - Thu, 01/08/2013 - 00:00
3-D facial expression generation, including synthesis and retargeting, has received intensive attention in recent years, because it is important to produce realistic 3-D faces with specific expressions in modern film production and computer games. In this paper, we present joint sparse learning (JSL) to learn mapping functions and their respective inverses to model the relationship between the high-dimensional 3-D faces (of different expressions and identities) and their corresponding low-dimensional representations. Based on JSL, we can effectively and efficiently generate various expressions of a 3-D face by either synthesizing or retargeting. Furthermore, JSL is able to restore 3-D faces with holes by learning a mapping function between incomplete and intact data. Experimental results on a wide range of 3-D faces demonstrate the effectiveness of the proposed approach in comparison with representative methods in terms of quality, time cost, and robustness.

Robust Model for Segmenting Images With/Without Intensity Inhomogeneities

IEEE Transactions on Image Processing - Thu, 01/08/2013 - 00:00
Intensity inhomogeneities and different types/levels of image noise are the two major obstacles to accurate image segmentation by region-based level set models. To provide a more general solution to these challenges, we propose a novel segmentation model that considers global and local image statistics to eliminate the influence of image noise and to compensate for intensity inhomogeneities. In our model, the global energy derived from a Gaussian model estimates the intensity distribution of the target object and background; the local energy derived from the mutual influences of neighboring pixels can eliminate the impact of image noise and intensity inhomogeneities. The robustness of our method is validated on segmenting synthetic images with/without intensity inhomogeneities, and with different types/levels of noise, including Gaussian noise, speckle noise, and salt and pepper noise, as well as images from different medical imaging modalities. Quantitative experimental comparisons demonstrate that our method is more robust and more accurate in segmenting the images with intensity inhomogeneities than the local binary fitting technique and its more recent systematic model. Our technique also outperformed the region-based Chan–Vese model when dealing with images without intensity inhomogeneities and produced better segmentation results than graph-based algorithms, including graph-cuts and random walker, when segmenting noisy images.

Learning Prototype Hyperplanes for Face Verification in the Wild

IEEE Transactions on Image Processing - Thu, 01/08/2013 - 00:00
In this paper, we propose a new scheme called Prototype Hyperplane Learning (PHL) for face verification in the wild using only weakly labeled training samples (i.e., we only know whether each pair of samples are from the same class or different classes without knowing the class label of each sample) by leveraging a large number of unlabeled samples in a generic data set. Our scheme represents each sample in the weakly labeled data set as a mid-level feature with each entry as the corresponding decision value from the classification hyperplane (referred to as the prototype hyperplane) of one Support Vector Machine (SVM) model, in which a sparse set of support vectors is selected from the unlabeled generic data set based on the learnt combination coefficients. To learn the optimal prototype hyperplanes for the extraction of mid-level features, we propose a Fisher's Linear Discriminant-like (FLD-like) objective function by maximizing the discriminability on the weakly labeled data set with a constraint enforcing sparsity on the combination coefficients of each SVM model, which is solved by using an alternating optimization method. Then, we use the recent work called Side-Information based Linear Discriminant (SILD) analysis for dimensionality reduction and a cosine similarity measure for final face verification. Comprehensive experiments on two data sets, Labeled Faces in the Wild (LFW) and YouTube Faces, demonstrate the effectiveness of our scheme.

Fast Computation of Rotation-Invariant Image Features by an Approximate Radial Gradient Transform

IEEE Transactions on Image Processing - Thu, 01/08/2013 - 00:00
We present the radial gradient transform (RGT) and a fast approximation, the approximate RGT (ARGT). We analyze the effects of the approximation on gradient quantization and histogramming. The ARGT is incorporated into the rotation-invariant fast feature (RIFF) algorithm. We demonstrate that, using the ARGT, RIFF extracts features 16× faster than SURF while achieving a similar performance for image matching and retrieval.
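
The abstract does not define the RGT, so the sketch below rests on an assumption: that the transform expresses each gradient in a local frame made of the radial direction from the patch center and the tangential direction perpendicular to it. The fast approximation (ARGT) is not reproduced, and the names `radial_gradient` and `rotate` are illustrative.

```python
import math


def radial_gradient(gx, gy, px, py, cx, cy):
    """Express gradient (gx, gy) at (px, py) in a radial/tangential frame about (cx, cy)."""
    rx, ry = px - cx, py - cy
    length = math.hypot(rx, ry)
    if length == 0.0:
        raise ValueError("the frame is undefined at the patch center")
    rx, ry = rx / length, ry / length  # radial unit vector
    tx, ty = -ry, rx                   # tangential unit vector (radial rotated by 90 degrees)
    return gx * rx + gy * ry, gx * tx + gy * ty


def rotate(x, y, theta):
    """Rotate the vector (x, y) counterclockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return c * x - s * y, s * x + c * y


# Rotating the patch about its center rotates both the point and its gradient.
before = radial_gradient(0.8, -0.3, 3.0, 1.0, 0.0, 0.0)
px, py = rotate(3.0, 1.0, 0.7)
gx, gy = rotate(0.8, -0.3, 0.7)
after = radial_gradient(gx, gy, px, py, 0.0, 0.0)
print(before, after)
```

Under this assumed definition the two printed pairs agree up to rounding, since the local frame turns together with the patch; this is the property that lets gradient histograms be built without estimating a dominant orientation.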

A Continuous Method for Reducing Interpolation Artifacts in Mutual Information-Based Rigid Image Registration

IEEE Transactions on Image Processing - Thu, 01/08/2013 - 00:00
We propose an approach for computing mutual information in rigid multimodality image registration. Images to be registered are modeled as functions defined on a continuous image domain. Analytic forms of the probability density functions for the images and the joint probability density function are first defined in 1D. We describe how the entropies of the images, the joint entropy, and mutual information can be computed accurately by a numerical method. We then extend the method to 2D and 3D. The mutual information function generated is smooth and does not seem to have the typical interpolation artifacts that are commonly observed in other standard models. The relationship between the proposed method and the partial volume (PV) model is described. In addition, we give a theoretical analysis to explain the nonsmoothness of the mutual information function computed by the PV model. Numerical experiments in 2D and 3D are presented to illustrate the smoothness of the mutual information function, which leads to robust and accurate numerical convergence results for solving the image registration problem.