IEEE Transactions on Image Processing
Stable Orthogonal Local Discriminant Embedding for Linear Dimensionality Reduction
Manifold learning is widely used in machine learning and pattern recognition. However, manifold learning only considers the similarity of samples belonging to the same class and ignores the within-class variation of data, which impairs the generalization and stability of the algorithms. To address this, we construct an adjacency graph to model the intraclass variation that characterizes the most important properties, such as diversity of patterns, and then incorporate the diversity into the discriminant objective function for linear dimensionality reduction. Finally, we introduce the orthogonal constraint for the basis vectors and propose an orthogonal algorithm called stable orthogonal local discriminant embedding. Experimental results on several standard image databases demonstrate the effectiveness of the proposed dimensionality reduction approach.
Categories: Scientifical publications
Motion-Aware Gradient Domain Video Composition
For images, gradient domain composition methods like Poisson blending offer practical solutions for uncertain object boundaries and differences in illumination conditions. However, adapting Poisson image blending to video presents new challenges due to the added temporal dimension. In video, the human eye is sensitive to small changes in blending boundaries across frames and slight differences in motions of the source patch and target video. We present a novel video blending approach that tackles these problems by merging the gradient of source and target videos and optimizing a consistent blending boundary based on a user-provided blending trimap for the source video. Our approach extends mean-value coordinates interpolation to support hybrid blending with a dynamic boundary while maintaining interactive performance. We also provide a user interface and source object positioning method that can efficiently deal with complex video sequences beyond the capabilities of alpha blending.
Categories: Scientifical publications
Structural Texture Similarity Metrics for Image Analysis and Retrieval
We develop new metrics for texture similarity that account for human visual perception and the stochastic nature of textures. The metrics rely entirely on local image statistics and allow substantial point-by-point deviations between textures that according to human judgment are essentially identical. The proposed metrics extend the ideas of structural similarity and are guided by research in texture analysis-synthesis. They are implemented using a steerable filter decomposition and incorporate a concise set of subband statistics, computed globally or in sliding windows. We conduct systematic tests to investigate metric performance in the context of “known-item search,” the retrieval of textures that are “identical” to the query texture. This eliminates the need for cumbersome subjective tests, thus enabling comparisons with human performance on a large database. Our experimental results indicate that the proposed metrics outperform peak signal-to-noise ratio (PSNR), structural similarity metric (SSIM) and its variations, as well as state-of-the-art texture classification metrics, using standard statistical measures.
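To illustrate why a point-by-point measure such as PSNR can disagree with human judgment on textures, the following minimal sketch compares a checkerboard with a copy shifted by one pixel. The two look like the same texture and share the same global mean and variance, yet every pixel differs. The helper names (`psnr`, `checkerboard`, `mean_and_variance`) are hypothetical and chosen for this example; the code does not implement the proposed metrics, which rely on steerable-filter subband statistics.

```python
import math


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized 2D lists."""
    count = len(a) * len(a[0])
    mse = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / count
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


def checkerboard(size, shift=0):
    """Binary checkerboard texture; `shift` moves the pattern by whole pixels."""
    return [[255 if (r + c + shift) % 2 == 0 else 0 for c in range(size)]
            for r in range(size)]


def mean_and_variance(img):
    """Global first- and second-order statistics of the image."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return mean, var


texture = checkerboard(8)
shifted = checkerboard(8, shift=1)

# Every pixel flips between 0 and 255, so the MSE equals the squared peak.
print(psnr(texture, shifted))            # 0.0
# The global statistics are nevertheless identical.
print(mean_and_variance(texture) == mean_and_variance(shifted))  # True
```

A statistics-based comparison would treat these two images as matching, which is the behavior the abstract asks of a texture similarity metric.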
Categories: Scientifical publications
Simultaneous Facial Feature Tracking and Facial Expression Recognition
The tracking and recognition of facial activities from images or videos have attracted great attention in the computer vision field. Facial activities are characterized by three levels. First, in the bottom level, facial feature points around each facial component, e.g., eyebrow or mouth, capture the detailed face shape information. Second, in the middle level, facial action units, defined in the facial action coding system, represent the contraction of a specific set of facial muscles, e.g., lid tightener or eyebrow raiser. Finally, in the top level, six prototypical facial expressions represent the global facial muscle movement and are commonly used to describe the human emotion states. In contrast to the mainstream approaches, which usually only focus on one or two levels of facial activities, and track (or recognize) them separately, this paper introduces a unified probabilistic framework based on the dynamic Bayesian network to simultaneously and coherently represent the facial evolution in different levels, their interactions and their observations. Advanced machine learning methods are introduced to learn the model based on both training data and subjective prior knowledge. Given the model and the measurements of facial motions, all three levels of facial activities are simultaneously recognized through a probabilistic inference. Extensive experiments are performed to illustrate the feasibility and effectiveness of the proposed model for all three levels of facial activities.
Categories: Scientifical publications
A Generalized Random Walk With Restart and its Application in Depth Up-Sampling and Interactive Segmentation
In this paper, the origin of random walk with restart (RWR) and its generalization are described. It is well known that the random walk (RW) and the anisotropic diffusion models share the same energy functional, i.e., the former provides a steady-state solution and the latter gives a flow solution. In contrast, the theoretical background of the RWR scheme is different from that of the diffusion-reaction equation, although the restarting term of the RWR plays a role similar to the reaction term of the diffusion-reaction equation. The behaviors of the two approaches with respect to outliers reveal that they possess different attributes in terms of data propagation. This observation leads to the derivation of a new energy functional, where both volumetric heat capacity and thermal conductivity are considered together, and provides a common framework that unifies both the RW and the RWR approaches, in addition to other regularization methods. The proposed framework allows the RWR to be generalized (GRWR) in semilocal and nonlocal forms. The experimental results demonstrate the superiority of GRWR over existing regularization approaches in terms of depth map up-sampling and interactive image segmentation.
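For readers unfamiliar with the baseline, the standard (non-generalized) RWR steady state can be sketched as a power iteration on r = (1 - c) P^T r + c e, where P is the row-normalized affinity matrix, e is the indicator of the seed node, and c is the restart probability. This is a minimal sketch of the textbook formulation only, not the GRWR proposed in the paper; the function name, the value c = 0.5, and the 4-node chain graph are assumptions made for illustration.

```python
def random_walk_with_restart(weights, seed, restart=0.5, iters=200):
    """Approximate the RWR steady state by power iteration.

    weights: symmetric non-negative affinity matrix (list of lists).
    seed:    index of the node the walker restarts from.
    restart: probability c of jumping back to the seed at each step.
    """
    n = len(weights)
    # Row-normalize affinities into transition probabilities P[j][i].
    trans = [[w / sum(row) for w in row] for row in weights]
    e = [1.0 if i == seed else 0.0 for i in range(n)]
    r = e[:]
    for _ in range(iters):
        r = [
            (1.0 - restart) * sum(trans[j][i] * r[j] for j in range(n))
            + restart * e[i]
            for i in range(n)
        ]
    return r


# Hypothetical 4-node chain graph: 0 - 1 - 2 - 3.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(W, seed=0)
print(scores)
```

With this restart probability the scores form a probability distribution that decays with distance from the seed, which is what makes RWR usable as a relevance score in seeded segmentation.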
Categories: Scientifical publications
Variational Optical Flow Estimation Based on Stick Tensor Voting
Variational optical flow techniques allow the estimation of flow fields from spatio-temporal derivatives. They are based on minimizing a functional that contains a data term and a regularization term. Recently, numerous approaches have been presented for improving the accuracy of the estimated flow fields. Among them, tensor voting has been shown to be particularly effective in the preservation of flow discontinuities. This paper presents an adaptation of the data term by using anisotropic stick tensor voting in order to gain robustness against noise and outliers with significantly lower computational cost than (full) tensor voting. In addition, an anisotropic complementary smoothness term depending on directional information estimated through stick tensor voting is utilized in order to preserve discontinuities in the estimated flow fields. Finally, a weighted non-local term that depends on both the estimated directional information and the occlusion state of pixels is integrated during the optimization process in order to denoise the final flow field. The proposed approach yields state-of-the-art results on the Middlebury benchmark.
Categories: Scientifical publications
Exploring Visual and Motion Saliency for Automatic Video Object Extraction
This paper presents a saliency-based video object extraction (VOE) framework. The proposed framework aims to automatically extract foreground objects of interest without any user interaction or the use of any training data (i.e., not limited to any particular type of object). To separate foreground and background regions within and across video frames, the proposed method utilizes visual and motion saliency information extracted from the input video. A conditional random field is applied to effectively combine the saliency induced features, which allows us to deal with unknown pose and scale variations of the foreground object (and its articulated parts). Based on the ability to preserve both spatial continuity and temporal consistency in the proposed VOE framework, experiments on a variety of videos verify that our method is able to produce quantitatively and qualitatively satisfactory VOE results.
Categories: Scientifical publications
Enhanced Compressed Sensing Recovery With Level Set Normals
We propose a compressive sensing algorithm that exploits geometric properties of images to recover images of high quality from few measurements. The image reconstruction is done by iterating the two following steps: 1) estimation of normal vectors of the image level curves, and 2) reconstruction of an image fitting the normal vectors, the compressed sensing measurements, and the sparsity constraint. The proposed technique can naturally extend to nonlocal operators and graphs to exploit the repetitive nature of textured images to recover fine detail structures. In both cases, the problem is reduced to a series of convex minimization problems that can be efficiently solved with a combination of variable splitting and augmented Lagrangian methods, leading to fast and easy-to-code algorithms. Extended experiments show a clear improvement over related state-of-the-art algorithms in the quality of the reconstructed images and the robustness of the proposed method to noise, different kinds of images, and reduced measurements.
Categories: Scientifical publications
Colorization-Based Compression Using Optimization
In this paper, we formulate the colorization-based coding problem into an optimization problem, i.e., an $L_{1}$ minimization problem. In colorization-based coding, the encoder chooses a few representative pixels (RP) for which the chrominance values and the positions are sent to the decoder, whereas in the decoder, the chrominance values for all the pixels are reconstructed by colorization methods. The main issue in colorization-based coding is how to extract the RP well so that the compression rate and the quality of the reconstructed color image are good. By formulating the colorization-based coding into an $L_{1}$ minimization problem, it is guaranteed that, given the colorization matrix, the chosen set of RP becomes the optimal set in the sense that it minimizes the error between the original and the reconstructed color image. In other words, for a fixed error value and a given colorization matrix, the chosen set of RP is the smallest set possible. We also propose a method to construct the colorization matrix that colorizes the image in a multiscale manner. This, combined with the proposed RP extraction method, allows us to choose a very small set of RP. It is shown experimentally that the proposed method outperforms conventional colorization-based coding methods as well as the JPEG standard and is comparable with the JPEG2000 compression standard, both in terms of the compression rate and the quality of the reconstructed color image.
Categories: Scientifical publications
Orientation Imaging Microscopy With Optimized Convergence Angle Using CBED Patterns in TEMs
Grain size statistics, texture, and grain boundary distribution are microstructural characteristics that greatly influence materials properties. These characteristics can be derived from an orientation map obtained using orientation imaging microscopy (OIM) techniques. For nanomaterials, the OIM techniques are generally performed using a transmission electron microscope (TEM). Although some of these techniques have limited applicability in certain situations, others have limited availability because of external hardware required. In this paper, an automated method to generate orientation maps using convergent beam electron diffraction patterns obtained in a conventional TEM setup is presented. This method is based upon dynamical diffraction theory that describes electron diffraction more accurately as compared with kinematical theory used by several existing OIM techniques. In addition, the method of this paper uses wide angle convergent beam electron diffraction for performing OIM. It is shown in this paper that the use of the wide angle convergent electron beam provides additional information that is not available otherwise. Together, the presented method exploits the additional information and combines it with the calculations from the dynamical theory to provide accurate orientation maps in a conventional TEM setup. The automated method of this paper is applied to a platinum thin film sample. The presented method correctly identified the texture preference in the sample.
Categories: Scientifical publications
Grassmannian Regularized Structured Multi-View Embedding for Image Classification
Images are usually represented by features from multiple views, e.g., color and texture. In image classification, the goal is to fuse all the multi-view features in a reasonable manner and achieve satisfactory classification performance. However, the features are often different in nature and it is nontrivial to fuse them. Particularly, some extracted features are redundant or noisy and are consequently not discriminative for classification. To alleviate these problems in an image classification context, we propose in this paper a novel multi-view embedding framework, termed Grassmannian regularized structured multi-view embedding, or GrassReg for short. GrassReg transfers the graph Laplacian obtained from each view to a point on the Grassmann manifold and penalizes the disagreement between different views according to Grassmannian distance. Therefore, a view that is consistent with others is more important than a view that disagrees with others for learning a unified subspace for multi-view data representation. In addition, we impose the group sparsity penalty onto the low-dimensional embeddings obtained so that they can better explore the group structure of the intrinsic data distribution. Empirically, we compare GrassReg with representative multi-view algorithms and show the effectiveness of GrassReg on a number of multi-view image data sets.
Categories: Scientifical publications
Efficient Minimum Error Bounded Particle Resampling L1 Tracker With Occlusion Detection
Recently, sparse representation has been applied to visual tracking to find the target with the minimum reconstruction error from a target template subspace. Though effective, these L1 trackers require high computational costs due to numerous calculations for $\ell_{1}$ minimization. In addition, the inherent occlusion insensitivity of the $\ell_{1}$ minimization has not been fully characterized. In this paper, we propose an efficient L1 tracker, named bounded particle resampling (BPR)-L1 tracker, with a minimum error bound and occlusion detection. First, the minimum error bound is calculated from a linear least squares equation and serves as a guide for particle resampling in a particle filter (PF) framework. Most of the insignificant samples are removed before solving the computationally expensive $\ell_{1}$ minimization in a two-step testing. The first step, named $\tau$ testing, compares the sample observation likelihood to an ordered set of thresholds to remove insignificant samples without loss of resampling precision. The second step, named max testing, identifies the largest sample probability relative to the target to further remove insignificant samples without altering the tracking result of the current frame. Though sacrificing minimal precision during resampling, max testing achieves significant speed up on top of $\tau$ testing. The BPR-L1 technique can also be beneficial to other trackers that have minimum error bounds in a PF framework, especially for trackers based on sparse representations. After the error-bound calculation, BPR-L1 performs occlusion detection by investigating the trivial coefficients in the $\ell_{1}$ minimization. These coefficients, by design, contain rich information about image corruptions, including occlusion. Detected occlusions are then used to enhance the template updating.
For evaluation, we conduct experiments on three video applications: biometrics (head movement, hand holding object, singers on stage), pedestrians (urban travel, hallway monitoring), and cars in traffic (wide area motion imagery, ground-mounted perspectives). The proposed BPR-L1 method demonstrates excellent performance compared with nine state-of-the-art trackers on eleven challenging benchmark sequences.
Categories: Scientifical publications
Multiview Hessian Regularization for Image Annotation
The rapid development of computer hardware and Internet technology makes large scale data dependent models computationally tractable, and opens a bright avenue for annotating images through innovative machine learning algorithms. Semisupervised learning (SSL) has therefore received intensive attention in recent years and has been successfully deployed in image annotation. One representative work in SSL is Laplacian regularization (LR), which smoothes the conditional distribution for classification along the manifold encoded in the graph Laplacian. However, it is observed that LR biases the classification function toward a constant function that possibly results in poor generalization. In addition, LR is developed to handle uniformly distributed data (or single-view data), although instances or objects, such as images and videos, are usually represented by multiview features, such as color, shape, and texture. In this paper, we present multiview Hessian regularization (mHR) to address the above two problems in LR-based image annotation. In particular, mHR optimally combines multiple HR, each of which is obtained from a particular view of instances, and steers the classification function that varies linearly along the data manifold. We apply mHR to kernel least squares and support vector machines as two examples for image annotation. Extensive experiments on the PASCAL VOC'07 dataset validate the effectiveness of mHR by comparing it with baseline algorithms, including LR and HR.
Categories: Scientifical publications
GPU Accelerated Edge-Region Based Level Set Evolution Constrained by 2D Gray-Scale Histogram
Because it naturally handles complex shapes and topological changes, the level set method (LSM) has been widely used in image segmentation. Nevertheless, LSM is computationally expensive, which limits its applications in real-time systems. To address this, we propose a new level set algorithm, which simultaneously uses edge, region, and 2D histogram information in order to efficiently segment objects of interest in a given scene. The computational complexity of the proposed LSM is greatly reduced by using the highly parallelizable lattice Boltzmann method (LBM) with a body force to solve the level set equation (LSE). The body force is the link with image data and is defined from the proposed LSE. The proposed LSM is then implemented on an NVIDIA graphics processing unit to take full advantage of the local nature of the LBM. The new algorithm is effective, robust against noise, independent of the initial contour, fast, and highly parallelizable. The edge and region information makes it possible to detect objects with and without edges, and the 2D histogram information makes the method effective in a noisy environment. Experimental results on synthetic and real images demonstrate subjectively and objectively the performance of the proposed method.
Categories: Scientifical publications
Sparse Stochastic Processes and Discretization of Linear Inverse Problems
We present a novel statistically-based discretization paradigm and derive a class of maximum a posteriori (MAP) estimators for solving ill-conditioned linear inverse problems. We are guided by the theory of sparse stochastic processes, which specifies continuous-domain signals as solutions of linear stochastic differential equations. Accordingly, we show that the class of admissible priors for the discretized version of the signal is confined to the family of infinitely divisible distributions. Our estimators not only cover the well-studied methods of Tikhonov and $\ell_{1}$-type regularizations as particular cases, but also open the door to a broader class of sparsity-promoting regularization schemes that are typically nonconvex. We provide an algorithm that handles the corresponding nonconvex problems and illustrate the use of our formalism by applying it to deconvolution, magnetic resonance imaging, and X-ray tomographic reconstruction problems. Finally, we compare the performance of estimators associated with models of increasing sparsity.
Categories: Scientifical publications
Sparse/DCT (S/DCT) Two-Layered Representation of Prediction Residuals for Video Coding
In this paper, we propose a cascaded sparse/DCT (S/DCT) two-layer representation of prediction residuals, and implement this idea on top of the state-of-the-art high efficiency video coding (HEVC) standard. First, a dictionary is adaptively trained to contain featured patterns of residual signals so that a high portion of energy in a structured residual can be efficiently coded via sparse coding. It is observed that the sparse representation alone is less effective in the R-D performance due to the side information overhead at higher bit rates. To overcome this problem, the DCT representation is cascaded at the second stage. It is applied to the remaining signal to improve coding efficiency. The two representations successfully complement each other. It is demonstrated by experimental results that the proposed algorithm outperforms the HEVC reference codec HM5.0 in the Common Test Condition.
Categories: Scientifical publications
Representing and Retrieving Video Shots in Human-Centric Brain Imaging Space
Meaningful representation and effective retrieval of video shots in a large-scale database has been a profound challenge for the image/video processing and computer vision communities. A great deal of effort has been devoted to the extraction of low-level visual features, such as color, shape, texture, and motion for characterizing and retrieving video shots. However, the accuracy of these feature descriptors is still far from satisfactory due to the well-known semantic gap. In order to alleviate the problem, this paper investigates a novel methodology of representing and retrieving video shots using human-centric high-level features derived in brain imaging space (BIS) where brain responses to natural stimulus of video watching can be explored and interpreted. First, our recently developed dense individualized and common connectivity-based cortical landmarks (DICCCOL) system is employed to locate large-scale functional brain networks and their regions of interest (ROIs) that are involved in the comprehension of video stimulus. Then, functional connectivities between various functional ROI pairs are utilized as BIS features to characterize the brain's comprehension of video semantics. Then an effective feature selection procedure is applied to learn the most relevant features while removing redundancy, which results in the formation of the final BIS features. Afterwards, a mapping from low-level visual features to high-level semantic features in the BIS is built via the Gaussian process regression (GPR) algorithm, and a manifold structure is then inferred, in which video key frames are represented by the mapped feature vectors in the BIS. Finally, the manifold-ranking algorithm concerning the relationship among all data is applied to measure the similarity between key frames of video shots.
Experimental results on the TRECVID 2005 dataset demonstrate the superiority of the proposed work in comparison with traditional methods.
Categories: Scientifical publications
Multivariate Slow Feature Analysis and Decorrelation Filtering for Blind Source Separation
We generalize the method of Slow Feature Analysis (SFA) for vector-valued functions of several variables and apply it to the problem of blind source separation, in particular to image separation. It is generally necessary to use multivariate SFA instead of univariate SFA for separating multi-dimensional signals. For the linear case, an exact mathematical analysis is given, which shows in particular that the sources are perfectly separated by SFA if and only if they and their first-order derivatives are uncorrelated. When the sources are correlated, we apply the following technique called Decorrelation Filtering: use a linear filter to decorrelate the sources and their derivatives in the given mixture, then apply the unmixing matrix obtained on the filtered mixtures to the original mixtures. If the filtered sources are perfectly separated by this matrix, so are the original sources. A decorrelation filter can be numerically obtained by solving a nonlinear optimization problem. This technique can also be applied to other linear separation methods, whose output signals are decorrelated, such as ICA. When there are more mixtures than sources, one can determine the actual number of sources by using a regularized version of SFA with decorrelation filtering. Extensive numerical experiments using SFA and ICA with decorrelation filtering, supported by mathematical analysis, demonstrate the potential of our methods for solving problems involving blind source separation.
Categories: Scientifical publications
Parameter Estimation for Blind and Non-Blind Deblurring Using Residual Whiteness Measures
Image deblurring (ID) is an ill-posed problem typically addressed by using regularization, or prior knowledge, on the unknown image (and also on the blur operator, in the blind case). ID is often formulated as an optimization problem, where the objective function includes a data term encouraging the estimated image (and blur, in blind ID) to explain the observed data well (typically, the squared norm of a residual) plus a regularizer that penalizes solutions deemed undesirable. The performance of this approach depends critically (among other things) on the relative weight of the regularizer (the regularization parameter) and on the number of iterations of the algorithm used to address the optimization problem. In this paper, we propose new criteria for adjusting the regularization parameter and/or the number of iterations of ID algorithms. The rationale is that if the recovered image (and blur, in blind ID) is well estimated, the residual image is spectrally white; contrarily, a poorly deblurred image typically exhibits structured artifacts (e.g., ringing, oversmoothness), yielding residuals that are not spectrally white. The proposed criterion is particularly well suited to a recent blind ID algorithm that uses continuation, i.e., slowly decreases the regularization parameter along the iterations; in this case, choosing this parameter and deciding when to stop are one and the same thing. Our experiments show that the proposed whiteness-based criteria yield improvements in SNR, on average, only 0.15 dB below those obtained by (clairvoyantly) stopping the algorithm at the best SNR. We also illustrate the proposed criteria on non-blind ID, reporting results that are competitive with state-of-the-art criteria (such as Monte Carlo-based GSURE and projected SURE), which, however, are not applicable for blind ID.
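The rationale above, that a good estimate leaves a spectrally white residual, can be sketched with a simple autocorrelation-based score. The following is a one-dimensional analogue written for illustration only: it sums the squared normalized autocorrelation of a residual over a few nonzero lags, so a white residual scores near zero and a structured one scores higher. The function name `whiteness_penalty`, the lag range, and the synthetic residuals are assumptions; the paper's actual measures operate on residual images and may be defined differently.

```python
import random


def whiteness_penalty(residual, max_lag=10):
    """Sum of squared normalized autocorrelations at lags 1..max_lag.

    Values near zero indicate an approximately white residual.
    """
    n = len(residual)
    mean = sum(residual) / n
    centered = [x - mean for x in residual]
    energy = sum(x * x for x in centered)
    penalty = 0.0
    for lag in range(1, max_lag + 1):
        corr = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / energy
        penalty += corr * corr
    return penalty


rng = random.Random(0)
# Residual of a hypothetical well-estimated image: uncorrelated noise.
white = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# Residual with structure: a 5-tap moving average correlates neighboring samples.
structured = [sum(white[i:i + 5]) / 5 for i in range(len(white) - 4)]

print(whiteness_penalty(white) < whiteness_penalty(structured))  # True
```

In a parameter-selection loop, one would evaluate such a score for each candidate regularization weight or iteration count and keep the candidate whose residual is closest to white.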
Categories: Scientifical publications
Image Processing Using Smooth Ordering of its Patches
We propose an image processing scheme based on reordering of its patches. For a given corrupted image, we extract all patches with overlaps, refer to these as coordinates in high-dimensional space, and order them such that they are chained in the “shortest possible path,” essentially solving the traveling salesman problem. The obtained ordering applied to the corrupted image implies a permutation of the image pixels to what should be a regular signal. This enables us to obtain good recovery of the clean image by applying relatively simple one-dimensional smoothing operations (such as filtering or interpolation) to the reordered set of pixels. We explore the use of the proposed approach to image denoising and inpainting, and show promising results in both cases.
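The pipeline described above (extract overlapping patches, chain them along a short path, smooth the reordered pixels in one dimension, then undo the permutation) can be sketched as follows. This is a minimal sketch under stated assumptions: the path is built with a plain greedy nearest-neighbor heuristic as a stand-in for the traveling salesman step, the smoothing is a 3-tap moving average, border pixels are left untouched, and all function names are hypothetical. It is not the authors' implementation.

```python
import random


def extract_patches(img, size=3):
    """Return ((row, col), flattened_patch) for every fully contained patch."""
    h, w = len(img), len(img[0])
    half = size // 2
    patches = []
    for r in range(half, h - half):
        for c in range(half, w - half):
            patch = [img[r + dr][c + dc]
                     for dr in range(-half, half + 1)
                     for dc in range(-half, half + 1)]
            patches.append(((r, c), patch))
    return patches


def greedy_order(patches):
    """Nearest-neighbor heuristic for a short path through the patches."""
    remaining = list(range(1, len(patches)))
    order = [0]
    while remaining:
        last = patches[order[-1]][1]
        nxt = min(remaining,
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(last, patches[i][1])))
        remaining.remove(nxt)
        order.append(nxt)
    return order


def smooth_reordered(img, size=3, window=3):
    """Smooth patch-center pixels along the patch ordering, then map them back."""
    patches = extract_patches(img, size)
    order = greedy_order(patches)
    # Center pixels arranged as a 1D signal in path order.
    centers = [img[patches[i][0][0]][patches[i][0][1]] for i in order]
    half = window // 2
    out = [row[:] for row in img]
    for k, i in enumerate(order):
        lo, hi = max(0, k - half), min(len(centers), k + half + 1)
        r, c = patches[i][0]
        out[r][c] = sum(centers[lo:hi]) / (hi - lo)  # inverse permutation
    return out, order


rng = random.Random(1)
clean = [[0.0 if c < 4 else 1.0 for c in range(8)] for _ in range(8)]
noisy = [[v + rng.gauss(0.0, 0.1) for v in row] for row in clean]
denoised, order = smooth_reordered(noisy)
```

Because similar patches end up adjacent in the ordering, the moving average mostly mixes pixels from the same side of the step edge, which is the intuition behind applying one-dimensional smoothing to the reordered signal.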
Categories: Scientifical publications