The Minimum Volume Embedding Algorithm

Minimum Volume Embedding (referred to herein as MVE) is a state-of-the-art method for summarizing large, high-dimensional data compactly (Shaw and Jebara, 2007). Subsequently, processing complex datasets with MVE makes it easy to perform many interesting analytics operations. For instance, finding clusters, subgroups, correlations, and categories become much easier when data is low-dimensional – as do making predictions and forecasts. Since such operations are difficult to perform with high-dimensional data (the so-called "curse of dimensionality") MVE is a vital step in creating a data analytics platform. In addition, MVE makes data more compact, thereby permitting efficient data storage and data retrieval.

The MVE procedure can be decomposed into the 5 steps detailed below. Assume we have N high-dimensional items or objects. Each object may be a person, with their own behavioral data or movement trail. Alternatively, the object may be a place, with its historical traffic density distributed over the work-week. The input to MVE is N large complex objects of dimensionality D. The output of MVE is a set of N low-dimensional summaries of the objects, for instance of dimensionality d. In most applications, the desired dimension d is much smaller than the data's original dimensionality D. For instance, the inputs may have hundreds of thousands of dimensions while the output of MVE is a two- or three-dimensional summary of each object.

Step 1: MVE begins by calculating the raw similarity between all pairs of objects. This produces a matrix of size N times N, where each entry is the similarity score between object A and object B. Any symmetric similarity measure can be used. A standard measure is the generalized inner product between object A and object B (also known as a "kernel"). Another example of the standard measure of similarity is the Bhattacharyya affinity between two densities, such as the similarity between the weekly traffic density at place A and place B.

Step 2: MVE then finds the most important pairs of objects that exhibit an unusually high level of similarity and "connects" them. For instance, we build a connection between all pairs of objects A and B if they are more than 60% or 70% similar. Another approach is to pick a few of the most similar objects for each object under consideration – in other words, find the k most similar neighbors for each object where k=2, 3, 4 or more. This produces a sparse set of connections, or important similarity relationship, between the objects in our database that should be preserved. The MVE algorithm will guarantee that such pairs of similar objects will remain similar after the data becomes compressed and low-dimensional.

Step 3: MVE then recovers a low-dimensional version (or caricature) of the original N objects where all the pairwise relationships that were deemed important still remain. This is done by minimizing the dimensionality of the dataset while ensuring that all the pairwise connections in the original data maintain similar similarity levels in the low-dimensional representation of the data. In other words, pairs of objects that were connected and were more than 60% similar, remain 60% similar even after the dimensionality is reduced. This minimization is solved by finding another matrix of size N times N that has low dimensionality but still mimics the similarities and distances that were observed in the original data matrix. The MVE algorithm measures dimensionality as the ratio of the sum of the top d eigenvalues of its solution divided by the total N eigenvalues of the data matrix. The optimization performed is a semi-definite program – a constrained minimization problem that minimizes the dimensionality subject to the constraint that all connected objects preserve their local similarities. Standard semidefinite programming (SDP) as well as singular value decomposition (SVD) solvers are used iteratively to recover the MVE solution. Both commercial and public domain SDP and SVD solvers can be used.

Step 4: Given the final MVE data matrix of size N times N, the low-dimensional representation of the data is recovered by a final singular value decomposition (SVD). The leading d eigenvectors of the matrix are used as the coordinates of each object in the low-dimensional space MVE produces. It is important to note that the MVE data matrix produces a much better solution than simply applying SVD to the original data matrix, which may need up to D dimensions to faithfully preserve the important connections between pairs of objects in the database. This latter method is commonly referred to as Principal Components Analysis (PCA), which has been the most popular method in dimensionality reduction until recently.

Step 5: Once Step 4 completes, MVE has produced a low-dimensional version of the original data which now occupies d dimensions. The N low dimensional objects are now easily processed by straightforward automatic algorithms. For instance, clustering is performed by grouping the low dimensional versions of the N objects using a standard algorithm such as the k-means procedure. Alternatively, the user may simply visualize the data and then manually identify the clusters of interest, outliers and important patterns – since a human can easily visualize and manipulate two and three-dimensional data.

It is important to note that other procedures may be used to replace Steps 1 and 2 if these intermediate computations are available as ground-truth or from an external (covariate) source. For instance, MVE may be directly provided with the N times N data matrix instead of the original N high-dimensional objects. It is straightforward to then launch the MVE procedure in Step 2 directly. Similarly, if we are given the connectivity from a known network, Step 2 can be skipped since the connected pairs are given.

T. Jebara. "Bayesian Out-Trees" Uncertainty in Artificial Intelligence, UAI, July 2008

T. Jebara. "Out-Tree Dependent Nonparametric Bayesian Inference" (Version still not finalized)
Workshop on Nonparameteric Bayes, July 2008

J. Wang, T. Jebara and S.F. Chang. "Graph Transduction via Alternating Minimization" International Conference on Machine Learning, ICML, July 2008

T. Jebara. "Learning from Out-Tree Dependent Data" Snowbird Machine Learning Workshop, April 2008

 

 
     

Macrosense
Macrosense Login

Citysense
Citysense Site

Technology
Machine Learning
MVE Algorithm

Principles

Media Center
Press Coverage

About Us
Executive Team
Advisors
Careers
Contact