Sampling-enabled scalable manifold learning unveils the discriminative cluster structure of high-dimensional data - Nature Machine Intelligence

As a pivotal branch of machine learning, manifold learning uncovers the intrinsic low-dimensional structure within complex non-linear manifolds in high-dimensional space for visualization, classification, clustering and gaining key insights. Although existing techniques have achieved remarkable successes, they suffer from extensive distortions of cluster structure, which hinders the understanding of underlying patterns. Scalability issues also limit their applicability for handling large-scale data. Here we propose a sampling-based scalable manifold learning technique that enables uniform and discriminative embedding (SUDE) for large-scale and high-dimensional data. It starts by seeking a set of landmarks to construct the low-dimensional skeleton of the entire data and then incorporates the non-landmarks into the learned space by constrained locally linear embedding. We empirically validated the effectiveness of SUDE on synthetic datasets and real-world benchmarks and applied it to analyse single-cell data and detect anomalies in electrocardiogram signals. SUDE exhibits a distinct advantage in scalability with respect to data size and embedding dimension and shows promising performance in cluster separation, integrity and global structure preservation. The experiments also demonstrate notable robustness in embedding quality as the sampling rate decreases.

Sampling-enabled scalable manifold learning unveils the discriminative cluster structure of high-dimensional data - Nature Machine Intelligence

POPULAR CATEGORY

corporate

entertainment

research

misc

wellness

athletics