Generalized K-Means Clustering

https://news.ycombinator.com/rss
Summary

Security: This project follows security best practices. See SECURITY.md for vulnerability reporting and dependabot.yml for automated dependency updates.

🆕 The DataFrame API (Spark ML) is the default. Version 0.6.0 introduces a modern, RDD-free, DataFrame-native API with Spark ML integration. See DataFrame API Examples for end-to-end usage.

This project generalizes K-Means to multiple Bregman divergences and advanced variants (Bisecting, X-Means, Soft/Fuzzy, Streaming, K-Medians, K-Medoids). It provides a DataFrame/ML API (recommended) and a legacy RDD API kept for backwards compatibility (archived below).

What's in here

Multiple divergences: Squared Euclidean, KL, Itakura–Saito, L1/Manhattan (K-Medians), Generalized-I, Logistic-loss
Variants: Bisecting, X-Means (BIC/AIC), Soft K-Means, Structured-Streaming K-Means, K-Medoids (PAM/CLARA)
Scale: Tested on tens of millions of points in 700+ dimensions
Tooling: Scala 2.13 (primary) / 2.12, Spark 4.0.x / 3.5.x / 3.4.x
  Spark 4.0.x: Scala 2.13 only (Scala 2.12 support was dropped in Spark 4.0)
  Spark 3.x: both Scala 2.13 and 2.12 supported

Quick Start (DataFrame API)

Recommended for all new projects. The DataFrame API follows the Spark ML Estimator/Model pattern.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession
import com.massivedatascience.clusterer.ml.GeneralizedKMeans

// spark-shell already provides a SparkSession named `spark`;
// in a standalone application, create (or reuse) one first.
val spark = SparkSession.builder().appName("gkm-quickstart").master("local[*]").getOrCreate()

// Four 2-D points forming two well-separated groups, in a "features" column.
val df = spark.createDataFrame(Seq(
  Tuple1(Vectors.dense(0.0, 0.0)),
  Tuple1(Vectors.dense(1.0, 1.0)),
  Tuple1(Vectors.dense(9.0, 8.0)),
  Tuple1(Vectors.dense(8.0, 9.0))
)).toDF("features")

val gkm = new GeneralizedKMeans()
  .setK(2)
  .setDivergence("kl")            // "squaredEuclidean", "itakuraSaito", "l1", "generalizedI", "logistic"
  .setAssignmentStrategy("auto")  // "auto" | "crossJoin" (SE fast path) | "broadcastUDF" (general Bregman)
  .setMaxIter(20)

val model = gkm.fit(df)         // Estimator.fit produces a fitted Model
val pred = model.transform(df)  // assign each row to a cluster
pred.show(false)
```

More recipes: see DataFrame API Examples.
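The idea of generalizing K-Means to Bregman divergences can be illustrated without Spark. The following is a minimal, dependency-free Scala sketch, not the library's implementation: the object `BregmanKMeansSketch` and all of its methods are hypothetical, and the divergence formulas are the textbook definitions, which may differ in detail (for example, handling of zero entries) from what `setDivergence` uses internally. It runs Lloyd iterations in which only the assignment step depends on the divergence. The update step is always the arithmetic mean, because for any Bregman divergence d(x, c) the mean of a cluster minimizes the total divergence to its center. L1/Manhattan (K-Medians) is not a Bregman divergence (its center is the coordinate-wise median), so it is not covered here.

```scala
// Hypothetical illustration of Bregman k-means (Lloyd iterations); not the library's API.
object BregmanKMeansSketch {
  type Point = Array[Double]
  type Divergence = (Point, Point) => Double

  // Textbook Bregman divergences d(x, y), summed over coordinates.
  val squaredEuclidean: Divergence = (x, y) =>
    x.indices.map { i => val diff = x(i) - y(i); diff * diff }.sum

  // Generalized I-divergence: Bregman divergence of phi(x) = sum(x log x).
  // Requires strictly positive coordinates.
  val generalizedI: Divergence = (x, y) =>
    x.indices.map(i => x(i) * math.log(x(i) / y(i)) - x(i) + y(i)).sum

  // KL divergence; equals generalizedI when x and y each sum to 1.
  val kl: Divergence = (x, y) =>
    x.indices.map(i => x(i) * math.log(x(i) / y(i))).sum

  // Itakura-Saito divergence; requires strictly positive coordinates.
  val itakuraSaito: Divergence = (x, y) =>
    x.indices.map(i => x(i) / y(i) - math.log(x(i) / y(i)) - 1.0).sum

  // Assignment step: index of the center with the smallest d(x, center).
  def assign(x: Point, centers: Seq[Point], d: Divergence): Int =
    centers.indices.minBy(j => d(x, centers(j)))

  // Update step: the coordinate-wise arithmetic mean of the members.
  def mean(points: Seq[Point]): Point =
    points.head.indices.map(i => points.map(_(i)).sum / points.size).toArray

  // Alternate assignment and update until labels stop changing or maxIter is hit.
  // Centers start from caller-supplied points (a simplification for this sketch).
  def fit(data: Seq[Point], init: Seq[Point], d: Divergence, maxIter: Int): (Seq[Point], Seq[Int]) = {
    var centers: Seq[Point] = init
    var labels: Seq[Int] = data.map(x => assign(x, centers, d))
    var iter = 0
    var changed = true
    while (iter < maxIter && changed) {
      // Recompute each center; keep the old one if its cluster is empty.
      centers = centers.indices.map { j =>
        val members = data.zip(labels).collect { case (p, l) if l == j => p }
        if (members.isEmpty) centers(j) else mean(members)
      }
      val newLabels = data.map(x => assign(x, centers, d))
      changed = newLabels != labels
      labels = newLabels
      iter += 1
    }
    (centers, labels)
  }

  // The four Quick Start points, clustered with squared Euclidean distance.
  def quickStartExample(): (Seq[Point], Seq[Int]) = {
    val data: Seq[Point] = Seq(Array(0.0, 0.0), Array(1.0, 1.0), Array(9.0, 8.0), Array(8.0, 9.0))
    fit(data, init = Seq(data(0), data(2)), d = squaredEuclidean, maxIter = 20)
  }
}
```

Calling `BregmanKMeansSketch.quickStartExample()` returns the centers (0.5, 0.5) and (8.5, 8.5) with labels 0, 0, 1, 1. Passing `generalizedI` or `itakuraSaito` instead changes only the assignment step, but those divergences need strictly positive coordinates, so a point such as (0.0, 0.0) would first have to be shifted or smoothed.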
What CI Validates

Our com...

First seen: 2025-10-26 08:00

Last seen: 2025-10-26 09:01