andrewpalumbo.github.io

Mahout 1.0 Features by Engine

| Feature | Single Machine | MapReduce | Spark | H2O | Flink |
|---|---|---|---|---|---|
| **Mahout Math-Scala Core Library and Scala DSL** | | | | | |
| Mahout Distributed BLAS. Distributed Row Matrix API with R- and Matlab-like operators. Distributed ALS, SPCA, SSVD, thin-QR. Similarity Analysis. | | | x | x | in development |
| **Mahout Interactive Shell** | | | | | |
| Interactive REPL shell for the Spark-optimized Mahout DSL | | | x | | |
| **Collaborative Filtering with CLI Drivers** | | | | | |
| User-Based Collaborative Filtering | x | | x | | |
| Item-Based Collaborative Filtering | x | x | x | | |
| Matrix Factorization with ALS | x | x | | | |
| Matrix Factorization with ALS on Implicit Feedback | x | x | | | |
| Weighted Matrix Factorization, SVD++ | x | | | | |
| **Classification with CLI Drivers** | | | | | |
| Logistic Regression - trained via SGD | x | | | | |
| Naive Bayes / Complementary Naive Bayes | | x | in development | in development | |
| Random Forest | | x | | | |
| Hidden Markov Models - single machine | x | | | | |
| Multilayer Perceptron - single machine | x | | | | |
| **Clustering with CLI Drivers** | | | | | |
| Canopy Clustering | deprecated | deprecated | | | |
| k-Means Clustering | x | x | | | |
| Fuzzy k-Means | x | x | | | |
| Streaming k-Means | x | x | | | |
| Spectral Clustering | | x | | | |
| **Dimensionality Reduction with CLI Drivers** (note: most Scala-based dimensionality reduction algorithms are available through the Math-Scala Core Library for all engines) | | | | | |
| Singular Value Decomposition | x | x | | | |
| Lanczos Algorithm | x | x | | | |
| Stochastic SVD | x | x | | | |
| PCA (via Stochastic SVD) | x | x | | | |
| QR Decomposition | x | x | | | |
| **Topic Models** | | | | | |
| Latent Dirichlet Allocation | x | x | | | |
| **Miscellaneous** | | | | | |
| RowSimilarityJob | | x | x | | |
| ConcatMatrices | | x | | | |
| Collocations | | x | | | |
| Sparse TF-IDF Vectors from Text | | x | | | |
| XML Parsing | | x | | | |
| Email Archive Parsing | | x | | | |
| Lucene Integration | | x | | | |
| Evolutionary Processes | x | | | | |