input data structure
The input data structure requirements for each advanced analytics task in MLlib are as follows:
- In the case of classification and regression, you want a column of type Double to represent the label and a column of type Vector (either dense or sparse) to represent the features.
- In the case of recommendation, you want a column of users, a column of targets (say movies or books), and a column of ratings.
- In the case of unsupervised learning, you need a column of type Vector (either dense or sparse) to represent the features.
- In the case of graph analytics, you will want a DataFrame of vertices and a DataFrame of edges.
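To make the "dense or sparse" distinction concrete, here is a minimal pure-Python sketch of the two layouts. `SparseVec` and `to_sparse` are hypothetical stand-ins written for illustration, not MLlib classes; in PySpark the real counterparts are created with `Vectors.dense(...)` and `Vectors.sparse(size, indices, values)` from `pyspark.ml.linalg`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SparseVec:
    """Hypothetical stand-in for a sparse vector: only non-zero entries are stored."""
    size: int            # total number of features
    indices: List[int]   # positions of the non-zero entries, in increasing order
    values: List[float]  # the non-zero entries, aligned with `indices`

    def to_dense(self) -> List[float]:
        """Expand back to a dense list, filling the missing positions with 0.0."""
        dense = [0.0] * self.size
        for i, v in zip(self.indices, self.values):
            dense[i] = v
        return dense


def to_sparse(dense: List[float]) -> SparseVec:
    """Keep only the non-zero entries of a dense feature list."""
    indices = [i for i, v in enumerate(dense) if v != 0.0]
    values = [dense[i] for i in indices]
    return SparseVec(len(dense), indices, values)


# A dense vector stores every feature value, including the zeros.
dense_features = [0.0, 3.0, 0.0, 0.0, 1.5]

# The sparse form stores the same information as (size, indices, values).
sparse_features = to_sparse(dense_features)
print(sparse_features)
```

Both forms describe the same feature values; the sparse form tends to pay off when most entries are zero, for example after one-hot encoding.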
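The best way to get your data into these shapes is through transformers, which take a DataFrame as input and return a new DataFrame with the converted or added columns.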