# When and why should you use MLlib (versus scikit-learn versus TensorFlow versus foo package)

There are numerous tools for performing machine learning on a single machine. While there are several great options to choose from, these tools have their limits, either in the size of the data you can train on or in the processing time. This means single-machine tools (scikit-learn, TensorFlow) are *complementary* to Spark, not competitors. When you hit those scalability limits, take advantage of Spark’s abilities.

There are **two key use cases where you want to leverage Spark’s ability to scale.** **First, use Spark for preprocessing and feature generation to reduce the time it takes to produce training and test sets from a large amount of data.** You can then use single-machine learning algorithms to train on those data sets. **Second, when your input data or model size becomes too difficult or inconvenient to fit on one machine, use Spark to do the heavy lifting. Spark makes big data machine learning simple.**

