# Transformations

Every transformation creates a new RDD. One important thing to remember about RDDs is that they are lazily evaluated: when a transformation is called on an RDD, no actual work is done right away. Spark only records the source of the RDD and the transformation that has to be applied to it.

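A transformation constructs a new RDD from an existing one. It only describes how the new RDD is derived from its parent and typically does not involve transferring data across nodes, although transformations that regroup data by key, such as `groupByKey`, do shuffle data between nodes.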

RDDs are lazily evaluated data structures. In short, that means there is no processing associated with calling transformations on RDDs right away.
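To make the idea concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark code: the class `ToyRDD` and the helper `double` are hypothetical names invented for this illustration, and only the method names `map`, `filter` and `collect` mirror real RDD methods.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's class).

    It stores only a source and the list of transformations to apply.
    """

    def __init__(self, source, steps=()):
        self.source = source  # where the data comes from
        self.steps = steps    # transformations recorded so far

    def map(self, fn):
        # Returns a new ToyRDD; nothing is computed here.
        return ToyRDD(self.source, self.steps + (("map", fn),))

    def filter(self, pred):
        # Same idea: record the step and return a new object.
        return ToyRDD(self.source, self.steps + (("filter", pred),))

    def collect(self):
        # The action: only now are the recorded steps applied.
        rows = iter(self.source)
        for kind, fn in self.steps:
            rows = map(fn, rows) if kind == "map" else filter(fn, rows)
        return list(rows)


calls = []

def double(x):
    calls.append(x)  # record every time real work happens
    return x * 2

numbers = ToyRDD([1, 2, 3, 4])
doubled = numbers.map(double)          # a new ToyRDD, no work yet
big = doubled.filter(lambda x: x > 4)  # still no work

work_before_action = len(calls)        # nothing has been processed so far
result = big.collect()                 # the action triggers the computation
```

In this sketch `double` is never called until `collect` runs, and each call to `map` or `filter` returns a new object instead of modifying the one it was called on.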

Lazy evaluation of transformations brings a lot of benefits. Operations can be grouped together, which reduces the network traffic between the nodes processing the data, and the same data does not have to be passed over multiple times.
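The "no multiple passes" point can be illustrated with plain Python iterators, which are also lazy. This is only an analogy under that assumption, not Spark code; `add_one`, `is_even` and `trace` are hypothetical names used for the illustration.

```python
trace = []

def add_one(x):
    trace.append(("add_one", x))
    return x + 1

def is_even(x):
    trace.append(("is_even", x))
    return x % 2 == 0

data = [1, 2, 3]

# Chaining the two steps builds a pipeline; nothing has run yet.
pipeline = filter(is_even, map(add_one, data))

# Consuming the pipeline pushes each element through both steps
# before the next element is read, so the data is traversed once.
result = list(pipeline)
```

After the last line, `trace` shows the two functions interleaved element by element, rather than all `add_one` calls followed by all `is_even` calls. Grouping the steps this way is what avoids a separate pass over the data for each transformation.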

There is a pitfall associated with all this. Spark performs the calculations only when the user requests an action. If the data between the steps is not cached, Spark will reevaluate the expressions for every action, because RDDs are not materialized. We can instruct Spark to materialize the computed results by calling `cache` or `persist`.
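The cost of recomputation can be sketched with another small toy model in plain Python. Again, this is not Spark's implementation: `CacheableRDD` and `expensive` are hypothetical names, and only the method names `map`, `cache`, `collect` and `count` mirror real RDD methods.

```python
class CacheableRDD:
    """Toy model (hypothetical) of an RDD that recomputes unless cached."""

    def __init__(self, compute):
        self._compute = compute  # function that rebuilds the rows from scratch
        self._use_cache = False
        self._cached = None

    def map(self, fn):
        # The child recomputes from its parent every time it is evaluated.
        return CacheableRDD(lambda: [fn(x) for x in self._rows()])

    def cache(self):
        # Only marks the data for caching; it is stored on first evaluation.
        self._use_cache = True
        return self

    def _rows(self):
        if not self._use_cache:
            return self._compute()
        if self._cached is None:
            self._cached = self._compute()
        return self._cached

    def collect(self):  # action
        return list(self._rows())

    def count(self):    # action
        return len(self._rows())


calls = []

def expensive(x):
    calls.append(x)  # record every evaluation
    return x * x

# Without caching, each action re-runs the transformation.
squares = CacheableRDD(lambda: [1, 2, 3]).map(expensive)
squares.collect()
squares.count()
uncached_calls = len(calls)

# With caching, the second action reuses the stored result.
calls.clear()
cached = CacheableRDD(lambda: [1, 2, 3]).map(expensive).cache()
cached.collect()
cached.count()
cached_calls = len(calls)
```

In this toy model the uncached version evaluates `expensive` once per element for each action, while the cached version evaluates it only during the first action.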
