Pass the Databricks Certification Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Questions and answers with ExamsMirror

Questions # 11:

The code block shown below should return a single-column DataFrame with a column named consonant_ct that, for each row, shows the number of consonants in column itemName of DataFrame itemsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.

DataFrame itemsDf:

+------+----------------------------------+-----------------------------+-------------------+

+------+----------------------------------+-----------------------------+-------------------+

+------+----------------------------------+-----------------------------+-------------------+

Code block:

itemsDf.select(__1__(__2__(__3__(__4__), "a|e|i|o|u|\s", "")).__5__("consonant_ct"))

Options:

1. length
2. regexp_extract
3. upper
4. col("itemName")
5. as

1. size
2. regexp_replace
3. lower
4. "itemName"
5. alias

1. lower
2. regexp_replace
3. length
4. "itemName"
5. alias

1. length
2. regexp_replace
3. lower
4. col("itemName")
5. alias

1. size
2. regexp_extract
3. lower
4. col("itemName")
5. alias
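To see how the pieces of the blanked expression fit together, the same string logic can be traced in plain Python. This is only a sketch, not Spark code: `re.sub` stands in for `regexp_replace`, `str.lower` for `lower`, and `len` for `length`. The helper name `consonant_ct` and the sample strings are made up for illustration and are not taken from itemsDf.

```python
import re

# Same pattern as in the code block: a lowercase vowel or a whitespace character.
PATTERN = r"a|e|i|o|u|\s"

def consonant_ct(item_name: str) -> int:
    """Plain-Python analogue of length(regexp_replace(lower(...), PATTERN, ""))."""
    lowered = item_name.lower()               # lowercase first, so "A" matches "a"
    stripped = re.sub(PATTERN, "", lowered)   # remove vowels and whitespace
    return len(stripped)                      # count whatever is left

# Hypothetical sample values.
for name in ("Thick Coat", "AEIOU", ""):
    print(repr(name), consonant_ct(name))
```

Note that the pattern only removes vowels and whitespace, so digits and punctuation would also be counted by this approach.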

Questions # 12:

Which of the following is a characteristic of the cluster manager?

Options:

Each cluster manager works on a single partition of data.

The cluster manager receives input from the driver through the SparkContext.

The cluster manager does not exist in standalone mode.

The cluster manager transforms jobs into DAGs.

In client mode, the cluster manager runs on the edge node.

Questions # 13:

The code block displayed below contains an error. The code block is intended to return all columns of DataFrame transactionsDf except for columns predError, productId, and value. Find the error.

Excerpt of DataFrame transactionsDf:

transactionsDf.select(~col("predError"), ~col("productId"), ~col("value"))

Options:

The select operator should be replaced by the drop operator and the arguments to the drop operator should be column names predError, productId and value wrapped in the col operator so they should be expressed like drop(col(predError), col(productId), col(value)).

The select operator should be replaced with the deselect operator.

The column names in the select operator should not be strings and wrapped in the col operator, so they should be expressed like select(~col(predError), ~col(productId), ~col(value)).

The select operator should be replaced by the drop operator.

The select operator should be replaced by the drop operator and the arguments to the drop operator should be column names predError, productId and value as strings.

(Correct)
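The intent of the code block, keeping every column except three, can be sketched in plain Python as a filter over column names. The full column list below is hypothetical, since the DataFrame excerpt is not shown; only the three excluded names come from the question.

```python
# Hypothetical column names; only the three excluded ones appear in the question.
all_columns = ["transactionId", "predError", "value", "storeId", "productId"]
excluded = {"predError", "productId", "value"}

# Keep the original column order, skipping the excluded names.
kept = [c for c in all_columns if c not in excluded]
print(kept)
```

In PySpark, a list of names built this way could be passed to select, which would give the same columns as dropping the three names as strings.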

Questions # 14:

Which of the following describes a difference between Spark's cluster and client execution modes?

Options:

In cluster mode, the cluster manager resides on a worker node, while it resides on an edge node in client mode.

In cluster mode, executor processes run on worker nodes, while they run on gateway nodes in client mode.

In cluster mode, the driver resides on a worker node, while it resides on an edge node in client mode.

In cluster mode, a gateway machine hosts the driver, while it is co-located with the executor in client mode.

In cluster mode, the Spark driver is not co-located with the cluster manager, while it is co-located in client mode.

Questions # 15:

Which of the following code blocks reorders the values inside the arrays in column attributes of DataFrame itemsDf from last to first one in the alphabet?

+------+-----------------------------+-------------------+
|itemId|attributes                   |supplier           |
+------+-----------------------------+-------------------+
|1     |[blue, winter, cozy]         |Sports Company Inc.|
|2     |[red, summer, fresh, cooling]|YetiX              |
|3     |[green, summer, travel]      |Sports Company Inc.|
+------+-----------------------------+-------------------+

Options:

itemsDf.withColumn('attributes', sort_array(col('attributes').desc()))

itemsDf.withColumn('attributes', sort_array(desc('attributes')))

itemsDf.withColumn('attributes', sort(col('attributes'), asc=False))

itemsDf.withColumn("attributes", sort_array("attributes", asc=False))

itemsDf.select(sort_array("attributes"))
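The expected result for the sample rows can be worked out with a plain-Python analogue. This sketch assumes that "from last to first in the alphabet" means a descending sort within each array, and it uses `sorted(..., reverse=True)` as a stand-in for a descending array sort in Spark. It does not run Spark itself.

```python
# Rows copied from the itemsDf excerpt (supplier column omitted for brevity).
rows = [
    (1, ["blue", "winter", "cozy"]),
    (2, ["red", "summer", "fresh", "cooling"]),
    (3, ["green", "summer", "travel"]),
]

# Sort each array in descending alphabetical order, leaving itemId untouched.
sorted_rows = [(item_id, sorted(attrs, reverse=True)) for item_id, attrs in rows]

for item_id, attrs in sorted_rows:
    print(item_id, attrs)
```

The row order and the other columns stay as they are; only the values inside each array are reordered.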

Questions # 16:

Which of the following describes a narrow transformation?

Options:

A narrow transformation is an operation in which data is exchanged across partitions.

A narrow transformation is a process in which data from multiple RDDs is used.

A narrow transformation is a process in which 32-bit float variables are cast to smaller float variables, like 16-bit or 8-bit float variables.

A narrow transformation is an operation in which data is exchanged across the cluster.

A narrow transformation is an operation in which no data is exchanged across the cluster.
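The distinction behind these options can be illustrated with a toy model in which partitions are plain Python lists. This is a simplified sketch and not how Spark is implemented; the function names `narrow_map` and `wide_group_by_parity` are invented for the example.

```python
# Three hypothetical partitions of one dataset.
partitions = [[1, 2, 3], [4, 5], [6, 7, 8]]

def narrow_map(parts, fn):
    """Each output partition is computed from exactly one input partition."""
    return [[fn(x) for x in part] for part in parts]

def wide_group_by_parity(parts, num_out=2):
    """Each output partition may need records from every input partition."""
    out = [[] for _ in range(num_out)]
    for part in parts:
        for x in part:
            out[x % num_out].append(x)  # record moves to a partition chosen by key
    return out

doubled = narrow_map(partitions, lambda x: x * 2)
regrouped = wide_group_by_parity(partitions)
print(doubled)
print(regrouped)
```

In the first function the partition boundaries are unchanged, so no records have to move. In the second, records from all inputs are redistributed by key, which corresponds to a shuffle.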

Questions # 17:

The code block displayed below contains an error. The code block should write DataFrame transactionsDf as a parquet file to location filePath after partitioning it on column storeId. Find the error.

Code block:

transactionsDf.write.partitionOn("storeId").parquet(filePath)

Options:

The partitioning column as well as the file path should be passed to the write() method of DataFrame transactionsDf directly and not as appended commands as in the code block.

The partitionOn method should be called before the write method.

The operator should use the mode() option to configure the DataFrameWriter so that it replaces any existing files at location filePath.

Column storeId should be wrapped in a col() operator.

No method partitionOn() exists for the DataFrame class, partitionBy() should be used instead.
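When a DataFrame is written with partitionBy, Spark lays the output out in one subdirectory per distinct value of the partitioning column, named in the form column=value. The plain-Python sketch below only models that directory naming. The rows, the base path, and the helper name `partition_dirs` are hypothetical, and no files are written.

```python
import posixpath
from collections import defaultdict

def partition_dirs(rows, base_path, column):
    """Group rows by the directory that a column=value layout would place them in."""
    grouped = defaultdict(list)
    for row in rows:
        directory = posixpath.join(base_path, f"{column}={row[column]}")
        grouped[directory].append(row)
    return dict(grouped)

# Hypothetical transactions and output location.
rows = [
    {"transactionId": 1, "storeId": 1},
    {"transactionId": 2, "storeId": 2},
    {"transactionId": 3, "storeId": 1},
]
layout = partition_dirs(rows, "/tmp/transactions", "storeId")
for directory in sorted(layout):
    print(directory, len(layout[directory]))
```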

Questions # 18:

The code block displayed below contains an error. The code block should return a DataFrame in which column predErrorAdded contains the results of Python function add_2_if_geq_3 as applied to numeric and nullable column predError in DataFrame transactionsDf. Find the error.

Code block:

def add_2_if_geq_3(x):
    if x is None:
        return x
    elif x >= 3:
        return x+2
    return x

add_2_if_geq_3_udf = udf(add_2_if_geq_3)

transactionsDf.withColumnRenamed("predErrorAdded", add_2_if_geq_3_udf(col("predError")))

Options:

The operator used to add the column does not add column predErrorAdded to the DataFrame.

Instead of col("predError"), the actual DataFrame with the column needs to be passed, like so: transactionsDf.predError.

The udf() method does not declare a return type.

UDFs are only available through the SQL API, but not in the Python API as shown in the code block.

The Python function is unable to handle null values, resulting in the code block crashing on execution.
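One of the options concerns null handling, and that part can be checked outside Spark, because a null in a column reaches a Python UDF as None. The sketch below exercises the function from the question on a few hypothetical inputs. It does not involve udf or the DataFrame at all.

```python
def add_2_if_geq_3(x):
    # Function body as given in the question.
    if x is None:
        return x
    elif x >= 3:
        return x + 2
    return x

# Hypothetical inputs covering null, below-threshold, threshold, and float cases.
for value in (None, 2, 3, 7.5):
    print(value, "->", add_2_if_geq_3(value))
```

Since None is returned unchanged before any comparison happens, the function does not raise on null input.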

Questions # 19:

Which of the following describes the difference between client and cluster execution modes?

Options:

In cluster mode, the driver runs on the worker nodes, while the client mode runs the driver on the client machine.

In cluster mode, the driver runs on the edge node, while the client mode runs the driver in a worker node.

In cluster mode, each node will launch its own executor, while in client mode, executors will exclusively run on the client machine.

In client mode, the cluster manager runs on the same host as the driver, while in cluster mode, the cluster manager runs on a separate node.

In cluster mode, the driver runs on the master node, while in client mode, the driver runs on a virtual machine in the cloud.

Questions # 20:

The code block shown below should return a DataFrame with all columns of DataFrame transactionsDf, but at most 2 rows in which column productId has at least the value 2. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__(__2__).__3__