Summer Certification Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code = getmirror

Pass the Databricks Certification Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Questions and answers with ExamsMirror

Practice at least 50% of the questions to maximize your chances of passing.
Exam Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Premium Access

View all detail and faqs for the Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 exam


807 Students Passed

85% Average Score

96% Same Questions
Viewing page 2 out of 5 pages
Viewing questions 11-20 out of questions
Questions # 11:

A data scientist is analyzing a large dataset and has written a PySpark script that includes several transformations and actions on a DataFrame. The script ends with acollect()action to retrieve the results.

How does Apache Spark™'s execution hierarchy process the operations when the data scientist runs this script?

Options:

A.

The script is first divided into multiple applications, then each application is split into jobs, stages, and finally tasks.

B.

The entire script is treated as a single job, which is then divided into multiple stages, and each stage is further divided into tasks based on data partitions.

C.

Thecollect()action triggers a job, which is divided into stages at shuffle boundaries, and each stage is split into tasks that operate on individual data partitions.

D.

Spark creates a single task for each transformation and action in the script, and these tasks are grouped into stages and jobs based on their dependencies.

Questions # 12:

A data analyst builds a Spark application to analyze finance data and performs the following operations:filter,select,groupBy, andcoalesce.

Which operation results in a shuffle?

Options:

A.

groupBy

B.

filter

C.

select

D.

coalesce

Questions # 13:

An engineer has two DataFrames: df1 (small) and df2 (large). A broadcast join is used:

python

CopyEdit

frompyspark.sql.functionsimportbroadcast

result = df2.join(broadcast(df1), on='id', how='inner')

What is the purpose of using broadcast() in this scenario?

Options:

Options:

A.

It filters the id values before performing the join.

B.

It increases the partition size for df1 and df2.

C.

It reduces the number of shuffle operations by replicating the smaller DataFrame to all nodes.

D.

It ensures that the join happens only when the id values are identical.

Questions # 14:

A data scientist is working on a large dataset in Apache Spark using PySpark. The data scientist has a DataFramedfwith columnsuser_id,product_id, andpurchase_amountand needs to perform some operations on this data efficiently.

Which sequence of operations results in transformations that require a shuffle followed by transformations that do not?

Options:

A.

df.filter(df.purchase_amount > 100).groupBy("user_id").sum("purchase_amount")

B.

df.withColumn("discount", df.purchase_amount * 0.1).select("discount")

C.

df.withColumn("purchase_date", current_date()).where("total_purchase > 50")

D.

df.groupBy("user_id").agg(sum("purchase_amount").alias("total_purchase")).repartition(10)

Questions # 15:

A data engineer is reviewing a Spark application that applies several transformations to a DataFrame but notices that the job does not start executing immediately.

Which two characteristics of Apache Spark's execution model explain this behavior?

Choose 2 answers:

Options:

A.

The Spark engine requires manual intervention to start executing transformations.

B.

Only actions trigger the execution of the transformation pipeline.

C.

Transformations are executed immediately to build the lineage graph.

D.

The Spark engine optimizes the execution plan during the transformations, causing delays.

E.

Transformations are evaluated lazily.

Questions # 16:

A data engineer needs to persist a file-based data source to a specific location. However, by default, Spark writes to the warehouse directory (e.g., /user/hive/warehouse). To override this, the engineer must explicitly define the file path.

Which line of code ensures the data is saved to a specific location?

Options:

Options:

A.

users.write(path="/some/path").saveAsTable("default_table")

B.

users.write.saveAsTable("default_table").option("path", "/some/path")

C.

users.write.option("path", "/some/path").saveAsTable("default_table")

D.

users.write.saveAsTable("default_table", path="/some/path")

Questions # 17:

A data engineer replaces the exact percentile() function with approx_percentile() to improve performance, but the results are drifting too far from expected values.

Which change should be made to solve the issue?

Question # 17

Options:

A.

Decrease the first value of the percentage parameter to increase the accuracy of the percentile ranges

B.

Decrease the value of the accuracy parameter in order to decrease the memory usage but also improve the accuracy

C.

Increase the last value of the percentage parameter to increase the accuracy of the percentile ranges

D.

Increase the value of the accuracy parameter in order to increase the memory usage but also improve the accuracy

Questions # 18:

A Data Analyst is working on the DataFramesensor_df, which contains two columns:

Which code fragment returns a DataFrame that splits therecordcolumn into separate columns and has one array item per row?

A)

Question # 18

B)

Question # 18

C)

Question # 18

D)

Question # 18

Options:

A.

exploded_df = sensor_df.withColumn("record_exploded", explode("record"))

exploded_df = exploded_df.select("record_datetime", "sensor_id", "status", "health")

B.

exploded_df = exploded_df.select(

"record_datetime",

"record_exploded.sensor_id",

"record_exploded.status",

"record_exploded.health"

)

exploded_df = sensor_df.withColumn("record_exploded", explode("record"))

C.

exploded_df = exploded_df.select(

"record_datetime",

"record_exploded.sensor_id",

"record_exploded.status",

"record_exploded.health"

)

exploded_df = sensor_df.withColumn("record_exploded", explode("record"))

D.

exploded_df = exploded_df.select("record_datetime", "record_exploded")

Questions # 19:

Which Spark configuration controls the number of tasks that can run in parallel on the executor?

Options:

Options:

A.

spark.executor.cores

B.

spark.task.maxFailures

C.

spark.driver.cores

D.

spark.executor.memory

Questions # 20:

A data engineer needs to write a Streaming DataFrame as Parquet files.

Given the code:

Question # 20

Which code fragment should be inserted to meet the requirement?

A)

Question # 20

B)

Question # 20

C)

Question # 20

D)

Question # 20

Which code fragment should be inserted to meet the requirement?

Options:

A.

.format("parquet")

.option("location", "path/to/destination/dir")

B.

CopyEdit

.option("format", "parquet")

.option("destination", "path/to/destination/dir")

C.

.option("format", "parquet")

.option("location", "path/to/destination/dir")

D.

.format("parquet")

.option("path", "path/to/destination/dir")

Viewing page 2 out of 5 pages
Viewing questions 11-20 out of questions
TOP CODES

TOP CODES

Top selling exam codes in the certification world, popular, in demand and updated to help you pass on the first try.