
Pass the Databricks Certification Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Questions and answers with ExamsMirror

Viewing page 4 out of 6 pages
Viewing questions 31-40 out of questions
Question # 31:

Which of the following code blocks can be used to save DataFrame transactionsDf to memory only, recalculating partitions that do not fit in memory when they are needed?

Options:

A.

from pyspark import StorageLevel

transactionsDf.cache(StorageLevel.MEMORY_ONLY)

B.

transactionsDf.cache()

C.

transactionsDf.storage_level('MEMORY_ONLY')

D.

transactionsDf.persist()

E.

transactionsDf.clear_persist()

F.

from pyspark import StorageLevel

transactionsDf.persist(StorageLevel.MEMORY_ONLY)

Question # 32:

Which of the following statements about Spark's execution hierarchy is correct?

Options:

A.

In Spark's execution hierarchy, a job may reach over multiple stage boundaries.

B.

In Spark's execution hierarchy, manifests are one layer above jobs.

C.

In Spark's execution hierarchy, a stage comprises multiple jobs.

D.

In Spark's execution hierarchy, executors are the smallest unit.

E.

In Spark's execution hierarchy, tasks are one layer above slots.

Question # 33:

Which of the following statements about RDDs is incorrect?

Options:

A.

An RDD consists of a single partition.

B.

The high-level DataFrame API is built on top of the low-level RDD API.

C.

RDDs are immutable.

D.

RDD stands for Resilient Distributed Dataset.

E.

RDDs are great for precisely instructing Spark on how to do a query.

Question # 34:

Which of the following code blocks generally causes a great amount of network traffic?

Options:

A.

DataFrame.select()

B.

DataFrame.coalesce()

C.

DataFrame.collect()

D.

DataFrame.rdd.map()

E.

DataFrame.count()

Question # 35:

The code block displayed below contains an error. The code block should create DataFrame itemsAttributesDf which has columns itemId and attribute and lists every attribute from the attributes column in DataFrame itemsDf next to the itemId of the respective row in itemsDf. Find the error.

A sample of DataFrame itemsDf is below.

Code block:

itemsAttributesDf = itemsDf.explode("attributes").alias("attribute").select("attribute", "itemId")

Options:

A.

Since itemId is the index, it does not need to be an argument to the select() method.

B.

The alias() method needs to be called after the select() method.

C.

The explode() method expects a Column object rather than a string.

D.

explode() is not a method of DataFrame. explode() should be used inside the select() method instead.

E.

The split() method should be used inside the select() method instead of the explode() method.
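
The result the question describes (one output row per attribute, each paired with the itemId of its source row) can be sketched in plain Python, without Spark. This is only an analogy: the rows are copied from the itemsDf sample shown under question 36, and the function name explode_attributes is hypothetical, not a Spark API.

```python
# Plain-Python analogy, not PySpark: one output row per array element.
items = [
    (1, ["blue", "winter", "cozy"], "Sports Company Inc."),
    (2, ["red", "summer", "fresh", "cooling"], "YetiX"),
    (3, ["green", "summer", "travel"], "Sports Company Inc."),
]


def explode_attributes(rows):
    """Return (itemId, attribute) pairs, one per element of each attributes list."""
    return [
        (item_id, attribute)
        for item_id, attributes, _supplier in rows
        for attribute in attributes
    ]


items_attributes = explode_attributes(items)
for pair in items_attributes:
    print(pair)
```

Each input row contributes as many output rows as its attributes list has elements, so the three sample rows yield ten pairs.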

Question # 36:

Which of the following code blocks reads the parquet file stored at filePath into DataFrame itemsDf, using a valid schema for the sample of itemsDf shown below?

Sample of itemsDf:

+------+-----------------------------+-------------------+
|itemId|attributes                   |supplier           |
+------+-----------------------------+-------------------+
|1     |[blue, winter, cozy]         |Sports Company Inc.|
|2     |[red, summer, fresh, cooling]|YetiX              |
|3     |[green, summer, travel]      |Sports Company Inc.|
+------+-----------------------------+-------------------+

Options:

A.

itemsDfSchema = StructType([
    StructField("itemId", IntegerType()),
    StructField("attributes", StringType()),
    StructField("supplier", StringType())])

itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)

B.

itemsDfSchema = StructType([
    StructField("itemId", IntegerType),
    StructField("attributes", ArrayType(StringType)),
    StructField("supplier", StringType)])

itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)

C.

itemsDf = spark.read.schema('itemId integer, attributes , supplier string').parquet(filePath)

D.

itemsDfSchema = StructType([
    StructField("itemId", IntegerType()),
    StructField("attributes", ArrayType(StringType())),
    StructField("supplier", StringType())])

itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)

E.

itemsDfSchema = StructType([
    StructField("itemId", IntegerType()),
    StructField("attributes", ArrayType([StringType()])),
    StructField("supplier", StringType())])

itemsDf = spark.read(schema=itemsDfSchema).parquet(filePath)

Question # 37:

Which of the following code blocks prints the number of rows in which the expression Inc. appears in the string-type column supplier of DataFrame itemsDf?

Options:

A.

counter = 0

for index, row in itemsDf.iterrows():
    if 'Inc.' in row['supplier']:
        counter = counter + 1

print(counter)

B.

counter = 0

def count(x):
    if 'Inc.' in x['supplier']:
        counter = counter + 1

itemsDf.foreach(count)
print(counter)

C.

print(itemsDf.foreach(lambda x: 'Inc.' in x))

D.

print(itemsDf.foreach(lambda x: 'Inc.' in x).sum())

E.

accum=sc.accumulator(0)

def check_if_inc_in_supplier(row):
    if 'Inc.' in row['supplier']:
        accum.add(1)

itemsDf.foreach(check_if_inc_in_supplier)
print(accum.value)
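
Option B assigns to a module-level name from inside a function. A minimal plain-Python sketch, without Spark, shows how that pattern behaves when the function is simply called in a loop. The sample rows and the name count_inc are hypothetical and used only for illustration.

```python
counter = 0


def count_inc(row):
    # There is no `global counter` declaration, so the assignment below makes
    # `counter` a local name; reading it before it is assigned raises an error.
    if "Inc." in row["supplier"]:
        counter = counter + 1


# Hypothetical rows standing in for the supplier column.
rows = [{"supplier": "Sports Company Inc."}, {"supplier": "YetiX"}]

errors = []
for row in rows:
    try:
        count_inc(row)
    except UnboundLocalError as exc:
        errors.append(type(exc).__name__)

print(counter, errors)
```

The module-level counter is never updated. In Spark, the function passed to foreach() would additionally run on executors rather than on the driver, which is the situation accumulators are meant for.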

Question # 38:

Which of the following is one of the big performance advantages that Spark has over Hadoop?

Options:

A.

Spark achieves great performance by storing data in the DAG format, whereas Hadoop can only use parquet files.

B.

Spark achieves higher resiliency for queries since, different from Hadoop, it can be deployed on Kubernetes.

C.

Spark achieves great performance by storing data and performing computation in memory, whereas large jobs in Hadoop require a large amount of relatively slow disk I/O operations.

D.

Spark achieves great performance by storing data in the HDFS format, whereas Hadoop can only use parquet files.

E.

Spark achieves performance gains for developers by extending Hadoop's DataFrames with a user-friendly API.

Question # 39:

The code block displayed below contains an error. The code block should combine data from DataFrames itemsDf and transactionsDf, showing all rows of DataFrame itemsDf that have a matching value in column itemId with a value in column transactionId of DataFrame transactionsDf. Find the error.

Code block:

itemsDf.join(itemsDf.itemId==transactionsDf.transactionId)

Options:

A.

The join statement is incomplete.

B.

The union method should be used instead of join.

C.

The join method is inappropriate.

D.

The merge method should be used instead of join.

E.

The join expression is malformed.

Question # 40:

Which of the following is a problem with using accumulators?

Options:

A.

Only unnamed accumulators can be inspected in the Spark UI.

B.

Only numeric values can be used in accumulators.

C.

Accumulator values can only be read by the driver, but not by executors.

D.

Accumulators do not obey lazy evaluation.

E.

Accumulators are difficult to use for debugging because they will only be updated once, regardless of whether a task has to be re-run due to hardware failure.
