
Pass the Databricks ML Data Scientist Databricks-Machine-Learning-Professional Questions and answers with ExamsMirror

Question # 1:

A machine learning engineer needs to select a deployment strategy for a new machine learning application. The feature values are not available until the time of delivery, and results are needed exceedingly fast for one record at a time.

Which of the following deployment strategies can be used to meet these requirements?

Options:

A.

Edge/on-device

B.

Streaming

C.

None of these strategies will meet the requirements.

D.

Batch

E.

Real-time

Question # 2:

A machine learning engineer wants to programmatically create a new Databricks Job whose schedule depends on the result of some automated tests in a machine learning pipeline.

Which of the following Databricks tools can be used to programmatically create the Job?

Options:

A.

MLflow APIs

B.

AutoML APIs

C.

MLflow Client

D.

Jobs cannot be created programmatically

E.

Databricks REST APIs
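For reference, here is a minimal sketch of what creating a Job over REST can look like. It uses only the Python standard library and builds a POST request to the Jobs API 2.1 endpoint /api/2.1/jobs/create without sending it. The workspace host, token, notebook path, cluster ID, job name, and the rule "pause the schedule if the tests failed" are hypothetical placeholders, not values from the question; consult the Jobs API reference for the full request schema.

```python
import json
import urllib.request


def build_create_job_request(host, token, tests_passed, notebook_path, cluster_id):
    """Build (but do not send) a Jobs API 2.1 create request.

    All argument values and the pause rule below are hypothetical examples.
    """
    payload = {
        "name": "ml-pipeline-job",  # hypothetical job name
        "tasks": [
            {
                "task_key": "train",
                "notebook_task": {"notebook_path": notebook_path},
                "existing_cluster_id": cluster_id,
            }
        ],
        "schedule": {
            # Quartz cron: every day at 02:00
            "quartz_cron_expression": "0 0 2 * * ?",
            "timezone_id": "UTC",
            # Hypothetical rule: keep the schedule paused if the automated tests failed
            "pause_status": "UNPAUSED" if tests_passed else "PAUSED",
        },
    }
    return urllib.request.Request(
        url=f"{host}/api/2.1/jobs/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen would actually submit it; a successful response contains the new job_id.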

Question # 3:

A machine learning engineer is monitoring categorical input variables for a production machine learning application. The engineer believes that missing values are becoming more prevalent in more recent data for a particular value in one of the categorical input variables.

Which of the following tools can the machine learning engineer use to assess their theory?

Options:

A.

Kolmogorov-Smirnov (KS) test

B.

One-way Chi-squared Test

C.

Two-way Chi-squared Test

D.

Jensen-Shannon distance

E.

None of these
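To make the chi-squared options concrete, the sketch below computes a Pearson chi-squared statistic on a contingency table, here arranged as time window (older, recent) by outcome (missing, present) for the single category value in question. The counts are hypothetical, no continuity correction is applied, and 3.841 is the standard critical value for 1 degree of freedom at alpha = 0.05.

```python
def chi_squared_2way(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = (older window, recent window), columns = (missing, present)
table = [[10, 90], [30, 70]]
stat, dof = chi_squared_2way(table)
# Critical value for dof = 1 at alpha = 0.05
drifted = stat > 3.841
```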

Question # 4:

Which of the following MLflow operations can be used to delete a model from the MLflow Model Registry?

Options:

A.

client.transition_model_version_stage

B.

client.delete_model_version

C.

client.update_registered_model

D.

client.delete_model

E.

client.delete_registered_model

Question # 5:

Which of the following Databricks-managed MLflow capabilities is a centralized model store?

Options:

A.

Models

B.

Model Registry

C.

Model Serving

D.

Feature Store

E.

Experiments

Question # 6:

A data scientist has developed and logged a scikit-learn random forest model named model, then ended their Spark session and terminated their cluster. After starting a new cluster, they want to review the feature_importances_ attribute of the original model object.

Which of the following lines of code can be used to restore the model object so that feature_importances_ is available?

Options:

A.

mlflow.load_model(model_uri)

B.

client.list_artifacts(run_id)["feature-importances.csv"]

C.

mlflow.sklearn.load_model(model_uri)

D.

This can only be viewed in the MLflow Experiments UI

E.

client.pyfunc.load_model(model_uri)

Question # 7:

After a data scientist noticed that a column was missing from a production feature set stored as a Delta table, the machine learning engineering team has been tasked with determining when the column was dropped from the feature set.

Which of the following SQL commands can be used to accomplish this task?

Options:

A.

VERSION

B.

DESCRIBE

C.

HISTORY

D.

DESCRIBE HISTORY

E.

TIMESTAMP
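DESCRIBE HISTORY returns one row per Delta table version, including its version number, timestamp, and operation. As a hedged illustration of how those rows can pin down a schema change, the sketch below finds the first version whose column list no longer contains a given column and looks up that version's timestamp. The table contents are hypothetical; the per-version column lists are assumed to have been collected beforehand, for example by querying each snapshot with time travel (SELECT * FROM <table> VERSION AS OF n).

```python
def version_column_dropped(column, schemas_by_version):
    """Return the first version in which `column` disappears, or None."""
    had_column = False
    for version in sorted(schemas_by_version):
        has_column = column in schemas_by_version[version]
        if had_column and not has_column:
            return version
        had_column = has_column
    return None


# Hypothetical per-version column lists for the feature table
schemas_by_version = {
    0: ["id", "age", "income"],
    1: ["id", "age", "income"],
    2: ["id", "age"],
    3: ["id", "age"],
}
# Hypothetical rows mimicking the version and timestamp columns of DESCRIBE HISTORY
history = [
    {"version": 0, "timestamp": "2023-01-01 08:00:00"},
    {"version": 1, "timestamp": "2023-01-02 08:00:00"},
    {"version": 2, "timestamp": "2023-01-03 09:15:00"},
    {"version": 3, "timestamp": "2023-01-04 08:00:00"},
]

dropped_in = version_column_dropped("income", schemas_by_version)
dropped_at = next(row["timestamp"] for row in history if row["version"] == dropped_in)
```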

Question # 8:

A machine learning engineering team has written predictions computed in a batch job to a Delta table for querying. However, the team has noticed that the queries are running slowly. The team has already tuned the size of the data files. Upon investigating, the team has concluded that the rows meeting the query condition are sparsely located throughout each of the data files.

Based on the scenario, which of the following optimization techniques could speed up the query by colocating similar records while considering values in multiple columns?

Options:

A.

Z-Ordering

B.

Bin-packing

C.

Write as a Parquet file

D.

Data skipping

E.

Tuning the file size
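On Delta tables, Z-ordering is requested with OPTIMIZE <table> ZORDER BY (col1, col2). The sketch below is not the Databricks implementation; it only illustrates the underlying idea of a Z-order (Morton) curve: interleaving the bits of two column values yields a single sort key, so rows that are close in both columns tend to land near each other, and therefore in the same data files. The row values are hypothetical non-negative integers.

```python
def z_value(x, y, bits=16):
    """Interleave the bits of two non-negative ints into a Morton (Z-order) code."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return z


# Hypothetical (x, y) pairs standing in for two filter columns
rows = [(3, 0), (0, 0), (2, 2), (0, 3), (1, 1)]
# Sorting by the Z-value clusters rows by both columns at once
ordered = sorted(rows, key=lambda r: z_value(*r))
```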

Question # 9:

Which of the following is a simple, low-cost method of monitoring numeric feature drift?

Options:

A.

Jensen-Shannon test

B.

Summary statistics trends

C.

Chi-squared test

D.

None of these can be used to monitor feature drift

E.

Kolmogorov-Smirnov (KS) test
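As an illustration of how cheap summary-statistics monitoring can be, the sketch below compares the mean of a recent window of a numeric feature against a reference window and flags drift when the shift exceeds k reference standard deviations. The threshold k = 3 and the sample values are hypothetical choices; in practice one would track such statistics per window over time.

```python
import statistics


def mean_shift_alert(reference, recent, k=3.0):
    """Flag drift when the recent mean moves more than k reference standard deviations."""
    ref_mean = statistics.mean(reference)
    ref_std = statistics.stdev(reference)  # sample standard deviation
    shift = abs(statistics.mean(recent) - ref_mean)
    return shift > k * ref_std, shift


# Hypothetical feature values from a reference window and a recent window
flagged, shift = mean_shift_alert([10, 11, 9, 10, 10], [15, 16, 14, 15, 15])
```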

Question # 10:

A machine learning engineer is migrating a machine learning pipeline to use Databricks Machine Learning. They have programmatically identified the best run from an MLflow Experiment and stored its URI in the model_uri variable and its Run ID in the run_id variable. They have also determined that the model was logged with the name "model". Now, the machine learning engineer wants to register that model in the MLflow Model Registry with the name "best_model".

Which of the following lines of code can they use to register the model to the MLflow Model Registry?

Options:

A.

mlflow.register_model(model_uri, "best_model")

B.

mlflow.register_model(run_id, "best_model")

C.

mlflow.register_model(f"runs:/{run_id}/best_model", "model")

D.

mlflow.register_model(model_uri, "model")

E.

mlflow.register_model(f"runs:/{run_id}/model")
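The options differ mainly in which model URI and which registry name they pass. MLflow addresses a model logged inside a run with a URI of the form runs:/<run_id>/<artifact_path>, where the artifact path is the name the model was logged under. The small helper below is a hypothetical convenience function that only builds that string; the run ID in the usage is made up.

```python
def runs_model_uri(run_id, artifact_path="model"):
    """Build an MLflow runs:/ URI for a model artifact logged within a run."""
    return f"runs:/{run_id}/{artifact_path}"


# Hypothetical run ID; the artifact path matches the logged name "model"
uri = runs_model_uri("abc123")
```

With MLflow installed, such a URI is the first argument to mlflow.register_model(model_uri, name), and the second argument is the name under which the model appears in the Model Registry.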
