Databricks Certified Professional Data Scientist Exam Questions and Answers

Pass the Databricks Certification Databricks-Certified-Professional-Data-Scientist Questions and answers with ExamsMirror

Viewing page 4 out of 5 pages

Viewing questions 31-40 out of questions

Questions # 31:

A data scientist is asked to implement an article recommendation feature for an on-line magazine.

The magazine does not want to use client tracking technologies such as cookies or reading history. Therefore, only the style and subject matter of the current article are available for making recommendations. All of the magazine's articles are stored in a database in a format suitable for analytics.

Which method should the data scientist try first?

Options:

K Means Clustering

Naive Bayesian

Logistic Regression

Association Rules

Answer

Explanation

kmeans uses an iterative algorithm that minimizes the sum of distances from each object to its cluster centroid, over all clusters. This algorithm moves objects between clusters until the sum cannot be decreased further. The result is a set of clusters that are as compact and well-separated as possible. You can control the details of the minimization using several optional input parameters to kmeans, including ones for the initial values of the cluster centroids, and for the maximum number of iterations.

Clustering is primarily an exploratory technique to discover hidden structures of the data, possibly as a prelude to more focused analysis or decision processes. Some specific applications of k-means are image processing, medical, and customer segmentation. Clustering is often used as a lead-in to classification. Once the clusters are identified, labels can be applied to each cluster to classify each group based on its characteristics. Marketing and sales groups use k-means to better identify customers who have similar behaviors and spending patterns.
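
The iterative procedure described above can be sketched in a few lines. This is a minimal illustration of the standard assign-and-update loop (Lloyd's algorithm), not the `kmeans` function the explanation refers to. The function names, the `seed` parameter, and the two-dimensional points are assumptions made for the example. For the magazine scenario, each point would stand for an article's style and subject-matter features.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, seed=0):
    """Alternate between assigning points and updating centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the loop has converged
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, assignment

# Hypothetical article features: two well-separated groups.
articles = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(articles, k=2)
```

With clusters in hand, a recommendation for the current article could simply be other articles that share its cluster label.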

Questions # 32:

Select the correct statements that apply to logistic regression.

Options:

A. Computationally inexpensive, easy to implement, knowledge representation easy to interpret

B. May have low accuracy

C. Works with numeric values

Answer

A, B, C

Questions # 33:

Assume some output variable "y" is a linear combination of some independent input variables "A" plus some independent noise "e". The way the independent variables are combined is defined by a parameter vector B: y = AB + e, where A is an m x n matrix, B is a vector of n unknowns, and y is a vector of m values. Assuming that m is not equal to n and the columns of A are linearly independent, which expression correctly solves for B?

Options:

Option A

Option B

Option C

Option D

Answer

Explanation

This is the standard solution of the normal equations for linear regression. Because A is not square, you cannot simply take its inverse.
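
To make the normal equations concrete, here is a small sketch that solves (A^T A) B = A^T y, which is equivalent to B = (A^T A)^-1 A^T y when the columns of A are linearly independent. The helper names and the data are assumptions made for illustration. A production implementation would normally use a linear algebra library rather than hand-written elimination.

```python
def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def solve(M, v):
    """Solve M x = v for square M by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def least_squares(A, y):
    """Return B minimizing ||A B - y||^2 via the normal equations."""
    At = transpose(A)
    AtA = matmul(At, A)                       # n x n, square even though A is not
    Aty = [sum(a * b for a, b in zip(row, y)) for row in At]
    return solve(AtA, Aty)

# Hypothetical data: m = 4 observations, n = 2 unknowns (intercept and slope).
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]   # generated from y = 1 + 2x with no noise
B = least_squares(A, y)    # approximately [1.0, 2.0]
```

The key point is that A^T A is square (n x n) and invertible under the stated assumption, even though A itself has no inverse.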

Questions # 34:

You are asked to create a model to predict the total number of monthly subscribers for a specific magazine. You are provided with 1 year's worth of subscription and payment data, user demographic data, and 10 years' worth of content of the magazine (articles and pictures). Which algorithm is the most appropriate for building a predictive model for subscribers?

Options:

Linear regression

Logistic regression

Decision trees

TF-IDF

Answer

Explanation

A data model explicitly describes a relationship between predictor and response variables. Linear regression fits a data model that is linear in the model coefficients. The most common type of linear regression is a least-squares fit, which can fit both lines and polynomials, among other linear models.

Before you model the relationship between pairs of quantities, it is a good idea to perform correlation analysis to establish if a linear relationship exists between these quantities. Be aware that variables can have nonlinear relationships, which correlation analysis cannot detect.
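
A short sketch shows why correlation analysis can miss a nonlinear relationship. The `pearson` helper and the sample values are assumptions chosen for illustration: one series is an exact linear function of x, the other an exact quadratic function of x.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [-2, -1, 0, 1, 2]
linear = [2 * x + 1 for x in xs]     # perfectly linear in x
quadratic = [x * x for x in xs]      # fully determined by x, but not linearly

r_linear = pearson(xs, linear)       # close to 1
r_quadratic = pearson(xs, quadratic) # close to 0 despite the exact dependence
```

A near-zero coefficient therefore rules out a linear relationship only, not a relationship in general.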

If you need to fit data with a nonlinear model, transform the variables to make the relationship linear. Alternatively, try to fit a nonlinear function directly using either the Statistics and Machine Learning Toolbox nlinfit function, the Optimization Toolbox lsqcurvefit function, or by applying functions in the Curve Fitting Toolbox.

Questions # 35:

While working with Netflix, the movie rating website, you have developed a recommender system whose predicted ratings are consistently exactly 1 higher than the ratings given in the dataset for the user-item pairs in that dataset. There are n items in the dataset. What will be the calculated RMSE of your recommender system on the dataset?

Options:

n/2

Answer

Explanation

The root-mean-square deviation (RMSD), or root-mean-square error (RMSE), is a frequently used measure of the differences between the values predicted by a model or an estimator and the values actually observed. The RMSD represents the sample standard deviation of the differences between predicted values and observed values. These individual differences are called residuals when the calculations are performed over the data sample that was used for estimation, and prediction errors when computed out-of-sample. The RMSD serves to aggregate the magnitudes of the errors in predictions for various times into a single measure of predictive power. RMSD is a good measure of accuracy, but only for comparing the forecasting errors of different models for a particular variable, not between variables, because it is scale-dependent.

RMSE is calculated as the square root of the mean of the squares of the errors. The error in every case in this example is 1. The square of 1 is 1. The average of n items with value 1 is 1. The square root of 1 is 1. The RMSE is therefore 1.
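
The arithmetic can be checked with a few lines of code. The ratings below are made up for illustration. Any set of ratings gives the same result as long as every prediction is exactly 1 too high.

```python
import math

def rmse(predicted, actual):
    """Square root of the mean of the squared errors."""
    errors = [p - a for p, a in zip(predicted, actual)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

actual = [3, 5, 1, 4, 2]                  # hypothetical ratings
predicted = [a + 1 for a in actual]       # every prediction is 1 too high
result = rmse(predicted, actual)          # 1.0, independent of n
```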

Questions # 36:

Select the statements that correctly apply to Naive Bayes.

Options:

A. Works with a small amount of data

B. Sensitive to how the input data is prepared

C. Works with nominal values

Answer

A, B, C

Questions # 37:

You are doing advanced analytics for a medical application using regression. Two of your input variables, weight and height, are very important and cannot be ignored, but they are also highly correlated. What is the best solution?

Options:

You will take cube root of height

You will take square root of weight

You will take square of the height.

You would consider using BMI (Body Mass Index)

Answer

Explanation

If multiple variables are highly correlated, it is better either to keep only the one that correlates more strongly with the response (which is not among the given options) or to introduce a new variable that is a function of both. In this case that variable could be BMI (Body Mass Index), because it is a function of both weight and height: BMI = Weight / (Height * Height)
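
As a small illustration, the two correlated inputs can be collapsed into a single derived feature before fitting the regression. The function name and the units (kilograms and metres) are assumptions for the example.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index: weight divided by height squared."""
    return weight_kg / (height_m * height_m)

# Hypothetical patients as (weight in kg, height in m) pairs.
patients = [(70.0, 1.75), (82.5, 1.80), (55.0, 1.60)]

# One derived feature per patient replaces the two correlated inputs.
bmi_feature = [bmi(w, h) for w, h in patients]
```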

Questions # 38:

A bio-scientist is analyzing cancer cells. To identify whether a cell is cancerous, hundreds of tests have been performed, with small variations, each giving a yes-or-no answer to the problem. Given the test results for a sample of healthy and cancerous cells, which of the following techniques would you use to determine whether a cell is healthy?

Options:

Linear regression

Collaborative filtering

Naive Bayes

Identification Test

Answer

Explanation

In this problem you are given high-dimensional independent variables (yes/no test results and so on), and you have to predict one of two classes (valid or not valid). Any of the following techniques can be applied to this problem: support vector machines, Naive Bayes, logistic regression, and random decision forests. Of these, only Naive Bayes is among the options.
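
A minimal sketch of how Naive Bayes handles many yes/no test results is shown below. It is a Bernoulli-style variant with add-one (Laplace) smoothing. The function names, the class labels, and the tiny dataset are assumptions made for illustration, and a real problem would have hundreds of test columns rather than three.

```python
import math

def train(samples, labels):
    """Estimate class priors and per-test P(result = 1 | class)."""
    model = {}
    for cls in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == cls]
        prior = len(rows) / len(samples)
        # Add-one smoothing keeps every probability strictly between 0 and 1.
        probs = [(sum(col) + 1) / (len(rows) + 2) for col in zip(*rows)]
        model[cls] = (prior, probs)
    return model

def predict(model, sample):
    """Return the class with the highest log-posterior."""
    def log_posterior(cls):
        prior, probs = model[cls]
        # Naive assumption: tests are independent given the class.
        return math.log(prior) + sum(
            math.log(p if x else 1 - p) for x, p in zip(sample, probs)
        )
    return max(model, key=log_posterior)

# Hypothetical results: 1 = test says "yes", 0 = test says "no".
samples = [[1, 1, 0], [1, 1, 1], [1, 0, 1],
           [0, 0, 0], [0, 1, 0], [0, 0, 1]]
labels = ["cancerous"] * 3 + ["healthy"] * 3
model = train(samples, labels)
```

Calling `predict(model, [1, 1, 1])` on this toy data returns `"cancerous"`, and `predict(model, [0, 0, 0])` returns `"healthy"`.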

Questions # 39:

The method based on principal component analysis (PCA) evaluates the features according to

Options:

The projection of the largest eigenvector of the correlation matrix on the initial dimensions

According to the magnitude of the components of the discriminant vector

The projection of the smallest eigenvector of the correlation matrix on the initial dimensions

None of the above

Answer

Explanation

Feature Selection:

The method based on principal component analysis (PCA) evaluates the features according to the projection of the largest eigenvector of the correlation matrix on the initial dimensions, whereas the method based on Fisher's linear discriminant analysis evaluates them according to the magnitude of the components of the discriminant vector.
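
The PCA criterion can be sketched as follows: build the correlation matrix, find its leading eigenvector, and read each component as the score of the corresponding original feature. Power iteration is used here only to keep the example dependency-free, and the helper names and data are assumptions. The sketch also assumes that no feature is constant and that the starting vector is not orthogonal to the leading eigenvector.

```python
import math

def correlation_matrix(rows):
    """Correlation matrix of the columns of `rows` (no constant columns)."""
    cols = list(zip(*rows))
    n = len(rows)
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) for c, m in zip(cols, means)]
    z = [[(x - m) / s for x in c] for c, m, s in zip(cols, means, stds)]
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]

def leading_eigenvector(M, iterations=200):
    """Power iteration: repeatedly multiply by M and renormalize."""
    v = [1.0] * len(M)
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in M]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical data: feature 1 is twice feature 0; feature 2 is uncorrelated with both.
rows = [(1.0, 2.0, 4.0), (2.0, 4.0, 1.0), (3.0, 6.0, 0.0),
        (4.0, 8.0, 1.0), (5.0, 10.0, 4.0)]
loadings = leading_eigenvector(correlation_matrix(rows))
# Rank features by the absolute size of their component.
ranking = sorted(range(len(loadings)), key=lambda i: -abs(loadings[i]))
```

On this data the first two features receive large components and the third receives a component near zero, so it ranks last.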

Questions # 40:

In which phase of the data analytics lifecycle do Data Scientists spend the most time in a project?

Options:

Discovery

Data Preparation

Model Building

Communicate Results

Answer
