Pre-Summer Special Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code = getmirror

Pass the Cloudera Certified Associate CCA CCA175 Questions and answers with ExamsMirror

Practice at least 50% of the questions to maximize your chances of passing.

Exam CCA175 Premium Access

View all detail and faqs for the CCA175 exam

Go to Exam

890 Students Passed

97% Average Score

90% Same Questions

Viewing page 3 out of 3 pages

Viewing questions 21-30 out of questions

Questions # 21:

Problem Scenario 30 : You have been given three csv files in hdfs as below.

EmployeeName.csv with the field (id, name)

EmployeeManager.csv (id, manager Name)

EmployeeSalary.csv (id, Salary)

Using Spark and its API you have to generate a joined output as below and save as a text tile (Separated by comma) for final distribution and output must be sorted by id.

ld,name,salary,managerName

EmployeeManager.csv

E01,Vishnu

E02,Satyam

E03,Shiv

E04,Sundar

E05,John

E06,Pallavi

E07,Tanvir

E08,Shekhar

E09,Vinod

E10,Jitendra

EmployeeName.csv

E01,Lokesh

E02,Bhupesh

E03,Amit

E04,Ratan

E05,Dinesh

E06,Pavan

E07,Tejas

E08,Sheela

E09,Kumar

E10,Venkat

EmployeeSalary.csv

E01,50000

E02,50000

E03,45000

E04,45000

E05,50000

E06,45000

E07,50000

E08,10000

E09,10000

E10,10000

Options:

Questions # 22:

Problem Scenario 67 : You have been given below code snippet.

lines = sc.parallelize(['lts fun to have fun,','but you have to know how.'])

M = lines.map( lambda x: x.replace(',7 ').replace('.',' 'J.replaceC-V ').lower())

r2 = r1.flatMap(lambda x: x.split())

r3 = r2.map(lambda x: (x, 1))

operation1

r5 = r4.map(lambda x:(x[1],x[0]))

r6 = r5.sortByKey(ascending=False)

r6.take(20)

Write a correct code snippet for operationl which will produce desired output, shown below. [(2, 'fun'), (2, 'to'), (2, 'have'), (1, its'), (1, 'know1), (1, 'how1), (1, 'you'), (1, 'but')]

Options:

Questions # 23:

Problem Scenario 40 : You have been given sample data as below in a file called spark15/file1.txt

3070811,1963,1096,,"US","CA",,1,

3022811,1963,1096,,"US","CA",,1,56

3033811,1963,1096,,"US","CA",,1,23

Below is the code snippet to process this tile.

val field= sc.textFile("spark15/f ilel.txt")

val mapper = field.map(x=> A)

mapper.map(x => x.map(x=> {B})).collect

Please fill in A and B so it can generate below final output

Array(Array(3070811,1963,109G, 0, "US", "CA", 0,1, 0)

,Array(3022811,1963,1096, 0, "US", "CA", 0,1, 56)

,Array(3033811,1963,1096, 0, "US", "CA", 0,1, 23)

)

Options:

Questions # 24:

Problem Scenario 44 : You have been given 4 files , with the content as given below:

spark11/file1.txt

Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common and should be automatically handled by the framework

spark11/file2.txt

The core of Apache Hadoop consists of a storage part known as Hadoop Distributed File System (HDFS) and a processing part called MapReduce. Hadoop splits files into large blocks and distributes them across nodes in a cluster. To process data, Hadoop transfers packaged code for nodes to process in parallel based on the data that needs to be processed.

spark11/file3.txt

his approach takes advantage of data locality nodes manipulating the data they have access to to allow the dataset to be processed faster and more efficiently than it would be in a more conventional supercomputer architecture that relies on a parallel file system where computation and data are distributed via high-speed networking

spark11/file4.txt

Apache Storm is focused on stream processing or what some call complex event processing. Storm implements a fault tolerant method for performing a computation or pipelining multiple computations on an event as it flows into a system. One might use Storm to transform unstructured data as it flows into a system into a desired format

(spark11Afile1.txt)

(spark11/file2.txt)

(spark11/file3.txt)

(sparkl 1/file4.txt)

Write a Spark program, which will give you the highest occurring words in each file. With their file name and highest occurring words.

Options:

Questions # 25:

Problem Scenario 4: You have been given MySQL DB with following details.

user=retail_dba

password=cloudera

database=retail_db

table=retail_db.categories

jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Please accomplish following activities.

Import Single table categories (Subset data} to hive managed table , where category_id between 1 and 22

Options:

Questions # 26:

Problem Scenario 53 : You have been given below code snippet.

val a = sc.parallelize(1 to 10, 3)

operation1

b.collect

Output 1

Array[lnt] = Array(2, 4, 6, 8,10)

operation2

Output 2

Array[lnt] = Array(1,2, 3)

Write a correct code snippet for operation1 and operation2 which will produce desired output, shown above.

Options:

Questions # 27:

Problem Scenario 35 : You have been given a file named spark7/EmployeeName.csv (id,name).

EmployeeName.csv

E01,Lokesh

E02,Bhupesh

E03,Amit

E04,Ratan

E05,Dinesh

E06,Pavan

E07,Tejas

E08,Sheela

E09,Kumar

E10,Venkat

1. Load this file from hdfs and sort it by name and save it back as (id,name) in results directory. However, make sure while saving it should be able to write In a single file.