Summer Certification Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code = getmirror

Pass the Cloudera Certified Associate CCA CCA175 Questions and answers with ExamsMirror

Practice at least 50% of the questions to maximize your chances of passing.

Exam CCA175 Premium Access

View all detail and faqs for the CCA175 exam

Go to Exam

852 Students Passed

94% Average Score

95% Same Questions

Viewing page 2 out of 3 pages

Viewing questions 11-20 out of questions

Questions # 11:

Problem Scenario 91 : You have been given data in json format as below.

{"first_name":"Ankit", "last_name":"Jain"}

{"first_name":"Amir", "last_name":"Khan"}

{"first_name":"Rajesh", "last_name":"Khanna"}

{"first_name":"Priynka", "last_name":"Chopra"}

{"first_name":"Kareena", "last_name":"Kapoor"}

{"first_name":"Lokesh", "last_name":"Yadav"}

Do the following activity

1. create employee.json tile locally.

2. Load this tile on hdfs

3. Register this data as a temp table in Spark using Python.

4. Write select query and print this data.

5. Now save back this selected data in json format.

Options:

Questions # 12:

Problem Scenario 81 : You have been given MySQL DB with following details. You have been given following product.csv file

product.csv

productID,productCode,name,quantity,price

1001,PEN,Pen Red,5000,1.23

1002,PEN,Pen Blue,8000,1.25

1003,PEN,Pen Black,2000,1.25

1004,PEC,Pencil 2B,10000,0.48

1005,PEC,Pencil 2H,8000,0.49

1006,PEC,Pencil HB,0,9999.99

Now accomplish following activities.

1. Create a Hive ORC table using SparkSql

2. Load this data in Hive table.

3. Create a Hive parquet table using SparkSQL and load data in it.

Options:

Answer

Answer:

See the explanation for Step by Step Solution and configuration.

Questions # 13:

Problem Scenario 6 : You have been given following mysql database details as well as other info.

user=retail_dba

password=cloudera

database=retail_db

jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Compression Codec : org.apache.hadoop.io.compress.SnappyCodec

Please accomplish following.

1. Import entire database such that it can be used as a hive tables, it must be created in default schema.

2. Also make sure each tables file is partitioned in 3 files e.g. part-00000, part-00002, part-00003

3. Store all the Java files in a directory called java_output to evalute the further

Options:

Questions # 14:

Problem Scenario 55 : You have been given below code snippet.

val pairRDDI = sc.parallelize(List( ("cat",2), ("cat", 5), ("book", 4),("cat", 12))) val pairRDD2 = sc.parallelize(List( ("cat",2), ("cup", 5), ("mouse", 4),("cat", 12)))

operation1

Write a correct code snippet for operationl which will produce desired output, shown below.

Array[(String, (Option[lnt], Option[lnt]))] = Array((book,(Some(4},None)), (mouse,(None,Some(4))), (cup,(None,Some(5))), (cat,(Some(2),Some(2)), (cat,(Some(2),Some(12))), (cat,(Some(5),Some(2))), (cat,(Some(5),Some(12))), (cat,(Some(12),Some(2))), (cat,(Some(12),Some(12)))J

Options:

Questions # 15:

Problem Scenario 7 : You have been given following mysql database details as well as other info.

user=retail_dba

password=cloudera

database=retail_db

jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Please accomplish following.

1. Import department tables using your custom boundary query, which import departments between 1 to 25.

2. Also make sure each tables file is partitioned in 2 files e.g. part-00000, part-00002

3. Also make sure you have imported only two columns from table, which are department_id,department_name

Options:

Questions # 16:

Problem Scenario 33 : You have given a files as below.

spark5/EmployeeName.csv (id,name)

spark5/EmployeeSalary.csv (id,salary)

Data is given below:

EmployeeName.csv

E01,Lokesh

E02,Bhupesh

E03,Amit

E04,Ratan

E05,Dinesh

E06,Pavan

E07,Tejas

E08,Sheela

E09,Kumar

E10,Venkat

EmployeeSalary.csv

E01,50000

E02,50000

E03,45000

E04,45000

E05,50000

E06,45000

E07,50000

E08,10000

E09,10000

E10,10000

Now write a Spark code in scala which will load these two tiles from hdfs and join the same, and produce the (name.salary) values.

And save the data in multiple tile group by salary (Means each file will have name of employees with same salary). Make sure file name include salary as well.

Options:

Questions # 17:

Problem Scenario 45 : You have been given 2 files , with the content as given Below

(spark12/technology.txt)

(spark12/salary.txt)

(spark12/technology.txt)

first,last,technology

Amit,Jain,java

Lokesh,kumar,unix

Mithun,kale,spark

Rajni,vekat,hadoop

Rahul,Yadav,scala

(spark12/salary.txt)

first,last,salary

Amit,Jain,100000

Lokesh,kumar,95000

Mithun,kale,150000

Rajni,vekat,154000

Rahul,Yadav,120000

Write a Spark program, which will join the data based on first and last name and save the joined results in following format, first Last.technology.salary

Options:

Questions # 18:

Problem Scenario 87 : You have been given below three files

product.csv (Create this file in hdfs)

productID,productCode,name,quantity,price,supplierid

1001,PEN,Pen Red,5000,1.23,501

1002,PEN,Pen Blue,8000,1.25,501

1003,PEN,Pen Black,2000,1.25,501

1004,PEC,Pencil 2B,10000,0.48,502

1005,PEC,Pencil 2H,8000,0.49,502

1006,PEC,Pencil HB,0,9999.99,502

2001,PEC,Pencil 3B,500,0.52,501

2002,PEC,Pencil 4B,200,0.62,501

2003,PEC,Pencil 5B,100,0.73,501

2004,PEC,Pencil 6B,500,0.47,502

supplier.csv

supplierid,name,phone

501,ABC Traders,88881111

502,XYZ Company,88882222

503,QQ Corp,88883333

products_suppliers.csv

productID,supplierID

2001,501

2002,501

2003,501

2004,502

2001,503

Now accomplish all the queries given in solution.

Select product, its price , its supplier name where product price is less than 0.6 using SparkSQL

Options:

Answer

Answer:

See the explanation for Step by Step Solution and configuration.

Explanation

Solution :

Step 1:

hdfs dfs -mkdir sparksql2

hdfs dfs -put product.csv sparksq!2/

hdfs dfs -put supplier.csv sparksql2/

hdfs dfs -put products_suppliers.csv sparksql2/

Step 2 : Now in spark shell

// this Is used to Implicitly convert an RDD to a DataFrame.

import sqlContext.impIicits._

// Import Spark SQL data types and Row.

import org.apache.spark.sql._

// load the data into a new RDD

val products = sc.textFile("sparksql2/product.csv")

val supplier = sc.textFileC'sparksq^supplier.csv")

val prdsup = sc.textFile("sparksql2/products_suppliers.csv"}

// Return the first element in this RDD

products.fi rst()

supplier.first{).

prdsup.first()

//define the schema using a case class

case class Product(productid: Integer, code: String, name: String, quantity:lnteger, price: Float, supplierid:lnteger)

case class Suplier(supplierid: Integer, name: String, phone: String)

case class PRDSUP(productid: Integer.supplierid: Integer)

// create an RDD of Product objects

val prdRDD = products.map(_.split('\")).map(p => Product(p(0).tolnt,p(1),p(2),p(3).tolnt,p(4).toFloat,p(5).toint))

val supRDD = supplier.map(_.split(",")).map(p => Suplier(p(0).tolnt,p(1),p(2)))

val prdsupRDD = prdsup.map(_.split(",")).map(p => PRDSUP(p(0).tolnt,p(1}.tolnt}}

prdRDD.first()

prdRDD.count()

supRDD.first() supRDD.count()

prdsupRDD.first() prdsupRDD.count(}

// change RDD of Product objects to a DataFrame

val prdDF = prdRDD.toDF()

val supDF = supRDD.toDF()

val prdsupDF = prdsupRDD.toDF()

// register the DataFrame as a temp table prdDF.registerTempTablef'products")

supDF.registerTempTablef'suppliers")

prdsupDF.registerTempTablef'productssuppliers"}

//Select product, its price , its supplier name where product price is less than 0.6

val results = sqlContext.sql(......SELECT products.name, price, suppliers.name as sup_name FROM products JOIN suppliers ON products.supplierlD= suppliers.supplierlD WHERE price < 0.6......]

results. show()

Questions # 19:

Problem Scenario 24 : You have been given below comma separated employee information.

Data Set:

name,salary,sex,age

alok,100000,male,29

jatin,105000,male,32

yogesh,134000,male,39

ragini,112000,female,35

jyotsana,129000,female,39

valmiki,123000,male,29

Requirements:

Use the netcat service on port 44444, and nc above data line by line. Please do the following activities.

1. Create a flume conf file using fastest channel, which write data in hive warehouse directory, in a table called flumemaleemployee (Create hive table as well tor given data).

2. While importing, make sure only male employee data is stored.

Options:

Answer

Answer:

See the explanation for Step by Step Solution and configuration.

Explanation

Step 1 : Create hive table for flumeemployee.'

CREATE TABLE flumemaleemployee

(

name string,

salary int,

sex string,

age int

)

ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

step 2 : Create flume configuration file, with below configuration for source, sink and channel and save it in flume4.conf.

#Define source , sink, channel and agent.

agent1 .sources = source1

agent1 .sinks = sink1

agent1 .channels = channel1

# Describe/configure source1

agent1 .sources.source1.type = netcat

agent1 .sources.source1.bind = 127.0.0.1

agent1.sources.sourcel.port = 44444

#Define interceptors

agent1.sources.source1.interceptors=il

agent1 .sources.source1.interceptors.i1.type=regex_filter

agent1 .sources.source1.interceptors.i1.regex=female

agent1 .sources.source1.interceptors.i1.excludeEvents=true

## Describe sink1

agent1 .sinks, sinkl.channel = memory-channel

agent1.sinks.sink1.type = hdfs

agent1 .sinks, sinkl. hdfs. path = /user/hive/warehouse/flumemaleemployee

hdfs-agent.sinks.hdfs-write.hdfs.writeFormat=Text

agentl .sinks.sink1.hdfs.fileType = Data Stream

# Now we need to define channel1 property.

agent1.channels.channel1.type = memory

agent1.channels.channell.capacity = 1000

agent1.channels.channel1.transactionCapacity = 100

# Bind the source and sink to the channel

agent1 .sources.source1.channels = channel1

agent1 .sinks.sink1.channel = channel1

step 3 : Run below command which will use this configuration file and append data in hdfs.

Start flume service:

flume-ng agent -conf /home/cloudera/flumeconf -conf-file /home/cloudera/flumeconf/flume4.conf --name agentl

Step 4 : Open another terminal and use the netcat service, nc localhost 44444

Step 5 : Enter data line by line.

alok,100000,male,29

jatin,105000,male,32

yogesh,134000,male,39

ragini,112000,female,35

jyotsana,129000,female,39

valmiki.123000.male.29

Step 6 : Open hue and check the data is available in hive table or not.

Step 7 : Stop flume service by pressing ctrl+c

Step 8 : Calculate average salary on hive table using below query. You can use either hive command line tool or hue. select avg(salary) from flumeemployee;

Questions # 20:

Problem Scenario 15 : You have been given following mysql database details as well as other info.

user=retail_dba

password=cloudera

database=retail_db

jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Please accomplish following activities.

1. In mysql departments table please insert following record. Insert into departments values(9999, '"Data Science"1);

2. Now there is a downstream system which will process dumps of this file. However, system is designed the way that it can process only files if fields are enlcosed in(') single quote and separate of the field should be (-} and line needs to be terminated by : (colon).

3. If data itself contains the " (double quote } than it should be escaped by \.

4. Please import the departments table in a directory called departments_enclosedby and file should be able to process by downstream system.