25.8.20

CCA Spark and Hadoop Developer

CCA Spark and Hadoop Developer is a performance-based certification that requires candidates to write code in Scala and Python and run it on a cluster running Cloudera's Distribution Including Apache Hadoop (CDH).

Skills / Knowledge

  • Import data from a MySQL database into HDFS using Sqoop
  • Export data to a MySQL database from HDFS using Sqoop
  • Change the delimiter and file format of data during import using Sqoop
  • Ingest real-time and near-real-time (NRT) streaming data into HDFS using Flume
  • Move data into and out of HDFS using the Hadoop File System (FS) shell commands
  • Load data from HDFS and store results back to HDFS using Spark
  • Join disparate datasets together using Spark
  • Calculate aggregate statistics (e.g., average or sum) using Spark
  • Filter data into a smaller dataset using Spark
  • Write a query that produces ranked or sorted data using Spark
  • Read and/or create a table in the Hive metastore in a given schema
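
As a rough illustration of the Sqoop skills listed above, the following Python sketch assembles command lines for an import with a custom delimiter or file format, and for an export back to MySQL. The flags are standard Sqoop 1 options. The helper function names, JDBC URL, table name, user name, and HDFS paths are hypothetical placeholders chosen for the example. The script only builds and prints the commands; it does not run Sqoop.

```python
import shlex

# Map a short format name to the corresponding Sqoop import flag.
FORMAT_FLAGS = {
    "text": "--as-textfile",
    "avro": "--as-avrodatafile",
    "sequence": "--as-sequencefile",
    "parquet": "--as-parquetfile",
}


def sqoop_import_args(jdbc_url, table, target_dir, username, password_file,
                      delimiter=",", file_format="text", num_mappers=1):
    """Build the argument list for a `sqoop import` from MySQL into HDFS."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
        FORMAT_FLAGS[file_format],
    ]
    # A field delimiter only applies to delimited text output.
    if file_format == "text":
        args += ["--fields-terminated-by", delimiter]
    return args


def sqoop_export_args(jdbc_url, table, export_dir, username, password_file,
                      delimiter=","):
    """Build the argument list for a `sqoop export` from HDFS into MySQL."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--export-dir", export_dir,
        "--input-fields-terminated-by", delimiter,
    ]


if __name__ == "__main__":
    # All connection details below are placeholders, not real endpoints.
    url = "jdbc:mysql://db.example.com/retail"
    imp = sqoop_import_args(url, "orders", "/user/demo/orders",
                            "demo", "/user/demo/.mysql.pw", delimiter="|")
    exp = sqoop_export_args(url, "orders_copy", "/user/demo/orders",
                            "demo", "/user/demo/.mysql.pw", delimiter="|")
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(imp))
    print(shlex.join(exp))
```

Building the command as a list rather than a single string keeps each argument separate, so it can also be passed directly to `subprocess.run` on a machine where Sqoop is installed.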