Install, Set Up and Manage Google Cloud SDK to Use Python From Anaconda

As a power user of Google Cloud Platform (GCP), you will almost certainly need the gcloud, gsutil and bq commands, which means installing the Google Cloud SDK on your local computer. The Cloud SDK can be installed in several ways, including versioned archives, an installer, apt-get/yum on Linux distributions, and even a Docker image. This post describes installing the Cloud SDK from a versioned archive on operating systems where Python is already installed through Anaconda. The process has been tested on both Windows 10 and Ubuntu 18.04.
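As a rough illustration of the idea, the sketch below points the Cloud SDK at the currently running interpreter (for example, one from an Anaconda environment) through the `CLOUDSDK_PYTHON` environment variable and then asks gcloud for its version. The helper names `cloud_sdk_env` and `gcloud_version` are hypothetical and used only for this example. It assumes gcloud is already on `PATH` and is not necessarily the exact procedure the post follows.

```python
import os
import shutil
import subprocess
import sys


def cloud_sdk_env(python_path=sys.executable, base_env=None):
    """Return a copy of the environment with CLOUDSDK_PYTHON set.

    python_path defaults to the interpreter running this script, which is
    the Anaconda Python when the script is started from an Anaconda prompt.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CLOUDSDK_PYTHON"] = python_path
    return env


def gcloud_version(env):
    """Run `gcloud --version` with the given environment.

    Returns the command output, or None when gcloud is not found on PATH.
    """
    gcloud = shutil.which("gcloud", path=env.get("PATH"))
    if gcloud is None:
        return None
    result = subprocess.run(
        [gcloud, "--version"],
        env=env,
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    env = cloud_sdk_env()
    print("CLOUDSDK_PYTHON =", env["CLOUDSDK_PYTHON"])
    print(gcloud_version(env) or "gcloud not found on PATH")
```

Setting the variable only for the child process keeps the change local to this script. To make it permanent, the same variable would be set in the shell profile or in the system environment settings.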

Install Hortonworks HDP 3.1.0 on a Cluster of VMware Virtual Machines

Hortonworks Tutorials

This post describes how to install Hortonworks HDP 3.1.0 on a cluster of three VMware virtual machines. The process has four major steps: 1) set up the cluster environment; 2) set up a local repository for both the Ambari and HDP stacks; 3) install the Ambari server and agents; 4) install, configure and deploy the cluster. The same process might work for other versions too. Please check product version compatibility in the Hortonworks support matrix: https://supportmatrix.hortonworks.com/

Set Up Scala Development Environment for Apache Spark in Standalone Mode

Apache Spark Tutorials

With Apache Spark installed through the steps described in the last post, this post walks you through setting up a Scala development environment for Spark and building a WordCount application with Maven and SBT. Although Spark can be programmed in Java, Scala, or Python, this post focuses on Scala for a few reasons: 1) Spark itself is written in Scala; 2) Scala’s functional programming model is a good fit for distributed processing and typically needs less code and boilerplate than Java; 3) Scala compiles to Java bytecode and runs on the JVM, which often gives better performance than Python.