
There are No Mistakes in Life, Only Lessons..!!!


December 26, 2022

Best Practices in Cluster Provisioning and Management – An Overview

Before creating a cluster, it helps to have a clear idea of how the cluster should be provisioned and managed. Here I am noting down some points on the cluster creation and management process.

Platform Requirements

  • The Cloudera distribution is a good option for creating a Hadoop cluster, since it has a well-structured repository and a well-defined documentation set (advanced users may prefer builds from the Apache community).
  • Cloudera Manager is designed to make Hadoop administration simple and straightforward at any scale. With the help of Cloudera Manager, you can easily deploy and centrally operate the complete Hadoop stack. The application automates the installation process, reducing deployment time from a week to minutes.
  • CentOS is a good choice of operating system, since it is built on the RHEL architecture and supports RHEL add-ons.
  • yum install <packages> is the command used most frequently for installing packages from a remote repository: yum picks the repository URL from /etc/yum.repos.d, downloads the packages, and installs them on the machine. This normally requires internet access, so in an isolated environment where the remote repository is unreachable, a plain yum install will not work. In that situation, we create a local yum repository.
  • It is better to turn off the graphical user interface on all host machines, for efficient utilization of memory.
  • For each installation, add the required environment variables to /etc/bashrc or /etc/profile so they are available to all users.
  • To reload environment variables from /etc/bashrc or /etc/profile without logging out, use the 'source' command.

Required Services

  • Ensure the sshd service is running on each node so that Secure Shell access is active.
  • Ensure the iptables service is stopped, so the firewall does not block traffic between cluster nodes.
  • Oracle JDK 1.6+ should be used (instead of OpenJDK) for JVM Process Status (jps), which displays the currently running Hadoop daemons.
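The checks above can be sketched as follows. The commands assume a CentOS/RHEL SysV init system of the era this post describes (on systemd hosts the equivalents are systemctl calls), and they require root.

```shell
# Verify sshd is up on the node
service sshd status

# Stop the firewall, and keep it disabled across reboots
service iptables stop
chkconfig iptables off

# With the Oracle JDK installed, jps lists the running Hadoop
# daemons, e.g. NameNode, DataNode, JobTracker, TaskTracker
jps
```

Running these on every node before installing Hadoop avoids the two most common first-run failures: blocked inter-node ports and unreachable SSH.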

Generic Points

  • For tarball or source-build installations, the /opt location is preferred.
  • Rebooting a Linux machine to change configurations is bad practice and may negatively affect the overall cluster balance.
  • For network connection issues, restart the network service rather than rebooting the host machine.
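As a quick sketch of the last two points (the Hadoop tarball name is an illustrative assumption, and the commands require root):

```shell
# Tarball installations go under /opt
tar -xzf hadoop-1.2.1.tar.gz -C /opt/

# For network issues, restart only the network service
# instead of rebooting the whole host
service network restart
```

Restarting a single service brings connectivity back without taking the node's Hadoop daemons down, which is why it is preferable to a full reboot.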