Amazon EMR lets you create managed Hadoop clusters. Creating an EMR cluster using Insisive Cloud is a two-step process. First, create a Launch Template within Insisive Cloud. Once you have a Launch Template, you can create as many EMR clusters as needed with a few clicks and gain up to 80% savings.
Launch Templates
Navigate to Launch Templates using 'Hadoop' -> 'Launch Templates'.
Selecting Region(s)
The region defaults to 'US East (N. Virginia)'. You can change it using the Filter icon next to the region name. Choose a specific region, or click 'Clear' to clear the region filter; once cleared, Launch Templates for all regions are displayed. Initially there are no Launch Templates, so create one using the 'Create Config Template' button. A refresh button fetches the latest list of Launch Templates, and you can toggle between Grid View and List View.
Intended Use
You can use Launch Templates to create preset configurations (e.g. Dev, Test, API server). You can then create an EMR cluster from a template quickly, with a few clicks.
Create Launch Template
Enter a name for the Launch Template.
Select the AWS Region from the drop-down. Once the region is selected, wait for the corresponding details to load.
Select the Application Context from the drop-down.
Select the usage duration of the cluster. Up to 50% of jobs complete within 6 hours; if that applies to your workload, select 'Less than 6 hours'. Depending on the application context, this can give guaranteed uptime for the 'Maximize Availability' option, at a cost slightly higher than Spot but lower than On-Demand.
Select the 'Availability Zone'. EMR usually runs in a single AZ, but depending on provisioning it may in some cases be provisioned across multiple AZs. Choosing multiple AZs gives greater flexibility in that case.
Select the Security Groups.
Select the Availability Zones corresponding to the selected region. The corresponding subnet in the default VPC is selected automatically.
Select the Job Flow role and Service role to be used when launching the EMR cluster.
- Selecting an IAM role is considered a best practice from a security perspective.
Add tags that are relevant to the Launch Template. These are applied to all EMR clusters created from this Launch Template.
Select the Master Node, Core Node and Task Node instance types. Add more task groups if needed.
Select the volume attributes based on the selected Image ID.
- Pay particular attention to the root volume size and the Device ID.
- These depend on the AMI size and on whether the AMI is HVM or PV. The Device ID is /dev/xvda for some AMIs and /dev/sda1 for others; choose the appropriate Device ID for the AMI.
- It is considered a best practice to keep the root volume small and add larger additional EBS volumes. This also helps create AMIs quickly in the event of Spot termination.
Add additional EBS volumes as required.
Select the bootstrap actions, if any, that need to run on the nodes when they are created.
Go to the next screen to add steps to the EMR cluster.
- Add custom steps to the EMR cluster as needed.
Select the appropriate steps to add to the cluster:
- For Spark programs
- For Hive programs
- For custom JARs
Save the template.
If the template is created for the region selected on the Launch Template overview page, it appears on that page. Otherwise, change the region (or clear it) to view Launch Templates from other regions.
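The value of a Launch Template is that one saved configuration can be expanded into any number of cluster requests. The sketch below illustrates this under stated assumptions: it models a template as a plain Python dataclass and expands it into a dictionary shaped like the keyword arguments of boto3's EMR `run_job_flow` call. `LaunchTemplate` and `to_run_job_flow_request` are hypothetical names, not Insisive Cloud's API, and the subnet ID is a placeholder. The request is only built, never sent to AWS.

```python
from dataclasses import dataclass, field


@dataclass
class LaunchTemplate:
    """Hypothetical stand-in for a saved Launch Template (not Insisive Cloud's API)."""
    name: str
    region: str
    subnet_id: str
    job_flow_role: str
    service_role: str
    master_type: str
    core_type: str
    core_count: int = 2
    tags: dict = field(default_factory=dict)


def to_run_job_flow_request(t: LaunchTemplate, cluster_name: str,
                            release_label: str) -> dict:
    """Build a dict shaped like the keyword arguments of boto3's EMR run_job_flow."""
    return {
        "Name": cluster_name,
        "ReleaseLabel": release_label,
        "JobFlowRole": t.job_flow_role,   # instance profile used by the cluster nodes
        "ServiceRole": t.service_role,    # role EMR assumes to manage the cluster
        "Instances": {
            "Ec2SubnetId": t.subnet_id,
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": t.master_type,
                 "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": t.core_type,
                 "InstanceCount": t.core_count},
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Template tags are copied onto every cluster built from the template.
        "Tags": [{"Key": k, "Value": v} for k, v in t.tags.items()],
    }


# One "Dev" template reused for three clusters (subnet ID is a placeholder).
dev = LaunchTemplate(name="Dev", region="us-east-1", subnet_id="subnet-0abc",
                     job_flow_role="EMR_EC2_DefaultRole",
                     service_role="EMR_DefaultRole",
                     master_type="m5.xlarge", core_type="m5.xlarge",
                     tags={"Environment": "Dev"})
requests = [to_run_job_flow_request(dev, f"dev-{i}", "emr-6.15.0") for i in range(3)]
```

Only the cluster name varies between the three requests. Everything else comes from the template, which is what makes repeat launches a few clicks.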
Edit/Delete Launch Template
You can edit or delete a Launch Template using the '...' menu in Grid View, or using the Edit / Delete icons in List View.
Create EMR using Launch Template
In Grid View, you can create a cluster directly from a Launch Template using the 'Create Auto Scaling Group' option in the template's '...' menu.
Create EMR Cluster
A new cluster can be created using the 'Create Auto Scaling Group' button, and you can select the region for the EMR cluster. The 'Application Context' plays an important part in determining the price and availability characteristics of the Auto Scaling Group.
Region
Select the region where the EMR cluster is to be created. This populates the other relevant drop-downs, such as Launch Template, Availability Zones, EMR Release and Key Name.
EMR Application
Selecting an EMR Release populates the EMR Applications drop-down. Choose the relevant applications available in that release.
Custom Image ID
Select the check box if you want to deploy a custom image.
S3 folder location
If the 'Logging to S3 Folder' option is selected, enter the S3 location where the EMR logs should be stored.
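The release, application, custom image and logging choices above correspond to the `ReleaseLabel`, `Applications`, `CustomAmiId` and `LogUri` parameters of the EMR `run_job_flow` API. The sketch below shows one way to assemble those fields. The helper name `cluster_settings` is hypothetical, and the bucket name used with it is a placeholder.

```python
def cluster_settings(release_label, applications, log_uri=None, custom_ami_id=None):
    """Return run_job_flow-style fields for release, applications, logging and AMI."""
    settings = {
        "ReleaseLabel": release_label,                        # e.g. "emr-6.15.0"
        "Applications": [{"Name": app} for app in applications],
    }
    if log_uri is not None:           # only when 'Logging to S3 Folder' is selected
        if not log_uri.startswith("s3://"):
            raise ValueError("log location must be an s3:// URI")
        settings["LogUri"] = log_uri
    if custom_ami_id is not None:     # only when a custom image is selected
        settings["CustomAmiId"] = custom_ami_id
    return settings


example = cluster_settings("emr-6.15.0", ["Spark", "Hive"],
                           log_uri="s3://my-bucket/emr-logs/")
```

Optional fields are left out entirely rather than sent as empty values, which mirrors how the check boxes in the form only take effect when selected.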
Application Context
The Application Context indicates how critical the cluster is and whether it should be optimized for cost or for availability. It currently supports the following options:
Minimum Price: Choose this option for non-critical or short-duration clusters. The cluster is provisioned at the lowest price among the selected instance type(s), with no minimum number of On-Demand Instances. Instance life cycle management still applies: instances are monitored and replaced if they are interrupted. Examples: Dev or Test clusters that do not write critical data more often than once every few seconds; a staging ASG used for a short time to verify or reproduce issues with production settings, then spun down once the work is complete.
Optimum Price - Uptime: Choose this option for applications that need an optimal mix of cost and uptime. This is the default option. It keeps a minimum number of On-Demand Instances at all times and provisions the rest as Spot or On-Demand based on availability. The algorithm prefers instance types with higher availability over lower price, up to a certain percentage. Instance life cycle management applies: instances are monitored and replaced if they are interrupted. Examples: long-running Dev/Test/Staging clusters for all types of workloads. For production clusters, this option suits stateless workloads, but not workloads whose stateless requests typically take 120 seconds or more.
Maximum Availability: Choose this option to prioritize availability and reduce the frequency of interruptions, even if cost savings are less than optimal. It provisions a higher number of On-Demand Instances (but not all instances are On-Demand) and prefers instance types with higher availability even if they are more expensive. This is suitable for all stateless or content-driven production workloads. It is not suitable for workloads with high-frequency disk-persisted storage, or for stateless actions that take longer than 120 seconds.
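The three contexts differ mainly in how many instances are kept On-Demand. The sketch below shows that difference using hypothetical On-Demand shares of 0%, 25% and 60%. It is not Insisive Cloud's actual algorithm, which, as described above, also weighs instance-type availability against price. `split_capacity` and the share values are assumptions made for illustration.

```python
import math

# Hypothetical On-Demand shares per context (not Insisive Cloud's real values).
ON_DEMAND_SHARE = {
    "Minimum Price": 0.0,
    "Optimum Price - Uptime": 0.25,
    "Maximum Availability": 0.6,
}


def split_capacity(context: str, total: int) -> tuple[int, int]:
    """Return (on_demand, spot) instance counts for a cluster of `total` nodes."""
    if total < 1:
        raise ValueError("total must be at least 1")
    on_demand = math.ceil(total * ON_DEMAND_SHARE[context])
    if context == "Optimum Price - Uptime":
        on_demand = max(on_demand, 1)          # always keep a minimum On-Demand base
    if context == "Maximum Availability" and total > 1:
        on_demand = min(on_demand, total - 1)  # never make every instance On-Demand
    return on_demand, total - on_demand
```

For a 10-node cluster, this sketch yields (0, 10) for Minimum Price and (3, 7) for Optimum Price - Uptime. Rounding up ensures the On-Demand base is never smaller than the chosen share.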
EMR Release
Choose the EMR version.
Secondary Details
The Secondary Details page captures additional details, such as which Launch Template to use, along with other information.
Load Balancer Type
Choose the Application Load Balancer or Target Group that routes requests to the ASG(s).
Note: Choosing Optimum Price - Availability or Maximum Availability creates two or more ASGs, so you need a load balancer or target group to distribute requests across them.
Health Check Interval
The default interval for health checks of an instance within an Auto Scaling Group.
Collect Metrics
This enables collecting CloudWatch metrics at 1-minute intervals instead of the default 5-minute interval. Note: AWS charges extra for enhanced CloudWatch metrics collection, so enable this only if you need metrics at 1-minute granularity.
Tags
Add any optional tags for better tracking. A few tags are added by default: Application Name, Organisation ID and User Name. These help categorize costs per user, organisation ID and application name for better cost visibility, once the tags are activated as Cost Allocation Tags.
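AWS APIs, including EMR, accept tags as a list of `Key`/`Value` pairs. The sketch below merges the default cost-tracking tags with user-supplied tags in that format and rejects user tags that would overwrite a default. The key names `Application`, `OrganisationId` and `UserName`, and the `build_tags` helper, are hypothetical: the actual keys Insisive Cloud applies are not given here. Activating the keys as cost allocation tags is still a separate step in the AWS Billing console.

```python
def build_tags(app_name, org_id, user_name, extra=None):
    """Merge default cost-tracking tags with optional user tags (Key/Value list)."""
    tags = {
        "Application": app_name,      # hypothetical key names for the default tags
        "OrganisationId": org_id,
        "UserName": user_name,
    }
    for key, value in (extra or {}).items():
        if key in tags:
            raise ValueError(f"tag {key!r} is reserved for a default tag")
        tags[key] = value
    return [{"Key": k, "Value": v} for k, v in tags.items()]


tags = build_tags("analytics", "org-42", "alice", {"Team": "data"})
```

Rejecting collisions keeps the default tags trustworthy for cost reports, since a user cannot accidentally relabel a cluster's owner.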
Instance Types
Choose the instance types for the Master, Core and Task nodes. Optionally, add additional task node groups.
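In the EMR API, the node choices above become `InstanceGroups` entries whose `InstanceRole` is `MASTER`, `CORE` or `TASK`. The sketch below builds that list with any number of extra task groups. `instance_groups` is a hypothetical helper, and the instance types used with it are examples only.

```python
def instance_groups(master_type, core_type, core_count, task_groups=()):
    """Build an EMR InstanceGroups list: one master, one core group, N task groups."""
    groups = [
        {"Name": "Master", "InstanceRole": "MASTER",
         "InstanceType": master_type, "InstanceCount": 1},
        {"Name": "Core", "InstanceRole": "CORE",
         "InstanceType": core_type, "InstanceCount": core_count},
    ]
    # Each extra task group can use a different instance type.
    for i, (task_type, count) in enumerate(task_groups, start=1):
        groups.append({"Name": f"Task-{i}", "InstanceRole": "TASK",
                       "InstanceType": task_type, "InstanceCount": count})
    return groups


groups = instance_groups("m5.xlarge", "r5.xlarge", 2,
                         [("c5.xlarge", 4), ("m5.2xlarge", 2)])
```

Task groups hold no HDFS data, which is why they are the usual place to add capacity or mix instance types.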
Volume and Bootstrap actions
Add custom volumes and bootstrap actions as needed.
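The sketch below shows how the volume and bootstrap choices might look in a `run_job_flow`-style request. It follows the best practice from the Launch Template section: a small root volume (`EbsRootVolumeSize`, set once for the cluster) plus larger EBS data volumes attached to each instance group through `EbsConfiguration`. Bootstrap actions are S3 script paths listed under `BootstrapActions`. The helper name, the `gp2` volume type and the script path are assumptions made for illustration.

```python
def storage_and_bootstrap(groups, root_gb, data_sizes_gb, script_paths):
    """Return run_job_flow-style fields: small root volume, larger EBS data
    volumes on every instance group, and one bootstrap action per script."""
    ebs = {"EbsBlockDeviceConfigs": [
        {"VolumeSpecification": {"VolumeType": "gp2", "SizeInGB": size},
         "VolumesPerInstance": 1}
        for size in data_sizes_gb]}
    return {
        "EbsRootVolumeSize": root_gb,  # cluster-wide root volume size in GiB
        "Instances": {
            # Data volumes are configured per instance group.
            "InstanceGroups": [{**g, "EbsConfiguration": ebs} for g in groups],
        },
        "BootstrapActions": [
            {"Name": f"bootstrap-{i}",
             "ScriptBootstrapAction": {"Path": path, "Args": []}}
            for i, path in enumerate(script_paths, start=1)],
    }


fragment = storage_and_bootstrap(
    [{"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1}],
    root_gb=15, data_sizes_gb=[200],
    script_paths=["s3://my-bucket/bootstrap/install-deps.sh"])
```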
Custom Steps
Insisive Cloud supports 'Custom Jar', 'Hive Program', 'Spark Program' and 'Streaming Program' as custom steps.
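Each of these step types maps to an EMR step definition with a `HadoopJarStep`. Spark, Hive and streaming programs are typically run through `command-runner.jar`, while a custom JAR is referenced directly. The sketch below builds one entry of each kind in the shape accepted by the `Steps` parameter of `run_job_flow`. How Insisive Cloud builds these internally is not documented here. The function names and S3 paths are hypothetical, and `ActionOnFailure="CONTINUE"` is an illustrative choice.

```python
def spark_step(name, script_s3, args=()):
    """Spark program: spark-submit run through command-runner.jar."""
    return {"Name": name, "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {"Jar": "command-runner.jar",
                              "Args": ["spark-submit", "--deploy-mode", "cluster",
                                       script_s3, *args]}}


def hive_step(name, query_s3):
    """Hive program: run a Hive script stored in S3."""
    return {"Name": name, "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {"Jar": "command-runner.jar",
                              "Args": ["hive-script", "--run-hive-script",
                                       "--args", "-f", query_s3]}}


def streaming_step(name, mapper_s3, reducer_s3, input_s3, output_s3):
    """Streaming program: Hadoop streaming with script-based mapper and reducer."""
    return {"Name": name, "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {"Jar": "command-runner.jar",
                              "Args": ["hadoop-streaming",
                                       "-files", f"{mapper_s3},{reducer_s3}",
                                       "-mapper", mapper_s3.rsplit("/", 1)[-1],
                                       "-reducer", reducer_s3.rsplit("/", 1)[-1],
                                       "-input", input_s3, "-output", output_s3]}}


def custom_jar_step(name, jar_s3, main_class=None, args=()):
    """Custom JAR: MainClass is optional if the JAR manifest already names it."""
    jar_step = {"Jar": jar_s3, "Args": list(args)}
    if main_class:
        jar_step["MainClass"] = main_class
    return {"Name": name, "ActionOnFailure": "CONTINUE", "HadoopJarStep": jar_step}
```

The streaming mapper and reducer are shipped with `-files` and then referred to by file name only, because `-files` places them in each task's working directory.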
Preview
Clicking 'Show Preview' opens the Preview page, which summarizes the main elements that will be used to provision the EMR cluster. Then wait for the clusters to be created and all instances to be provisioned; how long this takes depends on the application context and instance types selected.
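If you script follow-up work against a new cluster, you need to wait until it is ready. The sketch below polls the cluster state until it reaches `WAITING` or `RUNNING`, and stops early on a terminal state or after a timeout. The state names are those reported by EMR's DescribeCluster API. `get_state` is a caller-supplied function, for example `lambda: emr.describe_cluster(ClusterId=cid)["Cluster"]["Status"]["State"]` with a boto3 EMR client. Passing it in keeps the sketch free of AWS calls, and `sleep` is injectable so it can be tested quickly. boto3's built-in `cluster_running` waiter is an alternative.

```python
import time

# Cluster states reported by EMR's DescribeCluster API.
READY = {"WAITING", "RUNNING"}
FAILED = {"TERMINATING", "TERMINATED", "TERMINATED_WITH_ERRORS"}


def wait_for_cluster(get_state, poll_seconds=30.0, timeout_seconds=3600.0,
                     sleep=time.sleep):
    """Poll get_state() until the cluster is ready; raise if it fails or times out."""
    waited = 0.0
    while True:
        state = get_state()
        if state in READY:
            return state
        if state in FAILED:
            raise RuntimeError(f"cluster ended in state {state}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"cluster still {state} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```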