Understanding Amazon Elasticsearch Service

Recently, I was involved in adding Amazon Elasticsearch Service to our system. To get started, my colleague Ali Kamal and I researched the service and read through the getting-started material to understand it. Here is what we learned.

Amazon Elasticsearch Service

Amazon Elasticsearch Service (Amazon ES) is a managed service that makes it easy to deploy, operate, and scale Elasticsearch, a popular open-source search and analytics engine.

What Is Amazon Elasticsearch Service?

Elasticsearch is an open-source, RESTful, distributed search and analytics engine built on Apache Lucene. It stores the original document and adds a searchable reference to the document in the cluster’s index. You can also use Kibana to search the data and create visualizations. Amazon ES includes:

  • Numerous configurations of CPU, memory, and storage capacity, known as instance types
  • Up to 3 PB of attached storage
  • AWS Identity and Access Management (IAM) access control
  • Easy integration with Amazon VPC and VPC security groups
  • Secure integration with popular services
  • Flexibility to integrate with business intelligence (BI) applications and custom packages
  • Stability features, such as dedicated master nodes to offload cluster management tasks and automated snapshots to back up and restore Amazon ES domains
  • Cost-effective UltraWarm storage for read-only data

And many more

UltraWarm

AWS announced the general availability of Amazon Elasticsearch Service UltraWarm on May 5, 2020. According to AWS, UltraWarm is the most cost-effective Elasticsearch-compatible storage solution available. It is also performance-optimized, so customers can investigate and interactively visualize their data while they embrace data at scale. In short, UltraWarm stores data in Amazon S3 instead of EBS volumes.

To use warm storage, a domain must have dedicated master nodes. Also, if your domain uses a T2 instance type for its data nodes, you can’t use warm storage.
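The two constraints above can be written as a small pre-flight check. This is only a sketch: `can_enable_ultrawarm` is a hypothetical helper, not an AWS API, and the dictionary keys are assumed to follow the `ElasticsearchClusterConfig` shape that boto3 accepts when creating a domain.

```python
from typing import List, Tuple


def can_enable_ultrawarm(cluster_config: dict) -> Tuple[bool, List[str]]:
    """Check the two UltraWarm prerequisites described above.

    Hypothetical helper: `cluster_config` is assumed to look like the
    ElasticsearchClusterConfig dictionary used by boto3.
    """
    reasons = []
    # Warm storage requires dedicated master nodes.
    if not cluster_config.get("DedicatedMasterEnabled", False):
        reasons.append("dedicated master nodes are required")
    # Domains with T2 data nodes cannot use warm storage.
    if cluster_config.get("InstanceType", "").startswith("t2."):
        reasons.append("T2 data nodes are not supported")
    return (not reasons, reasons)


ok, reasons = can_enable_ultrawarm(
    {"InstanceType": "t2.small.elasticsearch", "DedicatedMasterEnabled": False}
)
print(ok, reasons)
```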

Scale up an ES domain

Before scaling up an Amazon ES domain, consider reducing the load on the domain first. If that is not enough, scale the domain up.

Securing Elasticsearch Service

You can allow your users to access the Amazon ES endpoints and Kibana publicly. If you want to keep them private, you have a few options, described below.

Securing Kibana

Kibana can be secured by:

  1. Configuring with Amazon Cognito authentication
  2. Using a proxy server
  3. Using an IP-based access policy 

Amazon Cognito authentication for Kibana requires the following resources:

  • Cognito user pool
  • Cognito identity pool (must be in the same region as the user pool)
  • IAM role that has the AmazonESCognitoAccess policy attached (CognitoAccessForAmazonES)

You can use the same Cognito pools and role to secure multiple Amazon ES domains.
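As a rough illustration of the Cognito requirements, the sketch below builds one settings dictionary and reuses it for two domains. The keys are assumed to match the `CognitoOptions` parameter that boto3 accepts for an Amazon ES domain; `cognito_options`, the pool IDs, the account number, and the domain names are all made up for the example. The region check relies on the usual ID formats, where a user pool ID starts with `<region>_` and an identity pool ID starts with `<region>:`.

```python
def cognito_options(user_pool_id: str, identity_pool_id: str, role_arn: str) -> dict:
    """Build a CognitoOptions-style dictionary (hypothetical helper)."""
    # User pool IDs look like "us-east-1_AbCdEfGhI",
    # identity pool IDs look like "us-east-1:<guid>".
    user_pool_region = user_pool_id.split("_", 1)[0]
    identity_pool_region = identity_pool_id.split(":", 1)[0]
    if user_pool_region != identity_pool_region:
        raise ValueError("identity pool must be in the same region as the user pool")
    return {
        "Enabled": True,
        "UserPoolId": user_pool_id,
        "IdentityPoolId": identity_pool_id,
        "RoleArn": role_arn,
    }


# Placeholder identifiers, for illustration only.
shared = cognito_options(
    "us-east-1_EXAMPLE",
    "us-east-1:00000000-0000-0000-0000-000000000000",
    "arn:aws:iam::123456789012:role/CognitoAccessForAmazonES",
)

# The same pools and role can back more than one domain.
logs_domain = {"DomainName": "logs", "CognitoOptions": shared}
search_domain = {"DomainName": "search", "CognitoOptions": shared}
```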

You can also place a proxy server between Kibana and Amazon ES. AWS recommends configuring the EC2 instance running the proxy server with an Elastic IP address. This way, you can replace the instance when necessary and still attach the same public IP address to it.
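The third option in the list above, an IP-based access policy, could look roughly like the following. This is a sketch under assumptions: `ip_based_access_policy` is a hypothetical helper, and the domain ARN and CIDR range are placeholders. With a proxy server in front of Kibana, the allowed address would typically be the proxy's Elastic IP.

```python
import json
from typing import List


def ip_based_access_policy(domain_arn: str, allowed_cidrs: List[str]) -> str:
    """Return a resource-based policy that only admits the given source IPs."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "es:ESHttp*",
                # Apply to every path under the domain, Kibana included.
                "Resource": f"{domain_arn}/*",
                # Requests from any other address are not matched by this statement.
                "Condition": {"IpAddress": {"aws:SourceIp": allowed_cidrs}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Placeholder ARN and documentation-range IP address.
policy_json = ip_based_access_policy(
    "arn:aws:es:us-east-1:123456789012:domain/my-domain",
    ["203.0.113.10/32"],
)
print(policy_json)
```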

VPC options

Do we need to place the Elasticsearch domain inside a VPC? In short, no: if IP-based access policies are enough to control access to the Elasticsearch endpoint, a VPC is not required, and you can still control access to Kibana with Cognito. Security groups only come into play once the domain is inside a VPC.

In detail, consider the following:

  • Firstly, when you create an ES domain with public access, the endpoint is reachable from any internet-connected device, though you can still control access to it.
  • Secondly, AWS recommends using access policies that specify IAM users or roles. For a domain inside a VPC, operating with an open access policy does not mean that anyone on the internet can access the Amazon ES domain. Rather, it means that if a request reaches the Amazon ES domain and the associated security groups permit it, the domain accepts the request without further security checks.
  • Finally, placing an Amazon ES domain within a VPC enables secure communication between Amazon ES and other services within the VPC without the need for an internet gateway, NAT device, or VPN connection. Because of their logical isolation, domains that reside within a VPC have an extra layer of security when compared to domains that use public endpoints.

Policies

There are two ways to control permissions:

  1. Resource-based policies: These are attached to the ES domain itself and specify which principals can access the domain's resources. We can restrict access to reads, writes, particular indexes, and so on.
  2. Identity-based policies: These are attached to IAM users or roles and specify who can access a service, which actions they can perform, and, if applicable, the resources on which they can perform those actions.
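To make the difference between the two policy types concrete, here is a small sketch that builds one of each for read-only access. The helper names, the account ID, the role name, and the domain ARN are assumptions for illustration. The structural difference is that a resource-based policy names a `Principal`, while an identity-based policy does not, because it is attached to the user or role it applies to.

```python
from typing import List

# Read-only HTTP actions against the domain.
READ_ONLY_ACTIONS = ["es:ESHttpGet", "es:ESHttpHead"]


def resource_based_policy(domain_arn: str, principal_arn: str, actions: List[str]) -> dict:
    """Policy attached to the domain: says WHO may do WHAT on it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": actions,
                "Resource": f"{domain_arn}/*",
            }
        ],
    }


def identity_based_policy(domain_arn: str, actions: List[str]) -> dict:
    """Policy attached to an IAM user or role: no Principal element."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": actions,
                "Resource": f"{domain_arn}/*",
            }
        ],
    }


# Placeholder identifiers.
DOMAIN_ARN = "arn:aws:es:us-east-1:123456789012:domain/my-domain"
on_domain = resource_based_policy(
    DOMAIN_ARN, "arn:aws:iam::123456789012:role/reporting", READ_ONLY_ACTIONS
)
on_role = identity_based_policy(DOMAIN_ARN, READ_ONLY_ACTIONS)
```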

Pricing

AWS charges you for Amazon Elasticsearch Service instance hours, Amazon EBS storage (if you choose this option), and data transfer. The table below lists the minimum viable instance pricing options.

Pricing option | Instance size | Hourly rate
--- | --- | ---
On-Demand | t2.micro.elasticsearch | $0.018
On-Demand | m5.large.elasticsearch | $0.142
Reserved Instance (1 year term) | m5.large.elasticsearch | $0.098
Reserved Instance (3 year term) | m5.large.elasticsearch | $0.074
UltraWarm Instance | ultrawarm1.large.elasticsearch | $0.238

Amazon Elasticsearch Service pricing as of 24 May 2020
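To put the hourly rates above into perspective, the sketch below converts them to an approximate monthly figure and works out how much the reserved terms save against on-demand for m5.large.elasticsearch. The 730 hours per month is my own averaging assumption (8,760 hours a year divided by 12), and the dictionary keys are just labels for this example.

```python
# Hourly rates copied from the pricing table (24 May 2020).
HOURLY_RATES = {
    "on_demand_t2_micro": 0.018,
    "on_demand_m5_large": 0.142,
    "reserved_1y_m5_large": 0.098,
    "reserved_3y_m5_large": 0.074,
    "ultrawarm1_large": 0.238,
}

# Assumption: an average month has 8760 / 12 = 730 hours.
HOURS_PER_MONTH = 730


def monthly_cost(hourly_rate: float) -> float:
    """Approximate monthly cost of one instance running continuously."""
    return hourly_rate * HOURS_PER_MONTH


def savings_percent(reserved_rate: float, on_demand_rate: float) -> float:
    """Percentage saved by the reserved rate relative to on-demand."""
    return (on_demand_rate - reserved_rate) / on_demand_rate * 100


for name, rate in HOURLY_RATES.items():
    print(f"{name}: ${monthly_cost(rate):.2f} per month")

on_demand = HOURLY_RATES["on_demand_m5_large"]
for term in ("reserved_1y_m5_large", "reserved_3y_m5_large"):
    print(f"{term}: {savings_percent(HOURLY_RATES[term], on_demand):.1f}% cheaper")
```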

Instance Pricing

On-Demand Instances

On-Demand instances are charged hourly based on the size of the instance, plus the storage used for indexes. The instance types listed above support EBS storage only.

Reserved Instances

The smallest instance available as a Reserved Instance is m5.large.elasticsearch, so this option is only really viable for production use.

UltraWarm Instances

UltraWarm lets you economically retain large amounts of data. With UltraWarm storage, you pay an hourly rate for each UltraWarm node and only for the storage you use. For example, an ultrawarm1.large.elasticsearch instance can address up to 20 TiB of storage on S3, but if you store only 1 TiB of data, you’re only billed for 1 TiB of data.

Spot Instances

With Spot Instances, you pay the Spot price that’s in effect for the time period your instances are running. Spot Instance prices are set by Amazon EC2 and adjust gradually based on long-term trends in supply and demand for Spot Instance capacity.

EBS volume pricing

For r3 or i3 instance types you can use the instance store that comes with the instance, but for smaller instances you need to select EBS storage for the data nodes:

  • General Purpose (SSD) Storage – $0.135 per GB / month
  • Magnetic Storage – $0.067 per GB / month
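Using the two per-GB rates above, a monthly EBS bill can be estimated as size times rate times number of data nodes. The sketch below assumes every data node gets a volume of the same size; the function name and the dictionary keys are made up for this example.

```python
# Per-GB monthly rates quoted above.
EBS_RATE_PER_GB_MONTH = {
    "general_purpose_ssd": 0.135,
    "magnetic": 0.067,
}


def ebs_monthly_cost(size_gb: float, volume_type: str = "general_purpose_ssd",
                     node_count: int = 1) -> float:
    """Estimated monthly EBS cost, assuming one equal-sized volume per data node."""
    rate = EBS_RATE_PER_GB_MONTH[volume_type]
    return size_gb * rate * node_count


# A single 10 GB SSD volume versus a single 512 GB magnetic volume.
print(f"${ebs_monthly_cost(10):.2f}")
print(f"${ebs_monthly_cost(512, 'magnetic'):.2f}")
```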

Data transfer charges

You can find details about data transfer charges in the AWS pricing documentation. Here are some points I found interesting:

  • You pay standard AWS data transfer charges for the data transferred in and out of Amazon Elasticsearch Service.
  • Amazon ES does not bill for traffic between Availability Zones.
  • Amazon ES neither meters nor bills for shard allocation and rebalancing traffic.
  • To clarify, you will not be charged for the data transfer between nodes within your Amazon ES domain.

Sizing Amazon ES Domains

It all depends on your needs. You may not need dedicated master nodes at an early stage. It is possible to start with the minimum viable options and scale up later.

Minimum Requirement

These are the default options for a Development and Testing deployment:

  • Number of nodes: 1
  • Instance type: t2.small
  • EBS volume type: General Purpose (SSD)
  • EBS storage size per node: 10 GiB
  • One Availability Zone

For example, if you run an on-demand t2.small.elasticsearch instance for 14 hours a day, you will be charged $183.96 per annum for the instance and $2.025 for storage, so in total it could be about $185.98.
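The instance figure above can be reproduced with a few lines of arithmetic. Two things here are my own assumptions rather than numbers from the post: "14 hours" is read as 14 hours a day for 365 days, and the $0.036 hourly rate for t2.small.elasticsearch is back-calculated from the $183.96 total. The storage charge is simply taken as quoted.

```python
# Assumption: rate back-calculated from $183.96 / (14 h * 365 days).
HOURLY_RATE_USD = 0.036
HOURS_PER_DAY = 14
DAYS_PER_YEAR = 365
# Storage charge taken as quoted in the text, not derived here.
STORAGE_COST_USD = 2.025


def annual_instance_cost(hourly_rate: float, hours_per_day: float,
                         days_per_year: int = DAYS_PER_YEAR) -> float:
    """Yearly instance charge for a node that runs part of each day."""
    return hourly_rate * hours_per_day * days_per_year


instance_cost = annual_instance_cost(HOURLY_RATE_USD, HOURS_PER_DAY)
total_cost = instance_cost + STORAGE_COST_USD
print(f"instance: ${instance_cost:.2f}, total: ${total_cost:.3f}")
```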

For Production

The minimum possibly viable deployment for production workloads is:

  1. No dedicated master instances
  2. Two-zone replication, with m5.large nodes
  3. Using one replica for your primaries
  4. Storage as needed, maximum 512 GB, GP2 EBS volumes for data nodes
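The four points above could translate into domain settings roughly like this. The keys are assumed to match the parameters boto3 accepts when creating an Amazon ES domain; `minimal_production_domain`, the domain name, and the instance count of two (one data node per zone) are my assumptions, not values from the list. The replica count is an index-level setting rather than a domain setting, so it is not part of this dictionary.

```python
def minimal_production_domain(domain_name: str, volume_size_gib: int = 512) -> dict:
    """Build domain settings for the minimum viable production deployment."""
    if not 0 < volume_size_gib <= 512:
        raise ValueError("volume size must be between 1 and 512 GiB")
    return {
        "DomainName": domain_name,
        "ElasticsearchClusterConfig": {
            "InstanceType": "m5.large.elasticsearch",
            # Assumption: one data node in each of the two zones.
            "InstanceCount": 2,
            # Point 1: no dedicated master instances.
            "DedicatedMasterEnabled": False,
            # Point 2: two-zone replication.
            "ZoneAwarenessEnabled": True,
            "ZoneAwarenessConfig": {"AvailabilityZoneCount": 2},
        },
        # Point 4: GP2 EBS volumes, sized as needed up to 512 GiB.
        "EBSOptions": {
            "EBSEnabled": True,
            "VolumeType": "gp2",
            "VolumeSize": volume_size_gib,
        },
    }


settings = minimal_production_domain("my-domain", volume_size_gib=100)
```

With boto3 installed and credentials configured, a dictionary like this would typically be passed as keyword arguments to the ES client's `create_elasticsearch_domain` call.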

Comparing this with a deployment that follows all the best practice recommendations, Jon Handler says: “This deployment supports the recommended 400 GB of source data and the same hundreds to thousands of requests per second at a much lower, on-demand cost of $350/month. That’s an 81% reduction in cost. If you deploy smaller than the maximum, 512 GB EBS volumes, you proportionally reduce the Amazon EBS cost component of $207 per month”

However, nothing stops you from starting production with the same options as the test environment.

Practices recommended by AWS

To conclude, here are some of the best practices recommended by AWS:

  1. Configure at least one replica, the Elasticsearch default, for each index.
  2. Use three dedicated master nodes.
  3. Deploy the domain across three Availability Zones. This configuration lets Amazon ES distribute replica shards to different Availability Zones than their corresponding primary shards. For a list of Regions that have three Availability Zones and some other considerations, see Configuring a Multi-AZ Domain.
  4. Upgrade to the latest Elasticsearch versions as they become available on Amazon Elasticsearch Service.
  5. Update to the latest service software as it becomes available.
  6. If appropriate for your network configuration, create the domain within a VPC.
  7. If your domain stores sensitive data, enable encryption of data at rest and node-to-node encryption.
  8. Use Index State Management (ISM) to define custom management policies that automate routine tasks and apply them to indices and index patterns. You no longer need to set up and manage external processes to run your index operations.
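Some of the recommendations above can be checked mechanically against a domain's settings. The sketch below is a hypothetical checker, not an AWS tool: it assumes the settings dictionary uses the same keys as boto3's domain parameters, and it only covers the practices that show up in those settings (dedicated masters, three Availability Zones, and the two encryption options).

```python
from typing import List


def best_practice_gaps(domain: dict, stores_sensitive_data: bool = True) -> List[str]:
    """List the recommendations from above that the settings do not meet."""
    cluster = domain.get("ElasticsearchClusterConfig", {})
    gaps = []

    # Practice 2: three dedicated master nodes.
    if not cluster.get("DedicatedMasterEnabled") or cluster.get("DedicatedMasterCount") != 3:
        gaps.append("use three dedicated master nodes")

    # Practice 3: three Availability Zones.
    zone_count = cluster.get("ZoneAwarenessConfig", {}).get("AvailabilityZoneCount")
    if not cluster.get("ZoneAwarenessEnabled") or zone_count != 3:
        gaps.append("deploy across three Availability Zones")

    # Practice 7: only relevant when the domain holds sensitive data.
    if stores_sensitive_data:
        if not domain.get("EncryptionAtRestOptions", {}).get("Enabled"):
            gaps.append("enable encryption of data at rest")
        if not domain.get("NodeToNodeEncryptionOptions", {}).get("Enabled"):
            gaps.append("enable node-to-node encryption")

    return gaps


# Example settings for a small two-zone domain without dedicated masters.
example = {
    "ElasticsearchClusterConfig": {
        "DedicatedMasterEnabled": False,
        "ZoneAwarenessEnabled": True,
        "ZoneAwarenessConfig": {"AvailabilityZoneCount": 2},
    },
    "EncryptionAtRestOptions": {"Enabled": True},
}
for gap in best_practice_gaps(example):
    print("-", gap)
```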

You can find the full list here

I hope this helps you get started with Amazon Elasticsearch Service. Please feel free to add your thoughts in the comments below.
