Recently, I was involved in adding Amazon Elasticsearch Service to our system. To begin with, my colleague Ali Kamal and I researched the service and read through the getting-started material to understand how it works. Here I share what we learned.
Amazon Elasticsearch Service
Amazon Elasticsearch Service (Amazon ES) is a managed service that makes it easy to deploy, operate, and scale Elasticsearch, a popular open-source search and analytics engine (source: What Is Amazon Elasticsearch Service?).
Elasticsearch itself is an open-source, RESTful, distributed search and analytics engine built on Apache Lucene. It automatically stores the original document and adds a searchable reference to the document in the cluster’s index. You can also use Kibana to search and create visualizations. Amazon ES includes:
- Numerous configurations of CPU, memory, and storage capacity, known as instance types
- Up to 3 PB of attached storage
- AWS Identity and Access Management (IAM) access control
- Easy integration with Amazon VPC and VPC security groups
- Secure integration with popular services
- Flexibility to integrate with business intelligence (BI) applications and custom packages
- Stability, i.e. dedicated master nodes to offload cluster management tasks, automated snapshots to back up and restore Amazon ES domains
- Cost-effective UltraWarm storage for read-only data
And many more
AWS announced the general availability of UltraWarm for Amazon Elasticsearch Service on May 5, 2020. AWS describes UltraWarm as the most cost-effective Elasticsearch-compatible storage solution available. It is also performance-optimized, so customers can investigate and interactively visualize their data while they embrace data at scale. Basically, you are using S3 instead of EBS storage.
To use warm storage, a domain must have dedicated master nodes. Also, if your domain uses a T2 instance type for its data nodes, you can’t use warm storage.
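The two constraints above can be written down as a small pre-flight check. This is only a sketch: the function name and its parameters are made up for illustration, and it covers only the two rules mentioned here, not every UltraWarm prerequisite.

```python
def can_use_ultrawarm(dedicated_master_enabled: bool, data_instance_type: str) -> bool:
    """Check the two warm-storage constraints described in the text.

    Hypothetical helper: it only knows about the dedicated-master rule
    and the T2 data-node rule.
    """
    uses_t2_data_nodes = data_instance_type.lower().startswith("t2.")
    return dedicated_master_enabled and not uses_t2_data_nodes


print(can_use_ultrawarm(True, "m5.large.elasticsearch"))
print(can_use_ultrawarm(True, "t2.small.elasticsearch"))
print(can_use_ultrawarm(False, "m5.large.elasticsearch"))
```

Only the first call satisfies both rules; the other two fail on the T2 rule and the dedicated-master rule respectively.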
Scale up an ES domain
Before scaling up an Amazon ES domain, consider reducing the load on the domain. Otherwise, use one of the following options:
- Increase the size of the EBS volumes or add nodes
- Switch to a larger instance type or add more nodes
- Upgrade Elasticsearch
Securing Elasticsearch Service
You can allow your users to access the Amazon ES endpoints and Kibana publicly. If you want to keep them private, you have a few options, described below.
Kibana can be secured with Amazon Cognito authentication, which requires the following resources:
- Cognito user pool
- Cognito identity pool (must be in the user pool region)
- IAM role that has the AmazonESCognitoAccess policy attached
You can use the same Cognito pools and role to secure multiple ES domains.
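As a rough sketch, the three resources can be collected into one options dictionary, with a check that both pools are in the same region. The key names mirror the `CognitoOptions` structure of the Amazon ES API as I understand it, the function name is hypothetical, and every ID and ARN below is a placeholder. The region check relies on the usual ID shapes (`<region>_<id>` for user pools, `<region>:<uuid>` for identity pools).

```python
def build_cognito_options(user_pool_id: str, identity_pool_id: str, role_arn: str) -> dict:
    """Build Cognito settings for Kibana, refusing pools from different regions."""
    # User pool IDs start with the region followed by "_",
    # identity pool IDs start with the region followed by ":".
    user_pool_region = user_pool_id.split("_", 1)[0]
    identity_pool_region = identity_pool_id.split(":", 1)[0]
    if user_pool_region != identity_pool_region:
        raise ValueError(
            f"identity pool is in {identity_pool_region}, "
            f"but the user pool is in {user_pool_region}"
        )
    return {
        "Enabled": True,
        "UserPoolId": user_pool_id,
        "IdentityPoolId": identity_pool_id,
        "RoleArn": role_arn,  # role with the AmazonESCognitoAccess policy attached
    }


# Placeholder values, not real resources.
options = build_cognito_options(
    "us-east-1_Example12",
    "us-east-1:12345678-1234-1234-1234-123456789012",
    "arn:aws:iam::123456789012:role/example-es-cognito-role",
)
print(options)
```

Because the same pools and role can secure several domains, the same dictionary could be reused for each domain you configure.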
You can also place a proxy server between Kibana and Amazon ES. AWS recommends configuring the EC2 instance running the proxy server with an Elastic IP address. This way, you can replace the instance when necessary and still attach the same public IP address to it.
Do we need to place the Elasticsearch domain in a VPC? In short, if IP-based access policies are enough to control access to the Elasticsearch endpoint, a VPC is not required; security groups only come into play once the domain sits inside a VPC. Either way, you can still control access to Kibana with Cognito.
In detail, consider the following:
- When you create an ES domain with public access, the endpoint is accessible from any internet-connected device, though you can control access to it.
- AWS recommends using access policies that specify IAM users or roles. On a VPC domain, operating with an open access policy does not mean that anyone on the internet can access the Amazon ES domain. Rather, it means that if a request reaches the Amazon ES domain and the associated security groups permit it, the domain accepts the request without further security checks.
- Placing an Amazon ES domain within a VPC enables secure communication between Amazon ES and other services within the VPC without the need for an internet gateway, NAT device, or VPN connection. Because of their logical isolation, domains that reside within a VPC have an extra layer of security when compared to domains that use public endpoints.
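For a public domain, an IP-based access policy might look roughly like the one generated below. This is a sketch under assumptions: the account ID, domain name, and CIDR range are placeholders, and the helper function is hypothetical. The policy uses the standard IAM `IpAddress` condition on `aws:SourceIp`.

```python
import json
from typing import List


def ip_based_access_policy(domain_arn: str, allowed_cidrs: List[str]) -> str:
    """Return a JSON access policy that only admits the given source IP ranges."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "es:ESHttp*",
                "Resource": f"{domain_arn}/*",
                # Requests from any other address do not match this statement.
                "Condition": {"IpAddress": {"aws:SourceIp": allowed_cidrs}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Placeholder ARN and documentation-only CIDR range.
print(
    ip_based_access_policy(
        "arn:aws:es:us-east-1:123456789012:domain/example-domain",
        ["203.0.113.0/24"],
    )
)
```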
Furthermore, there are two ways to control permissions:
- Resource-based policies: attached to the ES domain itself, they specify which principals can access the domain and which actions (reads, writes, index operations, etc.) they can perform on it.
- Identity-based policies: attached to IAM users or roles, they specify which actions that identity can perform and, if applicable, the resources on which it can perform those actions.
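To make the difference concrete, here is a minimal sketch that builds one policy of each kind for read-only access. The ARNs are placeholders and the two helper functions are hypothetical. The structural difference to look for is that the resource-based policy names a `Principal`, while the identity-based one does not, because it is attached to the user or role directly.

```python
import json
from typing import List

DOMAIN_ARN = "arn:aws:es:us-east-1:123456789012:domain/example-domain"  # placeholder
ROLE_ARN = "arn:aws:iam::123456789012:role/example-reader"  # placeholder


def resource_based_policy(principal_arn: str, domain_arn: str, actions: List[str]) -> dict:
    """Policy attached to the domain: says WHO may do WHAT on it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": actions,
                "Resource": f"{domain_arn}/*",
            }
        ],
    }


def identity_based_policy(domain_arn: str, actions: List[str]) -> dict:
    """Policy attached to a user or role: says WHAT that identity may do, and WHERE."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": actions,
                "Resource": f"{domain_arn}/*",
            }
        ],
    }


read_only = ["es:ESHttpGet", "es:ESHttpHead"]
print(json.dumps(resource_based_policy(ROLE_ARN, DOMAIN_ARN, read_only), indent=2))
print(json.dumps(identity_based_policy(DOMAIN_ARN, read_only), indent=2))
```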
AWS charges you for Amazon Elasticsearch Service instance hours, Amazon EBS storage (if you choose this option), and data transfer. The minimum viable instance pricing options are listed below:
| Pricing option | Instance size | Hourly rate |
| --- | --- | --- |
| Reserved Instance (1 year term) | m5.large.elasticsearch | $0.098 |
| Reserved Instance (3 year term) | m5.large.elasticsearch | $0.074 |
On-demand instances are charged hourly, based on the size of the instance and the storage used by the indexes. This is EBS-only storage.
The smallest instance available is the m5.large.elasticsearch; therefore, this option is only viable for production use.
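To put the hourly reserved rates into perspective, the following sketch converts them into a yearly figure for a single node. It assumes the node is billed for every hour of a 365-day year and ignores any upfront payment, storage, and data transfer.

```python
HOURS_PER_YEAR = 24 * 365

# Hourly m5.large.elasticsearch reserved rates from the table above (USD).
RESERVED_HOURLY_USD = {
    "1 year term": 0.098,
    "3 year term": 0.074,
}


def annual_cost(hourly_rate: float) -> float:
    """Cost of one node billed around the clock for a year, rounded to cents."""
    return round(hourly_rate * HOURS_PER_YEAR, 2)


for term, rate in RESERVED_HOURLY_USD.items():
    print(f"Reserved Instance ({term}): ${annual_cost(rate):,.2f} per node per year")
```

Under these assumptions the 1 year term comes to about $858 per node per year and the 3 year term to about $648.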
UltraWarm lets you economically retain large amounts of data. With UltraWarm storage, you pay an hourly rate for each UltraWarm node, and storage is billed only for what you use. An ultrawarm1.large.elasticsearch instance can address up to 20 TiB of storage on S3, but if you store only 1 TiB of data, you’re only billed for 1 TiB of data.
With Spot Instances, you pay the Spot price that’s in effect for the time period your instances are running. Spot Instance prices are set by Amazon EC2 and adjust gradually based on long-term trends in supply and demand for Spot Instance capacity.
EBS volume pricing
You can use the default instance storage for the r3 or i3 instance types, but for smaller instances you need to select EBS storage for the data nodes:
- General Purpose (SSD) Storage – $0.135 per GB / month
- Magnetic Storage – $0.067 per GB / month
Data transfer charges
AWS documents the data transfer charges in detail. I have listed some interesting points below:
- You pay standard AWS data transfer charges for the data transferred in and out of Amazon Elasticsearch Service.
- Amazon ES does not bill for traffic between Availability Zones.
- Amazon ES neither meters nor bills for shard allocation and rebalancing traffic.
- In other words, you will not be charged for the data transfer between nodes within your Amazon ES domain.
Sizing Amazon ES Domains
It all depends on your needs. You may not need dedicated master nodes at an early stage. In the beginning, you can start with the minimum viable options and scale up later.
For instance, these are the default options for a development and testing deployment:
- Number of nodes: 1
- Instance type: t2.small
- EBS volume type: General Purpose (SSD)
- EBS storage size per node: 10 GiB
- One Availability Zone
For example, if you use an on-demand t2.small.elasticsearch instance, you will be charged $183.96 per annum for running it 14 hours a day, plus $2.025 for storage, so the total could be about $185.98.
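The arithmetic behind that estimate can be reproduced as follows. Two assumptions are baked in: the instance runs 14 hours a day on every day of the year, and the hourly rate of $0.036 is not quoted anywhere above but is what the $183.96 figure implies (183.96 / (14 × 365)). The storage amount is taken as stated.

```python
HOURLY_RATE_USD = 0.036  # assumption: implied by 183.96 / (14 * 365)
HOURS_PER_DAY = 14       # assumption: the instance is switched off for the rest of the day
DAYS_PER_YEAR = 365
STORAGE_USD = 2.025      # storage figure taken from the text as-is

instance_cost = HOURLY_RATE_USD * HOURS_PER_DAY * DAYS_PER_YEAR
total_cost = instance_cost + STORAGE_USD

print(f"Instance hours: ${instance_cost:.2f}")
print(f"Storage:        ${STORAGE_USD:.3f}")
print(f"Total:          ${total_cost:.3f}")
```

Changing `HOURS_PER_DAY` to 24 shows what the same instance would cost if it were left running around the clock.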
The minimum possibly viable deployment for production workloads is:
- No dedicated master instances
- Two-zone replication, with m5.large nodes
- Using one replica for your primaries
- Storage as needed, maximum 512 GB, GP2 EBS volumes for data nodes
Comparing this with a deployment that follows all the best-practice recommendations for ES, Jon Handler said: “This deployment supports the recommended 400 GB of source data and the same hundreds to thousands of requests per second at a much lower, on-demand cost of $350/month. That’s an 81% reduction in cost. If you deploy smaller than the maximum, 512 GB EBS volumes, you proportionally reduce the Amazon EBS cost component of $207 per month”
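The minimum production deployment described above could be expressed as a domain configuration along these lines. Treat it as a sketch: the key names follow the request shape of boto3's `create_elasticsearch_domain` as I understand it, the function itself is hypothetical, and the data node count of 2 (one per zone) is my assumption, since the list does not state a node count.

```python
def minimal_production_config(volume_size_gib: int = 512, data_nodes: int = 2) -> dict:
    """Cluster and storage settings for the minimum viable production setup."""
    if not 0 < volume_size_gib <= 512:
        raise ValueError("volume size must be between 1 and 512 GiB for this setup")
    return {
        "ElasticsearchClusterConfig": {
            "InstanceType": "m5.large.elasticsearch",
            "InstanceCount": data_nodes,      # assumption: one data node per zone
            "DedicatedMasterEnabled": False,  # no dedicated master instances
            "ZoneAwarenessEnabled": True,     # two-zone replication
            "ZoneAwarenessConfig": {"AvailabilityZoneCount": 2},
        },
        "EBSOptions": {
            "EBSEnabled": True,
            "VolumeType": "gp2",
            "VolumeSize": volume_size_gib,
        },
    }


# One replica per primary is an index-level setting, not a domain setting.
INDEX_SETTINGS = {"settings": {"index": {"number_of_replicas": 1}}}

print(minimal_production_config(volume_size_gib=256))
print(INDEX_SETTINGS)
```

Keeping the volume size as a parameter reflects the point in the quote: anything below the 512 GB maximum reduces the EBS part of the bill proportionally.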
However, nothing stops you from starting with the same options as the test environment.
Practices recommended by AWS
To conclude, here are some of the best practices recommended by AWS:
- Configure at least one replica, the Elasticsearch default, for each index.
- Use three dedicated master nodes.
- Deploy the domain across three Availability Zones. This configuration lets Amazon ES distribute replica shards to different Availability Zones than their corresponding primary shards. For a list of Regions that have three Availability Zones and some other considerations, see Configuring a Multi-AZ Domain.
- Upgrade to the latest Elasticsearch versions as they become available on Amazon Elasticsearch Service.
- Update to the latest service software as it becomes available.
- If appropriate for your network configuration, create the domain within a VPC.
- If your domain stores sensitive data, enable encryption of data at rest and node-to-node encryption.
- Index State Management (ISM) lets you define custom management policies to automate routine tasks and apply them to indices and index patterns. You no longer need to set up and manage external processes to run your index operations.
You can find the full list in the AWS documentation.
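As an illustration of the ISM recommendation, a policy that deletes indices once they reach a certain age might be built like this. It is a sketch only: the field names follow the Open Distro ISM policy format as I recall it, the function and state names are made up, and the 30-day threshold is arbitrary.

```python
import json


def delete_after_policy(min_index_age: str = "30d") -> dict:
    """ISM policy with two states: keep the index, then delete it once it is old enough."""
    return {
        "policy": {
            "description": f"Delete indices older than {min_index_age}",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [],
                    "transitions": [
                        {
                            "state_name": "delete",
                            "conditions": {"min_index_age": min_index_age},
                        }
                    ],
                },
                {
                    "name": "delete",
                    "actions": [{"delete": {}}],
                    "transitions": [],
                },
            ],
        }
    }


print(json.dumps(delete_after_policy("30d"), indent=2))
```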
I hope this helps you get started with Amazon Elasticsearch Service. Please feel free to add your thoughts in the comments below.