In this article, we sought answers to the questions of what should be the shard and disk size for our index and what are the configuration settings to check in Elasticsearch
Elasticsearch Usage with Apinizer – Part:1
Stage 1: Couchbase & InfluxDb & Elasticsearch, which one is the right decision for us?
Stage 2: What is the right Elasticsearch setting for us?
Elasticsearch Usage with Apinizer – Part:2
Stage 3: Performance Tests
Elasticsearch Usage with Apinizer – Part:3 (this article)
Stage 4: Shard and Disk Size
Stage 5: Crucial Checks
The structure of our document is the same as the stage3Condition1 index in Stage3 from previous post. The size of the index where 15 million documents will be kept per day according to customer needs is ~33.1GB. First, let’s look at the operations where the size and number of the shard is effective;
Based on the rules and tips in this link, it is said that the use of shards of 20GB-40GB size for time-based data is common. The other suggestion is to keep the number of shards in the node proportional to the amount of heap. This means that when shards below 20 GB are set for each GB, 600 shards can be set for a node with a 30 GB heap. Based on this solution, we calculated that 1 index with 1 primary shard, which holds 15 million daily data, occupies 33 GB of space. Annually 365 shards are formed and it takes 12TB (365*33GB) space on the disk. 365 shards to be formed annually complies with this less-than-600-shards rule. This gives us an elasticsearch node with 64GB system memory per year, as 1 unit of 30 GB node memory. So we decided that 1 elasticsearch node with an annual 64GB system memory and 30GB heap meets our needs. 👍
It is a valid setting for Linux and macOS operating systems. Elasticsearch uses a large number of File descriptors. The recommended value is 65,536. This is how we set it up.
# As root before starting Elasticsearch
ulimit -n 65536
# or you can use the command below to take effects permanently:
sudo vim /etc/security/limits.conf -nofile 65536
# Max file descriptor control command:
GET _nodes/stats/process?filter_path=**.max_file_descriptors
Swapping is a technique where the operating system manages memory and processor. The operating system can also enable swap memory so that the process can continue when the memory is full. This causes the memory to be divided into small pieces called pages by the VM (virtual memory). The page or swap file is stored on the hard disk, and the address of the page is kept in memory. In this way, the page allows the application to store excess data that cannot be stored in memory because the size of the memory is limited.
When Elasticsearch’s JVM makes a major garbage collection, it looks at every page in the heap. If one of the pages is taken from the disk, it must be taken back to the memory. This causes the disk to work harder. There are many ways to turn off swapping. One of them is set via Elasticsearch. If the memory lock is enabled, it should be checked whether the JVM has applied successfully.
# temporarily turning swapping off
sudo swapoff -a
# /etc/fstab file is created to disable swapping permanently.
# and the comment line starting with swap should be deleted.
# While doing this, make sure that the server has enough RAM to run the programs.
# /swapfile none swap sw 0 0
# The following feature is added to the /config/elasticsearch.yml file.
# Its default value is false.
bootstrap_memory_lock:true
# To check the memory lock;
GET _nodes?filter_path=**.mlockall
The index is stored according to the mmapfs (memory mapping file system) file system. It is the file system used for incoming storage by default in mmapfs. This value must be set to avoid an Out of Memory error. This is how we set it up. 😇
# Run the following command as root to increase this limit on Linux.
sysctl -w vm.max_map_count=262144
# Or 'vm.max_map_count' property can be updated from '/etc/sysctl.conf' file.
Based on this link, a good rule of thumb is to have at least 50% of RAM be used by the operating system for caching. Linux is more susceptible to caching file descriptors heavily. Elasticsearch also uses the disk a lot, especially during indexing.
# Memory usage can be controlled from the command line
free -m
If the heap value is below normal, this causes memory errors and the like. If it is given too much, it may cause the garbage collector to work too much, as it will be used for caching. In this case, it degrades the performance of the cluster by putting pressure on the garbage collector heap. A safe margin value can be left to deal with these scenarios.
It is recommended that half of the memory size in the current system should be given to the heap size of the Elasticsearch JVM and this value should not exceed 32 GB.
# Minimum and maximum size assigned to heap via /config/jvm.options file: 30GB
-Xms30g (Heap's starting size)
-Xmx30g (Heap's maximum size)
The CPU core number detected by Elasticsearch may be incorrect. To check this, the following request should be made.
# Request made on Kibana to control processor number:
GET _nodes/os
The values of “available_processors” and “allocated_processors” are checked. The node with 4 CPU cores is seen as 1 on the cluster. To fix this, the following field should be added to the elasticsearch.yml file and the node should be restarted.
node.processors: 4
Assume that the cluster has 5 nodes, the number of CPU cores of each node is 6 and the number of processors detected on Elasticsearch is 1. After the Processors value is set to 6, the cluster will work with 30 CPU cores instead of 5. Setting the number of processors right gives a huge performance boost.
It is another control that has the potential to cause performance issues as the data volume grows and is important to manage. When the heap memory is full, GC-related settings are made to clean it at certain intervals.
In the research conducted here, it was seen that after the upgrade, the CPU usage and heap usage of machines with 48 CPU cores increased to 80%-90%. As a result of the evaluations, the ratio between the young and the old pool should be 1:2 (33% young/66% old) or 1:3 in general, but this ratio was found to be 1:14 in their systems. Below are the settings made with GC to overcome this situation. We have based this configuration as this event happened when used with the previous version of Elasticsearch.
# This setting is made via /config/jvm.options file.
# In general, if these values are not explicitly specified,
# the JVM creates them dynamically when Elasticsearch is started.
# There may be errors when it is dynamically created.
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75 (GC will run when the heap reaches 75% occupancy)
-XX:+UseCMSInitiatingOccupancyOnly
-XX:ParallelGCThreads=8 (Number of processors)
-XX:NewRatio=2 (young pool and old pool proportions are made. This value indicates a 1:2 ratio.)
# or this command can be executed:
java -Xms30G -Xmx30G -XX:ParallelGCThreads=48 -XX:NewRatio=2 -XX:+UseConcMarkSweepGC
-XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version | egrep
-i "(NewSize | OldSize | NewRatio | ParallelGCThreads)"
uintx NewRatio := 2 {product}
uintx NewSize := 11095310336 {product}
uintx OldSize := 22190686208 {product}
uintx ParallelGCThreads := 48
Input your search keywords and press Enter.
Necessary cookies are absolutely essential for the website to function properly. These cookies ensure basic functionalities and security features of the website, anonymously.
Cookie | Duration | Description |
---|---|---|
cookielawinfo-checkbox-analytics | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics". |
cookielawinfo-checkbox-functional | 11 months | The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional". |
cookielawinfo-checkbox-necessary | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary". |
cookielawinfo-checkbox-others | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other. |
cookielawinfo-checkbox-performance | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance". |
viewed_cookie_policy | 11 months | The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data. |
Functional cookies help to perform certain functionalities like sharing the content of the website on social media platforms, collect feedbacks, and other third-party features.
Performance cookies are used to understand and analyze the key performance indexes of the website which helps in delivering a better user experience for the visitors.
Analytical cookies are used to understand how visitors interact with the website. These cookies help provide information on metrics the number of visitors, bounce rate, traffic source, etc.
Advertisement cookies are used to provide visitors with relevant ads and marketing campaigns. These cookies track visitors across websites and collect information to provide customized ads.
Other uncategorized cookies are those that are being analyzed and have not been classified into a category as yet.