We will look at how VMware vSphere uses transparent page sharing (TPS), ballooning, memory compression and host swapping.
VMware ESXi, a crucial component of VMware vSphere 5.0, is a hypervisor designed to efficiently manage hardware resources including CPU, memory, storage, and network among multiple, concurrent virtual machines. In this article I will describe the basic memory management concepts in VMware ESXi and the performance impact of these options.
ESXi uses several innovative techniques to reclaim virtual machine memory, which are:
- Transparent page sharing (TPS)—reclaims memory by removing redundant pages with identical content;
- Ballooning—reclaims memory by artificially increasing the memory pressure inside the guest;
- Hypervisor swapping—reclaims memory by having ESXi directly swap out the virtual machine’s memory;
- Memory compression—reclaims memory by compressing the pages that need to be swapped out.
So how does it work?
Transparent Page Sharing (TPS)
Running multiple virtual machines on a single piece of hardware results in identical sets of memory pages. The number of identical pages is influenced by the number of virtual machines and the (lack of) variation in operating systems. The identical memory pages enable VMware to implement memory sharing across virtual machines. Page sharing enables the hypervisor to reclaim redundant page copies and keep only one copy, which is shared by multiple virtual machines in the host physical memory. This results in much lower host memory consumption and a high level of memory overcommitment.
TPS is a default ESXi feature which runs regardless of the amount of physical memory in use. You can only disable it by modifying the ESXi advanced settings, but I would strongly advise you not to do that. TPS can save you up to 70% of memory (in VDI environments with many identical operating systems), space which you can use to increase your consolidation ratio.
TPS is a memory management technique which is transparent to the virtual machine and incurs no performance penalty.
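To make the idea of sharing identical pages concrete, here is a minimal sketch in Python. It is not ESXi's implementation: it assumes pages are compared by hashing their content and then confirming a match with a full byte comparison, and the names `share_pages`, `store` and `mapping` as well as the 4 KB page size are illustrative assumptions.

```python
import hashlib

PAGE_SIZE = 4096  # bytes; assumed page size for this sketch


def share_pages(pages):
    """Deduplicate pages by content.

    Returns (store, mapping): `store` keeps one copy of each distinct page,
    and mapping[i] is the index in `store` that backs input page i.
    """
    store = []     # the single copies that stay in "host memory"
    by_hash = {}   # content hash -> indices into `store` with that hash
    mapping = []
    for page in pages:
        digest = hashlib.sha1(page).digest()
        backing = None
        # A matching hash is only a hint; confirm with a full comparison.
        for idx in by_hash.get(digest, []):
            if store[idx] == page:
                backing = idx
                break
        if backing is None:
            store.append(page)
            backing = len(store) - 1
            by_hash.setdefault(digest, []).append(backing)
        mapping.append(backing)
    return store, mapping


zero = bytes(PAGE_SIZE)
kernel = b"\x90" * PAGE_SIZE
unique = b"\x01" * PAGE_SIZE
store, mapping = share_pages([zero, kernel, zero, kernel, unique, zero])
print(len(store), mapping)
```

Six guest pages end up backed by three stored pages here. A real hypervisor also has to give a virtual machine its own private copy again as soon as it writes to a shared page, which this sketch leaves out.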
Ballooning
Ballooning is a completely different memory management technique compared to TPS. Ballooning is used to reclaim memory from virtual machines in case of a host’s memory shortage.
Due to the virtual machine’s isolation, the guest operating system is not aware that it is running inside a virtual machine and is not aware of the states of other virtual machines on the same host. When the hypervisor runs multiple virtual machines and the total amount of the free host memory becomes low, none of the virtual machines will free guest physical memory because the guest operating system cannot detect the host’s memory shortage. Ballooning makes the guest operating system aware of the low memory status of the host.
VMware ESXi uses the ballooning driver, which is included in the VMware Tools, to enable ballooning. This driver has no external interfaces to the guest operating system and only communicates with the hypervisor through a private channel through which it polls the hypervisor to obtain a target balloon size to reclaim memory. As a result, the hypervisor offloads some of its memory overload to the guest operating system while slightly loading the virtual machine. That is, the hypervisor transfers the memory pressure from the host to the virtual machine. Ballooning induces guest memory pressure. In response, the balloon driver allocates and pins guest physical memory. The guest operating system determines if it needs to page out guest physical memory to satisfy the balloon driver’s allocation requests. If the virtual machine has plenty of free guest physical memory, inflating the balloon will induce no paging and will not impact guest performance.
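The effect of inflating the balloon can be illustrated with a toy model. This is only a sketch under simplifying assumptions: memory is counted in pages, the guest pages out exactly the shortfall, and the class `Guest` and the method `set_balloon_target` are hypothetical names, not part of VMware Tools.

```python
class Guest:
    """Toy guest: memory is counted in pages, nothing is really allocated."""

    def __init__(self, total_pages, in_use_pages):
        self.total = total_pages
        self.in_use = in_use_pages  # pages held by the guest's workloads
        self.balloon = 0            # pages pinned by the balloon driver
        self.paged_out = 0          # pages the guest moved to its own swap

    @property
    def free(self):
        return self.total - self.in_use - self.balloon

    def set_balloon_target(self, target):
        """Inflate or deflate the balloon to `target` pages."""
        target = max(0, min(target, self.total))
        if target > self.balloon:
            needed = target - self.balloon
            # Only if free memory cannot cover the request does the guest
            # have to page out some of its own memory.
            shortfall = max(0, needed - self.free)
            self.in_use -= shortfall
            self.paged_out += shortfall
        self.balloon = target
        return self.balloon


guest = Guest(total_pages=1000, in_use_pages=600)
guest.set_balloon_target(300)   # fits in free memory, no guest paging
print(guest.free, guest.paged_out)
guest.set_balloon_target(500)   # exceeds free memory, guest pages out
print(guest.free, guest.paged_out)
```

The first request is satisfied from the guest's free pages, so nothing is paged out. The second one is larger than what is free, so the guest itself chooses pages to page out, which is where the performance cost of ballooning comes from.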
So if an ESXi host runs into a memory shortage, it asks the virtual machines to free up virtual memory, which in turn results in reclaimed physical memory. That memory can then be used by virtual machines requesting additional memory.
Compression
When memory reclamation/ballooning does not have the desired effect, ESXi uses the next memory management technique in the chain, memory compression. Memory compression moves memory pages to a separate cache which is located in the host’s main memory. ESXi determines if a page can be compressed by checking the compression ratio for the page. Memory compression occurs when the page’s compression ratio is greater than 50%. Otherwise, memory compression has no added value and the page is swapped out. Only pages that would otherwise be swapped out to disk are chosen as candidates for memory compression.
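The compress-or-swap decision can be sketched as follows. This is an illustration, not ESXi code: it uses Python's `zlib` as a stand-in compressor, reads "compression ratio greater than 50%" as "the page shrinks to less than half its size", and the names `reclaim_page`, `compression_cache` and `swap_file` are assumptions.

```python
import os
import zlib

PAGE_SIZE = 4096  # bytes; assumed page size for this sketch


def reclaim_page(page, compression_cache, swap_file):
    """Compress a swap candidate if that saves more than half the page."""
    compressed = zlib.compress(page)
    if len(compressed) * 2 < PAGE_SIZE:
        compression_cache.append(compressed)  # stays in host memory
        return "compressed"
    swap_file.append(page)                    # would go to disk
    return "swapped"


cache, swap = [], []
text_like = (b"ESXi memory " * 400)[:PAGE_SIZE]  # repetitive, compresses well
random_like = os.urandom(PAGE_SIZE)              # random, barely compresses
results = [reclaim_page(p, cache, swap) for p in (text_like, random_like)]
print(results)
```

The repetitive page lands in the compression cache, while the random page does not compress well enough and is swapped instead.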
Memory compression only occurs when there’s a host memory shortage and ballooning has not achieved the desired effect. ESXi will not proactively compress memory pages when host memory is undercommitted.
Memory compression is somewhat comparable to swapping, but instead of moving memory pages to disk, memory pages are moved to a reserved memory location. Because memory access times are much faster than disk access times, memory compression outperforms host swapping.
Memory compression is turned on by default. You can only disable it by modifying the ESXi advanced settings, but I would strongly advise you not to do that.
Swapping
When transparent page sharing, ballooning and memory compression do not have the desired effect, ESXi uses its last resort, hypervisor swapping. Hypervisor swapping moves a guest’s memory pages to a per-virtual-machine swap file (.vswp), which frees host physical memory for other virtual machines.
Both page sharing and ballooning take time to reclaim memory. The page-sharing speed depends on the page scan rate and the sharing opportunity. Ballooning speed relies on the guest operating system’s response time for memory allocation. Hypervisor swapping is a guaranteed technique to reclaim a specific amount of memory within a specific amount of time. However, hypervisor swapping is used as a last resort to reclaim memory from the virtual machine because it has a huge performance impact.
Host free memory states
ESXi maintains four host free memory states: high, soft, hard, and low, which are reflected by four thresholds. The threshold values are calculated based on host memory size. The figure below shows how the host free memory state is reported in ESXTOP. The ‘minfree‘ value represents the threshold for the high state. By default, ESXi enables page sharing since it opportunistically reclaims host memory with little overhead. When to use ballooning or swapping (which activates memory compression) to reclaim host memory is largely determined by the current host free memory state.
In the high state, the aggregate virtual machine guest memory usage is smaller than the host memory size. Whether or not host memory is overcommitted, the hypervisor will not reclaim memory through ballooning or swapping unless the virtual machine memory limit is set.
If host free memory drops towards the soft threshold, the hypervisor starts to reclaim memory using ballooning. Ballooning happens before free memory actually reaches the soft threshold because it takes time for the balloon driver to allocate and pin guest physical memory. Usually, the balloon driver is able to reclaim memory in a timely fashion so that the host free memory stays above the soft threshold.
If ballooning is not sufficient to reclaim memory or the host free memory drops towards the hard threshold, the hypervisor starts to use swapping in addition to using ballooning. During swapping, memory compression is activated as well. With host swapping and memory compression, the hypervisor should be able to quickly reclaim memory and bring the host memory state back to the soft state.
In a rare case where host free memory drops below the low threshold, the hypervisor continues to reclaim memory through swapping and memory compression, and additionally blocks the execution of all virtual machines that consume more memory than their target memory allocations.
In certain scenarios, host memory reclamation happens regardless of the current host free memory state. For example, even if host free memory is in the high state, memory reclamation is still mandatory when a virtual machine’s memory usage exceeds its specified memory limit. If this happens, the hypervisor will employ ballooning and, if necessary, swapping and memory compression to reclaim memory from the virtual machine until the virtual machine’s host memory usage falls back to its specified limit.
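The memory limit rule boils down to a simple calculation. The helper below is a hypothetical illustration (the function name and the use of megabytes are assumptions): it returns how much memory has to be reclaimed from a virtual machine purely because of its limit, independent of the host free memory state.

```python
def limit_reclaim_target_mb(vm_host_usage_mb, vm_limit_mb=None):
    """Memory to reclaim from a VM because of its limit alone.

    `vm_limit_mb=None` models a VM without a configured limit.
    """
    if vm_limit_mb is None:
        return 0
    return max(0, vm_host_usage_mb - vm_limit_mb)


print(limit_reclaim_target_mb(3072, 2048))  # over the limit
print(limit_reclaim_target_mb(1024, 2048))  # under the limit
print(limit_reclaim_target_mb(3072))        # no limit configured
```

A virtual machine using 3072 MB against a 2048 MB limit has 1024 MB reclaimed even on a host in the high state, which is why a forgotten limit can cause ballooning or swapping that looks unexplained.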
If you want an in-depth explanation of ESXTOP and its counters, read this great article from Duncan Epping.
So let’s recap: Transparent page sharing is a default ESXi feature which deduplicates identical memory pages to reclaim physical memory and runs regardless of the amount of physical memory used. When an ESXi host faces a memory shortage, it has a few tricks up its sleeve to cope with the situation. First, ESXi will ask virtual machines to free up virtual memory by using ballooning, which in turn results in reclaimed physical memory. If that does not work, ESXi falls back to memory compression, which compresses memory pages and moves them to a separate cache located in the host’s main memory. When all this does not have the desired effect, ESXi is left with one last resort, hypervisor swapping, which moves guest memory pages to disk.
Best practices
Although ESXi uses several innovative techniques to manage memory usage and reclaim memory, there are still VMware admins who think they know better and start disabling ballooning and compression without knowing why or what the effect is. True, a few years ago there was a best practice which stated that you should disable or uninstall the balloon driver on, e.g., virtualized Citrix servers. But that is history now.
Based on the memory management concepts and performance tests, VMware has the following best practices for host and guest memory usage:
- Do not disable page sharing or the balloon driver. Page sharing is a lightweight technique which opportunistically reclaims redundant host memory with trivial performance impact. In the cases where hosts are heavily overcommitted, using ballooning is generally more efficient and safer than using hypervisor swapping, based on the results presented in “Ballooning vs. Host Swapping” on page 19. These two techniques are enabled by default and should not be disabled unless application testing shows that the benefits of doing so clearly outweigh the costs;
- Carefully specify memory limits and reservations. The virtual machine memory allocation target is subject to the virtual machine’s memory limit and reservation. If these two parameters are misconfigured, users may observe ballooning or swapping even when the host has plenty of free memory. For example, a virtual machine’s memory may be reclaimed when the specified limit is too small or when other virtual machines reserve too much host memory, even though they may only use a small portion of the reserved memory. If a performance-critical virtual machine needs a guaranteed memory allocation, the reservation needs to be specified carefully because it may impact other virtual machines;
- Host memory size should be larger than guest memory usage. For example, it is unwise to run a virtual machine with a 2GB working set size in a host with only 1GB of host memory. If this is the case, the hypervisor has to reclaim the virtual machine’s active memory through ballooning or hypervisor swapping, which will lead to potentially serious virtual machine performance degradation. Although it is difficult to tell whether the host memory is large enough to hold all of the virtual machines’ working sets, the bottom line is that the host memory should not be excessively overcommitted because this state makes the guests continuously page out guest physical memory;
- Use shares to adjust relative priorities when memory is overcommitted. If the host’s memory is overcommitted and the virtual machine’s allocated host memory is too small to achieve a reasonable performance, adjust the virtual machine’s shares to escalate the relative priority of the virtual machine so that the hypervisor will allocate more host memory for that virtual machine;
- Set an appropriate virtual machine memory size. The virtual machine memory size should be slightly larger than the average guest memory usage. The extra memory will accommodate workload spikes in the virtual machine. Note that the guest operating system only recognizes the specified virtual machine memory size. If the virtual machine memory size is too small, guest-level paging is inevitable, even though the host might have plenty of free memory. If the virtual machine memory size is set to a very large value, virtual machine performance will be fine, but more virtual machine memory means that more overhead memory needs to be reserved for the virtual machine.
VMware also released a great ‘VMware vSphere 5 Memory Management and Monitoring diagram‘ which provides a comprehensive look into the ESXi memory management mechanisms and reclamation methods. This diagram also provides the relevant monitoring components in vCenter Server and the troubleshooting tools like ESXTOP.
==================================================================================
Memory reclamation, when and how?
After discussing the performance problem presented by @heiner_hardt with Duncan, we discussed the exact moment the VMkernel decides which reclamation technique it will use and the specific behaviors of the reclamation techniques. This article supplements Duncan's article on Yellow-bricks.com.
Now let's begin with when the kernel decides to reclaim memory and see how the kernel reclaims memory. Host physical memory is reclaimed based on four "free memory states", each with a corresponding threshold. Based on the threshold, the VMkernel chooses which reclamation technique it will use to reclaim memory from virtual machines.
Free Memory state | Threshold | Reclamation technique |
High | 6% | None |
Soft | 4% | Ballooning |
Hard | 2% | Ballooning and Swapping |
Low | 1% | Swapping |
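The table can be read as a small lookup. The sketch below is a hedged illustration, not VMkernel logic: it assumes a state is entered once the percentage of free memory drops to or below that state's threshold and that everything above the Soft threshold counts as High, so the 6% High threshold is not needed for the classification itself. The names `STATES` and `classify` are made up for this example.

```python
# (state, threshold in % free host memory, reclamation techniques),
# ordered from most to least severe; values taken from the table above.
STATES = [
    ("low", 1.0, ("swapping",)),
    ("hard", 2.0, ("ballooning", "swapping")),
    ("soft", 4.0, ("ballooning",)),
]


def classify(free_pct):
    """Return (state, techniques) for a given percentage of free memory."""
    for name, threshold, techniques in STATES:
        if free_pct <= threshold:
            return name, techniques
    return "high", ()


for pct in (10.0, 3.0, 1.5, 0.5):
    print(pct, classify(pct))
```

With these assumptions, 10% free maps to High with no reclamation, 3% to Soft, 1.5% to Hard and 0.5% to Low.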
The high memory state has a threshold of 6%, which means that 6% of the ESX host physical memory minus the service console memory must be free. When the virtual machines use less than 94% of the host physical memory, the VMkernel will not reclaim memory because there is no need to, but when free memory starts to fall towards the threshold the VMkernel will try to balloon memory. The VMkernel selects the virtual machines with the largest amounts of idle memory (detected by the idle memory tax process) and asks each virtual machine to select its idle memory pages. To do this the guest OS may need to swap those pages out, so if the guest is not configured with sufficient swap space, ballooning can become problematic. Linux behaves particularly badly in this situation: when its swap space is full it invokes the OOM (out-of-memory) killer, which starts killing processes.
Back to the VMkernel: in the High and Soft states, ballooning is favored over swapping. If the ESX server cannot reclaim memory by ballooning in time before it reaches the Hard state, ESX turns to swapping. Swapping has proven to be a sure thing within a limited amount of time. Unlike the balloon driver, which tries to understand the needs of the virtual machine and lets the guest decide whether and what to swap, the swap mechanism just brutally picks pages at random from the virtual machine. This impacts the performance of the virtual machine but helps the VMkernel to survive.
Now the fun thing is that before the VMkernel detects that free memory is reaching the soft threshold, it will start to request pages through the balloon driver (vmmemctl). This is because it takes time for the guest OS to respond to the vmmemctl driver with suitable pages. By starting early, the VMkernel tries to avoid reaching the Soft state or worse. So you can sometimes see ballooning occurring before the Soft state is reached (between 6% and 4% free memory).
One exception is the virtual machine memory limit: if a limit is set on the virtual machine, the VMkernel always tries to balloon or swap pages of the virtual machine once it reaches its limit, even if the ESX host has enough free memory available.
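As a worked example of the 6% rule, the helper below computes how much memory must be free for the host to stay in the high state. The function name and the sample sizes (a 32 GB host with a 768 MB service console) are assumptions chosen for illustration only.

```python
def high_threshold_mb(host_mb, service_console_mb, pct=6.0):
    """Free memory required for the high state, in MB."""
    return (host_mb - service_console_mb) * pct / 100.0


# Hypothetical host: 32768 MB physical memory, 768 MB service console.
needed = high_threshold_mb(32768, 768)
print(needed)
```

For this hypothetical host, 6% of the remaining 32000 MB is 1920 MB, so roughly 1.9 GB has to stay free before the VMkernel starts thinking about reclamation.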
=========================================================
A Beginner’s Guide to Memory Reclamation in ESX/ESXi
Today we have a guest post to the blog from Venkatramani Krishnamurthy, a Tech Support Engineer in our India offices. Venkat (for short) gives us an introduction to some of the different ways ESX and ESXi manipulate memory.
There are four different methods by which ESX reclaims virtual machine memory. They are:
- Transparent Page sharing
- Ballooning
- Hypervisor swapping
- Memory compression
Transparent Page Sharing
In situations where many virtual machines with identical memory content are running on an ESX host, the hypervisor will maintain just a single copy while reclaiming the redundant ones. This results in a reduction of the host memory consumption by the virtual machines.
The redundant pages are identified by their content. ESX employs a special hashing algorithm to identify these pages.
Ballooning:
This is an innovative memory reclamation technique where the guest operating system is made aware of the host's low memory status. By default, the guest operating system is not aware that it is running in a virtual machine, nor of the amount of free host memory. The balloon driver is part of the VMware Tools installation. It communicates directly with the hypervisor, which sets the balloon size based on the number of host physical pages it needs to reclaim. The balloon driver then pins pages from the guest free list, and the hypervisor reclaims the host physical pages which are backing them. If the guest operating system is under memory pressure, it will decide which guest pages to page out without any intervention from the hypervisor.
Hypervisor swapping
This is generally used as a last resort to reclaim memory. In this technique, the hypervisor creates a separate swap file for each virtual machine when it is powered on and swaps out guest physical memory thus freeing up host memory.
Memory compression:
Here instead of the pages being swapped out, they are compressed and stored in a cache on the main memory itself. These pages can be accessed again just by a decompression rather than through disk I/O in the case of page swapping which can significantly improve application performance when the host is under memory pressure. If a page cannot be compressed, it will be swapped out.
A word about Host free memory states.
There are four different states, namely high, soft, hard and low. Page sharing is enabled by default. The memory state the host is in determines whether swapping or ballooning is employed.
In the high state no reclamation techniques are employed unless there is a virtual machine memory limit configured. In the soft state, ballooning is employed. In the hard state, both ballooning and swapping are used. During swapping, memory compression is also activated. In the low state, in addition to employing all memory reclamation techniques, execution of all virtual machines that consume more memory than their target memory allocations is blocked.
For further reading on this topic, refer to our guide, which explains more fully how this works: http://www.vmware.com/files/pdf/techpaper/vsp_41_perf_memory_mgmt.pdf