A customer complained that, with almost nothing running, 20G of RAM was in use and unaccounted for in a Linux VM on VMware ESXi. My first reaction was that it must be cache and buffers, and I wanted to send the customer straight to "Linux ate my ram".
The customer was spot on. Look at the free output:
# free -m
             total       used       free     shared    buffers     cached
Mem:         32233      22257       9975          0        340        412
-/+ buffers/cache:      21504      10728
Swap:         2047         99       1948
I could not find anything using that much RAM in ps or top. My next suspicion was confirmed: the balloon driver had been inflated. Ballooning is a technique by which the VMkernel reclaims memory from guest OSes when host memory is scarce.
# vmware-toolbox-cmd stat balloon
20951 MB
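To see how much of the "missing" memory the balloon explains, the numbers from the two commands above can be subtracted by hand. The following is only a rough sketch: the function name accounted_memory is made up for illustration, and it assumes that ballooned pages are counted as "used" inside the guest without being attributed to any process.

```python
def accounted_memory(used_mb, buffers_mb, cached_mb, balloon_mb):
    """Return (used minus buffers/cache, what is left after the balloon), in MB.

    Hypothetical helper: assumes balloon pages show up as 'used' in free.
    """
    app_used = used_mb - buffers_mb - cached_mb
    remainder = app_used - balloon_mb
    return app_used, remainder


# Values copied from the free -m and vmware-toolbox-cmd output above.
app_used, remainder = accounted_memory(22257, 340, 412, 20951)
print(f"used minus buffers/cache: {app_used} MB")
print(f"left after subtracting the balloon: {remainder} MB")
```

With these inputs the first figure is 21505 MB (free shows 21504, most likely because it rounds from kB), and only 554 MB remains once the balloon is subtracted, which fits a VM with almost nothing running.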
Why was memory scarce?
DRS, which load balances resources across hosts in a cluster, was set to manual mode. In manual mode, it only provides recommendations, which take effect only if the admin applies them.
One host was clearly overburdened on RAM.
I turned DRS back to Fully Automated and manually migrated the impacted VMs to other hosts to deflate the balloon driver.
DRS had several recommendations, which I applied.
With DRS back in Fully Automated mode, the memory load is now better distributed across the hosts.