Key Metrics for Monitoring Memcached Performance

What is Memcached?

Memcached belongs to the class of software known as in-memory data stores. It predominantly acts as a caching layer sitting in front of a primary database and storing frequently accessed data in memory. This approach allows applications to get data directly from the cache, avoiding the need for time-consuming database queries that would require disk I/O.

Memcached has a distributed architecture in which the memory of multiple servers is pooled into one logical cache. The servers do not communicate with each other; instead, the client library hashes each key to decide which node stores it. This approach enables Memcached to handle high traffic volumes and cater to growing data demands by adding more nodes to the cluster.
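In typical deployments, the client library decides which node holds a given key. The sketch below illustrates the idea with simple modulo hashing. The server names and the pick_server helper are hypothetical, and production clients often use consistent hashing instead so that adding a node remaps fewer keys.

```python
import hashlib

def pick_server(key, servers):
    """Map a key to one server by hashing the key (simple modulo scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 4 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]

# Hypothetical node list, for illustration only.
servers = ["cache-a:11211", "cache-b:11211", "cache-c:11211"]
chosen = pick_server("user:42", servers)
```

Because the mapping depends only on the key and the server list, every client configured with the same list sends a given key to the same node.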

As a key-value store that doesn't understand data structures, Memcached expects all uploaded data to be serialized and stores it as key-value pairs where each key uniquely identifies the corresponding value. This simple storage mechanism allows Memcached to achieve O(1) time complexity for its core commands. On the most performant servers, it can deliver a throughput of millions of keys per second.
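To make the serialization requirement concrete, here is a minimal sketch that turns a Python value into bytes and wraps it in a text-protocol set command. JSON is an assumption chosen for readability (clients may use any serialization format), and build_set_command is a hypothetical helper, not part of any client library.

```python
import json

def build_set_command(key, value, ttl_seconds=0):
    """Serialize a value to JSON bytes and frame it as a text-protocol 'set'."""
    payload = json.dumps(value, separators=(",", ":")).encode("utf-8")
    # Format: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
    header = f"set {key} 0 {ttl_seconds} {len(payload)}\r\n".encode("ascii")
    return header + payload + b"\r\n"

command = build_set_command("user:1", {"name": "Ada"}, ttl_seconds=60)
```

The server stores the payload as opaque bytes, so the reading side must apply the matching deserialization.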

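Memcached uses an LRU (Least Recently Used) eviction policy to manage memory. It speaks a simple text-based protocol (many versions also offer a binary protocol) over TCP and, optionally, UDP, and client libraries exist for most programming languages, which makes it easy to integrate Memcached with a wide range of applications.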

It can be used for a variety of use cases:

Caching frequently accessed static web pages
Caching the result sets of the most frequent database calls
Caching the most requested API responses
Storing session data in-memory, enabling faster logins
Storing real-time analytics data, powering fast BI applications

Memcached vs. Redis, Aerospike, and Elasticsearch

While Redis, Aerospike, and Elasticsearch are compelling alternatives, Memcached provides distinct advantages that make it an ideal choice for specific use cases:

Simplicity: Memcached is known for its simplicity and lightweight architecture. It is easy to set up and integrate with applications. It also has a smaller memory footprint and minimal overhead compared to Redis, Elasticsearch, and Aerospike.
Built for caching: Memcached is a purpose-built caching system. Redis, Elasticsearch, and Aerospike offer additional features that may introduce needless complexity when caching is the primary use case.
Simplified data model: Memcached uses a key-value model that makes data interaction easy. While Redis and Aerospike support multiple data structures for versatility, they may not be necessary for straightforward caching scenarios.

Why is it important to monitor Memcached?

Regular monitoring of a Memcached cluster is essential to ensure bottleneck-free operations. Here are other key reasons why you should actively monitor Memcached:

Because it’s a shared memory system

Memcached acts as a shared cache: many application processes and clients read from and write to the same memory pool at the same time. This can make it harder to track down performance issues, as a problem may be caused by any one of those clients or by several of them together. However, by regularly monitoring key metrics, you can detect issues early on, before they escalate.

For example, if you notice that the cache eviction rate suddenly spikes and coincides with an increase in overall memory usage, you can assume that the cache is running out of memory. This insight can help you to take corrective action in a timely manner, avoiding service disruption.
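The eviction scenario can be sketched as a comparison of two samples of the stats output. The evictions, bytes, and limit_maxbytes names are standard stats fields; the thresholds and the helper names are hypothetical and should be tuned to your workload.

```python
def evictions_per_second(previous, current, interval_seconds):
    """Eviction rate between two stats samples taken interval_seconds apart."""
    delta = int(current["evictions"]) - int(previous["evictions"])
    return delta / interval_seconds

def looks_memory_starved(previous, current, interval_seconds,
                         max_eviction_rate=10.0, min_fill_ratio=0.9):
    """True when evictions are frequent AND the cache is nearly full."""
    fill_ratio = int(current["bytes"]) / int(current["limit_maxbytes"])
    rate = evictions_per_second(previous, current, interval_seconds)
    return rate > max_eviction_rate and fill_ratio >= min_fill_ratio

before = {"evictions": "100"}
after = {"evictions": "1300", "bytes": "950", "limit_maxbytes": "1000"}
starved = looks_memory_starved(before, after, interval_seconds=60)
```

Requiring both conditions avoids alerting on a brief burst of evictions while the cache still has plenty of free memory.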

To ensure high performance

Monitoring Memcached allows you to track key performance metrics, including cache hit ratio, cache miss ratio, and response times. By analyzing trends in these metrics, you can gauge whether a cluster is performing at full potential or requires optimization.

For example, if you notice a high cache miss ratio after every deployment, you can surmise that you need a better cache warming strategy at application startup.

To keep tabs on data integrity

As data stored in Memcached resides solely in memory, it is vital to monitor its integrity. Regularly inspecting cache consistency and validating data accuracy ensures that clients receive reliable and fresh information from the cache.

For example, by analyzing older data records, you may detect a key-value pair that has long been invalidated. After investigating the root cause of the staleness, you can manually update the data to ensure that it is accurate and up to date.

For high availability

Every system, Memcached included, requires monitoring for high availability. Memcached keeps data only in memory, so tracking memory and resource utilization is essential to ensure that the cluster stays up and running.

For example, a monitoring tool can alert administrators if a node is approaching peak memory utilization. This allows them to investigate the issue and respond accordingly, by, for example, increasing the memory resources of the node.

To optimize configurations

Memcached has several configuration options, and it performs best when it’s tuned according to business needs and the underlying hardware. Proactive monitoring provides insights into the resource usage of the cluster, enabling you to optimize configurations based on the actual workload.

For example, you may tweak the cache size, eviction policy, or connection limits for better memory utilization and improved throughput.

To facilitate faster issue resolution

Quick contextualization of an issue is crucial for its timely resolution. Regular monitoring and alerting enable you to contextualize and diagnose issues as they arise. This helps in decreasing the mean time to resolution (MTTR) and ensuring business continuity.

For example, if you notice a few unexpected connection timeouts or failures, you can analyze logs to establish context and apply a timely fix.

Key metrics to monitor Memcached performance

For holistic monitoring of a Memcached cluster in production, focus on the following key metric categories:

Cache performance metrics

Metrics related to the actual cache performance should be at the top of the list of metrics to monitor. These metrics can help you assess how well Memcached performs as the caching layer in your deployment.

Metric	Description
Cache hit rate	The percentage of requests served from the cache.
Cache miss rate	The percentage of requests for which data wasn’t found in the cache.
Cache eviction rate	The frequency at which entries are being evicted from the cache.
Cache fill ratio	The percentage of cache space used to store data.
Total entries	The total number of key-value pairs stored in the cache.
Average time to live	The average time to live value of all the records in the cache.
Cache turnover rate	The rate at which cache entries are being replaced.
New items	The number of new items added to the cache within a specific time.
Reclaimed items	The total number of times memory held by an expired item was reused to store a new entry.
Evicted and unfetched items	The total number of valid data items that were evicted from the cache and were never fetched by any client. High values of this metric should be investigated. (The definition of high differs based on operational requirements and SLAs.)
Expired and unfetched items	The total number of expired data items that were reclaimed, and were never fetched by any client. High values of this metric should be investigated. (The definition of high differs based on operational requirements and SLAs.)
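Several of the cache performance metrics above are derived from raw counters rather than reported directly. As a minimal sketch, assuming the stats output has already been parsed into a mapping of field names to values, the hit rate, miss rate, and the share of evictions that were never fetched could be computed as follows. The helper names are hypothetical; get_hits, get_misses, evictions, and evicted_unfetched are standard stats fields.

```python
def hit_and_miss_rates(stats):
    """Return (hit rate %, miss rate %) from the get_hits/get_misses counters."""
    hits = int(stats["get_hits"])
    misses = int(stats["get_misses"])
    total = hits + misses
    if total == 0:
        return (0.0, 0.0)  # no get traffic yet
    return (100.0 * hits / total, 100.0 * misses / total)

def unfetched_eviction_share(stats):
    """Fraction of evicted items that no client ever read."""
    evictions = int(stats["evictions"])
    if evictions == 0:
        return 0.0
    return int(stats["evicted_unfetched"]) / evictions

sample = {"get_hits": "90", "get_misses": "10",
          "evictions": "40", "evicted_unfetched": "10"}
rates = hit_and_miss_rates(sample)
```

These counters are cumulative since startup, so the results describe the lifetime of the process; compare two samples if you need a rate for a recent window.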

Request metrics

Request metrics offer insights into the command processing layer of Memcached. Let’s look at a few examples:

Metric	Description
Timeouts	The number of times a request to Memcached timed out. A high number of timeouts may be a sign that Memcached is overloaded.
Request rate	The overall rate at which requests are being made to Memcached.
Response time	The average time taken by Memcached to respond to requests.
Request throughput	The number of requests processed by Memcached per unit of time. This metric is a great way to gauge an instance’s instantaneous health.
Request latency	The time taken for individual requests to be processed by Memcached.
Miss latency	The average time taken to access an item that was not found in Memcached. A high miss latency can indicate that Memcached is not caching enough data.
Check and Set (CAS) requests	The total number of CAS requests received by Memcached.
Check and Set (CAS) bad requests	The total number of CAS requests received by Memcached in which the compared value didn’t match the currently cached value.
Check and Set (CAS) hit requests	The total number of CAS requests received by Memcached in which the compared value matched the currently cached value.
Check and Set (CAS) miss requests	The total number of CAS requests received by Memcached in which the requested key wasn’t found.
Get commands	The total number of get commands received by the cache.
Flush commands	The total number of flush commands received by the cache.
Set commands	The total number of set commands received by the cache.
Decrement hits	The total number of decrement requests received by Memcached in which the requested key was found.
Decrement misses	The total number of decrement requests received by Memcached in which the requested key wasn’t found.
Get hits	The total number of get commands received by Memcached in which the requested key was found.
Get misses	The total number of get commands received by Memcached in which the requested key wasn’t found.
Delete hits	The total number of delete commands received by Memcached in which the requested key was found.
Delete misses	The total number of delete commands received by Memcached in which the requested key wasn’t found.
Increment hits	The total number of increment requests received by Memcached in which the requested key was found.
Increment misses	The total number of increment requests received by Memcached in which the requested key wasn’t found.
Replace commands	The total number of replace commands received by the cache.
Append commands	The total number of append commands received by the cache.
Prepend commands	The total number of prepend commands received by the cache.
Gets commands	The total number of gets commands received by the cache.
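Most of the command counters in this table only ever increase, so they become useful once converted into per-second rates. The sketch below does this for any set of counters, using the uptime field to measure elapsed time and to detect a restart between samples. counter_rates is a hypothetical helper name.

```python
def counter_rates(previous, current, names):
    """Per-second rates for cumulative counters between two stats samples.

    Returns None if the server restarted (uptime went backwards) or no
    time elapsed, because the deltas would be meaningless.
    """
    elapsed = int(current["uptime"]) - int(previous["uptime"])
    if elapsed <= 0:
        return None
    return {
        name: (int(current[name]) - int(previous[name])) / elapsed
        for name in names
    }

first = {"uptime": "100", "cmd_get": "1000", "cmd_set": "200"}
second = {"uptime": "160", "cmd_get": "4000", "cmd_set": "500"}
rates = counter_rates(first, second, ["cmd_get", "cmd_set"])
```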

Memory metrics

Tracking memory metrics is another crucial aspect of monitoring Memcached. Focus on the following important memory metrics:

Metric	Description
Memory usage	The amount of memory currently utilized by Memcached. Ensure that this metric’s value doesn’t approach the max memory threshold.
Available memory	The remaining available memory that Memcached can use.
Fragmentation	The extent of memory fragmentation within the cache. Take steps to minimize fragmentation.
Cache item size	The average size of a key-value pair stored in the cache.
Memory utilization ratio	The percentage of used memory compared to the total memory.
Cache memory allocation	The total amount of memory allocated for caching data in Memcached.
Total bytes read	The total number of bytes that the cache has read from the network.
Total bytes used for caching	The total number of memory bytes used for caching data.
Total bytes written out	The total number of bytes that the cache has written to the network.
Total bytes used for hashing	The total number of memory bytes used for storing data in hash tables.
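A few of these memory metrics can be derived from three standard stats fields: bytes (memory used to store items), limit_maxbytes (the configured cache size), and curr_items. The following is a minimal sketch with a hypothetical memory_summary helper.

```python
def memory_summary(stats):
    """Derive available memory, utilization, and average item size."""
    used = int(stats["bytes"])
    limit = int(stats["limit_maxbytes"])
    items = int(stats["curr_items"])
    return {
        "available_bytes": limit - used,
        "utilization_percent": 100.0 * used / limit if limit else 0.0,
        "average_item_bytes": used / items if items else 0.0,
    }

summary = memory_summary(
    {"bytes": "256", "limit_maxbytes": "1024", "curr_items": "4"}
)
```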

Network metrics

Network metrics enable you to identify and debug network bottlenecks, latency, and degradations. Focus on the following metrics:

Metric	Description
Current connections	The number of client connections currently open to the cache.
Stale connections	The number of stale connections. Strive to keep this value to a minimum (ideally zero).
Total connections	The total number of connections opened to the cache since startup.
Failed connections	The total number of failed connection attempts.
Total connection limit reached	The number of times Memcached reached its max connection limit. Non-zero values of this metric should be investigated immediately.
Accepting connections	A Boolean value to indicate whether the cache is currently accepting connections or not. A false value for this metric warrants immediate investigation.
Network latency	The average time taken for data to travel between clients and Memcached. Strive to reduce this metric’s value as much as possible.
Network errors	The total number of network-related errors that the cache has encountered since startup.
Network throughput	The rate at which data is transferred over the network to/from Memcached.
Packet loss	The amount of packet loss that has occurred. A high packet loss indicates a network problem.
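The connection-related metrics can be checked together. This sketch, built around a hypothetical connection_headroom helper, reads four standard stats fields (max_connections, curr_connections, listen_disabled_num, accepting_conns) from an already parsed stats mapping.

```python
def connection_headroom(stats):
    """Summarize how close the server is to its connection limit."""
    max_conns = int(stats["max_connections"])
    current = int(stats["curr_connections"])
    return {
        "free_connections": max_conns - current,
        "utilization": current / max_conns if max_conns else 0.0,
        # Number of times the connection limit was reached.
        "limit_hits": int(stats["listen_disabled_num"]),
        # accepting_conns is reported as 1 or 0.
        "accepting": int(stats["accepting_conns"]) == 1,
    }

headroom = connection_headroom({
    "max_connections": "1024",
    "curr_connections": "256",
    "listen_disabled_num": "0",
    "accepting_conns": "1",
})
```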

Server health metrics

Keeping an eye on server health metrics allows you to optimize resource utilization and cache efficiency. The following server health metrics are the most important:

Metric	Description
CPU utilization	The current CPU utilization of the Memcached instance.
Max CPU utilization	The maximum CPU utilization of the cache since startup.
Uptime	The amount of time the instance has been up.
Server load	The average load on the Memcached server over a given time. Track this metric over time to study usage patterns and perform adequate capacity planning.
Node failure rate	The frequency of node failures within the cluster. A non-zero value of this metric should be investigated immediately.
Active threads	The total number of currently active worker threads.
Total threads	The total number of threads spawned by the Memcached instance since startup.
Waiting threads	The total number of worker threads spawned by the Memcached instance that are currently in the “Waiting” state.

Monitoring Memcached using the stats command

Memcached offers a built-in command, stats, which can be used to track its performance in real time. The stats command displays the following key statistics:

Current connections, accepting connections, total connections, flush command counter, cache sizes, evicted elements, total bytes read, CAS hits, CAS misses, increment hits, increment misses, authentication commands, evicted non-zero elements, total pages, and others.

The stats command also offers the following subcommands:

stats items: Displays information about the data items stored in the cache per different slabs.
stats slabs: Displays more slab-focused statistics.
stats sizes: Displays a hypothetical distribution of items in the cache if the slabs were divided into 32-byte buckets instead of the configured number of slabs. This command helps in monitoring memory usage and identifying potential memory inefficiencies.
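The output of stats items uses names of the form items:<slab id>:<field>, which is easier to analyze once grouped by slab. As a minimal sketch, assuming the raw response text has already been read from the server, a hypothetical group_items_by_slab helper could look like this:

```python
def group_items_by_slab(raw):
    """Group 'STAT items:<slab>:<field> <value>' lines by slab id."""
    slabs = {}
    for line in raw.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[0] != "STAT":
            continue  # skips the END terminator and blank lines
        fields = parts[1].split(":")
        if len(fields) != 3 or fields[0] != "items":
            continue
        slab_id, name = int(fields[1]), fields[2]
        slabs.setdefault(slab_id, {})[name] = int(parts[2])
    return slabs

# Sample response, shortened for illustration.
raw_items = (
    "STAT items:1:number 5\r\n"
    "STAT items:1:evicted 2\r\n"
    "STAT items:3:number 7\r\n"
    "END\r\n"
)
by_slab = group_items_by_slab(raw_items)
```

Grouping by slab makes it easy to spot a single slab class that accounts for most evictions.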

You can combine different statistics from the output of the stats command to gather even more key insights. For example, you can calculate the global hit rate as get hits / (get hits + get misses). Or you can find the number of free connections using: max connections - current connections. (Total connections is a cumulative count of connections opened since startup, so it can't be used for this calculation.)
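To combine statistics programmatically, you first need the stats output in a structured form. The sketch below sends the stats command over a plain TCP socket and parses the STAT lines that precede the END terminator. The host and timeout are assumptions (11211 is the default Memcached port), and fetch_stats and parse_stats are hypothetical helper names.

```python
import socket

def parse_stats(raw):
    """Turn 'STAT <name> <value>' lines into a dict of strings."""
    stats = {}
    for line in raw.splitlines():
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats

def fetch_stats(host="127.0.0.1", port=11211, timeout=2.0):
    """Send 'stats' to a Memcached server and return the parsed result."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"stats\r\n")
        buffer = b""
        # The response ends with a line containing only END.
        while not buffer.endswith(b"END\r\n"):
            chunk = conn.recv(4096)
            if not chunk:
                break
            buffer += chunk
    return parse_stats(buffer.decode("ascii", errors="replace"))

example = parse_stats("STAT pid 123\r\nSTAT get_hits 90\r\nEND\r\n")
```

All values are returned as strings, so convert them with int() or float() before doing arithmetic.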

Monitoring Memcached using the Site24x7 plugin

The Site24x7 Memcached plugin allows you to monitor all aspects of your Memcached cluster from a single pane of glass. You can track several key metrics in real time, including hit ratio, miss ratio, bytes read, bytes written, current connections, number of threads, evictions, latency, and throughput.

The free Python-based plugin can be downloaded directly from GitHub. It seamlessly integrates with the Site24x7 Linux agent, allowing you to view real-time performance metrics on the Site24x7 web client.

Conclusion

Memcached's simplicity and efficiency as an in-memory key-value store make it a great choice for improving the performance of dynamic applications. While it may not offer the extensive feature set of Redis or the interactivity of Aerospike, it excels at what it was designed for: lightning-fast caching.

This article aimed to provide you with all the information you need to monitor key Memcached metrics using native tools. By proactively tracking these metrics, you can ensure the seamless operation of your Memcached cluster and deliver optimal performance to your users.
