Memcached vs Redis?

Ah, the great question of our time.

Question: Which is better, memcached or redis?

Answer:  It depends on the nature of the project and its data.

At a basic level, they are both in-memory key/value stores designed for high speed and high availability.  They both cluster.  They both add value to a large project by serving objects from memory without a round trip to the data store.  They have comparable data access speeds.

Ultimately, though, past a single instance, they have different deployment and scaling architectures.  Memcached is a flat array of additive instances, sharded by a hash value computed in the client.  Redis is a classic master-slave architecture which scales out to slaves of slaves.   A picture makes their two core philosophies clear:

Memcached has a flat architecture while Redis uses Master-Slave and Slave of Slaves.
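The flat memcached architecture works because the routing lives in the client: each client hashes the key and picks a server from the pool.  A sketch of the idea in Python -- the server addresses are made up, and real clients such as libmemcached usually use consistent hashing rather than plain modulo so that adding a node doesn't remap every key:

```python
import hashlib

def pick_server(key: str, servers: list) -> str:
    """Map a key to one server in a flat memcached pool.

    An MD5 digest is stable across processes and languages, unlike
    Python's built-in hash(), so every client agrees on where a key lives.
    """
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]

pool = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]
print(pick_server("user:42:profile", pool))
```

This is also why the cons below mention reconfiguring the client when the pool changes: with plain modulo, resizing the pool changes where almost every key hashes to.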

Making a technical and architectural decision is difficult, especially for a component as critical as caching.  Here are some of the pros and cons of each system:

Memcached Pros:

  • Low complexity
  • Simple to configure
  • Few commands == simple to master
  • Atomic increment and decrement
  • Simple to cluster -- uses a hashing algorithm at the client to find keys in a cluster
  • Runs like a rock -- memcached requires a nuclear strike to fall over
  • Can withstand a member dying
  • Many years in production
  • Every programming language has a memcached library.  

Memcached Cons:

  • Doesn't do anything besides be an in-memory key/value store
  • Caches sharded by client do not scale across AWS zones
  • Unbalanced memcached clusters require a full system restart
  • Adding a member to the pool requires reconfiguring and rebooting the client
  • Seriously doesn't do anything besides be an in-memory key/value store

Redis Pros:

  • Stores data in a variety of formats: strings, lists, hashes, sets and sorted sets
  • Pipelining!  Multiple commands at once
  • Blocking reads -- will sit and wait until another process writes data to the cache
  • Mass insertion of data to prime a cache
  • Does pub/sub... if you want to do pub/sub through your cache
  • Partitions data across multiple redis instances
  • Can back data to disk
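That blocking-read behavior (Redis's BLPOP command) is what lets a consumer sit idle until a producer pushes work.  The semantics are those of a thread-safe queue, so here's a sketch using Python's queue module as a stand-in for a Redis list -- no server required:

```python
import queue
import threading

tasks = queue.Queue()  # stands in for a Redis list read with BLPOP

def consumer(results):
    # Blocks until another thread writes, like BLPOP on an empty list.
    results.append(tasks.get(timeout=5))

results = []
t = threading.Thread(target=consumer, args=(results,))
t.start()
tasks.put("job-1")  # analogous to LPUSH from a producer process
t.join()
print(results)
```

Memcached has no equivalent -- a memcached client polling for a key has to busy-wait.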

Redis Cons:

  • Super complex to configure -- requires consideration of data size to configure well
  • SENTINEL, the automated failover mechanism which promotes a slave to master, is perpetually on the redis unstable branch
  • Master-slave architecture means if the master wipes out, and SENTINEL doesn't work, the system is sad
  • Lots o' server administration for monitoring and partitioning and balancing... 

From a completely neutral position it's a push.  Unless the plan is to do crazy things with the caching system, like spreading it across multiple availability zones or using it as a publish/subscribe system instead of ZeroMQ or RabbitMQ or Zookeeper, it comes down to religious arguments and the importance of maintaining that Redis master.  

For Project Butterfly, I will choose memcached because it's simple and it meets the needs of the project.  Even with features like rate limiting or keeping in-memory counts of used convention passes in real time, memcached works fine.  The project won't use any of Redis's advanced features.  Redis does neat things, but this project doesn't hit any of them.

My advice: if you need to cache database queries or REST round trips or numbers or page fragments, use memcached.  Most cached data types and caching needs are simple and straightforward.  This is 80% of all caching needs.  
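That 80% case is the classic cache-aside pattern: check the cache, fall through to the database on a miss, and prime the cache with a TTL.  A minimal sketch, with a plain dict standing in for a memcached client and a hypothetical `load_user_from_db` playing the slow database:

```python
import time

cache = {}  # stands in for a memcached client: key -> (expires_at, value)
TTL = 300   # seconds to keep an entry before falling back to the database

def load_user_from_db(user_id):
    # Hypothetical slow database query.
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    key = f"user:{user_id}"
    entry = cache.get(key)
    if entry and entry[0] > time.time():
        return entry[1]                      # cache hit
    value = load_user_from_db(user_id)       # cache miss: hit the database
    cache[key] = (time.time() + TTL, value)  # prime the cache with a TTL
    return value

print(get_user(42)["name"])
```

Swap the dict for a real memcached client and the pattern is identical -- which is the point: for this workload, memcached's "doesn't do anything else" is all you need.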

If you need to operate on whole cached datasets at once, or spread one enormous cache over a geographically challenged area, or if read-write splitting against the cache will enormously help performance and alleviate cache connection starvation, use redis.  

But where would I use something like redis?  Where would I need its extra features -- where does its feature set outweigh its complexity?

  • Leaderboards

Redis can manage sets in memory, which gives it an advantage here over memcached.  Leaderboards are sets of items ordered just in time and served from memory to a page.  A database is too slow -- and it's all disposable data.  Here both pipelining and the sorted-set operations are powerful features.
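The Redis commands doing the work here are ZADD (set a member's score), ZINCRBY (bump it), and ZREVRANGE (read the top N, highest first).  A sketch of those semantics in plain Python, with a dict standing in for the sorted set -- the member names are illustrative:

```python
scores = {}  # stands in for a Redis sorted set: member -> score

def zadd(member, score):
    scores[member] = score

def zincrby(member, delta):
    scores[member] = scores.get(member, 0) + delta

def zrevrange(n):
    """Top-n members by score, highest first -- what ZREVRANGE returns."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

zadd("alice", 3100)
zadd("bob", 2800)
zincrby("carol", 3500)
print(zrevrange(2))  # the leaderboard's top two
```

With memcached you'd have to fetch every score and sort in the application on each page load; Redis keeps the set ordered server-side.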

  • On the Fly Voting Systems

Again, pipelining writes to multiple items and ordering sets in memory, where a system reads them out and streams results -- websockets! -- in real time, makes implementing this feature simpler and more streamlined.

  • Page Clicks and Analytics

One can implement a page clicks and analytics engine on memcached, backed by a database, but redis is really good at counting lists and sets of things.  Of all of redis's features, its sorted sets and counters are where it exceeds memcached.  Counting something like page clicks per set of pages, then summing those numbers into analytics which a worker pumps into a bigger analytics engine, is one place where redis is the right choice.
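A sketch of that counting flow, with a dict emulating Redis INCR on per-page keys and a worker-style rollup summing them before shipping -- the key layout is illustrative:

```python
from collections import defaultdict

counters = defaultdict(int)  # emulates Redis string counters

def incr(key):
    counters[key] += 1  # what INCR does: atomic in Redis, plain += here

# Record some page clicks as they arrive.
for page in ["clicks:/home", "clicks:/home", "clicks:/pricing"]:
    incr(page)

def rollup():
    """Worker-style pass: sum per-page counters into a site total
    before pumping them into a bigger analytics engine."""
    return sum(v for k, v in counters.items() if k.startswith("clicks:"))

print(rollup())  # total clicks across all pages
```

Memcached's atomic increment covers the counting half of this; it's the server-side iteration and aggregation over sets of keys that tilts it toward redis.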

One last thing: regardless of the choice, a caching system is not a database.  Some folks out there want to replace their RDBMS with a cache.  Eventually, if that data has any worth, it has to make its way via asynchronous jobs back to a real data store.  Man cannot live on caching alone.  A system needs caching and a database.