Adding Caching to Project Butterfly in Node.js

Since Project Butterfly is written entirely in Node.js, it needs a Node.js-compatible memcached library.  We will use the Overclocked/MC library, as it's full featured -- memcached clustering, consistent hashing, configurable options, and support for the full memcached API.  It also has nice documentation on its website.  Documentation wins out over slick but undocumented features every time.

I haven't mentioned Node.js programming technique much because of a lack of code samples, but like every other I/O operation in Node.js, getting, setting, or incrementing/decrementing a key requires a callback to process the result of the call.  Blocking, non-callback calls in Node.js stall the event loop and reduce the efficiency of the scheduler.¹  It's a different technique than normal, non-event-driven programming.
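
To make that concrete, here's roughly what a set-then-get looks like with mc.  This is a minimal sketch following the callback style in the library's documentation; the key name and exptime are made up, and details may vary by version.

    var mc = require('mc');

    var cache = new mc.Client(); // defaults to localhost:11211

    cache.connect(function() {
      // Every operation takes a callback; nothing blocks the event loop.
      cache.set('passes:burned', '42', { flags: 0, exptime: 60 }, function(err, status) {
        if (err) { return console.error(err); }
        // status is 'STORED' on success
        cache.get('passes:burned', function(err, response) {
          if (err) { return console.error(err); }
          console.log(response['passes:burned']); // '42'
        });
      });
    });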

Technically, it's fine to build one giant memcached cluster and let the front-end and back-end servers share it, as long as the keys are namespaced.  But it's also fine to build one big pool of cache servers and then logically split them into two clusters in the applications' memcached configuration: one for the front end, one for the back end.  This makes the clusters easier to monitor, easier to balance, and easier to scale on demand if need be.
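
That split lives entirely in the client configuration.  A sketch, with hypothetical hostnames, assuming mc's array-of-servers constructor:

    var mc = require('mc');

    // Same hardware pool, logically split into two clusters: the
    // front-end app points at one pair of servers, the back-end
    // app at the other.
    var frontCache = new mc.Client(['cache01:11211', 'cache02:11211']);
    var backCache  = new mc.Client(['cache03:11211', 'cache04:11211']);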

Also, build systems out in pairs for redundancy, failover, and DR needs.

Project Butterfly with two memcached cache clusters

I'm somewhat convinced LucidChart is going to come take my car for the amount that I've been abusing their free service.

Memcached Front End Cache

Holds...

  • Rendered page fragments
  • REST API call fragments
  • Convention pass counts: running counts of passes burned and fast checks on capacity
  • Session data (later)
  • Rate limiting on logins/failed passwords (later)
  • Anything that could be pulled from the backend

Fragment caching is more a Rails thing than a Node.js thing, but caching partial responses from AJAX calls will help scalability.  Memcached is a key/value cache and can be configured to hoover up the entire RAM allotment of a VM.  Use the memory.
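
The pattern for everything in the front-end list above is cache-aside: check memcached first, and only do the real work on a miss.  A minimal sketch -- the key, the 60-second exptime, and renderCapacity are all hypothetical:

    // Serve the capacity check from cache when possible; otherwise
    // compute it, store it, and return it.
    function getCapacity(cache, renderCapacity, callback) {
      cache.get('api:capacity', function(err, response) {
        if (!err && response && response['api:capacity']) {
          return callback(null, response['api:capacity']); // cache hit
        }
        renderCapacity(function(err, json) { // cache miss: do the real work
          if (err) { return callback(err); }
          cache.set('api:capacity', json, { flags: 0, exptime: 60 }, function() {
            callback(null, json);
          });
        });
      });
    }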

Memcached Back End Cache

Holds...

  • Queries.
  • More queries.
  • Honestly, it should hold queries.
  • It should expire out queries after a reasonable amount of time and then cache queries again (see the sketch after this list).
  • Any results derived from those queries -- rolled-up data that would otherwise take more queries, say.
  • Also, queries.
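
A sketch of what caching a query might look like: db.query and the key scheme here are made up, and only the memcached calls follow mc's documented style.

    var crypto = require('crypto');

    // Key the cached rows on a hash of the SQL and its parameters.
    // The exptime handles "expire out queries after a reasonable
    // amount of time."
    function cachedQuery(db, cache, sql, params, callback) {
      var key = 'q:' + crypto.createHash('md5')
                             .update(sql + JSON.stringify(params))
                             .digest('hex');
      cache.get(key, function(err, response) {
        if (!err && response && response[key]) {
          return callback(null, JSON.parse(response[key])); // cache hit
        }
        db.query(sql, params, function(err, rows) { // miss: hit the database
          if (err) { return callback(err); }
          cache.set(key, JSON.stringify(rows), { flags: 0, exptime: 300 },
                    function() { callback(null, rows); });
        });
      });
    }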

Web applications typically have an 80% read/20% write profile against the database.  Memcached's job is to keep the code from hitting the database for that 80% of reads, leaving the database free to deal with the writes.  It's up to the code to expire keys from memcached whenever it updates data stored in the cache, and then to reload the cache from the database.
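
And the write path -- a sketch with a hypothetical updateUserInDb and key scheme, assuming mc exposes memcached's delete command as del:

    // Update the database first, then delete the cached copy so the
    // next read repopulates it with fresh data.
    function updateUser(db, cache, user, callback) {
      updateUserInDb(db, user, function(err) {
        if (err) { return callback(err); }
        cache.del('user:' + user.id, function(err, status) {
          // Even if the delete fails, the key ages out via its exptime.
          callback(null);
        });
      });
    }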

If there's tons of data to keep in memory, yes, backend caches can grow to enormous size.

Note: This is all server-side caching.  Adding it will give the code a great lift.  Yet even with two memcached clusters caching data, this doesn't cover asset caching, nginx compression and tuning, or CDNs.  It's about half the caching in a running system.

Honestly, cache until it hurts and then cache some more.  Memory is the fastest resource in a live running system.

  1. Cough callback hell cough