Logging Architectures and Splunk

If I could spend money on any one piece of underlying infrastructure for a large distributed system, it would be Splunk.

Splunk is a log aggregator. It accepts data from virtually any data stream, nominally through sets of forwarders; it ETLs the data, indexes it, and makes it available for search. Data from many different sources (syslog, application logs, nginx logs, firewall logs) can be aggregated and queried together using Splunk's MapReduce-like search language, SPL. Reports can span arbitrary time ranges, be rendered in a variety of formats, and update live on monitoring dashboards. And it can alert.
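Splunk's internals are far more sophisticated than this, but the parse, index, and search pipeline described above can be sketched as a toy in-memory aggregator. The regular expressions, the `Aggregator` class, and the field names are hypothetical illustrations, not Splunk's API or data model: each source gets its own parser, every extracted `field=value` pair goes into an inverted index, and a search intersects the matching event sets.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    source: str   # which stream the line came from
    fields: dict  # key/value pairs extracted by the source's parser
    raw: str      # original line, kept for display


# Hypothetical per-source parsers (simplified nginx access log and syslog formats).
NGINX_RE = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)
SYSLOG_RE = re.compile(
    r'(?P<time>\w{3} +\d+ [\d:]+) (?P<host>\S+) (?P<app>[^:\[]+)(?:\[\d+\])?: (?P<msg>.*)'
)
PARSERS = {"nginx": NGINX_RE, "syslog": SYSLOG_RE}


class Aggregator:
    def __init__(self):
        self.events = []
        self.index = defaultdict(set)  # (field, value) -> set of event ids

    def ingest(self, source, line):
        # Raises KeyError for a source with no registered parser.
        match = PARSERS[source].match(line)
        fields = match.groupdict() if match else {}
        event_id = len(self.events)
        self.events.append(Event(source, fields, line))
        for key, value in fields.items():
            self.index[(key, value)].add(event_id)
        self.index[("source", source)].add(event_id)

    def search(self, **criteria):
        # AND together every key=value criterion using the inverted index.
        ids = None
        for key, value in criteria.items():
            hits = self.index.get((key, value), set())
            ids = hits if ids is None else ids & hits
        return [self.events[i] for i in sorted(ids or ())]


agg = Aggregator()
agg.ingest("nginx", '10.0.0.5 - - [10/Oct/2023:13:55:36 +0000] "GET /checkout HTTP/1.1" 500 512')
agg.ingest("syslog", "Oct 10 13:55:37 web01 app[4242]: payment gateway timeout")
agg.ingest("nginx", '10.0.0.6 - - [10/Oct/2023:13:55:40 +0000] "GET / HTTP/1.1" 200 1024')
for event in agg.search(status="500"):
    print(event.source, event.fields["path"])  # prints "nginx /checkout"
```

The point of the inverted index is that a search never rescans raw text; it only intersects precomputed sets, which is also why indexing consumes so much disk.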

For a system of, say, several hundred virtualized nodes, each playing a different role, Splunk is a godsend. It coughs up all the data as long as there are logs, which goes back to my argument that a good distributed system must log. This is what the system does with all those endless logs: it sends them to Splunk.

With some careful logging and mining, one can trace a user from the furthest edge, through their interactions with systems deep in the stack, and back out again at the end of their session, skipping from system to system to system. Splunk is every bit as sweet as its website makes it out to be.
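That edge-to-core trace only works if every system logs a shared correlation key. A minimal sketch, assuming a hypothetical `key=value` convention in which each line carries `ts=` (ISO-8601, no timezone) and `session=` fields; in Splunk itself this would be a single search, not Python. The sketch gathers every line tagged with one session across all systems and orders the hops by timestamp.

```python
import re
from datetime import datetime

# Hypothetical convention: every system writes "ts=<ISO-8601> session=<id> event=<name> ...".
KV_RE = re.compile(r"(\w+)=(\S+)")


def parse(line):
    """Turn 'a=1 b=2' into {'a': '1', 'b': '2'}."""
    return dict(KV_RE.findall(line))


def trace_session(session_id, logs_by_system):
    """Collect every line tagged with session_id across all systems, in time order."""
    hops = []
    for system, lines in logs_by_system.items():
        for line in lines:
            kv = parse(line)
            if kv.get("session") == session_id:
                hops.append((datetime.fromisoformat(kv["ts"]), system, kv.get("event", "")))
    hops.sort()  # timestamp first; ties broken by system name, then event
    return hops


logs = {
    "edge-lb": ["ts=2023-10-10T13:55:36 session=abc123 event=request_in",
                "ts=2023-10-10T13:55:41 session=abc123 event=response_out"],
    "auth": ["ts=2023-10-10T13:55:37 session=abc123 event=login_ok",
             "ts=2023-10-10T13:55:37 session=zzz999 event=login_fail"],
    "orders": ["ts=2023-10-10T13:55:39 session=abc123 event=order_created"],
}
for ts, system, event in trace_session("abc123", logs):
    print(ts.time(), system, event)
# 13:55:36 edge-lb request_in
# 13:55:37 auth login_ok
# 13:55:39 orders order_created
# 13:55:41 edge-lb response_out
```

Sorting on timestamps assumes the hosts' clocks are reasonably synchronized; with heavy clock skew, the hop order can come out wrong.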


Let's talk about the howevers.

Splunk costs all the money. Whatever money you have, it costs it. It costs more than that. Because licensing is priced by the gigabyte of data ingested, it costs more the more useful it becomes. Sure, when a few web logs go in, it's moderately useful. Once it starts surfacing analytics, though, the appetite for more analytics never lets up. It's like a heroin addiction: the first high is great, but then it costs more and more to maintain that high and to reach new ones.
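Because the bill tracks volume, a back-of-the-envelope estimate of daily ingest shows how quickly that appetite grows. This is a sketch with made-up inputs (node count, events per second, bytes per event); no license price is assumed, and "GB" here means 10^9 bytes.

```python
SECONDS_PER_DAY = 86_400


def daily_ingest_gb(nodes, events_per_sec_per_node, bytes_per_event):
    """Estimate raw log volume per day, in gigabytes (10**9 bytes)."""
    total_bytes = nodes * events_per_sec_per_node * bytes_per_event * SECONDS_PER_DAY
    return total_bytes / 1e9


# Hypothetical fleet: 300 nodes, 20 events/s each, ~300 bytes per event.
baseline = daily_ingest_gb(300, 20, 300)
# Same fleet after someone turns on verbose logging (5x the events).
verbose = daily_ingest_gb(300, 100, 300)
print(f"{baseline:.2f} GB/day -> {verbose:.2f} GB/day")  # prints "155.52 GB/day -> 777.60 GB/day"
```

The estimate is linear in every input, so each new data source or log level multiplies the licensed volume directly.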

Splunk does need some serious love from professional services to become useful. It is a large, complex system with multiple moving parts that cannot be understood just by reading the manual. Without that love, indexers fill up, indexes fall out of date, reports go unmaintained, and the system falls into disrepair. It takes on a post-apocalyptic air of unlove.

And... Splunk needs to run on dedicated, high-performance hardware with equally high-performance disks. This feeds back into the top point: Splunk is expensive. But if the model is to spend money rather than time on analytics and system intelligence, this sits at the far money end of the money-time trade-off.

If there's time to build and little money, this isn't the right buy, because it is extremely pricey. But if there's little time, lots of money, or a risk-averse business, it's a fantastic product.

I sound like a shill for Splunk (it is lovely, and they can send me all the free T-shirts they like), but this is all a warm-up for the real question: how do I build Splunk without any money, and... is it any good?