In 2007, frustrated with writing websites in Java and tired of mod_python falling over under Apache, I read Why's Poignant Guide to Ruby and started building in Ruby on Rails. For a while it was fantastic, until I was bitten by its issues.
The language feature that makes Ruby stand out is its strong support for metaprogramming. Any part of Ruby can be overridden, changed, or morphed at run time -- literally any part. Sick of the stock String class? Have a gem that does something you don't like? Change it! Rewrite it! Want to generate an arbitrary number of classes at runtime, each with its own database calls? No problem!
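A minimal sketch of both tricks the paragraph describes -- reopening a core class and generating classes at runtime. The method and class names (`shout`, `UserModel`, `find_sql`) are made up for illustration; nothing here comes from a real gem:

```ruby
# Reopen the stock String class and bolt a new method onto every string
# in the process -- this is "monkey patching".
class String
  def shout
    upcase + "!"
  end
end

# Generate classes at runtime: one class per table name, each carrying
# its own (fake) database call, built with Class.new and const_set.
%w[users orders].each do |table|
  klass = Class.new do
    # define a class-level method that closes over the table name
    define_singleton_method(:find_sql) do |id|
      "SELECT * FROM #{table} WHERE id = #{id}"
    end
  end
  Object.const_set(table.capitalize.chomp("s") + "Model", klass)
end

puts "ruby".shout            # => RUBY!
puts UserModel.find_sql(42)  # => SELECT * FROM users WHERE id = 42
```

Nothing above existed before the script ran -- which is exactly why it is powerful, and exactly why it is hard for the VM to optimize.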
Metaprogramming is powerful for web programming (see the growth of Clojure)... but it exacts a huge cost in memory. All of those dynamically generated objects have to live in RAM, with an unpredictable amount of absolute memory overhead. And because MRI's global interpreter lock means only one thread runs Ruby code at a time, the whole interpreter has to sit and think every time garbage collection comes along, making everything slow. Ruby 2.0 tries to tackle the GC issue with a better algorithm, and evented servers built on EventMachine bring libevent-style non-blocking I/O, but the memory issue becomes a real nasty snarl once talking about an architecture based on many small VMs with few cores and little RAM instead of huge bare-metal systems with tons of cores and huge swaths of RAM.
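The GC pressure is easy to see from inside MRI itself: the built-in `GC.stat` hash exposes collector counters, including `:count`, the total number of GC runs since the process started. A quick sketch (the exact numbers will vary by Ruby version and heap settings):

```ruby
# Watch MRI's garbage collector react to a burst of short-lived objects --
# the kind of allocation-heavy churn metaprogramming-heavy frameworks
# produce constantly.

runs_before = GC.stat[:count]

# Churn out ~200,000 throwaway strings.
200_000.times { |i| "throwaway-object-#{i}" }

runs_after = GC.stat[:count]
puts "GC ran #{runs_after - runs_before} time(s) during the loop"
```

Every one of those GC runs is a pause in which no request-handling Ruby code executes.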
Ruby on Rails -- even 4.0 -- has some architectural challenges in getting it to high concurrency in a highly distributed system:
- The Rails stack lends itself to single-box LAMP stacks. By default it provides a single monolithic MVC2 stack out of the box, with helpful support scripts. This makes it super easy to get started writing a front-end web application, but in the long haul it makes it very difficult to break the application into pieces and reach high user concurrency. Just standing up X giant web servers behind a load balancer does not do the trick.
- ActiveRecord lends itself to some utterly terrible data-architecture designs and implementations. Data architecture is one of the keys (if not the key) to reaching incredibly high user concurrency, and code-described schemas plus libraries that want to JOIN across tables all the time are not it. Magic is very bad for reaching high distributed concurrency. It is absolutely possible to gut ActiveRecord, replace it with an alternative, or go straight to the database -- but now we're already going down the path of needing to rewrite Rails. Here's a Slideshare on alternative ORMs for Rails.
- Passenger is full of Ruby Magic, and Ruby Magic is absolutely non-performant. Please use Unicorn.
- Rails has an 8 GB RAM minimum due to its metaprogramming and garbage collection requirements, and it really needs a minimum of 4 cores with 4 Unicorn workers per core to support any kind of load. That is already moving into m1.large territory in AWS.
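That sizing translates directly into a Unicorn config. This is a minimal sketch of a `config/unicorn.rb` matching the 4-workers-per-core figure above; the socket path, backlog, and timeout values are illustrative, not recommendations:

```ruby
# config/unicorn.rb -- minimal sketch: 4 cores x 4 workers/core = 16
# workers. Each preloaded Rails worker commonly weighs in at several
# hundred MB, which is how the RAM bill adds up so fast.

worker_processes 16                    # 4 workers per core on a 4-core box
preload_app true                       # load Rails once, then fork workers
timeout 30                             # kill workers stuck longer than 30s

listen "/tmp/unicorn.sock", backlog: 64

before_fork do |server, worker|
  # drop DB connections inherited from the master before forking
  defined?(ActiveRecord::Base) && ActiveRecord::Base.connection.disconnect!
end

after_fork do |server, worker|
  # each worker needs its own connection
  defined?(ActiveRecord::Base) && ActiveRecord::Base.establish_connection
end
```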
On the other hand, Ruby on Rails has a fantastic user community, great videos and tutorials for getting started, and a huge library of gems to plug everything from JSON support to entire authentication systems into a Rails app. If the exercise were to focus on code elegance and beauty instead of distributing out a system, Rails would be a serious contender.
If the team loves Ruby, though, it is totally possible to get to a million concurrent users with a combination of something like Goliath, ZooKeeper or RabbitMQ, and Memcached -- but it means ditching Rails entirely. In fact, Goliath workers behind RabbitMQ or ZooKeeper will absolutely work. But why contort to use something like Goliath when other systems provide Goliath's internal architecture out of the box? (I like it, though -- it's a fantastic project for Rubyists.)
So, putting the Japanese Perl back on the "to consider" pile, I'll look at the next language on the list.