Problems with Mongoid and Sidekiq - Brainstorming
30 Jan 2013

A few weeks back, we started slowly upgrading all of our queues at Gogobot to work with Sidekiq.
Posts on how awesome the experience was and how much better Sidekiq is than Resque are coming soon, but with all the good came some bad.
Summary of the situation
With Sidekiq, we are processing around 25X more jobs than we did with Resque: around 15,000,000 jobs per day, at rates of over 1K jobs per second at times (at peak we go well past that).
This is how many jobs we processed today…
And this is a snapshot of our realtime graph.
On the MongoDB side we are working with Mongoid, in a sharded environment: 9 shards with 3 replicas in each shard, all running through 2 routers.
Our production mongoid config looks like this:

production:
  op_timeout: 3
  connection_timeout: 3
  sessions:
    default:
      hosts:
        - HOST_NAME:27017 # Single router R0
      username: USER_NAME
      password: PASSWORD
      database: DATABASE_NAME
      options:
        consistency: :eventual
We are using the latest versions of all the relevant gems (Sidekiq, Mongoid, Moped, Redis).
All seems fine, right? So what’s the problem?
The problem is that we are opening and closing too many connections to our Mongo instances (~25-40 new connections per second).
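For context, that rate is visible in the connection counters the mongos router reports through the serverStatus command. Below is a minimal sketch of polling it from Ruby, assuming the Mongoid 3 / Moped setup above, a user with enough privileges to run serverStatus, and a hypothetical one-second sampling window purely for illustration.

require "mongoid"
Mongoid.load!("config/mongoid.yml", :production)

# Sketch: sample the router's connection counters via serverStatus.
def mongo_connections
  Mongoid.default_session.command(serverStatus: 1)["connections"]
end

before = mongo_connections
sleep 1
after  = mongo_connections

puts "open connections on the router: #{after["current"]}"
# "totalCreated" is not reported by every MongoDB version; when it is,
# the one-second delta approximates new connections per second.
if before["totalCreated"] && after["totalCreated"]
  puts "new connections in the last second: #{after["totalCreated"] - before["totalCreated"]}"
end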
Each time a job is picked up, a connection to Mongo is opened, and when the job is done, that connection is closed (via the Kiqstand middleware).
This is putting a huge load on our router server and at times causes Mongo to run out of file descriptors.
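For anyone unfamiliar with Kiqstand: it registers a Sidekiq server middleware that tears the Mongoid session down once each job finishes. The sketch below shows that general pattern, not the actual Kiqstand source, and assumes Mongoid 3's default_session and Moped's Session#disconnect.

require "sidekiq"
require "mongoid"

# Rough sketch of a disconnect-after-every-job server middleware,
# i.e. the pattern Kiqstand provides (not its actual implementation).
class MongoDisconnectMiddleware
  def call(worker, msg, queue)
    yield
  ensure
    # Closing the Moped session here means the next job has to open a
    # brand new connection to the mongos router.
    Mongoid.default_session.disconnect
  end
end

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add MongoDisconnectMiddleware
  end
end

Every one of those open/close cycles pays a fresh TCP and authentication handshake against the router, which is where the churn described above comes from.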
SO?
More than anything, this post is a call for discussion: if you are running a similar setup at a similar scale and can help, I would love to brainstorm on how to solve this problem.