Data Processing with Ruby, AMQP and RabbitMQ

The amount of data businesses have to process increases every day.  Eventually, traditional approaches become impractical because of time constraints or the sheer size of the dataset.  As datasets have grown, however, other ways to process and analyze them have emerged.  One specific problem involved 14 datasets dumped nightly to a remote site, where they are read in, converted, and merged into a single dataset.  A traditional approach took about 2 days to process one dataset.  The new approach, using Ruby, AMQP, and RabbitMQ, takes less than 24 hours to process, convert, and merge all 14 datasets.

This faster approach uses RabbitMQ (written in Erlang) as the “middleman” between two sets of Ruby workers: one set publishes the legacy data, and the other converts and merges it into a single dataset.  Ruby was chosen for the workers because the conversion code from the old approach could be reused, which saved development time on the new system.

When the nightly load arrives, workers are started to publish the necessary information from each dataset to its respective queue in RabbitMQ.  The workers waiting to convert and merge the data are sent batches of messages and keep processing until no messages are left in the queue.  If a worker goes down or fails for any reason, the messages it did not process successfully are recovered by the message queue, which ensures that every queued item is processed.
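
The recover-on-failure behavior described above can be sketched in plain Ruby.  This is only an in-memory illustration under stated assumptions, not the code from the actual system and not the RabbitMQ or AMQP client API: the AckQueue class, the convert and run_workers methods, and the rec-N record names are all hypothetical stand-ins.  The idea it shows is that a delivered message stays tracked until the worker acknowledges it, and goes back on the queue if the worker fails first.

```ruby
# In-memory stand-in for a broker queue with acknowledgements.
# NOT the RabbitMQ/AMQP client API; every name here is hypothetical.
class AckQueue
  def initialize
    @ready = []     # messages waiting to be delivered
    @unacked = {}   # delivery tag => message delivered but not yet confirmed
    @next_tag = 0
  end

  def publish(message)
    @ready.push(message)
  end

  # Hand the next message to a worker; returns [tag, message], or nil when empty.
  def deliver
    return nil if @ready.empty?
    tag = (@next_tag += 1)
    @unacked[tag] = @ready.shift
    [tag, @unacked[tag]]
  end

  # The worker finished successfully, so the message is dropped for good.
  def ack(tag)
    @unacked.delete(tag)
  end

  # The worker failed, so the message goes back on the queue for redelivery.
  def requeue(tag)
    message = @unacked.delete(tag)
    @ready.push(message) if message
  end

  def empty?
    @ready.empty? && @unacked.empty?
  end
end

# Placeholder for the real conversion code (assumption: a trivial transform).
def convert(record)
  record.upcase
end

# Process until the queue is drained; simulate one crash on a chosen record.
def run_workers(queue, fail_once_on: nil)
  merged = []
  failed = false
  until queue.empty?
    tag, record = queue.deliver
    begin
      if record == fail_once_on && !failed
        failed = true
        raise "simulated worker crash"
      end
      merged << convert(record)
      queue.ack(tag)        # confirm only after the work succeeded
    rescue RuntimeError
      queue.requeue(tag)    # unconfirmed work is recovered, not lost
    end
  end
  merged
end

queue = AckQueue.new
%w[rec-1 rec-2 rec-3].each { |r| queue.publish(r) }
result = run_workers(queue, fail_once_on: "rec-2")
p result  # => ["REC-1", "REC-3", "REC-2"]
```

Note that rec-2 is still processed despite the simulated crash; it is simply redelivered after rec-3.  In a real deployment the broker does this bookkeeping itself, across separate worker processes.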

When the next nightly dump comes around, the message queues are filled again and the workers repeat their workflow.  The new system processes the nightly dataset dumps on time, so the clients using it have up-to-date information.

Responses

  1. bmuller Says:

    Why AMQP instead of XMPP?

  2. john Says:

    XMPP was considered as an option but did not fit the problem domain as cleanly as AMQP. What we really needed was a persistent, recoverable notification system for legacy data that needed to be processed, in this case down to the level of individual records. To do this in XMPP we would need a way to communicate that a record had been processed, which would require a middleman, or the publisher of the legacy data, to receive replies saying that a job was finished. Another drawback of XMPP in this problem domain is the difficulty of adding workers and publishers dynamically. In XMPP this would require some sort of communication bridge that lets publishers see the appropriate workers and then load balance between them. With RabbitMQ this was already built in, which allowed us to develop the new system more quickly. AMQP and RabbitMQ were chosen over XMPP because of these features and the specific problem domain.

