Background
Using Cascading deployed on Amazon Elastic MapReduce, Ion Flux created a scalable gene sequencing and realignment algorithm to analyze massive amounts of data. Ion Flux is able to readily hire Java programmers instead of MapReduce experts and focus on genomic problems rather than infrastructure technicalities.
Solution
Ion Flux turned to Amazon Elastic MapReduce and Cascading from Concurrent, Inc. to manage jobs on clusters of ten to thirty nodes using 200 to 500 cores, launched in parallel for up to 50 hours.
Ion Flux chose Cascading because it provides the right level of abstraction for production-quality and production-scale use of Hadoop. The company found Cascading to be the only way to attack real-world-scale problems and deploy to production rapidly. Cascading is a straightforward API that lets programmers work in Java rather than writing MapReduce jobs directly. It also provides finer-grained application definition and allows complex, distributed processes to be assembled quickly. With Cascading, programmers can create new operations, reuse existing operations and chain them together easily.
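To make the idea of creating, reusing and chaining operations concrete, here is a minimal sketch in plain Java. It does not use the Cascading API and is not Ion Flux's code; the class PipelineSketch and the operations filterShort and upperCase are hypothetical names invented for illustration, and the sketch runs in memory rather than on a Hadoop cluster.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustration only: a plain-Java analogy for chaining reusable operations.
// This is NOT the Cascading API; all names here are hypothetical.
public class PipelineSketch {

    // Reusable operation: keep only reads of at least minLength characters.
    static Function<List<String>, List<String>> filterShort(int minLength) {
        return reads -> reads.stream()
                .filter(r -> r.length() >= minLength)
                .collect(Collectors.toList());
    }

    // Reusable operation: normalize every read to upper case.
    static Function<List<String>, List<String>> upperCase() {
        return reads -> reads.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // Chain the two operations into a single pipeline and apply it.
    static List<String> run(List<String> reads) {
        Function<List<String>, List<String>> pipeline =
                filterShort(4).andThen(upperCase());
        return pipeline.apply(reads);
    }

    public static void main(String[] args) {
        // prints [ACGT, TTGACA]
        System.out.println(run(Arrays.asList("acgt", "ac", "ttgaca")));
    }
}
```

Each operation is defined once and can be recombined in other pipelines, which is the kind of reuse the paragraph above describes; in Cascading itself, that assembly is expressed with the library's own constructs and executed on Hadoop.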
Benefits
With Cascading, Ion Flux can focus on adding features its customers want instead of optimizing and tuning at the Hadoop level. The company can also launch analytics capabilities to production quickly, since they are already implemented on the same platform. In addition, Ion Flux is able to hire Java programmers rather than MapReduce experts and manage MapReduce jobs easily.
“Given our aggressive development timelines, application complexity and need to constantly modify analytics jobs, Cascading was a perfect fit,” commented Dr. Allen Day, Ion Flux’s founder and CEO. “It is much easier and faster for our developers to work in Java and reuse common operations. Cascading has let us focus on the genomics problem, not technicalities of the underlying computing platform. Cascading empowers us to effectively develop and quickly deliver the services our customers want.”
Ion Flux plans to continue to use Cascading for production-quality and production-scale use of Hadoop as it develops its analysis services. The solution will help the company bring new capabilities to market quickly.