SAN FRANCISCO – June 19, 2013 – Concurrent, Inc., the enterprise Big Data application platform company, today announced that Paco Nathan, director of Data Science, will deliver a talk, titled “Pattern – an open source project for migrating predictive models from SAS, R, Microstrategy®, etc., onto Hadoop” at the 6th Annual Hadoop Summit North America, taking place June 26-27, 2013 in San Jose, Calif. This two-day event will feature Apache Hadoop™ thought leaders who will showcase successful Hadoop use cases, share development tips and tricks and educate organizations about how to best leverage Apache Hadoop as a key constituent in their enterprise data architecture.
Details At-A-Glance
What: “Pattern – an open source project for migrating predictive models from SAS, R, Microstrategy®, etc., onto Hadoop” speaking session at Hadoop Summit
Who: Paco Nathan, director of Data Science at Concurrent, Inc., the company behind the Cascading™ application framework
When: Wednesday, June 26 at 4:55 p.m. PDT
Where: San Jose Convention Center
How: Register at http://www.hadoopsummit.org/san-jose/register/
Session Description
Pattern is a free, open source project that takes models trained in popular analytics tools, such as SAS®, Microstrategy®, R and SQL Server, and runs them at scale on Apache Hadoop. This machine-learning library, based on the popular Cascading framework, works by translating PMML into data workflows and can be quickly deployed on your Apache Hadoop data. PMML models can be run in a pre-defined JAR file with no coding required and can also be combined with other flows based on ANSI SQL (Lingual), Scala (Scalding) and Clojure (Cascalog) to meet enterprise requirements. Benefits include greatly reduced development costs and fewer licensing issues at scale, while leveraging a combination of Apache Hadoop clusters, existing intellectual property in predictive models, and the core competencies of analytics staff. Sample code in this talk will show apps using predictive models built in SAS and R. In addition, examples will show how to compare variations of models for large-scale customer experiments. Portions of this material come from the O’Reilly book “Enterprise Data Workflows with Cascading,” publishing on July 10, 2013.
About the Speaker
Paco Nathan is director of Data Science at Concurrent, Inc., where he leads the company’s developer outreach program. He has a dual background from Stanford in math/statistics and distributed computing, and has more than 25 years of experience in the technology industry. Nathan is an expert in Hadoop, R, predictive analytics, machine learning and natural language processing.
Supporting Resources
Cascading website: http://www.cascading.org
Pattern website: http://www.cascading.org/pattern
Concurrent website: http://www.concurrentinc.com
Contact Us: http://www.concurrentinc.com/contact
Follow us on Twitter: http://www.twitter.com/concurrent
About Concurrent, Inc.
Concurrent, Inc.’s vision is to become the #1 software platform choice for Big Data applications. Concurrent builds application infrastructure products that are designed to help enterprises create, deploy, run and manage data processing applications at scale on Apache Hadoop. Concurrent is the mind behind Cascading™, the most widely used and deployed technology for Big Data applications with more than 75,000 user downloads a month. Used by thousands of data-driven businesses including Twitter, eBay, The Climate Corp, and Etsy, Cascading is the de facto standard in open source application infrastructure technology. Concurrent is headquartered in San Francisco. Visit Concurrent online at http://www.concurrentinc.com.
Media Contact
Danielle Salvato-Earl
Kulesa Faul for Concurrent, Inc.
(650) 340-1982
concurrent@kulesafaul.com