Concurrent Announces Release Of Cascading 2.5 and Lingual 1.0 To Simplify Application Development Using Hadoop

Arnal Dayaratna, Ph.D., Cloud Computing Today
November 19, 2013
http://cloud-computing-today.com/2013/11/19/58784/

Today, Concurrent elaborates on the release of Cascading 2.5, the open source framework for facilitating the development of applications on Apache Hadoop. Cascading 2.5 supports the recently released Hadoop 2.0 distribution, including YARN and its other features. Cascading users who are interested in upgrading to Hadoop 2.0 can do so by means of Cascading 2.5. Similarly, applications that leverage the Scalding, Cascalog and PyCascading languages can migrate to Hadoop 2.0 as well by means of the Cascading 2.5 framework. The latest release of Cascading also features “complex join operations and optimizations to dynamically partition and store processed data more efficiently on HDFS,” according to Concurrent’s press release. Finally, the release deepens its compatibility with other Hadoop distributions and Hadoop as a Service vendors such as Cloudera, Hortonworks, MapR, Intel, Altiscale, Qubole and Amazon EMR.

Cascading 2.5 represents one of the few products in either the commercial or open source ecosystem for simplifying the development of Hadoop applications while integrating with a rich and varied ecosystem of products as illustrated below:

[Image: concurrent-cascading]

The graphic shows how Cascading 2.5 supports all major Hadoop distributions in addition to an impressive list of development languages, database platforms and cloud platforms. In an interview with Cloud Computing Today, Concurrent CEO Gary Nakamura and CTO Chris Wensel noted the uniqueness of Cascading in the Big Data landscape, particularly given its iterative refinement in collaboration with the likes of Twitter, eBay and The Climate Corporation over a period of more than five years.

Today’s announcement regarding the general availability of Cascading 2.5 is accompanied by news of the general availability of Lingual, an ANSI-compliant SQL interface that allows developers to use SQL commands to query data stored in Hadoop clusters. Unlike Apache’s Hive project, Lingual’s ANSI-standard SQL interface enables developers to deploy authentic SQL commands as opposed to Hive’s SQL-like syntax. Cascading Lingual also allows for the migration of legacy SQL workloads onto Hadoop clusters, the export of Hadoop data onto BI tools such as Jaspersoft, Pentaho and Talend, and the ability to leverage the power of Cascading in conjunction with SQL to orchestrate the execution of multiple SQL queries instead of running several discrete, disparate queries. The Big Data space should expect more from Concurrent as it continues to build out tools for simplifying application development on Hadoop, particularly as more and more Hadoop developers come to terms with Cascading’s advantages over MapReduce.

Concurrent Builds New Tools for Hadoop Big Data Access

Christopher Tozzi, The VAR Guy
November 19, 2013
http://thevarguy.com/big-data-technology-solutions-and-information/concurrent-builds-new-tools-hadoop-big-data-access

The channel is continuing to bring the power of Hadoop, the open source platform for Big Data, up to speed with the database and storage tools necessary to leverage that power. This week, Concurrent is unveiling two major software releases with important implications for Big Data analytics and application programming. Concurrent’s professed goal is to build “the most advanced software platform for Enterprise Big Data applications.” Its most recent new software platform, called Cascading Lingual and introduced Nov. 19, works toward that end by simplifying the interface between Hadoop data and the applications, such as business intelligence (BI) tools, that rely on that data. Concurrent is billing Cascading Lingual as “Hadoop for Everyone Else,” and says the platform:

enables virtually anyone familiar with SQL to instantly work with data stored on Hadoop using their JDBC compliant BI or desktop tool of choice. Enterprises benefit as they can execute on Big Data strategies using existing in-house resources, skills sets and product investments. Cascading Lingual drives improved enterprise productivity, time-to-market benefits and the deployment of a sane and maintainable Big Data strategy.

Also on Nov. 19, Concurrent unveiled version 2.5 of its Cascading platform, an application framework for building Java programs that leverage Hadoop. In addition to performance improvements, the update brings support for Hadoop 2—and with it YARN, a major innovation in Hadoop that represents a complete rewrite of the resource manager, formerly known as MapReduce. Cascading 2.5 also strengthens Concurrent’s channel presence by making the platform more compatible with the Hadoop tools offered by other vendors, including Cloudera, Hortonworks, MapR, Intel (INTC), Altiscale, Qubole and Amazon (AMZN) Elastic MapReduce, the company said. Viewed broadly, Concurrent’s recent innovations, including similar advances in next-generation storage technology, reflect the drive to make Hadoop more usable. In some senses, it’s surprising that this remains an ongoing effort, since Hadoop has been around now for some time. But with Hadoop 2 out as of October, the channel has a new imperative to build solutions in this space. Stay tuned.

Concurrent Tools Up For Hadoop 2, Hadoop for Everyone Else

Steve Wexler, IT Trends & Analysis
November 19, 2013
http://it-tna.com/2013/11/19/concurrent-tools-up-for-hadoop-2-hadoop-for-everyone-else/

Big Data and Apache Hadoop (High-availability distributed object-oriented platform) are racing towards respectability, and Concurrent wants to both accelerate the journey and make it easier with a couple of announcements, Cascading 2.5 and Cascading Lingual. Despite its rapid enterprise adoption, developing and building applications on Hadoop has proven to be difficult, stated Chris Wensel, Concurrent founder and CTO, who added the company’s focus is forging a simpler path to mass Hadoop adoption by delivering a framework for building powerful and reliable data-oriented applications supporting data driven business models – quickly and easily.

“Hadoop is a mess, a total mess,” said CEO Gary Nakamura. “It’s whack-a-mole as far as solving problems.” Hence the company’s focus to make it easier for mainstream enterprises to build these applications and continue on with their Big Data strategy, he said.

Concurrent, which bills itself as the enterprise Big Data application platform company, said the new release of Cascading features support for the recently announced Hadoop 2 (October GA), including YARN (MapReduce 2.0, MRv2). With Cascading Lingual (Hadoop for Everyone Else), an open source project that provides ANSI compatible SQL, enterprises with business intelligence (BI) tools such as Pentaho, Jaspersoft and Cognos can now access their data on Hadoop in a matter of hours, rather than weeks. The new offerings provide a number of benefits, said Wensel, including making it easier to migrate applications to Hadoop, and the ability to create Cascading applications on the fly.

According to Gartner, Big Data is still making its way up to the Peak of Inflated Expectations, but Hadoop is well on its way to the bottom of the Trough of Disillusionment. That means that while there is still a lot of hype and confusion about Big Data, it is quickly moving toward being a useful and more widely adopted technology.

It’s still early days for Big Data and analytics, but the future looks very bright. With organizations drowning in data, and missing out on opportunities, real-time operational intelligence systems are moving from ‘nice to have’ to ‘must have for survival’, said Gartner. It predicts that analytics will reach 50% of potential users by 2014, and by 2020, that figure will be 75%. Post 2020 we’ll be heading toward 100% of potential users and into the realms of the Internet of Everything.

Hadoop may not be the only way to address Big Data, but it is growing strongly, with a compound annual growth rate of 55.63% between 2012 and 2016. Hadoop-as-a-Service is growing even faster, with a CAGR of 95.16% during this period. The Hadoop market is expected to reach $13.9 billion by 2017, with North America, which accounted for 53.85% of the overall market in 2012, leading the way.

A new IDC study, commissioned by Red Hat, found that virtually all of the companies (99%) surveyed have either deployed or plan to deploy Hadoop. A third (32%) have already made a Hadoop deployment, 31% intend to deploy Hadoop in the next 12 months, and 36% plan to use a Hadoop deployment in more than a year.

“We’re very cognizant that Hadoop is not the be all and end all of Big Data,” said Nakamura. Going forward, Concurrent plans to support the different computational platforms, with another platform offering already in the works, he said.

Under The Hood

To be publicly available soon and freely licensable under the Apache 2.0 License Agreement, Cascading 2.5 features and benefits include:

  • support for Hadoop 2 and its new features, including YARN, so Cascading users can upgrade to Hadoop 2 and seamlessly migrate their applications; in addition, Big Data applications using domain specific languages (DSLs), such as Scalding (Scala on Cascading), Cascalog (Clojure on Cascading) and PyCascading (Jython on Cascading) languages, will also seamlessly migrate to Hadoop 2;
  • added performance improvements for complex join operations and optimizations to dynamically partition and store processed data more efficiently on HDFS; and,
  • additional broad compatibility with other Hadoop vendors and Hadoop as a service providers, including Cloudera, Hortonworks, MapR, Intel, Altiscale, Qubole and Amazon EMR.

Called Hadoop for Everyone Else, Cascading Lingual enables virtually anyone familiar with SQL to instantly work with data stored on Hadoop using their JDBC compliant BI or desktop tool of choice, according to Concurrent. Offering a true ANSI-standard SQL interface, it is compatible with all major Hadoop distributions whether on-premise or in the cloud. Use-case examples include:

  • data analysts, scientists and developers can now simply ‘cut and paste’ existing ANSI SQL code to instantly access data locked on or migrate applications to a Hadoop cluster;
  • developers can use a standard Java JDBC interface to create new Hadoop applications, or use the Cascading APIs to build applications with a mix of SQL and custom Java, Scala or Clojure code; and,
  • companies can now query and export data from Hadoop directly into traditional BI tools.

Cascading 2.5 gets Lingual

Alex Handy, SD Times
November 19, 2013
http://www.sdtimes.com/content/article.aspx?ArticleID=66386&page=1

Concurrent has updated its flagship, open-source project, Cascading. This Hadoop development library gives developers a way to separate their business logic from the rest of their Hadoop code. The result is that Cascading 2.5, released today, is now able to interface with multiple versions of Hadoop, and to export data from a Hadoop cluster using a SQL query.

Chris Wensel, creator of Cascading and CTO of Concurrent, said that Cascading brings a more familiar development model to the Hadoop world. “Cascading is a Java library that adds two key core components. It allows you to isolate your business logic and do tests in a model you’re familiar with. And it has an alternative API of MapReduce, though it uses MapReduce under the hood. You can focus on business logic by simply reading and writing files,” he said.

In order to expand the capabilities of Cascading, version 2.5 adds Lingual. Lingual executes ANSI SQL as Cascading applications across a Hadoop cluster. While these queries don’t return as fast as a SQL query into a relational database, they do allow developers to use SQL to pull data out of Hadoop.

Wensel is clear that this SQL support for Hadoop is not intended to be competition for Greenplum or other Big Data analysis systems that allow large-scale SQL queries across Big Data sets. Instead, he said, “We’re being honest and saying Hadoop is great for migrating workloads without low-latency SLAs. Hadoop is glue: It’s good at integrating systems and working reliably.”

Lingual also comes with a JDBC driver, allowing developers to treat Hadoop as a standard Java-accessible SQL-addressable data source.
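
As a rough illustration of what treating Hadoop as a JDBC data source might look like, here is a minimal sketch using only the standard java.sql API. The driver class name (cascading.lingual.jdbc.Driver) and the URL form (jdbc:lingual:<platform>;schemas=<path>) are assumptions recalled from the Lingual documentation, not taken from this article, and should be checked against the release in use. The helper names jdbcUrl and firstColumn are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

final class LingualJdbcSketch {
    // ASSUMPTION: driver class name as recalled from the Lingual docs; verify before use.
    static final String DRIVER_CLASS = "cascading.lingual.jdbc.Driver";

    // ASSUMPTION: URL shape "jdbc:lingual:<platform>;schemas=<path>", where platform
    // is expected to be something like "local" or "hadoop".
    static String jdbcUrl(String platform, String schemaPath) {
        return "jdbc:lingual:" + platform + ";schemas=" + schemaPath;
    }

    // Runs a query over any JDBC connection and returns the first column of each row as text.
    static List<String> firstColumn(Connection connection, String sql) throws SQLException {
        List<String> values = new ArrayList<>();
        try (Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery(sql)) {
            while (rows.next()) {
                values.add(rows.getString(1));
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: LingualJdbcSketch <schema-path> <sql>");
            return;
        }
        // Throws ClassNotFoundException if the Lingual JDBC jar is not on the classpath.
        Class.forName(DRIVER_CLASS);
        try (Connection connection = DriverManager.getConnection(jdbcUrl("local", args[0]))) {
            for (String value : firstColumn(connection, args[1])) {
                System.out.println(value);
            }
        }
    }
}
```

Because firstColumn depends only on java.sql.Connection, the same code would work against any other JDBC source, which is the kind of portability the JDBC driver is meant to provide.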

Cascading 2.5 also uncouples the Hadoop-specific code from the actual Cascading functionality. In version 2.0, this manifested as the ability to run Cascading code in memory, without Hadoop. In version 2.5, this capability means that Cascading code can run on Hadoop 1.x and Hadoop 2.x without modification.

Big Data Application Framework Gets Update, SQL Interface

Thor Olavsrud, CIO Magazine
November 19, 2013
http://www.cio.com/article/743423/Big_Data_Application_Framework_Gets_Update_SQL_Interface

Open source big data application platform specialist Concurrent has released a new version of the Cascading application framework and simultaneously released Cascading Lingual 1.0, an ANSI SQL interface for Hadoop.

Building on last month’s release of Apache Hadoop 2.2, big data application platform specialist Concurrent today released a new version of Cascading, its big data application framework.

Concurrent also announced the general availability of Cascading Lingual 1.0, an open source project that provides a comprehensive ANSI SQL interface.

Cascading is a stand-alone open source Java application framework designed as an alternative API to MapReduce. Cascading gives Java developers the capability to build big data applications on Hadoop using their existing skillset.

“I created Cascading in anger after having used MapReduce once in my life and vowing never to use it again,” says Chris Wensel, creator of Cascading and founder and CTO of Concurrent.

The latest release, Cascading 2.5, adds support for Hadoop 2.2, including the new YARN architecture introduced in that version of Hadoop. Apache Hadoop YARN (Yet Another Resource Negotiator) serves as the Hadoop operating system, taking what was a single-use data platform for batch processing and evolving it into a multi-use platform that enables batch, interactive, online and stream processing.

YARN acts as the primary resource manager and mediator of access to data stored in Hadoop Distributed File System (HDFS), giving enterprises the capability to store data in a single place and then interact with it in multiple ways, simultaneously, with consistent levels of service.

Enterprises can now use Cascading to leverage Java, legacy SQL and predictive modeling investments for a single big data processing application.

Migration Path to Hadoop 2

Gary Nakamura, CEO of Concurrent, says that Cascading doesn’t leverage YARN specifically, but does enable users to seamlessly migrate their applications to Hadoop 2 and take advantage of YARN. Domain specific languages (DSLs) like Scalding, Cascalog and PyCascading also seamlessly migrate to Hadoop 2. Similarly, Cascading will support Apache Tez when it takes its place in the Hadoop stack.

Concurrent has also added performance improvements for complex join operations and optimizations to dynamically partition and store processed data more efficiently on HDFS.

In addition to Cascading, Concurrent announced the immediate availability of Cascading Lingual 1.0, intended to help enterprises that have already invested heavily in business intelligence (BI) tools like Pentaho, Jaspersoft and Cognos—and the training to go with them—to quickly access their data on Hadoop. Lingual allows users to utilize their existing SQL skills and systems to create and run applications on Hadoop.

Concurrent’s Wensel says Lingual empowers just about anyone familiar with SQL to instantly work with data stored on Hadoop using their JDBC-compliant BI or desktop tool of choice.

“Cascading is an important component to the big data application development ecosystem, and Lingual is another step forward in making it significantly easier to build big data apps,” says Steve McPherson, group manager, Amazon Elastic MapReduce (EMR) at Amazon Web Services (AWS).

“Now, Amazon Elastic MapReduce customers can leverage Lingual to integrate disparate data stores on Amazon Web Services with services such as Amazon S3 and Amazon Redshift, and they can process the data and store it in Amazon EMR through one standard ANSI SQL statement,” McPherson says. “This makes it easier for customers to query data with their favorite BI tool.”

Concurrent, Inc. Accelerates Big Data Application Development in the Enterprise

Introducing Cascading 2.5, now with full support and compatibility for Hadoop 2, including YARN

SAN FRANCISCO – Nov. 19, 2013 – Concurrent, Inc., the enterprise Big Data application platform company, today introduces Cascading 2.5, an open source Big Data application framework for building robust, enterprise Big Data applications on Apache Hadoop™. The most widely used and deployed Big Data application platform now delivers support for Hadoop 2, including YARN, representing another powerful step toward delivering on the full promise of the business of data.

Enterprises today are rapidly adopting Hadoop and other computation engines to process, manage and make sense of growing volumes of both unstructured and semi-structured data. At the same time, the need to rapidly and reliably build enterprise-class applications without deep knowledge of these technologies is the greatest it has ever been. Cascading fulfills this critical need by allowing businesses to leverage their existing skillsets, investments and systems to build enterprise-class applications on Hadoop, thereby lowering the barrier to adoption. With the family of Cascading applications, enterprises can apply Java, legacy SQL and predictive modeling investments, and combine the respective outputs of multiple departments into a single data processing application.

New Cascading 2.5 features and benefits include:

  • Support for Hadoop 2 and its new features, including YARN. Cascading users looking to upgrade to Hadoop 2 will now be able to seamlessly migrate their applications and take advantage of new advanced features like YARN. Furthermore, Big Data applications using domain specific languages (DSLs), such as the widely used Scalding (Scala on Cascading), Cascalog (Clojure on Cascading) and PyCascading (Jython on Cascading) languages, will also seamlessly migrate to Hadoop 2.
  • Added performance improvements for complex join operations and optimizations to dynamically partition and store processed data more efficiently on HDFS.
  • Additional broad compatibility with other Hadoop vendors and Hadoop as a service providers, including Cloudera, Hortonworks, MapR, Intel, Altiscale, Qubole and Amazon EMR, among others, giving Cascading users a richer set of deployment options and services, whether on-premise or in the cloud.

Concurrent also announced today the general availability of Cascading Lingual, an open source project that provides a comprehensive ANSI SQL interface. The solution allows enterprises to quickly migrate legacy SQL workloads from data warehouses and mainframes to Hadoop by simply cutting and pasting their SQL and redeploying them on Hadoop – enabling a fast, simple and reliable way to add workloads onto Hadoop. Moreover, Cascading Lingual provides out-of-the-box support for JDBC. Enterprises that have invested millions of dollars in business intelligence (BI) tools, such as Pentaho, Jaspersoft and Cognos, and training can now also access their data on Hadoop through a standard SQL interface.

Supporting Quotes

“The combined power of Concurrent’s Cascading and Cloudera CDH, the world’s most widely deployed Apache Hadoop distribution, delivers users an enterprise-grade, flexible framework and API for building and managing Big Data applications.”
-Tim Stevens, Vice President, Corporate and Business Development, Cloudera

“As a leading provider of Hadoop-in-the-Cloud, Altiscale is focused on leveraging the potential of Hadoop 2. Our enterprise customers are very interested in the new capabilities in Hadoop 2, so it’s critical that they are easy to consume. Our partnership with Concurrent and integration with Cascading 2.5 allows our customers to quickly build mission-critical, easy-to-maintain Hadoop applications to run on our ‘Big Data Dialtone.’”
-Raymie Stata, CEO, Altiscale

“Within Hortonworks Data Platform 2.0, YARN enables enterprises to interact with all data in multiple ways simultaneously, making Hadoop a true multi-use data platform and allowing it to take its place in a modern data architecture. We are pleased to see Concurrent embrace these significant advances as they make Cascading 2.5 available.”
-John Kreisa, Vice President of Strategic Marketing, Hortonworks

“Despite its rapid enterprise adoption, developing and building applications on Hadoop has proven to be difficult. With Hadoop 2, the community has addressed many concerns, paving a clearer path for enterprise users. At Concurrent, we’re dedicated to forging a simpler path to mass Hadoop adoption by delivering a framework for building powerful and reliable data-oriented applications supporting data driven business models – quickly and easily. Our support for Hadoop 2 was an easy decision, as we continue to be an integral part of the Hadoop and Big Data ecosystem, providing solutions that simplify application development and management for the enterprise.”
-Chris Wensel, Founder and CTO, Concurrent, Inc.

Supporting Resources

Cascading website: http://cascading.org
Company: http://concurrentinc.com
Contact us: http://concurrentinc.com/contact
Follow us on Twitter: http://twitter.com/concurrent

Availability and Pricing

Cascading 2.5 will soon be publicly available and freely licensable under the Apache 2.0 License Agreement. To learn more about Cascading, visit http://cascading.org. Concurrent also offers standard and premium support subscriptions for enterprise use. To learn more about Concurrent’s offerings, please visit http://concurrentinc.com.

About Concurrent, Inc.

Concurrent, Inc. is the enterprise Big Data application platform company. Founded in 2008, Concurrent simplifies Big Data application development, deployment and management on Apache Hadoop. We are the company behind Cascading, the most widely used and deployed technology for building Big Data applications with more than 110,000 user downloads a month. Enterprises including Twitter, eBay, The Climate Corporation, Square and Etsy all rely on Concurrent’s technology to drive their Big Data deployments. Concurrent is headquartered in San Francisco. Visit Concurrent online at http://concurrentinc.com.


Media Contact
Danielle Salvato-Earl
Kulesa Faul for Concurrent, Inc.
(650) 922-7287
concurrent@kulesafaul.com

Concurrent Delivers All-Access Pass to BI Tools on Hadoop

Available now and built for the enterprise, Cascading Lingual enables users to query and export data from Hadoop directly into any traditional BI solution

SAN FRANCISCO – Nov. 19, 2013 – Concurrent, Inc., the enterprise Big Data application platform company, today announced the immediate availability of Cascading Lingual, an open source project that provides ANSI compatible SQL enabling fast and simple Big Data application development on Apache Hadoop™.  With Cascading Lingual, enterprises that have invested millions of dollars in business intelligence (BI) tools, such as Pentaho, Jaspersoft and Cognos, and training can now access their data on Hadoop in a matter of hours, rather than weeks. By leveraging the power and broad platform support of the Cascading application framework, Cascading Lingual lowers the barrier to enterprise application development on Hadoop.

Enterprises are rapidly adopting Hadoop to deal with growing volumes of both unstructured and semi-structured data. The need for Hadoop to easily integrate with existing data management systems, however, creates a real barrier to unlocking the full potential of Big Data and Hadoop. Cascading Lingual allows users to utilize existing SQL skills and systems to instantly create and run applications on Hadoop. As a result, data analysts, scientists and developers can now easily work with data stored on Hadoop using their favorite BI tool.

Cascading Lingual: Hadoop for Everyone Else

Cascading Lingual enables virtually anyone familiar with SQL to instantly work with data stored on Hadoop using their JDBC compliant BI or desktop tool of choice. Enterprises benefit as they can execute on Big Data strategies using existing in-house resources, skills sets and product investments. Cascading Lingual drives improved enterprise productivity, time-to-market benefits and the deployment of a sane and maintainable Big Data strategy.

Offering a true ANSI-standard SQL interface, Cascading Lingual is compatible with all major Hadoop distributions whether on-premise or in the cloud. This project has coverage of more than 7,000 SQL-99 statements derived from sophisticated industry standard OLAP tools, delivering the broadest SQL coverage for any tool in the Hadoop ecosystem. It makes Hadoop simple and accessible, and provides easy systems integration of multiple data stores into Hadoop using just one SQL statement. Cascading Lingual use-case examples include:

  • Data analysts, scientists and developers can now simply ‘cut and paste’ existing ANSI SQL code to instantly access data locked on or migrate applications to a Hadoop cluster.
  • Developers can use a standard Java JDBC interface to create new Hadoop applications, or use the Cascading APIs to build applications with a mix of SQL and custom Java, Scala or Clojure code.
  • Because Cascading Lingual is ANSI-standard compliant and supports the standard Java JDBC interface, companies can now query and export data from Hadoop directly into traditional BI tools.

Concurrent also announced today the release of Cascading 2.5, an open source Big Data application framework with full support and compatibility for Hadoop 2, including YARN. As enterprises upgrade to Hadoop 2 or move from one platform to another, Cascading eliminates the complexity associated with platform migration by acting as the abstraction layer between hardware platforms and applications.

Cascading is the most widely used and deployed application framework for building robust, enterprise Big Data applications on Hadoop. Companies including The Climate Corporation, eBay, Etsy, Square, Trulia, TeleNav and Twitter are using Cascading to streamline data processing, data filtering and workflow optimization for large volumes of unstructured and semi-structured data. Cascading is also at the core of popular language extensions including PyCascading (Python + Cascading), Scalding (Scala + Cascading) and Cascalog (Clojure + Cascading) – open source projects sponsored by Twitter. Cascading has become the most reliable and repeatable way of building and deploying Big Data applications.

Supporting Quotes

“Cascading is an important component to the Big Data application development ecosystem, and Lingual is another step forward in making it significantly easier to build Big Data apps. Now, Amazon Elastic MapReduce (EMR) customers can leverage Lingual to integrate disparate data stores on Amazon Web Services (AWS) with services such as Amazon S3 and Amazon Redshift, and they can process the data and store it in Amazon EMR through one standard ANSI SQL statement. This makes it easier for customers to query data with their favorite BI tool.”
-Steve McPherson, Group Manager, Amazon EMR at AWS

“Concurrent is committed to our mission of simplifying enterprise application development and deployment on Hadoop. Cascading Lingual abstracts the complexity of Hadoop into languages and interfaces that people already know, providing a pragmatic, yet powerful, tool for enterprises that are looking to make the most of their in-house talent. With Cascading Lingual, companies can now make the most of Hadoop without the costly investment in additional BI tools and training.”
-Gary Nakamura, Chief Executive Officer, Concurrent, Inc.

Supporting Resources

Cascading Lingual website: http://cascading.org/lingual
Cascading website: http://cascading.org
Company: http://concurrentinc.com
Contact us: http://concurrentinc.com/contact
Follow us on Twitter: http://twitter.com/concurrent

Availability and Pricing

Cascading Lingual is now publicly available and freely licensable under the Apache 2.0 License Agreement. To learn more about the Cascading Lingual project, visit http://cascading.org/lingual. Concurrent also offers standard and premium support subscriptions for enterprise use. To learn more about Concurrent’s offerings, please visit http://concurrentinc.com.

About Concurrent, Inc.

Concurrent, Inc. is the enterprise Big Data application platform company. Founded in 2008, Concurrent simplifies Big Data application development, deployment and management on Apache Hadoop. We are the company behind Cascading, the most widely used and deployed technology for building Big Data applications with more than 110,000 user downloads a month. Enterprises including Twitter, eBay, The Climate Corporation, Square and Etsy all rely on Concurrent’s technology to drive their Big Data deployments. Concurrent is headquartered in San Francisco. Visit Concurrent online at http://concurrentinc.com.


Media Contact
Danielle Salvato-Earl
Kulesa Faul for Concurrent, Inc.
(650) 340 1982
concurrent@kulesafaul.com

Dealing with Big Data

By Amber E. Watson
November 15, 2013
http://www.softwaremag.com/content/ContentCT.asp?P=3457

Collecting, managing, processing, and analyzing big data is critical to business growth.

Big data holds the key to the critical information an organization needs to attract and retain customers, grow revenue, cut costs, and transform business.

According to SAS, big data describes the exponential growth, availability, and use of structured and unstructured information. Big data is growing at a rapid rate, and it comes from sources as diverse as sensors, social media posts, mobile users, advertising, cybersecurity, and medical images.

Organizations must determine the best way to collect massive amounts of data from various sources, but the challenge lies in how that data is managed and analyzed for the best return on investment (ROI). The ultimate goal for organizations is to harness relevant data and use it to make the best decisions for growth.

Technologies today not only support the collection and storage of large amounts of data, but they also provide the ability to understand and take advantage of its full value.

Challenges of Collecting Big Data
A common description of big data involves “the four V’s,” including “the growing volume, which is the scale of the data; velocity—how fast the data is moving; variety—the many sources of the data; and veracity—how accurate and truthful is the data,” explains James Kobielus, big data evangelist, IBM.

Organizations strive to take advantage of big data to better understand the customer experience and behavior, and to tailor content to maximize opportunities.

Steve Wooledge, VP of marketing, Teradata Unified Data Architecture, notes that gaining insights from big, diverse data typically involves four steps: data acquisition, data preparation, analysis, and visualization.

“One of the key challenges in collecting big data is the ability to capture it without too much up-front modeling and preparation,” he says. “This becomes especially important when data comes in at a high velocity and variety due to different types of data—for example, Web logs generated from visitor activity on high-traffic ecommerce Web sites, or text data from social media sites.”

Separate, disconnected silos of data also increase the time and effort required for data analysis, which is why integrated and unified interfaces are preferred.

Webster Mudge, senior director of technology solutions, Cloudera, agrees that the variety of big data presents a particular issue because a disproportionate amount of data growth experienced by organizations is with unstructured or variable structured information, such as sensor and social media content.

“It is difficult for business and IT teams to fully capture these forms of data in traditional systems because they do not typically fit cleanly and efficiently with traditional modeling approaches,” he points out. “Thus, organizations have trouble extracting value from these forms due to model constraints and impedance.”

With a large volume of incoming data, transferring it reliably from sources to destinations over a wide-area network is also challenging. “Once data arrives at the destination, it is often logged or backed up for recoverability. The next step is ingesting large volumes of data in the data warehouse. Because loading tends to be continuous, the biggest challenge is ingesting and concurrently allowing reporting/analytics against the data,” states Jay Desai, co-founder, XtremeData Inc.

Mike Dickey, founder/CEO, Cloudmeter, explains that traditionally, network data was captured with intrusive solutions that modified the source code of Web applications (apps). “Companies either add extra code that runs within Web browsers, known as ‘page tags,’ or modify the server code to generate extra log files,” he notes.

“But both approaches restrict companies to capture only a subset of the information available because the excess code introduces significant latency, harming performance, and degrading customer experiences. It is also a nightmare on the back end for IT to stay on top of dynamic, ever-changing sites.”

To get the best use of one’s data, George Lumpkin, VP of product management, big data and data warehousing, Oracle, advises an organization to first understand what data it needs and how to capture it.

“In some cases, an organization must investigate new technologies like Apache Hadoop or NoSQL databases. In other cases, it may be required to add a new kind of cluster to run the software, increase networking capacity, or increase speed of analytics. If the use case involves real-time responses on streaming data—as opposed to more batch-oriented use—there are tools around event processing, caching, and decision automation,” he says.

These considerations, of course, must be integrated with the existing information infrastructure. “Big data should not exist in a vacuum,” says Lumpkin. “It is the combination of new data and existing enterprise data that yields the best new insight and business value.”

With new technology comes the need for new skills. “There is an assumption that since Hadoop is open source and available for free, somebody should just download it and get started. While the early adopters of Hadoop did take this approach, they have significantly developed their skill sets over time to install, tune, use, update, and support the software internally, efficiently, and effectively,” shares Lumpkin.

The key to realizing value with big data is through analytics, so organizations are advised to invest in new data science skills, adopting and growing statistical analysis capabilities.

Big Data Management
When it comes time to analyze big data, the analytics are only as good as the data. It is important to think carefully about how data is collected and captured ahead of time. Sometimes businesses are left with information that doesn’t correlate, and are unable to pull the analytics needed from the information gathered.

Data preparation and analysis is a major undertaking. “Data acquisition and preparation can itself take up 80 percent of the effort spent on big data if done using suboptimal approaches and manual work,” cautions Teradata’s Wooledge.

“One key challenge of managing data is preparing different types of data for analysis, especially for multistructured data. For example, Web logs must first be parsed and sessionized to identify the actions customers performed and the discrete sessions during which customers interacted with the Web site. Given the volume, variety, and velocity of big data, it is impossible to prepare this data manually,” he adds.
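As a rough illustration of the parsing and sessionizing step Wooledge describes, the sketch below groups Web log hits into per-visitor sessions. The log layout (timestamp, visitor ID, path), the 30-minute inactivity cutoff, and the function names are assumptions made for this example; they do not come from Teradata or any specific product.

```python
from datetime import datetime, timedelta

# Assumed log layout for illustration: "<ISO timestamp> <visitor id> <path>"
SESSION_GAP = timedelta(minutes=30)  # assumed inactivity cutoff, not from the article


def parse_line(line):
    """Split one log line into (timestamp, visitor, path)."""
    stamp, visitor, path = line.split()
    return datetime.fromisoformat(stamp), visitor, path


def sessionize(lines, gap=SESSION_GAP):
    """Group each visitor's hits into sessions separated by more than `gap`."""
    sessions = {}   # visitor -> list of sessions, each a list of paths
    last_seen = {}  # visitor -> timestamp of that visitor's previous hit
    # Sort by timestamp so each visitor's hits are processed in time order.
    for when, visitor, path in sorted(parse_line(l) for l in lines):
        previous = last_seen.get(visitor)
        if previous is None or when - previous > gap:
            sessions.setdefault(visitor, []).append([])  # start a new session
        sessions[visitor][-1].append(path)
        last_seen[visitor] = when
    return sessions


sample_log = [
    "2013-11-01T10:00:00 alice /home",
    "2013-11-01T10:05:00 alice /cart",
    "2013-11-01T10:02:00 bob /home",
    "2013-11-01T11:00:00 alice /home",
]
result = sessionize(sample_log)
```

With the sample above, the visitor "alice" ends up with two sessions because her last hit arrives 55 minutes after the previous one. At real big data volumes the same grouping would run on a distributed platform rather than in a single in-memory loop.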

To solve a business problem effectively, Wooledge advises organizations to use an iterative discovery process and to combine different analytic techniques. “Discovery is an interactive process that requires rapid processing of large data volumes in a scalable manner. Once a user changes one or more parameters of the analysis and incorporates additional data using a discovery platform, they are able to see results in minutes versus waiting hours or days before they are able to do the next iteration.”

In addition, many big data problems require more than one analytic technique to be applied at the same time to produce the right insight to solve the problem most effectively. The ability to easily use different analytic techniques on data collected during the discovery process amplifies the effectiveness of the effort.

Cloudera’s Mudge shares that traditional approaches to modeling and transformation assume that a business understands what questions need to be answered by the data. However, if the value of the data is unknown, organizations need latitude within their modeling processes to easily change the questions in order to find the answers.

“Traditional approaches are often rigid in model definition and modification, so changes may take significant effort and time. Moreover, traditional approaches are often confined to a single computing framework, such as SQL, to discover and ask questions of the data. And in the era of big data, organizations require options for how they compute and interpret data and cannot be limited to one approach,” he notes.

After collecting data, an organization is sitting on a large quantity of new data that is unfamiliar in content and format. “For example, a company may have a large volume of social media comments coming in at high velocity; some of those comments are made by important customers. The question becomes how to disambiguate this data or match the incoming data with existing enterprise data. Of all the data collected, what is potentially relevant and useful?” asks Oracle’s Lumpkin.

Next, the company must try to understand how the new data interrelates to determine hidden relationships and help transform business. “A combination of social media comments, geographic location, past response to offers, buying history, and several other factors may predict what it needs. The team just has to figure out this winning combination,” shares Lumpkin.

Data governance and security are also important considerations. Does a company have the right to use the data it captures? Is the data sensitive or private? Does it have a suitable security policy to control and audit access?

Harnessing Big Data
To harness diverse data, outsmart the competition, and achieve the highest ROI, Wooledge advises organizations to include all types of data across customer interactions and transactions.

“Traditional approaches of buying hardware/software do not often work for big data because of scale, complexities, and costs,” adds XtremeData’s Desai. “Cloud-based, big data solutions help companies avoid up-front commitments and have infinite on-demand capacity.”

Paco Nathan, director of data science, Concurrent, Inc., adds, “Companies today recognize the importance of collecting data, aggregating into the cloud, and sharing with corporate IT for centralized analysis and decisions. The ecommerce sector, including Amazon, Apple, and Google, for example, has fared well with big data in terms of marketing funnel analytics, online advertising, customer recommendation systems, and anti-fraud classifiers.”

Another facet to consider is productivity since some of the biggest costs associated with big data are labor and lost opportunity. To mitigate this, Desai recommends that companies look for “load-and-go” solutions/services, meaning they do not require heavy engineering or optimization and can be deployed quickly for business use.

According to IBM’s Kobielus, a differentiated big data analytics approach is needed to keep up with the ever-changing requirements of a business. “Alternative, one-size-fits-all approaches of applying the same analytics to different challenges are flawed in that they do not address the fact that on any given day, a business may need to quickly understand consumer purchases, more effectively manage large financial data sets, and more seamlessly detect fraud in real time,” he says.

A comprehensive, big data platform strategy ensures reliable access, storage, and analysis of data regardless of how fast it is moving, what type it is, or where it resides.

Mudge recommends that data maintain its original, fine-grain format. “By storing data in full fidelity, future interpretation of the data is not constrained by earlier questions and decisions, and associated modeling and transformation efforts. By keeping data in its native format, organizations are free to repeatedly change its interpretation and examination of the original data according to the demands of the business,” he explains.
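The idea of keeping data in its native format and changing the interpretation later can be sketched in a few lines. The event fields and function names below are hypothetical and are not tied to Cloudera's products; the point is only that two different questions are answered from the same untouched raw records.

```python
import json

# Raw events kept exactly as received (field names are hypothetical).
raw_events = [
    '{"user": "alice", "action": "view", "item": "tent", "price": 120.0}',
    '{"user": "bob", "action": "buy", "item": "tent", "price": 120.0}',
    '{"user": "alice", "action": "buy", "item": "stove", "price": 45.5}',
]


def revenue_by_item(lines):
    """First question asked of the data: revenue per purchased item."""
    totals = {}
    for event in map(json.loads, lines):
        if event["action"] == "buy":
            totals[event["item"]] = totals.get(event["item"], 0.0) + event["price"]
    return totals


def actions_by_user(lines):
    """A later question, answered from the same raw records."""
    counts = {}
    for event in map(json.loads, lines):
        counts[event["user"]] = counts.get(event["user"], 0) + 1
    return counts


revenue = revenue_by_item(raw_events)
activity = actions_by_user(raw_events)
```

Had only the per-item revenue totals been stored, the second question about user activity could not be answered without collecting the data again.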

Cloudmeter’s Dickey points out the importance of having an analytics solution that processes and feeds data in real time. “Since today’s online business operates in real time, it is critical to be able to respond in a timely manner to trends and events that may impact an online business and the customer experience,” he says.

As Kobielus points out, petabytes of fast-moving, accurate data from a variety of sources have little value unless they provide actionable information. It is not just the fact that there is big data; it is what you do with it that counts.

Products to Help
Several key technologies are available to help organizations get a handle on big data and to extract meaningful value from it.

Cloudera – Cloudera’s Distribution Including Apache Hadoop (CDH) is a 100 percent open-source platform for big data for enterprises and organizations. It offers scalable storage, batch processing, interactive SQL, and interactive search, along with other enterprise-grade features such as continuous availability of data. CDH is a collection of Apache projects; Cloudera develops, maintains, and certifies this configuration of open-source projects to provide stability, coordination, ease of use, and support for Hadoop.

The company offers additional products for CDH, including Cloudera Standard—which combines CDH with Cloudera Manager for cluster management capabilities like automated deployment, centralized administration and monitoring, and diagnostic tools—and Cloudera Enterprise, a comprehensive maintenance and support subscription offering.

Cloudera also offers a number of computing frameworks and system tools to accelerate and govern work on Hadoop. Cloudera Navigator provides a centralized data management application for data auditing and access management, and extends governance, risk, and compliance policies to data in Hadoop. Cloudera Impala is a low-latency, SQL query engine that runs natively in Hadoop and operates on shared, open-standard formats. Cloudera Search, which is powered by Apache Solr, is a natural language, full-text, interactive search engine for Hadoop and comes with several scalable indexing options.

Cloudmeter – Cloudmeter captures and processes data available in companies’ network traffic to provide marketing and IT with complete insight into their users’ experiences and behavior. Cloudmeter works with companies whose Web presence is critical to their business, including Netflix, SAP, Saks Fifth Avenue, and Skinit.

Cloudmeter Stream and Cloudmeter Insight use ultralight agents that passively mine big data streams generated by Web apps, enabling customers to gain real-time access into the wealth of business and IT information available without connecting to a physical network infrastructure.

Now generally available, Cloudmeter Stream can be used with other big data tools, such as Splunk, GoodData, and Hadoop. Currently in private beta, Cloudmeter Insight is the first software-as-a-service application performance management product to include visual session replay capabilities so users can see the individual visitor sessions and the technical information behind them.

Concurrent, Inc. – Concurrent, Inc. is a big data company that builds application infrastructure products designed to help enterprises create, deploy, run, and manage data processing applications at scale on Hadoop.

Cascading is used by companies like Twitter, eBay, The Climate Corporation, and Etsy to streamline data processing, data filtering, and workflow optimization for large volumes of unstructured and semi-structured data. By leveraging the Cascading framework, enterprises can apply Java, SQL, and predictive modeling investments, and combine the respective outputs of multiple departments into a single application on Hadoop.

Most recently, Concurrent announced Pattern—a standards-based scoring engine that enables analysts and data scientists to quickly deploy machine-learning applications on Hadoop—and Lingual, a project that enables fast and simple big data application development on Hadoop. All tools are open source.

IBM – IBM’s big data portfolio of products includes InfoSphere BigInsights—an enterprise-ready, Hadoop-based solution for managing and analyzing large volumes of structured and unstructured data.

According to IBM, InfoSphere Streams enables continuous analysis of massive volumes of streaming data with submillisecond response times. InfoSphere Data Explorer is discovery and navigation software that provides real-time access and fusion of big data with rich and varied data from enterprise applications for greater insight and ROI. IBM PureData for Analytics simplifies and optimizes performance of data services for analytic applications, enabling complex algorithms to run in minutes.

Finally, DB2 with BLU Acceleration provides fast analytics to help organizations understand and tackle large amounts of big data. BLU provides in-memory technologies, parallel vector processing, actionable compression, and data skipping, which speeds up analytics for the business user.

LexisNexis – High Performance Computing Cluster (HPCC) Systems from LexisNexis is an open-source big data processing platform designed to solve big data problems for the enterprise. HPCC Systems provides HPCC technology with a single architecture and a consistent data-centric programming language. HPCC Systems offers a Community Edition, which includes free platform software with community support, and an Enterprise Edition, which includes platform software with enterprise-class support.

Oracle – Oracle’s big data solutions are grouped into three main categories—big data analytics, data management, and infrastructure. Big data analytics are covered by Oracle Endeca Information Discovery—a data discovery platform that enables visualization and exploration of information, and uncovers hidden relationships between data, whether it be new, unstructured, or trusted data kept in data warehouses.

Oracle Database, including Oracle Database 12c, contains a number of in-database analytics options. Oracle Business Intelligence Foundation Suite provides comprehensive capabilities for business intelligence, including enterprise reporting, dashboards, ad hoc analysis, multidimensional online analytical processing, score cards, and predictive analytics on an integrated platform. Oracle Real-Time Decisions automates decision management and includes business rules and self-learning to improve offer acceptance over time.

Oracle’s data management tools are used to acquire and organize big data across heterogeneous systems. Oracle NoSQL Database is a scalable key/value database, providing atomicity, consistency, isolation, and durability (ACID) transactions and predictable latency. MySQL and MySQL Cluster are widely used for large-scale Web applications. Oracle Database is a foundation for successful online transaction processing and data warehouse implementations. Oracle Big Data Connectors link Oracle Database and Hadoop. Oracle Data Integrator is used to design and implement extract, transform, and load processes that move and integrate new data and existing enterprise data. Oracle distributes and supports Cloudera’s Distribution Including Apache Hadoop.

Lastly, for big data infrastructure, Oracle says its Engineered Systems ship pre-integrated to reduce the cost and complexity of IT infrastructures while increasing the productivity and performance of a data center to better manage big data. Oracle Big Data Appliance is a platform for Hadoop to acquire and organize big data.

SAS – SAS offers information management, which provides strategies and solutions that enable big data to be managed and used effectively through high-performance and visual analytics, and flexible deployment options.

Flexible deployment models bring choice. High-performance analytics from SAS are able to analyze billions of variables, and these solutions can be deployed in the cloud with SAS or another provider, on a dedicated high-performance analytics appliance, or within existing IT infrastructure—whichever best suits an organization’s requirements.

Teradata – Within the Unified Data Architecture, Teradata integrates data and technology to maximize the value of all data, providing new insights to thousands of users and applications across the enterprise.

According to Teradata, its Aster Discovery Platform, which includes the Teradata Aster Database and Teradata Aster Discovery Portfolio, offers a single unified platform that enables organizations to access, join, and analyze multistructured data from a variety of sources from a single SQL interface. The Visual SQL-MapReduce functions and out-of-the-box functionality for integrated data acquisition, data preparation, analysis, and visualization in a single SQL statement generate business insights from big, diverse data.

The recently announced Teradata Portfolio for Hadoop provides open, flexible, and comprehensive options to deploy and manage Hadoop.

XtremeData – XtremeData is a petabyte-scale SQL data warehouse for big data analytics on private and public clouds. XtremeData’s “load-and-go” structure means that users implement schema, load data, and start running queries using industry standard SQL tools. XtremeData allows users to freely run queries against multiple large tables and continuously add new data sets without the need for performance engineering.

XtremeData is designed for big data applications that need continuous, real-time ingests and interactive analytics with high availability, such as digital/mobile advertising, gaming, social, Web, networking, telecom, and cybersecurity.

More Data, More Success
While the sheer volume and variety of big data available today present a challenge for organizations to capture, manage, and analyze, they also hold the key to business growth and success. “Leveraging and analyzing big data enables organizations and industries as a whole to generate powerful, disruptive, even radical transformations,” explains IBM’s Kobielus.

Organizations with access to large data collections strive to harness the most relevant data and use it for optimized decision-making. Big data technologies not only support the ability to collect large amounts of data, but they also provide the ability to understand it and take advantage of its value for better ROI. The more information an organization is able to collect and analyze successfully, the more it has the ability to change, adapt, and succeed long term.

Nov 2013, Software Magazine

MeetUp | Cascading Office Hours in San Francisco – Nov 13, 2013

Sign up here: http://meetu.ps/22ZnsY

When:
Wed, Nov 13 6:00 PM (PT)

Where:
Mikkeller Bar
34 Mason Street
San Francisco, CA 94102

Come and greet the Cascading team on Wednesday, Nov 13 in San Francisco!

Cascading gurus Chris Wensel and Alexis Roos will be hosting informal office hours and will be there to answer your questions about Cascading and the related projects you’re working on. Whether you’re a veteran of Cascading and its extensions or simply new to the Cascading framework, this is the right place to get your questions answered by experts!

BusinessInterviews.com Q&A with Gary Nakamura, CEO, Concurrent, Inc.

By BusinessInterviews.com
October 10, 2013
http://www.businessinterviews.com/concurrent-gary-nakamura

Concurrent, Inc.’s vision is to become the #1 software platform choice for Big Data applications. Concurrent builds application infrastructure products that are designed to help enterprises create, deploy, run and manage data processing applications on Apache Hadoop.

Concurrent is the mind behind Cascading™, the most widely used and deployed technology for Big Data applications with more than 90,000 user downloads a month. Used by thousands of data-driven businesses including Twitter, eBay, The Climate Corp, and Etsy, Cascading is the de facto standard in open source application infrastructure technology.

What issue does your core product help solve and how so?

Cascading is all about making it easy for non-experts to adopt Hadoop for Big Data application development. When it comes to developing applications on Hadoop, developers program in MapReduce or scripting query languages like Pig and Hive. However, there is a growing realization that MapReduce is difficult to use, and that Pig and Hive fall short of enterprise needs and of enabling anything more than simple ad hoc analysis.

This is where Cascading comes into play, as it offers the first viable alternative to MapReduce for processing huge data sets on Hadoop. Cascading enables developers, data analysts and data scientists to quickly and easily build robust data processing and data management applications on Hadoop that can be deployed in the cloud or within private data centers. Cascading simplifies programming, workflow and data processing, allowing many successful, data-driven companies to gain significant productivity benefits and making it significantly easier to execute their Big Data strategy with existing skill sets, systems and tools – all while reducing the overall complexity of their technology infrastructure.

What has been your biggest challenge as a business owner and how have you met that challenge?

Our biggest challenge to date has been investing in and perfecting software that we give away for free, allowing tens of thousands of developers to adopt and thousands of companies to benefit and drive their big data strategies.

Going forward, it will be about building a business and continuing to deliver useful and valuable software to market.

What’s the most exciting thing on the horizon for you/your Company?

We secured $4 million in funding earlier this year, and are now working to release our first commercial product. The enterprise product is currently in alpha, and we are working to ship a beta version for additional users to evaluate. It is not a repackaging of our open source tools, but will be a new software offering that adds capabilities aimed at enterprises that need added visibility into and control over their big data applications.

Where do you envision your company in 5 years, 10 years, 20 years?

In 5 years, we want Concurrent to be the source for Big Data application infrastructure. We are well on our way to delivering the most comprehensive set of tools and runtime services aimed at enterprise needs. In 10 years, we want Concurrent to develop the same clout as Oracle, without the same overhead. As for 20 years from now… it’s too tough to say. We will stay focused on delivering great software to market, whether as open source software or commercially; staying true to that will carry us forward. We look forward to furthering our business with the release of Concurrent’s first commercial product.

Have you had any mentors or role models that have influenced you? Describe the impact.

There are two quotes that have influenced my focus, planning, execution and consistent work ethic:

Thomas Jefferson – The harder I work, the luckier I get.
Branch Rickey – Luck is the residue of design.

The combination of the two has helped me immensely.