What to Expect from Big Data in 2014

Nick Kolakowski, Slashdot
December 30, 2013
http://slashdot.org/topic/bi/what-to-expect-from-big-data-in-2014/

Hadoop will continue to reign; more clients will demand the latest in encryption; and a lot of analytics firms could fail.

As 2013 stumbles across the finish line, technology publications are busy issuing their inevitable Year’s End lists, in which the Best and Worst of everything is carefully delineated (BlackBerry’s failing! Apple continued to produce shiny things! Ballmer’s leaving Microsoft!). And that’s all well and good, but Big Data is all about the future—and with that in mind, here are some mild predictions for 2014:

Hadoop’s Reign Continues

Apache Hadoop, the open-source framework for running data applications on large hardware clusters, enjoyed considerable momentum in 2013: not only did Hadoop 2.2.0 reach its general-release milestone (improvements included more integration with other open-source projects, as well as YARN, a general-purpose resource management system that makes it easier to work with MapReduce and other frameworks), but a variety of prominent firms released their own distributions of the platform—all but ensuring Hadoop will rumble strong through 2014.

But continuing popularity or no, not every company will use Hadoop effectively. “More Hadoop projects will be swept under the rug as businesses devote major resources to their Big Data projects before doing their due diligence, which results in a costly, disillusioning project failure,” Gary Nakamura, CEO of Concurrent, wrote in a December blog posting. “We may not hear about most of the failures, of course, but the successes will clearly demonstrate the importance of using the right tools.”

For those companies that can accurately assess whether Hadoop is the right tool for their business needs, the improvements to the platform could translate into more effective data-crunching—and better insights. But Hadoop’s continued prominence raises still another issue.

Open-Source vs. Proprietary

Open-source has long been a double-edged sword for firms that build analytics software. Back in Ye Olden Days of mid-2012, for example, research firm IDC suggested that the availability of open-source solutions could very well hamper the ability of proprietary software platforms to fully capitalize on the analytics space. “The Hadoop and MapReduce market will likely develop along the lines established by the development of the Linux ecosystem,” Dan Vesset, vice president of Business Analytics Solutions for IDC, wrote in a statement at the time. “Over the next decade, much of the revenue will be accrued by hardware, applications, and application development and deployment software vendors.”

Open-source’s strength in the analytics space hasn’t dissuaded a broad cross-section of tech giants and startups from building and releasing proprietary software into the ecosystem. But those companies’ solutions will need to push back against both open-source and rival platforms in order to scratch out any sort of market share and profit; a good number of them will fail, including several companies that were teetering on the brink in 2013.

Data Encryption

Over the summer, government-contractor-turned-whistleblower Edward Snowden revealed the existence of massive National Security Agency (NSA) programs designed to harvest vast amounts of online data. Those revelations, in turn, have sparked everything from legal battles to promises of government review.

Whether or not the NSA actually rolls back its programs, the newfound awareness of widespread online surveillance will drive more companies to encrypt all their data—and not only in storage, but also in movement from Point A to B. That will spark a rise in costs (encryption is expensive, after all), and perhaps slow some everyday processes down (as encryption adds another layer to most operations)—but with more and more clients demanding the latest in privacy and security, vendors may have no choice but to armor their infrastructure as extensively as possible.

Ease of Use…

Data analytics tools: they’re not just for data scientists anymore. Over the next year, the trend of making those tools easier to use for people with relatively little analytics knowledge will continue, with simplified dashboards and greater integration of multiple types of data.

…But Issues Remain

Facebook and Netflix wouldn’t be operational without custom data infrastructure painstakingly developed by their in-house data scientists; IBM is using its Watson supercomputing platform to assist in everything from retail to healthcare IT; dozens of firms use analytics to improve everything from shipping and logistics to customer service.

But for many firms, the jury’s still out about the ultimate effectiveness of analytics. Back in August, The New York Times asked if Big Data was “an economic big dud,” because U.S. economic productivity had slowed over the past few years despite a massive growth in the amount of data held by individuals and corporations. That might have been overstating things—many analytics tools have only begun to penetrate the market in a meaningful way, suggesting that only a small percentage of firms are leveraging their datasets to fullest effect—but it’s undeniable that, when it comes to various industries, analytics still has something to prove. 2014 might be the year that Big Data actually makes more of a name for itself.

Cascading Enterprise Developer Training

Due to popular demand, we’re happy to announce the availability of our Cascading Enterprise Developer Training course. Whether you are just getting started with Cascading or want to take your Cascading chops to the next level, this course was designed for you.

This comprehensive course covers topics ranging from basic to advanced Cascading concepts and best practices, all reinforced by hands-on labs.

Learn more at: http://www.cascading.io/services/training

Concurrent, Inc. Momentum Continues, Driving Innovation Through Data

SAN FRANCISCO – Jan. 8, 2014 – Concurrent, Inc., the enterprise Big Data application platform company, finished a stellar 2013 with unprecedented growth and innovative solutions for the enterprise. The company continues to pave the way for enterprise Big Data application development on Apache Hadoop™ with rapid adoption of its Cascading application platform, along with the introduction of open source projects Cascading Lingual 1.0, Cascading Pattern and Cascading 2.5. Additional yearly milestones include $4 million in Series A funding, continued expansion of the company’s leadership team and new strategic partnerships, as well as overwhelming successes seen by Cascading users.

Cascading has become the platform of choice for tackling enterprise Big Data processing needs in an effort to connect growing volumes of data stored with business strategy. With rapid and increasing demand for businesses to derive more from data, Cascading provides a practical and reliable platform for enterprises to leverage existing skillsets, software investments and systems to build enterprise-class applications on Hadoop, thereby lowering the barrier to adoption and facilitating execution of their data strategies. With Cascading, enterprises can apply Java, SQL and predictive modeling investments, and combine the respective outputs of multiple departments into a single enterprise data application.

Corporate, Project and Partnership Milestones

  • Secured $4 million in Series A funding in March 2013, following a $900,000 seed investment in August 2011.
  • Expanded leadership team with the appointments of Gary Nakamura as CEO and Paul O’Leary as vice president of engineering.
  • Launched Cascading 2.1 and Cascading 2.5, the most widely used and deployed application platform for building robust, enterprise Big Data applications on Hadoop, with full support and compatibility for Hadoop 2, including YARN.
  • Launched Cascading Lingual 1.0, a free open source project that delivers ANSI-SQL technology to easily build new and integrate existing applications onto Hadoop while using any traditional BI solution.
  • Launched Cascading Pattern, a free, open source, standard-based scoring engine that enables analysts and data scientists to quickly deploy machine-learning applications via PMML or an API.
  • Increased Cascading user downloads from 50,000 to more than 130,000 per month.
  • Empowered overwhelming user successes, including Twitter’s IPO and the $930 million acquisition of the Climate Corporation by Monsanto.
  • Added new strategic partnerships with technology leaders including Hortonworks, MapR, Think Big Analytics and Qubole.
  • Released the hands-on introductory book ‘Enterprise Data Workflows with Cascading,’ published by O’Reilly Media.

Supporting Quote

“The past year was a great one for Concurrent, marked by several significant project releases, partnership announcements as well as overwhelming enterprise adoption. As more and more enterprises realize that they are in the business of data, only Cascading offers the complete toolbox for quick and easy Big Data application development on Hadoop. We look forward to seeing more success in 2014 as Cascading continues its momentum in the market.”
– Gary Nakamura, CEO, Concurrent, Inc.

Supporting Resources

Concurrent Announces 2014 Big Data Predictions

Curious to know what’s in store for Big Data in 2014? Check out Concurrent’s top 2014 industry predictions at: http://bit.ly/1aZorQh

About Concurrent, Inc.

Concurrent, Inc. is the enterprise Big Data application platform company. Concurrent simplifies Big Data application development, deployment and management on Apache Hadoop. We are the company behind Cascading, the most widely used and deployed technology for building Big Data applications with more than 130,000 user downloads a month. Enterprises including Twitter, eBay, The Climate Corporation, Square and Etsy rely on Concurrent’s technology to drive their Big Data deployments. Concurrent is headquartered in San Francisco. Visit Concurrent online at http://concurrentinc.com.


Media Contact
Danielle Salvato-Earl
Kulesa Faul for Concurrent, Inc.
(650) 922-7287
concurrent@kulesafaul.com

Big Data In 2014: 6 Bold Predictions

Jeff Bertolucci, Information Week
December 23, 2013
http://www.informationweek.com/big-data/big-data-analytics/big-data-in-2014-6-bold-predictions/d/d-id/1113091

‘Tis the season when temperatures tumble, shoppers stumble, and prognosticators fumble, often. Will these big data prophecies come true?

How will big data evolve in 2014? The future is anyone’s guess, of course, but we thought we’d compile a tasty holiday assortment of prognostications from executives working in the big data trenches. So without further delay, here they are — six big data predictions for next year:

1. “More Hadoop projects will fail than succeed.”
That scary assessment is from Gary Nakamura, CEO of Concurrent, a big data application platform company. In a December 12 blog post, Nakamura made a few 2014 forecasts, including this not-so-rosy assessment of Hadoop:

“More Hadoop projects will be swept under the rug as businesses devote major resources to their big data projects before doing their due diligence, which results in a costly, disillusioning project failure. We may not hear about most of the failures, of course, but the successes will clearly demonstrate the importance of using the right tools. The right big data toolkit will enable organizations to easily carry forward the success of these projects, as well as further insert their value into their business processes for market advantage.”

2. Enterprises will focus less on big data and more on stepping up their data management game. 
“There’s no doubt that companies’ pursuits of big data initiatives have the best intentions to improve operational decision making across the enterprise. That being said, companies shouldn’t get stuck on the term ‘big data.’ The true initiative and what they ultimately need to be concerned with is how they’re implementing better data management practices that account for the variety and complexity of the data being acquired for analysis,” Scott Schlesinger, a senior vice president for consulting and outsourcing giant Capgemini, told InformationWeek via email.

3. The pace of big data innovation in the open-source community will accelerate in 2014. 
“New open-source projects like Hadoop 2.0 and YARN, as the next-generation Hadoop resource manager, will make the Hadoop infrastructure more interactive. New open-source projects like STORM, a streaming communications protocol, will enable more real-time, on-demand blending of information in the big data ecosystem,” wrote Quentin Gallivan, CEO of business analytics software firm Pentaho, in a December 5 blog post.

4. The need for automated tools will become increasingly critical.
“It seems that the more data we have, the more we want,” John Joseph, VP of product marketing for analytics software firm Lavastorm Analytics told InformationWeek via email. “But as data volumes increase, the need for pattern matching, simulation, and predictive analytics technologies become more crucial. Engines that can automatically sift through the growing mass of data, identify issues or opportunities, and even take automated action to capitalize on those findings will be a necessity.”

5. Beware, Oracle! 2014 will be the year of SQL on Hadoop. 
“I think you’ll see people start building interactive applications on the Hadoop infrastructure. And what I mean by that — and I think this is probably the most controversial thing — is that people will start replacing their first-generation relational databases with SQL on Hadoop,” said Monte Zweben, CEO of SQL-on-Hadoop database startup Splice Machine, in a phone interview with InformationWeek.

6. Big data flies to the cloud. 
“Big data has gained a lot of traction in 2013 but complex technologies are keeping many businesses from getting their solutions into production and generating a positive ROI. In 2014, businesses will look beyond the hype and turn to cloud solutions that generate fast time to value and do not require highly specialized dedicated skill sets, like Hadoop, to manage. 2014 will be the year that big data moves from buzzword to business imperative,” said Sandy Steier, cofounder and CEO of cloud-based analytics firm 1010data, via email.

2014 Big Data Predictions Include Bigger Data, Bigger Funding, Bigger Failures

Concurrent, Inc. CEO Predicts 2014 Big Data Trends – More Funding and IPOs, Hadoop is Still Everywhere, But It’s All About the Apps

SAN FRANCISCO – Dec. 12, 2013 – Concurrent, Inc., the enterprise Big Data application platform company, today released its top 2014 predictions around Big Data. As the previous year witnessed major market milestones such as the release of Hadoop 2, expansion of the Hadoop ecosystem and continued investment in Big Data initiatives by global enterprises, 2014 promises to be another momentous year for Big Data.

“Today every company is in the business of data, and this will drive the demand for cost effective and scalable Big Data platforms higher than ever before,” said Gary Nakamura, CEO of Concurrent, Inc. “As the market starts to catch up to the Big Data hype, 2014 will prove that leveraging the right tools can make or break a project. In addition to new roles and requirements, the coming year will also see acquisitions, IPOs and funding on the rise and a shift toward Big Data application development as the next big thing.”

Nakamura offered the following 2014 predictions:

  1. Funding is the new black.
    Everyone wants a piece of the Big Data pie. In 2014, we will see Big Data funding only grow, and at least one significant IPO possibly from a player like Cloudera. As investors rush to get in the game, investments in Big Data will increase significantly yet imprudently, as smart people rather than innovative ideas will attract funding. Therefore, next year will signal the need for a funding reset – more due diligence, less mediocrity.
  2. More Hadoop projects will fail than succeed.
    More Hadoop projects will be swept under the rug as businesses devote major resources to their Big Data projects before doing their due diligence, which results in a costly, disillusioning project failure. We may not hear about most of the failures, of course, but the successes will clearly demonstrate the importance of using the right tools. The right Big Data toolkit will enable organizations to easily carry forward the success of these projects, as well as further insert their value into their business processes for market advantage.
  3. Enter the era of the Big Data product manager.
    As enterprises work to bridge the emerging gap between Big Data projects and business processes, this opens the door to a new career opportunity. The role of Big Data product manager will entail the identification, definition and shipping of specific data products aimed to solve critical business problems and maximize opportunities. It will require product management and data experience, along with a deep understanding of data science, and will become the norm as companies recognize that they are in the business of data.
  4. The next big thing in Big Data will NOT be the data, but the apps. 
    While we can expect to see continued investment happen at the data layer, we also anticipate a massive acceleration in the number and frequency of new data applications. These applications will turn into data products for internal and external use, and will clearly emerge as the next area of growth for Big Data.
  5. While it becomes ever ubiquitous, the Big Data “thang” will remain convoluted.
    We are sorry to say that the rapidly evolving Big Data market will continue to confuse rather than provide clear guidance on what it is exactly that enterprises should be doing now. Those businesses that can separate the wheat from the chaff, and execute upon this valuable knowledge, will be the ones to succeed.

Supporting Resources

Company: http://concurrentinc.com
Cascading: http://cascading.org
Contact us: http://concurrentinc.com/contact
Follow us on Twitter: http://twitter.com/concurrent

About Concurrent, Inc.

Concurrent, Inc. is the enterprise Big Data application platform company. Founded in 2008, Concurrent simplifies Big Data application development, deployment and management on Apache Hadoop. We are the company behind Cascading, the most widely used and deployed technology for building Big Data applications with more than 110,000 user downloads a month. Enterprises including Twitter, eBay, The Climate Corporation, Square and Etsy rely on Concurrent’s technology to drive their Big Data deployments. Concurrent is headquartered in San Francisco. Visit Concurrent online at http://concurrentinc.com.



Cascading: Open Source Java App Framework for Big Data

John K. Waters, Application Development Trends
December 4, 2013
http://adtmag.com/blogs/watersworks/2013/12/cascading-big-data.aspx

Enterprise interest in Big Data and associated analytics software has drawn intense attention to Apache Hadoop, the open source framework for running applications on large data clusters built on commodity hardware, and produced something of a flood of tools for developers working with it. But as an applications market emerges in this space, the next Big Thing for Big Data is likely to be app-oriented middleware.

That’s an insight Tony Baer, principal analyst at Ovum, shared with me when I talked with him about Continuuity’s recent Reactor 2.0 release, which the Java toolmaker billed as the first scale-out application server for Apache Hadoop.

“It is inevitable that applications will be developed that run against Big Data,” Baer said, “and as that occurs, it will be necessary to have an application layer that allows developers with Java and other languages to develop apps that run against it.”

Baer’s prediction makes perfect sense, and it’s one reason Java jocks might want to keep an eye on Concurrent, the company behind the open source Cascading project. Cascading is a Java application development framework for rich data analytics and data management apps running across “a variety of computing environments,” with an emphasis on Hadoop and API-compatible distributions.

“Big Data is moving to the next phase of maturity and it’s all about the applications,” the company says on its Web site. “The applications process the data and extract the value at scale and we believe that there must be a simple, reliable and consistent way to build, deploy, run and manage these data driven applications.”

Great minds.

Concurrent characterizes Cascading as “a rich Java API for defining complex data flows and creating sophisticated data oriented frameworks,” and it claims more than 110,000 user downloads a month. Its published user list includes Twitter, eBay, Square and Etsy, among others.

The San Francisco-based company recently announced Cascading 2.5 with new support for Hadoop 2 and YARN, the next-gen Hadoop data processing framework (sometimes called MapReduce 2.0).

Chris Wensel, Concurrent’s founder and CTO, has argued that developing and building applications on Hadoop has proven to be difficult, despite the framework’s rapid enterprise adoption. “With Hadoop 2, the community has addressed many concerns, paving a clearer path for enterprise users,” he said in a statement. “At Concurrent, we’re dedicated to forging a simpler path to mass Hadoop adoption by delivering a framework for building powerful and reliable data-oriented applications supporting data driven business models — quickly and easily. Our support for Hadoop 2 was an easy decision, as we continue to be an integral part of the Hadoop and Big Data ecosystem, providing solutions that simplify application development and management for the enterprise.”

As a Java-based framework, Cascading fits naturally into JVM-based languages, including Scala, Clojure, JRuby, Jython and Groovy. And the Cascading community has created scripting and query languages for many of these languages. The company’s extensions page offers a growing list of user contributed code.

Cascading 2.5 is publicly available and freely licensable under the Apache 2.0 License Agreement.

Concurrent, Inc. and Qubole Partner to Accelerate Enterprise Big Data Adoption

With Qubole Data Service, Cascading Users Can Easily Deploy Big Data Applications at Scale on an Elastic, Cost-Effective Cloud Platform

SAN FRANCISCO – Dec. 4, 2013 – Concurrent, Inc., the enterprise Big Data application platform company, and Qubole, a provider of the next-generation cloud Big Data platform, today announced a partnership to allow Cascading users to easily deploy Big Data applications at scale on an elastic, cost-effective cloud platform. The partnership combines the power of Cascading, the most widely used and deployed application framework for building robust enterprise Big Data applications, with Qubole Data Service (QDS), a Big Data as a Service solution providing a fast and scalable infrastructure in the cloud.

With more than 110,000 downloads a month, Cascading has become the enterprise development framework of choice for processing data sets on Apache Hadoop™. Cascading simplifies programming, workflow and data processing, allowing enterprises to cut time to production by 50 percent. As a result, enterprises can more easily execute on their Big Data strategies by leveraging existing skillsets, systems and tools – all while reducing the overall complexity of their technology infrastructure. Moreover, Cascading is compatible with most Hadoop distributions to prevent infrastructure lock-in.

QDS is a turnkey Big Data as a Service solution for Big Data teams, running on the top-performing elastic Hadoop engine in the cloud. QDS helps enterprise customers get to value quickly by removing key obstacles associated with building a Hadoop data cluster. With QDS, the power of Big Data meets the simplicity of the cloud.

The combination of Concurrent’s Cascading framework and Qubole’s QDS platform simplifies Big Data application development, data processing and exploration. Cascading users benefit from all the features and optimizations of QDS, including access to tables created in QDS and the ability to build custom SQL queries.

Supporting Quotes

“With an unparalleled understanding of the cloud, Qubole is addressing today’s challenges of storing and managing data to create a fast, easy and reliable path to turning Big Data into business insight. Because of the incredible popularity of Cascading, it made sense for us to partner with Concurrent. Together, Cascading and Qubole enhance developer productivity and provide operational value by helping customers build Big Data applications and derive insights without Hadoop expertise.”

-Ashish Thusoo, CEO of Qubole

“Our customers are looking for a Big Data platform that is cost effective and that can scale dynamically to meet their needs. Working together, Qubole and Concurrent can offer enterprises a cost-effective and comprehensive solution with best-in-class capabilities to drive business differentiation through data.”

-Gary Nakamura, CEO of Concurrent, Inc.

About Concurrent, Inc.

Concurrent, Inc. is the enterprise Big Data application platform company. Founded in 2008, Concurrent simplifies Big Data application development, deployment and management on Apache Hadoop. We are the company behind Cascading, the most widely used and deployed technology for building Big Data applications with more than 110,000 user downloads a month. Enterprises including Twitter, eBay, The Climate Corporation, Square and Etsy all rely on Concurrent’s technology to drive their Big Data deployments. Concurrent is headquartered in San Francisco. Visit Concurrent online at http://concurrentinc.com.

About Qubole

Qubole is a leading Big Data as a Service provider. Ashish Thusoo and Joydeep Sen Sarma, creators of Facebook’s Big Data infrastructure and Apache Hive, founded Qubole to make it simple to prepare, integrate and explore Big Data in the cloud.

Qubole’s Big Data as a Service Solution, Qubole Data Service (QDS), runs on the top performing elastic Hadoop engine on the cloud and includes a library of data connectors. QDS makes it easy to inspect data, author and execute queries, and convert queries into scheduled jobs. With QDS, the power of Big Data meets the simplicity of the cloud.

The largest brands in social media, online advertising, entertainment, gaming and other data-intensive ventures trust QDS to handle their most challenging Big Data requirements at a fraction of the cost of alternatives. Customers include Pinterest, Quora, MediaMath, BigCommerce and many others.



Big Data: Interacting with Hadoop 2 Data Using Standard ANSI SQL

Dick Weisinger, formtek
November 19, 2013
http://formtek.com/blog/big-data-interacting-with-hadoop-2-data-using-standard-ansi-sql/

Cascading is an Apache-licensed application framework for building rich data processing and machine learning applications that run on Hadoop. Cascading applications are built using a simple API that can be called from any JVM-based language. Over the past five years, it has been developed and supported commercially by Concurrent, Inc. The flow of Cascading is to first capture data from ‘sources’, to then pass that data through ‘pipes’ where it is processed, and finally to push the results into output files or ‘sinks’. This flow of data is known as the ‘source-pipe-sink’ paradigm. Cascading runs as an abstraction at a higher level than MapReduce, so that while Cascading applications ultimately execute MapReduce jobs, when the Cascading application is written, no explicit interactions with MapReduce need to be programmed.
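The source-pipe-sink flow described above can be sketched in a few lines. This is only an illustration of the paradigm in plain Python, not Cascading's Java API: the function names (`source`, `keep_large`, `total_by_user`, `sink`, `run_flow`) and the sample records are all hypothetical, and nothing here runs on Hadoop or MapReduce.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Pipe = Callable[[Iterable[Record]], Iterable[Record]]


def source(lines: Iterable[str]) -> Iterator[Record]:
    """Source: parse raw 'user,amount' lines into records."""
    for line in lines:
        user, amount = line.strip().split(",")
        yield {"user": user, "amount": float(amount)}


def keep_large(records: Iterable[Record], threshold: float = 10.0) -> Iterator[Record]:
    """Pipe stage: drop records whose amount is below the threshold."""
    return (r for r in records if r["amount"] >= threshold)


def total_by_user(records: Iterable[Record]) -> Iterator[Record]:
    """Pipe stage: group by user and sum the amounts."""
    totals: Dict[str, float] = {}
    for r in records:
        totals[r["user"]] = totals.get(r["user"], 0.0) + r["amount"]
    for user in sorted(totals):
        yield {"user": user, "total": totals[user]}


def sink(records: Iterable[Record]) -> List[str]:
    """Sink: render the results as tab-separated output lines."""
    return [f'{r["user"]}\t{r["total"]:.2f}' for r in records]


def run_flow(lines: Iterable[str], pipes: List[Pipe]) -> List[str]:
    """Connect source -> pipes -> sink; no stage knows how the others run."""
    stream: Iterable[Record] = source(lines)
    for pipe in pipes:
        stream = pipe(stream)
    return sink(stream)


# Hypothetical sample input, for illustration only.
RAW = ["alice,12.5", "bob,3.0", "alice,20.0", "carol,15.0"]
RESULT = run_flow(RAW, [keep_large, total_by_user])
```

The point of the structure is that each stage only consumes and produces records, so the execution engine underneath (MapReduce, in Cascading's case) can stay out of the application code.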

Chris Wensel, Founder and CTO of Concurrent, said that “building applications on Hadoop, despite its growing adoption in the enterprise, is notoriously difficult. We are driving the future of application development and management on Hadoop, by allowing enterprises to quickly extract meaningful information from large amounts of distributed data and better understand the business implications. We make it easy for developers to build powerful data processing applications for Hadoop, without requiring months spent learning about the intricacies of MapReduce.”

More than 110,000 user downloads of Cascading are made every month.  Cascading is used by businesses like Twitter, eBay, The Climate Corporation, Square and Etsy for managing some or all of their Big Data requirements. In fact, all of Twitter’s revenue-generating applications have been built with Cascading.

Today, Cascading 2.5 is being introduced — a version-number jump from the previously available 2.2 point release. The jump in numbering was intended to emphasise the significance of some of the new features in the release. Most significantly, the 2.5 release will include support for Hadoop 2 and YARN.  Other highlights of the new 2.5 Cascading release include:

  • Performance improvements for complex join operations and optimizations to dynamically partition and store processed data more efficiently on HDFS.
  • Broad compatibility with other Hadoop vendors and Hadoop as a service providers, including Cloudera, Hortonworks, MapR, Intel, Altiscale, Qubole and Amazon EMR.

Coincident with the release of Cascading 2.5, Concurrent is also making another product, Cascading Lingual, generally available. The Lingual product is an add-on to Cascading that enables a complete ANSI-SQL interface for interacting with Hadoop data. Compatibility with standard SQL means that SQL developed in traditional relational databases can be brought over and used as-is within Lingual. Like Cascading, Lingual comes with the Apache 2.0 license.

Concurrent described the benefits of the Lingual product by saying that “Cascading Lingual provides out-of-the-box support for JDBC. Enterprises that have invested millions of dollars in business intelligence (BI) tools, such as Pentaho, Jaspersoft and Cognos, and training can now also access their data on Hadoop through standard SQL interface.”

André Kelpe, software engineer for Concurrent, summarized three of the design goals for the Cascading Lingual product:

  • Enable immediate ANSI SQL query access to data
  • Simplified system and data integration with reads and writes from HDFS, JDBC, Memcached, HBase and Redshift
  • Simplified migration of existing SQL within Cascading
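To make the "existing SQL, used as-is" goal concrete, here is a minimal sketch of the pattern, assuming a standard database driver interface. It uses Python's built-in `sqlite3` module as a stand-in for a JDBC connection; it does not use Lingual, and the `sales` table, its rows and the `run_query` helper are hypothetical.

```python
import sqlite3

# An ordinary ANSI-style aggregate query, kept as a plain string so the same
# text could be handed to any driver that accepts standard SQL.
SALES_BY_REGION = """
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region
ORDER BY region
"""


def run_query(conn, sql):
    """Execute a query through the generic driver interface and fetch all rows."""
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchall()


def demo():
    # In-memory SQLite stands in for the real data store (an assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10.0), ("west", 5.0), ("east", 2.5)],
    )
    rows = run_query(conn, SALES_BY_REGION)
    conn.close()
    return rows
```

Because the application only depends on the connection and the SQL text, swapping the backing store would in principle mean changing the connection, not the query.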

A demo of Cascading Lingual is available on YouTube.

Concurrent Seeks to Lower Hadoop Adoption Barrier for SQL Coders

David Ramel, Application Development Trends Magazine
November 21, 2013
http://adtmag.com/articles/2013/11/21/concurrent.aspx

Concurrent Inc. this week released Cascading Lingual, an open source project designed to give developers a standards-based SQL interface for creating and working with Big Data applications on Apache Hadoop. It works with the company’s Cascading application framework used by Java developers for building Hadoop analytics and data management apps.

Concurrent also announced an upcoming upgrade of that framework, Cascading 2.5, which includes compatibility with the recent Hadoop 2 upgrade featuring long-awaited support for YARN.

Cascading Lingual joins the growing list of projects aimed at lowering the Hadoop adoption barrier by making Hadoop Big Data application development more accessible to SQL-savvy developers and integrating existing legacy systems with Hadoop applications. “Cascading Lingual enables virtually anyone familiar with SQL to instantly work with data stored on Hadoop using their JDBC-compliant BI [business intelligence] or desktop tool of choice,” the company said. “Enterprises benefit as they can execute on Big Data strategies using existing in-house resources, skills sets and product investments.”

Developers have the option of using standard Java JDBC interfaces to build apps or Concurrent’s own Cascading APIs that facilitate solutions built with ANSI-standard SQL and custom code written in Java, Scala or Clojure.

Along with the standards-based SQL support and JDBC driver, Cascading Lingual features an interactive SQL Shell command interface for interacting with Hadoop, for example through the execution of SQL commands. A Catalog command-line tool is provided for curating database tables that map to Hadoop files and other resources. Another key feature of Cascading Lingual is a Data Provider mechanism that allows for simultaneous data queries from multiple external data stores with just one SQL command, Concurrent said. Developers can just “cut and paste” existing SQL code to work with Hadoop data or migrate applications to Hadoop clusters.

The company said Cascading Lingual was created via collaboration between the developers of the Cascading Java API and developers of Optiq, a dynamic data management framework and SQL parser originally written by Julian Hyde, who also authored the Java-based Mondrian Online Analytical Processing (OLAP) engine.

The upcoming 2.5 release of the Cascading application framework, on which Lingual is based, gives developers access to the new capabilities of Hadoop 2. Released last month, Hadoop 2 supports YARN (sometimes expanded as “Yet Another Resource Negotiator”), a long-anticipated upgrade to the MapReduce batch processing framework and a core component of Hadoop. Concurrent said Cascading users will now be able to seamlessly upgrade their applications to Hadoop 2. “Furthermore, Big Data applications using domain specific languages (DSLs), such as the widely used Scalding (Scala on Cascading), Cascalog (Clojure on Cascading) and PyCascading (Jython on Cascading) languages, will also seamlessly migrate to Hadoop 2,” the company said.

Concurrent said the framework is used by companies such as Twitter, eBay and Trulia “to streamline data processing, data filtering and workflow optimization for large volumes of unstructured and semi-structured data.”

Cascading 2.5 reportedly will be available for download soon under the open source Apache 2.0 License Agreement. Cascading Lingual is available for download now.

Cascading 2.5 Supports Hadoop 2

Boris Lublinsky, InfoQ
November 19, 2013
http://www.infoq.com/news/2013/11/cascading

Despite wide and growing adoption of Hadoop, enterprises still face the problem of finding the right approach for fast and cost-effective development of Hadoop-based applications. One way to achieve this goal is to use Domain Specific Languages (DSLs), which often allow for significant simplification of Hadoop implementations.

One of the most popular Java DSLs on top of the low-level MapReduce API is Cascading. It was introduced in late 2007 as a DSL to implement functional programming for large-scale data workflows. It is based on a “plumbing” metaphor to define data processing as a workflow built out of familiar elements: Pipes, Taps, Tuple Rows, Filters, Joins, Traps, etc.
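To make the “plumbing” metaphor concrete, here is a minimal sketch in plain Java. It deliberately does not use the Cascading API: the class name PlumbingSketch and the helpers splitWords, dropEmpty and wordCount are hypothetical names invented for this illustration. A list of lines stands in for a source tap, each "pipe" is a function from one stream of records to another, and the returned map is what a sink tap would store.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PlumbingSketch {

    // A "pipe" in this sketch is just a function from one stream of records
    // to another. These are hypothetical helpers, not Cascading classes.
    static final Function<Stream<String>, Stream<String>> splitWords =
        lines -> lines.flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")));

    // A filter stage: drops the empty tokens produced by blank lines.
    static final Function<Stream<String>, Stream<String>> dropEmpty =
        words -> words.filter(word -> !word.isEmpty());

    // Connects source -> splitWords -> dropEmpty -> group-and-count, and
    // returns the sorted counts that a sink would write out.
    static Map<String, Long> wordCount(List<String> source) {
        Stream<String> words = splitWords.andThen(dropEmpty).apply(source.stream());
        return words.collect(
            Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> source = List.of("big data on hadoop", "hadoop 2 adds yarn", "");
        // prints {2=1, adds=1, big=1, data=1, hadoop=2, on=1, yarn=1}
        System.out.println(wordCount(source));
    }
}
```

The point of the metaphor is that each stage stays small and reusable, and the workflow is defined by how the stages are connected rather than by hand-written map and reduce classes.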

Cascading introduced a new version of the product – Cascading 2.5 – this week, delivering support for Hadoop 2, including YARN. According to the company’s press release, this new release features:

  • Support for Hadoop 2 and its new features, including YARN. Cascading users looking to upgrade to Hadoop 2 will now be able to seamlessly migrate their applications and take advantage of new advanced features like YARN.
  • Added performance improvements for complex join operations and optimizations to dynamically partition and store processed data more efficiently on HDFS.
  • Additional broad compatibility with other Hadoop vendors and Hadoop-as-a-service providers, including Cloudera, Hortonworks, MapR, Intel, Altiscale, Qubole and Amazon EMR, among others, which gives Cascading users a richer set of deployment options and services, whether on-premise or in the cloud.

Simultaneously, Concurrent announced general availability of Cascading Lingual – an open source project that provides a comprehensive ANSI SQL interface for accessing Hadoop-based data. This project covers more than 7,000 SQL-99 statements derived from sophisticated industry standard OLAP tools, and according to Cascading:

“delivers the broadest SQL coverage for any tool in the Hadoop ecosystem. It’s innovative by making Hadoop simple and accessible, and by providing easy systems integration for multiple data stores into Hadoop by using just one SQL statement.”

InfoQ had a chance to discuss the latest Cascading release with Chris K Wensel, CTO and Founder of Concurrent, Inc.

InfoQ: When you are talking about supporting YARN in Cascading 2.5, what do you mean exactly – the fact that MapReduce code uses the YARN resource manager or are you actually leveraging YARN for creating a new application manager, one specific to Cascading?

Wensel: Cascading 2.5 implicitly supports YARN, meaning that because Cascading 2.5 supports Hadoop 2, YARN functionality will also be supported. Cascading does not actually leverage YARN for application development.

InfoQ: Do you have any plans for leveraging Apache Tez to further improve the performance of Cascading applications?

Wensel: Yes, we do. We have plans for Tez in our roadmap and will communicate those updates at the appropriate time.

InfoQ: Can you elaborate on optimizations and performance improvements for complex joins?

Wensel: We have updated the API to allow for more complex and custom join types. Cascalog, for example, would leverage this feature under some circumstances.

InfoQ: In your opinion, is the current emphasis on SQL-based processing limiting the spectrum of applications developed in Hadoop? In spite of all its power, SQL is good for solving only a certain class of problems, representing a limited subset of applications for which enterprises can leverage Hadoop.

Wensel: SQL allows the other 99% of developers, analysts, and legacy systems to leverage Hadoop. Yes, you may hit a wall with SQL, but 90% of most problems are reasonably expressed as SQL. Cascading allows you to choose your battles. A good question to ask is: who wants to write a bunch of Java to do something that is one line of SQL? And also, who wants to write hundreds of lines of SQL to do something best written and tested in Java? Cascading gives developers the flexibility.