Category Archives: News

Concurrent Tool Optimizes Hadoop Big Data App Performance

Christopher Tozzi, The VAR Guy
February 4, 2014
http://thevarguy.com/big-data-technology-solutions-and-information/020414/concurrent-tool-opti

Another Big Data application designed to simplify data analysis using the open source Hadoop platform has hit the channel. Concurrent has announced the release of Driven, a free cloud service that it describes as “the industry’s first application performance management product for Big Data applications.”

Driven is designed to help developers and data analysts optimize the performance of Big Data applications by providing application metrics in real time. It also offers visualization features for diagnosing application problems as they occur. Insights such as these enable administrators to find inefficiencies and tweak performance, shortening the time it takes to get results from Big Data operations.

Driven runs on top of Concurrent’s Big Data application framework, Cascading, which has 6,000 deployments and sees more than 130,000 downloads each month, according to the company. Cascading users can download the Driven plug-in for testing now from the platform’s website. Concurrent promises general availability of the product, along with “additional operations features and deployment options,” in Q2 of 2014.

The product is another example of both the maturation of the open source Hadoop Big Data platform, which now enjoys widespread adoption, and demand for solutions that simplify Hadoop deployment. It’s a natural addition to Concurrent’s suite of Hadoop products, which also includes a variety of programming interfaces that allow data analysts to use data languages they probably already know well—such as SQL in the case of Cascading Lingual, which Concurrent released in November 2013—to connect to Hadoop.

Concurrent is currently offering Driven, a cloud-based service, at no cost for development purposes. Production use will require a license.

Concurrent Offers Performance Management for Big Data Applications

Thor Olavsrud, CIO Magazine
February 4, 2014
http://www.cio.com/article/747673/Concurrent_Offers_Performance_Management_for_Big_Data_Applications

CIO — As big data applications move from development and proof of concept to production, the need for management and operations tools is becoming ever more pronounced. Enter big data application platform company Concurrent, primary sponsor of the open source Cascading application framework.

Concurrent today announced Driven, which the company says is the first application performance management product for big data applications. Chris Wensel, author of the Cascading project and founder and CTO of Concurrent, says Driven is purpose-built to address the pain points of enterprise application development and application performance management on Apache Hadoop.

“Driven is a powerful step forward in delivering the full promise of connecting business with big data,” Wensel says. “Gone are the days when developers must dig through log files for clues to slow performance or failures of their data processing applications. The release of Driven further enables enterprise users to develop data-oriented applications on Apache Hadoop in a more collaborative, streamlined fashion. Driven is the key to unlock enterprises’ ability to drive differentiation through data. There’s a lot more to come—this is only the beginning.”

Governance and compliance tools are among the features on tap for the future, Wensel says.

The idea behind Driven is to give developers, data analysts, data scientists and operations the capability to see key application metrics in real-time and thus allow them to isolate and resolve problems quickly.

Driven Helps Visualize, Diagnose and Resolve Big Data Failures
Driven is essentially a plug-in for Cascading, an application framework that sits atop Hadoop and works with all the major Hadoop distributions. Once installed, Driven immediately begins collecting telemetry data from your running Cascading applications, including applications written in the popular domain-specific languages (DSLs) built on Cascading: Scalding (Scala DSL on Cascading), Cascalog (Clojure DSL on Cascading), Lingual (ANSI SQL on Cascading) and Pattern (Predictive Model Scoring on Cascading). The telemetry data gives Cascading users the ability to visualize their data applications and to diagnose and quickly resolve application failures and performance problems.
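For a concrete sense of where such telemetry can hook in, the sketch below uses Cascading's FlowListener interface, which lets code observe a flow's lifecycle (start, stop, completion, failure). To be clear, this is not how the closed Driven plug-in is implemented; it is only an illustration, written as a plain Java example, of the kind of lifecycle hooks the Cascading framework exposes to any monitoring tool.

    import cascading.flow.Flow;
    import cascading.flow.FlowListener;

    // Illustrative only: logs flow lifecycle events. This is NOT the Driven plug-in,
    // which ships as a separate, closed component; it just shows the hooks Cascading exposes.
    public class LoggingFlowListener implements FlowListener {
      public void onStarting(Flow flow) {
        System.out.println("flow starting: " + flow.getName());
      }

      public void onStopping(Flow flow) {
        System.out.println("flow stopping: " + flow.getName());
      }

      public void onCompleted(Flow flow) {
        System.out.println("flow completed: " + flow.getName());
      }

      public boolean onThrowable(Flow flow, Throwable throwable) {
        System.err.println("flow failed: " + flow.getName() + ": " + throwable);
        return false; // false means the exception is rethrown rather than swallowed
      }
    }

A flow created with any FlowConnector can register such a listener, for example flow.addListener(new LoggingFlowListener()) before calling flow.complete().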

Concurrent says Driven will allow developers and enterprises to achieve the following:

  • Accelerate time to market. Driven reduces the time to application production with process visualization and monitoring capabilities, allowing you to quickly understand complex applications and data flows by drilling down into each application at runtime using a rich user interface.
  • Build reliable applications. With detailed insight into your data processing logic and algorithms, you can ensure they are executing properly. Driven surfaces key application metrics around each data process to provide insight into data accuracy.
  • Optimize application performance. Driven allows you to understand the performance and capacity of the applications running on your infrastructure by providing key application behavior metrics like data skew and runtime parallelization. You can also compare this information with historical data to trend application performance, both in development and in production.

Driven represents a first for Concurrent in other ways as well. Until now, all of Concurrent’s output has been open source. But Driven is a proprietary product on top of Cascading, Wensel explains. It comes in two flavors: Driven and Driven Enterprise.

Driven, available now in public beta, is a free cloud service for development environments only. Concurrent will provide online support for Driven. Driven Enterprise, which Wensel says will be generally available in the second quarter of 2014, will require an annual subscription. It is intended for both development and production environments and will support both developers and operations. Driven Enterprise will be available both on-premises and in the cloud, and Concurrent will provide enterprise support for the product.

“The product itself will be closed,” Wensel explains. “Driven is an open service and freely available for use, but you don’t get the source code. We feel this is the most natural, non-conflicting way to monetize open source.”

“We want the initial offering to be online so we can iterate and learn,” he adds. “We’re putting out a basic initial set of features. We have a long list of things we’re going to want to add.”

Concurrent Driven: Big data application performance management

Dan Kusnetzky, ZDNet
February 4, 2014
http://www.zdnet.com/concurrent-driven-big-data-application-performance-management-7000025936/

Concurrent’s founder and CTO, Chris Wensel, and CEO, Gary Nakamura, stopped by to introduce their company; their leading product, Cascading; and their new application performance management product, Driven, which is designed to take the guesswork out of finding performance problems in Hadoop-based big data applications.

What is Cascading?
I have to admit that I was not aware of Cascading until this conversation. After the discussion, I spent some time familiarizing myself with the open source project and what it can do. Here’s how Concurrent describes Cascading:

Cascading is a Java application framework that enables typical developers to quickly and easily develop rich Data Analytics and Data Management applications that can be deployed and managed across a variety of computing environments. Cascading works seamlessly with Apache Hadoop 1.0 and API compatible distributions.
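To make that description concrete, here is a minimal sketch along the lines of Cascading's canonical word-count example, assuming Cascading 2.x running against Hadoop; the HDFS input and output paths are passed as program arguments and are purely illustrative.

    import java.util.Properties;

    import cascading.flow.Flow;
    import cascading.flow.FlowDef;
    import cascading.flow.hadoop.HadoopFlowConnector;
    import cascading.operation.aggregator.Count;
    import cascading.operation.regex.RegexGenerator;
    import cascading.pipe.Each;
    import cascading.pipe.Every;
    import cascading.pipe.GroupBy;
    import cascading.pipe.Pipe;
    import cascading.property.AppProps;
    import cascading.scheme.hadoop.TextDelimited;
    import cascading.scheme.hadoop.TextLine;
    import cascading.tap.SinkMode;
    import cascading.tap.Tap;
    import cascading.tap.hadoop.Hfs;
    import cascading.tuple.Fields;

    public class WordCount {
      public static void main(String[] args) {
        Properties properties = new Properties();
        AppProps.setApplicationJarClass(properties, WordCount.class);

        // Source and sink "taps": where data is read from and written to on HDFS.
        Tap docTap = new Hfs(new TextLine(new Fields("line")), args[0]);
        Tap wcTap  = new Hfs(new TextDelimited(new Fields("word", "count"), "\t"), args[1], SinkMode.REPLACE);

        // Pipe assembly: split lines into words, group by word, count each group.
        Pipe docPipe = new Each("wordcount", new Fields("line"),
            new RegexGenerator(new Fields("word"), "\\w+"), Fields.RESULTS);
        Pipe wcPipe = new GroupBy(docPipe, new Fields("word"));
        wcPipe = new Every(wcPipe, Fields.ALL, new Count(new Fields("count")), Fields.ALL);

        FlowDef flowDef = FlowDef.flowDef()
            .setName("wc")
            .addSource(docPipe, docTap)
            .addTailSink(wcPipe, wcTap);

        // Cascading plans this assembly into one or more MapReduce jobs and runs them.
        Flow flow = new HadoopFlowConnector(properties).connect(flowDef);
        flow.complete();
      }
    }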

I was impressed by the list of companies that use Cascading as part of their Big Data development efforts and by the fact that 75,000 copies of the software are downloaded monthly. Concurrent points out that over 6,000 data driven businesses, including Twitter, eBay, The Climate Corp and Etsy, use Cascading to develop Hadoop-based applications.

What is Driven?
Driven is an application performance management tool designed to help Cascading developers accelerate development, diagnose performance problems, and both manage and monitor Hadoop-based Big Data applications.

Concurrent’s focus
Concurrent points out that they are addressing three things:

  1. Making Hadoop development straightforward enough that it can be a tool enterprises of all sizes can use to become data-driven companies. Without Cascading, many would have to resort to programming Big Data applications in assembler.
  2. Offering tools that provide alerting and notifications for Big Data applications so that they can be folded into production environments.
  3. Providing low-cost tools (free for development, modest cost for production environments) that make it possible for companies to use Big Data.

Snapshot Analysis
Companies have accumulated huge amounts of operational, telemetric and point of sale data that could be the basis for a better and deeper understanding of their own operations and customers. Hadoop, while a very popular tool, can be challenging to a newcomer. Tools such as Cascading and Driven could certainly shorten the time it takes for developers to come up to speed and be productive with Hadoop.

Application Performance Management (APM) For Big Data

Mike Matchett, Taneja Blog
February 4, 2014
http://tanejagroup.com/news/blog/blog-systems-and-technology/application-performance-management-apm-comes-to-hadoop

Concurrent, the folks behind Cascading, have today announced the beta of “Driven” – an Application Performance Management (APM) solution for Hadoop. APM has been sorely missing from the Hadoop ecosystem at a level where developers, IT ops, and even end users can quickly get to the bottom of any issues.

First, if you don’t know about Cascading: one of the big impediments to using Hadoop on a wider scale is the need for programmers to write MapReduce algorithms at a low, detailed, and often mind-bending level. Cascading is a popular open source package that provides a higher level of abstraction on top of Hadoop. Instead of working with mappers and reducers, you can work with more commonly understood higher-level objects like sources and sinks, functions, filters and joins. There are others of course, like Pig, but Cascading is designed for rock-solid reliability in production at scale.
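As a rough illustration of what working with sources, sinks, filters and joins looks like in practice, the sketch below (Cascading 2.x) filters one tab-delimited input and joins it to another. The field names, file paths and the “orders over 100” rule are hypothetical; only the API shapes come from Cascading.

    import cascading.flow.FlowDef;
    import cascading.flow.hadoop.HadoopFlowConnector;
    import cascading.operation.expression.ExpressionFilter;
    import cascading.pipe.CoGroup;
    import cascading.pipe.Each;
    import cascading.pipe.Pipe;
    import cascading.pipe.joiner.InnerJoiner;
    import cascading.scheme.hadoop.TextDelimited;
    import cascading.tap.Tap;
    import cascading.tap.hadoop.Hfs;
    import cascading.tuple.Fields;

    public class OrdersByCustomer {
      public static void main(String[] args) {
        // Hypothetical tab-delimited inputs; field names and paths are illustrative.
        Tap customers = new Hfs(new TextDelimited(new Fields("cust_id", "name"), "\t"), args[0]);
        Tap orders    = new Hfs(new TextDelimited(new Fields("order_id", "cust_id2", "amount"), "\t"), args[1]);
        Tap output    = new Hfs(new TextDelimited(Fields.ALL, "\t"), args[2]);

        Pipe customerPipe = new Pipe("customers");
        Pipe orderPipe    = new Pipe("orders");

        // Filter: keep only orders over 100 (ExpressionFilter removes tuples where the expression is true;
        // the "amount" value is coerced to a Double for evaluation).
        orderPipe = new Each(orderPipe, new Fields("amount"),
            new ExpressionFilter("amount <= 100.0", Double.class));

        // Join: inner join the surviving orders to customers on the customer id.
        Pipe joined = new CoGroup(customerPipe, new Fields("cust_id"),
            orderPipe, new Fields("cust_id2"), new InnerJoiner());

        FlowDef flowDef = FlowDef.flowDef()
            .setName("orders-by-customer")
            .addSource(customerPipe, customers)
            .addSource(orderPipe, orders)
            .addTailSink(joined, output);

        new HadoopFlowConnector().connect(flowDef).complete();
      }
    }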

Yet when anything goes wrong in Hadoop at scale, it can be hard to figure out (to say the least!), especially in production, where service level agreements are in play and the clock starts ticking on resolution. Downtime or degradation for big data “batch” apps, and even more so for the newer big data “streaming” apps, can cost big bucks. So Concurrent saw an opportunity to further leverage their app platform with a plug-in that feeds detailed instrumentation into a management service.

Driven is, at first, a free-for-development cloud service that deploys quickly into any Cascading implementation and helps users track and gain immediate insight into “enterprise-grade” apps. Commercial production and on-premises versions should be available once it reaches GA.

As a service, Driven monitors all the running apps and processes, and tracks successes and failures with all the expected alerts and notifications. Visualizations are interesting maps with a detailed “high fidelity” view into the app components, highlighted of course where failures occur, with direct drill-down into exceptions and stack traces. This is just a first cut though; lots of advanced analysis could easily be layered on later.

According to Concurrent, community collaboration will be a key value for folks too, with Driven being integrated into the Cascading community website.

While “free-for-development” Driven seems a no-brainer for Cascading developers, and should help drive Cascading adoption even faster, we expect the big impact of Driven’s APM to really be with IT and dev operations folks who have to manage large data processing solutions in production. APM is sorely needed for big data in enterprise production, and we think Driven has a good chance of expanding over time to give Concurrent a real foothold in the enterprise management space. And on the back end, Concurrent will gain instant broad and deep visibility into how the platform is doing in the field, helping improve Cascading.

Concurrent Releases Driven, First Big Data Application Performance Management Solution

Arnal Dayaratna, Ph.D., Cloud Computing Today
February 4, 2014
http://cloud-computing-today.com/2014/02/04/845844/

Concurrent Inc., the primary sponsor behind Cascading, today announces the release of Driven, an application performance management solution for Big Data applications. Driven enables developers to quickly identify and remediate application failures and performance issues specific to applications built using Hadoop. Available as a plug-in for the Cascading infrastructure, Driven solves a key problem in the Hadoop industry related to the management of Hadoop-based applications. The use of Driven allows developers to confirm the successful execution of application jobs and data processing algorithms, in addition to facilitating the optimization of application performance. Developers can monitor and trend application metrics such as runtime parallelization for both operational and R&D purposes. Moreover, because Driven is part of the Java-based Cascading framework for building analytics and data management applications on Apache Hadoop, Driven users can take advantage of Cascading’s collaboration functionality to communicate with Driven communities all over the world.

Chris Wensel, founder and CTO, Concurrent, Inc., remarked on the significance of Driven as follows:

Driven is a powerful step forward in delivering on the full promise of connecting business with Big Data. Gone are the days when developers must dig through log files for clues to slow performance or failures of their data processing applications. The release of Driven further enables enterprise users to develop data oriented applications on Apache Hadoop in a more collaborative, streamlined fashion. Driven is the key to unlock enterprises’ ability to drive differentiation through data. There’s a lot more to come – this is only the beginning.

Here, Wensel notes the way in which Driven responds to the opacity of Hadoop by providing developers with an alternative to slogging through volumes of log files to understand the performance of their applications. Concurrent CEO Gary Nakamura elaborated on Wensel’s remarks by noting that “One of the big problems in Hadoop today is it’s just a black box,” and that Driven provides a way to expeditiously navigate to the lines of code responsible for application failure. Because of its positioning as part of the Cascading infrastructure, Driven stands to significantly enhance the value of Cascading by providing developers with an extra layer of insight into application performance that complements Cascading’s native framework for big data analytics and data management. Expect Driven to elevate the status of Cascading within the Big Data industry even further and ultimately confirm its place as the go-to application for Hadoop analytics, data and application management. Driven is currently available in public beta, while its commercial variant, Driven Enterprise, will be available in Q2 via an annual subscription.

Shining a Light on Hadoop’s ‘Black Box’ Runtime

Alex Woodie, Datanami
February 4, 2014
http://www.datanami.com/datanami/2014-02-04/shining_a_light_on_hadoop_s_black_box_runtime.html

Let’s face it: writing MapReduce processes is not much fun. That’s the main reason the Cascading framework is gaining such a big following: it abstracts away the difficult parts of MapReduce behind an easy-to-use Java API and library. With today’s launch of a new product called Driven, the company behind Cascading is enabling users to instrument the data analytic apps they develop with Cascading, in pursuit of faster troubleshooting and higher performance.

There is some serious momentum building up behind Cascading. According to Concurrent, the commercial open source company founded by Cascading creator Chris Wensel to sell support for Cascading, the open source framework is being downloaded 130,000 times per month. What’s more, 6,000+ companies have deployed Cascading-built applications on production Hadoop clusters, including big names like Twitter, Kohl’s, and Nokia.

Cascading’s knack for letting mortal Java developers with average skills build MapReduce-based applications that would normally require a super Java coder to construct has made it a staple component of many Hadoop projects. “Being a Java API, the average Java developer can use it,” Wensel tells Datanami. “They can write tests and use their IDE. But also more importantly, they can think about the problem at hand and they don’t have to think in terms of MapReduce, MapReduce, MapReduce.”
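To illustrate the “write tests and use their IDE” point, here is a hedged sketch of the kind of check a developer might wrap in a unit test: a pipe assembly run through Cascading’s local mode against ordinary files, with no Hadoop cluster involved. The file paths and field names are assumptions made up for the example.

    import cascading.flow.Flow;
    import cascading.flow.FlowDef;
    import cascading.flow.local.LocalFlowConnector;
    import cascading.operation.regex.RegexGenerator;
    import cascading.pipe.Each;
    import cascading.pipe.Pipe;
    import cascading.scheme.local.TextDelimited;
    import cascading.scheme.local.TextLine;
    import cascading.tap.SinkMode;
    import cascading.tap.Tap;
    import cascading.tap.local.FileTap;
    import cascading.tuple.Fields;
    import cascading.tuple.TupleEntryIterator;

    public class TokenizeLocalCheck {
      public static void main(String[] args) throws Exception {
        // Local-mode taps read and write plain files, so the assembly runs in-process.
        Tap input  = new FileTap(new TextLine(new Fields("line")), "src/test/data/lines.txt");
        Tap output = new FileTap(new TextDelimited(new Fields("word"), "\t"),
            "target/words.tsv", SinkMode.REPLACE);

        // Split each line into word tuples, just as a Hadoop-mode assembly would.
        Pipe tokens = new Each("tokenize", new Fields("line"),
            new RegexGenerator(new Fields("word"), "\\w+"), Fields.RESULTS);

        Flow flow = new LocalFlowConnector().connect(FlowDef.flowDef()
            .setName("tokenize-check")
            .addSource(tokens, input)
            .addTailSink(tokens, output));

        flow.complete();

        // In a real JUnit test this loop would become assertions on the sink contents.
        TupleEntryIterator results = flow.openSink();
        int count = 0;
        while (results.hasNext()) {
          results.next();
          count++;
        }
        results.close();
        System.out.println("emitted " + count + " words");
      }
    }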

While Cascading has helped many organizations build data analytic apps that run on Hadoop, the framework doesn’t address the overall lack of visibility into the inner workings of Hadoop apps once they’re placed into production.

“One of the big problems in Hadoop today is it’s just a black box,” says Concurrent CEO Gary Nakamura. “Most people today deploy their applications and pray. What we’re doing [with Driven] is providing the visibility so you can actually see what’s going on, and if there’s a failure, we’ll take you to the exact spot that failure happened, so a developer can try and figure out what to do.”

From its GUI, Driven will show users exceptions and stack traces in their Hadoop app, and track all the filters, joins, and other functions that are taking place within the software. “You’ll be able to see all of the details in your data application, the units of work and how it all ties together,” Nakamura says. “You’ll be able to see them in real-time, running on Hadoop, and see how your application is progressing.”

Nakamura says the software will help users, operators, and developers collaborate on improving their Hadoop applications–not only with broken apps, but with the working apps that could use a little optimization.

“We expect Driven to provide the capability to build more reliable applications,” he says. “Developers and operators will be able to look at those things and say, ‘Hmmm, we should have somebody take a look at this because everything else takes 5 minutes and this takes 25 minutes. We ought to be able to optimize that down to something more reasonable.'”

As Cascading grows in use, so will Driven, Nakamura says. “The next version [3.0] of Cascading will support other fabrics like Spark, Storm, and Tez,” he says. “So that means applications that have been built using Cascading will be portable across the supported frameworks, across the supported fabrics.” As these Cascading-developed applications start moving to different fabrics, Driven will follow and provide the same type of troubleshooting and optimization capabilities.

The first release of Driven will focus on helping developers monitor, debug, and set alerts on their Hadoop apps. Later, Concurrent will add more operational capabilities to their Hadoop jobs, including setting service level agreements (SLAs), enforcing quality of service (QoS), and ensuring the integrity of data lineage.

The first beta release of Driven is available now as a cloud service. The service is free for development use. A paid enterprise version is in the works that will support production use and be installable on-premise; it’s expected in the second quarter.

Driven supports Cascading version 2.0.8 (the framework is currently at version 2.5) and works with popular domain-specific languages built on Cascading, including Lingual (ANSI SQL), Pattern (PMML), Scalding (Scala), and Cascalog (Clojure).

Why Hadoop Only Solves a Third of the Growing Pains for Big Data

Gary Nakamura, CEO, Concurrent, Inc.
January 27, 2014
http://insights.wired.com/profiles/blogs/why-hadoop-only-solves-a-third-of-the-growing-pains-for-big-data

Apache Hadoop has addressed two of the growing pains organizations face as they attempt to make sense of larger and larger sets of data in order to out-innovate their competition, but four more remain unaddressed. The lesson: Just because you can avoid designing an end-to-end data supply chain when you start storing data doesn’t mean you should. Architecture matters. Having a plan to reduce the cost of getting answers and simultaneously scale its utility to the broader organization means adding new elements to Hadoop. Fortunately, the market is addressing this need.

The Need for a Single Repository for Big Data

Traditional databases were not the repository we needed for big data, 80 percent of which is unstructured. Hadoop offered us, for the first time, the ability to keep all the data in a single repository, addressing the first big data growing pain. Hadoop is becoming a bit bucket that can store absolutely everything: tabular data, machine data, documents, whatever. In most ways, this is a great thing because data becomes more valuable when it is combined with other data, just like an alloy of two metals can create a substance that is stronger and more resilient. Having lots of different types of data in one repository is a huge long-term win.

No Standard Way to Create Applications to Leverage Big Data

Once we have all the big data in a single repository, the next trick is to create applications that leverage that data. Here again Hadoop has done a fine, if complicated, job. MapReduce provides the plumbing for applications that can benefit from a parallel divide-and-conquer strategy to analyze and distill data spread over the massive Hadoop Distributed File System (HDFS). YARN, a result of the refactoring of Hadoop 1 introduced in Hadoop 2, formalizes resource management APIs and allows more parallelization models to be used against a given cluster. This is all great, but it leaves one huge problem: Most of the people who have questions have no way to access the data in Hadoop. A financial analyst at Thomson Reuters or a buyer at Bloomingdale’s wants answers that their data can provide, but in the main these folks don’t know how to write programs that access Hadoop; it’s not their expertise.

What this leaves us with is a situation that is mighty familiar to people who have worked in large organizations that have implemented traditional Business Intelligence stacks. Questions are plentiful and programs that can help answer those questions are scarce. Pretty soon, there’s a huge backlog that only programmers can help with. All that wonderful stored data is longing to answer questions. All those analysts and businesspeople want answers. The complexity of creating applications hinders progress.

Most companies get stuck at this point, which I call the “dark valley of Hadoop.” The way out is to find a way to address the remaining growing pains.

The Gap between Analysts and Big Data

The third growing pain is to close the gap between the analysts and the data. Splunk’s Hunk is perhaps the most promising technology to deliver a true interactive experience. Especially powerful are Splunk’s capabilities for discovering the structure of machine data and other unstructured data on the fly. My view is that Drill and Impala are addressing a similar need. But it is a mistake to think of this interactivity as closing the entire gap because it’s really the missing productivity that must be addressed.

To accelerate the pace of creating Hadoop applications, comfortable and powerful abstractions must be delivered to power users, analysts, and developers, allowing them to express the work of the application at as high a level as possible. For that abstraction to work, it must let you create applications without being a Hadoop expert. For power users and analysts, domain-specific languages like Cascalog allow an application to be built by expressing a series of constraints on the data. Concurrent’s Lingual project allows applications to be expressed as SQL and then translated into Hadoop jobs. For developers, Concurrent’s Cascading library allows the application to be expressed in terms of APIs that hide the details of Hadoop. In addition to Hunk, Pentaho, Alpine Data, Teradata Aster, and Pivotal all offer higher-level ways to design and build applications. Effective higher-level abstractions are crucial to productivity.
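As a sketch of what the SQL route looks like in code, Lingual exposes Cascading through a JDBC driver, so a query written by an analyst is planned and executed as Hadoop jobs underneath. The snippet below is modeled on Lingual’s published JDBC examples as I understand them; the driver class and the “jdbc:lingual:hadoop” URL come from those examples, while the schema and table names are invented for illustration.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class LingualQuery {
      public static void main(String[] args) throws Exception {
        // Register the Lingual JDBC driver; queries run as Cascading flows on Hadoop underneath.
        Class.forName("cascading.lingual.jdbc.Driver");

        // The platform in the URL ("hadoop") and the example.orders table are illustrative assumptions.
        try (Connection connection = DriverManager.getConnection("jdbc:lingual:hadoop");
             Statement statement = connection.createStatement();
             ResultSet results = statement.executeQuery(
                 "SELECT customer_id, SUM(amount) AS total FROM example.orders GROUP BY customer_id")) {
          while (results.next())
            System.out.println(results.getString("customer_id") + "\t" + results.getString("total"));
        }
      }
    }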

In other words, the way to address this growing pain is to hide the complexity of Hadoop so that analysts can get work done without having to become Hadoop experts.

Problems in Processing Big Data

Big data is almost always turned from a raw metal into a valuable alloy through a process that involves many steps. To continue the metaphor, raw metal must be refined before it is forged. This refining is costly, and it is where Hadoop excels. Analysts can ask and answer certain questions using an interactive system, but the data must be cleansed beforehand, resulting in a complex upstream workflow. Typically these workflows are built in a brittle fashion and are difficult to test and debug. Most frequently, the workflow is applied to a new dataset and the results are a machine-learning model or a set of analytics. Any benefit from loading and subsequently indexing the data in interactive tools is lost in these cases.

The fourth growing pain is managing the execution of a cascading chain of workloads or applications. This can be tricky for many reasons. If you cobble together lots of applications in a complex chain using a wildly varying set of tools within arm's reach, something data scientists love to do, you end up with a complicated workflow, or, to use a better phrase, a business data process. If something goes wrong in one of the intermediate applications, it is often impossible to figure out how to fix the process; you have to start the whole thing over. Tools such as Pig, Sqoop, and Oozie offer alternative ways to express the problem, but ultimately do not fix the underlying issue.

The right way to manage the cascading chain of data-oriented applications is, again, to hide the problem through abstraction. The higher-level abstractions mentioned in growing pain 3 should be generating the complex chain of Hadoop jobs and managing their execution. Through that indirection it is fine to combine a number of Hadoop applications or workloads in a chain; but when your business process depends on dozens of steps, all knit together by hand, something is bound to go wrong.
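Cascading’s own answer here is worth a concrete sketch: its Cascade scheduler takes several flows, orders them topologically from their source and sink taps, and runs them in dependency order, so the chain of Hadoop jobs is generated and managed rather than knit together by hand. The cleanseFlow and modelFlow names below are hypothetical flows, with the first writing the tap that the second reads.

    import cascading.cascade.Cascade;
    import cascading.cascade.CascadeConnector;
    import cascading.flow.Flow;

    public class BusinessDataProcess {
      // Illustrative only: wire two already-defined flows into one managed unit of work.
      public static void run(Flow cleanseFlow, Flow modelFlow) {
        // The Cascade infers the execution order from overlapping source and sink taps
        // and will skip a flow whose sinks are already newer than its sources.
        Cascade cascade = new CascadeConnector().connect(cleanseFlow, modelFlow);
        cascade.complete(); // blocks until the whole chain succeeds or fails
      }
    }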

One Size Does Not Fit All

Today Hadoop is virtually synonymous with big data. In reality, Hadoop is not all things to all people, nor is it all things to big data. Business demands for more timely insight have introduced lower-latency requirements, along with newer, less mature technologies to meet them and glean insight from sources such as real-time streaming data. As they face these requirements, users are cast back into the darkness, trying once again to make sense of when they should use these new tools and how they should glue them together to form a coherent big data architecture.

From Data for Business to the Business of Data

Looking down the road a bit, there is a progression in maturity for using data. Many organizations derive business insights from their data. Once you see that your data is actually transformative to your business, you have hit a new level of maturity. More and more organizations are realizing that they are actually in the business of data. At this point, the growing pain becomes acute, particularly if you see in horror that the tools you have chosen to assemble your data workflows are nothing but a house of cards.

The most sophisticated companies using big data, and you know who I mean, have been aggressive about solving all these growing pains. These companies, like yours, don’t want talented people struggling with Hadoop. Instead, they want them focused on the alchemy of data: seeing what the data has to say, putting those insights to work, and inserting them into their business processes.

The masters of big data have created a coherent architecture that allows the integration of new tools. They have created robust big data workflows and applications that support the transition from deriving insights from big data to being in the business of data. That’s the end game, the real value from big data that makes all of the growing pains worthwhile.

What to Expect from Big Data in 2014

Nick Kolakowski, Slashdot
December 30, 2013
http://slashdot.org/topic/bi/what-to-expect-from-big-data-in-2014/

Hadoop will continue to reign; more clients will demand the latest in encryption; and a lot of analytics firms could fail.

As 2013 stumbles across the finish line, technology publications are busy issuing their inevitable Year’s End lists, in which the Best and Worst of everything is carefully delineated (BlackBerry’s failing! Apple continued to produce shiny things! Ballmer’s leaving Microsoft!). And that’s all well and good, but Big Data is all about the future—and with that in mind, here are some mild predictions for 2014:

Hadoop’s Reign Continues

Apache Hadoop, the open-source framework for running data applications on large hardware clusters, enjoyed considerable momentum in 2013: not only did Hadoop 2.2.0 reach its general-release milestone (improvements included more integration with other open-source projects, as well as YARN, a general-purpose resource management system that makes it easier to work with MapReduce and other frameworks), but a variety of prominent firms released their own distributions of the platform—all but ensuring Hadoop will rumble strong through 2014.

But continuing popularity or no, not every company will use Hadoop effectively. “More Hadoop projects will be swept under the rug as businesses devote major resources to their Big Data projects before doing their due diligence, which results in a costly, disillusioning project failure,” Gary Nakamura, CEO of Concurrent, wrote in a December blog posting. “We may not hear about most of the failures, of course, but the successes will clearly demonstrate the importance of using the right tools.”

For those companies that can accurately assess whether Hadoop is the right tool for their business needs, the improvements to the platform could translate into more effective data-crunching—and better insights. But Hadoop’s continued prominence raises still another issue.

Open-Source vs. Proprietary

Open-source has long remained a double-edged sword for firms that build analytics software. Back in Ye Olden Days of mid-2012, for example, research firm IDC suggested that the availability of open-source solutions could very well hamper the ability of proprietary software platforms to fully capitalize on the analytics space. “The Hadoop and MapReduce market will likely develop along the lines established by the development of the Linux ecosystem,” Dan Vesset, vice president of Business Analytics Solutions for IDC, wrote in a statement at the time. “Over the next decade, much of the revenue will be accrued by hardware, applications, and application development and deployment software vendors.”

Open-source’s strength in the analytics space hasn’t dissuaded a broad cross-section of tech giants and startups from building and releasing proprietary software into the ecosystem. But those companies’ solutions will need to push back against open-source—and rival platforms—in order to scratch out any sort of market share and profit; a good number of them will fail, including several companies that were teetering on the brink in 2013.

Data Encryption

Over the summer, government-contractor-turned-whistleblower Edward Snowden revealed the existence of massive National Security Agency (NSA) programs designed to harvest vast amounts of online data. Those revelations, in turn, have sparked everything from legal battles to promises of government review.

Whether or not the NSA actually rolls back its programs, the newfound awareness of widespread online surveillance will drive more companies to encrypt all their data—and not only in storage, but also in movement from Point A to B. That will spark a rise in costs (encryption is expensive, after all), and perhaps slow some everyday processes down (as encryption adds another layer to most operations)—but with more and more clients demanding the latest in privacy and security, vendors may have no choice but to armor their infrastructure as extensively as possible.

Ease of Use…

Data analytics tools: they’re not just for data scientists anymore. Over the next year, the trend of making those tools easier to use for people with relatively little analytics knowledge will continue, with simplified dashboards and greater integration of multiple types of data.

…But Issues Remain

Facebook and Netflix wouldn’t be operational without custom data infrastructure painstakingly developed by their in-house data scientists; IBM is using its Watson supercomputing platform to assist in everything from retail to healthcare IT; dozens of firms use analytics to improve everything from shipping and logistics to customer service.

But for many firms, the jury’s still out about the ultimate effectiveness of analytics. Back in August, The New York Times asked if Big Data was “an economic big dud,” because U.S. economic productivity had slowed over the past few years despite a massive growth in the amount of data held by individuals and corporations. That might have been overstating things—many analytics tools have only begun to penetrate the market in a meaningful way, suggesting that only a small percentage of firms are leveraging their datasets to fullest effect—but it’s undeniable that, when it comes to various industries, analytics still has something to prove. 2014 might be the year that Big Data actually makes more of a name for itself.

Cascading Enterprise Developer Training

Due to popular demand, we’re happy to announce the availability of our Cascading Enterprise Developer Training course. Whether you’re just getting started with Cascading or want to take your Cascading chops to the next level, this course was designed for you.

This comprehensive course covers topics ranging from basic to advanced Cascading concepts and best practices, all reinforced by hands-on labs.

Learn more at: http://www.cascading.io/services/training

Big Data In 2014: 6 Bold Predictions

Jeff Bertolucci, Information Week
December 23, 2013
http://www.informationweek.com/big-data/big-data-analytics/big-data-in-2014-6-bold-predictions/d/d-id/1113091

‘Tis the season when temperatures tumble, shoppers stumble, and prognosticators fumble, often. Will these big data prophecies come true?

How will big data evolve in 2014? The future is anyone’s guess, of course, but we thought we’d compile a tasty holiday assortment of prognostications from executives working in the big data trenches. So without further delay, here they are — six big data predictions for next year:

1. “More Hadoop projects will fail than succeed.”
That scary assessment is from Gary Nakamura, CEO of Concurrent, a big data application platform company. In a December 12 blog post, Nakamura made a few 2014 forecasts, including this not-so-rosy assessment of Hadoop:

“More Hadoop projects will be swept under the rug as businesses devote major resources to their big data projects before doing their due diligence, which results in a costly, disillusioning project failure. We may not hear about most of the failures, of course, but the successes will clearly demonstrate the importance of using the right tools. The right big data toolkit will enable organizations to easily carry forward the success of these projects, as well as further insert their value into their business processes for market advantage.”

2. Enterprises will focus less on big data and more on stepping up their data management game. 
“There’s no doubt that companies’ pursuits of big data initiatives have the best intentions to improve operational decision making across the enterprise. That being said, companies shouldn’t get stuck on the term ‘big data.’ The true initiative and what they ultimately need to be concerned with is how they’re implementing better data management practices that account for the variety and complexity of the data being acquired for analysis,” Scott Schlesinger, a senior vice president for consulting and outsourcing giant Capgemini, told InformationWeek via email.

3. The pace of big data innovation in the open-source community will accelerate in 2014. 
“New open-source projects like Hadoop 2.0 and YARN, as the next-generation Hadoop resource manager, will make the Hadoop infrastructure more interactive. New open-source projects like STORM, a streaming communications protocol, will enable more real-time, on-demand blending of information in the big data ecosystem,” wrote Quentin Gallivan, CEO of business analytics software firm Pentaho, in a December 5 blog post.

4. The need for automated tools will become increasingly critical.
“It seems that the more data we have, the more we want,” John Joseph, VP of product marketing for analytics software firm Lavastorm Analytics, told InformationWeek via email. “But as data volumes increase, the need for pattern matching, simulation, and predictive analytics technologies become more crucial. Engines that can automatically sift through the growing mass of data, identify issues or opportunities, and even take automated action to capitalize on those findings will be a necessity.”

5. Beware, Oracle! 2014 will be the year of SQL on Hadoop. 
“I think you’ll see people start building interactive applications on the Hadoop infrastructure. And what I mean by that — and I think this is probably the most controversial thing — is that people will start replacing their first-generation relational databases with SQL on Hadoop,” said Monte Zweben, CEO of SQL-on-Hadoop database startup Splice Machine, in a phone interview with InformationWeek.

6. Big data flies to the cloud. 
“Big data has gained a lot of traction in 2013 but complex technologies are keeping many businesses from getting their solutions into production and generating a positive ROI. In 2014, businesses will look beyond the hype and turn to cloud solutions that generate fast time to value and do not require highly specialized dedicated skill sets, like Hadoop, to manage. 2014 will be the year that big data moves from buzzword to business imperative,” said Sandy Steier, cofounder and CEO of cloud-based analytics firm 1010data, via email.