All posts by Kim Loughead

EPAM Partners With Concurrent Inc. to Leverage Big Data for the Enterprise

Newtown, PA — February 28, 2014 — EPAM Systems, Inc. (NYSE:EPAM), a leading provider of complex software engineering solutions and a leader in Central and Eastern European IT service delivery, announced its partnership with Concurrent, Inc., the enterprise Big Data application platform company and developer of Cascading, the most widely used and deployed application platform for building, deploying and managing robust Big Data applications.

Backed by more than six years of supporting large-scale Big Data initiatives in the enterprise, Cascading is a widely deployed platform for scalable Big Data applications that opens data assets to more users within an organization, including data scientists, analysts and engineers. Leveraged by EPAM, the Cascading framework will help companies achieve faster time-to-market and deliver repeatable solutions built on a standard infrastructure, with flexible support for all popular Hadoop distributions. With deep expertise in software development and a large pool of talented engineers, data scientists and solution architects, EPAM is ideally positioned to help its customers gain immediate value from their growing data assets.

“We rely on EPAM’s extensive pool of software engineers and architects who are continuously trained to be experts in delivering Big Data solutions,” says Gary Nakamura, CEO, Concurrent, Inc. “Together, Concurrent and EPAM will jointly contribute to our customers’ success in Big Data projects with powerful resources.”

“We recognize Concurrent’s continuing innovation in Big Data and Data Science and the value their technology brings to the industry,” states Sam Rehman, Chief Technology Officer, EPAM Systems. “Concurrent products are a great addition to the toolkit of EPAM engineers, enriching the data solutions we provide to our customers. Our clients often ask us for help in transitioning to the new technology stack, and Cascading simplifies that learning curve very effectively.”

About Concurrent Inc.

Concurrent, Inc. delivers the #1 application development platform for Big Data applications. Concurrent builds application infrastructure products that are designed to help enterprises create, deploy, run and manage data applications at scale on Apache Hadoop™.

Concurrent is the team behind Cascading™, the most widely used and deployed technology for Big Data applications with more than 130,000 user downloads a month. Used by thousands of businesses including Twitter, eBay, The Climate Corp and Etsy, Cascading is the de-facto standard in open source application infrastructure technology.

Concurrent is headquartered in San Francisco and online at http://concurrentinc.com.

About EPAM Systems

Established in 1993, EPAM Systems, Inc. (NYSE: EPAM) provides complex software engineering solutions through its award-winning Central and Eastern European service delivery platform. Headquartered in the United States, EPAM employs approximately 9,300 IT professionals and serves clients worldwide from its locations in the United States, Canada, UK, Switzerland, Germany, Sweden, Netherlands, Singapore, Belarus, Hungary, Russia, Ukraine, Kazakhstan, Poland, Hong Kong and Australia.

EPAM is ranked #2 on the 2013 Forbes list of America’s Best Small Companies: 20 Fast-Growing Tech Stars and is recognized among the leaders in software product development services by Forrester and Zinnov analysts. The company is also included in the top 30 in IAOP’s “The 2013 Global Outsourcing 100” list.

For more information, please visit www.epam.com.

Forward-Looking Statements

This press release includes statements which may constitute forward-looking statements made pursuant to the safe harbor provisions of the Private Securities Litigation Reform Act of 1995, the accuracy of which are necessarily subject to risks, uncertainties, and assumptions as to future events that may not prove to be accurate. Factors that could cause actual results to differ materially from those expressed or implied include general economic conditions and the factors discussed in our most recent Annual Report on Form 10-K and other filings with the Securities and Exchange Commission. EPAM undertakes no obligation to update or revise any forward-looking statements, whether as a result of new information, future events, or otherwise, except as may be required under applicable securities law.

Media Contact

EPAM Systems, Inc.
Joseph King, Vice President of Marketing
Phone: +1-267-759-9000 x56790
Fax: +1-267-759-8989
prinquiry@epam.com

What Is Data Flow and Why Should You Care?

Eric Kavanagh, Inside Analysis
February 24, 2014
http://insideanalysis.com/2014/02/what-is-data-flow-and-why-should-you-care/

What goes around surely comes back around, which in the world of data is often called lifecycle management. To be blunt, very few organizations have ever formalized and implemented such a grandiose practice, but that’s not a pejorative statement, for only recently has the concept become seriously doable without great expense.

Lifecycle management means following data from cradle to grave, or more precisely, from acquisition through integration, transformation, aggregation, crunching, archiving, and ultimately (if ever) deletion. That last leg is often a real kicker, and has entered the spotlight largely in the context of eDiscovery, which tends to be discussed in the legal arena – too much old data lying around can become a definite liability if some lawyer can use it against you.

But there’s a new, much more granular version of lifecycle management circulating these days, and it’s described by Dr. Robin Bloor as Data Flow. In fact, he even talks of a data flow architecture which can be leveraged to get value from data long before it ever enters a data warehouse (if it ever even does). Data Flow in this context can be a really big deal, because it can deliver immediate value without ever beating at the door of a data warehouse.

Data streams embody one of the hotter trends in data flow. Streams are essentially live feeds of data spinning out of systems of all kinds – air traffic control data, stock ticker data, and a vast array of machine-generated data, whether in manufacturing, healthcare, Web traffic, you name it. Several innovative vendors are focused intently on data streams these days, including IBM, SQLstream, Vitria and ExtraHop Networks. The use cases typically revolve around the growing field of Operational Intelligence.

Data Flow Oriented Vendors

Finding ways to effectively visualize data flows can be a real treat. Some of the most talked-about vendors these days have worked to provide windows into the world of data flow – or at least basic systems management – primarily using so-called big data. Both Cloudera and Hortonworks have built their enterprises on the shoulders of Apache Hadoop, the powerful if somewhat cryptic engine that has the entire world of enterprise software in a genuine tizzy.

But there are a few other vendors who have excelled in the domain of providing detailed visibility into how data flows in certain contexts. The first that comes to mind is Concurrent, which just unveiled its Driven product. This offering is almost like the enterprise data version of a glass-encased ant farm. Remember those from childhood days? You could actually watch the ants build their tunnels, take care of business, cruise all around, get things done. With Driven, this is systems management 3.0 – you can actually see how the data moves through applications, where things go awry, and thus fine-tune your architecture.

Another vendor that talks all about data flow is Actian. Formerly known as Ingres, the recently renamed vendor went on an acquisition spree, folding ParAccel, Pervasive Software and Versant into its data-oriented portfolio of products. Mike Hoskins, once the CTO of Pervasive and now the CTO of Actian, can be credited with having had the vision years ago to build a parallel data flow platform, which originally went by the name of DataRush but is now simply referred to as Actian DataFlow. Actian’s view of the Big Data landscape involves Hadoop as a natural data collection vehicle and its DataFlow product as a means of either processing Hadoop data in situ or flowing it (also employing its data integration products) to an appropriate data engine, of which it has several, including Matrix (a scale-out analytical engine once known as ParAccel) and Vector (a scale-up engine).

And then there’s Alpine Data Labs, which offers a cloud-based data science workbench. The collection of Alpine offerings, some of which derive from Greenplum Chorus, provides a wide range of functionality for doing all things data: modeling, visualizing, crunching, analyzing. And when you push the big red button to make the magic happen, you get a neat visual display of where the process is at any given point. This is both functional and didactic, helping aspiring data scientists better understand what’s happening where.

Like almost all data management vendors, Alpine touts a user-friendly, self-service environment. That said, the “self” who serves in such an atmosphere needs to be a very savvy information executive, someone who understands a fair amount about all the nuts and bolts of data lifecycle management. And though Alpine also talks of no data movement, what they really mean by that is data movement in the old ETL-around-the-mulberry bush sense. You still need to move data into the cloud, and set your update schedule, which incidentally runs via REST.

Of course, in a certain sense, there’s not too much new under the sun. After all, data flow happened in the very earliest information systems, even in the punch card era. But these days, the visibility into that movement will provide game-changing awareness of what data is, does, and can be.

Concurrent aims to smash the Hadoop “black box”

Lucy Carey, JAXenter
February 20, 2014
http://jaxenter.com/concurrent-aims-to-smash-the-hadoop-black-box-49506.html

Traditionally, one of the biggest productivity snags for app developers has been the Hadoop “black box” scenario, which makes it difficult to decipher what’s actually going on inside their project. Stepping up to solve this problem are Concurrent, with a (currently beta) solution called Driven, billed as “the industry’s first application performance management product” for Big Data apps.

Concurrent are probably best known for Cascading – a highly extensible Java app framework that makes it quick and easy for devs to build rich Data Analytics and Data Management apps that can be deployed and managed across diverse computing environments. CEO Gary Nakamura describes it primarily as “a tool that does the heavy lifting and converts your application logic built on Cascading into MapReduce jobs so that the developer doesn’t have to deal with the low level assembly language of Hadoop.”

On top of this, it delivers a computation engine, systems integration, data processing and scheduling capabilities through common interfaces, and runs on all popular Hadoop distributions – though it’s also capable of extending beyond the elephant ecosphere.
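
To make that description concrete, the sketch below shows what a small Cascading application looks like in Java. It is adapted from the canonical word-count example in the Cascading documentation (Cascading 2.x API; file paths and field names are placeholders): the developer assembles taps and pipes, and the HadoopFlowConnector plans and submits the equivalent MapReduce jobs.

```java
import java.util.Properties;

import cascading.flow.Flow;
import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.operation.aggregator.Count;
import cascading.operation.regex.RegexSplitGenerator;
import cascading.pipe.Each;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.property.AppProps;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

public class WordCount {
  public static void main(String[] args) {
    String docPath = args[0]; // HDFS path to tab-delimited text with a "text" column
    String wcPath = args[1];  // HDFS output path for the token counts

    Properties properties = new Properties();
    AppProps.setApplicationJarClass(properties, WordCount.class);

    // Source and sink taps: where tuples are read from and written to.
    Tap docTap = new Hfs(new TextDelimited(true, "\t"), docPath);
    Tap wcTap = new Hfs(new TextDelimited(true, "\t"), wcPath);

    // Pipe assembly: split each "text" value into tokens, group them, count them.
    Fields token = new Fields("token");
    Fields text = new Fields("text");
    Pipe docPipe = new Each("token", text, new RegexSplitGenerator(token, "[ \\[\\]\\(\\),.]"), Fields.RESULTS);
    Pipe wcPipe = new Pipe("wc", docPipe);
    wcPipe = new GroupBy(wcPipe, token);
    wcPipe = new Every(wcPipe, Fields.ALL, new Count(), Fields.ALL);

    // The connector plans this logical assembly into MapReduce jobs and runs them.
    FlowDef flowDef = FlowDef.flowDef()
        .setName("wc")
        .addSource(docPipe, docTap)
        .addTailSink(wcPipe, wcTap);

    Flow wcFlow = new HadoopFlowConnector(properties).connect(flowDef);
    wcFlow.complete();
  }
}
```

Nothing in the assembly references mappers or reducers directly; that translation is exactly the “heavy lifting” Nakamura describes.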

Although Hadoop may have lost some of its lustre in recent months, Gary is positive about the future of the technology, believing that the dimming of the “buzz” around it is actually a good thing.

According to Gary, “Hadoop is maturing, and the user behavior is maturing along with it. Enterprises are taking a more pragmatic approach to formulating their data strategy and looking for the right tools to ensure success.” Whilst the Hadoop ecosystem is still quite “convoluted and confusing”, he believes that “those that have taken a deliberate and informed approach are the ones that are succeeding.”

He adds that, “enterprises have moved on to building data applications on their Hadoop investments and are driving business process and strategy through these data applications.” It’s this new wealth of data and applications that will underpin innovation and business advantage going forward, he affirms, “not necessarily the fabric that an application runs on.”

For this reason, Concurrent are confident that 2014 is the year for enterprise data applications to come to the fore, as Hadoop-based apps become “business critical and essential for businesses to move forward.”

Cascading is well placed for this scenario, with a number of differentiators from rival offerings to give devs that all-important productivity edge when building robust data applications.

As Gary puts it, “With Cascading, instead of becoming an expert in MapReduce, you can now leverage existing skill sets such as enterprise Java, Scala, SQL, R, etc.” Moreover, he says, Cascading’s local mode enables test-driven development practices where developers can efficiently test code and processes on their laptops before deploying on a Hadoop cluster.
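
As a sketch of that local-mode workflow (placeholder paths, and a deliberately simplified assembly that counts pre-tokenized text), the same style of pipe assembly can run entirely in-process by swapping the Hadoop taps and connector for their local counterparts:

```java
import cascading.flow.Flow;
import cascading.flow.FlowDef;
import cascading.flow.local.LocalFlowConnector;
import cascading.operation.aggregator.Count;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.scheme.local.TextDelimited;
import cascading.tap.Tap;
import cascading.tap.local.FileTap;
import cascading.tuple.Fields;

public class WordCountLocal {
  public static void main(String[] args) {
    Fields token = new Fields("token");

    // Local taps read and write plain files on the developer's machine;
    // no HDFS or cluster is involved.
    Tap tokenTap = new FileTap(new TextDelimited(token, true, "\t"), "src/test/data/tokens.tsv");
    Tap wcTap = new FileTap(new TextDelimited(true, "\t"), "build/test-output/wc.tsv");

    // Same assembly style as the Hadoop version: group tokens and count them.
    Pipe wcPipe = new Pipe("wc");
    wcPipe = new GroupBy(wcPipe, token);
    wcPipe = new Every(wcPipe, Fields.ALL, new Count(), Fields.ALL);

    FlowDef flowDef = FlowDef.flowDef()
        .setName("wc-local")
        .addSource(wcPipe, tokenTap)
        .addTailSink(wcPipe, wcTap);

    // The local planner executes the flow in-process, so it can be asserted
    // against from a unit test in seconds.
    Flow flow = new LocalFlowConnector().connect(flowDef);
    flow.complete();
  }
}
```

Because the local planner needs no cluster, an assembly like this can be driven from an ordinary unit test, which is what enables the test-driven development practice described above.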

On to the newer offering – Driven, which was inspired by “years of painstaking experience with building enterprise data applications.” Its raison d’être is to make the process of developing, debugging and managing Cascading apps that bit more painless, as well as allowing for easy management of production data applications.

It’s been a long time in the making, with the key challenge centering on waiting for the market to mature enough for it to be a viable product, as opposed to any specific development challenges. For Concurrent, the magic time for “broad applicability” of Driven has now come.

Gary cites the key “Driven difference” as being a significantly faster timeline – up to ten times faster, in fact – for enterprise data applications to reach production. With Driven’s real-time app visualisation, instant app performance analytics, as well as data management and monitoring capabilities, Gary thinks that users will find they have “unprecedented visibility into your applications that don’t exist in the Hadoop ecosystem today.” And, most crucially, a way to circumvent the aforementioned “black box” scenario.

Although Concurrent doesn’t currently see any direct competitors for their tools, there are “indirect entities” around which they compete – for example, the open-source tools in the Apache Hadoop ecosystem.

Much of Cascading’s popularity to date is something that Gary puts down to the “attrition” of developers using Pig and Hive. For now, the hope is that devs will ultimately adopt Cascading and Driven (once it moves out of beta status) as a tandem productivity enhancement solution, continuing this upward trajectory.

Event | Concurrent, Inc. to Present at DeveloperWeek San Francisco 2014

Concurrent’s Alexis Roos to Deliver Session on “Pattern: An Open Source Project for Creating Complex Machine Learning Applications”

SAN FRANCISCO – Feb. 13, 2014 – Concurrent, Inc., the enterprise Big Data application platform company, today announced that Alexis Roos, senior solutions architect, will deliver a talk titled “Pattern: An Open Source Project for Creating Complex Machine Learning Applications” at DeveloperWeek San Francisco. Taking place Feb. 15-21, the conference is the first developer event that brings together thousands of developers to explore, learn and build new skills, applications, startups and product features.

Details At-A-Glance

What:  “Pattern: An Open Source Project for Creating Complex Machine Learning Applications”
Who: Alexis Roos, senior solutions architect, Concurrent, Inc.
When: Tuesday, Feb. 18 at 10 a.m. PST
Where: Workshop Room 2, Terra Gallery, 511 Harrison St., San Francisco
How: Register at http://developerweek.com/register/

Session Description

Cascading Pattern is an open source project that takes models trained in popular analytics frameworks, such as SAS, MicroStrategy, SQL Server, etc., and runs them at scale on Apache Hadoop. With Pattern, developers can use a Java API to create complex machine learning applications, such as recommenders or fraud detection. Pattern effectively lowers the barrier to adoption of Apache Hadoop because developers can use existing skill sets to immediately begin building these complex applications.

In this presentation, Concurrent, Inc.’s Alexis Roos will provide sample code showing applications that use predictive models built in SAS and R, such as anti-fraud classifiers. Additionally, Alexis will compare variations of models for enterprise-class customer experiments.
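
The session’s own sample code is not reproduced here, but the published Cascading Pattern examples follow roughly the shape sketched below: a model exported as PMML from the originating analytics tool is handed to a planner, which generates the scoring flow that runs on Hadoop. Paths and field names are placeholders, and class or method names may differ between Pattern releases.

```java
import java.io.File;
import java.util.Properties;

import cascading.flow.Flow;
import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.pattern.pmml.PMMLPlanner;
import cascading.property.AppProps;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

public class Classify {
  public static void main(String[] args) {
    String pmmlPath = args[0];     // model exported as PMML from SAS, R, etc.
    String inputPath = args[1];    // records to score, on HDFS
    String classifyPath = args[2]; // output path for the scored records

    Properties properties = new Properties();
    AppProps.setApplicationJarClass(properties, Classify.class);

    Tap inputTap = new Hfs(new TextDelimited(true, "\t"), inputPath);
    Tap classifyTap = new Hfs(new TextDelimited(true, "\t"), classifyPath);

    FlowDef flowDef = FlowDef.flowDef()
        .setName("classify")
        .addSource("input", inputTap)
        .addSink("classify", classifyTap);

    // The planner reads the PMML document and expands it into the Cascading
    // assembly that evaluates the model against every incoming tuple.
    PMMLPlanner pmmlPlanner = new PMMLPlanner()
        .setPMMLInput(new File(pmmlPath))
        .retainOnlyActiveIncomingFields()
        .setDefaultPredictedField(new Fields("predict", Double.class));

    flowDef.addAssemblyPlanner(pmmlPlanner);

    Flow classifyFlow = new HadoopFlowConnector(properties).connect(flowDef);
    classifyFlow.complete();
  }
}
```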

Extended abstract available at: http://sched.co/1fuIriN

About the Speaker

Alexis Roos is a senior solutions architect focusing on Big Data solutions at Concurrent, Inc. He has more than 18 years of experience in software and sales engineering, helping both Fortune 500 firms and start-ups build new products that leverage Big Data, application infrastructure, security, databases and mobile technologies. Previously, Alexis worked for Sun Microsystems and Oracle for more than 13 years, and has also spent time at Couchbase and several large systems integrators in Europe. Alexis has spoken at dozens of conferences and university courses, and holds a Master’s degree in computer science with a cognitive science emphasis.

About Concurrent, Inc.

Concurrent, Inc. delivers the #1 application development platform for Big Data applications. Concurrent builds application infrastructure products that are designed to help enterprises create, deploy, run and manage data applications at scale on Apache Hadoop™.

Concurrent is the team behind Cascading™, the most widely used and deployed technology for Big Data applications with more than 130,000 user downloads a month. Used by thousands of businesses including Twitter, eBay, The Climate Corp and Etsy, Cascading is the de-facto standard in open source application infrastructure technology.

Concurrent is headquartered in San Francisco and online at http://concurrentinc.com.

Media Contact
Danielle Salvato-Earl
Kulesa Faul for Concurrent, Inc.
(650) 922-7287
concurrent@kulesafaul.com

We’re All in the Data Business

Gary Nakamura, CEO, Concurrent, Inc.
February 11, 2014
http://recode.net/2014/02/11/were-all-in-the-data-business/

Imagine finding out that your headquarters is sitting on a diamond mine. But you’re an architectural firm, oil company, or a commercial real estate company — what do you know about diamonds?

Data is like that. Simply put, no matter what kind of company you are, you’re in the data business and you’re sitting on a kind of mine, whether or not you’ve tapped it. “You’ve got us mixed up with Twitter,” you might say. But hear me out. Every enterprise collects data. Facebook, Google and Twitter are clearly in the business of data. These organizations have mastered wrangling the complexity of data and turning it into products or services.

Smaller organizations are now looking at their data in new ways. Organizations that recruit employees, as well as temp agencies, may find they can sell data about employment trends to others who are looking for more sources of information about this key economic indicator.

For startups, monetizing data is a common strategy. Kaggle runs contests for analyzing big data. In the process, it created one of the largest collections of data about active data scientists in the world, an asset that can be used in dozens of ways.

A Japanese firm found that by looking at elevator activity from a building-automation system they could predict the likelihood of lease renewals. Building-automation data is being used for dozens of other money-saving purposes.

As more and more “things” like elevators become smart, more data will arrive, and the obvious question becomes what value can be obtained from it. Figuring out how you will profit from data means looking at your data from new angles and determining how the data you have is valuable to you or to your partners.

Mashing up data sources for business insight

You have more data than you think you do. Twitter, for all its servers, is no New York Stock Exchange; it’s also not a chemical plant, which generates more data in a day than Twitter does in a month. The enterprise grabs its data from an astounding number of sources: from Salesforce.com or its CRM system, its back-office systems such as SAP or Oracle, not to mention its downloads, click-throughs, Facebook “Likes” and Twitter followers. If you’re a manufacturer, you can pile on data from the factory-automation system and remote sensors. If you’re a retailer, you can add data from your point-of-sale and warehouse management systems. The key point here is not just having so many rich data sources, but creating new alloys by mashing up those data sources in different ways to increase their business value.

Data-driven decisions = data monetization

Any enterprise that uses data to drive decisions is monetizing its data. In 2011, MIT researchers analyzed 179 publicly traded companies to find that data-driven decision-making (versus relying on a leader’s gut instincts) translated into about five percent higher productivity and profit. I’m surprised those numbers aren’t higher. Those same companies tended to score better in performance measures like asset utilization, return on equity, and market value, which are all forms of monetization.

Hard goods, big data

Manufacturers understand the value data can bring and, according to IDC, 43 percent of manufacturers are actively designing automated, connected factories of the future. Six in 10 manufacturers expect their production processes to be mostly or completely digitized in the next five years. The key in all this is not just the automation of production processes, but the ability to “listen” to all the data the machines in the factories are generating. Manufacturers can then aggregate that machine data, along with other data sources, for everything from optimizing maintenance schedules to saving energy costs to creating a more resilient supply chain. In other words, automation saves money, but it also generates voluminous data that can be used to create fine-grained operational models. Viewing that information in new ways can lead not only to new insights but also to new lines of business.

General Electric is in the data business. The company once known for “Bringing Good Things to Life” with consumer products is now leading with what it calls “The Industrial Internet,” aimed at connecting intelligent machines (like its own gas and steam turbines) with advanced analytics. The payoff is in preventing downtime and eliminating unnecessary labor. GE estimates that it costs the industry 52 million labor hours and more than $7 billion a year to service the gas and steam turbines at work across 56,000 power plants. Servicing machines that are in perfect working order wastes much of that labor. Sensor data tells them which machines are running outside normal ranges (for example, for temperature and vibration) and need servicing. Apply data like that to commercial aircraft, fleet vehicles, conveyors, medical equipment like MRI scanners, and the savings are enormous — especially if you eliminate a costly breakdown.

The new competitive landscape is data

Data is the new competitive landscape, and how we can use data in new ways is becoming clearer all the time. If we think of data as a product, then certainly Amazon, Google, Facebook and Twitter come to mind — they aggregate and sell user data for geospatial advertising (among other uses).

But data can be a product for internal use, as well. A bank client of my company, Concurrent, creates numerous products using its data, both customer-facing products and internal products geared toward risk management, compliance and trading policies. That bank assigns product managers to both types of products — which sounds novel for data products, but I predict that it will become the rule, not the exception. These data product managers bridge business requirements, imperatives and processes with their data. As more organizations recognize that they are in the business of data, data products will need to be managed like any other product line.

Trading data for data

Another way that companies are in the business of data is by trading data that they have — partial data sets — for more coherent data sets. One example is Jigsaw, the crowdsourced database of companies and contacts. “No more renting or buying costly company directory lists,” Jigsaw promises (presumably from costly services like Bloomberg). Instead, you share the contact information you have, and receive credits toward a more complete data set. A second example is Factual, with location-based data for mobile personalization and ad targeting. User companies contribute their own consumer location data for access to bigger data sets and to product data on more than 650,000 consumer packaged goods. (That’s data that you historically paid for with a Universal Product Code (UPC) membership.) Factual promises that companies can “share and mash open data on any subject,” and share and mash they do. Both Jigsaw and Factual offer premium services like data cleansing, but the price of admission is your partial data set.

Learning to mine big data

Enterprises of all kinds will only grow more data-rich, meaning there are more riches to be found. If data is a kind of rich repository (like diamonds), then it requires both crude and refined tools. Hadoop is like dynamite, just the first tool in mining diamonds. Yes, it can aggregate data into a whole, but none of those data leaders (like Amazon or GE) installs Hadoop and declares its job done. They are developing testable, reusable data processing applications that turn data into products they can use.

It will take creativity and business acumen to look at the data we have and begin to imagine its uses, either for us or for others. What I can tell you is that you are sitting on that mine. Exactly how that data is valuable to you is the question that I invite you to consider. Since we’re all in the business of data, I guarantee that the answers to that question will be worth the time you take to ponder it.

What you missed in Big Data: New solutions shift analytics landscape

Maria Deutscher, SiliconANGLE
February 10, 2014
http://siliconangle.com/blog/2014/02/10/what-you-missed-in-big-data-new-solutions-shift-analytics-landscape/

It’s been an exciting month for the Big Data ecosystem, with Cloudera repackaging its portfolio in an effort to make Hadoop more accessible to organizations of different sizes. Two emerging players also made headlines with new solutions for developing analytical applications.

Announced on Monday, the Cloudera Enterprise Data Hub combines the vendor’s open source distribution with a set of “advanced components” that consists of HBase, the Impala structured query engine and homegrown search and auditing tools. The offering also includes across-the-board support services and integration with Spark, an in-memory alternative to MapReduce that executes queries up to 100 times faster.

Organizations that don’t need all this extra functionality can opt for the Flex Edition, a slimmed-down version of the Enterprise Data Hub that comes with just one premium component. For the most economy-minded organizations, Cloudera is offering a Basic Edition that features only the core Hadoop distro plus support.

Over on the NoSQL front, Orchestrate launched a cloud-based service that allows developers to use a single API for accessing and managing information across different databases. The firm says that this abstraction reduces complexity and empowers users to drive more value from their data. The platform supports text search, graph, activity feed, time-ordered event and key-value queries, with more types to come.

Concurrent is taking a different approach to streamlining the development of data-driven apps. The company last week introduced a free performance monitoring tool that aims to simplify the maintenance of software built using Cascading, an open source Java framework for Hadoop. Dubbed Driven, the solution tracks data flows at runtime to ensure information quality and includes a broad set of metrics around program logic, giving operations professionals visibility into application behavior.

Concurrent rounds out data-driven app dev framework with performance management tool

Maria Deutscher, SiliconANGLE
February 7, 2014
http://siliconangle.com/blog/2014/02/07/concurrent-rounds-out-data-driven-app-dev-framework-with-performance-management-tool/

The rapid growth in unstructured information is transforming the entire enterprise IT stack, from the underlying infrastructure to the business software end users depend on to be productive. But while the journey to re-architect the data center is well under way, with the Hadoop distribution race already in full swing, the industry is only now beginning to deliver on the potential of Big Data applications, which Tresata CEO Abhi Mehta considers the next frontier of analytics.

“The value in the ecosystem sits really high up the stack, in what we call advanced analytics applications,” Mehta remarked in an interview on theCUBE last year. He predicts that within four years, the “massive race to zero in large parts of the historical data analytics stack where billions of dollars are currently being made” will shift spending to “the highest point of the stack in what we called analytics applications, predictive analytics software that works in Hadoop.” And Tresata isn’t the only vendor eyeing a slice of this rapidly emerging market.

A five-year-old startup called Concurrent is making big strides with Cascading, an open source Java framework that simplifies the development of data-driven apps on the batch processing platform. The firm claims that the software is downloaded more than 130,000 times a month and used by thousands of organizations, including Twitter, eBay and The Climate Corp.

Concurrent this week announced a complementary application performance management solution that aims to accelerate time to market and streamline maintenance for Cascading users. Branded as Driven, the cloud-based service provides visibility into data flows and program logic at runtime to enable test-driven development while allowing practitioners to keep tabs on information quality.

“The release of Driven further enables enterprise users to develop data oriented applications on Apache Hadoop in a more collaborative, streamlined fashion. Driven is the key to unlock enterprises’ ability to drive differentiation through data. There’s a lot more to come – this is only the beginning,” commented Chris Wensel, the founding CTO of Concurrent.

Concurrent Launches Driven APM Service for Hadoop Apps

Mike Vizard, IT Business Edge
February 7, 2014
http://www.itbusinessedge.com/blogs/it-unmasked/concurrent-launches-driven-apm-service-for-hadoop-apps.html

Cascading has emerged as an open source alternative to the native MapReduce Java application programming interface (API) for building Big Data applications on top of Hadoop. Developed by Concurrent, Cascading has been deployed in over 6,000 applications, which makes managing all those applications a challenge.

To address that issue, Concurrent this week unfurled Driven, an application performance management (APM) service that organizations using Cascading can employ via the cloud. Accessed via a plug-in that developers insert inside a Cascading application, Driven exposes an API that allows organizations to track a variety of application performance metrics in real time.

With more organizations building applications on top of Hadoop, Concurrent CTO Chris Wensel says that figuring out where a specific performance issue may lie across a cluster that can consist of hundreds of servers is now a major challenge.

Wensel says that Driven, currently in beta, makes it a lot easier to identify not only the applications that are failing to run, but also those that are running poorly for one reason or another. That capability is especially critical, says Wensel, when dealing with Hadoop applications that routinely span multiple terabytes of data.

While the development of Big Data applications is clearly still in its infancy, managing those applications represents a major challenge for IT operations teams that often have little familiarity with Hadoop. Rather than requiring organizations to acquire APM tools for Big Data applications and the infrastructure needed to run them, Driven provides a convenient service alternative that can be quickly deployed. Best of all, that approach doesn’t require the IT operations team to become a master of all things Hadoop overnight.

Concurrent’s new Hadoop monitoring tool looks out for big-data snags

Jordan Novet, VentureBeat
February 4, 2014
http://venturebeat.com/2014/02/03/concurrents-new-hadoop-monitoring-tool-looks-out-for-big-data-snags/

Software developers want to know if their applications are running properly on the Internet. Just look at the runaway success of application-performance management companies like New Relic and AppDynamics.

Similarly, engineers who process lots of different kinds of data using the Hadoop ecosystem of open-source software need a monitoring tool to make sure their Hadoop jobs are running right.

Just such a tool is coming out today from Concurrent, the company behind the open-source Cascading framework for implementing Hadoop jobs.

It’s called Driven. And it’s been a long time in the making. Founder Chris Wensel has been talking about it for at least two and a half years.

“The net of it is that we’re going to be able to provide very detailed, high-fidelity [information] about data applications as they’re running on Hadoop,” Wensel said in an interview with VentureBeat. “So things like, you know, your joins and merges and filters — you’ll be able to see them progress … . We’ll notify you of failures, and we’ll take you right to the line of code where the application failed essentially and give you insight to go and fix the issue.”
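
To give a sense of the steps Wensel is describing, the following is a small, illustrative Cascading assembly (field names and the regular expression are invented for the example) containing the kind of filter and join operations that Driven renders as discrete, observable steps in a running flow:

```java
import cascading.operation.regex.RegexFilter;
import cascading.pipe.CoGroup;
import cascading.pipe.Each;
import cascading.pipe.Pipe;
import cascading.pipe.joiner.InnerJoin;
import cascading.tuple.Fields;

public class OrdersAssembly {
  // Builds a branch that filters malformed orders and joins them to customers.
  // In a tool like Driven, the filter and the join each surface as their own
  // node, with progress and timing reported while the flow runs.
  public static Pipe build() {
    Pipe orders = new Pipe("orders");
    Pipe customers = new Pipe("customers");

    // Filter: keep only tuples whose "order_id" field is purely numeric.
    orders = new Each(orders, new Fields("order_id"), new RegexFilter("^[0-9]+$"));

    // Join: match each order's "customer_id" against the customer's "id".
    return new CoGroup(orders, new Fields("customer_id"),
                       customers, new Fields("id"),
                       new InnerJoin());
  }
}
```

Each pipe in an assembly like this corresponds to a unit of work that can be watched, timed, and traced back to the line of code that created it.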

Concurrent is making Driven available as a free service that runs in the cloud, so Cascading users can get a taste of it while building applications, Gary Nakamura, Concurrent’s chief executive, told VentureBeat. If people want to use Driven in production or in on-premises data centers, they’ll have to pay for an annual license, Nakamura said.

Engineers using the service in an early adopter program have been pleased with Driven’s ability to identify problems, so debugging can take less time, Wensel said.

Driven sounds like the kind of thing that could make more companies comfortable with the idea of experimenting with Hadoop. Hadoop can work in addition to or in the place of more traditional, and typically more expensive, data warehousing technology from companies like IBM and Teradata. Hadoop still isn’t extremely widely adopted, but it’s gotten more popular each year since the Apache Hadoop project began in 2006.

The complexity of Hadoop has inhibited its adoption, along with concerns about security and other issues. If Driven does manage to make Hadoop easier to operate, Concurrent could stand to enjoy a degree of the success that Hadoop companies like Cloudera have. Cloudera is in a position to go public later this year.

San Francisco-based Concurrent started in 2008, and to date it has raised $5 million, including a $4 million round last year.

How Concurrent hopes to help ID Hadoop bottlenecks

Nancy Gohring, IT World
February 4, 2014
http://www.itworld.com/cloud-computing/403167/how-concurrent-hopes-help-id-hadoop-bottlenecks

Concurrent, the company behind Cascading, the application framework for building big data apps, is now hoping to take the pain out of managing those apps.

Concurrent is releasing today in beta a product called Driven that lets developers identify problems that are holding up their Hadoop applications.

“Enterprise developers will be able to see their apps running and understand where their apps are failing, down to the line of code,” said Gary Nakamura, CEO of Concurrent.

He hopes Driven will solve a problem that is eating up loads of time for developers. “Hadoop is a complete black box,” he said. “In order to find out what happened to an app running on Hadoop, you have to scrape logs from thousands of nodes and look for a needle in a haystack. That’s often an untenable task that takes weeks or months,” he said.

In fact, companies are now filling positions for a person whose entire job it is to “stare at Hadoop clusters all day long and make life or death decisions,” said Chris Wensel, CTO of Concurrent.

Complicating matters is that when administrators look at Hadoop jobs, they may see what appears to be a single job that in fact includes multiple jobs. Without insight into those component jobs, an administrator may kill a job because it is clearly operating poorly, and unknowingly kill an important job that has been running for a day.

With Driven, users will not only be able to see individual jobs but also see who owns them. So if an admin finds a job that is behaving badly, but it belongs to a very important, time-sensitive project that is nearly finished, the admin can decide to let it run.

“Or if it turns out that an intern wrote some bad code, you can kill it and go talk to that intern,” Wensel said.

Driven will show enterprise developers where slow-downs are occurring so that they can fix the problem. It also includes some collaboration features designed to make it easy for developers, data scientists and operators, all of whom may work together on an app, to share views of the app in order to discuss areas that might need improvement.

Driven is a cloud service that can work on Hadoop applications running internally or in a public cloud. For now, it supports Cascading apps, which Concurrent says means it will be available for many thousands of app deployments. It is free to use in a development environment now; a version for commercial deployments, which will include support, is scheduled to launch in the second quarter.

It’s a safe bet that more products like this will appear, given the growth of big data and the emerging challenges enterprises face in managing it.