Category Archives: News

BusinessInterviews.com Q&A with Gary Nakamura, CEO, Concurrent, Inc.

By BusinessInterviews.com
October 10, 2013
http://www.businessinterviews.com/concurrent-gary-nakamura

Concurrent, Inc.’s vision is to become the #1 software platform choice for Big Data applications. Concurrent builds application infrastructure products that are designed to help enterprises create, deploy, run and manage data processing applications on Apache Hadoop.

Concurrent is the mind behind Cascading™, the most widely used and deployed technology for Big Data applications, with more than 90,000 user downloads a month. Used by thousands of data-driven businesses including Twitter, eBay, The Climate Corp and Etsy, Cascading is the de facto standard in open source application infrastructure technology.

What issue does your core product help solve and how so?

Cascading is all about making it easy for non-experts to adopt Hadoop for Big Data application development. When it comes to developing applications on Hadoop, developers program in MapReduce or in scripting and query languages like Pig and Hive. However, there is a growing realization that MapReduce is difficult to use, and that Pig and Hive fall short of enterprise needs, enabling little more than simple ad hoc analysis.

This is where Cascading comes into play, as it offers the first viable alternative to MapReduce for processing huge data sets on Hadoop. Cascading enables developers, data analysts and data scientists to quickly and easily build robust data processing and data management applications on Hadoop that can be deployed in the cloud or within private data centers. Cascading simplifies programming, workflow and data processing. It allows many successful, data-driven companies to gain significant productivity benefits, making it far easier to execute their Big Data strategy with existing skill sets, systems and tools, all while reducing the overall complexity of their technology infrastructure.

What has been your biggest challenge as a business owner and how have you met that challenge?

Our biggest challenge to date has been investing in and perfecting software that we give away for free, allowing tens of thousands of developers to adopt it and thousands of companies to benefit as they drive their big data strategies.

Going forward, the challenge will be building a business while continuing to deliver useful, valuable software to market.

What’s the most exciting thing on the horizon for you/your Company?

We secured $4 million in funding earlier this year, and are now working to release our first commercial product. The enterprise product is currently in alpha, and we are working to ship a beta version for additional users to evaluate. It is not a repackaging of our open source tools, but will be a new software offering that adds capabilities aimed at enterprises that need added visibility into, and control over, their big data applications.

Where do you envision your company in 5 years, 10 years, 20 years?

In 5 years, we want Concurrent to be the source for Big Data application infrastructure. We are well on our way to delivering the most comprehensive set of tools and runtime services aimed at enterprise needs. In 10 years, we want Concurrent to develop the same clout as Oracle, without the same overhead. As for 20 years from now… it’s too tough to say. We will stay focused on delivering great software to market, whether open source or commercial; staying true to that will carry us forward. We look forward to furthering our business with the release of Concurrent’s first commercial product.

Have you had any mentors or role models that have influenced you? Describe the impact.

There are two quotes that have influenced my focus, planning, execution and consistent work ethic:

Thomas Jefferson: “The harder I work, the luckier I get.”
Branch Rickey: “Luck is the residue of design.”

The combination of the two has helped me immensely.

Cascading 2.2 Now Available

Introducing Cascading 2.2, the leading Java application framework for building and deploying reliable enterprise Big Data applications on Hadoop

Introducing Cascading 2.2

Cascading is a Java application framework that enables typical developers to quickly and easily develop rich Data Analytics and Data Management applications that can be deployed and managed across a variety of computing environments. Cascading works seamlessly with Apache Hadoop 1.0 and API-compatible distributions.

Cascading 2.2 is now publicly available for download.

http://www.cascading.org/downloads/

Cascading 2.2 includes a number of new features and updates:

What’s New in Cascading 2.2
– First-class support for field-level type information used by the planner
– Pluggable API for custom type coercion on custom types
– Support for blocks of Java “scripts” in addition to expressions
– AssemblyPlanner interface to allow for platform-independent generative Flow planning
– Optional CombinedInputFormat support to improve handling of large numbers of small files
– Added FirstNBuffer and updated Unique to leverage it for faster performance
– MaxValue and MinValue Aggregators to allow max/min on Comparable types, not just numbers

What’s Improved in Cascading 2.2
– Updated relevant operations to optionally honor SQL-like semantics with regard to NULL values
– Updated SubAssembly to support setting local and step properties via the ConfigDef interface
– Updated FlowDef to accept classpath elements for dynamic inclusion of Hadoop jobs
– Updated GlobHfs to minimize resource and CPU usage
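
As an illustration of the new aggregators, MaxValue can be applied inside an Every pipe much like Cascading’s existing Count aggregator. The snippet below is a minimal sketch, not taken from the release notes: the pipe, field names (“symbol”, “price”, “max-price”) and the MaxPriceAssembly class are hypothetical, and it assumes the Cascading 2.2 jars are on the classpath.

```java
import cascading.operation.aggregator.MaxValue;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.tuple.Fields;

public class MaxPriceAssembly {
  // Builds a pipe assembly that groups tuples by "symbol" and keeps the
  // maximum "price" per group. MaxValue accepts any Comparable type,
  // not just numbers -- new in Cascading 2.2.
  public static Pipe build() {
    Pipe trades = new Pipe("trades");
    trades = new GroupBy(trades, new Fields("symbol"));
    return new Every(trades, new Fields("price"),
                     new MaxValue(new Fields("max-price")), Fields.ALL);
  }
}
```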

For more details on new features and resolved issues see:
https://github.com/Cascading/cascading/blob/2.2/CHANGES.txt

Availability and Pricing

Cascading 2.2 is available now, and freely licensed under the Apache 2.0 License. Concurrent offers standard and premium support subscriptions for enterprise use, with pricing based on the number of users. To learn more about Concurrent’s offerings, please visit http://www.concurrentinc.com/newsletter.

Open-Source Systems You May Have Taken for Granted: 10 Examples

By Chris Preimesberger
October 4, 2013
http://www.eweek.com/developer/slideshows/open-source-systems-you-may-have-taken-for-granted-10-examples.html/

A key moment in IT history took place in Mountain View, Calif., on Feb. 3, 1998. That was the day a small group of Silicon Valley software developers (which included Dr. Larry Augustin, now CEO of SugarCRM, Eric Raymond and Christine Peterson) sat down to decide that there needed to be an actual name for a new software development genre. The now-familiar term “open source” was first coined at this meeting. Since that day, the open-source movement—led today by Linus Torvalds’ Linux—has evolved to affect every corner of the IT industry. Open source now is more mature than ever, with today’s landscape encompassing a much broader range of applications benefiting industry and society. From Web servers to big data analytics to cloud computing to research and exploration, the following is a look at some of the most innovative open-source technologies and projects that illustrate just how far the movement has come. In this slide show, eWEEK and Concurrent CEO Gary Nakamura provide a listing of 10 well-known IT use cases that rely heavily—or even solely—on open-source software.

See the slideshow at: http://www.eweek.com/developer/slideshows/open-source-systems-you-may-have-taken-for-granted-10-examples.html/

How to Escape the Dark Valley of Your Hadoop Journey

Gary Nakamura, CEO, Concurrent, Inc.
September 30, 2013
http://allthingsd.com/20130930/how-to-escape-the-dark-valley-of-your-hadoop-journey/

It happens to the best of us. You know your business is bursting with useful data, and you’ve only begun to scratch the surface. So you strike out to build an analytical platform, using all the great open-source tools you’ve been hearing so much about. First, you have to capture all the data that’s coming in, before you even know what you’re going to do with it. So you build a butterfly net to catch it all, using Hadoop. But as soon as the net is cast, everything goes dark. You know the data’s there, but you can’t get at it, or if you can, it comes out in unusable formats. Your current systems won’t talk to it, and you don’t have a staff of programming PhDs, the budget for a United Nations’ worth of translators, or the means to hire an army of consultants. A chill runs down the back of your neck. What have you done? You’ve entered the Dark Valley of Hadoop. That’s the bad news. The good news is that you’re not alone, and there’s a way out.

Warning Signs That You’re in the Dark Valley

Many data-rich companies fall into the Dark Valley for a time. You have the data, but you’re not getting the value that you expect from it. You have problems testing and deploying the applications that are supposed to extract that value. You have difficulty turning business requirements into code that will turn the great Leviathan that is the Hadoop Distributed File System into something halfway manageable. The project for which all of this effort was intended is delayed for months, with cost overruns making stakeholders nervous. Once you finally get the chance to test, you don’t get the results you were expecting. More delays ensue.

One of the cruelest tricks of the Dark Valley is the illusion that you got it right on the first try. Agile design philosophy tells us to work on small projects and test them as quickly as possible, then iterate forward. But Hadoop tends to reveal its weaknesses around manageability the deeper you get into the adoption cycle. If you’re using tools made for programmers, such as Pig and Hive, you’re making a bet that the programmer who built that first iteration is going to be around for the second. In today’s competitive marketplace, there is no guarantee of that. Then there is the fact that MapReduce, Hadoop’s native language, is already on its second version, with a third entirely new computation engine, built from the ground up, rapidly on its way. In the Hadoop ecosystem, many low-level moving parts have the nasty habit of changing every 90 to 120 days. All these moving parts mean you have to keep up with numerous release cycles, which takes your focus off the business at hand.

So if the project mantra is “stand up, rinse and repeat,” it’s the “repeat” part that proves challenging. You find you need to expand your team, the number of systems and the scope of your project. The farther along the path you travel from “build” to “deploy and operate,” the wider the gap between programming tools and enterprise application tools becomes. The open source tools for Hadoop were simply never intended for the mainstream data-driven enterprise. The result is a skills bottleneck, requirements confusion and maintenance complexity — a lot of bumping around in the dark.

You aren’t alone. Some of the largest and most successful data-driven companies are having, or have had, similar frustrations. LinkedIn, Twitter and BlueKai all started out with native interfaces and ended up with mountains of unmaintainable code. They fought their way out by investing significant time and money in alternatives that increased the productivity of their talented staff, and in doing so found better, more usable, more sustainable technologies to run their businesses. The good news is that you can learn from their experience and avoid the Dark Valley entirely.

Escape From the Dark Valley

There is a light at the end of the tunnel, but you have to know how to find it. The key lies in knowing your options, which usually involves leveraging the skillsets you already have in-house so that your big data strategy can continue up into the light.

The development methodologies you have in place were established for a reason. Because Hadoop wasn’t designed for the enterprise developer, the first mistake many enterprises make is to reorganize their planning processes around Hadoop. The best way to protect your existing methodologies is to avoid reconfiguring them around MapReduce. In other words, is it possible to find an easier way for your experienced enterprise developers to use Hadoop, instead of hiring MapReduce programmers and attempting to teach them about your business?

Indeed, Hadoop can be tamed through application programming interfaces (APIs) or domain-specific languages (DSLs) that encapsulate and hide MapReduce so that your developers don’t have to master it. For example, a modeling tool such as MicroStrategy, SAS, R, or SPSS can use a DSL that will consume models and run them on Hadoop without needing to write Hive, Pig or MapReduce. Enterprises can leverage existing investments made in Java, SQL and modeling tools that will allow them to quickly start parsing Hadoop datasets without the need to learn another language.

Here are some domain-specific languages that leverage existing development skills:

  • Java: Cascading is a widely used Java development framework that enables developers to quickly build workflows on Hadoop.
  • SQL: Cascading Lingual is an ANSI SQL DSL. This means that developers can take existing ETL and SQL code and migrate it to Hadoop. The deeper benefit is that applications designed for SQL can access data on Hadoop, without modification, using a standard database driver.
  • MicroStrategy, SAS, R and SPSS: Cascading Pattern is a scoring engine for modeling applications. In a matter of hours, developers can test and score predictive models on Hadoop.
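
To make the SQL point concrete, Lingual ships a JDBC driver, so a plain Java program can query data on Hadoop the same way it would query a relational database. The sketch below follows the general pattern described in the Lingual documentation; the connection URL details, schema path and table names here are illustrative assumptions, and it requires the Lingual jars on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class LingualQuery {
  public static void main(String[] args) throws Exception {
    // Load the Lingual JDBC driver.
    Class.forName("cascading.lingual.jdbc.Driver");

    // "schemas" points at directories of data files that Lingual
    // exposes as SQL tables (path is hypothetical).
    Connection conn = DriverManager.getConnection(
        "jdbc:lingual:hadoop;schemas=/data/example");

    // Standard JDBC from here on: the query is planned into Hadoop jobs.
    Statement stmt = conn.createStatement();
    ResultSet rs = stmt.executeQuery(
        "SELECT * FROM \"example\".\"employees\"");
    while (rs.next())
      System.out.println(rs.getString(1));
    conn.close();
  }
}
```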

Newer languages like Scala and Clojure are popular with many developers these days (Clojure is especially popular with data scientists). DSLs for these languages also simplify development on Hadoop.

These APIs abstract the complexity of Hadoop, enabling enterprise developers to spend their time on business logic and creating big data applications instead of getting stuck in the maze of the Hadoop open source projects. The best APIs let you corral the resources you already know how to use — relational databases, ERP systems, and visualization tools — and use them in conjunction with Hadoop.
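
For a sense of what such an API looks like in practice, here is a minimal word-count flow written against Cascading’s Java API, in the style of Cascading’s own tutorials. The paths, field names and class name are illustrative, and it assumes the Cascading jars plus a Hadoop installation (or local mode); the developer describes taps and pipes, and Cascading plans them into MapReduce jobs.

```java
import cascading.flow.Flow;
import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.operation.aggregator.Count;
import cascading.operation.regex.RegexSplitGenerator;
import cascading.pipe.Each;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.scheme.hadoop.TextDelimited;
import cascading.scheme.hadoop.TextLine;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

public class WordCount {
  public static void main(String[] args) {
    // Source and sink taps: where the data comes from and where results go.
    Tap docTap = new Hfs(new TextLine(new Fields("line")), args[0]);
    Tap wcTap  = new Hfs(new TextDelimited(true, "\t"), args[1]);

    // The pipe assembly: split lines into words, group by word, count.
    Pipe wcPipe = new Pipe("wordcount");
    wcPipe = new Each(wcPipe, new Fields("line"),
                      new RegexSplitGenerator(new Fields("word"), "\\s+"));
    wcPipe = new GroupBy(wcPipe, new Fields("word"));
    wcPipe = new Every(wcPipe, Fields.ALL, new Count(), Fields.ALL);

    // Connect taps and pipes into a Flow and run it; no MapReduce code
    // is written directly -- Cascading plans the jobs behind the scenes.
    FlowDef flowDef = FlowDef.flowDef().setName("wc")
        .addSource(wcPipe, docTap)
        .addTailSink(wcPipe, wcTap);
    Flow wcFlow = new HadoopFlowConnector().connect(flowDef);
    wcFlow.complete();
  }
}
```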

Conclusion

The power of big data has been established, but our understanding of how to exploit it in the most productive way is still maturing. The initial toolset that came with Hadoop didn’t anticipate the kinds of enterprise applications and powerful analyses that businesses would want to build on it. Thus, many have fallen into the Dark Valley. But a new breed of middleware (APIs and DSLs) has arrived. They keep track of all the variables and peculiarities of Hadoop, abstract them away from development, and offer better reliability, sustainability and operational characteristics so that enterprises can find their way back out into the light.

Nearly 20 years ago, the Web set the stage for existing enterprises and new emerging companies to change and create new innovative businesses. Similarly, big data and the business opportunity that it offers is driving enterprises to extract valuable insights about their business and in many cases create additional and significant monetization opportunities for their existing products and services. Enterprises are transitioning from big data as a project to being in the “business of data.” If that’s not a bright light at the end of the tunnel, what is?

Concurrent’s Chris Wensel: The Open Source Path Is a Rocky Road

By Jack M. Germain LinuxInsider
July 2, 2013
http://www.linuxinsider.com/story/Concurrents-Chris-Wensel-The-Open-Source-Path-Is-a-Rocky-Road-78398.html#sthash.JnrHmu3b.dpuf

“I want the same clout as Oracle. I just don’t want that same infrastructure as Oracle. Open sourcing is a great way to teach people how to write code, see how things work, and get contributions and get people to trust it. It is marketing as well, however. It is a lot of things. Just one thing is missing: It is not a very clean way to make money.”

Big Data and open source software may be the next great unholy alliance in computing’s current promised land, but open source is a broken business model that needs a better vehicle for supporting projects such as programming suites that build database applications.

So argues Chris Wensel, founder and CTO of Concurrent. Wensel started the company in 2008 to focus on the open-source Cascading Project, a framework that he created to better work with Big Data applications.

Wensel quickly discovered that he was part of a bad news / good news scenario. The bad news was that he began his company too early. The good news is the Big Data market is now growing.

Big Data is the process of collecting and analyzing huge volumes of structured and unstructured data. The data sets are so large that they defy handling with traditional database and software techniques.

Cascading is an open-source framework — a Java library and runtime — that enables developers to readily build rich, enterprise-grade data analytics and data management applications. These applications can be deployed and managed across a variety of Apache Hadoop computing environments, on which database apps typically run.

InfoWorld named Cascading a 2012 Best Open Source Software winner. Wensel, meanwhile, is also the author of the Scale Unlimited Apache Hadoop BootCamp, the first commercial Apache Hadoop training course.

In this interview, LinuxInsider talks to Wensel about the challenges of working with Big Data and the hurdles posed by open source software.

Read the entire interview here – http://www.linuxinsider.com/story/Concurrents-Chris-Wensel-The-Open-Source-Path-Is-a-Rocky-Road-78398.html#sthash.JnrHmu3b.dpuf

SiliconAngle interviews Concurrent Inc.’s Gary Nakamura on theCube

At the Fluent Conference this week, theCube host John Furrier invited Gary Nakamura, CEO of startup Concurrent Inc., to talk in depth about the trajectory of his company, from the seed-funding days to today, when thousands of data-driven businesses (including Twitter, eBay, The Climate Corp and Etsy) rely on Cascading as their default framework for building and deploying large-scale data processing applications.

Watch the full length video below:

http://siliconangle.com/blog/2013/05/31/sql-is-a-prerequisite-skill-for-mainstream-hadoop-fluentconf/?angle=silicon

5 More Buzz-worthy Big Data Analytics Apps

Enterprise Apps Today – Drew Robb
April 10, 2013
http://www.enterpriseappstoday.com/business-intelligence/5-more-buzz-worthy-big-data-analytics-apps.html

Concurrent Lingual

Lingual is a free, open source project that was designed to enable fast and simple Big Data application development on Apache Hadoop. It leverages the platform support of the Cascading application framework, thereby allowing SQL users to utilize existing skills to create and run Big Data applications on Hadoop without any new training.

Cascading is the most widely used and deployed technology for building Big Data apps, with more than 75,000 user downloads a month. Those using it include eBay, TeleNav and Twitter.

“With Concurrent Lingual, companies can leverage existing skill sets and product investments by carrying them over to Hadoop,” said Chris Wensel, CTO and founder of Concurrent. “Analysts and developers, familiar with traditional BI tools, can create and run Big Data applications on Hadoop.”

MapR Puts Hadoop on Ubuntu, Source Code on GitHub

eWeek – Darryl K. Taft
April 1, 2013
http://www.eweek.com/database/mapr-puts-hadoop-on-ubuntu-source-code-on-github/

MapR and Canonical announced a partnership to put the MapR Hadoop distribution on Ubuntu. The company also put its source code on GitHub.

MapR Technologies, a Hadoop and big data specialist, and Canonical, the company behind the Ubuntu OS, have teamed up to deliver an integrated offering of the MapR M3 Edition for Apache Hadoop with Ubuntu.

Canonical and MapR said they also are working to develop a Juju Charm that can be used by OpenStack and other customers to easily deploy MapR into their environments. The Juju Charm can be used to deploy MapR M3 into both private and public clouds that have standardized on OpenStack. Juju is Canonical’s package management system for implementing cloud technologies. “Charms” are preconfigured deployment tools for specific products.  In addition, MapR announced that its source code is available on GitHub.

“With the launch of Ubuntu 12.04 LTS and the introduction of Juju, Canonical further established its lead in delivering easy access to the latest open cloud technology, including OpenStack,” Kyle MacDonald, vice president of cloud computing at Canonical, said in a statement. “Our customers want optimized workloads to run in their cloud and now, with the addition of MapR M3 as an easily deployable and enterprise-grade Hadoop solution, we are providing Ubuntu customers with an effective way to implement big data into their operations.”

The free MapR M3 Edition includes HBase, Pig, Hive, Mahout, Cascading, Sqoop, Flume and other Hadoop-related components for unlimited production use. MapR M3 will be bundled with Ubuntu 12.04 LTS and 12.10 via the Ubuntu Partner Archive.

The MapR Hadoop distribution enables Linux applications and commands to access data directly in the cluster via the NFS interface that is available with all MapR Editions.  The MapR M5 and M7 Editions for Apache Hadoop, which provide enterprise-grade features for HBase and Hadoop such as mirroring, snapshots, NFS HA and data placement control, will also be certified for Ubuntu, MapR officials said.

“Ubuntu and MapR are a great combination for implementing an open-source Hadoop cluster in any enterprise,” said Tomer Shiran, director of product management at MapR, in a statement. “For OpenStack customers, this packaged offering of Ubuntu with MapR is a fast and simple means to enable Hadoop as a service in their environments.”

Canonical was the first company to distribute and support OpenStack, and Ubuntu has remained the reference operating system for the OpenStack project since the beginning. Ubuntu is the easiest and most trusted route to an OpenStack cloud, whether for private use or as a commercial public cloud offering.

The Ubuntu/MapR package will be available through the Ubuntu Partner Archive for 12.04 LTS and 12.10 releases of Ubuntu on www.ubuntu.com starting April 25. The MapR Juju Charm will be available April 25 as part of the 13.04 Ubuntu release at http://jujucharms.com/.

Meanwhile, MapR announced the availability of its source code on GitHub, making it easy to access and modify components in the MapR Distribution for Apache Hadoop. The MapR Distribution combines more than a dozen different packages for Hadoop. The source code for these packages, including MapR’s fixes and enhancements, is now publicly available on GitHub at https://github.com/mapr/. In addition, MapR is releasing binaries, source code and documentation in a public Maven repository making it easier for developers to develop, build and deploy their Hadoop-based applications.

“MapR is committed to supporting the Hadoop ecosystem and to providing an open enterprise-grade Hadoop platform,” Shiran said. “By making this source code easily accessible on GitHub and providing open APIs such as NFS and ODBC, MapR is ensuring that end users have available an open and flexible, enterprise-grade platform for Hadoop.”

The following Hadoop-related projects are included in the MapR Distribution and are available on GitHub: Apache Hive, Apache Pig, Cascading, Apache HCatalog, Apache HBase, Apache Oozie, Apache Flume, Apache Sqoop, Apache Mahout and Apache Whirr. All MapR fixes and enhancements are available as open-source products, complete with version tagging.

VCs Bet Big Bucks on Hadoop

March 27, 2013 – Deborah Gage
http://blogs.wsj.com/venturecapital/2013/03/27/vcs-bet-big-bucks-on-hadoop/

“A couple of companies that make it easier for developers to write applications that run on Hadoop have also been funded recently. Concurrent this month raised $4 million and hired a new CEO, Gary Nakamura, while Continuuity so far has raised about $12.5 million.

Some of Continuuity’s founders were introduced by Concurrent’s founder, Chris Wensel, and the two companies are likely to become partners, according to Continuuity co-founder and CEO Todd Papaioannou.”