All posts by Pierre-Yves Poli

Concurrent, Inc. and Analytics Inside Partner to Deliver Practical Solutions to Help Enterprises Leverage Data

Companies Collaborate to Strip Hype from Big Data to Deliver Solutions with Tangible Value to Enterprise Customers

SAN FRANCISCO and WESTERVILLE, OHIO – June 30, 2015 – Concurrent, Inc., the leader in data application infrastructure, and Analytics Inside LLC, a company that specializes in advanced analytical solutions for intelligence, law enforcement and healthcare, today announced they are teaming up to help businesses speed development of Hadoop-based enterprise data applications. Analytics Inside provides products, services and training across these and other industry verticals.

This alliance pairs Concurrent, whose Cascading platform and Driven solution help enterprises create, deploy, run and manage data applications at scale, with Analytics Inside, which applies machine learning, natural language processing and Hadoop to build service-oriented analytical solutions. Together, they give the big data community access to both companies’ deep industry knowledge. Analytics Inside will also deliver Concurrent-developed training for Cascading, alongside the many training classes it has developed for other big data products and technologies.

Analytics Inside has years of experience deploying Cascading-based applications in production. Cascading’s stable abstraction over the underlying Hadoop stack has let Analytics Inside products move seamlessly from the earliest Hadoop versions to YARN, and now to Tez, so applications can run on whichever compute fabric a customer chooses. By leveraging Cascading and Driven, Analytics Inside can take big data projects from inception to production quickly. The company has also seen a steady increase in Cascading usage and delivers consulting and training services to Fortune 1000 companies nationwide.

Concurrent provides the proven data application platform for reliably building and deploying data applications on Hadoop. With more than 275,000 downloads a month, Cascading is the enterprise data application platform of choice and has become the de-facto standard for building data-centric applications. Driven is the industry’s leading application performance management product for the data-centric enterprise, built to provide business and operational visibility and control to organizations that need to deliver operational excellence. Together, Cascading and Driven deliver a one-two punch to knock out the complexity and provide a proven reliable solution for enterprises to execute their big data strategies.

In addition to this partnership, the principals at Analytics Inside, Michael Covert and Victoria Loewengart, have written a book published by Packt Publishing, titled “Learning Cascading.” Available now, the book is intended for software developers, system analysts, big data product managers and data scientists who want to deploy big data solutions using Cascading. It covers the basic and advanced features of Cascading through a practical, hands-on approach, with step-by-step instructions and code samples.

With “Learning Cascading,” readers will learn how to:

  • Design, develop and use custom operations
  • Acquire the skills needed to integrate Cascading with external systems
  • Gain expertise in testing, QA and performance tuning to run an efficient and successful Cascading project
  • Explore project management methodologies and steps to develop workable solutions
  • Discover the future of big data frameworks and understand how Cascading can help your software to evolve with it
  • Uncover sources of additional information and other tools that can make the development tasks a lot easier

To order a copy of “Learning Cascading,” please visit: https://www.packtpub.com/big-data-and-business-intelligence/learning-cascading.

Supporting Quotes

“We’ve been using Cascading in production for years now, and have developed many solutions for our customers that have now stood the test of time due to the unique ability of Cascading to adapt to the rapidly changing and evolving big data ecosystem. Additionally, Cascading and Driven have given us a competitive advantage by reducing development time and abstracting away many of the undue complexities of MapReduce. Because of this longstanding relationship, we’ve been able to provide a range of next-generation, advanced analytical solutions to maximize the value of big data for organizations that depend on the data collected. This is a strategic partnership for us because our solutions open the doors for customers to new ways of understanding, exploiting and acting on their data.”

– Michael Covert, CEO, Analytics Inside

“Throughout our interactions with the team at Analytics Inside, we have not only been impressed by their knowledge of Cascading, but their keen understanding of enterprise business needs and how they are looking to leverage all of their data assets. Analytics Inside is an ideal partner for us as it provides a unique combination of business acumen, data engineering skills and analytical expertise to help our customers transform their business.”

– Gary Nakamura, CEO, Concurrent, Inc.

Supporting Resources

About Analytics Inside

Founded in 2005, Analytics Inside is a Westerville, Ohio-based company that specializes in advanced analytical solutions and services. Analytics Inside produces the RelMiner™ family of relationship mining systems, which includes RelPredict™ machine learning and RelExtract™ Natural Language Processing. These systems are based on big data, graph theory, machine learning, data visualization and natural language processing. For further information, visit http://www.AnalyticsInside.us or email info@AnalyticsInside.us.

About Concurrent, Inc.

Concurrent, Inc. is the leader in data application infrastructure, delivering products that help enterprises create, deploy, run and manage data applications at scale. The company’s flagship enterprise solution, Driven, was designed to accelerate the development and management of enterprise data applications. Concurrent is the team behind Cascading, the most widely deployed technology for data applications with more than 275,000 user downloads a month. Used by thousands of businesses including eBay, Etsy, The Climate Corp and Twitter, Cascading is the de facto standard in open source application infrastructure technology. Concurrent is headquartered in San Francisco and online at http://concurrentinc.com.

Media Contact
Danielle Salvato-Earl
Kulesa Faul for Concurrent, Inc.
(650) 922-7287
concurrent@kulesafaul.com

Concurrent, Inc. and BitWise Inc. Partner to Bring ETL to the Big Data Era

BitWise Joins the Concurrent Partner Program to Offer Customers Quick, Easy and Cost-Effective Data Migration to Benefit from Hadoop Investments

SAN FRANCISCO – June 15, 2015 – Concurrent, Inc., the leader in data application infrastructure, and BitWise Inc., a global, end-to-end software consulting organization providing guaranteed, efficient and cost-effective service, today announced a strategic partnership to provide services and tools to enable enterprises to rapidly execute on their big data migration strategies.

As the big data industry continues to grow and more organizations enter into the business of data, enterprises have begun to entertain the idea of leveraging new data sources for business value. However, enterprises are often stymied in their efforts to explore more innovative big data approaches by legacy ETL tools that cannot deliver on the requirements of 21st century data-focused businesses.

This partnership between Concurrent, whose Cascading platform and Driven solution make it easy to create data processes that can be adjusted, dispatched and scheduled, and BitWise, which focuses on data migration for production-grade applications, enables enterprises to migrate ETL workloads to Cascading. Organizations can now operationalize their data integration strategies more quickly and at scale, while significantly reducing licensing and operations costs.

BitWise has more than a decade of experience working with ETL solutions, and has developed data migration best practices and integration techniques from working on ETL projects for a wide range of Fortune 500 companies.

Concurrent provides the proven data application platform for reliably building and deploying data applications on Hadoop. With more than 275,000 downloads a month, Cascading is the enterprise data application platform of choice and has become the de-facto standard for building data-centric applications. Driven is the industry’s leading application performance management product for the data-centric enterprise, built to provide business and operational visibility and control to organizations that need to deliver operational excellence. Together, Cascading and Driven deliver a one-two punch to knock out the complexity and provide a proven reliable solution for enterprises to execute their big data strategies.

To learn more, BitWise’s executive vice president, Shahab Kamal, and Concurrent’s vice president of field engineering, Supreet Oberoi, will be delivering a joint, complimentary webinar on offloading ETL on Hadoop. Taking place Wednesday, June 24 at 11 a.m. PDT/2 p.m. EDT, the 30-minute webinar will discuss how to jumpstart your ETL offloading project. To register, please visit http://bit.ly/1QtCrMR.

Supporting Quotes

“There is a common misconception that there is a one-size-fits-all approach to enterprise data migration. However, there are simpler, more supportive methods than those facilitated by proprietary ETL solutions. This partnership between BitWise and Concurrent delivers just that – a robust solution that addresses this active pain point for enterprise IT teams. We are excited to team our expertise with Concurrent’s tools to help enterprises finally benefit from their big data/Hadoop investments.”

– Shahab Kamal, executive vice president, BitWise Inc.

“For too long enterprises have been constrained in executing on their big data strategies by the cost, limited scalability and inflexible feature sets of proprietary solutions. We look forward to joining forces with BitWise to knock down these barriers, and help enterprises to process not only their existing data, but to also leverage new data sources, such as logs and social media, to allow them to finally enter into the business of data.”

– Gary Nakamura, CEO, Concurrent, Inc.

Supporting Resources

About BitWise Inc.

BitWise is a global, full service, IT consulting, services and project delivery organization. It provides guaranteed, efficient and cost-effective software solutions to enterprises worldwide, using an onsite-offshore delivery model. BitWise is headquartered in Schaumburg, Ill. and maintains regional facilities in the U.S. and Australia. The company has its own dedicated offshore development centers in India with reliable and secure connectivity to most cities in the U.S., Europe and Asia-Pacific. BitWise can be found online at http://www.bitwiseglobal.com.

About Concurrent, Inc.

Concurrent, Inc. is the leader in data application infrastructure, delivering products that help enterprises create, deploy, run and manage data applications at scale. The company’s flagship enterprise solution, Driven, was designed to accelerate the development and management of enterprise data applications. Concurrent is the team behind Cascading, the most widely deployed technology for data applications with more than 275,000 user downloads a month. Used by thousands of businesses including eBay, Etsy, The Climate Corp and Twitter, Cascading is the de facto standard in open source application infrastructure technology. Concurrent is headquartered in San Francisco and online at http://concurrentinc.com.

Media Contact
Danielle Salvato-Earl
Kulesa Faul for Concurrent, Inc.
(650) 922-7287
concurrent@kulesafaul.com

Concurrent, Inc. Continues to Lead the Market for Data-Driven Enterprise Application Development on Hadoop, Announces General Availability of Newest Cascading Version

Cascading 3.0 Supports Multiple Compute Fabrics for Running Enterprise Data Applications at the Speed of Business 

SAN FRANCISCO – June 9, 2015 – Concurrent, Inc., the leader in data application infrastructure, today announced the general availability of Cascading 3.0, the most widely used application platform for building and deploying enterprise data applications on Apache Hadoop.

The data application technology landscape continues to evolve rapidly, with new, state-of-the-art compute fabrics of varying maturity. Enterprises that have invested in Hadoop are seeking practical solutions that provide the degrees of freedom to solve simple to complex problems, all while ensuring their investments are protected against change and technical uncertainty. Organizations need an enterprise-grade data application platform to quickly build and consistently deliver these data products to the business.

Cascading 3.0 represents a milestone release for future-proofing investments in next-generation data infrastructure, allowing enterprises to take advantage of newer compute fabrics as they become available without taking on inherent risks. Cascading 3.0 abstracts away the underlying platform, allowing users to execute against platforms beyond MapReduce. Users gain unique portability across programming languages (Java, SQL, Scala), Hadoop distributions (Cloudera, Hortonworks, MapR) and now compute fabrics. Apache Tez is the first new fabric to be supported natively, with others to follow quickly.
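
To make the portability claim concrete, here is a minimal word-count sketch against the Cascading API: the pipe assembly is written once, and only the flow connector chosen at the end decides whether it runs on MapReduce or Tez. The class names follow the public Cascading 3.0 Hadoop 2 and Tez planners, but the input and output paths are placeholders and exact module names should be verified against the release.

```java
import java.util.Properties;

import cascading.flow.Flow;
import cascading.flow.FlowDef;
import cascading.flow.hadoop2.Hadoop2MR1FlowConnector; // MapReduce fabric
// import cascading.flow.tez.Hadoop2TezFlowConnector;  // Tez fabric (cascading-hadoop2-tez module)
import cascading.operation.aggregator.Count;
import cascading.operation.regex.RegexSplitGenerator;
import cascading.pipe.Each;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.scheme.hadoop.TextDelimited;
import cascading.scheme.hadoop.TextLine;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

public class WordCount {
  public static void main(String[] args) {
    // Source and sink taps; args[0] and args[1] are HDFS paths (placeholders).
    Tap docTap = new Hfs(new TextLine(new Fields("line")), args[0]);
    Tap wcTap  = new Hfs(new TextDelimited(true, "\t"), args[1]);

    // The pipe assembly is fabric-agnostic: split lines into words, group, count.
    Pipe docPipe = new Each("token", new Fields("line"),
        new RegexSplitGenerator(new Fields("word"), "\\s+"));
    Pipe wcPipe = new Pipe("wc", docPipe);
    wcPipe = new GroupBy(wcPipe, new Fields("word"));
    wcPipe = new Every(wcPipe, new Count(new Fields("count")));

    FlowDef flowDef = FlowDef.flowDef()
        .setName("wordcount")
        .addSource(docPipe, docTap)
        .addTailSink(wcPipe, wcTap);

    // Choosing the compute fabric is a one-line decision at connect time;
    // swap in Hadoop2TezFlowConnector to run the same assembly on Tez.
    Flow flow = new Hadoop2MR1FlowConnector(new Properties()).connect(flowDef);
    flow.complete();
  }
}
```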

Cascading 3.0 Sets the Bar for Enterprise Application Development

With more than 275,000 user downloads a month, Cascading is the de facto standard in enterprise application infrastructure technology. With tens of thousands of production deployments and millions of production jobs, Cascading is relied on by the largest data-centric businesses, supported by key strategic partnerships with Hortonworks and broad support with all major Hadoop distributions. Cascading accelerates and simplifies enterprise application development, and meets a variety of enterprise use cases – ranging from simple to complex.

Cascading 3.0 is a major leap forward in enterprise data-centric application development. Features and benefits include:

  • Allows enterprises to build data applications once, and run applications on the compute fabric that best meets their business needs
  • Supports local in-memory execution, MapReduce and Apache Tez
  • Delivers a flexible runtime layer whose pluggable query planner lets new computation fabrics integrate and adhere to the semantics of a given compute engine, such as MapReduce or any DAG-like model
  • Provides benefits through portability to third-party products, data applications, frameworks and dynamic programming languages built on Cascading
  • Supports compatibility with all major Hadoop vendors and service providers, including Altiscale, Amazon EMR, Cloudera, Hortonworks, MapR and Qubole, among others

Concurrent is leading the market in big data application infrastructure with Cascading and Driven, the industry’s leading performance management product for the data-centric enterprise. With Cascading at the core, Concurrent continues to meet customer demand for state-of-the-art technologies, while balancing the pragmatic needs for enterprise application development, forging important industry partnerships and making data-driven application development simpler, faster and smarter. Together, Cascading and Driven deliver a one-two punch to knock out the complexity and provide a proven reliable solution for enterprises to execute their big data strategies.

Supporting Quotes

“Thanks to Cascading’s new Tez engine, we were able to increase the processing speed of our main workload by 2x in production, and upwards of 10x in integration testing, compared to the traditional Hadoop MapReduce engine. This acceleration was achieved with no significant changes to our application code, which is written in Scala using the Scalding framework. This spectacular improvement enables us to delay increasing the size of our cluster, improve turnaround time and at the same time enforce integration testing multiple times per hour, which is essential to ensure the quality and trustworthiness of our solution in a cost-effective way. To sum up, this collaboration allowed us to greatly improve the quality and delivery of our service to our clients – Transparency RM offers business intelligence services and real online usage certification in a big data context. We look forward to the next adventure!”

– Cyrille Chépélov, Chief Innovation Officer, Transparency RM

“BloomReach standardized on Cascading as our data application development framework because it was easy to use and accommodated our scalability requirements. With terabytes of data processed every day by thousands of jobs, combined with rapid application iteration to deliver the best discovery experience to over 75 million consumers monthly, we require a flexible, scalable and efficient application development framework and chose Cascading. We are excited to see Cascading 3.0 hit this milestone so we can start testing out even better performing compute fabrics like Tez and Spark.”

– Amit Aggarwal, head of development, BloomReach

“We have transformed how brands and agencies effectively develop and deploy mobile marketing campaigns and we did that by transforming ad performance analytics. Instead of sampling, we process data sets that include over 30B data points to derive exactly how ads performed. We developed our innovative analytics applications such as Attribution and Report Builder using Cascading and Driven. The Cascading application development platform enabled us to manage our development process and increase developer productivity by 10x over hand coding in MapReduce. It organizes our code to promote reuse and standardization, which made it easier for us to build highly resilient, enterprise quality applications from day one. Cascading 3.0 is an exciting milestone as it delivers a truly future-proof application development platform so we can explore other computing fabrics to solve the next problem without retooling and retraining.”

– Mikhail Izrailov, big data specialist, Medialets

“With reported improvement gains of up to 15 times faster execution time, Tez and Scalding seem to be really delivering from Cascading’s perspective, as new execution fabrics are proving that Cascading and Scalding are enterprise ready, on a path of continuous improvement and can withstand the exciting claims from the growing community of Spark users.”

– Antonios Chalkiopoulos, author of “Programming MapReduce with Scalding”

“This newly available version of Cascading will enable our users even further by simplifying application development, accelerating time to market and allowing enterprises to leverage existing, and more importantly, new and emerging data infrastructure and programming skills.”

– Chris Wensel, founder and CTO, Concurrent, Inc.

Availability and Pricing

Cascading 3.0 is available immediately and is freely licensed under the Apache 2.0 license. To learn more about Cascading, visit http://cascading.org. Concurrent also offers standard and premium support subscriptions for enterprise use.

Supporting Resources

About Concurrent, Inc.

Concurrent, Inc. is the leader in data application infrastructure, delivering products that help enterprises create, deploy, run and manage data applications at scale. The company’s flagship enterprise solution, Driven, was designed to accelerate the development and management of enterprise data applications. Concurrent is the team behind Cascading, the most widely deployed technology for data applications with more than 275,000 user downloads a month. Used by thousands of businesses including eBay, Etsy, The Climate Corp and Twitter, Cascading is the de facto standard in open source application infrastructure technology. Concurrent is headquartered in San Francisco and online at http://concurrentinc.com.

###

Media Contact
Danielle Salvato-Earl
Kulesa Faul for Concurrent, Inc.
(650) 922-7287
concurrent@kulesafaul.com

An Inside View of Mainstream Enterprise Hadoop Adoption

Nicole Hemsoth
June 1st 2015

Few organizations have holistic insight into how the overall Hadoop ecosystem is trending. From the analysts to the Hadoop vendors themselves, some piece of the picture is always incomplete, whether because of a single-vendor view of adoption or because market data has gone stale by the time it is analyzed.

But for a company like Concurrent, the vendor behind Cascading, an application development platform that lets users build data-oriented applications on top of Hadoop in a more streamlined way, there are no blind spots when it comes to seeing the writing on the Hadoop wall. And Concurrent has had plenty of time to watch the Hadoop story unfold in full. Beginning in 2008, and through some big-name use cases at web-scale companies, the small company (now 25 people, 60% of them engineers) saw the first flush of the Hadoop boom and rode the tide, raising close to $15 million in investment along the way.

Beyond what Concurrent does for Hadoop end users who want to build and deploy applications on the framework, and monitor and track their performance via the Driven tooling it rolled out this month, part of what makes the company interesting at a high level is its unique insight into how companies are using Hadoop at scale.

As Gary Nakamura, CEO of Concurrent, tells The Platform, while they do watch what the analysts groups say about Hadoop’s rise through the mainstream enterprise ranks, their insight into what’s actually happening in the market through the use of their Cascading (and now Driven) product lines extends across all three of the major Hadoop distribution vendors as well as the open source version. While they do not have specific numbers to share, aside from an approximate 290,000 downloads of Cascading per month, Nakamura says there are distinct adoption trends they have been tracking showing healthy (although not meteoric) growth of Hadoop adoption for production workloads in mainstream enterprise settings.

By mainstream, Nakamura means large to mid-sized companies in telco, finance, healthcare, and retail. These users take a “very pragmatic approach that tends to follow milestones with 0-24 months being the experimental phase” before shifting out to build larger clusters and add more applications to the ranks. “We have many mainstream enterprise users who are somewhere in that 24 months to seven years category with an average node count for Hadoop workloads being somewhere between 100 and 200, although we also have users in the mainstream [not the LinkedIn, Twitter, and similar companies] with around 1500 nodes.”

“Beyond the startups and bleeding edge, for the mainstream world, this pragmatism extends to wanting to show a particular ROI for specific projects, then they move out to look at how they can move other applications. For mainstream users though, this is a measured approach in part because this is not a trivial expense. And even though the chasm will close slower than people think with Hadoop adoption, it will happen. If you look at MapR, Cloudera, and Hortonworks, they are getting a lot of new logos each quarter, the growth is there, but mainstream companies are very measured in how they are looking at Hadoop, especially at that 0-24 month stage, where a lot of them are,” Nakamura explains.

The leading edge companies tend to jump immediately into the deep end, but for the high-value companies that are in the infant stages (where the majority of mainstream companies are now, according to Nakamura), it’s about proving an ROI on a Hadoop investment. The Driven product announced this month provides monitoring and visualization of the health and performance of individual jobs as well as cluster-level metrics, in part to give these mainstream enterprises something that aids their ability to show the value of the jobs they’ve pushed to Hadoop—oftentimes legacy applications that are moved in pieces, one by one to start, before more applications are moved with the help of Cascading to build out the broader Hadoop strategy.

“These mainstream users tend to come to us very well informed about Hadoop. They know what they are doing. Their questions by the time they get to us are more clarifying—how many deployments are there, are there similar cases of migrating business-level use cases to Hadoop that we’ve worked with before. They do not want to be the guinea pigs, in other words.” Nakamura notes that the driver for Hadoop adoption among these users is not coming from the top down (there are no directives from CIOs demanding a Hadoop strategy); rather, users are seeing clear opportunities for their workloads to run on Hadoop and want to be able to use Cascading to build and deploy new applications, then confirm progress using the new Driven tool, especially since accountability with so many different stakeholders is critical.

Once larger mainstream enterprise users get beyond the initial growing pains (past 24 months), Nakamura says, they start to see how they can take advantage of the reusable components and connectors of Cascading, which let them roll developments made on one application into another. “We see this as the path to growing Hadoop at these enterprises; they can create new applications much quicker to do the second, the third, then before they know it, forty applications. We have users that started this way three years ago and now have 800 applications in production.”

Nakamura does not disagree that there has been a leveling of the steep adoption curve we saw over the last couple of years, but says that with so many companies in that early 0-24 month stage, he expects another rush around the bend.

Concurrent Introduces Driven 1.2

New Release Delivers Deep Operational Insights for Hadoop Applications

SAN FRANCISCO – May 27, 2015 – Concurrent, Inc., the leader in data application infrastructure, today announced the latest release of Driven, the industry’s leading application performance management product for the data-driven enterprise. Driven is built to address the challenges of business-critical data application development and deployment, delivering control and performance management for enterprises seeking to achieve operational excellence on Hadoop.

As enterprises continue in the business of data, IT leaders are rapidly executing on next-generation data management strategies. However, working on Hadoop has proved to be difficult. Specifically, enterprises are struggling with data governance, compliance and application performance management due to a plethora of compute fabrics, a shortage of performance management tools and Hadoop’s out-of-the-box complexity.

Driven offers unmatched visibility into applications written in Cascading, Scalding, Cascalog, Hive and native MapReduce, providing deep insights, search, segmentation and visualizations for service-level agreement (SLA) management – all while collecting rich operational metadata in a scalable data repository. This allows users to isolate, control, report and manage a broad range of data applications, from the simplest to the most complex data processes, allowing enterprises to leverage capabilities across a wide variety of application frameworks. Driven is a proven performance management solution that enterprises can rely on to deliver against their data strategies.

Key features of Driven 1.2 include:

Enhanced governance and compliance for Hadoop apps

  • Create and export saved views of a rich metadata repository to support governance and compliance.
  • Visualize end-to-end data lineage, allowing users to view the data pipeline in real time.
  • Detect applications that violate service-level agreements (SLAs) and policies.

Operational excellence for Hadoop apps

  • Integration with third-party monitoring tools and notification platforms like HipChat, PagerDuty and Nagios, allowing organizations to seamlessly integrate Driven into their existing communication and management platforms.
  • Integration with JIRA, allowing users to quickly create JIRA issues with detailed application views and application specific operational data.
  • Application performance segmentation by team, department or custom tags for role-based views, chargeback models and capacity planning.

Driven can be downloaded now at http://cascading.io/download-driven-server/.

To learn more about the features in the latest release of Driven, register for the webinar “How to get Hadoop Application Intelligence With Driven,” taking place on Wednesday, May 27 at 11 a.m. PT, by visiting: http://info.cascading.io/how-to-get-hadoop-app-intelligence-with-driven

Supporting Quotes

“HomeAway, Inc., the world leader in vacation rentals, has selected Driven in order to reduce development times and time-to-market for specific data-integration projects. Driven was a catalyst to the Cascading development process. It also provided great collaboration opportunities with other data engineers and data analysts, allowing them to understand the data flows at a glance without having to read any code.”

– René X. Parra, Principal Architect, HomeAway, Inc.

“As a healthcare analytics provider, we rely on Driven for clear, detailed insights into how our Hadoop applications are performing on our AWS Hadoop clusters. It has accelerated our ability to identify issues in our code, saving us valuable time and money.”

– Jeff Rick, CTO, Deerwalk Inc.

“Organizations expect to operate Hadoop applications with the reliability, ease and manageability they are accustomed to. With Driven, enterprises are equipped with the right solution to not only achieve operational excellence on Hadoop, but to aggressively drive their plans forward.”

– Gary Nakamura, CEO, Concurrent, Inc.

Supporting Resources

About Concurrent, Inc.

Concurrent, Inc. is the leader in data application infrastructure, delivering products that help enterprises create, deploy, run and manage data applications at scale. The company’s flagship enterprise solution, Driven, was designed to accelerate the development and management of enterprise data applications. Concurrent is the team behind Cascading, the most widely deployed technology for data applications with more than 275,000 user downloads a month. Used by thousands of businesses including eBay, Etsy, The Climate Corp and Twitter, Cascading is the de facto standard in open source application infrastructure technology. Concurrent is headquartered in San Francisco and online at http://concurrentinc.com.

Media Contact
Danielle Salvato-Earl
Kulesa Faul for Concurrent, Inc.
(650) 922-7287
concurrent@kulesafaul.com

11 tools for a healthy Hadoop relationship

Supreet Oberoi
May 8th 2015

http://thenextweb.com/dd/2015/05/08/11-tools-for-a-healthy-hadoop-relationship/

I’m often asked which Hadoop tools are “best.” The answer, of course, is that it depends, and what it depends on is the stage of the Hadoop journey you’re trying to navigate. Would you show up with a diamond ring on a first date? Would you arrange dinner with your spouse by swiping right on Tinder? In the same way, we can say that some Hadoop tools are certainly better than others, but only once we know where you are in your data project’s lifecycle.

Here then, are the major stages of a happy and fulfilling relationship with Hadoop, and the tools and platforms most appropriate for navigating each one.

Dating: Exploring your new Hadoop partner

When a business analyst is first matched with a Hadoop deployment, he or she is typically drawn by great promise. What kind of data does the deployment hold? What does that data actually look like? Can you combine it with other data sets to learn something interesting?

Answering these questions doesn’t require fancy, large-volume clusters. What’s ideal is a simple, Excel-like interface for reading a few rows, teasing out the fields and getting to know the distribution. At this data exploration stage, visualization platforms like Datameer and Trifacta deliver excellent productivity. If you’re comfortable with SQL, you could also try Hive, but that might be overkill given the learning curve.
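
For those who do take the Hive route at this stage, a quick peek at a few rows needs nothing more than the standard JDBC driver for HiveServer2. The sketch below is a hedged example: the host, port, database, credentials and the table name web_events are placeholders, and it assumes the Hive JDBC driver is on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HivePeek {
  public static void main(String[] args) throws Exception {
    // HiveServer2 endpoint; host, port, database and credentials are placeholders.
    String url = "jdbc:hive2://hive-server:10000/default";

    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection(url, "analyst", "");
         Statement stmt = conn.createStatement();
         // Read a handful of rows to get a feel for the fields and their values.
         ResultSet rows = stmt.executeQuery("SELECT * FROM web_events LIMIT 10")) {
      int columns = rows.getMetaData().getColumnCount();
      while (rows.next()) {
        StringBuilder line = new StringBuilder();
        for (int i = 1; i <= columns; i++)
          line.append(rows.getString(i)).append('\t');
        System.out.println(line);
      }
    }
  }
}
```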

One reason that some folks fail to make it past a first date with Hadoop is that they equate visualization with reporting; they go right to a BI tool like Tableau. The visualization tools above are easier to use and therefore better suited to exploration and quick hypothesis building.

Moving In: Getting comfortable with further Hadoop developments

Before jumping into a production-grade data application, the analyst needs to see a more concrete vision of it if he or she hopes to win the organization’s blessings — and funding. In traditional software terms, it’s like building a prototype with Visual Basic to get everyone on board, without worrying about all the customer-support scaffolding that will eventually be necessary. Basically, you want a nice, slick hack.

Here again, the visualization tools can be helpful, but even better are GUI-based application development platforms like SnapLogic, Platfora and Pentaho. These tools make building basic applications fast and easy, and they’re relatively simple to learn.

Tying the knot: Committing to a production relationship with Hadoop

Once a prototype’s value is recognized, the enterprise typically turns the application over to an operations team for production. That team cares not only about the application’s functionality, but also about its capacity to be deployed and run reliably, its scalability and its portability across data fabrics.

The operations folks must also make sure the application’s execution honors service-level agreements (SLAs), and that it integrates seamlessly with inbound data sources like data warehouses, mainframes and other databases, as well as with outbound systems such as HBase and Elasticsearch.

At this production deployment stage, development and operations teams typically forgo the GUI-based platforms, opting instead for the control and flexibility of API-based frameworks such as Cascading, Scalding and Spark. (Full disclosure: I work for Concurrent, the company behind Cascading.)

With these you get not only the control necessary to engineer applications that address the complexity of your enterprise, but also test-driven development, code reusability, complex orchestration of data pipelines, and continuous performance tuning. These capabilities are vital for production-grade applications, and are not available in GUI-based platforms or ad hoc query tools.
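
To make the reusability point concrete, here is a sketch of how a pipeline fragment can be packaged as a Cascading SubAssembly and dropped into any flow, and exercised in unit tests on the local fabric before it ever touches a cluster. The class itself and the "filter nulls, then count" logic are illustrative assumptions; only the SubAssembly, Each, GroupBy and Every classes come from the public Cascading API.

```java
import cascading.operation.aggregator.Count;
import cascading.operation.filter.FilterNull;
import cascading.pipe.Each;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.pipe.SubAssembly;
import cascading.tuple.Fields;

/**
 * A reusable pipeline fragment: drop rows with a null key, then count occurrences per key.
 * The key field is supplied by the caller, so the same assembly can serve many applications.
 */
public class CleanAndCount extends SubAssembly {
  public CleanAndCount(Pipe previous, Fields keyField) {
    super(previous); // register the incoming pipe

    // Remove tuples whose key is null.
    Pipe pipe = new Each(previous, keyField, new FilterNull());

    // Group by the key and count occurrences into a "count" field.
    pipe = new GroupBy(pipe, keyField);
    pipe = new Every(pipe, new Count(new Fields("count")));

    setTails(pipe); // register the assembly's tail
  }
}
```

A flow then instantiates it like any other pipe, for example new CleanAndCount(events, new Fields("user_id")) with a hypothetical field name, and the same class can be unit-tested against Cascading's local mode before running on a shared cluster.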

Building a big data family: Nurturing a thriving nursery of Hadoop apps

Once a team is married to a big data platform, they soon find themselves with a growing family of applications that execute on a common Hadoop cluster (or other shared resource).

There are two big challenges at this stage. First, each application must continue to operate under defined controls. These include SLAs and governance, of course, but it’s also crucial to prevent unauthorized use of sensitive data fields across use cases and to ensure that proper privacy curtains are respected.

Second, production teams must carefully manage how a brood of apps uses the shared cluster resources. Specifically, utilization should reflect the relative importance of lines of business and use cases. In addition, teams must make sure that no application goes hungry for compute resources, either now or in the future (the discipline known as capacity planning).

The ideal tools at this stage deliver performance monitoring for enterprises looking to achieve operational excellence on Hadoop. These include Driven, Cloudera Manager, and even Hadoop’s out-of-the-box JobTracker. (Full disclosure: my company, Concurrent, also makes Driven.)

Maturing with big data: Fostering a rich, lifelong partnership with Hadoop

Like any relationship, the one with Hadoop can become more complex as it matures. Typically, the enterprise will transform production data applications into rich, enterprise-class data products, both by piping them into downstream systems (for example, a recommendation engine) and by making their output available to businesspeople to foster better decisions.

At this most advanced stage in the data product lifecycle, the enterprise’s needs are the most diverse. On top of your production-ready development platform, you’ll want highly intelligent pipeline monitoring. For example, if one application encounters a problem and a downstream one depends on its output, you’ll want to raise an alert so your team can react quickly to resolve the problem.

You’ll also want tools that quickly pinpoint whether a performance bottleneck resulted from a problem in code, data, hardware or shared resources (your cluster configuration). Driven, Cloudera Manager and Hadoop’s JobTracker are also helpful here. You may also want to give businesspeople easy, flexible ways of getting the results they want, while still offering a level of abstraction from the compute interface – a query framework like SQL-based Hive is a great choice here.

To choose the right tool, know where you are in the Hadoop relationship cycle 

In conclusion, before you can answer which tools and approaches work best, you have to understand where you are in the lifecycle of your data operation. Like any true love, your relationship with Hadoop will mature over time, and you’ll need to reevaluate your needs as they mature along with it.

Hadoop and beyond: A primer on Big Data for the little guy

Alexandra Weber Morales
April 28th 2015

http://sdtimes.com/hadoop-and-beyond-a-primer-on-big-data-for-the-little-guy/

Have you heard the news? A “data lake” overflowing with information about Hadoop and other tools, data science and more threatens to drown IT shops. What’s worse, some Big Data efforts may fail to stay afloat if they don’t prove their worth early on.

“Here’s a credible angle on why Big Data could implode,” began Gary Nakamura, CEO of Concurrent, which makes Cascading, an open-source data application development platform that works with Hadoop, and Driven, a tool for visualizing data pipeline performance. “A CTO could walk into a data center, and when they say, ‘Here is your 2,000-node Hadoop cluster,’ the CTO says, ‘What the hell is that and why am I paying for it?’ That could happen quite easily. I predicted last year that this would be the ‘show me the money’ year for Hadoop.”

While plenty can go wrong, Nakamura is bullish on Hadoop. With companies like his betting robustly on the Hadoop file system (and its attendant components in the Big Data stack), now is a strategic moment to check your data pipelines for leaks. Here’s a primer on where the database market stands, what trends will rock the boat, and how to configure your data science team for success.

Follow the leader
Risks aside, no one—not even the federal government—is immune to the hope that Big Data will bring valuable breakthroughs. Data science has reached presidential heights, with the Obama administration’s appointment of former LinkedIn and Relate IQ quantitative engineer DJ Patil as the United States’ first Chief Data Scientist in February. If Patil’s slick talks and books are any indication, he is at home in a political setting. Though building on government data isn’t new for many companies offering services in real estate (Zillow), employment (LinkedIn), small business (Intuit), mapping (ESRI) or weather (The Climate Corporation), his role should prompt many more to innovate with newly opened data streams via the highly usable data.gov portal.

“I think it’s wonderful that the government sees what’s happening in the Big Data space and wants to grow it. I worked at LinkedIn for three years, and for a period of time [Patil] was my manager. It’s great to see him succeed,” said Jonathan Goldman, director of data science and analytics at Intuit. (Goldman cofounded Level Up Analytics, which Intuit acquired in 2013.)

Defining the kernel, unifying the stack
“In the last 10 years we’ve gone through a massive explosion of technology in the database industry,” said Seth Proctor, CTO of NuoDB. “Ten years ago, there were only a few dozen databases out there. Now you have a few hundred technologies to consider that are in the mainstream, because there are all these different applications and problem spaces.”

After a decade of growth, however, the Hadoop market is consolidating around a new “Hadoop kernel,” similar to the Linux kernel, and the industry standard Open Data Platform announced in February is designed to reduce fragmentation and rapidly accelerate Apache Hadoop’s maturation. Similarly, the Algorithms, Machines and People Laboratory (AMPLab) at the University of California, Berkeley is now halfway through its six-year DARPA-funded Big Data research initiative, and it’s beginning to move up the stack and focus on a “unification philosophy” around more sophisticated machine learning, according to Michael Franklin, director of AMPLab and associate chair of computer science at UC Berkeley.

“If you look at the current Big Data ecosystem, it started off with Google MapReduce, and things that were built at Amazon and Facebook and Yahoo,” he said at the annual AMPLab AMP Camp conference in late 2014. “The first place most of us saw that stuff was when the open-source version of Hadoop MapReduce came out. Everyone thought this is great, I can get scalable processing, but unfortunately the thing I want to do is special: I want to do graphs, I want to do streaming, I want to do database queries.

“What happened is, people started taking the concepts of MapReduce and started specializing. Of course, that specialization leads to a bunch of problems. You end up with stovepipe systems, and for any one problem you want to solve, you have to cobble together a bunch of systems.

“So what we’re doing in the AMPLab is the opposite: Don’t specialize MapReduce; generalize it. Two additions to Hadoop MapReduce can enable all the models.”

First, said Franklin, general directed acyclic graphs will enrich the language, adding more operators such as joins and filters and flattening. Second, data sharing across all the phases of the program will enable better performance that is not disk-bound.

Thus, the Berkeley Data Analytics Stack (BDAS, pronounced “Badass”) starts with the Hadoop File System storage layer, with resource virtualization via Mesos and YARN. A new layer called Tachyon provides caching for reliable cross-cluster data sharing at memory speed. Next is the processing layer with AMPLab’s own Spark and Velox Model Serving. Finally, access and interfaces include BlinkDB, SampleClean, SparkR, Shark, GraphX, MLlib and Spark Streaming.
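
Spark, which sits in the BDAS processing layer just described, is where those two additions are easiest to see in practice: arbitrary operators composed into one DAG, plus intermediate results explicitly shared in memory across later stages instead of being written back to disk. A small Java sketch follows; the input path and the comma-separated field layout are assumptions for illustration only.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class DagSharingSketch {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext(
        new SparkConf().setAppName("dag-sharing-sketch"));

    // One logical DAG: read, filter, re-key. Nothing runs until an action is called.
    JavaRDD<String> events = sc.textFile(args[0]); // e.g. an HDFS path (placeholder)
    JavaPairRDD<String, Integer> byUser = events
        .filter(line -> !line.isEmpty())                          // filter operator
        .mapToPair(line -> new Tuple2<>(line.split(",")[0], 1));  // key by first column (assumed layout)

    // In-memory data sharing: the same intermediate feeds two downstream computations
    // without being recomputed or spilled to disk between "phases".
    byUser.cache();

    long distinctUsers = byUser.keys().distinct().count();        // first consumer
    long usersWithCounts = byUser.reduceByKey((a, b) -> a + b)    // second consumer (aggregation)
        .count();

    System.out.println("distinct users: " + distinctUsers
        + ", users with counts: " + usersWithCounts);
    sc.stop();
  }
}
```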

On the commercial side, Concurrent is one of many companies seeking to simplify Hadoop for enterprise users. “I’ve seen many articles about Hadoop being too complex, and that you need to hire specialized skill,” said Nakamura. “Our approach is the exact opposite. We want to let the mainstream world leverage all they have spent and execute on that strategy without having to go hire specialized skill. Deploying a 50-node cluster is not a trivial task. We solve the higher-order problem: We’re going to help you run and manage data applications.”

The case for enterprise data warehousing
One of the most compelling scenarios for Hadoop and its ilk isn’t as exciting as new applications, but some say it is an economic imperative. With the cost of traditional data warehouses running 20x to 40x higher per terabyte than Hadoop, offloading enterprise data hubs or warehouses to Hadoop running on commodity hardware does more than just save cents and offer scalability; it enables dynamic data schemas for future data science discoveries, rather than the prescriptive, static labels of traditional relational database management systems (RDBMS).

According to the website of Sonra.io, an Irish firm offering services for enterprise Hadoop adoption, “The data lake, a.k.a. Enterprise Data Hub, will not replace the data warehouse. It just addresses its shortcomings. Both are complementary and two sides of the same coin. Unlike the monolithic view of a single enterprise-wide data model, the data lake relaxes standardization and defers modeling… If implemented properly, this results in a nearly unlimited potential for agile insight.”

The fight for the hearts of traditional enterprise data warehouse (EDW) and RDBMS customers isn’t a quiet one, however. Some of the biggest posturing has come from the great minds behind seminal database technologies, such as Bill Inmon, “the father of data warehousing.”

Inmon has criticized Cloudera advertisements linking Big Data to the data warehouse. “While it is true that the data warehouse often contains large amounts of data, there the comparison to Big Data ends. Big Data is good at gobbling up large amounts of data. But analyzing the data, using the data for integrated reporting, and trusting the data as a basis for compliance is simply not in the cards,” he wrote on his blog.

“There simply is not the carefully constructed and carefully maintained infrastructure surrounding Big Data that there is for the data warehouse. Any executive that would use Big Data for Sarbanes-Oxley reporting or Basel II reporting isn’t long for his/her job.”

Ralph Kimball is the 1990s rival who countered Inmon’s top-down EDW vision with a proposal for small, star or snowflake schema-based data marts that form a composite, bottom-up EDW. Kimball has boarded the Big Data train, presenting a webinar with Cloudera about building a data warehouse with Hadoop.

“The situation that Hadoop is in now is similar to one the data community was in 20 years ago,” said Eli Collins, chief technologist for Cloudera. “The Inmon vs. Kimball debates were about methodologies. I don’t see it as a hotly debated item anymore, but I think it’s relevant to us. Personally, I don’t want to remake mistakes from the past. The Kimball methodology is about making data accessible so you can ask questions. We want to continue to make data more self-service.”

“We can look to history to see how the future is going,” said Jim Walker, director of product marketing at Hortonworks. “There is some value to looking at the Inmon vs. Kimball debate. Is one approach better than another? We’re now seeing a bit more of the Inmon approach. Technology has advanced to the point where we can do that. When everything was strictly typed data, we didn’t have access. Hadoop is starting to tear down some of those walls with schema-on-read over schema-on-write. Technology has opened up to give us a hybrid approach. So there are lessons learned from the two points of view.”

The idea that the data lake is going to completely replace the data warehouse is a myth, according to Eamon O’Neill, director of product management at HP Software in Cambridge, Mass. “I don’t see that happening very soon. It’s going to take time for the data lake to mature,” he said.

“I was just talking to some entertainment companies that run casinos here at the Gartner Business Intelligence and Analytics Summit, and a lot of the most valuable data, they don’t put in Hadoop. It may take years before that’s achieved.”

However, there are cases where it makes sense, continued O’Neill. “It depends on how sensitive the data is and how quickly you want an answer. There are kinds of data you shouldn’t pay a high price for, or that when you query you don’t care if it takes a while. You put it in SAP or Oracle when you want the answer back in milliseconds.”

Data at scale: SQL, NoSQL or NewSQL?
Another bone of contention is the role of NoSQL to solve the scalability limitations of SQL or RDBMS when faced with Web-scale transaction loads. NoSQL databases sacrifice ACID (atomicity, consistency, isolation and durability) data quality constraints to increase performance. MongoDB, Apache HBase, Cassandra, Accumulo, Couchbase, Riak and Redis are among popular NoSQL choices that let analysts query data with SQL but provide different optimizations based on the application (such as streaming, columnar data or scalability).

“One of the brilliant things about SQL is that it is a declarative language; you’re not telling the database what to do, you’re telling it what you want to solve,” said NuoDB’s Proctor, whose company is one of several providing NewSQL alternatives that combine NoSQL scalability with ACID promises. “You understand that there are many different ways to answer that question. One thing that comes out of the new focus on data analysis and science is a different view on the programming model: where the acceptable latencies are, how different sets of problems need different optimizations, and how your architecture can evolve.”

NuoDB was developed by database architect Jim Starkey, who is known for such contributions to database science as the blob column type, event alerts, arrays and triggers. Its distributed shared-nothing architecture aggregates opt-in nodes into a SQL/ACID-compliant database that “behaves like a flock of birds that fly in an organized fashion but without a central point of control or a single point of failure,” according to Proctor.

Elsewhere, MIT professor and Turing Award winner Michael Stonebraker, the “father of the modern relational database,” cofounded VoltDB in 2009, another NewSQL offering that claims “insane” speed. The company recently touted a benchmark of 686,000 transactions per second for a Spring MVC-enabled application using VoltDB. Another company he founded, Vertica, was acquired by HP Software in 2011 and added to its Haven Big Data platform.

“Vertica is a very fast, scalable data engine with a columnar database architecture that’s very good for OLAP queries,” explained HP’s O’Neill. “It runs on commodity servers and scales out horizontally. Very much like Hadoop, it’s designed to be a cluster of nodes. You can use commodity two-slot servers, and just keep adding them.”

On the SQL side, HP took the Vertica SQL query agent and put it on Hadoop. “It’s comparable to other SQL engines on Hadoop,” said O’Neill. “It’s for when the customer is just in the stage of exploring the data, and they need a SQL query view on it so they can figure out if it’s valuable or not.”

On the NewSQL side, HP Vertica offers a JDBC key-value API to quickly query data on a single node for high-volume requests returning just a few results.

With the explosion of new technologies, it’s likely that the future will include more database specialization. Clustrix, for example, is a San Francisco-based NewSQL competitor that is focusing on e-commerce and also promoting the resurgence of SQL on top of distributed shared nothing architectures.

SQL stays strong
Meanwhile, movements at Google and Facebook are showing the limitations of NoSQL.

“Basically, the traditional database wisdom is just plain wrong,” said Stonebraker in a 2013 talk at MIT where he criticized standard RDBMS architectures as ultimately doomed. According to him, traditional row-based data storage cannot match column-based storage’s 100x performance increase, and he predicted that online analytical processing (OLAP) and data warehouses will migrate to column-based data stores (such as Vertica) within 10 years.

Meanwhile, there’s a race among vendors for the best online transaction processing (OLTP) data storage designs, but classic models spend the bulk of their time on buffer pools, locking, latching and recovery—not on useful data processing.

Despite all that, operational data still relies on SQL—er, NewSQL.

“Hadoop was the first MapReduce system that grabbed mindshare,” said Proctor. “People said, ‘Relational databases won’t scale.’ But then about a year and a half ago, Google turned around and said, ‘We can’t run AdWords without a relational database.’ ”

Proctor was referring to Google’s paper “F1: A Distributed SQL Database that Scales.” In it, Google engineers took aim at the claims of “eventual consistency,” which could not meet the hard requirements they faced with maintaining financial data integrity.

“…Developers spend a significant fraction of their time building extremely complex and error-prone mechanisms to cope with eventual consistency and handle data that may be out of date,” according to the F1 paper. “We think this is an unacceptable burden to place on developers and that consistency problems should be solved at the database level. Full transactional consistency is one of the most important properties of F1.”

“Intuit is going through a massive transformation to leverage data—not just to understand customers, but also to feed it back into our products,” said Lucian Lita, cofounder of Level Up Analytics and now director of data engineering at Intuit. “We’re building a very advanced Big Data platform and putting Intuit on the map in terms of data science. We’re educating internally, contributing to open source and starting to have a good drumbeat.”

As an example, Quickbooks Financing uses data to solve a classic small business problem: They need financing to grow, but they’re so new that they can’t prove they are viable. “Intuit uses Big Data techniques to get attributes of your business, score it, and we should get something like the normal 70% rejection rate by banks turned into a 70% acceptance rate,” said Intuit’s Goldman.

“Small businesses don’t have access to data like Walmart and Starbucks do. We could enable that: Big Data for the little guy,” he said.

The experience of running a data science startup didn’t just translate into being acquired by an established enterprise. It also gave Goldman, Lita and cofounder Anuranjita Tewary, now director of product management at Intuit, insight into how to form effective data science teams.

“When we were working at Level Up Analytics, we spoke with over 100 companies building data products,” Tewary said. “This helped us understand how to structure teams to succeed. It’s important to hire the right mix of skills: product thinking, data thinking and engineering.”

When she looked at struggling data science projects at a multinational bank, a media company and an advertising company, Tewary saw common pitfalls. One of the most frequent? “Treating the data product as a technology project, ending up with not much to show for it and no business impact,” she said.

“It was more, ‘What technology should we have just for the sake of having this technology?’”

Because the tools have gotten easier to use, the idea of having Big Data for Big Data’s sake (and running it in a silo) may not be long for this world.

“It used to be that we were selling tools to IT. Now we think about analytics tools we can sell to business,” said HP’s O’Neill. “Increasingly, data scientists live in the marketing or financial departments, trying to predict what’s going to happen.”

In 1985, the physicist and statistician Edwin Thompson Jaynes wrote, “It would be very nice to have a formal apparatus that gives us some ‘optimal’ way of recognizing unusual phenomena and inventing new classes of hypotheses that are most likely to contain the true one; but this remains an art for the creative human mind.”

That quote is one of the inspirations behind Zoubin Ghahramani’s Automatic Statistician, a Cambridge Machine Learning Group project that won a $750,000 Google Focused Research Award. Using Bayesian inference, the Automatic Statistician system examines unstructured data, explores possible statistical models that could explain it, and then reports back with 10 pages of graphics and natural language describing patterns in the data.

IBM’s Watson and HP’s IDOL are trying to solve the unstructured data problem. The fourth technology on HP’s Haven Big Data platform, IDOL is for parsing “human data”: prose, e-mails, PDFs, slides, videos, TV, voicemail and more. “It extracts from all these media, finds key concepts and indexes them, categorizes them, performs sentiment analysis—looking for tones of voice like forceful, weak, angry, happy, sad,” said O’Neill. “It groups similar documents, categorizes topics into taxonomies, detects language, and makes these things available for search.”

As the F1 paper explained, “conventional wisdom in the engineering community has been that if you need a highly scalable, high-throughput data store, the only viable option is to use a NoSQL key/value store, and to work around the lack of ACID transactional guarantees and the lack of conveniences like secondary indexes, SQL, and so on. When we sought a replacement for Google’s MySQL data store for the AdWords product, that option was simply not feasible: the complexity of dealing with a non-ACID data store in every part of our business logic would be too great, and there was simply no way our business could function without SQL queries.

“Instead of going NoSQL, we built F1, a distributed relational database system that combines high availability, the throughput and scalability of NoSQL systems, and the functionality, usability and consistency of traditional relational databases, including ACID transactions and SQL queries.”

While this design results in higher commit latency, improvements in the client application have kept observable end-user speed as good as or better than before, according to Google engineers.
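For readers unfamiliar with what “ACID transactions and SQL queries” buy you over a bare key/value store, here is a minimal, generic sketch using Python’s built-in sqlite3 module and a hypothetical campaigns/charges schema. It illustrates the semantics F1 preserves, not F1’s actual interface: two related writes either both commit or neither does, with no compensation logic in the application code.

```python
# Generic illustration of ACID semantics (hypothetical schema, not F1's API).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE campaigns (id INTEGER PRIMARY KEY, budget_remaining INTEGER);
    CREATE TABLE charges   (campaign_id INTEGER, amount INTEGER);
    INSERT INTO campaigns VALUES (1, 100);
""")

try:
    with conn:  # opens a transaction; commits on success, rolls back on any error
        conn.execute("UPDATE campaigns SET budget_remaining = budget_remaining - 30 WHERE id = 1")
        conn.execute("INSERT INTO charges VALUES (1, 30)")
except sqlite3.Error:
    pass  # on failure, neither statement takes effect

print(conn.execute("SELECT budget_remaining FROM campaigns WHERE id = 1").fetchone())
```

With a non-ACID key/value store, that “all or nothing” guarantee would have to be rebuilt by hand in every piece of business logic that touches both records, which is exactly the complexity the F1 team wanted to avoid.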

In a similar vein, Facebook created Presto, a SQL engine optimized for low-latency interactive analysis of petabytes of data. Netflix’s Big Data Platform team is an enthusiastic proponent of Presto for querying a multi-petabyte scale data warehouse for things like A/B tests, user streaming experiences or recommendation algorithms.
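The sketch below shows the flavor of query such an engine serves, assuming a hypothetical playback_sessions table for an A/B test. sqlite3 stands in here only so the example runs anywhere; in practice the same SQL would be submitted to Presto over a much larger warehouse.

```python
# Toy stand-in for an interactive A/B-test aggregation (hypothetical schema).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE playback_sessions (user_id INTEGER, ab_cell TEXT, play_seconds INTEGER);
    INSERT INTO playback_sessions VALUES
        (1, 'control', 1200), (2, 'control', 900),
        (3, 'treatment', 1500), (4, 'treatment', 1800);
""")

query = """
    SELECT ab_cell,
           COUNT(DISTINCT user_id) AS users,
           AVG(play_seconds)       AS avg_play_seconds
    FROM playback_sessions
    GROUP BY ab_cell
"""
for row in conn.execute(query):
    print(row)  # per-cell user counts and average playback time
```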

FoundationDB is another hybrid NewSQL database, whose proprietary NoSQL-style core key-value store can act as universal storage with the transactional integrity of a SQL DBMS. Unfortunately, many FoundationDB users were surprised in March when Apple purchased the ISV, possibly for its own high-volume OLTP needs, and its open-source components were promptly removed from GitHub.

Building better data science teams
As with any technology initiative, the biggest success factors aren’t the tools, but the people using them.

Intuit is a case in point: With the 2013 acquisition of Level Up Analytics, the consumer tax and accounting software maker injected a team of data science professionals into the heart of its business.

Futuristic as the possibility of parsing both human data and the data “exhaust” from the Internet of Things sounds, the technology may be the easy part. As the textbook “Doing Data Science” by Cathy O’Neil and Rachel Schutt explained, “You all are not just nerds sitting in the corner. You have increasingly important ethical questions to consider while you work.”

Ultimately, user-generated data will form a feedback loop, reinforcing and influencing subsequent user behaviors. According to O’Neil and Schutt, for data science to thrive, it will be critical to “bring not just a set of machine learning tools, but also our humanity, to interpret and find meaning in data and make ethical, data-driven decisions.”

How to find a data scientist
Claudia Perlich knows quite a bit about competition: She’s a three-time winner of the KDD Cup, the ACM’s annual data mining and knowledge discovery competition. Her recommendation for finding a qualified data scientist? Use a platform like Kaggle and make it a contest.

First, however, you must answer the hardest question: which data science problem to solve. “This is typically one of the hardest questions to answer, but very much business-dependent,” she said. Once you have formulated a compelling question, a Kaggle competition “will get you almost surely the highest-performance solution emerging from the work of thousands of competing data scientists. You could even follow up with whoever is doing well and hire them.

“It turns out that many companies are using Kaggle as a screening platform for DS applicants. It is notably cheaper than paying a headhunter, and you can be confident that the person you hire can in fact solve the problem.”

Concurrent, Inc. and Big Data AB Join Forces to Accelerate Big Data Application Deployment

Companies’ Combined Experience with Production-Grade, Large-Scale Deployments Helps Businesses Seeking Competitive Advantage, Leveraging Data Assets

SAN FRANCISCO and STOCKHOLM, SWEDEN – April 7, 2015 – Concurrent, Inc., the leader in data application infrastructure, and Big Data AB, a consultancy focusing solely on big data technologies based on Hadoop, today announced they are teaming up to help businesses speed development of Hadoop-based enterprise data applications.

This partnership between Concurrent, whose Cascading platform and Driven solution help enterprises create, deploy, run and manage data applications at scale, and Big Data AB, the premier big data consultancy in Sweden, means organizations worldwide will be able to tap into the companies’ extensive consulting, integration and development experience for both of their growing global customer bases.

Big Data AB has developed some of Sweden’s first Hadoop-based data analytics solutions that are in production. The company has helped customers such as King Digital Entertainment and Svenska Spel integrate their transaction-intensive systems into large-scale analytics and reporting infrastructures.

Concurrent provides the proven data application platform for reliably building and deploying data applications on Hadoop. With more than 275,000 downloads a month, Cascading is the enterprise data application platform of choice and has become the de-facto standard for building data-centric applications. Driven is the industry’s leading application performance management product for the data-centric enterprise, built to provide business and operational visibility and control to organizations that need to deliver operational excellence. Together, Cascading and Driven deliver a one-two punch to knock out the complexity and provide a proven reliable solution for enterprises to execute their big data strategies.

Supporting Quotes

“We help enterprise customers design big data processing pipelines. To process data at scale, you want to express your data transformation as code and have proper application management for optimum execution. Concurrent provides just the right level of abstraction on top of core Hadoop technologies to help you focus on business problems.”

– Johan Pettersson, founder and consultant, Big Data AB

“At Concurrent, our focus is on solutions that enable our customers to build, run and manage data applications at scale. We seek to partner only with consultants and integrators that have proven experience developing applications that have been deployed at scale and are in production. Big Data AB is the ideal partner for Concurrent as we both understand the realities and requirements of building applications that our customers can rely on.”

– Gary Nakamura, CEO, Concurrent, Inc.

Supporting Resources

About Big Data AB
Big Data AB is a Stockholm-based consultancy, established in 2011, that focuses on Hadoop technologies. Working with a number of fast-growing online companies in gaming, advertising and payments, Big Data AB has a proven track record of putting Hadoop solutions into production.

About Concurrent, Inc.
Concurrent, Inc. is the leader in data application infrastructure, delivering products that help enterprises create, deploy, run and manage data applications at scale. The company’s flagship enterprise solution, Driven, was designed to accelerate the development and management of enterprise data applications. Concurrent is the team behind Cascading, the most widely deployed technology for data applications with more than 275,000 user downloads a month. Used by thousands of businesses including eBay, Etsy, The Climate Corp and Twitter, Cascading is the de facto standard in open source application infrastructure technology. Concurrent is headquartered in San Francisco and online at http://concurrentinc.com.

Media Contact
Danielle Salvato-Earl
Kulesa Faul for Concurrent, Inc.
(650) 922-7287
concurrent@kulesafaul.com

Concurrent, Inc. Wraps Up Banner Year with Record Growth, Corporate Momentum and Product Innovation

SAN FRANCISCO – Jan. 27, 2015 – Concurrent, Inc., the leader in data application infrastructure, closed 2014 with tremendous growth and momentum, product innovation and community success in the enterprise. The company continues to pave the road for enterprise big data application infrastructure with rapid adoption of its Cascading platform, along with the introduction of its flagship enterprise solution, Driven. Additional milestones from the past year include $10 million in Series B funding, expansion of its leadership team, new strategic partnerships, and strong industry accolades and recognition.

As more and more enterprises solidify and advance their data strategies, demand grows for a more reliable and repeatable way to operationalize their data. As a result, Concurrent has focused on delivering a simplified and consistent way for enterprises to build, run and manage their business-critical data applications.

2014 Corporate and Product Milestones

  • Secured $10 million in Series B funding, led by new investor Bain Capital Ventures with the participation of existing investors Rembrandt Ventures and True Ventures.
  • Expanded leadership team with the appointments of Supreet Oberoi as vice president of field engineering and Salil Deshpande to the board of directors.
  • Launched and announced the general availability of Driven – the industry’s first application performance management product for the data-centric enterprise, built to provide business and operational visibility and control to organizations that need to deliver operational excellence.
  • Launched Driven 1.1, adding support for Apache Hive and MapReduce, for deeper visualization into enterprise data pipelines, as well as data application segmentation and SLA management.
  • Launched Cascading 3.0, the most widely used and deployed platform for building and deploying robust, enterprise-grade big data applications on Hadoop. Cascading delivers a stable enterprise-grade API that enterprises can rely on, providing the flexibility to choose a compute engine to meet the needs of the use case, i.e. MapReduce, Tez, Spark, Storm or Samza.
  • Increased Cascading user downloads from 130,000 to more than 275,000 per month.
  • Added new strategic partnerships with technology leaders including Hortonworks, Rackspace, Orzota, EPAM and Databricks.
  • Received accolades such as SD Times 100, InfoWorld Bossies, DataWeek Awards, and CRN Emerging Vendors and Big Data 100 lists.

Supporting Quote
“As enterprises continue to realize they are in the business of data, 2014 proved to be an exceptional year for Concurrent. Delivering simple and reliable products at our core, we continued to meet enterprise demand for next-generation data processing by delivering valuable products that enterprises can rely on for years to come. As we look to the year ahead, we expect to see significant growth and look forward to seeing more enterprises achieving great success with Cascading and Driven.” – Gary Nakamura, CEO, Concurrent, Inc.

Supporting Resources

About Concurrent, Inc.
Concurrent, Inc. is the leader in data application infrastructure, delivering products that help enterprises create, deploy, run and manage data applications at scale. The company’s flagship enterprise solution, Driven, was designed to accelerate the development and management of enterprise data applications. Concurrent is the team behind Cascading, the most widely deployed technology for data applications with more than 275,000 user downloads a month. Used by thousands of businesses including eBay, Etsy, The Climate Corp and Twitter, Cascading is the de facto standard in open source application infrastructure technology. Concurrent is headquartered in San Francisco and online at http://concurrentinc.com.

Media Contact
Danielle Salvato-Earl
Kulesa Faul for Concurrent, Inc.
(650) 922-7287
concurrent@kulesafaul.com

9 Business Intelligence and Analytics Predictions for 2015

Drew Robb
January 16, 2015

http://www.enterpriseappstoday.com/business-intelligence/9-business-intelligence-and-analytics-predictions-for-2015.html

What’s ahead for business intelligence and data analytics in 2015?

When we asked analysts, forecasters and pundits for their predictions for the most relevant business intelligence and analytics trends for 2015, they came up with a host of topics, including mobile maturity, Hadoop growth, better text analysis and incorporation of the Internet of Things (IoT) into analytics.

Here is what they saw as they gazed into their crystal balls.

Better Mobile BI

Mobile business intelligence has been all the rage for a couple of years now. But reality hasn’t quite matched up to expectations. Adi Azaria, co-founder of Sisense, said it has gotten to the point where mobile requirements are forcing a fundamental change in the approach to BI.

“Rather than elaborate visualizations, you will see hard numbers, simple graphs and conclusions,” Azaria said. “For instance, with wearable devices you might look at an employee and quickly see the key performance indicator.”

Ellie Fields, vice president of Product Marketing at Tableau, also sees progress being made on mobile, which she thinks will translate into more capabilities in the hands of sales and field service personnel.

“The level of maturity being achieved will help mobile workers accomplish light analysis on the road,” she said.

Year of Hadoop

Hadoop has been a high visibility item for a couple of years now. But Gary Nakamura, CEO of Concurrent, believes Hadoop will break through and become a worldwide phenomenon in 2015. A sign that this is occurring, he said, is a growing wave of Hadoop-related acquisitions and IPOs. “Hadoop is rapidly spreading across Europe, Asia and other parts of the world so there will be strong Hadoop adoption this year,” he said.

Text Analysis Matures

While Hadoop is likely to be big this year, not everyone thinks it will realize its huge potential over the course of 2015. An intermediate stage that incorporates better text analysis may be required on the way to realizing the Big Data dream, Azaria said.

“Unstructured data has posed many obstacles in the past, but will come into its own in 2015,” said Azaria. “Text analysis will gain increasing traction, with Web data, documents and images, and companies finally able to tackle unstructured data in meaningful ways.”

IoT Analytics

First we used databases for analysis, then we added in unstructured data sources and data from a far greater number of mobile devices. Up that by two or three orders of magnitude as the Internet of Things (IoT) takes hold.

“When you have millions of devices, systems and machines connected on the Internet generating all kinds of machine log data, making sense of this data becomes a strategic opportunity for manufacturers, service providers and end users,” said Puneet Pandit, CEO of Glassbeam. “The use cases of this application include support automation, remote diagnostics, predictive maintenance, installed base analysis, product quality reporting and service revenue generation.”

Bob Muglia, CEO of Snowflake Computing, pointed out that in addition to IoT data, companies will make better use of log data and device data, which in the past has been largely collected and analyzed in isolated silos using special-purpose tools.

“There is an emerging recognition of the enormous value in analyzing machine data together with transactional data,” said Muglia. “This recognition will catalyze a rapid shift to data processing and analytics that enables business analysts to gain deeper business insight through the combination of structured and semi-structured data for analysis inside a single system.”

Democratic Data Rules

The strategy of bringing democracy to the Middle East hasn’t exactly panned out. But democracy is coming to analytics, according to Pandit. Business people inside enterprises are increasingly moving away from relying on internal IT to provide analytics. Software-as-a-Service (SaaS) was the first wave of this. The next step is further simplification of the application layer, with business users demanding real-time, ad hoc analysis so they can perform their own “what-if” analyses, find hidden nuggets in data, build charts, graphs and dashboards, and publish them to their user community.

Sustained Analytics Growth

Gartner numbers show that advanced analytics has surpassed the $1 billion per year mark. That makes it the fastest-growing segment of the business intelligence and analytics software market. Gartner analyst Alexander Linden expects that level of growth to continue as more and more business units gain access to applications that can harness analytics.

Eric Berridge, CEO of Bluewolf, is equally bullish.

“Investments in predictive analytics and data intelligence will explode,” he said, citing Bluewolf research. “Seventy-one percent of companies will increase their investments in data intelligence and predictive analytics in the coming year.”

IT/Business Partner on Analytics

IT used to call the shots on what business intelligence and analytics applications were deployed and who had access to them. Then came SaaS and cloud-based freedom. Brett Azuma, an analyst at 451 Research, predicts that we are about to enter an era of compromise in which both sides will have a say and coexist more peacefully.

“Self-service data preparation and harmonization will complement and coexist with IT’s traditional data management tools, which will continue to address critical issues around data security, compliance and governance,” Azuma said.

Invisible Analytics

Gartner analyst David Cearley predicts increasing invisibility in analytics as the volume of data involved grows and the trend toward embedded BI accelerates. To his mind, every application will be an analytic app to some degree, and the era of point analytic tools is gradually fading. “Analytics will become deeply, but invisibly, embedded everywhere,” he said.

Consumerization of Business Intelligence

Eldad Farkash, co-founder and CTO of Sisense, predicts that the term “business intelligence” will morph into “data intelligence” and that BI will become a more integral part of our lives – even outside the office.

“BI will finally evolve from being a reporting tool into data intelligence that every entity from governments to cities to individuals will use to prevent traffic, detect fraud, track diseases, manage personal health and even notify you when your favorite fruit has arrived at your local market,” Farkash said. “We will see the consumerization of BI, where it will extend beyond the business world and become intricately woven into our everyday lives directly impacting the decisions we make.”