All posts by Kim Loughead

Do you speak Hadoop? What you need to know to get started.

Neil A. Chaudhuri, GCN
May 1, 2014
http://gcn.com/Articles/2014/05/01/Hadoop-basics

It’s a funny word. You have only a vague notion of what it is. You’ve heard that it takes a lot of work but is potentially beneficial. Maybe if you learned more about it you too could enjoy its benefits. But even if you wanted to try it, you wouldn’t even know where to start.

If it isn’t obvious, I’m talking about Hadoop. Not Zumba.

Google first described the ideas behind Hadoop 10 years ago, an eternity in technology, but it is only recently that the rest of us have begun to explore its potential. Even as businesses and government agencies have built Hadoop solutions, it remains a source of confusion to many. In order to assess its potential, IT managers must first understand what Hadoop is.

Why Hadoop?

Hadoop is not just one thing. It is a combination of components that work together to facilitate cloud-scale analytics.

Hadoop provides an abstraction for running analytics on a cluster of commodity hardware when there is too much data for a single machine. The analytics program need not know about the cluster, how work is divided across it, nor the vagaries of cluster management. If a machine fails, Hadoop handles that.

HDFS stands for the Hadoop Distributed File System. It’s optimized for storing lots of data across a computing cluster. Users simply load files into HDFS, and it figures out how to distribute the data. Virtually all interactions with Hadoop involve HDFS directly or indirectly.

MapReduce is often mistaken for Hadoop itself, but in fact it is Hadoop’s programming model (commonly in Java) for analytics on data in HDFS. To understand the conceptual foundations of MapReduce, imagine two relational database tables—one for bank accounts and the second for account transactions. To find the average transaction amount for each account, a user would “map” (or transform) the two original tables to a single dataset via a join.

Then all the individual transaction amounts with the same account number would be “reduced” (or aggregated) to a single amount via a “GROUP BY” clause. MapReduce allows users to apply precisely these same concepts to a large data set distributed across a cluster, but the operations can be quite slow as files are continually read and written.
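
To make the map and reduce steps concrete, here is a minimal single-machine sketch in Python. It is not the Hadoop API: the function names and the sample account data are made up for illustration, and the "shuffle" step stands in for the grouping by key that the framework performs between the two phases on a real cluster.

```python
from collections import defaultdict

# Hypothetical sample data: (account_number, transaction_amount) pairs.
TRANSACTIONS = [
    ("A-100", 20.0),
    ("A-100", 40.0),
    ("B-200", 10.0),
    ("B-200", 30.0),
    ("B-200", 50.0),
]


def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for account, amount in records:
        yield account, amount


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Aggregate each key's list of amounts into a single average."""
    return {key: sum(values) / len(values) for key, values in grouped.items()}


def average_per_account(records):
    """Run the three steps in order on an in-memory list of records."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    print(average_per_account(TRANSACTIONS))
```

On a cluster, each phase would run in parallel on many machines and the intermediate pairs would be written to and read from disk, which is where the slowness mentioned above comes from.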

Hive allows users to project a tabular structure on the data so they can eschew the MapReduce API in favor of a SQL-like abstraction called HiveQL. Anyone used to SQL staples like “CREATE TABLE,” “SELECT,” and “GROUP BY” will find HiveQL eases the transition to Hadoop. Familiar abstraction aside, Hive queries run as MapReduce jobs under the hood.
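
For readers who want to see those SQL staples side by side, the sketch below runs the bank-account example as ordinary SQL. It uses Python's built-in SQLite module as a stand-in, not Hive, and HiveQL differs from SQLite in its details; the table names, column names and sample rows are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical sample rows for the two tables in the bank example.
ACCOUNTS = [("A-100", "Alice"), ("B-200", "Bob")]
ACCOUNT_TRANSACTIONS = [
    ("A-100", 20.0),
    ("A-100", 40.0),
    ("B-200", 10.0),
    ("B-200", 20.0),
]


def average_transaction_per_account(accounts, transactions):
    """Join the two tables, then average the amounts for each account."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE accounts (account_id TEXT PRIMARY KEY, owner TEXT)")
        conn.execute("CREATE TABLE transactions (account_id TEXT, amount REAL)")
        conn.executemany("INSERT INTO accounts VALUES (?, ?)", accounts)
        conn.executemany("INSERT INTO transactions VALUES (?, ?)", transactions)
        query = """
            SELECT a.account_id, AVG(t.amount)
            FROM accounts a
            JOIN transactions t ON t.account_id = a.account_id
            GROUP BY a.account_id
            ORDER BY a.account_id
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for account_id, average in average_transaction_per_account(
        ACCOUNTS, ACCOUNT_TRANSACTIONS
    ):
        print(account_id, average)
```

The join plays the role of the "map" step and the GROUP BY plays the role of the "reduce" step described earlier.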

There are other notable components in Hadoop:

HBase. The Hadoop database and an example of the so-called NoSQL databases described in a previous column.

ZooKeeper. A centralized service for coordinating activities among the machines in the cluster.

Hadoop Streaming. A MapReduce API that lets developers use popular scripting languages (e.g. Ruby or Python).

Pig. An analytic abstraction similar to Hive but with a query syntax called Pig Latin (yes, seriously), which prefers a scripting pipeline approach to the SQL-like HiveQL.
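
As an illustration of the Hadoop Streaming entry above, here is a rough Python sketch of a mapper and a reducer. Streaming scripts exchange plain text lines, with key and value separated by a tab, and the reducer receives its input sorted by key. The "account,amount" input format and the function names are assumptions for illustration, and the local sort in the example stands in for the sort Hadoop performs between the two phases.

```python
from itertools import groupby


def mapper(lines):
    """Turn 'account,amount' text lines into tab-separated key/value lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank input lines
        account, amount = line.split(",")
        yield f"{account}\t{amount}"


def reducer(sorted_lines):
    """Average the amounts for each account; input must be sorted by key."""
    parsed = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for account, group in groupby(parsed, key=lambda pair: pair[0]):
        amounts = [float(amount) for _, amount in group]
        yield f"{account}\t{sum(amounts) / len(amounts)}"


if __name__ == "__main__":
    # Hypothetical sample input, processed locally instead of on a cluster.
    sample = ["A-100,20", "B-200,10", "A-100,40"]
    for output_line in reducer(sorted(mapper(sample))):
        print(output_line)
```

In a real job the two functions would typically live in separate scripts that read standard input and write standard output, and the scripts would be handed to the Hadoop Streaming jar as the mapper and the reducer.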

Beyond Hadoop

As powerful as it is, many aspects of Hadoop remain too low-level, error-prone and slow for developers who need higher levels of abstraction. Cascading enables simpler and more testable workflows for multiple MapReduce jobs. Apache Spark lets developers treat data sets like simple lists and uses cluster memory to make jobs run faster. A companion project to Spark, Shark, similarly uses memory to make Hive queries run faster.

Apache Accumulo was originally built by the National Security Agency to supplement HBase with cell-level security. Numerous projects, including the analytics and visualization tool Lumify, are built on Accumulo.

In 2010, Google wrote a paper on Dremel, which facilitates fast queries on cloud-scale data. Dremel supports Google’s BigQuery product and has never been released, but Cloudera’s Impala and MapR’s Apache Drill are open-source implementations.

With network data (e.g. SIGINT or financial transactions) requiring graph analytics, MapReduce can be especially slow. A popular alternative programming model is Bulk-Synchronous Parallel (BSP), which abandons disk I/O for messages sent along the network. Apache Hama and Apache Giraph both use BSP to support graph analytics, and Titan is a graph database that uses HBase and other backends to store cloud-scale graphs.
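
To give a feel for the BSP model, here is a small single-process Python sketch in which every vertex of a graph learns the largest value held anywhere in the graph. It is a toy simulation, not the Hama or Giraph API: the function name, the dictionary-based graph and the sample values are all assumptions for illustration. Work proceeds in "supersteps": each vertex reads the messages sent to it in the previous superstep, updates its value, and sends messages to its neighbors, with a barrier between supersteps.

```python
def bsp_max_value(graph, initial_values):
    """Propagate the maximum value through a graph in BSP-style supersteps.

    graph: dict mapping each vertex to a list of neighboring vertices.
    initial_values: dict mapping each vertex to a number.
    Returns (final values per vertex, number of supersteps executed).
    """
    values = dict(initial_values)

    # Superstep 0: every vertex sends its own value to its neighbors.
    inbox = {vertex: [] for vertex in graph}
    for vertex, neighbors in graph.items():
        for neighbor in neighbors:
            inbox[neighbor].append(values[vertex])
    supersteps = 1

    # Keep going until a superstep delivers no messages at all.
    while any(inbox.values()):
        outbox = {vertex: [] for vertex in graph}
        for vertex, messages in inbox.items():
            if not messages:
                continue  # this vertex has nothing to do in this superstep
            best = max(messages)
            if best > values[vertex]:
                values[vertex] = best
                # Only vertices whose value changed send new messages.
                for neighbor in graph[vertex]:
                    outbox[neighbor].append(best)
        inbox = outbox  # barrier: messages become visible in the next superstep
        supersteps += 1

    return values, supersteps


if __name__ == "__main__":
    # Hypothetical chain A - B - C - D, with edges listed in both directions.
    chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
    start = {"A": 3, "B": 6, "C": 2, "D": 1}
    final_values, steps = bsp_max_value(chain, start)
    print(final_values, steps)
```

Nothing is written to disk between supersteps; only messages move between vertices, which is the property that makes the model attractive for graph workloads.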

These tools excel at batch analytics, but what about analytics on data streaming from a feed like a message queue or Twitter? Apache Storm and Spark Streaming can help there.

All these tools leverage existing Hadoop artifacts like HDFS files and Hive queries. For example, I have written analytics with Spark against an existing HBase table.

Hopefully you can now assess Hadoop to see if it has a place in your enterprise. As for me, I am still trying to wrap my head around Zumba.

The Business of Data

Jelani Harper, Dataversity
April 24, 2014
http://www.dataversity.net/business-data/

A health insurance company leverages its massive amounts of data to customize customer service for patients by conducting product assessments and medication reconciliations.

An exceedingly large and well-known power supply company makes substantial additions to the Internet of Things by manufacturing jet engines and windmills, and monitoring the output of their data to schedule maintenance.

An automobile manufacturing company merges manufacturing telemetry with the telemetry of its vehicles to optimize assembly line production.

Teams of Data Scientists, engineers, and operations personnel across any number of industries design recommendation engines for their web sites to boost traffic.

Regardless of the industry, regardless of the area of specialty of a particular enterprise, a number of organizations are involved in the business of Data – the utilization of data and data-driven processes to operate and improve their businesses.

At the core of this process of automating and operationalizing data to create meaningful, value-adding action is application building. It is a necessity for Big Data, conventional data, or any degree of integration of the two. Without it, data might yield insight, but it can’t drive any of the aforementioned processes that are revolutionizing the way business is practiced in the 21st century:

“It is very clear that the broader world is moving towards enterprise data applications,” observed Chris Wensel, founder and Chief Technology Officer of Concurrent, whose open source application building framework Cascading is receiving 100,000 downloads a month with 11 percent month-over-month growth and thousands of deployments.

“It’s about businesses taking their assets and their data and building applications that enhance their business processes. It’s very clear that the market is moving on to the next level of maturity, and it has been doing so at a very rapid pace during the past 18 months.”

Without Data Science

Much of the current hype revolving around Data Science and the shortage of viable Data Scientists pertains to the myriad responsibilities these workers have. A large part of that responsibility has to do with integrating architecture from newer technologies (such as Hadoop and various NoSQL offerings) and their forms of data with legacy systems in a way that can produce meaningful action for the business. Part of the appeal of an application building platform such as Cascading is that it is a Java-based API, which means that engineers and developers can use whatever Java-compatible language they’ve been using for years to develop applications that draw on all forms of data, big or otherwise.

Consequently, a whole set of skills for manipulating Big Data platforms such as Hadoop, and others related to Data Science, is no longer needed to create actionable applications from data. Organizations don’t necessarily need to wait for Data Scientists to graduate and can instead concentrate on solving their business problems in a much more expedient and economical fashion while utilizing current personnel. Wensel acknowledged the impact of Cascading’s single API approach:

“What does the enterprise have on the bench? Java developers. They have to learn new APIs [to build data applications] and understand the business problems. Where Cascading comes in is we normalize that API. We give them one API to learn so they can solve multiple business problems and focus on the quality of the business stack underneath and not focus on anything else, while continuing to leverage their existing talent.”

Operationalizing Data

In addition to considerably reducing the skills necessary to manipulate Big Data and transform it with applications that fulfill business objectives, application building platforms also produce a degree of reliability and automation that is necessary for applications to function dependably. Although languages such as R are useful for creating analytics and building applications, scripts written in them frequently lack the scalability to handle Big Data sets.

The true advantage to application building is that it effectively operationalizes data, and goes a step further than providing the conventional insight of Business Intelligence or analytics to actually provide action. There are a number of critical prerequisites that must be accounted for to consistently operationalize data in a manner that is reliable enough so that crucial business processes can hinge upon them. The action must be continuous, self-sustaining, and ideally function in a way so that even laymen can use it.

Without a comprehensive application building platform, Data Scientists and IT are simply linking together a variety of disparate technologies in a way that may cause latency or, even worse, malfunctions. With such a platform, IT and operations personnel can utilize a single framework in a language with which they are already familiar to provide the perpetual reliability needed to automate valuable business processes. And, with application monitoring tools such as Concurrent’s Driven, which provides detailed insight into the behavior of applications and shortens the time required to address latency issues and malfunctions, those same personnel can ensure that those processes are actually optimized. Concurrent CEO Gary Nakamura commented that:

“The skills gap is broadly systemic across most organizations. They all want to access the data, and they all want to build things on top of new data stores like Hadoop, but the challenge is that they’ve been using SQL for the last 30 years or they’ve been building Java applications for the last 20 years. There is a situation where only the elite can do low level things.”

That’s where Cascading provides the instruction and the interfaces so the rest of the organization with different levels of skills in front of the enterprise can leverage the data in Hadoop and operationalize it so that they can put applications into business processes.

Integration Logic

In addition to facilitating the business logic and operationalizing processes that are integral for professionals to maximize business value, application building also plays a crucial role for integrating a variety of data sources and management tools. As Big Data technologies mature, there is an emerging trend for the enterprise to utilize platforms such as Hadoop to store all of their data and to integrate Big Data with traditional proprietary data to conduct comprehensive analytics on aggregated data sets.

A similar capability exists in data application frameworks such as Cascading, whose integration API lets developers or Data Scientists use a single tool to access the various types of data, programming languages and platforms that are pivotal for a particular application. The result is that developers, operations personnel and engineers can focus on business logic regardless of the different technologies involved. Wensel discussed the integration capabilities of Cascading in the following case study:

“Why can’t you just read data from Oracle and write it in Cassandra with a single application, and not have five applications to do that and have that data be stale the moment you actually get your hands on it? Cascading is designed to solve that problem; it’s first-class. It’s focused on solving any kind of problem where integration with existing legacy systems is a priority.”

Better than BI?

The reality of the influence and influx of data on business and operational processes is that in more and more instances, organizations are actually becoming involved in the business of data. Data is no longer simply used to enhance their processes, but instead is actually sold as a product or service, as is the case with geo-spatial data or gene-sequencing information. In such situations, data applications are a requisite for the business and effectively represent the vital means by which it operates and delivers its product.

In this respect, data application building represents the evolution of BI. Organizations are transcending the ability to simply conduct analysis with data, and are instead using that data to produce specific action that creates business value:

“That’s truly what’s happening here,” Wensel reflected. “BI is being overrun by people who are actually delivering data as a product based on data. They’re taking raw materials, and delivering things. BI is a subset of that.”

In which case, data application building is the larger picture.

Hortonworks, Concurrent Team Up on Cascading SDK

Barry Levin, Newsfactor
April 22, 2014
http://www.newsfactor.com/story.xhtml?story_id=112009BFL3Z4


Concurrent and Hortonworks are teaming up to streamline enterprise application development of data-centric applications. The strategic partnership was announced Tuesday.

Concurrent, an enterprise data application platform company, will integrate its Cascading SDK, a widely used application development framework for Hadoop-based data applications, with Hortonworks Data Platform. Hortonworks, a provider of enterprise-targeted Apache Hadoop, will certify, support and deliver Cascading.

Hortonworks said it will also guarantee the ongoing compatibility of Cascading-based applications with future releases of the Hortonworks Data Platform, through continuous testing for compatibility and direct customer support.

Apache Tez

Cascading, in a future release, is expected to support Apache Tez, which will allow projects to accommodate faster response times and near real-time data processing. Companies that are using Cascading, Lingual, Scalding, or other dynamic programming language APIs and frameworks will now have the ability to migrate to new releases of the data platform that support Apache Tez.

On its Web site, Hortonworks said Apache Tez allowed “projects in the Apache Hadoop ecosystem…to meet demands for fast response times and extreme throughput at petabyte scale.”

John Kreisa, vice president of strategic marketing at Hortonworks, said in a statement that, “by expanding our alliance with Concurrent and integrating with the Cascading application platform, Hortonworks’ customers can now drive even more value from their enterprise data by enabling the rapid development of data-driven applications.”

Cascading is open-source software that is employed for the development and implementation of complex data processing workflows on Hadoop, thereby providing an abstracted and higher level control over MapReduce processing, which is the data processing backbone in Hadoop. It offers a computation engine, systems integration framework, data processing and scheduling. Concurrent’s services include commercial support for Cascading, and Cascading was initially created by Chris Wensel, the Concurrent founder.

‘A Must-Have’

Also open source, Apache Hadoop is a framework for processing large data sets. It is intended to provide insights into large stores of structured and unstructured data. Hortonworks was founded in 2011 by members of the original Hadoop development and operations team at Yahoo.

“The need for simple, powerful tools for big-data application development is a must-have to survive in today’s competitive climate,” Concurrent CEO Gary Nakamura told news media.

Hortonworks’ alliance with Concurrent is only the latest in a series. Last week, the Palo Alto, Calif.-based Hortonworks announced a new addition to its joint engineering alliance with open-source provider Red Hat. The two companies will integrate Red Hat’s platform-as-a-service OpenShift with Hortonworks Data Platform, so that Hadoop can be used in an open hybrid cloud. This follows an announcement in February that the companies will deliver an open-source initiative for delivering Hadoop to the hybrid cloud.

In early April, Hortonworks announced a strategic partnership with Lucidworks, allowing users of Hortonworks Data Platform to access and analyze their data through the open-source enterprise search platform, Solr.

At the beginning of April, Hortonworks released version 2.1 of its data platform. Highlights of the new release included interactive SQL query, improved capabilities for data governance and security, and two new processing engines for streaming and search.

Hortonworks Adds Cascading For Big Data App Development

Doug Henschen, Information Week
April 21, 2014
http://www.informationweek.com/big-data/software-platforms/hortonworks-adds-cascading-for-big-data-app-development/d/d-id/1204594

Hortonworks adds Concurrent’s Cascading SDK to its Hadoop distribution to help developers operationalize big data applications.

Hortonworks wants to make it easier to build big data applications, so on Monday it announced that it will add software and support for the popular Cascading app-development framework to its Hadoop distribution.

Developed and supported by Concurrent Inc., Cascading is a Java-based framework for app development popularized by Internet giants including eBay, LinkedIn, and Twitter, and increasingly used by more conventional enterprises to operationalize big data applications. Where data analysts tend to do interactive, ad-hoc analyses across Hadoop, the Cascading framework is geared to application developers who have to create repeatable big-data systems that run day after day. For example, one of Cascading’s cable company customers is using Cascading to develop applications based on set-top box data now analyzed on Hadoop.

“This company brings in 19 terabytes of set-top-box data per day, and they need to build applications that consume that data, process it, and deliver data products to different constituents including marketing and sales,” said Gary Nakamura, Concurrent’s CEO, in a phone interview with InformationWeek.

Cascading shields developers from the complexities of Hadoop programming, and with recent updates it has been certified by Hortonworks to work with Hadoop 2.0 and its YARN resource management framework. Cascading will also make use of Tez, a new feature of Hadoop 2.0 that eliminates the intermediate writes and delays associated with first-generation MapReduce programming.

“We’ve gone a lot deeper with Hortonworks with this announcement so that the 6,500-plus deployments that we have of Cascading can migrate from using MapReduce to Apache Tez without any code changes,” said Nakamura.

Concurrent’s partnership with Hortonworks is non-exclusive, according to Nakamura, but he described Hortonworks as having “open arms to other technologies that help with the broader ecosystem and enterprise adoption of Hadoop.” Nakamura didn’t elaborate, but one reason for the tighter partnership with Hortonworks might be Cloudera’s efforts to go beyond Hive with its Impala offering, which offers an interactive SQL interface for Hadoop. Concurrent offers Cascading Lingual as a SQL-on-Hadoop interface for developers building analytic applications.

With this week’s announcement, Hortonworks will ship the Concurrent Software Development Kit as part of its Hortonworks Data Platform distribution and it will also offer first- and second-tier support for the software.

Concurrent, Inc. and Hortonworks Join Forces to Advance Enterprise Data Application Development on Hadoop

Leading Big Data Companies Simplify Enterprise Development for Rich Data-Centric Applications

SAN FRANCISCO and PALO ALTO, Calif. – April 21, 2014 – Concurrent, Inc., the enterprise data application platform company, and Hortonworks, the leading contributor to and provider of enterprise Apache™ Hadoop®, today expanded their strategic partnership to simplify enterprise application development for data-centric applications. The Cascading SDK will now be integrated and delivered with the Hortonworks Data Platform (HDP). In addition, Hortonworks will certify, support and deliver Cascading™, the most widely used application development framework for data applications on Hadoop.

As enterprise adoption of Hadoop continues to rise, users are looking to reliably operationalize their data. This brings a new set of challenges for enterprises, ranging from leveraging developer best practices to data application scalability and data systems integration. The Concurrent and Hortonworks partnership underscores the timely importance of simplifying enterprise application development for these new data-centric applications.

Concurrent and Hortonworks Expand Strategic and Technical Partnership

Concurrent and Hortonworks share a deep commitment to accelerating Hadoop adoption across the enterprise, empowering developers to quickly build rich data-centric enterprise applications on Hadoop. The partnership benefits users by combining the robustness and simplicity of Cascading with the reliability and stability of Hortonworks Data Platform.

As part of today’s news, Hortonworks will certify, support and deliver the Cascading SDK with HDP. Hortonworks’ commitment to the Cascading community and users extends beyond redistribution of the Cascading SDK, also guaranteeing the ongoing compatibility of Cascading-based applications across future releases of HDP with continuous compatibility testing and direct HDP customer support for Cascading.

Upcoming releases of Cascading will also support Apache Tez. Tez is a significant development in the Hadoop ecosystem, enabling projects to meet demands for faster response times and delivering near real-time big data processing. Tez is a general data-processing fabric and MapReduce replacement that provides a powerful framework for executing a complex topology of tasks. In addition, Tez executes on top of Apache™ Hadoop® YARN, a sub-project of Hadoop which separates resource management and processing components. YARN fundamentally enables a broader array of interaction patterns for data stored in HDFS beyond MapReduce and makes Hadoop 2.0 a more general data processing platform.

In addition, thousands of companies that already use Cascading, Lingual, Scalding or Cascalog, or any other dynamic programming language APIs and frameworks built on top of Cascading, have the flexibility to seamlessly migrate to newer versions of HDP that support Apache Tez, with zero investment required to take advantage of this improved processing environment.

To learn more about how Concurrent and Hortonworks accelerate big data application development with Cascading and HDP, register for a joint webinar, taking place Tuesday, April 22 at 10 a.m. PDT, by visiting http://hortonworks.com/webinars.

Supporting Quotes

“Hadoop unleashes insight and value from enterprise data as a core component of the modern data architecture, integrating with and complementing existing systems. By expanding our alliance with Concurrent and integrating with the Cascading application platform, Hortonworks’ customers can now drive even more value from their enterprise data by enabling the rapid development of data-driven applications.”

-John Kreisa, vice president of strategic marketing, Hortonworks

“As more enterprises realize they are in the business of data, the need for simple, powerful tools for big data application development is a must-have to survive in today’s competitive climate. Our deepened relationship with Hortonworks furthers our commitment to Hadoop and drives new innovation around the development of enterprise data applications.”

-Gary Nakamura, CEO, Concurrent, Inc.

About Hortonworks

Hortonworks develops, distributes and supports the only 100% open source Apache Hadoop data platform. Our team comprises the largest contingent of builders and architects within the Hadoop ecosystem who represent and lead the broader enterprise requirements within these communities.

The Hortonworks Data Platform provides an open platform that deeply integrates with existing IT investments and upon which enterprises can build and deploy Hadoop-based applications.

Hortonworks has deep relationships with the key strategic data center partners that enable our customers to unlock the broadest opportunities from Hadoop.

For more information, visit www.hortonworks.com

About Concurrent, Inc.

Concurrent, Inc. is the leader in Big Data application infrastructure, delivering products that help enterprises create, deploy, run and manage data applications at scale. The company’s flagship enterprise solution, Driven, was designed to accelerate the development and management of enterprise data applications. Concurrent is the team behind Cascading™, the most widely deployed technology for data applications with more than 150,000 user downloads a month. Used by thousands of businesses including eBay, Etsy, The Climate Corp and Twitter, Cascading is the de facto standard in open source application infrastructure technology. Concurrent is headquartered in San Francisco and online at http://concurrentinc.com.

Media Contact
Danielle Salvato-Earl
Kulesa Faul for Concurrent, Inc.
(650) 922-7287
concurrent@kulesafaul.com

Keith Giannini or Anna Vaverka
MSLGROUP
(415) 817-2500
hortonworks@schwartzmsl.com

Webinar | Accelerate Big Data Application Development with Cascading and HDP – Apr 22, 2014

Join Hortonworks and Concurrent to learn how to accelerate your big data application development with the popular Cascading framework and Hortonworks Data Platform.

Date: Tuesday, April 22, 2014
Time: 10am Pacific / 1pm Eastern

In this webinar, we will describe how developers can create future-proof, data-driven applications built on Apache Hadoop. We will show how organizations can take advantage of the latest Hadoop processing frameworks like YARN and Tez. And we’ll finish it off with a demonstration of application development with Java-based Cascading middleware on the Hortonworks Sandbox.

Register at: http://bit.ly/1qYX9BW

Concurrent, Inc. Adds Vice President of Field Engineering to Support Accelerated Customer and Product Growth

Former American Express Big Data Evangelist Joins the Ranks of the Enterprise Software Company Behind Cascading™ and Driven

SAN FRANCISCO – April 2, 2014 – Concurrent, Inc., the enterprise Big Data application platform company, today announced the appointment of Supreet Oberoi as vice president of field engineering. In response to increasing customer and user demand, as well as massive growth in enterprise data applications, Oberoi will lead the company’s efforts to deploy and operationalize its commercial offerings, which include products, support and professional services.

Oberoi brings 20 years of enterprise software experience, joining Concurrent from American Express where he served as Big Data technical evangelist and director of Big Data technical delivery. Combining business acumen with technical insight, he developed reference architectures and new enterprise-level capabilities, such as operational visibility, governance, compliance, data management and quality with the Apache Hadoop™ stack using many Big Data technologies. In addition, he developed groundbreaking capabilities in fraud, risk and marketing on the Big Data platform. Prior to American Express, Oberoi held several vice president, director and senior-level positions at Agile Software, oneREV, Oracle and RTI.

Oberoi joins a seasoned leadership team that includes Concurrent CEO Gary Nakamura, and Chris Wensel, founder, CTO and API author of Cascading™, the most widely used and deployed technology for Big Data applications with more than 150,000 downloads a month. Oberoi’s appointment follows the company’s recent announcement of Driven, the industry’s first application performance management product for Big Data applications – purpose-built to address the pain points of enterprise application development and application performance management on Hadoop.

Supporting Quotes

“As we accelerate growth and operationalize our commercial product, Driven, it becomes critical to have the market expertise, wisdom and counsel of a veteran leadership team. Supreet’s extensive Big Data experience and first-hand knowledge from the trenches of several high-growth technology companies makes him a key addition to our leadership team at this point in the company’s evolution.”

-Gary Nakamura, CEO, Concurrent, Inc.

“Hadoop continues to be the fabric of choice for tackling Big Data, yet the need for it to easily integrate with existing systems creates a barrier to mass adoption. Further, Hadoop traditionally represents a black box, where it’s difficult to understand exactly what’s going on. I look forward to joining Concurrent at such a pivotal time for the company as we continue to develop products and solutions, such as Cascading and Driven, that provide visibility and manageability to the Hadoop black box.”

-Supreet Oberoi, Vice President of Field Engineering, Concurrent, Inc.

About Concurrent, Inc.

Concurrent, Inc. delivers the #1 application development platform for Big Data applications. Concurrent builds application infrastructure products that are designed to help enterprises create, deploy, run and manage data applications at scale on Apache Hadoop™.

Concurrent is the team behind Cascading™, the most widely used and deployed technology for Big Data applications with more than 150,000 user downloads a month. Used by thousands of businesses including Twitter, eBay, The Climate Corp and Etsy, Cascading is the de-facto standard in open source application infrastructure technology.

Concurrent is headquartered in San Francisco and online at http://concurrentinc.com.

Media Contact
Danielle Salvato-Earl
Kulesa Faul for Concurrent, Inc.
(650) 922-7287
concurrent@kulesafaul.com

MeetUp | How AdMobius Uses Cascading to Connect Mobile Data to its Users – Apr 7, 2014

Sign-up here: http://meetu.ps/2h3XFn

When:
Monday, Apr 7, 2014
6:00 PM (PT)

Where:
Etsy
20 California St, Floor 3
San Francisco, CA 94111

How AdMobius Uses Cascading to Connect Mobile Data to its Users

AdMobius is the industry’s first Mobile Audience Management Platform (MAMP) and enables publishers and advertisers to discover and target relevant audiences at scale. By organizing and interpreting unique demographic and interest-based information, AdMobius unlocks the rich value of mobile data.

Join Jyotirmoy Sundi, Data Engineer at AdMobius, to learn how AdMobius adopted Cascading to build its “Device Graph” data application, which aggregates data and pairs that information with mobile devices. Also learn how AdMobius created its own custom Tap for data interoperability between its application and its existing infrastructure.

This MeetUp will be hosted at Etsy in San Francisco, CA. Food and drinks will be provided.

Concurrent, Inc. Enhances Mass Enterprise Big Data Adoption with Intel® Data Platform

Cascading Certified Compatible with Intel Data Platform to Deliver Powerful, Secure Big Data Application Development on Hadoop

SAN FRANCISCO – March 26, 2014 – Concurrent, Inc., the enterprise Big Data application platform company, today announced its compatibility with the Intel® Data Platform. The move combines the power of Cascading, the most widely used and deployed application platform for building robust enterprise Big Data applications on Hadoop, with all the performance, manageability and security features of the Intel Data Platform to drive powerful, reliable enterprise Big Data application development on Apache™ Hadoop®.

As enterprises continue to rapidly adopt Hadoop to manage, drive and leverage insights into data sets, there is an ever-growing need for a cost-effective, scalable Big Data platform for easily building data-centric applications. With its intuitive API, Cascading fulfills this mission-critical need by allowing developers to leverage existing skillsets, investments and systems to build enterprise-class applications on Hadoop. With more than 150,000 downloads a month, Cascading is compatible with all popular Hadoop distributions, including Hadoop 2 and YARN.

Because Hadoop applications leverage an enterprise’s most valuable asset, its data, a secure infrastructure is crucial. Enter the Intel Data Platform, which incorporates and builds on an open source software platform to provide distributed processing and data management for enterprise applications analyzing massive amounts of diverse data. Built from ‘the silicon up,’ the Intel Data Platform features up to a 30x boost in Hadoop performance with hardware-enabled encryption, along with networking and IO optimizations that ensure maximized performance for Big Data deployments.

This certification offers the right fit for businesses seeking to make the most of their Big Data by combining the productivity benefits of Cascading with the security, performance and manageability advantages of the Intel Data Platform. As more and more businesses find themselves in the business of data, Cascading with the Intel Data Platform represents another leap forward for enterprise Big Data professionals to tap into the Big Data gold mine.

Supporting Quotes

“The Intel Data Platform provides a platform for Big Data professionals to conduct secure, simple and quick next-generation analytics on massive amounts of data. The combination of the Intel Data Platform with Cascading offers customers a foundation for businesses to realize the transformational opportunity of their Big Data.”

-Jobi George, Director of Big Data Software Product & Channels, Intel’s Datacenter Software Division

“Today’s announcement demonstrates Concurrent’s continued mission of making enterprise data application development easy for the masses and enabling businesses to operationalize their data. Now our users can rely on the Intel Data Platform’s next-generation analytics and secure infrastructure combined with all the power and benefits of Cascading to drive business differentiation.”

-Gary Nakamura, CEO, Concurrent, Inc.

Intel is a registered trademark of Intel Corporation in the United States and other countries.

Event | Concurrent, Inc. to Present at Big Data TechCon 2014

Alexis Roos to Deliver Two Sessions on How to Easily Develop Big Data Applications on Apache Hadoop™

SAN FRANCISCO – March 25, 2014 – Concurrent, Inc., the enterprise Big Data application platform company, today announced that Alexis Roos, senior solutions architect, will deliver two sessions at the third annual Big Data TechCon 2014, taking place March 31 – April 2 in Boston. This three-day event is the how-to Big Data training conference for professionals implementing and analyzing Big Data, and will feature practical tutorials for IT and Big Data professionals.

Concurrent Presentations At-A-Glance

What: “How to Build Enterprise Data Apps with Cascading”
Who: Alexis Roos, senior solutions architect, Concurrent, Inc.
When: Wednesday, April 2 at 2 p.m. EST
How: Register at http://bigdatatechcon.com/registrationdetails.html

Session Description

Cascading is the most popular application development framework for building enterprise-grade data applications on Apache Hadoop. This open source development framework allows developers to leverage their existing skillsets, such as Java and SQL, to create reliable applications without having to think in MapReduce. In this presentation, Alexis will introduce Cascading, explain how it works, and then dive into building applications with it. Attendees will learn what types of use cases exist for data-driven businesses, how to approach them with Cascading and its vast ecosystem, and the best practices for Cascading application development.

What: “Cascading Lingual Shows How SQL Can Save Your On-and-Off Relationship with Hadoop”
Who: Alexis Roos, senior solutions architect, Concurrent, Inc.
When: Wednesday, April 2 at 3:45 p.m. EST
How: Register at http://bigdatatechcon.com/registrationdetails.html

Session Description

With Hadoop, life is complicated. It’s a give-and-take relationship where you constantly want to migrate workloads onto Hadoop for processing but also want to get data off for reporting and analysis. These scenarios can be tricky and will often complicate your relationship with Hadoop. However, SQL is the therapy that will help. This class will introduce Cascading Lingual, an open-source project that provides ANSI-compatible SQL, enabling fast and simple Big Data application development on Hadoop. Alexis will demonstrate how Cascading Lingual can be used with various tools like R and desktop applications to drive improved execution on Big Data strategies by leveraging existing in-house resources, skillsets and product investments. Attendees will learn how data scientists and developers can now easily work with data stored on Hadoop using their favorite BI tool.

About the Speaker

Alexis Roos is a senior solutions architect focusing on Big Data solutions at Concurrent, Inc. He has more than 18 years of experience in software and sales engineering, helping both Fortune 500 firms and startups build new products that leverage Big Data, application infrastructure, security, databases and mobile technologies. Previously, Alexis worked for Sun Microsystems and Oracle for more than 13 years, as well as for Couchbase and several large systems integrators in Europe.
