

Tom Callway
on 19 February 2015

Ubuntu, Hortonworks and Microsoft = Big Data Hosted Solution


The first Microsoft Azure hosted service to run Linux (on Ubuntu) announced at Strata Conference

This week thousands of people are in California at Strata + Hadoop World to learn more about the technology and business of big data. At the conference, Microsoft yesterday announced a preview of Azure HDInsight on Ubuntu. This is recognition that Ubuntu, the leading scale-out and cloud Linux, is a great platform for running Big Data solutions.

Microsoft’s Ranga Rengarajan, corporate vice president of Data Platform, and Joseph Sirosh, corporate vice president of Machine Learning, noted that Azure HDInsight is Microsoft’s Apache Hadoop-based service in the Azure cloud. It is designed to make it easy for customers to analyze petabytes of data of all types with fast, cost-effective scale on demand, plus programming extensions so developers can use their favorite languages. Microsoft customers such as Virginia Tech, Chr. Hansen, Mediatonic and many others are using it to find important data insights. And yesterday Microsoft announced that customers can run HDInsight on Ubuntu clusters, in addition to Windows, with simple deployment, a managed service level agreement and full technical support. This is particularly compelling for people who already run Hadoop on Linux on-premises, for example on the Hortonworks Data Platform, because they can use common Linux tools, documentation and templates, and extend their deployment to Azure with hybrid cloud connections.

Combined with Juju, Canonical’s cloud orchestration tool, Ubuntu makes it a breeze to test, deploy, scale and manage Big Data architectures. This is the result of years of effort by our development teams to optimize Big Data workloads on Ubuntu.
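As a taste of how little ceremony this involves, here is a minimal, hedged sketch of standing up a Hadoop cluster with the Juju CLI. The charm and service names (`hadoop-master`, `hadoop-slave`) are illustrative; check the Charm Store for the charms that match your Juju release.

```shell
# Illustrative sketch: deploy a small Hadoop cluster with Juju.
# Charm names are assumptions; browse the Charm Store for real ones.
juju bootstrap                               # stand up the Juju state server on your cloud
juju deploy hadoop-master                    # deploy the master service
juju deploy -n 3 hadoop-slave                # deploy three worker units in one command
juju add-relation hadoop-master hadoop-slave # wire the services together
juju status                                  # watch the units come up
```

The point is that the same five commands work unchanged whether the substrate is Azure, another public cloud, a private cloud or bare metal.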

For over a decade, DevOps teams have been working with “classical” configuration management tools, and they have become very good at ensuring that each server under their watch runs in strict accordance with their policies.

However, Big Data raises new questions, whether the goal is to process vast data sets, run real-time analytics on unpredictable data streams, or offer Data-as-a-Service: how do you embrace the fast-paced scalability these architectures demand, scaling out when the data flow grows and back in when business slows? How do you stay ahead of the game in a world of ever-faster-changing technologies? Add multiple clouds to the equation, to avoid single points of failure, and you end up with a nightmare for every decision maker.

Containerization has received a lot of positive attention as an attempt to fix some of these issues by maintaining a single, lightweight “image” of an application that becomes cloud-agnostic. But it also brings a list of new, still-to-be-fixed concerns around security and, returning to the first point, orchestration.

So what is good cloud orchestration? To answer that question we have to get back to the requirements for such a tool:

  • Be portable: orchestration is valuable if and only if it is adaptable to each and every substrate: public cloud, private cloud, hybrid cloud, bare metal, containers…
  • Manage scalability: deploying an architecture without being able to scale it from the management tool makes no sense. To the orchestration tool, the pool of deployment targets should be effectively infinite; the tool must be able to claim any share of that capacity and change its mind at any point in time.
  • Manage services: to consumers of the architecture, knowledge of each individual machine involved in a scale-out service is pointless. What matters is knowing how to access the service the cluster provides.
  • Manage relations: at cloud scale, what matters is that the pieces of an architecture can communicate with one another.

What is our answer to those requirements? Juju.

  • Juju creates portable architectures: when deploying a service, Juju makes the minimum number of assumptions about the substrate. It always starts from a vanilla OS image and adds software or containers on top, and all configuration information is processed dynamically. The whole architecture can then be exported to a standard YAML file and reproduced regardless of the provider.
  • Juju can scale architectures in and out: Juju offers commands to add or remove service units, providing an efficient way to scale in both directions. Complemented by a system that collects performance metrics and drives Juju’s API, it becomes very easy to design autoscaling solutions that do not depend on any one cloud provider.
  • Juju manages services: the best illustration of Juju’s focus on services is its GUI: whether a cluster has 2 or 200 nodes, it still shows up as a single box.
  • Juju manages relations: Juju can create and manage relations between services by exposing parameters to other services and consuming exposed variables. Juju plugs services into each other, adds credentials, and offers the smoothest way to run complex architectures.
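The scaling and relation points above can be sketched with a few CLI commands. Again, the service names are illustrative assumptions, not the exact charms you would use:

```shell
# Illustrative sketch: scaling and wiring services after deployment.
juju add-unit -n 7 hadoop-slave    # scale out: seven more worker units
juju remove-unit hadoop-slave/9    # scale back in by removing one unit
juju expose hadoop-master          # publish the service endpoint, not individual machines
```

Note that `expose` operates on the service, not on machines: consumers see a single endpoint however many units sit behind it, which is exactly the service-centric view described above.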

On top of that, Juju comes with a centralized Charm Store, a unique marketplace where all charms are stored and exchanged. The main benefit of this approach is that you will always find the best charm currently available for a service. If it doesn’t match your own preferences, you can fork it and share your changes with others, helping to create an even better experience for future users. For enterprises, this is a guarantee that their DevOps teams are always up to date and as agile as they can be when building new services for the company.

So bring together Juju, the best-in-class cloud orchestration tool, Ubuntu, the best OS for Big Data deployment, and Azure, the most advanced enterprise cloud, and evaluating petabytes of data of all types becomes fast and easy.
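Putting the three pieces together can be as short as pointing Juju at Azure and deploying. A hedged sketch follows; the exact bootstrap syntax depends on your Juju release (older releases configure the Azure credentials in `environments.yaml` first), and the charm name is an assumption:

```shell
# Illustrative sketch: run the same architecture on Azure.
# Azure credentials must be configured for your Juju release beforehand.
juju bootstrap azure          # target the Azure provider
juju deploy hadoop-master     # the same charms, unchanged, on a different substrate
```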
