Charm School!

2011-11-22 (cloud, charmschool, juju)

Wanna learn more about juju?

Drop by Charm School:

Details from Jorge’s post:

We're holding a Charm School on IRC.

juju Charm School is a virtual event where a juju expert
is available to answer questions about writing your own
juju charms. The intended audience is people who deploy
software and want to contribute charms to the wider devops
community to make deploying in the public and private
cloud easy.

Attendees are more than welcome to:

Ask questions about juju and charms
Ask for help modifying existing scripts and making charms out of them
Ask for peer review on existing charms you might be working on

Though not required, we recommend that you have juju installed
and configured if you want to get deep into the event.

Monitoring Hadoop Benchmarks TeraGen/TeraSort with Ganglia

2011-11-08 (cloud, hadoop, juju)

#########################################################
NOTE: Repost

The Ubuntu project “ensemble” is now publicly known as “juju”. This is a repost of an older article, Monitoring Hadoop Benchmarks TeraGen/TeraSort with Ganglia, updated to reflect the new names and changes to the API.

#########################################################

Here I’m using new features of Ubuntu Server (namely juju) to easily deploy Ganglia alongside a small Hadoop cluster to play around with monitoring some benchmarks like Terasort.

Short Story

Deploy hadoop and ganglia using juju:

$ juju bootstrap
$ juju deploy --repository "~/charms"  local:hadoop-master namenode
$ juju deploy --repository "~/charms"  local:ganglia jobmonitor
$ juju deploy --repository "~/charms"  local:hadoop-slave datacluster
$ juju add-relation namenode datacluster
$ juju add-relation jobmonitor datacluster
$ for i in {1..6}; do
>   juju add-unit datacluster
> done
$ juju expose jobmonitor
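
If you end up scaling like this a lot, the loop is easy to wrap in a small helper. This is just a sketch: add_units and the JUJU variable are names I made up for illustration (JUJU is not a juju feature, it only lets you swap in echo for a dry run).

```shell
# Add COUNT units to SERVICE, one `juju add-unit` call per unit.
# JUJU is a hypothetical override: set JUJU=echo to print the
# commands instead of running them.
add_units() {
    au_service=$1
    au_count=$2
    au_i=0
    while [ "$au_i" -lt "$au_count" ]; do
        # Stop at the first failure so a broken environment is noticed early.
        ${JUJU:-juju} add-unit "$au_service" || return 1
        au_i=$((au_i + 1))
    done
}
```

With that defined, add_units datacluster 6 does the same thing as the loop above.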

When all is said and done (and EC2 has caught up), run the jobs

$ juju ssh namenode/0
ubuntu$ sudo -su hdfs
hdfs$ hadoop jar hadoop-*-examples.jar teragen -Dmapred.map.tasks=100 -Dmapred.reduce.tasks=100 100000000 in_dir
hdfs$ hadoop jar hadoop-*-examples.jar terasort -Dmapred.map.tasks=100 -Dmapred.reduce.tasks=100 in_dir out_dir
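
A note on that 100000000: TeraGen’s argument is a row count, and each row is 100 bytes, so this run generates roughly 10 GB. If you’d rather think in gigabytes, here’s a tiny helper (teragen_rows is my own name for it, and it assumes decimal gigabytes).

```shell
# TeraGen writes fixed 100-byte rows, so one decimal gigabyte
# (10^9 bytes) is 10^7 rows.
teragen_rows() {
    echo $(( $1 * 10000000 ))
}
```

You could then pass "$(teragen_rows 10)" in place of the literal row count in the teragen command.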

While these are running, we can run

$ juju status

to get the URL for the jobmonitor ganglia web frontend

http://<jobmonitor-instance-ec2-url>/ganglia/

and see…

and a little later as the jobs run…
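
If you don’t feel like reading the hostname out of juju status by eye, something like this can dig it out. It’s a sketch that assumes the status layout from my older posts (machines listed as "N: {dns-name: ..., instance-id: ...}" and each unit carrying a "machine: N" line); newer juju output may look different, so treat unit_address as a hypothetical helper and adjust as needed.

```shell
# Read `juju status` output on stdin and print the dns-name of the
# machine that runs the given unit (e.g. jobmonitor/0).
# Assumes the machines section comes before the services section.
unit_address() {
    awk -v unit="$1" '
        /^machines:/ { in_machines = 1; next }
        /^services:/ { in_machines = 0; next }
        in_machines && /dns-name:/ {
            # Machine id is the first field without its trailing colon.
            id = $1
            sub(/:$/, "", id)
            # Keep only the value that follows "dns-name:".
            host = $0
            sub(/.*dns-name: */, "", host)
            sub(/[,}].*/, "", host)
            dns[id] = host
            next
        }
        $1 == (unit ":") { found = 1; next }
        found && $1 == "machine:" { print dns[$2]; exit }
    '
}
```

Then the ganglia URL is just: echo "http://$(juju status | unit_address jobmonitor/0)/ganglia/"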

Of course, I’m just playing around with ganglia at the moment… For real performance, I’d change my juju config file to choose larger (and ephemeral) EC2 instances instead of the defaults.

A Few Details…

Let’s grab the charms necessary to reproduce this.

First, let’s install juju and set up our charms.

$ sudo apt-get install juju charm-tools

Note that I’m describing all this using an Ubuntu laptop to run the juju CLI because that’s how I roll, but you can certainly use a Mac to drive your Ubuntu services in the cloud. The juju CLI is already available in ports, but I’m not sure which version. Homebrew packages are in the works. Windows should work too, but I don’t have a clue.

$ mkdir -p ~/charms/oneiric
$ cd ~/charms/oneiric
$ charm get hadoop-master
$ charm get hadoop-slave
$ charm get ganglia

That’s about all that’s really necessary to get you up and benchmarking/monitoring.

I’ll do another post on how to adapt your own charms to use monitoring and the monitor juju interface as part of the “Core Infrastructure” series I’m writing for charm developers. I’ll go over the process of what I had to do to get the hadoop-slave service talking to monitoring services like ganglia.

Until then, clone/test/enjoy… or better yet, fork/adapt/use!

Painless Hadoop / Ubuntu / EC2

2011-11-08 (cloud, juju, hadoop)

#########################################################
NOTE: Repost

The Ubuntu project “ensemble” is now publicly known as “juju”. This is a repost of an older article, Painless Hadoop / Ubuntu / EC2, updated to reflect the new names and changes to the API.

#########################################################

Thanks to Michael Noll for the posts where I first learned how to do this stuff:

I’d like to run his exact examples, but this time around I’ll use juju for hadoop deployment/management.


The Short Story

Setup

install/configure juju client tools

$ sudo apt-get install juju charm-tools
$ mkdir ~/charms && charm getall ~/charms

run hadoop services with juju

$ juju bootstrap
$ juju deploy --repository ~/charms local:hadoop-master namenode
$ juju deploy --repository ~/charms local:hadoop-slave datanodes
$ juju add-relation namenode datanodes

optionally add datanodes to scale horizontally

$ juju add-unit datanodes
$ juju add-unit datanodes
$ juju add-unit datanodes

(you can add/remove these later too)

Scaling is so easy there’s no point in separate standalone -vs- multinode versions of the setup.

Data and Jobs

Load your data and jars

$ juju ssh namenode/0

ubuntu$ sudo -su hdfs

hdfs$ cd /tmp
hdfs$ wget http://files.markmims.com/gutenberg.tar.bz2
hdfs$ tar xjvf gutenberg.tar.bz2

copy the data into hdfs

hdfs$ hadoop dfs -copyFromLocal /tmp/gutenberg gutenberg

run mapreduce jobs against the dataset

hdfs$ hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar wordcount -Dmapred.map.tasks=20 -Dmapred.reduce.tasks=20 gutenberg gutenberg-output

That’s it!


Now, again with some more details…

Installing juju

Install juju client tools onto your local machine…

$ sudo apt-get install juju charm-tools

We’ve got the juju CLI in ports now too for Mac clients (Homebrew is in progress).

Now generate your environment settings with

$ juju

and then edit ~/.juju/environments.yaml to use your EC2 keys. It’ll look something like:

environments:
  sample:
    type: ec2
    control-bucket: juju-<hash>
    admin-secret: <hash>
    access-key: <your ec2 access key>
    secret-key: <your ec2 secret key>
    default-series: oneiric

In real life you’d probably want to set default-instance-type to at least m1.large too, but I’ll give some examples of that in later posts.

Hadoop

Grab the juju charms

Make a place for charms to live

$ mkdir -p ~/charms/oneiric
$ cd ~/charms/oneiric
$ charm get hadoop-master
$ charm get hadoop-slave

(optionally, you can charm getall but it’ll take a bit to pull all charms).

Start the Hadoop Services

Spin up a juju environment

$ juju bootstrap

wait a minute or two for EC2 to comply. You’re welcome to watch the water boil with

$ juju status

or even

$ watch -n30 juju status

which’ll give you output like

$ juju status
2011-07-12 15:20:54,978 INFO Connecting to environment.
The authenticity of host 'ec2-50-17-28-19.compute-1.amazonaws.com (50.17.28.19)' can't be established.
RSA key fingerprint is c5:21:62:f0:ac:bd:9c:0f:99:59:12:ec:4d:41:48:c8.
Are you sure you want to continue connecting (yes/no)? yes
machines:
  0: {dns-name: ec2-50-17-28-19.compute-1.amazonaws.com, instance-id: i-8bc034ea}
services: {}
2011-07-12 15:21:01,205 INFO 'status' command finished successfully
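
For scripts, where nobody is around to watch the water boil, a little polling helper does the same job as watch. This is a generic sketch and not part of juju: poll_until is a made-up name, and what counts as "ready" is whatever command you hand it.

```shell
# Run a command every DELAY seconds until it succeeds or TRIES
# attempts have been made. Returns 0 on success, 1 otherwise.
# Usage: poll_until TRIES DELAY command [args...]
poll_until() {
    pu_tries=$1
    pu_delay=$2
    shift 2
    pu_n=1
    while [ "$pu_n" -le "$pu_tries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Only sleep when another attempt is coming.
        if [ "$pu_n" -lt "$pu_tries" ]; then
            sleep "$pu_delay"
        fi
        pu_n=$((pu_n + 1))
    done
    return 1
}
```

For example, poll_until 20 30 sh -c 'juju status | grep -q "instance-id:"' would check every 30 seconds, up to 20 times, for the instance-id field seen in the output above.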

Next, you need to deploy the hadoop services:

$ juju deploy --repository ~/charms local:hadoop-master namenode
$ juju deploy --repository ~/charms local:hadoop-slave datanodes

now you simply relate the two services:

$ juju add-relation namenode datanodes

Relations are where the juju special sauce is, but more about that in another post.

You can tell everything’s happy when juju status gives you something like (looks a bit different, but basics are the same):

$ juju status
2011-07-12 15:29:20,331 INFO Connecting to environment.
machines:
  0: {dns-name: ec2-50-17-28-19.compute-1.amazonaws.com, instance-id: i-8bc034ea}
  1: {dns-name: ec2-50-17-0-68.compute-1.amazonaws.com, instance-id: i-4fcf3b2e}
  2: {dns-name: ec2-75-101-249-123.compute-1.amazonaws.com, instance-id: i-35cf3b54}
services:
  namenode:
    formula: local:hadoop-master-1
    relations: {hadoop-master: datanodes}
    units:
      namenode/0:
        machine: 1
        relations:
          hadoop-master: {state: up}
        state: started
  datanodes:
    formula: local:hadoop-slave-1
    relations: {hadoop-master: namenode}
    units:
      datanodes/0:
        machine: 2
        relations:
          hadoop-master: {state: up}
        state: started
2011-07-12 15:29:23,685 INFO 'status' command finished successfully
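
If you want a script to decide whether everything’s happy, here’s a rough check. It assumes the layout shown above, where each unit has its own "state:" line and relation states sit inside braces on one line; all_units_started is my own name, and the pattern may need tweaking for newer status output.

```shell
# Read `juju status` output on stdin. Succeeds only if there is at
# least one unit-level "state:" line and every one says "started".
all_units_started() {
    # Unit states start a line (after indentation); relation states like
    # "hadoop-master: {state: up}" do not, so they are skipped.
    aus_states=$(grep -E '^[[:space:]]+state:' | sed 's/^[[:space:]]*state:[[:space:]]*//')
    [ -n "$aus_states" ] || return 1
    # Fail if any state line is something other than "started".
    ! printf '%s\n' "$aus_states" | grep -qv '^started$'
}
```

Use it as: juju status | all_units_started && echo "ready to run jobs"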

Loading Data

Log into the master node

$ juju ssh namenode/0

and become the hdfs user

ubuntu$ sudo -su hdfs

pull the example data

hdfs$ cd /tmp
hdfs$ wget http://files.markmims.com/gutenberg.tar.bz2
hdfs$ tar xjvf gutenberg.tar.bz2

and copy it into hdfs

hdfs$ hadoop dfs -copyFromLocal /tmp/gutenberg gutenberg

Running Jobs

Similar to above, but now do

hdfs$ hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar wordcount gutenberg gutenberg-output

you might want to explicitly set the number of map and reduce tasks to use…

hdfs$ hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar wordcount -Dmapred.map.tasks=20 -Dmapred.reduce.tasks=20 gutenberg gutenberg-output

depending on the size of the cluster you decide to spin up.
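
Once the job finishes, the results land as part files under gutenberg-output in hdfs. Here’s a small sketch for peeking at them; show_job_output and the HADOOP variable are names I invented (HADOOP only exists so you can dry-run with echo), and the glob is there because the exact part file names depend on the Hadoop version.

```shell
# Print the first LINES lines (default 20) of a job's output directory.
# The glob is quoted so that hadoop expands it against hdfs rather than
# the local shell expanding it. HADOOP is a hypothetical override for
# dry runs.
show_job_output() {
    ${HADOOP:-hadoop} dfs -cat "$1/part-*" | head -n "${2:-20}"
}
```

Run it as the hdfs user on the namenode, e.g. show_job_output gutenberg-output 40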

You can look at logs on the slaves by

$ juju ssh datanodes/0
ubuntu$ tail /var/log/hadoop/hadoop-hadoop-datanode*.log
ubuntu$ tail /var/log/hadoop/hadoop-hadoop-tasktracker*.log

similarly for subsequent slave nodes if you’ve spun them up

$ juju ssh datanodes/1

or

$ juju ssh datanodes/2

Horizontal Scaling

To resize your cluster,

$ juju add-unit datanodes

or even

$ for i in {1..10}; do
>   juju add-unit datanodes
> done

Wait for juju status to show everything in a happy state and then run your jobs.

I was able to add slave nodes in the middle of a run… they pick up load and crank.

Check out the juju status output for a simple 10-slave cluster here

Ensemble renamed to juju

2011-10-12 (cloud, juju, ensemble, node, mongo)

Just a note that the Ubuntu Ensemble suite of DevOps tools for Ubuntu Server has been renamed to juju.

I’ll be updating previous posts to reflect the name changes so they’ll be up to date.

Where did the 'close-lid' action go in gnome3?

2011-09-13 (howto, gnome, ubuntu, power management)

Here’s a workaround… (thanks slangasek!)

gsettings set org.gnome.settings-daemon.plugins.power lid-close-ac-action 'nothing'