Running the LinuxPlumbers Conference Schedule with Juju

2012-09-25 (cloud, production, juju)

Written by Mark Mims and Chris Johnston

Hey, so last month we ran scheduling for the Linux Plumbers Conference entirely on juju!

Here’s a little background on the experience.

Along the way, we’ll go into a little more detail about running juju in production than the particular problem at hand might warrant. It’s a basic stack of services that’s only alive for 6-months or so… but this discussion applies to bigger longer-running production infrastructures too, so it’s worth going over here.

The App

So summit is this great django app built for scheduling conferences. It’s evolved over time to handle UDS-level traffic and is currently maintained by a Summit Hackers team that includes Chris Johnston and Michael Hall.

Chris contacted me to help him use juju to manage summit for this year’s Plumbers conference. At the time we started this, the 11.10 version of juju wasn’t exactly blessed for production environments, but we decided it’d be a great opportunity to work things out.

The Stack

A typical summit stack’s got postgresql, the django app itself, and a memcached server.

We additionally talked about putting this all behind some sort of a head like haproxy.

This’d let the app scale horizontally as well as give us a stable point to attach an elastic-ip. We decided to not do this at the time b/c we could most likely handle the peak conference load with a single django service-unit provided we slam select snippets of the site into memcached.

This turned out to be true load-wise, but it really would’ve been a whole lot easier to have a nice constant haproxy node out there to tack the elastic-ip to. During development (charm, app, and theme) you want the freedom to destroy a service and respawn it without having to use external tools to go around and attach public IP addresses to the right places. That’s a pain. Also, if there’s a sensitive part of this infrastructure in production, it wouldn’t be postgresql, memcached, or haproxy… the app itself would be the most likely point of instability, so it was a mistake to attach the elastic-ip there.

The Environment

choice of cloud

We chose to use ec2 to host the summit stack… mostly a matter of convenience. The juju openstack-native provider wasn’t completed when we spun up the production environment for linuxplumbers and we didn’t have access to a stable private ubuntu cloud running the openstack-ec2-api at the time. All of this has subsequently landed, so we’d have more options today.

the charms

We forked Michael Nelson’s excellent django charm to create a summit-charm and freely specialized it for summit.

Note that we’re updating this charm for 12.04 here, but this will probably go away in the near future and we’ll just use a generic django charm. It turns out we didn’t do too much here that won’t apply to django apps in general, but more on that another time.

There was nothing special about our tuning of postgresql or memcached. We just used the services provided by the canned charms. These sort of peripheral services aren’t the kind of charms you’re likely to be making changes to or tweaking outside of their exposed config parameters. I know jack about memcached, so I’ll defer to the experts in this regard. Similarly for postgresql… and haproxy if we used it in this stack.

The summit charm is a little different. It’s something we were continuing to tweak during development. Perhaps with future more generic django charm versions, we won’t need to tweak the charm itself… just configure it.

We used a “local” repository for all charms because the charm store hadn’t landed when we were setting this up. Well, now that the charm store is live, you can just deploy the canned charms straight from the store

`juju deploy -e summit memcached`

and keep the ones you want to tweak in a local repository…

`juju deploy -e summit --repository ~/charms local:summit`

all within the same environment. It works out nicely.

control environment

We had multiple people to manage the production summit environment. What’s the best way to do that? It turns out juju supports this pretty well right out of the box. There’s an environment config for the set of ssh public keys to inject into everything in the environment as it starts up… you can read more about that on askubuntu.

Note that this is only useful to configure at the beginning of the stack. Once you’re up, adding keys is problematic. I don’t even recommend trying b/c of the risk of getting undetermined state for the environment. i.e., different nodes with different sets of keys depending on when you changed the keys relative to what actions you’ve performed on the environment. It’s a problem.

What I recommend now is actually to use another juju environment… (and no, we’re not paid to promote cloud providers by the instance :) I wish! ) a dedicated “control” environment. You bootstrap it, then set up a juju client that controls the main production environment. Then set up a shared tmux session that any of the admins for the production environment can use:

Adding/changing the set of admin keys is then done in a single place. This technique isn’t strictly necessary, but it was certainly worth it here with different admins having various different levels of familiarity with the tools. I started it as a teaching tool, left it up because it was an easy control dashboard, and now recommend it because it works so well.

it’s chilly in here

Yeah, so during development you break things. There were a couple of times using 11.10 juju that changes to juju core prevented a client from talking to an existing stack. Aargh! This wasn’t going to fly for production use.

The juju team has subsequently done a bunch to prevent this from happening, but hey we needed production summit working and stable at the time. The answer… freeze the code.

Juju has an environment config option juju-origin to specify where to get the juju installed on all instances in the environment. I branched juju core to lp:~mark-mims/juju/running-summit and just worked straight from there for the lifetime of the environment (still up atm). Easy enough.

Now the tricky part is to make sure that you’re always using the lp:~mark-mims/juju/running-summit version of the juju cli when talking to the production summit environment.

I set up

#!/bin/bash
export JUJU_BRANCH=$HOME/src/juju/running-summit
export PATH=$JUJU_BRANCH/bin:$PATH
export PYTHONPATH=$JUJU_BRANCH

which my tmuxinator config sources into every pane in my summit tmux session.

This was also done on the summit-control instance so it’s easy to make sure we’re all using the right version of the juju cli to talk to the production environment.

backups

The juju ssh subcommand to the rescue. You can do all your standard ssh tricks…

juju ssh postgresql/0 'su postgres pg_dump summit' > summit.dump

… on a cronjob. Juju just stays out of the way and just helps out a bit with the addressing. Real version pipes through bzip2 and adds timestamps of course.

Of course snapshots are easy enough too via euca2ools, but the pgsql dumps themselves turned out to be more useful and easy to get to in case of a failover.

debugging

The biggest debugging activity during development was cleaning up the app’s theming. The summit charm is configured to get the django app itself from one application branch and the theme from a separate theme branch.

So… ahem… “best practice” for theme development would’ve been to develop/tweak the theme locally, then push to the branch. A simple

juju set --config=summit.yaml summit/0

would update config for the live instances.

Well… some of the menus from the base template used absolute paths so it was simpler to cheat a bit early in the process to test it all in-place with actual dns names. Had we been doing this the “right” way from the beginning we would’ve had much more confidence in the stack when practicing recovery and failover later in the cycle… we would’ve been doing it all since day one.

Another thing we had to do was manually test memcached. To test out caching we’d ssh to the memcached instance, stop the service, run memcached verbosely in the foreground. Once we determined everything was working the way we expected, we’d kill it and restart the upstart job.

This is a bug in the memcached charm imo… the option to temporarily run verbosely for debugging should totally be a config option for that service. It’d then be a simple matter of

juju set memcached/0 debug=true

and then

juju ssh memcached/0

to watch some logs. Once we’re convinced it’s working the way it should

juju set memcached/0 debug=false

should make it performant again.

Next time around, we should take more advantage of juju set config to update/reconfigure the app as we made changes… and generally implement a better set of development practices.

monitoring

Sorely lacking. “What? curl doesn’t cut it?”… um… no.

planning for failures

Our notion of failover for this app was just a spare set of cloud credentials and a tested recovery plan.

The plan we practiced was…

  • bootstrap a new environment (using spare credentials if necessary)

  • spin up the summit stack

  • ssh to the new postgresql/0 and drop the db (Note: the postgresql charm should be extended to accept a config parameter of a storage url, S3 in this case, to slurp the db backups from)

  • restore from offsite backups… something along the lines of

    cat summit-$timestamp.dump.bz2 | juju ssh -e failover postgresql/0 ‘bunzip2 -c | su - postgres pgsql summit’

In practice, that took about 10-15minutes to recover once we started acting. Given the additional delay between notification and action, that could spell an hour or two of outtage. That’s not so great.

Juju makes other failover scenarios cheaper and easier to implement than they used to be, so why not put those into place just to be safe? Perhaps the additional instance costs for hot-spares wouldn’t’ve been necessary for the entire 6-months of lead-time for scheduling and planning this conference, but they’d certainly be worth the spend during the few days of the event itself. Juju sort of makes it a no-brainer. We should do more posts on this one issue… the game has changed here.

Lessons Learned

What would we do differently next time? Well, there’s a list :).

  • use the stable ppa… instead of freezing the code
  • sit the app behind haproxy
  • use s3fs or equivalent subordinate charm to manage backups instead of just sshing them off the box
  • better monitoring… we’ve gotten a great set of monitoring charms recently… thanks Clint!
  • log aggregation would’ve been a little bit of overkill for this app, but next time might warrant it
  • it’s cheap to add failover with juju… just do it
  • maybe follow a development process a little more carefully next time around :)
  • we’ll soon have access to a production-stable private ubuntu cloud for these sorts of apps/projects

Scaling a 2000-node Hadoop cluster on EC2/Ubuntu with Juju

2012-06-04 (cloud, hadoop, juju)

Written by Mark Mims and James Page

Lately we’ve been fleshing out our testing frameworks for Juju and Juju Charms. There’s lots of great stuff going on here, so we figured it’s time to start posting about it.

First off, the coolest thing we did during last month’s Ubuntu Developer Summit (UDS) was get the go-ahead to spend more time/effort/money scale-testing Juju.

The Plan

  • pick a service that scales

  • spin up a cluster of units for this service

  • try to run it in a way that actively engages all units of the cluster

  • repeat:

    • instrument
    • profile
    • optimize
    • grow

James, Kapil, Juan, Ben, and Mark sat down over the course of a couple of nights at UDS to take a crack at it. We chose Hadoop. We started with 40 nodes and iterated up 100, 500, 1000 and 2000. Here’re some notes on the process.

Hadoop

Hadoop was a pretty obvious choice here. It’s a great actively-maintained project with a large community of users. It scales in a somewhat known manner, and the hadoop charm makes it super-simple to manage. There are also several known benchmarks that are pretty straightforward to get going, and distribute load throughout the cluster.

There’s an entire science/art to tuning hadoop jobs to run optimally given the characteristics of a particular cluster. Our sole goal in tuning hadoop benchmarks was to engage the entire cluster and profile juju during various activities throughout an actual run. For our purposes, we’re in no hurry… a slower/longer run gives us a good profiling picture for managing the nodes themselves under load (with a sufficient mix of i/o -vs- cpu load).

EC2

Surprisingly enough, we don’t really have that many servers just lying around… so EC2 to the rescue.

Disclaimer… we’re testing our infrastructure tools here, not benchmarking hadoop in EC2. Some folks advocate running hadoop in a cloudy virtualized environment… while some folks are die-hard server huggers. That’s actually a really interesting discussion. It comes down to the actual jobs/problems you’re trying to solve and how those jobs fit in your data pipeline. Please note that we’re not trying to solve that problem here or even provide realistic benchmarking data to contribute to the discussion… we’re simply testing how our infrastructure tools perform at scale.

If you do run hadoop in EC2, Amazon’s Elastic Map Reduce service is likely to perform better at scale in EC2 than just running hadoop itself on general purpose instances. Amazon can do all sorts of stuff internally to show hadoop lots of love. We chose not to use EMR because we’re interested in testing how juju performs with generic Ubuntu Server images, not EMR… at least for now.

Note that stock EC2 accounts limit you to something like 20 instances. To grow beyond that, you have to ask AWS to bump up your limits.

Juju

We started scale testing from a fresh branch of juju trunk… what gets deployed to the PPA nightly… this freed us up to experiment with live changes to add instrumentation, profiling information, and randomly mess with code as necessary. This also locks in the branch of juju that the scale testing environment uses.

As usual, juju will keep track of the state of our infrastructure going forward and we can make changes as necessary via juju commands. To bootstrap and spin up the initial environment we’ll just use shell scripts wrapping juju commands.

Spinning up a cluster

These scripts are really just hadoop versions of some standard juju demo scripts such as those used for a simple rails stack or a more realistic HA wiki stack.

The hadoop scripts for EC2 will get a little more complex as we grow simply because we don’t want AWS to think we’re a DoS attack… we’ll pace ourselves during spinup.

From the hadoop charm’s readme, the basic steps to spinning up a simple combined hdfs and mapreduce cluster are:

juju bootstrap

juju deploy hadoop hadoop-master
juju deploy -n3 hadoop hadoop-slavecluster

juju add-relation hadoop-master:namenode hadoop-slavecluster:datanode
juju add-relation hadoop-master:jobtracker hadoop-slavecluster:tasktracker

which we expand on a bit to start with a base startup script that looks like:

#!/bin/bash

juju_root="/home/ubuntu/scale"
juju_env=${1:-"-escale"}

###

echo "deploying stack"

juju bootstrap $juju_env

deploy_cluster() {
  local cluster_name=$1

  juju deploy $juju_env --repository "$juju_root/charms" --constraints="instance-type=m1.large" --config "$juju_root/etc/hadoop-master.yaml" local:hadoop ${cluster_name}-master

  juju deploy $juju_env --repository "$juju_root/charms" --constraints="instance-type=m1.medium" --config "$juju_root/etc/hadoop-slave.yaml" -n 37 local:hadoop ${cluster_name}-slave

  juju add-relation $juju_env ${cluster_name}-master:namenode ${cluster_name}-slave:datanode
  juju add-relation $juju_env ${cluster_name}-master:jobtracker ${cluster_name}-slave:tasktracker

  juju expose $juju_env ${cluster_name}-master

}

deploy_cluster hadoop

echo "done"

and then manually adjust this for cluster size.

Configuring Hadoop

Note that we’re specifying constraints to tell juju to use different sized ec2 instances for different juju services. We’d like an m1.large for the hadoop master

juju deploy ... --constraints "instance-type=m1.large" ... hadoop-master

and m1.mediums for the slaves

juju deploy ... --constraints "instance-type=m1.medium" ... hadoop-slave

Note that we’ll also pass config files to specify different heap sizes for the different memory footprints

juju deploy ... --config "hadoop-master.yaml" ... hadoop-master

where hadoop-master.yaml looks like

# m1.large
hadoop-master:
  heap: 2048
  dfs.block.size: 134217728
  dfs.namenode.handler.count: 20
  mapred.reduce.parallel.copies: 50
  mapred.child.java.opts: -Xmx512m
  mapred.job.tracker.handler.count: 60
#  fs.inmemory.size.mb: 200
  io.sort.factor: 100
  io.sort.mb: 200
  io.file.buffer.size: 131072
  tasktracker.http.threads: 50
  hadoop.dir.base: /mnt/hadoop

and

juju deploy ... --config "hadoop-slave.yaml" ... hadoop-slave

where hadoop-slave.yaml looks like

# m1.medium
hadoop-slave:
  heap: 1024
  dfs.block.size: 134217728
  dfs.namenode.handler.count: 20
  mapred.reduce.parallel.copies: 50
  mapred.child.java.opts: -Xmx512m
  mapred.job.tracker.handler.count: 60
#  fs.inmemory.size.mb: 200
  io.sort.factor: 100
  io.sort.mb: 200
  io.file.buffer.size: 131072
  tasktracker.http.threads: 50
  hadoop.dir.base: /mnt/hadoop

Note also that we also have our juju environment configured to use instance-store images… juju defaults to ebs-rooted images, but that’s not a great idea with hdfs. You specify this by adding a default-image-id into your ~/.juju/environments.yaml file. This gave each of our instances an extra ~400G local drive on /mnt… hence the hadoop.dir.base of /mnt/hadoop in the config above.

40 nodes and 100 nodes

Both the 40-node and 100-node runs went as smooth as silk. The only thing to note was that it took a while to get AWS to increase our account limits to allow for 100+ nodes.

500 nodes

Once we had permission from Amazon to spin up 500 nodes on our account, we initially just naively spun up 500 instances… and quickly got throttled.

No particular surprise, we’re not specifying multiplicity in the ec2 api, nor are we using an auto scaling group… we must look like a DoS attack.

The order was eventually fulfilled, and juju waited around for it. Everything ran as expected, it just took about an hour and 15 minutes to spin up the stack. This gave us a nice little cluster with HDFS storage of almost 200TB

The hadoop terasort job was run from the following script

#!/bin/bash

SIZE=10000000000
NUM_MAPS=1500
NUM_REDUCES=1500
IN_DIR=in_dir
OUT_DIR=out_dir

hadoop jar /usr/lib/hadoop/hadoop-examples*.jar teragen -Dmapred.map.tasks=${NUM_MAPS} ${SIZE} ${IN_DIR}

sleep 10

hadoop jar /usr/lib/hadoop/hadoop-examples*.jar terasort -Dmapred.reduce.tasks=${NUM_REDUCES} ${IN_DIR} ${OUT_DIR}

which, with a replfactor of 3, engaged the entire cluster just fine, and ran terasort with no problems

Juju itself seemed to work great in this run, but this brought up a couple of basic optimizations against the EC2 api:

- pass the '-n' options directly to the provisioning agent... don't expand `juju deploy -n <num_units>` and `juju add-unit -n <num_units>` in the client
- pass these along all the way to the ec2 api... don't expand these into multiple api calls

We’ll add those to the list of things to do.

1000 nodes

Onward, upward!

To get around the api throttling, we start up batches of 99 slaves at a time with a 2-minute wait between each batch

#!/bin/bash

juju_env=${1:-"-escale"}
juju_root="/home/ubuntu/scale"
juju_repo="$juju_root/charms"

############################################

timestamp() {
  date +"%G-%m-%d-%H%M%S"
}

add_more_units() {
  local num_units=$1
  local service_name=$2

  echo "sleeping"
  sleep 120

  echo "adding another $num_units units at $(timestamp)"
  juju add-unit $juju_env -n $num_units $service_name
}

deploy_slaves() {
  local cluster_name=$1
  local slave_config="$juju_root/etc/hadoop-slave.yaml"
  local slave_size="instance-type=m1.medium"
  local slaves_at_a_time=99
  #local num_slave_batches=10

  juju deploy $juju_env --repository $juju_repo --constraints $slave_size --config $slave_config -n $slaves_at_a_time local:hadoop ${cluster_name}-slave
  echo "deployed $slaves_at_a_time slaves"

  juju add-relation $juju_env ${cluster_name}-master:namenode ${cluster_name}-slave:datanode
  juju add-relation $juju_env ${cluster_name}-master:jobtracker ${cluster_name}-slave:tasktracker

  for i in {1..9}; do
    add_more_units $slaves_at_a_time ${cluster_name}-slave
    echo "deployed $slaves_at_a_time slaves at $(timestamp)"
  done
}

deploy_cluster() {
  local cluster_name=$1
  local master_config="$juju_root/etc/hadoop-master.yaml"
  local master_size="instance-type=m1.large"

  juju deploy $juju_env --repository $juju_repo --constraints $master_size --config $master_config local:hadoop ${cluster_name}-master

  deploy_slaves ${cluster_name}

  juju expose $juju_env ${cluster_name}-master
}

main() {
  echo "deploying stack at $(timestamp)"

  juju bootstrap $juju_env --constraints="instance-type=m1.xlarge"

  sleep 120
  deploy_cluster hadoop

  echo "done at $(timestamp)"
}
main $*
exit 0

We experimented with more clever ways of doing the spinup (too little coffee at this point of the night)… but the real fix is to get juju to take advantage of multiplicity in api calls. Until then, timed batches work just fine.

Juju spun the cluster up in about 2 and a half hours. It had about 380TB of HDFS storage

The terasort job that was run from the script above with

SIZE=10000000000
NUM_MAPS=3000
NUM_REDUCES=3000

eventually completed.

2000 nodes

After the 1000-node run, we chose to clean up from the previous job and just add more nodes to that same cluster.

Again, to get around the api throttling, we added batches of 99 slaves at a time with a 2-minute wait between each batch until we got near 2000 slaves.

This gave us almost 760TB of HDFS storage

and was running fine

but was stopped early b/c waiting for the job to complete would’ve just been silly at this point. With our naive job config, we’re considerably past the point of diminishing returns for adding nodes to the actual terasort, and we’d captured the profiling info we needed at this point.

Juju spun up 1972 slaves in just over seven hours total. Profiling showed that juju was spending a lot of time serializing stuff into zookeeper nodes using yaml. It looks like python’s yaml implementation is python, and not just wrapping libyaml. We tested a smaller run replacing the internal yaml serialization with json.. Wham! two orders of magnitude faster. No particular surprise.

Lessons Learned

Ok, so at the end of the day, what did we learn here?

What we did here is the way developing for performance at scale should be done… start with a naive, flexible approach and then spend time and effort obtaining real profiling information. Follow that with optimization decisions that actually make a difference. Otherwise it’s all just a crapshoot based on where developers think the bottlenecks might be.

Things to do to juju as a result of these tests:

  • streamline our implementation of ‘-n’ options

    • the client should pass the multiplicity to the provisioning agent
    • the provisioning agent should pass the multiplicity to the EC2 api
  • don’t use yaml to marshall data in and out of zookeeper

  • replace per-instance security groups with per-instance firewalls

What’s Next?

So that’s a big enough bite for one round of scale testing.

Next up:

  • land a few of the changes outlined above into trunk. Then, spin up another round of scale tests to look at the numbers.
  • more providers (other clouds as well as a MaaS lab too)
  • regular scale testing? - can this coincide with upstream scale testing for projects like hadoop?
  • test scaling for various services? What does this look like for other stacks of services?

Wishlist

  • find some better test jobs! benchmarks are boring… perhaps we can use this compute time to mine educational data or cure cancer or something?

  • perhaps push juju topology information further into zk leaf nodes? Are there transactional features in more recent versions of zk that we can use?

  • use spot instances on ec2. This is harder because you’ve gotta incorporate price monitoring.

Charm School!

2011-11-22 (cloud, charmschool, juju)

Wanna learn more about juju?

Drop by Charm School:

Details from Jorge’s post:

We're holding a Charm School on IRC.

juju Charm School is a virtual event where a juju expert
is available to answer questions about writing your own
juju charms. The intended audience are people who deploy
software and want to contribute charms to the wider devops
community to make deploying in the public and private
cloud easy.

Attendees are more than welcome to:

Ask questions about juju and charms
Ask for help modifying existing scripts and make charms out of them
Ask for peer review on existing charms you might be working on

Though not required, we recommend that you have juju installed
and configured if you want to get deep into the event.

Monitoring Hadoop Benchmarks TeraGen/TeraSort with Ganglia

2011-11-08 (cloud, hadoop, juju)

#########################################################
NOTE: Repost

The ubuntu project “ensemble” is now publicly known as “juju”. This is a repost of an older article Monitoring Hadoop Benchmarks TeraGen/TeraSort with Ganglia to reflect the new names and updates to the api.

#########################################################

Here I’m using new features of Ubuntu Server (namely juju) to easily deploy Ganglia alongside a small Hadoop cluster to play around with monitoring some benchmarks like Terasort.

Short Story

Deploy hadoop and ganglia using juju:

$ juju bootstrap
$ juju deploy --repository "~/charms"  local:hadoop-master namenode
$ juju deploy --repository "~/charms"  local:ganglia jobmonitor
$ juju deploy --repository "~/charms"  local:hadoop-slave datacluster
$ juju add-relation namenode datacluster
$ juju add-relation jobmonitor datacluster
$ for i in {1..6}; do
$   juju add-unit datacluster
$ done
$ juju expose jobmonitor

When all is said and done (and EC2 has caught up), run the jobs

$ juju ssh namenode/0
ubuntu$ sudo -su hdfs
hdfs$ hadoop jar hadoop-*-examples.jar teragen -Dmapred.map.tasks=100 -Dmapred.reduce.tasks=100 100000000 in_dir
hdfs$ hadoop jar hadoop-*-examples.jar terasort -Dmapred.map.tasks=100 -Dmapred.reduce.tasks=100 in_dir out_dir

While these are running, we can run

$ juju status

to get the URL for the jobmonitor ganglia web frontend

http://<jobmonitor-instance-ec2-url>/ganglia/

and see…

and a little later as the jobs run…

Of course, I’m just playing around with ganglia at the moment… For real performance, I’d change my juju config file to choose larger (and ephemeral) EC2 instances instead of the defaults.

A Few Details…

Let’s grab the charms necessary to reproduce this.

First, let’s install juju and set up a our charms.

$ sudo apt-get install juju charm-tools

Note that I’m describing all this using an Ubuntu laptop to run the juju cli because that’s how I roll, but you can certainly use a Mac to drive your Ubuntu services in the cloud. The juju CLI is already available in ports, but I’m not sure the version. Homebrew packages are in the works. Windows should work too, but I don’t have a clue.

$ mkdir -p ~/charms/oneiric
$ cd ~/charms/oneiric
$ charm get hadoop-master
$ charm get hadoop-slave
$ charm get ganglia

That’s about all that’s really necessary to get you up and benchmarking/monitoring.

I’ll do another post on how to adapt your own charms to use monitoring and the monitor juju interface as part of the “Core Infrastructure” series I’m writing for charm developers. I’ll go over the process of what I had to do to get the hadoop-slave service talking to monitoring services like ganglia.

Until then, clone/test/enjoy… or better yet, fork/adapt/use!

Painless Hadoop / Ubuntu / EC2

2011-11-08 (cloud, juju, hadoop)

#########################################################
NOTE: Repost

The ubuntu project “ensemble” is now publicly known as “juju”. This is a repost of an older article Painless Hadoop / Ubuntu / EC2 to reflect the new names and updates to the api.

#########################################################

Thanks Michael Noll for the posts where I first learned how to do this stuff:

I’d like to run his exact examples, but this time around I’ll use juju for hadoop deployment/management.


The Short Story

Setup

install/configure juju client tools

$ sudo apt-get install juju charm-tools
$ mkdir ~/charms && charm getall ~/charms

run hadoop services with juju

$ juju bootstrap
$ juju deploy --repository ~/charms local:hadoop-master namenode
$ juju deploy --repository ~/charms local:hadoop-slave datanodes
$ juju add-relation namenode datanodes

optionally add datanodes to scale horizontally

$ juju add-unit datanodes
$ juju add-unit datanodes
$ juju add-unit datanodes

(you can add/remove these later too)

Scaling is so easy there’s no point in separate standalone -vs- multinode versions of the setup.

Data and Jobs

Load your data and jars

$ juju ssh namenode/0

ubuntu$ sudo -su hdfs

hdfs$ cd /tmp
hdfs$ wget http://files.markmims.com/gutenberg.tar.bz2
hdfs$ tar xjvf gutenberg.tar.bz2

copy the data into hdfs

hdfs$ hadoop dfs -copyFromLocal /tmp/gutenberg gutenberg

run mapreduce jobs against the dataset

hdfs$ hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar wordcount -Dmapred.map.tasks=20 -Dmapred.reduce.tasks=20 gutenberg gutenberg-output

That’s it!


Now, again with some more details…

Installing juju

Install juju client tools onto your local machine…

# sudo apt-get install juju charm-tools

We’ve got the juju CLI in ports now too for Mac clients (Homebrew is in progress).

Now generate your environment settings with

$ juju

and then edit ~/.juju/environments.yaml to use your EC2 keys. It’ll look something like:

environments:
  sample:
    type: ec2
    control-bucket: juju-<hash>
    admin-secret: <hash>
    access-key: <your ec2 access key>
    secret-key: <your ec2 secret key>
    default-series: oneiric

In real life you’d probably want to specify default-image-type to at least m1.large too, but I’ll give some examples of that in later posts.

Hadoop

Grab the juju charms

Make a place for charms to live

$ mkdir charms/oneiric
$ cd charms/oneiric
$ charm get hadoop-master
$ charm get hadoop-slave

(optionally, you can charm getall but it’ll take a bit to pull all charms).

Start the Hadoop Services

Spin up a juju environment

$ juju bootstrap

wait a minute or two for EC2 to comply. You’re welcome to watch the water boil with

$ juju status

or even

$ watch -n30 juju status

which’ll give you output like

$ juju status
2011-07-12 15:20:54,978 INFO Connecting to environment.
The authenticity of host 'ec2-50-17-28-19.compute-1.amazonaws.com (50.17.28.19)' can't be established.
RSA key fingerprint is c5:21:62:f0:ac:bd:9c:0f:99:59:12:ec:4d:41:48:c8.
Are you sure you want to continue connecting (yes/no)? yes
machines:
  0: {dns-name: ec2-50-17-28-19.compute-1.amazonaws.com, instance-id: i-8bc034ea}
services: {}
2011-07-12 15:21:01,205 INFO 'status' command finished successfully

Next, you need to deploy the hadoop services:

$ juju deploy --repository ~/charms local:hadoop-master namenode
$ juju deploy --repository ~/charms local:hadoop-slave datanodes

now you simply relate the two services:

$ juju add-relation namenode datanodes

Relations are where the juju special sauce is, but more about that in another post.

You can tell everything’s happy when juju status gives you something like (looks a bit different, but basics are the same):

$ juju status
2011-07-12 15:29:20,331 INFO Connecting to environment.
machines:
  0: {dns-name: ec2-50-17-28-19.compute-1.amazonaws.com, instance-id: i-8bc034ea}
  1: {dns-name: ec2-50-17-0-68.compute-1.amazonaws.com, instance-id: i-4fcf3b2e}
  2: {dns-name: ec2-75-101-249-123.compute-1.amazonaws.com, instance-id: i-35cf3b54}
services:
  namenode:
    formula: local:hadoop-master-1
    relations: {hadoop-master: datanodes}
    units:
      namenode/0:
        machine: 1
        relations:
          hadoop-master: {state: up}
        state: started
  datanodes:
    formula: local:hadoop-slave-1
    relations: {hadoop-master: namenode}
    units:
      datanodes/0:
        machine: 2
        relations:
          hadoop-master: {state: up}
        state: started
2011-07-12 15:29:23,685 INFO 'status' command finished successfully

Loading Data

Log into the master node

$ juju ssh namenode/0

and become the hdfs user

ubuntu$ sudo -su hdfs

pull the example data

hdfs$ cd /tmp
hdfs$ wget http://files.markmims.com/gutenberg.tar.bz2
hdfs$ tar xjvf gutenberg.tar.bz2

and copy it into hdfs

hdfs$ hadoop dfs -copyFromLocal /tmp/gutenberg gutenberg

Running Jobs

Similar to above, but now do

hdfs$ hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar wordcount gutenberg gutenberg-output

you might want to explicitly call out the number of jobs to use…

hdfs$ hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar wordcount -Dmapred.map.tasks=20 -Dmapred.reduce.tasks=20 gutenberg gutenberg-output

depending on the size of the cluster you decide to spin up.

You can look at logs on the slaves by

$ juju ssh datanodes/0
ubuntu$ tail /var/log/hadoop/hadoop-hadoop-datanode*.log
ubuntu$ tail /var/log/hadoop/hadoop-hadoop-tasktracker*.log

similarly for subsequent slave nodes if you’ve spun them up

$ juju ssh datanodes/1

or

$ juju ssh datanodes/2

Horizontal Scaling

To resize your cluster,

$ juju add-unit datanodes

or even

$ for i in {1..10}
$ do
$   juju add-unit datanodes
$ done

Wait for juju status to show everything in a happy state and then run your jobs.

I was able to add slave nodes in the middle of a run… they pick up load and crank.

Check out the juju status output for a simple 10-slave cluster here