<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Northern Nevada Software Developers Group</title>
	<atom:link href="http://softwaredevelopersgroup.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://softwaredevelopersgroup.com</link>
	<description></description>
	<lastBuildDate>Thu, 23 Feb 2012 05:05:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>NuGet Project Uncovered: SpecificationExtensions.[MSTest &#124; NUnit &#124; Xunit]</title>
		<link>http://elegantcode.com/2012/02/22/nuget-project-uncovered-specificationextensions-mstest-nunit-xunit/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=nuget-project-uncovered-specificationextensions-mstest-nunit-xunit</link>
		<comments>http://elegantcode.com/2012/02/22/nuget-project-uncovered-specificationextensions-mstest-nunit-xunit/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=nuget-project-uncovered-specificationextensions-mstest-nunit-xunit#comments</comments>
		<pubDate>Thu, 23 Feb 2012 05:05:00 +0000</pubDate>
		<dc:creator>Jason Jarrett</dc:creator>
				<category><![CDATA[NuGet]]></category>

		<guid isPermaLink="false">http://elegantcode.com/2012/02/22/nuget-project-uncovered-specificationextensions-mstest-nunit-xunit/</guid>
		<description><![CDATA[If you are coming to this series of posts for the first time you might check out my introductory post for a little context. NOTE: this project is one I created and as it turns out this has now become its introductory post. The SpecificationExtensions.[MSTest &#124; NUnit &#124; Xunit] are a set of NuGet packages [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>If you are coming to this series of posts for the first time you might check out <a href="http://elegantcode.com/2012/01/22/nuget-project-uncovered-an-introduction-to-the-series/" >my introductory post</a> for a little context.</p>
</blockquote>
<blockquote><p>NOTE: this project is one I created and as it turns out this has now become its introductory post.</p>
</blockquote>
<p>The SpecificationExtensions.[MSTest | NUnit | Xunit] are a set of NuGet packages that add C# <a href="http://staxmanade.blogspot.com/2009/02/fluent-specification-extensions.html" >fluent specification extensions</a> to your test project. I first blogged about this in early 2009 and have had a set of these that I take with me for every project I work on.</p>
<p>There are a number of other options out there for specification extensions, but since I first created my original set, I haven’t used anything else (although I should as I might be able to learn a little from each).</p>
<p><a href="http://elegantcode.com/wp-content/uploads/2012/01/image_thumb10_thumb1.png"><img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image_thumb10_thumb" border="0" alt="image_thumb10_thumb" src="http://elegantcode.com/wp-content/uploads/2012/01/image_thumb10_thumb_thumb.png" width="339" height="202" /></a></p>
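<p>As a rough illustration of what a fluent specification extension looks like, here is a minimal, framework-neutral sketch. The class and method names (SpecificationExtensionsSketch, ShouldEqual, ShouldBeTrue) are assumptions for illustration and may not match the packages exactly. The real packages would presumably delegate to the assertions of MSTest, NUnit, or xUnit rather than throwing a plain exception.</p>
<blockquote><pre>using System;

// Hypothetical sketch: names are illustrative, not the packages' actual API.
public static class SpecificationExtensionsSketch
{
    // Fails when the two values differ; returns the value so calls can be chained.
    public static T ShouldEqual&lt;T&gt;(this T actual, T expected)
    {
        if (!object.Equals(actual, expected))
            throw new Exception(string.Format("Expected '{0}' but was '{1}'.", expected, actual));
        return actual;
    }

    // Fails when the condition is false.
    public static void ShouldBeTrue(this bool actual)
    {
        if (!actual)
            throw new Exception("Expected true but was false.");
    }
}</pre>
</blockquote>
<p>With something like this in a test project, an assertion reads as (2 + 2).ShouldEqual(4); instead of a call to a static Assert method.</p>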
]]></content:encoded>
			<wfw:commentRss>http://elegantcode.com/2012/02/22/nuget-project-uncovered-specificationextensions-mstest-nunit-xunit/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NuGet Project Uncovered: EventAggregator.Net</title>
		<link>http://elegantcode.com/2012/02/21/nuget-project-uncovered-eventaggregator-net/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=nuget-project-uncovered-eventaggregator-net</link>
		<comments>http://elegantcode.com/2012/02/21/nuget-project-uncovered-eventaggregator-net/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=nuget-project-uncovered-eventaggregator-net#comments</comments>
		<pubDate>Wed, 22 Feb 2012 05:04:00 +0000</pubDate>
		<dc:creator>Jason Jarrett</dc:creator>
				<category><![CDATA[NuGet]]></category>

		<guid isPermaLink="false">http://elegantcode.com/2012/02/21/nuget-project-uncovered-eventaggregator-net/</guid>
		<description><![CDATA[If you are coming to this series of posts for the first time you might check out my introductory post for a little context. NOTE: this project is one I created and as it turns out this has now become its introductory post. EventAggregator.Net is a single C# file that can provide a basis for [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>If you are coming to this series of posts for the first time you might check out <a href="http://elegantcode.com/2012/01/22/nuget-project-uncovered-an-introduction-to-the-series/" >my introductory post</a> for a little context.</p>
</blockquote>
<blockquote><p>NOTE: this project is one I created and as it turns out this has now become its introductory post.</p>
</blockquote>
<p><a href="http://nuget.org/packages/EventAggregator.Net" >EventAggregator.Net</a> is a single C# file that can provide a basis for a simple in memory Pub/Sub event aggregator.</p>
<p>I extracted this out of my <a href="http://statlight.codeplex.com" >StatLight</a> project as I found that I often wanted a similar one and kept finding myself copy/pasting this into projects. I figured a single location for this project would be better and I use StatLight as the first dog bowl when I need to dog food the project.</p>
<p>If you’re familiar with the…</p>
<blockquote><p>Install-Package Caliburn.Micro.EventAggregator      </p>
</blockquote>
<p>…then you probably know what this project is like.</p>
<p>Its history starts a few years ago when I read <a href="http://jeremydmiller.com/" >Jeremy Miller’s</a> Braindump on <a href="http://codebetter.com/jeremymiller/2009/07/22/braindump-on-the-event-aggregator-pattern/" >Event Aggregator Pattern</a> and decided I wanted to rip out StatLight’s usage of the Prism event aggregator and replace it with one similar to the one found in StoryTeller. It’s gone through quite a few revisions inside of StatLight since then and eventually made its way into its own project.</p>
<p>Some thanks have to go out to the great feedback and pull requests from <a href="https://github.com/JakeGinnivan" >Jake Ginnivan</a> who found this project on his own (before I publicized it).</p>
<p>If you’re interested in using it, I’d recommend checking out the source’s test project and the SampleUsage project. The SampleUsage project demonstrates how you can configure the tool to publish events in an async mode.</p>
<p>One concept introduced in this EventAggregator is taking the IEventAggregator interface and breaking it up into two interfaces (<strong>IEventPublisher</strong> and <strong>IEventSubscriptionManager</strong>). This proved extremely useful when trying to tell components that did both publishing and subscription management apart from ones that only published events. It even helped to easily diagnose components that did not correctly unregister objects.</p>
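<p>To make the interface split concrete, here is a minimal sketch of the idea. Only the two interface names come from the project; the member names (Publish, Subscribe, Unsubscribe), their signatures, and the SimpleEventAggregator class are assumptions for illustration and are not EventAggregator.Net’s actual API. The sketch is also not thread safe.</p>
<blockquote><pre>using System;
using System.Collections.Generic;

// Sketch only: member names are assumptions, not the library's real API.
public interface IEventPublisher
{
    void Publish&lt;TEvent&gt;(TEvent message);
}

public interface IEventSubscriptionManager
{
    void Subscribe&lt;TEvent&gt;(Action&lt;TEvent&gt; handler);
    void Unsubscribe&lt;TEvent&gt;(Action&lt;TEvent&gt; handler);
}

// One object implements both roles; consumers ask only for the role they need.
public class SimpleEventAggregator : IEventPublisher, IEventSubscriptionManager
{
    private readonly Dictionary&lt;Type, List&lt;Delegate&gt;&gt; _handlers =
        new Dictionary&lt;Type, List&lt;Delegate&gt;&gt;();

    public void Subscribe&lt;TEvent&gt;(Action&lt;TEvent&gt; handler)
    {
        List&lt;Delegate&gt; list;
        if (!_handlers.TryGetValue(typeof(TEvent), out list))
            _handlers[typeof(TEvent)] = list = new List&lt;Delegate&gt;();
        list.Add(handler);
    }

    public void Unsubscribe&lt;TEvent&gt;(Action&lt;TEvent&gt; handler)
    {
        List&lt;Delegate&gt; list;
        if (_handlers.TryGetValue(typeof(TEvent), out list))
            list.Remove(handler);
    }

    public void Publish&lt;TEvent&gt;(TEvent message)
    {
        List&lt;Delegate&gt; list;
        if (!_handlers.TryGetValue(typeof(TEvent), out list))
            return;
        // Iterate over a copy so a handler can unsubscribe while being invoked.
        foreach (var handler in list.ToArray())
            ((Action&lt;TEvent&gt;)handler)(message);
    }
}</pre>
</blockquote>
<p>A component that only raises events would take an IEventPublisher in its constructor, so any class that asks for IEventSubscriptionManager stands out as one that must also clean up its subscriptions.</p>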
]]></content:encoded>
			<wfw:commentRss>http://elegantcode.com/2012/02/21/nuget-project-uncovered-eventaggregator-net/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NuGet Project Uncovered: DumpToText</title>
		<link>http://elegantcode.com/2012/02/20/nuget-project-uncovered-dumptotext/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=nuget-project-uncovered-dumptotext</link>
		<comments>http://elegantcode.com/2012/02/20/nuget-project-uncovered-dumptotext/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=nuget-project-uncovered-dumptotext#comments</comments>
		<pubDate>Tue, 21 Feb 2012 05:01:00 +0000</pubDate>
		<dc:creator>Jason Jarrett</dc:creator>
				<category><![CDATA[NuGet]]></category>

		<guid isPermaLink="false">http://elegantcode.com/2012/02/20/nuget-project-uncovered-dumptotext/</guid>
		<description><![CDATA[If you are coming to this series of posts for the first time you might check out my introductory post for a little context. NOTE: this project is one I created and as it turns out this has now become its introductory post. DumpToText is a single C# extension I wrote a little while back. [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>If you are coming to this series of posts for the first time you might check out <a href="http://elegantcode.com/2012/01/22/nuget-project-uncovered-an-introduction-to-the-series/" >my introductory post</a> for a little context.</p>
</blockquote>
<blockquote><p>NOTE: this project is one I created and as it turns out this has now become its introductory post.</p>
</blockquote>
<p><a href="http://nuget.org/packages/DumpToText" >DumpToText</a> is a single C# extension I wrote a little while back. The inspiration for this came from the need to view the values of an object graph quickly and easily during a TDD session.</p>
<p>Have you ever been doing TDD and something isn’t working quite as expected? Would it be nice to just dump out the values of an object quickly without having to spin up the debugger?</p>
<p>The inspiration for this project came from an amazing feature of <a href="http://www.linqpad.net/" >LINQPad</a>. If you have ever used <a href="http://www.linqpad.net/" >LINQPad</a> then you’re aware of its amazing ability to take any object and create a view of its data. Take the simple anonymous type below.</p>
<p><a href="http://elegantcode.com/wp-content/uploads/2012/01/image_thumb2_thumb.png"><img style="background-image: none; border-right-width: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image_thumb2_thumb" border="0" alt="image_thumb2_thumb" src="http://elegantcode.com/wp-content/uploads/2012/01/image_thumb2_thumb_thumb.png" width="471" height="315" /></a></p>
<p>Now wouldn’t it be great to have that “.Dump()” extension method at hand anywhere in your code and during a TDD session?</p>
<p>That’s why I created <a href="https://github.com/staxmanade/DumpToText" >DumpToText</a>.</p>
<p>Now if I have a test as follows and want to see its data, I can use the “.DumpToText()” extension method to have it print out an ASCII-based representation of the object graph.</p>
<p><a href="http://elegantcode.com/wp-content/uploads/2012/01/image_thumb4_thumb.png"><img style="background-image: none; border-right-width: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image_thumb4_thumb" border="0" alt="image_thumb4_thumb" src="http://elegantcode.com/wp-content/uploads/2012/01/image_thumb4_thumb_thumb.png" width="323" height="178" /></a></p>
<p><a href="http://elegantcode.com/wp-content/uploads/2012/01/image_thumb6_thumb.png"><img style="background-image: none; border-right-width: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image_thumb6_thumb" border="0" alt="image_thumb6_thumb" src="http://elegantcode.com/wp-content/uploads/2012/01/image_thumb6_thumb_thumb.png" width="390" height="156" /></a></p>
<p>By default this just writes the output using System.Diagnostics.Trace(…), but you can override the “write” implementation by supplying your own delegate as shown below.</p>
<p><a href="http://elegantcode.com/wp-content/uploads/2012/01/image_thumb8_thumb.png"><img style="background-image: none; border-right-width: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image_thumb8_thumb" border="0" alt="image_thumb8_thumb" src="http://elegantcode.com/wp-content/uploads/2012/01/image_thumb8_thumb_thumb.png" width="871" height="52" /></a></p>
<p>The example below shows a nested object that also has an array of items.</p>
<p><a href="http://elegantcode.com/wp-content/uploads/2012/01/image_thumb14_thumb.png"><img style="background-image: none; border-right-width: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image_thumb14_thumb" border="0" alt="image_thumb14_thumb" src="http://elegantcode.com/wp-content/uploads/2012/01/image_thumb14_thumb_thumb.png" width="264" height="171" /></a></p>
<p><a href="http://elegantcode.com/wp-content/uploads/2012/01/image_thumb13_thumb1.png"><img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image_thumb13_thumb1" border="0" alt="image_thumb13_thumb1" src="http://elegantcode.com/wp-content/uploads/2012/01/image_thumb13_thumb1_thumb.png" width="573" height="349" /></a></p>
<h5>Anyone out there using <a href="http://approvaltests.sourceforge.net/" >ApprovalTests</a>? (You can get it on NuGet)</h5>
<p>I’ve not taken the chance to use ApprovalTests yet in a project, but I have a strong feeling that my DumpToText helper could be very useful when leveraged in conjunction with ApprovalTests. If anyone out there is using ApprovalTests, I’d love to hear how it’s going, and if you think that DumpToText would be useful there.</p>
]]></content:encoded>
			<wfw:commentRss>http://elegantcode.com/2012/02/20/nuget-project-uncovered-dumptotext/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NuGet Project Uncovered: Nancy</title>
		<link>http://elegantcode.com/2012/02/19/nuget-project-uncovered-nancy/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=nuget-project-uncovered-nancy</link>
		<comments>http://elegantcode.com/2012/02/19/nuget-project-uncovered-nancy/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=nuget-project-uncovered-nancy#comments</comments>
		<pubDate>Mon, 20 Feb 2012 04:59:00 +0000</pubDate>
		<dc:creator>Jason Jarrett</dc:creator>
				<category><![CDATA[NuGet]]></category>

		<guid isPermaLink="false">http://elegantcode.com/2012/02/19/nuget-project-uncovered-nancy/</guid>
		<description><![CDATA[If you are coming to this series of posts for the first time you might check out my introductory post for a little context. Nancy is another project founded by an elegant coder. Andreas has blogged about it a number of times here on ElegantCode. Nancy is a lightweight HTTP framework for building web services [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>If you are coming to this series of posts for the first time you might check out <a href="http://elegantcode.com/2012/01/22/nuget-project-uncovered-an-introduction-to-the-series/" >my introductory post</a> for a little context.</p>
</blockquote>
<p><a href="http://nuget.org/packages/Nancy" >Nancy</a> is another project founded by an elegant coder. <a href="http://elegantcode.com/about/andreas-hakansson/" >Andreas</a> has blogged about it a <a href="http://bit.ly/xIlUmy" >number of times here</a> on ElegantCode. </p>
<blockquote><p>Nancy is a lightweight HTTP framework for building web services and sites. The framework runs on both the .net framework and <a href="http://mono-project.com/">Mono</a>.</p>
</blockquote>
<p>I have not used this project myself, but as I started to look it over I think I might have to spin up a site quickly just to try it out.</p>
]]></content:encoded>
			<wfw:commentRss>http://elegantcode.com/2012/02/19/nuget-project-uncovered-nancy/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NuGet Project Uncovered: Extended.Wpf.Toolkit</title>
		<link>http://elegantcode.com/2012/02/18/nuget-project-uncovered-extended-wpf-toolkit/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=nuget-project-uncovered-extended-wpf-toolkit</link>
		<comments>http://elegantcode.com/2012/02/18/nuget-project-uncovered-extended-wpf-toolkit/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=nuget-project-uncovered-extended-wpf-toolkit#comments</comments>
		<pubDate>Sun, 19 Feb 2012 04:58:00 +0000</pubDate>
		<dc:creator>Jason Jarrett</dc:creator>
				<category><![CDATA[NuGet]]></category>

		<guid isPermaLink="false">http://elegantcode.com/2012/02/18/nuget-project-uncovered-extended-wpf-toolkit/</guid>
		<description><![CDATA[If you are coming to this series of posts for the first time you might check out my introductory post for a little context. Extended.Wpf.Toolkit is a project that should not need an introduction, and if you follow this blog you’ve probably heard Brian talk about it. If not, check out some of the posts [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>If you are coming to this series of posts for the first time you might check out <a href="http://elegantcode.com/2012/01/22/nuget-project-uncovered-an-introduction-to-the-series/" >my introductory post</a> for a little context.</p>
</blockquote>
<p><a href="http://nuget.org/packages/Extended.Wpf.Toolkit" >Extended.Wpf.Toolkit</a> is a project that should not need an introduction, and if you follow this blog you’ve probably heard <a href="http://elegantcode.com/about/brian-lagunas/" >Brian</a> talk about it. If not, check out some of the posts on the <a href="http://bit.ly/yV7cma" >Extended WPF Toolkit! </a></p>
<p>This project is one of the most downloaded on CodePlex, has been discussed on <a href="http://channel9.msdn.com" >Channel9</a> and Coding4Fun, and is being leveraged by <a href="http://www.telerik.com" >Telerik</a> in their <a href="http://www.telerik.com/products/orm.aspx" >Open Access ORM</a> product.</p>
<h5>Below is a sample of some of the controls you can find in the toolkit:    <br />(but make sure you check out the <a href="http://wpftoolkit.codeplex.com/" >project site</a> for the full list of controls)</h5>
<ul>
<li>BusyIndicator </li>
<li>Calculator </li>
<li>ChildWindow </li>
<li>ColorCanvas </li>
<li>ColorPicker </li>
<li>DateTimePicker </li>
<li>Magnifier </li>
<li>MultiLineTextEditor </li>
<li>PrimitiveTypeCollectionEditor </li>
<li>RichTextBox </li>
<li>SplitButton </li>
<li>WatermarkTextBox </li>
<li>Wizard </li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://elegantcode.com/2012/02/18/nuget-project-uncovered-extended-wpf-toolkit/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NuGet Project Uncovered: TranslatorService.Speech</title>
		<link>http://elegantcode.com/2012/02/17/nuget-project-uncovered-translatorservice-speech/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=nuget-project-uncovered-translatorservice-speech</link>
		<comments>http://elegantcode.com/2012/02/17/nuget-project-uncovered-translatorservice-speech/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=nuget-project-uncovered-translatorservice-speech#comments</comments>
		<pubDate>Sat, 18 Feb 2012 04:56:00 +0000</pubDate>
		<dc:creator>Jason Jarrett</dc:creator>
				<category><![CDATA[NuGet]]></category>

		<guid isPermaLink="false">http://elegantcode.com/2012/02/17/nuget-project-uncovered-translatorservice-speech/</guid>
		<description><![CDATA[If you are coming to this series of posts for the first time you might check out my introductory post for a little context. TranslatorService.Speech is a small little wrapper around the Bing text to speech API. You will need to get a Bing api key to leverage this. Below is a sample usage of [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>If you are coming to this series of posts for the first time you might check out <a href="http://elegantcode.com/2012/01/22/nuget-project-uncovered-an-introduction-to-the-series/" >my introductory post</a> for a little context.</p>
</blockquote>
<p><a href="http://nuget.org/packages/TranslatorService.Speech" >TranslatorService.Speech</a> is a small little wrapper around the Bing text to speech API. You will need to get a Bing api key to leverage this.</p>
<p>Below is a sample usage of the library.</p>
<blockquote><pre>SpeechSynthesizer speech = new SpeechSynthesizer(APP_ID);   </pre>
<pre>// To obtain a Bing Application ID, go to http://msdn.microsoft.com/en-us/library/ff512420.aspx  

string text = &quot;Have a nice day!&quot;;
string language = &quot;en&quot;; 

using (Stream stream = speech.GetSpeakStream(text, language))
{
    using (SoundPlayer player = new SoundPlayer(stream))
        player.PlaySync();
}</pre>
</blockquote>
<p>I threw that into a quick test and was quite impressed with how fast it worked.</p>
<p>Unfortunately, when I tried to translate a larger “paragraph” or so of text, I saw:</p>
<blockquote>
<p>System.Net.WebException : The remote server returned an error: (400) Bad Request. </p>
<p>at System.Net.HttpWebRequest.GetResponse()<br />
    <br />at TranslatorService.Speech.SpeechSynthesizer.GetSpeakStream(String text, String language) </p>
<p>at NuGetTestProject.Sample.SampleTest() in Class1.cs: line 14</p>
</blockquote>
<p>I didn’t take the time to diagnose why, whether it’s a Bing problem or this library.</p>
<p>Some other observations:</p>
<ul>
<li>+ It supports some Async methods as well </li>
<li>- The Async methods don’t support the standard APM so I couldn’t easily wrap a Task around them. (I know there are some ways to make it work, but it’s not out of the box easy…) </li>
</ul>
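<p>For reference, one of the usual ways to make a non-APM async method awaitable is TaskCompletionSource. The sketch below is only an assumption about the general shape of such a wrapper: CallbackSpeaker and GetSpeakBytesAsync are hypothetical stand-ins invented for this example, not members of TranslatorService.Speech, and the stand-in completes synchronously with fake data.</p>
<blockquote><pre>using System;
using System.Text;
using System.Threading.Tasks;

// Hypothetical callback-style API standing in for a library whose async
// methods follow neither the Begin/End (APM) pattern nor return a Task.
public class CallbackSpeaker
{
    public void GetSpeakBytesAsync(string text, Action&lt;byte[], Exception&gt; completed)
    {
        // Stand-in behaviour: report fake "audio" bytes immediately.
        completed(Encoding.UTF8.GetBytes(text), null);
    }
}

public static class CallbackSpeakerExtensions
{
    // Adapts the callback into a Task by completing a TaskCompletionSource.
    public static Task&lt;byte[]&gt; GetSpeakBytesTask(this CallbackSpeaker speaker, string text)
    {
        var tcs = new TaskCompletionSource&lt;byte[]&gt;();
        speaker.GetSpeakBytesAsync(text, (bytes, error) =&gt;
        {
            if (error != null)
                tcs.SetException(error);
            else
                tcs.SetResult(bytes);
        });
        return tcs.Task;
    }
}</pre>
</blockquote>
<p>The wrapper needs one such adapter per async method, which is the “not out of the box easy” part compared with Task.Factory.FromAsync for APM methods.</p>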
]]></content:encoded>
			<wfw:commentRss>http://elegantcode.com/2012/02/17/nuget-project-uncovered-translatorservice-speech/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NuGet Project Uncovered: JsValidator</title>
		<link>http://elegantcode.com/2012/02/16/nuget-project-uncovered-jsvalidator/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=nuget-project-uncovered-jsvalidator</link>
		<comments>http://elegantcode.com/2012/02/16/nuget-project-uncovered-jsvalidator/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=nuget-project-uncovered-jsvalidator#comments</comments>
		<pubDate>Fri, 17 Feb 2012 04:55:00 +0000</pubDate>
		<dc:creator>Jason Jarrett</dc:creator>
				<category><![CDATA[NuGet]]></category>

		<guid isPermaLink="false">http://elegantcode.com/2012/02/16/nuget-project-uncovered-jsvalidator/</guid>
		<description><![CDATA[If you are coming to this series of posts for the first time you might check out my introductory post for a little context. JsValidator, on first glance, is a down right awesome gem of a NuGet. I’ve heard of Google’s Closure Compiler before, but have never used it. This NuGet makes it a snap [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>If you are coming to this series of posts for the first time you might check out <a href="http://elegantcode.com/2012/01/22/nuget-project-uncovered-an-introduction-to-the-series/" >my introductory post</a> for a little context.</p>
</blockquote>
<p><a href="http://nuget.org/packages/JsValidator" >JsValidator</a>, at first glance, is a downright awesome gem of a NuGet package. I’ve heard of <a href="http://code.google.com/closure/compiler/" >Google’s Closure Compiler</a> before, but have never used it. This NuGet package makes it a snap to use inside Visual Studio.</p>
<p>When you first install the package:</p>
<blockquote><p><font style="background-color: #ffffff">Install-Package JsValidator;</font></p>
</blockquote>
<p>This NuGet tool automatically updates your project so it will execute the tool on compilation. Awesome if you are not an MSBuild expert as it just configures itself for you straight from the package install.</p>
<p>Once installed, run a build on your solution. On the first build you get an error. You’re probably thinking, “How can this tool be erroring? We just added it to the project.” This error is a good thing. It’s telling you that you now have a manual step to configure it correctly.</p>
<p>Go check out the <a href="http://jsvalidator.codeplex.com/" >codeplex site</a> as it will explain more on how to use it.</p>
<p>I just might consider this on my next new JavaScript project in Visual Studio.net.</p>
]]></content:encoded>
			<wfw:commentRss>http://elegantcode.com/2012/02/16/nuget-project-uncovered-jsvalidator/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NuGet Project Uncovered: FastMember</title>
		<link>http://elegantcode.com/2012/02/15/nuget-project-uncovered-fastmember/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=nuget-project-uncovered-fastmember</link>
		<comments>http://elegantcode.com/2012/02/15/nuget-project-uncovered-fastmember/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=nuget-project-uncovered-fastmember#comments</comments>
		<pubDate>Thu, 16 Feb 2012 04:48:00 +0000</pubDate>
		<dc:creator>Jason Jarrett</dc:creator>
				<category><![CDATA[NuGet]]></category>

		<guid isPermaLink="false">http://elegantcode.com/2012/02/15/nuget-project-uncovered-fastmember/</guid>
		<description><![CDATA[If you are coming to this series of posts for the first time you might check out my introductory post for a little context. FastMember would be something you pull in if you’re trying to do reflection to read properties of your objects, but starting to notice some performance issues. This project will emit some [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>If you are coming to this series of posts for the first time you might check out <a href="http://elegantcode.com/2012/01/22/nuget-project-uncovered-an-introduction-to-the-series/" >my introductory post</a> for a little context.</p>
</blockquote>
<p><a href="http://nuget.org/packages/FastMember" >FastMember</a> would be something you pull in if you’re trying to do reflection to read properties of your objects, but are starting to notice some performance issues. This project will emit some IL at runtime that can read your properties much faster than good old reflection (like below):</p>
<blockquote><p><font style="background-color: #ffffff">var value = typeof(MyObject).GetProperty("MyProp").GetValue(myObjInstance, new object[0]);</font></p>
</blockquote>
<p>Replace the above with:</p>
<blockquote><pre>var accessor = TypeAccessor.Create(typeof(MyObject));
var value = accessor[myObjInstance, "MyProp"];</pre>
</blockquote>
<p>This project lets you either read or assign values to properties. (See my test below)</p>
<p><a href="http://elegantcode.com/wp-content/uploads/2012/01/image_thumb1_thumb.png"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb1_thumb" border="0" alt="image_thumb1_thumb" src="http://elegantcode.com/wp-content/uploads/2012/01/image_thumb1_thumb_thumb.png" width="466" height="343" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://elegantcode.com/2012/02/15/nuget-project-uncovered-fastmember/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NuGet Project Uncovered: FakeO</title>
		<link>http://elegantcode.com/2012/02/14/nuget-project-uncovered-fakeo/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=nuget-project-uncovered-fakeo</link>
		<comments>http://elegantcode.com/2012/02/14/nuget-project-uncovered-fakeo/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=nuget-project-uncovered-fakeo#comments</comments>
		<pubDate>Wed, 15 Feb 2012 04:46:00 +0000</pubDate>
		<dc:creator>Jason Jarrett</dc:creator>
				<category><![CDATA[NuGet]]></category>

		<guid isPermaLink="false">http://elegantcode.com/2012/02/14/nuget-project-uncovered-fakeo/</guid>
		<description><![CDATA[If you are coming to this series of posts for the first time you might check out my introductory post for a little context. FakeO is a fake object generation library. It doesn’t seem to play nicely with objects that require parameters in the constructor. However if you have lots of classes that have a [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>If you are coming to this series of posts for the first time you might check out <a href="http://elegantcode.com/2012/01/22/nuget-project-uncovered-an-introduction-to-the-series/" >my introductory post</a> for a little context.</p>
</blockquote>
<p><a href="http://nuget.org/packages/FakeO" >FakeO</a> is a fake object generation library. It doesn’t seem to play nicely with objects that require parameters in the constructor. However, if you have lots of classes that have a default constructor and plain old get/set properties, this project might be very useful.</p>
<p>You can even specify not just random text or numbers to generate, but some specific context driven data.</p>
<p>For example:</p>
<ul>
<li>Random company name </li>
<li>Lorem Ipsum to various lengths. </li>
<li>Phone numbers </li>
<li>Random strings based on a regex. </li>
<li>etc…      </li>
</ul>
<blockquote><pre>// example FakeO call
var comp = FakeO.Create.Fake&lt;Company&gt;(
                c =&gt; c.Name = FakeO.Company.Name(),
                c =&gt; c.Phone = FakeO.Phone.Number(),
                c =&gt; c.EmployeeCount = FakeO.Number.Next(100,200)); // random number from 100 to 200</pre>
</blockquote>
]]></content:encoded>
			<wfw:commentRss>http://elegantcode.com/2012/02/14/nuget-project-uncovered-fakeo/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>EC2 Performance, Spot Instance ROI and EMR Scalability</title>
		<link>http://www.jesse-anderson.com/2012/02/ec2-performance-spot-instance-roi-and-emr-scalability/</link>
		<comments>http://www.jesse-anderson.com/2012/02/ec2-performance-spot-instance-roi-and-emr-scalability/#comments</comments>
		<pubDate>Tue, 14 Feb 2012 16:00:02 +0000</pubDate>
		<dc:creator>Jesse</dc:creator>
				<category><![CDATA[hadoop]]></category>
		<category><![CDATA[amazon]]></category>
		<category><![CDATA[amazon ec2]]></category>
		<category><![CDATA[ec2 performance]]></category>
		<category><![CDATA[elastic mapreduce]]></category>
		<category><![CDATA[infinite monkey theorem]]></category>
		<category><![CDATA[Magnum Opus]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[mapreduce scalability]]></category>
		<category><![CDATA[million monkeys project]]></category>

		<guid isPermaLink="false">http://www.jesse-anderson.com/?p=157</guid>
		<description><![CDATA[Note: This is a very long, technical and detailed discussion of Amazon Web Services.  You can watch the YouTube video below for a less technical explanation or skip to the conclusion to get the results. Introduction In 2006, Amazon introduced Elastic Compute Cloud (EC2) to Amazon Web Services (AWS).  In 2009, Amazon introduced Elastic MapReduce (EMR).  EMR uses [...]]]></description>
			<content:encoded><![CDATA[<p>Note: This is a very long, technical and detailed discussion of Amazon Web Services.  You can watch the YouTube video below for a less technical explanation or skip to the <a href="http://www.jesse-anderson.com/2012/02/ec2-performance-spot-instance-roi-and-emr-scalability/#conclusion">conclusion</a> to get the results.</p>
<p><iframe src="http://www.youtube.com/embed/FyvW-dpskZs" frameborder="0" width="420" height="315"></iframe></p>
<h2 style="margin-top: 10px; margin-bottom: 5px;" dir="ltr">Introduction</h2>
<p>In 2006, <a href="http://aws.typepad.com/aws/2006/08/amazon_ec2_beta.html">Amazon introduced</a> <a href="http://aws.amazon.com/ec2/">Elastic Compute Cloud (EC2)</a> to <a href="http://aws.amazon.com/">Amazon Web Services (AWS)</a>.  In 2009, <a href="http://aws.typepad.com/aws/2009/04/announcing-amazon-elastic-mapreduce.html">Amazon introduced</a> <a href="http://aws.amazon.com/elasticmapreduce/">Elastic MapReduce (EMR)</a>.  EMR uses <a href="http://hadoop.apache.org/">Hadoop</a> to create <a href="http://en.wikipedia.org/wiki/Mapreduce">MapReduce</a> jobs using EC2 instances with <a href="http://aws.amazon.com/s3/">Simple Storage Service (S3)</a> as the permanent storage mechanism.  In 2011, Amazon added <a href="http://aws.typepad.com/aws/2011/08/run-amazon-elastic-mapreduce-on-ec2-spot-instances.html">Spot Instance support</a> for EMR jobs.  <a href="http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/index.html?UsingEMR_SpotInstances.html">Spot Instances</a> allow you to bid on EMR or EC2 instances that are not in use.  The <a href="http://aws.amazon.com/ec2/pricing/">pricing page</a> under the Spot Instances heading gives up to date data on EMR and EC2 instance prices.</p>
<p>In 2011, I created the <a href="http://www.jesse-anderson.com/2011/09/a-few-million-monkeys-randomly-recreate-shakespeare/">Million Monkeys Project</a> (<a href="http://code.google.com/p/million-monkeys-project">source code</a>).  It is a good metric for CPU and memory speed in a Hadoop cluster because its character group testing is very computationally and memory intensive.  This project uses the Million Monkeys code to profile the various EC2 instances and the scalability of EMR and Hadoop.  I will also talk about the cost savings when running EMR jobs as Spot Instances (bid price) instead of On Demand instances (full price).  This post will help engineers choose the right EC2 instance types based on the amount of work or computation needed.</p>
<p>When I originally ran the Million Monkeys Project to recreate every work of Shakespeare, I lacked the resources to run it entirely on EMR.  I started the project on an EC2 micro instance, but the instance lacked enough RAM to run everything I needed.  This time, I have the resources to run the entire project and recreate every work of Shakespeare on EMR using a 20 node EMR cluster.</p>
<h2 style="margin-top: 10px; margin-bottom: 5px;" dir="ltr">Setting Up An EMR Cluster</h2>
<p>To run a Hadoop cluster, various Hadoop services like the Task Tracker and DFS service need to be running.  This is in addition to the Map and Reduce tasks that do the actual work.  In an EMR cluster, the various Hadoop services are run on a master instance group.  The Map and Reduce tasks are run on a core instance group.  The core instance group is made up of 1 or more EC2 instances.  When creating the EMR cluster, you can choose a different instance type for the master and core nodes.  You can use the information in this post to decide which instance type should be used for a given task.</p>
<p>An EMR cluster is built on EC2 instances and these instances run various parts of the Hadoop cluster.  The data can reside in S3 and be loaded from S3 into the <a href="http://hadoop.apache.org/common/docs/current/">Hadoop Distributed File System (DFS)</a>.  The compiled code or JAR and any input files are stored in an S3 bucket.  Any files that are not in S3 when the master instance group terminates at the end of a job will be lost.  Therefore, you should make sure that the code places any important output in S3.  In the Million Monkeys code, I created a prefix that could be added to a file’s path to place the file directly on S3.</p>
<h5 style="margin-bottom: 5px; text-align: center;">Table 1.1 The Breakdown of Various EC2 Instances Specifications</h5>
<div style="text-align: left;" dir="ltr">
<table style="border-width: 1px; border-color: #80807f; border-style: solid;" border="1" cellspacing="0" cellpadding="3">
<thead>
<tr style="background-color: #a3b8c9;">
<td><strong>Instance Name</strong></td>
<td><strong>Memory</strong></td>
<td><strong>EC2 Compute Units and Cores</strong></td>
<td><strong>Platform</strong></td>
<td><strong>I/O Performance</strong></td>
</tr>
</thead>
<colgroup></colgroup>
<tbody>
<tr style="background-color: #f7fbfd;">
<td style="background-color: #f7fbfd;">Small</td>
<td style="background-color: #f7fbfd;">1.7 GB</td>
<td style="background-color: #f7fbfd;">1 EC2 on 1 Core</td>
<td style="background-color: #f7fbfd;">32-bit</td>
<td style="background-color: #f7fbfd;">Moderate</td>
</tr>
<tr style="background-color: #f7fbfd;">
<td style="background-color: #f7fbfd;">Large</td>
<td style="background-color: #f7fbfd;">7.5 GB</td>
<td style="background-color: #f7fbfd;">4 EC2 on 2 Cores</td>
<td style="background-color: #f7fbfd;">64-bit</td>
<td style="background-color: #f7fbfd;">High</td>
</tr>
<tr style="background-color: #f7fbfd;">
<td style="background-color: #f7fbfd;">Extra Large</td>
<td style="background-color: #f7fbfd;">15 GB</td>
<td style="background-color: #f7fbfd;">8 EC2 on 4 Cores</td>
<td style="background-color: #f7fbfd;">64-bit</td>
<td style="background-color: #f7fbfd;">High</td>
</tr>
<tr style="background-color: #f7fbfd;">
<td style="background-color: #f7fbfd;">High-CPU Medium</td>
<td style="background-color: #f7fbfd;">1.7 GB</td>
<td style="background-color: #f7fbfd;">5 EC2 on 2 Cores</td>
<td style="background-color: #f7fbfd;">32-bit</td>
<td style="background-color: #f7fbfd;">Moderate</td>
</tr>
<tr style="background-color: #f7fbfd;">
<td style="background-color: #f7fbfd;">High-CPU Large</td>
<td style="background-color: #f7fbfd;">7 GB</td>
<td style="background-color: #f7fbfd;">20 EC2 on 8 Cores</td>
<td style="background-color: #f7fbfd;">64-bit</td>
<td style="background-color: #f7fbfd;">High</td>
</tr>
<tr style="background-color: #f7fbfd;">
<td style="background-color: #f7fbfd;">Quadruple Extra Large</td>
<td style="background-color: #f7fbfd;">23 GB</td>
<td style="background-color: #f7fbfd;">33.5 EC2 on 8 Cores</td>
<td style="background-color: #f7fbfd;">64-bit</td>
<td style="background-color: #f7fbfd;">Very High</td>
</tr>
</tbody>
</table>
</div>
<p style="text-align: center;"><a href="http://aws.amazon.com/ec2/instance-types/">Source</a>  Note: The High-Memory Extra Large, High-Memory Double Extra Large and High-Memory Quadruple Extra Large instances were not tested and are not included in this table.</p>
<h2 style="margin-top: 10px; margin-bottom: 5px;" dir="ltr">Instance Testing</h2>
<p>EC2 has <a href="http://aws.amazon.com/ec2/instance-types/">various instances</a> and performance specifications for those instances.  These EC2 instances are analogous to running a virtual machine in the cloud.  As shown in Table 1.1, each instance type varies in the number of EC2 Compute Units (ECU), the number of virtual cores, the amount of RAM, 32- or 64-bit platform, the amount of disk space, and network or I/O performance.  Some of these descriptions are quite nebulous.  For example, this is the description from Amazon regarding the definition of an ECU (<a href="http://aws.amazon.com/ec2/">Source</a>):</p>
<pre>EC2 Compute Unit (ECU) – One EC2 Compute Unit (ECU) provides the equivalent
CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.</pre>
<p>That description of CPU capacity does not really help in making capacity decisions or even in guessing how to scale an application.  To this end, I ran the same tests on each instance type to give a concrete idea of how the instances compare to one another.</p>
<p>For these tests, I ran the Million Monkeys program for 5 continuous hours.  During this time, the Million Monkeys code runs in a loop and the total number of character groups is counted.  A character group is a group of 9 characters that is randomly generated using a <a href="http://www.cs.gmu.edu/~sean/research/">Mersenne Twister</a>; its existence is then checked against every work of Shakespeare.  Each run lasted slightly over 5 hours, so the number of character groups is pro-rated to exactly 5 hours.</p>
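<p>The pro-rating step can be sketched as a simple ratio.  This is a minimal illustration, not the code used for the charts: the function name <code>proRateGroups</code> and the sample figures are hypothetical, and it assumes throughput was roughly constant over the run.</p>

```javascript
// Pro-rate a run's character-group count to a fixed time window.
// Assumes roughly constant throughput; sample figures are hypothetical.
function proRateGroups(totalGroups, actualHours, targetHours) {
  if (!(actualHours > 0)) {
    throw new RangeError("actualHours must be positive");
  }
  return totalGroups * (targetHours / actualHours);
}

// A hypothetical run that checked 10.2 billion groups in 5.1 hours
// pro-rates to roughly 10 billion groups in 5 hours.
const proRated = proRateGroups(10.2e9, 5.1, 5);
console.log(Math.round(proRated));
```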
<h5 style="margin-bottom: 5px; text-align: center;">Chart 1.1 Total Character Groups Checked In a 5 Hour Pro-rated Period</h5>
<p style="text-align: center;"><a href="http://www.jesse-anderson.com/wp-content/uploads/2012/02/baseline_totalgroups.png"><img class="aligncenter size-medium wp-image-161 colorbox-157" style="padding: 7px;" title="baseline_totalgroups" src="http://www.jesse-anderson.com/wp-content/uploads/2012/02/baseline_totalgroups-300x150.png" alt="" width="300" height="150" /></a></p>
<p>Chart 1.1 does not present any surprises.  The Small instance obviously has the fewest character groups, followed by Hi-CPU Medium.  Large comes in third and Extra Large and Hi-CPU Large are a virtual tie, with Extra Large coming out slightly higher.  Quadruple Extra Large is the obvious winner with the highest total character groups.  Chart 1.1 gives an idea of the raw computing power of each instance.  It is not until we start looking at price per unit that we get a handle on cost efficiency of a particular instance.</p>
<p>In the original Million Monkeys project, I ran the entire Hadoop cluster on my home computer, an Intel Core 2 Duo 2.66 GHz with 4 GB RAM running Ubuntu 10.10 64-bit.  In 5 hours, my home computer ran 50,000,000,000 character groups.  One of the main differences between my home computer and the EC2 instances is that my home computer was not running in a virtualized environment.  I have seen a 10-30% decrease in efficiency when using virtualization.  Also, all processing was done locally with the Hadoop services and MapReduce tasks running on the same computer.</p>
<h5 style="margin-bottom: 5px; text-align: center;">Chart 1.2 Spot Instance Savings Per Hour When Compared to On Demand</h5>
<p style="text-align: left;"><a href="http://www.jesse-anderson.com/wp-content/uploads/2012/02/baseline_costpergrouppercent.png"><img class="aligncenter size-medium wp-image-158 colorbox-157" style="padding: 7px;" title="baseline_costpergrouppercent" src="http://www.jesse-anderson.com/wp-content/uploads/2012/02/baseline_costpergrouppercent-300x100.png" alt="" width="300" height="100" /></a>Spot Instances help reduce the cost of running an EMR cluster.  The Spot Instance prices will fluctuate as the market price changes.  Chart 1.2 represents the Spot Instance (bid) prices relative to their On Demand (full) prices when I ran the tests.  The savings in this test were very even across all instances at about 65% off their On Demand prices.  With a little bit of forward planning, an EMR cluster can save a lot of money using Spot Instances.</p>
<p>I should point out that running on a Spot Instance does not require a code change per se.  However, an EMR job flow’s Spot Instances can be taken away because of market price fluctuations.  A MapReduce job flow may need to be changed to accommodate an unplanned stoppage.  This might include saving the job state and adding the ability to start back up where it left off at the last save point.  The Million Monkeys code already did this and could take advantage of the Spot Instances without any code changes.</p>
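<p>The save-and-resume idea can be sketched as follows.  This is only an illustration of the general pattern, not the Million Monkeys code: <code>runWithCheckpoints</code>, the in-memory <code>store</code> (standing in for durable storage such as S3) and the stop condition are all hypothetical.</p>

```javascript
// Minimal checkpoint/resume sketch. The in-memory `store` stands in for
// durable storage such as S3; every name here is hypothetical.
function runWithCheckpoints(store, totalIterations, saveEvery, shouldStop) {
  // Resume from the last save point, or start fresh.
  const state = store.saved ? { ...store.saved } : { next: 0, groupsChecked: 0 };
  while (state.next !== totalIterations) {
    if (shouldStop(state.next)) {
      return { finished: false, state: state }; // instance taken away mid-run
    }
    state.groupsChecked += 1; // placeholder for the real work
    state.next += 1;
    if (state.next % saveEvery === 0) {
      store.saved = { ...state }; // persist a save point
    }
  }
  store.saved = { ...state };
  return { finished: true, state: state };
}

// Simulate losing the Spot Instance at iteration 250, then restarting.
const store = {};
const first = runWithCheckpoints(store, 1000, 100, function (i) { return i === 250; });
const resumeFrom = store.saved.next; // last save point before the stoppage
const second = runWithCheckpoints(store, 1000, 100, function () { return false; });
console.log(first.finished, resumeFrom, second.finished, second.state.groupsChecked);
```

<p>In this sketch the work done between the last save point and the stoppage is simply repeated after the restart, so the save interval is a trade-off between checkpoint overhead and lost work.</p>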
<h5 style="margin-bottom: 5px; text-align: center;">Chart 1.3 Cost Per Hour For On Demand and Spot Instances<br />
<a href="http://www.jesse-anderson.com/wp-content/uploads/2012/02/baseline_priceperhourabsolute.png"><img class="aligncenter size-medium wp-image-160 colorbox-157" style="padding: 7px;" title="baseline_priceperhourabsolute" src="http://www.jesse-anderson.com/wp-content/uploads/2012/02/baseline_priceperhourabsolute-300x150.png" alt="" width="300" height="150" /></a></h5>
<p>Chart 1.3 shows another cost breakdown by hour of usage and total costs.  The total cost for a single node cluster with EMR can be calculated using the interactive Table 1.2.  For an On Demand cluster, the total cost per hour is the master instance group price plus the core instance group price, plus the EMR fee for every instance.  For a Spot cluster, the total cost per hour is the On Demand master instance price plus the core instance group’s spot price, plus the EMR fee for every instance.</p>
<p>For example, when I ran the Hi-CPU Medium instance testing, I paid a spot price for the core instance group of $0.06 per hour ($0.17 On Demand).  I also had to pay for the EMR cluster’s master node ($0.17 per hour), which was a Hi-CPU Medium instance as well.  On top of that, I had to pay the EMR price per hour ($0.03) for both the master and core nodes.</p>
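<p>The arithmetic in this example can be written out as a small function.  This is a sketch that mirrors the calculation described above; the name <code>hourlyClusterCost</code> and its parameters are made up for illustration, and it assumes the master node is always billed at the On Demand price.</p>

```javascript
// Hourly cost of an EMR cluster: an On Demand master node plus a core
// instance group, with the EMR fee charged once per instance.
function hourlyClusterCost(masterPrice, corePrice, emrPrice, coreNodes) {
  const masterCost = masterPrice + emrPrice;
  const coreCost = (corePrice + emrPrice) * coreNodes;
  return masterCost + coreCost;
}

// Prices from the Hi-CPU Medium example, in dollars per hour.
const spotTotal = hourlyClusterCost(0.17, 0.06, 0.03, 1);
const onDemandTotal = hourlyClusterCost(0.17, 0.17, 0.03, 1);
console.log(spotTotal.toFixed(2), onDemandTotal.toFixed(2)); // prints: 0.29 0.40
```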
<p>To help illustrate the total pricing, Table 1.2 details the breakdown of total price per hour for Spot and On Demand instances.</p>
<h5 style="margin-bottom: 5px; text-align: center;" dir="ltr">Table 1.2 Spot Instance and On Demand Price Calculation</h5>
<div style="text-align: center;" dir="ltr">
<table style="border-width: 1px; border-color: #80807f; border-style: solid;" border="0" cellspacing="0" cellpadding="3">
<thead>
<tr style="background-color: #a3b8c9;">
<td style="background-color: #a3b8c9; border-width: 1px; border-color: #80807f; border-style: solid;"><strong>Price Description </strong></td>
<td style="background-color: #a3b8c9; border-width: 1px; border-color: #80807f; border-style: solid;"><strong>Spot Instance Price </strong></td>
<td style="background-color: #a3b8c9; border-width: 1px; border-color: #80807f; border-style: solid;"><strong>On Demand Price </strong></td>
</tr>
</thead>
<colgroup>
<col width="*" />
<col width="*" />
<col width="*" /></colgroup>
<tbody>
<tr>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">Master Node</td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;"><span id="spotSinglePrice1">0</span></td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;"><span id="onDemandSinglePrice1">0</span></td>
</tr>
<tr>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">Master Node EMR</td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;"><span id="spotSinglePrice2">0</span></td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;"><span id="onDemandSinglePrice2">0</span></td>
</tr>
<tr>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">Core Node</td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;"><span id="spotSinglePrice3">0</span></td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;"><span id="onDemandSinglePrice3">0</span></td>
</tr>
<tr>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">Core Node EMR</td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;"><span id="spotSinglePrice4">0</span></td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;"><span id="onDemandSinglePrice4">0</span></td>
</tr>
<tr>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">Total Price Per Hour</td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;"><span id="spotSinglePrice5">0</span></td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;"><span id="onDemandSinglePrice5">0</span></td>
</tr>
</tbody>
</table>
</div>
<div style="text-align: center;" dir="ltr"></div>
<table class="aligncenter" style="margin-top: 20px;" border="0" cellspacing="0" cellpadding="3">
<tbody>
<tr>
<td style="padding-right: 20px; background-color: #a3b8c9; text-align: left; border-width: 1px; border-color: #80807f; border-style: solid;"><strong>Spot Instance Price Per Hour</strong></td>
<td style="background-color: #a3b8c9; border-width: 1px; border-color: #80807f; border-style: solid;"><strong>On Demand Instance Price Per Hour</strong></td>
</tr>
<tr>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">
<input id="spotSingle1" type="text" value="0.06" /></td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">
<input id="onDemandSingle1" type="text" value="0.17" /></td>
</tr>
</tbody>
</table>
<p style="text-align: left;">
<input id="singleCalculate" type="button" value="Recalculate" /> (Try out your own prices)</p>
<p><script type="text/javascript">// <![CDATA[
       var $j = jQuery.noConflict();  $j(function(){   $j("#singleCalculate").click(function(){     calculatePrice();   });   calculatePrice(); }); function calculatePrice() {   var emrPrice = 0.03;  var onDemandPrice = parseFloat($j('#onDemandSingle1').val());  var spotPrice = parseFloat($j('#spotSingle1').val());  $j('#onDemandSinglePrice1').html(formatCurrency(onDemandPrice));   $j('#spotSinglePrice1').html(formatCurrency(onDemandPrice));   $j('#onDemandSinglePrice2').html(formatCurrency(emrPrice));    $j('#spotSinglePrice2').html(formatCurrency(emrPrice));   $j('#onDemandSinglePrice3').html(formatCurrency(onDemandPrice));   $j('#spotSinglePrice3').html(formatCurrency(spotPrice));    $j('#onDemandSinglePrice4').html(formatCurrency(emrPrice));   $j('#spotSinglePrice4').html(formatCurrency(emrPrice));   $j('#onDemandSinglePrice5').html(formatCurrency(onDemandPrice + emrPrice + onDemandPrice + emrPrice));   $j('#spotSinglePrice5').html(formatCurrency(onDemandPrice + emrPrice + spotPrice + emrPrice)); } function formatCurrency(num) { num = isNaN(num) || num === '' || num === null ? 0.00 : num;  return "$" +   parseFloat(num).toFixed(2); }
// ]]&gt;</script></p>
<p>It is possible to run the master node as a Spot instance instead of an On Demand instance.  Amazon recommends running the master node as an On Demand instance so that a market price change cannot take out your master node and stop the entire cluster.</p>
<p>For these tests, I varied the master node instance type.  Table 1.3 lists the master instance type I used with each core instance type.</p>
<h5 style="margin-bottom: 5px; text-align: center;">Table 1.3 Core Instance Group Type Used With Master Group Type Test</h5>
<div dir="ltr">
<table class="aligncenter" style="margin-bottom: 20px;" border="0" cellspacing="0" cellpadding="3">
<colgroup>
<col width="*" />
<col width="*" /></colgroup>
<tbody>
<tr>
<td style="background-color: #a3b8c9; border-width: 1px; border-color: #80807f; border-style: solid;"><strong>Core Instance Group Type </strong></td>
<td style="background-color: #a3b8c9; border-width: 1px; border-color: #80807f; border-style: solid;"><strong>Master Instance Group Type </strong></td>
</tr>
<tr>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">Small</td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">Small</td>
</tr>
<tr>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">Large</td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">Large</td>
</tr>
<tr>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">Extra Large</td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">Extra Large</td>
</tr>
<tr>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">Hi-CPU Medium</td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">Hi-CPU Medium</td>
</tr>
<tr>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">Hi-CPU Large</td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">Hi-CPU Medium</td>
</tr>
<tr>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">Quadruple Extra Large</td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">Hi-CPU Medium</td>
</tr>
</tbody>
</table>
<h5 style="margin-bottom: 5px; text-align: center;">Chart 1.4 Cost Per 100,000,000 Character Groups Checked</h5>
</div>
<h5><a href="http://www.jesse-anderson.com/wp-content/uploads/2012/02/baseline_pricepergroup.png"><img class="aligncenter size-medium wp-image-159 colorbox-157" style="padding: 7px;" title="baseline_pricepergroup" src="http://www.jesse-anderson.com/wp-content/uploads/2012/02/baseline_pricepergroup-300x150.png" alt="" width="300" height="150" /></a></h5>
<p>Breaking down the data into price per unit gives insight into the most cost efficient means of running a job.  In Chart 1.4, I break down the cost by how much it costs to process 100,000,000 character groups.  For Chart 1.4, the lower the number the better.  This bore out my hunch that the best bang for the monkey buck is a Hi-CPU Medium instance.  I was surprised that the Small instance didn’t come in second best; that position was taken by the Large instance.</p>
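<p>The price-per-unit metric can be sketched as below.  The function name <code>costPer100MGroups</code> and the sample inputs are hypothetical, and the sketch assumes the cost is the hourly price multiplied by the run length; which hourly prices feed into Chart 1.4 is not spelled out here, so treat this as an illustration of the formula only.</p>

```javascript
// Cost to check 100,000,000 character groups, given an hourly price,
// the run length in hours and the number of groups checked in that time.
function costPer100MGroups(hourlyCost, hours, groupsChecked) {
  const totalCost = hourlyCost * hours;
  return (totalCost / groupsChecked) * 1e8;
}

// Hypothetical inputs: $0.29 per hour for 5 hours, 40 billion groups checked.
const unitCost = costPer100MGroups(0.29, 5, 40e9);
console.log("$" + unitCost.toFixed(5));
```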
<p>Once again, we can see the cost benefits of using a Spot instance.  Across the board, the Spot instances have a much smaller variance than their On Demand counterparts.  The Spot instances ranged from $0.00128 to $0.00497 per 100,000,000 character groups and the On Demand instances ranged from $0.00364 to $0.0142.</p>
<h2 style="margin-top: 10px; margin-bottom: 5px;" dir="ltr">Scalability Testing</h2>
<p>The instance testing above led up to the next phase of the project.  In Chart 1.4, we found that the Hi-CPU Medium instances provided the highest cost efficiency per character group.  Now, I will take the most cost efficient instance and see how well it scales by adding more nodes to the cluster.  For these tests, I created EMR clusters of 1, 2, 3, 4, 5, 10 and 20 nodes.  Once again, I ran each cluster size for 5 hours and captured the results.</p>
<h5 style="margin-bottom: 5px; text-align: center;">Chart 2.1 Spot Instance Savings Compared to On Demand Prices<br />
<a href="http://www.jesse-anderson.com/wp-content/uploads/2012/02/scalability_costpergrouppercent.png"><img class="aligncenter size-medium wp-image-163 colorbox-157" style="padding: 7px;" title="scalability_costpergrouppercent" src="http://www.jesse-anderson.com/wp-content/uploads/2012/02/scalability_costpergrouppercent-300x100.png" alt="" width="300" height="100" /></a></h5>
<p>In Chart 2.1, I show the cost savings by comparing Spot and On Demand prices across cluster sizes.  The bars with the “All” designations show the entire cost roll up of the cluster size.  The core cost is consistent across all node sizes; however, having more nodes running at once increased the savings.</p>
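<p>One way to see why the whole-cluster savings grow with the node count is that the On Demand master node is a fixed cost that gets diluted as core nodes are added.  The sketch below assumes that cost model; <code>spotSavings</code> is a hypothetical helper, and the prices are the default inputs of the Table 2.1 calculator ($0.17 On Demand, $0.08 Spot, $0.03 EMR), not the measured data behind Chart 2.1.</p>

```javascript
// Fraction saved on the whole cluster when only the core group runs as Spot.
// The master node stays On Demand, so it is the same cost in both totals.
function spotSavings(onDemandPrice, spotPrice, emrPrice, coreNodes) {
  const master = onDemandPrice + emrPrice;
  const onDemandTotal = master + (onDemandPrice + emrPrice) * coreNodes;
  const spotTotal = master + (spotPrice + emrPrice) * coreNodes;
  return 1 - spotTotal / onDemandTotal;
}

[1, 5, 20].forEach(function (nodes) {
  const percent = spotSavings(0.17, 0.08, 0.03, nodes) * 100;
  console.log(nodes + " core node(s): " + percent.toFixed(1) + "% saved");
});
```

<p>With these assumed prices, the whole-cluster savings rise from roughly 22% with 1 core node to roughly 43% with 20, even though the per-node spot discount never changes.</p>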
<h5 style="margin-bottom: 5px; text-align: center;">Chart 2.2 Cost Per Hour When Running Various Numbers of Nodes<br />
<a href="http://www.jesse-anderson.com/wp-content/uploads/2012/02/scalability_priceperhourabsolute.png"><img class="aligncenter size-medium wp-image-166 colorbox-157" style="padding: 7px;" title="scalability_priceperhourabsolute" src="http://www.jesse-anderson.com/wp-content/uploads/2012/02/scalability_priceperhourabsolute-300x150.png" alt="" width="300" height="150" /></a></h5>
<p>Chart 2.2 shows another cost breakdown by hour of usage and total costs for Spot and On Demand instances.</p>
<p>To help illustrate the total pricing with a multi instance core group, Table 2.1 details the breakdown of total price per hour for Spot and On Demand instances for a 10 node cluster.</p>
<h5 style="margin-bottom: 5px; text-align: center;" dir="ltr">Table 2.1 Spot and On Demand Instance Price Calculation</h5>
<div style="text-align: left;" dir="ltr">
<table border="0" cellspacing="0" cellpadding="3">
<colgroup>
<col width="*" />
<col width="*" />
<col width="*" /></colgroup>
<tbody>
<tr>
<td style="background-color: #a3b8c9; border-width: 1px; border-color: #80807f; border-style: solid;"><strong>Price Description </strong></td>
<td style="background-color: #a3b8c9; border-width: 1px; border-color: #80807f; border-style: solid;"><strong>Spot Instance Price </strong></td>
<td style="background-color: #a3b8c9; border-width: 1px; border-color: #80807f; border-style: solid;"><strong>On Demand Price </strong></td>
</tr>
<tr>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">Master Node</td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;"><span id="spotClusterPrice1">0</span></td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;"><span id="onDemandClusterPrice1">0</span></td>
</tr>
<tr>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">Master Node EMR</td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;"><span id="spotClusterPrice2">0</span></td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;"><span id="onDemandClusterPrice2">0</span></td>
</tr>
<tr>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">Core Node</td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;"><span id="spotClusterPrice3">0</span></td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;"><span id="onDemandClusterPrice3">0</span></td>
</tr>
<tr>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">Core Node EMR</td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;"><span id="spotClusterPrice4">0</span></td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;"><span id="onDemandClusterPrice4">0</span></td>
</tr>
<tr>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">Total Price Per Hour</td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;"><span id="spotClusterPrice5">0</span></td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;"><span id="onDemandClusterPrice5">0</span></td>
</tr>
</tbody>
</table>
</div>
<div dir="ltr"></div>
<table style="margin-top: 20px;" border="0" cellspacing="0" cellpadding="3">
<tbody>
<tr>
<td style="background-color: #a3b8c9; text-align: center; border-width: 1px; border-color: #80807f; border-style: solid;"><strong>Spot Instance Price Per Hour</strong></td>
<td style="background-color: #a3b8c9; border-width: 1px; border-color: #80807f; border-style: solid;"><strong>On Demand Instance Price Per Hour</strong></td>
<td style="background-color: #a3b8c9; border-width: 1px; border-color: #80807f; border-style: solid;"><strong>Number Of Nodes</strong></td>
</tr>
<tr>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">
<input id="spotCluster1" type="text" value="0.08" /></td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">
<input id="onDemandCluster1" type="text" value="0.17" /></td>
<td style="background-color: #f7fbfd; border-width: 1px; border-color: #80807f; border-style: solid;">
<input id="numberOfNodesCluster1" type="text" value="10" /></td>
</tr>
</tbody>
</table>
<p style="text-align: left;">
<input id="clusterCalculate" type="button" value="Recalculate" /> (Try out your own prices)</p>
<p><script type="text/javascript">// <![CDATA[
       var $j = jQuery.noConflict();  $j(function(){   $j("#clusterCalculate").click(function(){     calculateClusterPrice();   });   calculateClusterPrice(); }); function calculateClusterPrice() {   var emrPrice = 0.03;  var onDemandPrice = parseFloat($j('#onDemandCluster1').val());  var spotPrice = parseFloat($j('#spotCluster1').val());  var nodes = parseInt($j('#numberOfNodesCluster1').val(), 10);  $j('#onDemandClusterPrice1').html(formatCurrency(onDemandPrice));   $j('#spotClusterPrice1').html(formatCurrency(onDemandPrice));   $j('#onDemandClusterPrice2').html(formatCurrency(emrPrice));    $j('#spotClusterPrice2').html(formatCurrency(emrPrice));   $j('#onDemandClusterPrice3').html(formatCurrency(onDemandPrice * nodes) + " (" + nodes + " nodes * " + formatCurrency(onDemandPrice) + " On Demand Price)");   $j('#spotClusterPrice3').html(formatCurrency(spotPrice * nodes) + " (" + nodes + " nodes * " + formatCurrency(spotPrice) + " Spot Price)");    $j('#onDemandClusterPrice4').html(formatCurrency(emrPrice * nodes) + " (" + nodes + " nodes * " + formatCurrency(emrPrice) + ")");   $j('#spotClusterPrice4').html(formatCurrency(emrPrice * nodes) + " (" + nodes + " nodes * " + formatCurrency(emrPrice) + ")");   $j('#onDemandClusterPrice5').html(formatCurrency(onDemandPrice + emrPrice + (onDemandPrice * nodes) + (emrPrice * nodes)));   $j('#spotClusterPrice5').html(formatCurrency(onDemandPrice + emrPrice + (spotPrice * nodes) + (emrPrice * nodes))); } function formatCurrency(num) { num = isNaN(num) || num === '' || num === null ? 0.00 : num;  return "$" +   parseFloat(num).toFixed(2); }
// ]]&gt;</script><br />
As you can see, you can calculate the current cost of a cluster and even project its future cost.  A new company, <a href="http://www.cloudability.com/">Cloudability</a>, has made it its mission not only to simplify cluster, EMR and EC2 price reporting but also to look for ways to improve it (the service is now in beta).  Cloudability can even send you a daily or weekly email showing the charges for that period.  You can check out their website and sign up for a free account.  Although I was unable to use Cloudability for this project, I look forward to using it in my next projects.</p>
<h5 style="margin-bottom: 5px; text-align: center;">Chart 2.3 Cost To Run 100,000,000 Character Groups At Various Numbers of Nodes<br />
<a href="http://www.jesse-anderson.com/wp-content/uploads/2012/02/scalability_pricepergroup.png"><img class="aligncenter size-medium wp-image-165 colorbox-157" style="padding: 7px;" title="scalability_pricepergroup" src="http://www.jesse-anderson.com/wp-content/uploads/2012/02/scalability_pricepergroup-300x150.png" alt="" width="300" height="150" /></a></h5>
<p>In Chart 2.3, I break down the cost by how much it costs to process 100,000,000 character groups.  For Chart 2.3, the lower the number the better.  Once again, the Spot instance pricing shines.  In this case, the Spot instance price variations are quite flat, while the On Demand prices vary much more.</p>
<h5 style="margin-bottom: 5px; text-align: center;">Chart 2.4 Total Character Groups At Various Numbers of Nodes Pro-rated to 5 Hours</h5>
<p style="text-align: center;"><a href="http://www.jesse-anderson.com/wp-content/uploads/2012/02/scalability_totalgroups.png"><img class="aligncenter size-medium wp-image-167 colorbox-157" style="padding: 7px;" title="scalability_totalgroups" src="http://www.jesse-anderson.com/wp-content/uploads/2012/02/scalability_totalgroups-300x150.png" alt="" width="300" height="150" /></a><br />
Chart 2.4 shows the power of creating a multi-node cluster.  With 20 nodes in the cluster, 477,987,913,067 character groups can be run in a 5 hour period.</p>
<p>I want to reiterate that there are no code changes necessary for creating a large cluster like this.  I only needed to make EMR configuration changes when creating the cluster.  Also, cluster configuration changes can be made to a live or running cluster.  You can add or remove core instances to increase or decrease the performance of a cluster.</p>
<h5 style="margin-bottom: 5px; text-align: center;">Chart 2.5 Percent Of Linear Scalability From Actual Growth At Various Numbers of Nodes<br />
<a href="http://www.jesse-anderson.com/wp-content/uploads/2012/02/scalability_percent.png"><img class="aligncenter size-medium wp-image-164 colorbox-157" style="padding: 7px;" title="scalability_percent" src="http://www.jesse-anderson.com/wp-content/uploads/2012/02/scalability_percent-300x150.png" alt="" width="300" height="150" /></a></h5>
<p>Now let’s get into the scalability of EMR and Hadoop.  A 1 node cluster is assumed to be the most efficient possible EMR cluster, so Chart 2.5 shows it at 100% efficiency.  For every other cluster size, the linear (100%) baseline is the number of character groups for 1 node times the number of nodes in the cluster, and the efficiency is the actual number of character groups as a percentage of that baseline.  Clusters of 2-5 nodes have a very similar loss of efficiency at about 5%.  The 10 and 20 node clusters lose 13% and 16% respectively.</p>
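<p>The efficiency figure can be sketched as the ratio of actual throughput to a perfectly linear projection from the 1 node result.  The helper name <code>scalingEfficiency</code> and the round sample numbers are hypothetical; they are chosen only to show the shape of the calculation, not taken from the measured runs.</p>

```javascript
// Efficiency relative to perfect linear scaling from a 1 node baseline.
function scalingEfficiency(singleNodeGroups, nodes, actualGroups) {
  const linearGroups = singleNodeGroups * nodes; // 100% efficiency projection
  return actualGroups / linearGroups;
}

// Hypothetical round numbers: 1 node checks 1,000,000 groups, so 20 nodes
// would ideally check 20,000,000; actually checking 16,800,000 is 84%.
const efficiency = scalingEfficiency(1000000, 20, 16800000);
console.log((efficiency * 100).toFixed(0) + "% of linear");
```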
<p>Anyone who has created a distributed system will recognize 84% efficiency at 20 nodes as a phenomenal level of scalability.  This really shows that EMR and Hadoop are living up to the hype as revolutionary technologies.  With no code changes and simple configuration changes, you can easily scale an application.</p>
<h5 style="margin-bottom: 5px; text-align: center;">Chart 2.6 Actual Scalability With Projected Linear Growth Pro-rated to 5 Hours<br />
<a href="http://www.jesse-anderson.com/wp-content/uploads/2012/02/scalability_absolute.png"><img class="aligncenter size-medium wp-image-162 colorbox-157" style="padding: 7px;" title="scalability_absolute" src="http://www.jesse-anderson.com/wp-content/uploads/2012/02/scalability_absolute-300x150.png" alt="" width="300" height="150" /></a><a href="http://www.jesse-anderson.com/wp-content/uploads/2012/02/scalability_priceperhourabsolute.png"><br />
</a></h5>
<p>Chart 2.6 presents another breakdown of the scalability, showing the actual values next to the calculated values at 100% efficiency.  Once again, we see a very gradual decline for clusters of 2 to 5 nodes.  There is a much more obvious decline at 10 and 20 nodes.</p>
<h2 style="margin-top: 10px; margin-bottom: 5px;" dir="ltr">Million Monkeys On EMR</h2>
<p>In my original run of the Million Monkeys Project, I tried to use an EC2 Micro instance to run the project.  The project needed more RAM than was available on the Micro instance and I had to move it to my home computer.  Many reporters and commenters asked me how long the project would take if I ran it to completion on EMR.  This time, thanks to Amazon, I have the resources to run the project on a multi-node EMR cluster.</p>
<p>The instance testing and scalability testing really led up to this test.  In the instance testing, I wanted to find the EC2 instance type with the best bang for the buck.  Next, I took that best EC2 instance (Hi-CPU Medium) and wanted to see how much efficiency I was losing when running a 20 node cluster.  From there, I created a 20 node Hi-CPU Medium cluster that ran the Million Monkeys code for a prolonged period of time.  I wanted to see how long it would take a 20 node cluster to recreate the original project.</p>
<p>For a little perspective, the <a href="http://www.jesse-anderson.com/2011/09/a-few-million-monkeys-randomly-recreate-shakespeare/">original Million Monkeys project</a> recreated every work of Shakespeare after running 7.5 trillion character groups and ran for 46 days.  For these prolonged tests, I actually ran the 20 node cluster twice.  The first time ran 12 trillion character groups in 5 days 17 hours.  The second time ran 25.7 trillion character groups in 11 days 15 hours.  Each one ran about 2.2 trillion character groups per day.  Given the random nature of the problem, we can only extrapolate how long the original project would have taken.  With these performance numbers, it would have taken about 3 days 9 hours to complete the original project.</p>
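<p>The extrapolation is simple arithmetic, and it can be checked with a short Python sketch.  The helper names are made up for illustration; the inputs are the run totals and durations quoted above.</p>

```python
HOURS_PER_DAY = 24


def groups_per_day(total_groups, days, hours):
    """Average character groups per day over a run of days plus hours."""
    elapsed_days = days + hours / HOURS_PER_DAY
    return total_groups / elapsed_days


def days_and_hours(total_groups, rate_per_day):
    """Whole days and whole hours needed to process total_groups."""
    total_hours = total_groups / rate_per_day * HOURS_PER_DAY
    return divmod(int(total_hours), HOURS_PER_DAY)


first_run = groups_per_day(12.0e12, 5, 17)     # first 20 node run
second_run = groups_per_day(25.7e12, 11, 15)   # second 20 node run

# Time to reach the original project's 7.5 trillion character groups
# at the rate of the second run.
print(days_and_hours(7.5e12, second_run))
```

<p>At the rate of the second run, the result works out to 3 days and 9 hours, which matches the estimate above.</p>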
<p>The cluster cost about $45.44 per day to run.  I ran the cluster with the same configuration as in the 20 node scalability testing above, with the master instance group as one Hi-CPU Medium instance running On Demand.  The other 20 nodes were Hi-CPU Medium instances running with a Spot price of $0.09 per hour.  The 5 day run cost $317.96.  The 11 day run total cost was $528.25.  If I hadn’t used Spot instances, the 11 day total cost would have been $1,514.87.  Once again, Spot pricing really shines because I achieved the same goal with almost $1,000 in savings.</p>
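<p>As a rough check on the 11 day figures, here is a small Python sketch.  It assumes the daily cost stayed flat for the whole run, which Spot prices do not guarantee, and the function name is made up for illustration.</p>

```python
DAILY_COST = 45.44          # reported cost per day with Spot instances
SPOT_TOTAL = 528.25         # reported total for the 11 day run
ON_DEMAND_TOTAL = 1514.87   # reported total had Spot instances not been used


def run_cost(daily_cost, days, hours):
    """Cost of a run, ASSUMING a constant daily cost for the whole run."""
    return round(daily_cost * (days + hours / 24), 2)


estimated_total = run_cost(DAILY_COST, 11, 15)   # the run lasted 11 days 15 hours
savings = round(ON_DEMAND_TOTAL - SPOT_TOTAL, 2)
savings_percent = round(100 * savings / ON_DEMAND_TOTAL)

print(estimated_total, savings, savings_percent)
```

<p>The flat-rate estimate comes within a cent of the reported total, and the savings work out to $986.62, or about 65% of the On Demand cost.</p>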
<h2 style="margin-top: 10px; margin-bottom: 5px;" dir="ltr">Thoughts and Caveats</h2>
<p>Previously, I mentioned that the Million Monkeys code is a good metric of CPU and memory.  There is less I/O than might be run in other MapReduce tasks.  I spent some time and effort to reduce the amount of I/O in the code.  To reduce the amount of I/O, I used a Bloom Filter in the Map task.  The <a href="http://en.wikipedia.org/wiki/Bloom_filter">Bloom Filter</a> is created once and saved in S3.  All future Map tasks simply read the Bloom Filter file and run all processing against it.  Once the Reduce task runs, a 3.5 MB text file is loaded into memory for the final existence checks.  Depending on the MapReduce task, a Map task may need to read in gigabytes or even terabytes of data for processing.  Another key difference between <a href="http://aws.amazon.com/ec2/instance-types/">EC2 instances</a> is their I/O performance.  For MapReduce tasks that require high I/O performance, a High-CPU medium instance with moderate I/O performance may not have the best cost to performance ratio.</p>
<p>Earlier in the project, I used the AWS web user interface to create EMR jobs.  It was a bit of a pain to set up the command line interface’s (CLI) various keys.  Once I set up the CLI, it made the testing much easier and I wish I had used the CLI sooner.  It was much easier to repeat a job.  The EMR API can also be used to spawn your cluster programmatically.  Here is the command line that I used to spawn the job:</p>
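<p>To show why a Bloom Filter cuts down on I/O, here is a minimal Python sketch of the data structure.  It is not the project’s code: the class, its sizes, the SHA-256 based hashing and the sample character groups are all assumptions made for illustration.  The point is that a cheap in-memory check rejects almost every random character group, so only the rare possible matches go on to the exact check in the Reduce task.</p>

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        # One boolean per bit keeps the sketch simple (a real one packs bits).
        self.bits = [False] * size_bits

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with an index.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical character groups taken from a source text.
known_groups = BloomFilter()
for group in ("tobeornot", "tobethati"):
    known_groups.add(group)

# Only candidates that pass the filter need the exact existence check.
candidates = ["tobeornot", "qzxjvkwpy"]
survivors = [c for c in candidates if c in known_groups]
print(survivors)
```

<p>Anything the filter lets through is only a possible match, which is why a final exact check is still needed afterwards.</p>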
<pre>./elastic-mapreduce --create --name "Monkeys Scalability 5 Hour Test 20 Node" \
  --instance-group master --instance-type c1.medium --instance-count 1 \
  --instance-group core --instance-type c1.medium --instance-count 20 --bid-price 0.08 \
  --jar s3://monkeys2/monkeys.jar --arg timelimit=5h --arg iterationsize=1 \
  --arg memory=-Xmx1024m --arg -Dmapred.max.split.size=12000 \
  --arg -Dmapred.min.split.size=10000</pre>
<p>I would like to break down what this command is doing.  It is creating a new EMR cluster with the job name &#8220;Monkeys Scalability 5 Hour Test 20 Node.”  The master instance group will be made up of one High-CPU Medium instance.  The core instance group will be made up of 20 nodes with a bid price of $0.08 per hour per instance.  It will be running a custom jar located in S3 at s3://monkeys2/monkeys.jar.  The rest of the arguments are for the Million Monkeys code itself.</p>
<p>The performance could probably be improved with some tuning.  For the duration of this project, I did not spend time optimizing the jobs and used only default settings, except for the maximum Java heap memory (-Xmx).</p>
<p>Although I kept my bid prices in nice round cents, you can bid in fractions of a cent like $0.085.</p>
<p>For the curious, I used <a href="http://www.jfree.org/jfreechart/">JFreeChart</a> for the charts and graphs on this page with a customized color scheme.</p>
<h2 style="margin-top: 10px; margin-bottom: 5px;" dir="ltr">Problems</h2>
<p>Hadoop and EMR jobs are usually geared towards very large input files.  In the case of the Million Monkeys project, the input files are very small, usually a few KB.  This presented an issue when I started running the EMR cluster with multiple nodes.  When I compared the multi-node results to the single node results, there was barely any improvement in total character groups.  In some cases, a multi-node cluster did worse than a single node cluster.  After a LOT of Googling and guesses, I finally found a small input file workaround by specifying the max and min split sizes for the input file.  Here is the command so as to save a future person lots of Googling:</p>
<pre>./elastic-mapreduce --create --name "Monkeys Scalability 5 Hour Test 20 Node" \
  --instance-group master --instance-type c1.medium --instance-count 1 \
  --instance-group core --instance-type c1.medium --instance-count 20 --bid-price 0.08 \
  --jar s3://monkeys2/monkeys.jar --arg timelimit=5h --arg iterationsize=1 \
  --arg memory=-Xmx1024m --arg -Dmapred.max.split.size=12000 \
  --arg -Dmapred.min.split.size=10000</pre>
<p>Later on, the split workaround stopped working (I never figured out why).  I looked around for a better solution and found the <a href="http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html">NLineInputFormat</a> class.  Had I known about this class when I first wrote the code, I would have used it.  It is a much better fit for the type of input I am using for the Million Monkeys project.</p>
<p>When you are budgeting for your AWS project, make sure you bake in some time and money for running down issues.  You may run into problems with multiple nodes that did not happen on a single development computer.</p>
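<p>NLineInputFormat fits small input files because it hands each Map task a fixed number of lines instead of dividing the input by size in bytes.  The Python sketch below only illustrates that grouping; it is not Hadoop code, and the function name, the line contents and the line counts are made up.</p>

```python
def n_line_splits(lines, lines_per_split):
    """Group input lines into splits of at most lines_per_split lines."""
    return [lines[i:i + lines_per_split]
            for i in range(0, len(lines), lines_per_split)]


# Hypothetical small input file: 40 short lines, only a few KB in total.
input_lines = [f"seed-{n}" for n in range(40)]

# Two lines per split yields 20 splits, so every node in a
# 20 node cluster has a Map task to run.
splits = n_line_splits(input_lines, 2)
print(len(splits))
```

<p>With splitting by size, a file this small could end up as a single split, which would leave most of the cluster idle.</p>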
<h2 dir="ltr"><a name="conclusion"></a>Conclusion</h2>
<p>EMR is a great, cost-effective way to get an enterprise Hadoop cluster going.  It is also easier to get an enterprise Hadoop cluster up and running with EMR than with a do-it-yourself approach.  An EMR cluster solves many of the problems of creating an enterprise cluster, such as hardware specs, uptime and configuration.  Until you have dealt with the pain of redundancy and enterprise hardware requirements, you don’t know how much time and effort EC2 and S3 save.  With EMR, you simply have to start the cluster and all of these issues are taken care of.</p>
<p>I also showed how EMR and Hadoop make scaling easy.  You do not have to convince your boss to buy one or more $2,000 to $4,000 servers; you can simply add more EC2 instances to the core instance group or change the instance type to one with more ECUs.  This can be done on a temporary basis to accommodate higher usage or as a gradual increase in capacity.  Without changing the code, I was able to scale the cluster to 20 nodes.</p>
<p>EMR clusters can be run at Amazon Web Services’ various locations around the world.  AWS has 3 in the United States, 1 in Ireland, 1 in Singapore, 1 in Tokyo and 1 in Brazil.  Separate EMR clusters could be used in conjunction with geographic sharding or simply by choosing the nearest location to the client.</p>
<p>Spot instances also show great promise in further reducing the price per hour of an EMR cluster.  For the 20 node tests, I reduced total cost per hour from $2.20 to $1.30, a 41% decrease.  During one of the prolonged 20 node runs, I saved almost $1,000 by using Spot instances.  If you decide to use Spot instances, make sure your code can handle instances being taken away when the market price rises above your bid.</p>
<p>I think this project shows there is true substance to the hype and buzz around Hadoop and EMR.  Anyone who has created their own distributed system knows that achieving 84% efficiency is an impressive feat.  There are a great number of use cases that can make efficient use of Hadoop and EMR.  With Hadoop running on EMR, you can easily run a cost-efficient, enterprise-level cluster anywhere in the world.</p>
<p>Full Disclosure: Amazon supported this project with AWS credit.  I would like to thank Jeff Barr and Alan Mock from Amazon for their help in making this project possible.</p>
<p>Copyright © Jesse Anderson 2012.  All Rights Reserved.  All text, graphs and charts on this page are licensed under a <a href="http://creativecommons.org/licenses/by-sa/3.0/">Creative Commons Attribution-ShareAlike 3.0 Unported License</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jesse-anderson.com/2012/02/ec2-performance-spot-instance-roi-and-emr-scalability/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

