DuraCloud Release 0.7 : Distributed Compute Services
This page last changed on Oct 28, 2010 by bbranan.
Introduction

Because DuraCloud users may store very large numbers of files in the DuraCloud system, it is necessary to be able to run processing jobs over large data sets. This can be done in a variety of ways, but a primary distinction lies between services which take advantage of the capabilities of underlying cloud provider offerings, and services which require a DuraCloud-provided strategy for managing a distributed processing environment.

Cloud Provider offerings

Amazon

Amazon's Elastic MapReduce offering makes use of the Hadoop project available from Apache. Amazon's service manages the server cluster on which Hadoop is run, while users provide the code which performs the actual computation using the Map/Reduce algorithm. DuraCloud makes use of this capability in the Bulk Image Transformer, the Duplicate on Demand service, and the Bulk Bit Integrity Checker. A DuraCloud service which uses Amazon's Elastic MapReduce capability, such as the Bulk Image Transformer, is made up of three parts:
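To illustrate the Map/Reduce model these services rely on, below is a minimal sketch of a Hadoop map task along the lines of what a service such as the Bulk Bit Integrity Checker might run. The ChecksumMapper class and its details are hypothetical and are not taken from the DuraCloud codebase; it assumes each input line names a file reachable through the Hadoop FileSystem API, and it emits the file path paired with its MD5 checksum. A reducer could then simply collect the results or compare them against stored checksums.

```java
// Hypothetical sketch only -- not the actual DuraCloud service code.
// Each input line is assumed to contain the path of a file to check;
// the mapper emits (file path, MD5 checksum) pairs.
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ChecksumMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String filePath = value.toString().trim();
        if (filePath.isEmpty()) {
            return;
        }

        Configuration conf = context.getConfiguration();
        Path path = new Path(filePath);
        FileSystem fs = path.getFileSystem(conf);

        try (InputStream in = fs.open(path)) {
            // Stream the file through an MD5 digest rather than loading it into memory
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) > 0) {
                md5.update(buffer, 0, read);
            }
            context.write(new Text(filePath), new Text(toHex(md5.digest())));
        } catch (Exception e) {
            // Record the failure so the job can report on unreadable files
            context.write(new Text(filePath), new Text("ERROR: " + e.getMessage()));
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```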
Microsoft Azure

Microsoft's Dryad project, which is still in the research phase, is described as a very generic graph generation and processing engine with the potential to be very powerful. Dryad should be able to handle the Map/Reduce algorithm as well as many other types of processing algorithms, though with this added flexibility comes added complexity. As noted here, Dryad is not yet available on Azure, and so far there do not appear to be any published tests using Dryad. A good comparison of Hadoop and Dryad can be found here. According to this article, Microsoft may be considering the use of Hadoop within Azure.

DuraCloud Distributed Processing

More to come...