This page last changed on Oct 28, 2010 by bbranan.
Introduction
The DuraCloud application provides a set of services which can be deployed and used for a variety of purposes, primarily to process the content which has been loaded into DuraCloud storage. The following list of services describes how each service is expected to be used and the options available for tailoring the service to your needs.
Note that all services currently have a "Location" configuration option which is intended to allow for the deployment of services at varying locations. At the moment, however, services can only be deployed on the primary service instance. As this configuration option is consistent across all services it will not be included in the listing for each service.
Duplicate on Upload
Description:
The Duplicate on Upload service provides a way to ensure that the content added to DuraCloud is stored with at least two storage providers. The Duplicate on Upload service performs on-ingest duplication of content. This means that once the Duplicate on Upload service is deployed, it watches for all content that is added to your DuraCloud account, determines if it should be copied to another DuraCloud store, and if so, performs the copy. All content that is copied will be placed in an identically named space in the secondary storage location.
Configuration Options:
- Duplicate from this store: The primary storage location which DuraCloud will monitor for file additions. When files are added to this store, they will be copied to the secondary store.
- Duplication to this store: The secondary store where content will be copied after it has been added to the primary store.
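The on-ingest behavior described above can be illustrated with a minimal sketch. This is not DuraCloud code: the `Store` and `DuplicateOnUpload` classes and the `on_content_added` callback are hypothetical names, and the stores are plain in-memory dictionaries. It only shows the decision (was the content added to the monitored store?) and the copy into an identically named space.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical in-memory stand-in for a DuraCloud storage provider."""
    name: str
    spaces: dict = field(default_factory=dict)  # space name -> {content id: bytes}

    def add(self, space, content_id, data):
        # Create the space on first use, then store the item.
        self.spaces.setdefault(space, {})[content_id] = data


class DuplicateOnUpload:
    """Copies content added to the primary store into the secondary store."""

    def __init__(self, from_store, to_store):
        self.from_store = from_store  # "Duplicate from this store"
        self.to_store = to_store      # the secondary store

    def on_content_added(self, store, space, content_id, data):
        # Only additions to the monitored (primary) store are duplicated.
        if store is not self.from_store:
            return False
        # The copy lands in an identically named space in the secondary store.
        self.to_store.add(space, content_id, data)
        return True


primary = Store("primary")
secondary = Store("secondary")
service = DuplicateOnUpload(primary, secondary)

# In a real deployment the service would be notified of additions;
# here the notification is made by hand (an assumption of this sketch).
primary.add("photos", "cat.jpg", b"...")
service.on_content_added(primary, "photos", "cat.jpg", b"...")
```

After the call, `secondary.spaces` holds a `photos` space containing `cat.jpg`, mirroring the primary store.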
Duplicate on Demand
Description:
The Duplicate on Demand service provides a simple way to duplicate content from one space to another. This service is primarily focused on allowing the duplication of content from the primary storage provider to a secondary provider. To begin, a source space is chosen, along with a store and space to which content will be duplicated. The service then copies all content and metadata in the source space to the duplication space, creating the space if necessary. When the service has completed its work, a results file will be stored in the chosen results space, and a set of files (primarily logs) created as part of the process will be stored in the working space.
Configuration Options:
- Source Space: DuraCloud space where source files can be found
- Replicate to this store: DuraCloud store to which content will be copied
- Replicate to this space: DuraCloud space where content will be copied
- Store results file in this space: DuraCloud space (on the primary store) where results file will be placed
- Working Space: DuraCloud space used to store processing information necessary to run the job as well as log files generated as part of the job flow
- Number of Server Instances: The number of servers to use to perform the replication task.
- Type of Server: The type (size) of server used to perform the task. The larger the server, the faster the processing will occur. Larger servers also cost more than smaller servers to run. For more information, see the Amazon EC2 documentation.
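As a rough illustration of the copy step described above, the sketch below duplicates every item and its metadata from a source space to a destination space, creating the destination if needed, and collects one result line per item. The function name, the dictionary-based stores, and the tab-separated result format are all assumptions made for this example; the actual results file format is not specified here.

```python
def duplicate_on_demand(source, target, source_space, target_space):
    """Copy every item and its metadata; return one result line per item.

    `source` and `target` are hypothetical stores shaped as
    {space name: {content id: (data, metadata dict)}}.
    """
    items = source[source_space]
    # Create the destination space if it does not exist yet.
    destination = target.setdefault(target_space, {})
    results = []
    for content_id, (data, metadata) in sorted(items.items()):
        # Copy the metadata dict so the two stores do not share state.
        destination[content_id] = (data, dict(metadata))
        results.append(f"{content_id}\tcopied")
    return results


primary = {"reports": {"a.txt": (b"alpha", {"mimetype": "text/plain"})}}
secondary = {}
results = duplicate_on_demand(primary, secondary, "reports", "reports-copy")
```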
Image Server
Description:
The Image Server provides a viewer for image files through use of the Djatoka image server. While this service is geared towards serving JPEG 2000 images, it supports multiple image file types by converting them to JPEG 2000 format on the fly.
Note that the current implementation of this service requires that spaces be set to OPEN in order to use the viewer to view image files.
Configuration Options:
None
Media Streamer
Description:
The Media Streamer provides streaming capabilities for video and audio files. The service takes advantage of Amazon Cloudfront streaming, so files to be streamed must be within a space on an Amazon provider. Also, all media to be streamed by this service needs to be within a single space.
Amazon Cloudfront streaming uses the Flash Media Server to host streaming files over RTMP. File formats supported include MP3, MP4 and FLV among others. For a full listing of supported file types see the Flash Media Server documentation.
Configuration Options:
- Source Media Space: The DuraCloud space where the source video and audio files to be streamed are stored. The Media Streamer service attempts to stream all files in this space.
- Viewer Space: A DuraCloud space where example viewer files will be stored. After the service has started, this space will include a playlist including all items in the source media space as well as example html and javascript files which can be used to display a viewer.
Bit Integrity Checker
Description:
The Bit Integrity Checker provides the ability to verify that the content held within DuraCloud has maintained its bit integrity. There are five modes of operation.
Modes:
- Verify the bit integrity of a list of items
- Verify the bit integrity of an entire space
- Generate bit integrity information for a list of items
- Generate bit integrity information for an entire space
- Compare two different bit integrity reports
Configuration Options:
- Mode 1
  - "Get integrity information from..." : defines the source of the MD5: pre-stored metadata, regenerated, regenerated with salt
  - "Salt" : any character string that will be appended to the item stream during the MD5 regeneration
  - "Space with input listing" : space holding the list of items over which to run the service
  - "Input listing name" : item name of list of items over which to run the service
  - "Output space" : destination space of service outputs
  - "Output listing name" : destination item name of MD5s listing
  - "Output report name" : destination item name of fixity report
  - "Store" : underlying storage provider over which service will run
- Mode 2
  - "Get integrity information from..." : defines the source of the MD5: pre-stored metadata, regenerated, regenerated with salt
  - "Salt" : any character string that will be appended to the item stream during the MD5 regeneration
  - "Space with input listing" : space holding the list of items over which to run the service
  - "Space containing content items" : source space of items over which to run the service
  - "Input listing name" : item name of list of items over which to run the service
  - "Output space" : destination space of service outputs
  - "Output listing name" : destination item name of MD5s listing
  - "Output report name" : destination item name of fixity report
  - "Store" : underlying storage provider over which service will run
- Mode 3
  - "Get integrity information from..." : defines the source of the MD5: pre-stored metadata, regenerated, regenerated with salt
  - "Salt" : any character string that will be appended to the item stream during the MD5 regeneration
  - "Space with input listing" : space holding the list of items over which to run the service
  - "Input listing name" : item name of list of items over which to run the service
  - "Output space" : destination space of service outputs
  - "Output listing name" : destination item name of MD5s listing
  - "Store" : underlying storage provider over which service will run
- Mode 4
  - "Get integrity information from..." : defines the source of the MD5: pre-stored metadata, regenerated, regenerated with salt
  - "Salt" : any character string that will be appended to the item stream during the MD5 regeneration
  - "Space containing content items" : source space of items over which to run the service
  - "Output space" : destination space of service outputs
  - "Output listing name" : destination item name of MD5s listing
  - "Store" : underlying storage provider over which service will run
- Mode 5
  - "Space with input listing" : space holding first list of MD5s
  - "Space with second input listing" : space holding second list of MD5s
  - "Input listing name" : item name of first list of MD5s
  - "Second input listing name" : item name of second list of MD5s
  - "Output space" : destination space of service outputs
  - "Output report name" : destination item name of fixity report
  - "Store" : underlying storage provider over which service will run
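To make the "Salt" option and the comparison mode more concrete, here is a minimal sketch using Python's standard `hashlib`. It regenerates an MD5 over an item stream with the salt appended after the item's bytes, and compares two MD5 listings item by item. The function names, the UTF-8 encoding of the salt, the dictionary listing format, and the status labels (`VALID`, `MISMATCH`, `MISSING`) are assumptions for illustration, not the service's actual formats.

```python
import hashlib
import io


def md5_of_stream(stream, salt="", chunk_size=8192):
    """Return the hex MD5 of a binary stream, with an optional salt appended."""
    digest = hashlib.md5()
    # Read in chunks so large items are never held in memory at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    # The salt is appended after the item's own bytes (UTF-8 is an assumption).
    digest.update(salt.encode("utf-8"))
    return digest.hexdigest()


def compare_listings(first, second):
    """Compare two {item name: md5} listings; return {item name: status}."""
    report = {}
    for name in sorted(set(first) | set(second)):
        if name not in first or name not in second:
            report[name] = "MISSING"
        elif first[name] == second[name]:
            report[name] = "VALID"
        else:
            report[name] = "MISMATCH"
    return report


stored = {"a.txt": md5_of_stream(io.BytesIO(b"hello"))}
regenerated = {
    "a.txt": md5_of_stream(io.BytesIO(b"hello")),
    "b.txt": md5_of_stream(io.BytesIO(b"world")),
}
report = compare_listings(stored, regenerated)
```

Here `a.txt` appears in both listings with the same MD5, while `b.txt` appears in only one of them, so the report marks the first as valid and the second as missing.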
Bulk Bit Integrity Checker
Description:
The Bulk Bit Integrity Checker provides a simple way to determine checksums (MD5s) for all content items in any particular space by leveraging an Amazon Hadoop cluster. This service is designed for large datasets (more than 10GB).
Configuration Options:
- Source Space: DuraCloud space where source files are stored
- Destination Space: DuraCloud space where report file will be placed
- Working Space: DuraCloud space used to store processing information necessary to run the job as well as log files generated as part of the job flow
- Number of Server Instances: The number of servers to use to perform the MD5 generation task.
- Type of Server: The type (size) of server used to perform the task. The larger the server, the faster the processing will occur. Larger servers also cost more than smaller servers to run. For more information, see the Amazon EC2 documentation.
Image Transformer
Description:
The Image Transformer provides a simple way to transform relatively small numbers of image files from one format to another.
Note that the ImageMagick service must be deployed prior to using the Image Transformer.
Configuration Options:
- Source Space: DuraCloud space where source image files are stored
- Destination Space: DuraCloud space where transformed image files will be placed, along with a file which details the results of the conversion process
- Destination Format: The image format to which the source files will be transformed
- Destination Color Space: The colorspace of the transformed files, either "Source Image Color Space", meaning that the colorspace of the original image will be used, or sRGB, meaning that the colorspace will be transformed to sRGB.
- Source file name prefix: Only files beginning with the value provided here will be transformed. For example, if you enter ABC, only files whose names begin with the string ABC will be processed. This field is optional.
- Source file name suffix: Only files ending with the value provided here will be transformed. For example, if you enter .jpg, only files whose names end with the string .jpg will be processed. This field is optional.
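The prefix and suffix options amount to a simple name filter, which can be sketched as follows. The function name is hypothetical, and the sketch assumes that matching is case-sensitive and that an empty value means "no restriction" (consistent with both fields being optional).

```python
def select_files(names, prefix="", suffix=""):
    """Return the names that start with `prefix` and end with `suffix`.

    Empty strings match every name, so both filters are optional.
    """
    return [n for n in names if n.startswith(prefix) and n.endswith(suffix)]


names = ["ABC-001.jpg", "ABC-002.png", "XYZ-001.jpg"]
selected = select_files(names, prefix="ABC", suffix=".jpg")
```

With both filters set, only `ABC-001.jpg` is selected; with neither set, every file in the source space would be processed.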
Bulk Image Transformer
Description:
The Bulk Image Transformer provides a simple way to transform image files from one format to another in bulk. This service uses Amazon's Elastic Map Reduce capability to run the image transformation task within a Hadoop cluster.
Configuration Options:
- Source Space: DuraCloud space where source image files are stored
- Destination Space: DuraCloud space where transformed image files will be placed, along with a file which details the results of the transformation process
- Working Space: DuraCloud space used to store processing information necessary to run the job as well as log files generated as part of the job flow
- Destination Format: The image format to which the source files will be transformed
- Destination Color Space: The colorspace of the transformed files, either "Source Image Color Space", meaning that the colorspace of the original image will be used, or sRGB, meaning that the colorspace will be transformed to sRGB.
- Source file name prefix: Only files beginning with the value provided here will be transformed. For example, if you enter ABC, only files whose names begin with the string ABC will be processed. This field is optional.
- Source file name suffix: Only files ending with the value provided here will be transformed. For example, if you enter .jpg, only files whose names end with the string .jpg will be processed. This field is optional.
- Number of Server Instances: The number of servers to use to perform the image transformation task.
- Type of Server: The type (size) of server used to perform the task. The larger the server, the faster the processing will occur. Larger servers also cost more than smaller servers to run. For more information, see the Amazon EC2 documentation.
Note that issues were discovered during testing of the Bulk Image Transformer. If you choose to run this service, it is recommended that the size of the images used be kept under 100MB. The likelihood of success appears to increase with server size, and setting the number of servers to 3 or more is recommended. If you do run this service, please note the data set and configuration and make us aware of the outcome.
System Transformer Utility
Description:
The System Transformer Utility deploys the ImageMagick application on a DuraCloud service instance, which allows other services to take advantage of its features. The Image Transformer requires that this service be deployed in order to operate correctly.
Configuration Options:
None
System WebApp Utility
Description:
The System WebApp Utility coordinates the installation, de-installation, startup and shutdown of Apache Tomcat servers on a DuraCloud service instance. These Tomcat servers are created to allow other DuraCloud services to deploy web applications. The Image Server requires that this service be deployed in order to operate correctly.
Configuration Options:
None
Comments on service names:
- Replication Service - change to: Replication on Ingest
- JPEG 2000 Image Viewer Service - change to: Image viewer
- Media Streaming Service - change to: media streaming
- Fixity Service - change to: Integrity checking
- Image Conversion Service - change to: file conversion
- Bulk Image Conversion Service - change to: Bulk file conversion
- ImageMagick Service - needs rework
- Web App Utility Service - can we drop this totally and run in the background
- Amazon Fixity Service - change to: bulk integrity checking
- Replication On Demand Service - Change to: Replication on demand

Posted by mbkimpton at Oct 22, 2010 14:41
|
I agree with Michele almost entirely. My only issue is renaming the image conversion service to just file conversion. That would imply that you could also convert other file types, such as .doc, .pdf, .spss, etc. and this is currently not the case. Do we intend to offer a more general file conversion service at some point and/or incorporate it into the current conversion service offering?
Also, is it possible for the following two services to be started on instance start-up/initialization? Since most/all of the other services rely on them, we don't want to give the pilots the ability to accidentally undeploy one of these when they have other dependent services running.
- ImageMagick Service (If this service is only tied to one other service (I believe image viewing), then can it be coupled to deploy when this other service is deployed?)
- Web App Utility Service
...as a side note, I'm not fond of the word "ingest." Does replication on upload make more sense? Is it technically accurate?

Posted by csmith at Oct 22, 2010 15:56
|
In response to the suggestion of renaming "Amazon Fixity Service" to "Bulk Integrity Checking", actually the service only works over Amazon-hosted content. To me, it makes it much clearer what to expect of the service by having the name "Amazon" included. Even though I created the service itself, I know that if it did not include "Amazon" in its name I would come at it expecting to be able to verify my Rackspace content.

Posted by awoods at Oct 22, 2010 16:20
|
None of the current services work on another provider, is that true? So I think we need to discuss how to make this clear to the users, that the services only work on Amazon-hosted content.

Posted by csmith at Oct 22, 2010 18:40
|
I'm coming at this thinking that the names of the services should reflect their capability today, rather than what we hope they will do in the future. If the capability of a service changes, we can rename it. Some thoughts based on this:
- It makes sense to use the name "Replication on Ingest" for the current Replication Service because it only works on ingest right now. We'll likely need to change the name once the service can also make changes based on updates and deletes. Since Carissa notes not particularly liking the term ingest (and I'm not a particular fan of "upload"), perhaps "Replication on Add" would work. This would allow us to change it to "Replication on Add/Update/Delete" in the future.
- Andrew's comments about the Amazon Fixity Service bring out that it's important to indicate a limitation of that service to a single provider. Perhaps "Bulk Integrity Checking for Amazon" would work here?
- The image conversion services can currently handle only images and no other file types, so those services should retain the word "image" rather than "file".
- We are using the term "replication" for two services with different purposes. The Replication On Demand service does no more than copy files from one place to another, while the Replication Service provides an on-going update service, which I feel is more compatible with the term replication. Perhaps Replication On Demand should instead be "Copy On Demand"?
I agree that the ImageMagick and Web App Utility services should be deployed automatically, so we wouldn't need a name at all because they would not show up on the services list. This will not be the case for the 0.7 release, though, so we will need to name these. Perhaps ImageMagick could be "Image Conversion Utility" and Web App Utility could be "Image Viewer Utility", just to make the dependencies more clear in the meantime?
So, to summarize:
Old Name | New Name
Replication Service | Replication on Add
JPEG 2000 Image Viewer Service | Image Viewer
Media Streaming Service | Media Streaming
Fixity Service | Integrity Checking
Image Conversion Service | Image Conversion
Bulk Image Conversion Service | Bulk Image Conversion
Amazon Fixity Service | Bulk Integrity Checking for Amazon
Replication On Demand Service | Copy On Demand
ImageMagick Service | Image Conversion Utility
Web App Utility Service | Image Viewer Utility

Posted by bbranan at Oct 22, 2010 20:20
|
A couple of the services do work across providers, namely the Replication service and the Image Viewer service. You're right that the others require user content to be at Amazon, though. You raise a good question about how we should express the limitations of the current set of services.

Posted by bbranan at Oct 22, 2010 20:36
|
"Name that service" feedback: Tried to come up with jargon-less names that could be understood w/o knowing anything about repositories/preservation/archiving:
- Replication Service - Content Copier
- JPEG 2000 Image Viewer Service - Portfolio Browse
- Media Streaming Service - (agree w Michele) Media Streaming
- Fixity Service - Digital Bit Checker
- Image Conversion Service - Format Changer
- Bulk Image Conversion Service - change to: Bulk Format Changer
- ImageMagick Service - (don't understand what this is) needs rework
- Web App Utility Service - (agree w Michele) can we drop this totally and run in the background
- Amazon Fixity Service - Bulk Digital Bit Checker
- Replication On Demand Service - Content Copier on Demand

Posted by carolmintonmorris at Oct 25, 2010 13:11
|
The marcom has collectively come up with alternative names for the services. This was done by reviewing everyone's suggestions and also reflecting on how a broader audience might interpret these names. Our final suggestions are posted:
- Replication Service - Multi copier on upload
- JPEG 2000 Image Viewer Service - Image Viewer
- Media Streaming Service - Media Streamer
- Fixity Service - Bit Integrity Checker
- Image Conversion Service - Image Transformer
- Bulk Image Conversion Service - change to: Bulk Image transformer
- ImageMagick Service - goes away
- Web App Utility Service - goes away
- Amazon Fixity Service - Bulk bit integrity checker (if it does not run on other providers we should grey out this service on those providers; we all concurred not to have "amazon" as part of the name)
- Replication On Demand Service - Multi copier on Demand

Posted by mbkimpton at Oct 25, 2010 16:38
|
Echoing a remark from Bill, I'm not sure about "Multi-copier". Maybe "Duplicate"?
Also, I'm a little unsure about "bulk" - what sense are we trying to convey? "Optimized"? "Fast"? "High Volume"? "Retroactive"? Or maybe we should be prefixing the non-bulk versions: "Ingest bit integrity checker".

Posted by bradm at Oct 26, 2010 13:23
|
Final Names
Old Name | New Name
Amazon Fixity Service | Bulk Bit Integrity Checker
Bulk Image Conversion Service | Bulk Image Transformer
Fixity Service | Bit Integrity Checker
Image Conversion Service | Image Transformer
ImageMagick Service | System Transformer Utility
JPEG 2000 Image Viewer Service | Image Server
Media Streaming Service | Media Streamer
Replication On Demand Service | Duplicate on Demand
Replication Service | Duplicate on Upload
Web App Utility Service | System WebApp Utility

Posted by bbranan at Oct 26, 2010 21:52
|