Hierarchical Archive and the Job Hub

Data Lifecycle Management in Zoom

From 7.3 onwards, Zoom offers a more sophisticated data lifecycle management (DLM) experience. On the data ingestion side, the more tightly integrated VideoFX and VideoLX modules are now much easier to set up and pack powerful new features such as support for image sequences. Post ingestion, search is the most important and most frequently used function, and Zoom 7.3 brings significant improvements to Search as well: easier deployment, major performance gains, and key bug fixes.

From Zoom 7.3 onwards, assets ingested in Zoom can be managed and stored at various hierarchical levels, or tiers. With the introduction of this overarching archive mechanism, including support for cloud storage, Zoom now boasts a comprehensive array of end-of-lifecycle options as well. All of these features are available out of the box, without having to depend on complex integrations using Perl or other scripts.

Hierarchical Archive

Zoom inherently has two distinct types of assets. The first type covers all assets that are directly ingested into the repository and reside there (direct-ingest assets). The second type is the set of assets that are indexed: only their proxies are ingested into the repository, while the actual assets remain on an external storage location and are referenced from there seamlessly by the Zoom applications (external assets).
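To make the distinction concrete, here is a minimal illustrative model in Python. It is a sketch of the concept only, not Zoom's internal schema; the field names and paths are assumptions.

    # Illustrative model of the two asset types; not Zoom's internal schema.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Asset:
        name: str
        data_in_repository: bool        # True for direct-ingest assets
        external_path: Optional[str]    # set only for external (indexed) assets

    # A direct-ingest asset: the asset data itself lives in the Zoom database.
    direct = Asset("promo.mov", data_in_repository=True, external_path=None)

    # An external asset: only the proxy is in the repository; the original
    # stays on external storage and is referenced from there.
    external = Asset("raw_footage.mxf", data_in_repository=False,
                     external_path="/mnt/psan1/raw_footage.mxf")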

For a while now, Zoom has supported archiving direct-ingest assets from the server database (typically a fast SAN) to a different disk location (typically a slower, less expensive NAS). In 7.3, Zoom introduces support for an extended archive service based on hierarchical archiving. The new changes allow users to move data to different storage locations (including cloud) through the familiar Asset Browser interface, while offering several flexible options regarding the selection of asset type (direct or proxy) and asset resolution (if proxy, whether high-res or mid-res/mezzanine). Thus a comprehensive and seamless archive experience is available out of the box for both direct and external assets.

To accomplish this, Zoom uses a new custom service called the Job Hub.

The Job Hub

The Evolphin Job Hub, or the Hub, is a central framework for executing a variety of jobs. It supports a REST-based interface for job submission as well as tracking. The hub is capable of moving data across different realms (such as SAN, NAS, tape, and cloud). It also comes with built-in fail-safe mechanisms, including implicit integrity checks and retries to handle network and other interruptions. In addition, the hub provides a basic tracker dashboard for administrators that shows real-time progress updates and offers analytics to understand and interpret trends.
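Because job submission and tracking happen over REST, an integration script can drive the hub over HTTP. The Python sketch below is purely illustrative: the hub address, endpoint paths (/jobs, /jobs/{id}), payload fields, and status values are all assumptions, not Zoom's documented API.

    # Hypothetical sketch of submitting and tracking a Job Hub job over REST.
    # Endpoint paths, payload fields, and status values are assumptions.
    import time
    import requests

    HUB_URL = "http://hub.example.com:8080"  # assumed hub address

    def submit_archive_job(asset_paths, target_tier="tier-2"):
        """Submit an archive job and return its job ID (hypothetical payload)."""
        resp = requests.post(
            f"{HUB_URL}/jobs",
            json={"type": "archive", "assets": asset_paths, "target": target_tier},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["jobId"]

    def wait_for_job(job_id, poll_seconds=10):
        """Poll the job until it reaches a terminal state."""
        while True:
            resp = requests.get(f"{HUB_URL}/jobs/{job_id}", timeout=30)
            resp.raise_for_status()
            status = resp.json()["status"]
            if status in ("COMPLETED", "FAILED"):
                return status
            time.sleep(poll_seconds)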

For more details, see the relevant sections here.

In 7.3, the proxy files created during VideoFX/VideoLX ingest of high-res assets are not archived; they remain in the Zoom database. Only the high-res and mid-res/mezzanine assets from the PSAN locations are archived.

The Notion of Tiers

Let us now explore how Zoom stores and retrieves data on various layers, or tiers. Zoom storage can be categorized into three distinct tiers, as described below. One could think of the tiers as concentric circles around the Zoom Server, starting from tier-0 and growing outward: the ones closest to the center are the fastest to access, while the ones farthest out are the slowest.

Tier 0

This is the Zoom database. All directly ingested assets’ data and all of Zoom’s database files reside in this tier.

In Zoom 7.3, tier-0 can only be a file-system storage type. 

Tier 1

The Zoom Server treats two specific targets as tier-1, the first-level external storage:

  • One category is the group of third-party mount points (TPM), in which the external (indexed) assets are placed.
  • The second category is the location to which direct-ingest assets are moved when they are first archived (typically a slower NAS). This is viewed as the primary archive location; from here, assets may be moved to tier-2 storage.

In Zoom 7.3, tier-1 can only be a file-system storage type.

Tier 2 

This is the secondary storage, to which both direct and external assets can be archived. Zoom 7.3 comes with built-in support for extended archive to tier-2 targets. The Evolphin Job Hub is responsible for handling all data movement to and from this tier.

As of Zoom 7.3, tier-2 can be either a file-system or an Amazon S3 bucket. 
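For a sense of what a move to an S3 tier-2 target involves, here is a minimal Python sketch using the boto3 SDK with retries and a post-upload existence check. This is not the Job Hub's actual implementation; the bucket name, key layout, and retry settings are assumptions.

    # Illustrative sketch of copying a tier-1 archive file to an S3 tier-2
    # target; not the Job Hub's actual implementation.
    import boto3
    from botocore.config import Config

    # Let the SDK retry transient network errors, echoing the hub's
    # built-in retry behavior.
    s3 = boto3.client(
        "s3",
        config=Config(retries={"max_attempts": 5, "mode": "standard"}),
    )

    def archive_to_s3(local_path: str, bucket: str, key: str) -> None:
        """Upload a file to S3 and verify the object exists afterwards."""
        s3.upload_file(local_path, bucket, key)
        s3.head_object(Bucket=bucket, Key=key)  # raises if the upload is missing

    # Hypothetical local path and bucket name:
    archive_to_s3("/mnt/archive/asset-1234.mov",
                  "zoom-tier2-archive", "assets/asset-1234.mov")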

In the future, more tiers will be supported; currently, Zoom handles up to tier-2. Data always moves from one tier to the adjacent one. There is currently no way to move directly from tier-0 to tier-2 or vice versa.
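This adjacency rule is easy to express in code. A minimal sketch, assuming tiers are simply numbered 0 through 2:

    # Minimal sketch of the tier-adjacency rule: data may only move between
    # consecutive tiers (0 <-> 1, 1 <-> 2), never directly 0 <-> 2.
    MAX_TIER = 2  # Zoom 7.3 handles tiers 0 through 2

    def is_valid_move(src: int, dst: int) -> bool:
        """Return True if a single move between the given tiers is allowed."""
        if not (0 <= src <= MAX_TIER and 0 <= dst <= MAX_TIER):
            return False
        return abs(src - dst) == 1

    assert is_valid_move(0, 1)      # direct-ingest asset -> primary archive
    assert is_valid_move(1, 2)      # primary archive -> extended archive
    assert not is_valid_move(0, 2)  # must pass through tier-1 first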

How many Hubs do we need?

A single Zoom Server can integrate with any number of hubs. How many hubs are needed depends on how many geographical locations the external (high-res/mid-res) assets are distributed across. Using the VideoFX and VideoLX ingest engines, assets may be placed in PSANs across multiple locations. Therefore, if these assets need to be archived, a hub must be set up in each of those locations so that it has physical access to the assets.

In addition to the PSANs, it is also useful to set up a hub in the LAN where the Zoom Server's internal archive location resides. This hub can then handle the extended archive of all direct-ingest assets by moving data from this tier-1 storage to any configured tier-2 storage.

A single hub can service multiple TPM (PSAN) paths as long as it has physical access to them.
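A deployment can therefore be thought of as a simple mapping from hubs to the storage paths they can physically reach. The sketch below illustrates this with hypothetical hub names and mount points:

    # Illustrative hub-to-storage mapping; hub names and mount points are
    # hypothetical. One hub per geographical location; each hub may service
    # several PSAN (TPM) paths as long as it has physical access to them.
    HUBS = {
        "hub-nyc": ["/mnt/psan-nyc-1", "/mnt/psan-nyc-2"],  # external assets
        "hub-la":  ["/mnt/psan-la-1"],
        "hub-hq":  ["/mnt/zoom-archive"],  # the server's tier-1 archive location
    }

    def hub_for_path(path: str) -> str:
        """Pick the hub that has physical access to the given storage path."""
        for hub, mounts in HUBS.items():
            if any(path.startswith(mount) for mount in mounts):
                return hub
        raise ValueError(f"no hub services {path}")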