ID: 5655

Zoom Indexer

Zoom has an extensible framework that makes it possible to augment its off-the-shelf functionality. One such enhancement is the ability to define a class of assets that physically reside outside the Zoom system but can still be referenced from within Zoom through associated proxies. The Indexer suite of scripts provides this functionality: it indexes assets from the file system, creates proxies, and sets up the necessary reference mechanisms within the Zoom repository using a configurable set of metadata.

This post covers the installation and configuration details of the Indexer scripts. 

Indexer Action Items 

The Indexer performs roughly the following tasks.

    • Iteration over the source location configured
    • Identification of the assets eligible to be indexed based on the configured file types / file name patterns 
    • Batch management
    • Extraction of thumbnails
    • Ingest of proxies along with reference information about the original assets into Zoom repository

Installation Instructions

    • Unzip the indexer package to /home/evolphin/zoom-deploy
    • Configure /home/evolphin/zoom-deploy/Indexer/conf/config.ini
    • Create the directories for the indexing process if they do not exist
            Staging Area 1 – Directory that hosts the symlinks to the source files
            Staging Area 2 – Directory the Indexer script uses for batching
            Staging Area 3 – Directory the Indexer script uses to create thumbnails/proxies for the current batch
            Temporary Directory – On exit, the Indexer cleans up this temporary directory if cleanupTmpDir is set to 1
            Rejected Assets Directory – Directory that hosts the assets rejected by the Indexer based on the configured Exclude list
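
The directory-creation step can be sketched in shell as follows. This is a minimal sketch, not part of the Indexer package: the function name create_indexer_dirs and its base argument are hypothetical, and the sub-paths simply mirror the sample values from config.ini, so they should be adjusted to match the actual configuration.

```shell
#!/bin/sh
# Hypothetical helper: create the Indexer working directories under a base path.
# The sub-paths mirror the sample config.ini values; adjust them to your setup.
create_indexer_dirs() {
    base=${1:-/home/evolphin/zoom-deploy}
    # mkdir -p creates missing parents and leaves existing directories untouched.
    mkdir -p \
        "$base/Indexer/staging1" \
        "$base/Indexer/staging2" \
        "$base/Indexer/staging3" \
        "$base/Indexer/rejectedAssets" \
        "$base/tmp/filesInQueue"
}
```

Calling create_indexer_dirs with no argument uses /home/evolphin/zoom-deploy as the base; running it again is harmless because existing directories are kept as they are.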

Config Help

 

Each entry is listed under its section name as FIELD NAME = SAMPLE VALUE – DESCRIPTION.

ZOOMSERVER
    serverURL=http://192.168.0.162:8880 – Zoom server host and port
    webminURL=http://192.168.0.162:8443 – Zoom web management console host and port
    serverUsername=admin – Zoom username
    serverPassword=admin – Zoom password for the username configured above

ENVIRONMENT
    zmPath=/home/evolphin/zoom/bin/zm – Path to the zoom executable
    zmInstallDir=/home/evolphin/zoom – Zoom installation directory
    cleanupTmpDir = 1 – Flag to enable or disable cleaning up the temporary directory used by the Indexer
    tmpDir = /home/evolphin/zoom-deploy/tmp/filesInQueue – Temporary directory used by the Indexer

BATCHCONFIG
    batchSizeInMB = 1024 – Maximum size of a batch
    maxFilesInBatch = 25 – Maximum number of files in a batch

MIGRATOR
    staging1Dir = /home/evolphin/zoom-deploy/Indexer/staging1 – Directory which hosts the symlinks of the source files to migrate
    staging2Dir = /home/evolphin/zoom-deploy/Indexer/staging2 – Directory which hosts the symlinks of the current batch to be migrated
    staging3Dir = /home/evolphin/zoom-deploy/Indexer/staging3 – Directory which hosts the thumbnail/proxy files of the current batch
    /home/evolphin/zoom-deploy/Indexer/rejectedAssets – Directory which hosts the rejected files, that is, files of the types specified in the Exclude list
    zoomProjectPath = defproj – Project path in Zoom to which files are to be migrated
    retry = 3 – Number of times to retry the import on failure
    replaceWith=_ – Illegal characters in the file path will be replaced with this character
    IllegalChars =<<EOL
    :


    @
    \*
    \?
    \+
    \|
    \\
    \/
    \<
    \>
    \r
    \n
    EOL
    – Illegal characters that have to be replaced in the file path
    Exclude =<<EOL
    .*\/\..*$
    ^\~.*$
    \.make\.state$
    \.nse_depinfo$
    \.old$
    \.bak$
    \.orig$
    \.exe$
    \.zip$
    \.ln$

    EOL
    – File types that should skip the Zoom import and be moved to the rejected-assets directory

VIDEO
    useEncoder = 1 – Flag to enable or disable the encoder. If set to 0, the video placeholder configured in the PLACEHOLDERS section will be used
    ENCODER = /home/evolphin/zoom/lib/imagemagick/ffmpeg – Path to the video encoder
    ENCODER-ARGS = -y -ss 00:00:01.000 -vframes 1 -an -dn -q:v 0 -vf scale=300:300/dar – Input arguments to the video encoder

IMAGE
    useEncoder = 1 – Flag to enable or disable the encoder. If set to 0, the image placeholder configured in the PLACEHOLDERS section will be used
    ENCODER = /home/evolphin/zoom/lib/imagemagick/convert – Path to the image encoder
    ENCODER-ARGS = -quiet -auto-orient -background white -thumbnail 300^ -flatten – Input arguments to the image encoder

MOUNTSPEC
    ingestMountPrefix = TPM: – Common string for the TPM location, which will be replaced with the path from the respective access points
    ingestFilesRoot = /home/evolphin/SourceFiles-Migration – Source files root location

METADATA
    proxy = ZPIG:Proxy – Metadata field that indicates whether the asset is a proxy
    legacyFilePath = ZPIG:Native File Path – Metadata field that indicates the location of the source file

FILEFORMATS
    imageFiles=<<EOL
    \.AI$
    \.AEP$
    \.ALI$
    \.BMP$
    \.C4D$
    \.EPS$
    \.FLA$
    \.GIF$

    EOL
    – List of image file formats used by ImageMagick for thumbnail extraction
    videoFiles=<<EOL
    \.AVI$
    \.DV$
    \.FLV$
    \.M2V$
    \.M4V$
    \.MOV$

    EOL
    – List of video file formats used by ffmpeg for thumbnail extraction
    audioFiles=<<EOL
    \.AAC$
    \.AC3$
    \.AIF$
    \.AIFF$
    \.AMR$

    EOL
    – List of file formats that are to be treated as audio files by the Indexer

PLACEHOLDERS
    image = /home/evolphin/zoom-deploy/PH/small/icon_image.jpg – Default placeholder for image file formats
    video = /home/evolphin/zoom-deploy/PH/small/video.png – Default placeholder for video file formats
    audio = /home/evolphin/zoom-deploy/PH/small/audio.jpg – Default placeholder for audio file formats
    default = /home/evolphin/zoom-deploy/PH/small/page_white.png – Default placeholder for other file formats

THUMBNAILS
    image = /home/evolphin/zoom-deploy/PH/small/icon_image.jpg – Default thumbnail for image file formats
    video = /home/evolphin/zoom-deploy/PH/small/video.jpg – Default thumbnail for video file formats
    audio = /home/evolphin/zoom-deploy/PH/small/audio.jpg – Default thumbnail for audio file formats
    default = /home/evolphin/zoom-deploy/PH/small/page_white.jpg – Default thumbnail for other file formats
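
To make the effect of replaceWith and IllegalChars concrete, here is a rough shell illustration of the kind of substitution they describe. It is only a sketch under assumptions: the Indexer's own replacement logic is not shown in this post, the function name sanitize_name is hypothetical, and the sketch works on a single file name rather than a full path, so the path separator and the newline from the sample list are left out.

```shell
#!/bin/sh
# Hypothetical illustration: replace characters from the sample IllegalChars
# list with the sample replaceWith value "_" in one file name.
# Assumption: applied per file name, so "/" and newline are not handled here.
sanitize_name() {
    # In tr, '\\' is one backslash and '\r' is a carriage return;
    # '[_*]' repeats "_" to match the length of the first set.
    printf '%s' "$1" | tr ':@*?+|\\<>\r' '[_*]'
}
```

For instance, sanitize_name 'a:b@c*d?.txt' prints a_b_c_d_.txt.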

 

Setup Instructions

Maintain a checklist document to keep track of the items listed below. This helps confirm the integrity of the indexing operation and makes it easier to track and trace any errors encountered.
    • Make note of the source path to be indexed.
    • Make note of the destination project to which files will be imported
    • Make note of the start date and exact time of the indexing run
    • To start the Indexer from a clean slate, clear out old logs and the rejected assets directory. On subsequent runs, back up the previous run's logs and rejected assets directory
    • Make a note of the number of files at the source. Use command "find -type f | wc -l"
    • Save the list of source files in a text file. It will be used to check integrity. Use command "find -type f > sourceList.txt"
    • Create links to staging 1 for both the Indexer and the metadata Indexer. Use command “cp -as
    • Apply write permission recursively to all directories/files under staging1. Use command "chmod -R 764 "
    • Make a note of the number of symlinks at staging 1. Use command "find -type l | wc -l"
    • Save the list of staging 1 files in a text file. It will be used to check integrity. Use command “find -type l > staging1Files
    • Verify the directory structure in staging 1.
    • Check log4perl.conf for max size, count, and log level. Setting count and size sufficiently large is recommended if periodic backups are not planned
    • Ascertain that the count of files in the source and the count of symlinks in staging 1 are equal
    • Back up logs periodically
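
The counting, linking, and verification steps of this checklist can be combined into one shell function. The following is a sketch under assumptions rather than the official procedure: the function name prepare_staging1 and its three arguments are hypothetical, the source path is assumed to be absolute (GNU cp needs that to create symlinks outside the current directory), and file names are assumed not to contain newlines.

```shell
#!/bin/sh
# Hypothetical helper: mirror a source tree into staging 1 as symlinks and
# check that the number of symlinks equals the number of source files.
prepare_staging1() {
    src=$1        # absolute path of the source tree (assumption)
    staging1=$2   # staging 1 directory, e.g. staging1Dir from config.ini
    lists=$3      # directory that receives the saved file lists

    mkdir -p "$staging1" "$lists" || return 1

    # Save the source file list and count it.
    find "$src" -type f > "$lists/sourceList.txt"
    src_count=$(wc -l < "$lists/sourceList.txt")

    # GNU cp -as recreates the directory tree and symlinks every file.
    cp -as "$src/." "$staging1/" || return 1
    chmod -R 764 "$staging1"

    # Save the symlink list and count it.
    find "$staging1" -type l > "$lists/staging1Files.txt"
    link_count=$(wc -l < "$lists/staging1Files.txt")

    if [ "$src_count" -eq "$link_count" ]; then
        echo "OK: $src_count source files, $link_count symlinks"
    else
        echo "MISMATCH: $src_count source files, $link_count symlinks" >&2
        return 1
    fi
}
```

A call such as prepare_staging1 /home/evolphin/SourceFiles-Migration /home/evolphin/zoom-deploy/Indexer/staging1 /home/evolphin/zoom-deploy/lists uses the sample paths from config.ini; the lists directory in that call is made up for the example.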

 

Execution

    • Run the Indexer. Use command “nohup ./indexer.pl > nohup.out
    • Make note of the end date of the indexing run
    • On successful completion of the Indexer, start the metadata Indexer. Use command “nohup ./metadata.pl > nohup.out
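
Recording the start and end times and chaining the two scripts can be sketched with a small wrapper. This is a hedged illustration only: run_logged and the run.log file name are hypothetical, and the sketch assumes that indexer.pl exits with a non-zero status when it fails.

```shell
#!/bin/sh
# Hypothetical wrapper: record start time, end time, and exit status of a
# command in a log file, then hand the exit status back to the caller.
run_logged() {
    log=$1
    shift
    printf 'start %s: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$log"
    "$@"
    status=$?
    printf 'end   %s: %s (exit %s)\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" "$status" >> "$log"
    return "$status"
}

# Example (not executed here): start the metadata indexer only if the
# indexer succeeded.
# run_logged run.log ./indexer.pl && run_logged run.log ./metadata.pl
```

The log then holds the dates the checklist asks for, and the && keeps the metadata indexer from starting after a failed indexing run.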

Post-Indexing Analysis

    • Make note of the number of files imported into the Zoom repo during the run. Run in browser, "http://webmin-host:port/get?zm_username=zoom-username&zm_password=zoom-password&data=data&op=list&only-facets=true&source=zoom-path-to-check"
    • Make note of the number of rejected files. Use command "find -type l | wc -l" on the indexer machine
    • Make note of the number of files unaccounted for. Take the difference in count before start and after completion
    • Make note of the number of files that do not have Native File Path applied. Run in browser, "http://webmin-host:port/get?zm_username=zoom-username&zm_password=zoom-password&data=data&op=find&prop-name=ZPIG_Native File Path&contains=false&source=zoom-project&path=/home/evolphin/zoom-deploy/post-indexing-analysis/blankMetadata.txt"
    • Make note of the number of files whose Native File Path does not resolve correctly on disk. Run in browser, "http://webmin-host:port/get?zm_username=zoom-username&zm_password=zoom-password&data=data&op=find&prop-name=ZPIG_Native File Path&check-exists=true&tpm-prefix=TPM:&tpm-org=source-file-root&get-absent=false&skip-purged=true&source=zoom-project&path=/home/evolphin/zoom-deploy/post-indexing-analysis/invalidMetadataList.txt"
    • Make note of the number of 0-byte files. Run in browser, "http://webmin-host:port/get?zm_username=zoom-username&zm_password=zoom-password&data=data&op=find-on-size&threshold=0&get-lesser=true&source=zoom-project&path=/home/evolphin/zoom-deploy/post-indexing-analysis/zero.txt"
    • Make note of the number of source files containing trailing spaces. In the list of source file paths saved at the start, pattern match for trailing spaces and take the count
    • Make note of the number of new files that were added after the initial symlink creation in staging 1. Take the difference in source file count before start and after completion
    • Make note of the number of files that were symlinked but got deleted before indexing could complete. In the list of source file paths saved at the start, check for file paths that do not resolve on disk
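
The last three checks work on saved file lists and can be sketched in shell. The function names below are hypothetical, the lists are assumed to hold one path per line (as written by find), and paths containing newlines are not handled.

```shell
#!/bin/sh
# Count the paths in a saved list that end in a space.
count_trailing_spaces() {
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that.
    grep -c ' $' "$1" || true
}

# Print the paths from a saved list that no longer resolve on disk.
list_missing() {
    # IFS= and -r keep leading/trailing spaces and backslashes intact.
    while IFS= read -r path; do
        [ -e "$path" ] || printf '%s\n' "$path"
    done < "$1"
}

# Print the paths that appear in the second list but not in the first,
# e.g. files added to the source after the symlinks were created.
list_added() {
    old_sorted=$(mktemp)
    new_sorted=$(mktemp)
    sort "$1" > "$old_sorted"
    sort "$2" > "$new_sorted"
    # comm -13 keeps only the lines unique to the second file.
    comm -13 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}
```

For example, list_missing sourceList.txt | wc -l gives the number of files deleted since the list was saved, and list_added sourceList.txt sourceListAfter.txt | wc -l gives the number of newly added files, where sourceListAfter.txt is a second list taken after the run (a file name made up for this example).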