Zoom Indexer

Zoom has an extensible framework that makes it possible to augment its off-the-shelf functionality. One such enhancement is the ability to define a class of assets that physically reside outside the Zoom system but can still be referenced from within Zoom through associated proxies. The Indexer suite of scripts provides this functionality: it indexes assets from the file system, creates proxies, and sets up the necessary reference mechanisms within the Zoom repository using a configurable set of metadata.

This post covers the installation and configuration details of the Indexer scripts. 

Indexer Action Items 

The Indexer performs roughly the following tasks:

  • Iteration over the configured source location
  • Identification of the assets eligible to be indexed, based on the configured file types / file name patterns
  • Batch management
  • Extraction of thumbnails
  • Ingest of proxies, along with reference information about the original assets, into the Zoom repository
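
As a rough illustration of the batch management step, the sketch below groups files into batches bounded by a file count and a total size, mirroring the maxFilesInBatch and batchSizeInMB settings described under Config Help. The function name make_batches and the greedy, in-order grouping are assumptions made for illustration; the actual Indexer scripts may batch differently.

```python
def make_batches(files, batch_size_mb=1024, max_files=25):
    """Group (path, size_in_bytes) pairs into batches, preserving input order.

    Hypothetical helper: a batch is closed as soon as adding the next file
    would exceed either the size limit or the file-count limit.
    """
    limit_bytes = batch_size_mb * 1024 * 1024
    batches, current, current_size = [], [], 0
    for path, size in files:
        too_many = len(current) >= max_files
        too_big = current_size + size > limit_bytes
        if current and (too_many or too_big):
            batches.append(current)
            current, current_size = [], 0
        # A single file larger than the limit still gets a batch of its own.
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

With the sample values from the configuration (1024 MB, 25 files), two 600 MB files cannot share a batch, so the second one starts a new batch.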

Installation Instructions

  • Unzip indexer package to /home/evolphin/zoom-deploy
  • Configure /home/evolphin/zoom-deploy/Indexer/conf/config.ini
  • Create the directories for the indexing process if they do not already exist:
          Staging Area 1 – Directory which will host the symlinks to the source files
          Staging Area 2 – Directory which the Indexer script will use for batching
          Staging Area 3 – Directory which the Indexer script will use to create thumbnails/proxies for the current batch
          Temporary Directory – On exit, the Indexer will clean up this temporary directory if cleanupTmpDir is set to 1
          Rejected Assets Directory – Directory which will host the assets rejected by the Indexer based on the configured Exclude list
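
The directory creation step can be scripted. The following is a minimal sketch, not part of the Indexer package: ensure_directories is a hypothetical helper, and SAMPLE_DIRS simply repeats the sample paths from the configuration described under Config Help, so adjust them to match your own config.ini.

```python
import os

# Sample locations taken from the configuration samples; adjust as needed.
SAMPLE_DIRS = [
    "/home/evolphin/zoom-deploy/Indexer/staging1",
    "/home/evolphin/zoom-deploy/Indexer/staging2",
    "/home/evolphin/zoom-deploy/Indexer/staging3",
    "/home/evolphin/zoom-deploy/tmp/filesInQueue",
    "/home/evolphin/zoom-deploy/Indexer/rejectedAssets",
]


def ensure_directories(paths):
    """Create each missing directory (with parents); return those newly created."""
    created = []
    for path in paths:
        if not os.path.isdir(path):
            os.makedirs(path, exist_ok=True)
            created.append(path)
    return created
```

Calling ensure_directories(SAMPLE_DIRS) on the indexer machine creates whatever is missing and leaves existing directories untouched.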

Config Help

Each section name below is followed by its fields, listed as FIELD NAME = SAMPLE VALUE and then a description.

ZOOMSERVER
  • serverURL=http://192.168.0.162:8880 – Zoom server host and port
  • webminURL=http://192.168.0.162:8443 – Zoom web management console host and port
  • serverUsername=admin – Zoom username
  • serverPassword=admin – Zoom password for the username configured above

ENVIRONMENT
  • zmPath=/home/evolphin/zoom/bin/zm – Path to the zoom executable
  • zmInstallDir=/home/evolphin/zoom – Zoom installation directory
  • cleanupTmpDir = 1 – Flag to enable or disable cleaning up the temporary directory used by the Indexer
  • tmpDir = /home/evolphin/zoom-deploy/tmp/filesInQueue – Temporary directory used by the Indexer

BATCHCONFIG
  • batchSizeInMB = 1024 – Maximum size of a batch
  • maxFilesInBatch = 25 – Maximum number of files in a batch

MIGRATOR
  • staging1Dir = /home/evolphin/zoom-deploy/Indexer/staging1 – Directory which hosts the symlinks of the source files to migrate
  • staging2Dir = /home/evolphin/zoom-deploy/Indexer/staging2 – Directory which hosts the symlinks of the current batch to be migrated
  • staging3Dir = /home/evolphin/zoom-deploy/Indexer/staging3 – Directory which hosts the thumbnail/proxy files of the current batch
  • /home/evolphin/zoom-deploy/Indexer/rejectedAssets – Directory which hosts the rejected files, that is, files of the types specified in the Exclude list
  • zoomProjectPath = defproj – Project path in Zoom to which files are to be migrated
  • retry = 3 – Number of times to retry the import on failure
  • replaceWith=_ – Character with which illegal characters in the file path are replaced
  • IllegalChars – Illegal characters that have to be replaced in the file path. Sample value:
          IllegalChars =<<EOL
          :
          @
          \*
          \?
          \+
          \|
          \\
          \/
          \<
          \>
          \r
          \n
          EOL
  • Exclude – File types that should skip the Zoom import and be moved to the rejected assets directory. Sample value:
          Exclude =<<EOL
          .*\/\..*$
          ^\~.*$
          \.make\.state$
          \.nse_depinfo$
          \.old$
          \.bak$
          \.orig$
          \.exe$
          \.zip$
          \.ln$
          EOL

VIDEO
  • useEncoder = 1 – Flag to enable or disable the encoder. If set to 0, the video placeholder configured in the PLACEHOLDERS section will be used
  • ENCODER = /home/evolphin/zoom/lib/imagemagick/ffmpeg – Path to the video encoder
  • ENCODER-ARGS = -y -ss 00:00:01.000 -vframes 1 -an -dn -q:v 0 -vf scale=300:300/dar – Input arguments to the video encoder

IMAGE
  • useEncoder = 1 – Flag to enable or disable the encoder. If set to 0, the image placeholder configured in the PLACEHOLDERS section will be used
  • ENCODER = /home/evolphin/zoom/lib/imagemagick/convert – Path to the image encoder
  • ENCODER-ARGS = -quiet -auto-orient -background white -thumbnail 300^ -flatten – Input arguments to the image encoder

MOUNTSPEC
  • ingestMountPrefix = TPM: – Common string for the TPM location, which will be replaced with the path from the respective access points
  • ingestFilesRoot = /home/evolphin/SourceFiles-Migration – Source files root location

METADATA
  • proxy = ZPIG:Proxy – Metadata field to indicate whether the asset is a proxy
  • legacyFilePath = ZPIG:Native File Path – Metadata field to indicate the location of the source file

FILEFORMATS
  • imageFiles – List of image file formats used by ImageMagick for thumbnail extraction. Sample value:
          imageFiles=<<EOL
          \.AI$
          \.AEP$
          \.ALI$
          \.BMP$
          \.C4D$
          \.EPS$
          \.FLA$
          \.GIF$
          EOL
  • videoFiles – List of video file formats used by ffmpeg for thumbnail extraction. Sample value:
          videoFiles=<<EOL
          \.AVI$
          \.DV$
          \.FLV$
          \.M2V$
          \.M4V$
          \.MOV$
          EOL
  • audioFiles – List of file formats that are to be treated as audio files by the Indexer. Sample value:
          audioFiles=<<EOL
          \.AAC$
          \.AC3$
          \.AIF$
          \.AIFF$
          \.AMR$
          EOL

PLACEHOLDERS
  • image = /home/evolphin/zoom-deploy/PH/small/icon_image.jpg – Default placeholder for image file formats
  • video = /home/evolphin/zoom-deploy/PH/small/video.png – Default placeholder for video file formats
  • audio = /home/evolphin/zoom-deploy/PH/small/audio.jpg – Default placeholder for audio file formats
  • default = /home/evolphin/zoom-deploy/PH/small/page_white.png – Default placeholder for other file formats

THUMBNAILS
  • image = /home/evolphin/zoom-deploy/PH/small/icon_image.jpg – Default thumbnail for image file formats
  • video = /home/evolphin/zoom-deploy/PH/small/video.jpg – Default thumbnail for video file formats
  • audio = /home/evolphin/zoom-deploy/PH/small/audio.jpg – Default thumbnail for audio file formats
  • default = /home/evolphin/zoom-deploy/PH/small/page_white.jpg – Default thumbnail for other file formats
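
To make the pattern-based settings more concrete, here is a small sketch of how the Exclude, IllegalChars / replaceWith, and FILEFORMATS values could be applied to a file. It uses subsets of the sample values from the configuration. The function names, the choice to sanitize a single file name rather than a whole path (since the slash is itself listed as illegal), and the case-insensitive matching of file formats are all assumptions for illustration; the Indexer scripts may behave differently.

```python
import re

# Subsets of the sample configuration values; each entry is a regular expression.
ILLEGAL_CHARS = [":", "@", r"\*", r"\?", r"\+", r"\|", r"\\", r"\/",
                 r"\<", r"\>", r"\r", r"\n"]
REPLACE_WITH = "_"
EXCLUDE = [r".*\/\..*$", r"^\~.*$", r"\.old$", r"\.bak$", r"\.exe$", r"\.zip$"]
FORMATS = {
    "image": [r"\.BMP$", r"\.EPS$", r"\.GIF$"],
    "video": [r"\.AVI$", r"\.M4V$", r"\.MOV$"],
    "audio": [r"\.AAC$", r"\.AIF$", r"\.AIFF$"],
}


def sanitize_name(name):
    """Replace every illegal character in a single file name (assumed scope)."""
    return re.sub("|".join(ILLEGAL_CHARS), REPLACE_WITH, name)


def is_excluded(path):
    """True if the path matches any Exclude pattern and should be rejected."""
    return any(re.search(pattern, path) for pattern in EXCLUDE)


def classify(path):
    """Return 'image', 'video', 'audio', or 'default' for the given path."""
    for kind, patterns in FORMATS.items():
        # Case-insensitive matching is an assumption, not confirmed behavior.
        if any(re.search(p, path, re.IGNORECASE) for p in patterns):
            return kind
    return "default"
```

A file classified as "default" would fall back to the default entries in the PLACEHOLDERS and THUMBNAILS sections.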

Setup Instructions

Maintain a checklist document to keep track of the items listed below. This helps confirm the integrity of the indexing operation and makes it easier to track and trace any errors encountered.
  • Make note of the source path to be indexed.
  • Make note of the destination project to which files will be imported
  • Make note of start date & exact time of indexing run
  • To start the indexer from a clean slate, clear out the old logs and the rejected assets directory. On subsequent runs, back up the previous run's logs and rejected assets directory
  • Make a note of the number of files at the source. Use the command "find -type f | wc -l"
  • Save the list of source files in a text file. It will be used to check integrity. Use the command "find -type f > sourceList.txt"
  • Create links in staging 1 for both the Indexer and the metadata Indexer, using "cp -as"
  • Apply write permission recursively to all directories/files under staging 1, using "chmod -R 764"
  • Make a note of the number of symlinks in staging 1. Use the command "find -type l | wc -l"
  • Save the list of staging 1 files in a text file. It will be used to check integrity. Use the command "find -type l > staging1Files"
  • Verify the directory structure in staging 1
  • Check log4perl.conf for the maximum size, count, and log level. Setting the count and size sufficiently large is recommended if periodic backups are not planned
  • Ascertain that the count of files in the source and the count of symlinks in staging 1 are equal
  • Back up the logs periodically
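
The symlink creation and the count comparison from this checklist can also be sketched in Python. This is only an illustration under stated assumptions: mirror_as_symlinks is a hypothetical stand-in for what "cp -as" does (real directories, with a symlink for each file), and count_entries approximates the "find -type f" and "find -type l" counts.

```python
import os


def mirror_as_symlinks(source_root, staging_root):
    """Recreate the source directory tree under staging_root, symlinking each file."""
    source_root = os.path.abspath(source_root)
    for dirpath, _dirnames, filenames in os.walk(source_root):
        rel = os.path.relpath(dirpath, source_root)
        target_dir = os.path.normpath(os.path.join(staging_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            # Raises FileExistsError if the link already exists.
            os.symlink(os.path.join(dirpath, name), os.path.join(target_dir, name))


def count_entries(root):
    """Return (regular_files, symlinks) found under root, without following links."""
    regular = links = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                links += 1
            elif os.path.isfile(full):
                regular += 1
    return regular, links
```

The integrity check then amounts to confirming that the regular-file count of the source equals the symlink count of staging 1.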

Execution

  • Run the indexer. Use the command "nohup ./indexer.pl > nohup.out"
  • Make note of the end date of the indexing run
  • On successful completion of the indexer, start the metadata indexer. Use the command "nohup ./metadata.pl > nohup.out"
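
If you prefer to chain the two scripts rather than start the second one by hand, something along these lines could work. It is a sketch only: run_in_order is a hypothetical helper, and it assumes that each script signals failure through a non-zero exit status, which is not confirmed here.

```python
import subprocess


def run_in_order(commands, log_path):
    """Run each command in turn, appending its output to log_path.

    Stops at the first non-zero exit status and returns the exit codes
    of the commands that were actually run.
    """
    codes = []
    with open(log_path, "ab") as log:
        for command in commands:
            result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
            codes.append(result.returncode)
            if result.returncode != 0:
                break
    return codes
```

For example, run_in_order([["./indexer.pl"], ["./metadata.pl"]], "nohup.out") would start the metadata indexer only if the indexer exited with status 0.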

Post-Indexing Analysis

  • Make note of the number of files imported into the Zoom repo during the run. Run in a browser, "http://webmin-host:port/get?zm_username=zoom-username&zm_password=zoom-password&data=data&op=list&only-facets=true&source=zoom-path-to-check"
  • Make note of the number of rejected files. Use the command "find -type l|wc -l" on the indexer machine
  • Make note of the number of files unaccounted for. Take the difference in count before the start and after completion
  • Make note of the number of files that do not have Native File Path applied. Run in a browser, "http://webmin-host:port/get?zm_username=zoom-username&zm_password=zoom-password&data=data&op=find&prop-name=ZPIG_Native File Path&contains=false&source=zoom-project&path=/home/evolphin/zoom-deploy/post-indexing-analysis/blankMetadata.txt"
  • Make note of the number of files whose Native File Path does not resolve correctly on disk. Run in a browser, "http://webmin-host:port/get?zm_username=zoom-username&zm_password=zoom-password&data=data&op=find&prop-name=ZPIG_Native File Path&check-exists=true&tpm-prefix=TPM:&tpm-org=source-file-root&get-absent=false&skip-purged=true&source=zoom-project&path=/home/evolphin/zoom-deploy/post-indexing-analysis/invalidMetadataList.txt"
  • Make note of the number of 0-byte files. Run in a browser, "http://webmin-host:port/get?zm_username=zoom-username&zm_password=zoom-password&data=data&op=find-on-size&threshold=0&get-lesser=true&source=zoom-project&path=/home/evolphin/zoom-deploy/post-indexing-analysis/zero.txt"
  • Make note of the number of source files containing trailing spaces. In the list of source file paths fetched at the start, pattern match for trailing spaces and take the count
  • Make note of the number of new files that were added after the initial symlink creation in staging 1. Take the difference in the source file count before the start and after completion
  • Make note of the number of files which were symlinked but got deleted before indexing could complete. In the list of source file paths fetched at the start, check for file paths that do not resolve on disk
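
Some of these checks lend themselves to a little scripting. The sketch below builds one of the webmin query URLs from its parts and scans a saved list of source paths for trailing spaces and for entries that no longer exist on disk. The helper names are hypothetical, percent-encoding the query values (for example the spaces in "ZPIG_Native File Path") is an assumption about what the server accepts, and "trailing space" is taken to mean any path component that ends in a space.

```python
import os
import re
from urllib.parse import quote, urlencode


def build_query_url(host_port, username, password, params):
    """Assemble a webmin 'get' URL; extra query fields come from the params dict."""
    query = {"zm_username": username, "zm_password": password, "data": "data"}
    query.update(params)
    return "http://{}/get?{}".format(host_port, urlencode(query, quote_via=quote))


def with_trailing_space(paths):
    """Return the paths in which some component ends with a space."""
    pattern = re.compile(r" (/|$)")
    return [p for p in paths if pattern.search(p)]


def not_on_disk(paths):
    """Return the paths that do not resolve to anything on disk."""
    return [p for p in paths if not os.path.exists(p)]
```

When reading the saved source list from a file, strip only the newline from each line (for example with rstrip("\n")), so that trailing spaces in file names are preserved for the check.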
