Zoom has an extensible framework that makes it possible to augment its off-the-shelf functionality. One such enhancement is the ability to define a class of assets that physically reside outside the Zoom system but can still be referenced from within Zoom through associated proxies. The Indexer suite of scripts provides this functionality: it indexes assets from the file system, creates proxies, and sets up the necessary reference mechanisms within the Zoom repository using a configurable set of metadata.
This post covers the installation and configuration details of the Indexer scripts.
Indexer Action Items
The Indexer performs roughly the following tasks:
- Iteration over the source location configured
- Identification of the assets eligible to be indexed based on the configured file types / file name patterns
- Batch management
- Extraction of thumbnail
- Ingest of proxies along with reference information about the original assets into Zoom repository
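The batch-management step can be pictured with a short sketch. This is not the Indexer's actual implementation: it assumes that a batch is closed as soon as adding one more file would exceed either the size limit or the file-count limit (the batchSizeInMB and maxFilesInBatch settings described under Config Help), and the function name make_batches is hypothetical.

```python
from typing import Iterable, List, Tuple


def make_batches(files: Iterable[Tuple[str, int]],
                 batch_size_mb: int = 1024,
                 max_files: int = 25) -> List[List[str]]:
    """Group (path, size_in_bytes) pairs into batches.

    Assumption: a batch is closed when adding the next file would exceed
    either the size limit or the file-count limit. A single file larger
    than the size limit ends up in a batch of its own.
    """
    limit_bytes = batch_size_mb * 1024 * 1024
    batches: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for path, size in files:
        too_many = len(current) >= max_files
        too_big = bool(current) and current_bytes + size > limit_bytes
        if too_many or too_big:
            # Close the running batch and start a new one.
            batches.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

With the sample values of 1024 MB and 25 files, thirty small files would be split into one batch of 25 and one of 5 under this assumption.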
Installation Instructions
- Unzip indexer package to
/home/evolphin/zoom-deploy
- Configure
/home/evolphin/zoom-deploy/Indexer/conf/config.ini
- Create the directories used by the indexing process if they do not exist:
  - Staging Area 1 – Directory which will host the symlinks to the source files
  - Staging Area 2 – Indexer script will use this directory for batching
  - Staging Area 3 – Indexer script will use this directory to create thumbnails/proxies for the current batch
  - Temporary Directory – On exit, the Indexer will clean up this temporary directory if cleanupTmpDir is configured to 1
  - Rejected Assets Directory – Directory which will host the assets rejected by the Indexer based on the configured Exclude list
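Creating these directories can be scripted. The sketch below is one possible way to do it, not part of the Indexer package: the paths are the sample values listed under Config Help, and the helper name ensure_dirs is hypothetical. Replace the list with the values from your own config.ini.

```python
import os
from typing import Iterable, List

# Sample locations taken from the Config Help section of this post;
# substitute the values from your own config.ini.
INDEXER_DIRS = [
    "/home/evolphin/zoom-deploy/Indexer/staging1",
    "/home/evolphin/zoom-deploy/Indexer/staging2",
    "/home/evolphin/zoom-deploy/Indexer/staging3",
    "/home/evolphin/zoom-deploy/tmp/filesInQueue",
    "/home/evolphin/zoom-deploy/Indexer/rejectedAssets",
]


def ensure_dirs(paths: Iterable[str]) -> List[str]:
    """Create each directory that is missing and return the ones created."""
    created = []
    for path in paths:
        if not os.path.isdir(path):
            os.makedirs(path, exist_ok=True)  # also creates missing parents
            created.append(path)
    return created
```

Calling ensure_dirs(INDEXER_DIRS) as the user that runs the Indexer creates whatever is missing and leaves existing directories untouched.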
Config Help
SECTION NAME | FIELD NAME=SAMPLE VALUE | DESCRIPTION |
ZOOMSERVER | serverURL=http://192.168.0.162:8880 | Zoom server host and port |
ZOOMSERVER | webminURL=http://192.168.0.162:8443 | Zoom web management console host and port |
ZOOMSERVER | serverUsername=admin | Zoom username |
ZOOMSERVER | serverPassword=admin | Zoom password for the username configured above |
ENVIRONMENT | zmPath=/home/evolphin/zoom/bin/zm | Path to the zoom executable |
ENVIRONMENT | zmInstallDir=/home/evolphin/zoom | Zoom installation directory |
ENVIRONMENT | cleanupTmpDir = 1 | Flag to enable or disable cleaning up the temp directory used by the Indexer |
ENVIRONMENT | tmpDir = /home/evolphin/zoom-deploy/tmp/filesInQueue | Temporary directory used by the Indexer |
BATCHCONFIG | batchSizeInMB = 1024 | Maximum size of a batch |
BATCHCONFIG | maxFilesInBatch = 25 | Maximum number of files in a batch |
MIGRATOR | staging1Dir = /home/evolphin/zoom-deploy/Indexer/staging1 | Directory which hosts the symlinks of source files to migrate |
MIGRATOR | staging2Dir = /home/evolphin/zoom-deploy/Indexer/staging2 | Directory which hosts the symlinks of the current batch to be migrated |
MIGRATOR | staging3Dir = /home/evolphin/zoom-deploy/Indexer/staging3 | Directory which hosts the thumbnail/proxy files of the current batch |
MIGRATOR | /home/evolphin/zoom-deploy/Indexer/rejectedAssets | Directory which hosts the rejected files, that is, files whose types are specified in the Exclude list |
MIGRATOR | zoomProjectPath = defproj | Project path in Zoom to which files are to be migrated |
MIGRATOR | retry = 3 | Number of times to retry the import on failure |
MIGRATOR | replaceWith=_ | Illegal characters in the file path will be replaced with this character |
MIGRATOR | IllegalChars =<<EOL | Illegal characters that have to be replaced in the file path |
MIGRATOR | Exclude =<<EOL EOL | File types that should skip the Zoom import and be moved to the rejected-assets directory |
VIDEO | useEncoder = 1 | Flag to enable or disable the encoder. If set to 0, the video placeholder configured in the PLACEHOLDERS section will be used |
VIDEO | ENCODER = /home/evolphin/zoom/lib/imagemagick/ffmpeg | Path to the video encoder |
VIDEO | ENCODER-ARGS = -y -ss 00:00:01.000 -vframes 1 -an -dn -q:v 0 -vf scale=300:300/dar | Input arguments to the video encoder |
IMAGE | useEncoder = 1 | Flag to enable or disable the encoder. If set to 0, the image placeholder configured in the PLACEHOLDERS section will be used |
IMAGE | ENCODER = /home/evolphin/zoom/lib/imagemagick/convert | Path to the image encoder |
IMAGE | ENCODER-ARGS = -quiet -auto-orient -background white -thumbnail 300^ -flatten | Input arguments to the image encoder |
MOUNTSPEC | ingestMountPrefix = TPM: | Common prefix for the TPM location, which will be replaced with the path from the respective access points |
MOUNTSPEC | ingestFilesRoot = /home/evolphin/SourceFiles-Migration | Source files root location |
METADATA | proxy = ZPIG:Proxy | Metadata field to indicate if the asset is a proxy |
METADATA | legacyFilePath = ZPIG:Native File Path | Metadata field to indicate the location of the source file |
FILEFORMATS | imageFiles=<<EOL EOL | List of image file formats used by ImageMagick for thumbnail extraction |
FILEFORMATS | deoFiles=<<EOL EOL | List of video file formats used by ffmpeg for thumbnail extraction |
FILEFORMATS | audioFiles=<<EOL EOL | List of file formats that are to be treated as audio files by the Indexer |
PLACEHOLDERS | image = /home/evolphin/zoom-deploy/PH/small/icon_image.jpg | Default placeholder for image file formats |
PLACEHOLDERS | video = /home/evolphin/zoom-deploy/PH/small/video.png | Default placeholder for video file formats |
PLACEHOLDERS | audio = /home/evolphin/zoom-deploy/PH/small/audio.jpg | Default placeholder for audio file formats |
PLACEHOLDERS | default = /home/evolphin/zoom-deploy/PH/small/page_white.png | Default placeholder for other file formats |
THUMBNAILS | image = /home/evolphin/zoom-deploy/PH/small/icon_image.jpg | Default thumbnail for image file formats |
THUMBNAILS | video = /home/evolphin/zoom-deploy/PH/small/video.jpg | Default thumbnail for video file formats |
THUMBNAILS | audio = /home/evolphin/zoom-deploy/PH/small/audio.jpg | Default thumbnail for audio file formats |
THUMBNAILS | default = /home/evolphin/zoom-deploy/PH/small/page_white.jpg | Default thumbnail for other file formats |
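Two of the MIGRATOR and MOUNTSPEC settings describe string substitutions, which a small sketch may make more concrete. Both functions are hypothetical illustrations, not the Indexer's code. The first assumes that every character listed in IllegalChars is replaced one-for-one by the replaceWith character. The second assumes that a stored native path starts with the ingestMountPrefix value (for example TPM:) and is resolved by swapping that prefix for a real root directory such as ingestFilesRoot. The exact stored format is not documented here, so treat the leading-slash handling as a guess.

```python
import posixpath


def sanitize_path(path: str, illegal_chars: str, replace_with: str = "_") -> str:
    """Replace every character listed in illegal_chars with replace_with."""
    table = {ord(ch): replace_with for ch in illegal_chars}
    return path.translate(table)


def resolve_native_path(native_path: str, mount_prefix: str, files_root: str) -> str:
    """Swap the mount prefix (e.g. "TPM:") for a real root directory.

    Assumption: the text after the prefix is a path relative to the root;
    a leading slash, if present, is dropped before joining.
    """
    if not native_path.startswith(mount_prefix):
        return native_path  # not a prefixed path; leave it untouched
    relative = native_path[len(mount_prefix):].lstrip("/")
    return posixpath.join(files_root, relative)
```

For example, sanitize_path("reel:01/shot*2.mov", ":*") returns "reel_01/shot_2.mov". The illegal characters used in that call are made up for illustration, since the sample IllegalChars value is not shown above.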
Setup Instructions
- Make note of the source path to be indexed.
- Make note of the destination project to which files will be imported
- Make note of start date & exact time of indexing run
- To start the Indexer from a clean slate, clear out old logs and the rejected assets directory. On subsequent runs, back up the previous run's logs and rejected assets directory.
- Make a note of the number of files at the source. Use command
"find <source-path> -type f | wc -l"
- Save the list of source files in a text file. It will be used to check integrity. Use command
"find <source-path> -type f > sourceList.txt"
- Create links in staging 1 for both the Indexer and the metadata Indexer. Use command
"cp -as <source-path> <staging1-dir>"
- Apply write permission recursively to all directories/files under staging 1. Use command
"chmod -R 764 <staging1-dir>"
- Make a note of the number of symlinks in staging 1. Use command
"find <staging1-dir> -type l | wc -l"
- Save the list of staging 1 files in a text file. It will be used to check integrity. Use command
"find <staging1-dir> -type l > staging1Files"
- Verify the directory structure in staging 1.
- Check log4perl.conf for the maximum log size, log file count, and log level. If periodic backups are not planned, set the count and size sufficiently large.
- Ascertain that the count of files at the source and the count of symlinks in staging 1 are equal.
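The final check in this list, comparing the source file count with the staging 1 symlink count, can also be automated. The following is a minimal sketch, assuming the same counting rules as the find commands above (regular files at the source, symlinks in staging 1). The function names are hypothetical.

```python
import os
from typing import Tuple


def count_files_and_links(root: str) -> Tuple[int, int]:
    """Return (regular files, symlinks) found under root.

    Mirrors `find root -type f` and `find root -type l`; symlinked
    directories are counted as links and are not descended into.
    """
    files = links = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            if os.path.islink(os.path.join(dirpath, name)):
                links += 1
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                links += 1
            elif os.path.isfile(full):
                files += 1
    return files, links


def staging_matches_source(source_root: str, staging1_root: str) -> bool:
    """True when every source file has a corresponding symlink by count."""
    source_files, _ = count_files_and_links(source_root)
    _, staged_links = count_files_and_links(staging1_root)
    return source_files == staged_links
```

A False result before the run is a signal to recreate the links or investigate which files were added or removed in the meantime.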
Execution
- Run the Indexer. Use command
"nohup ./indexer.pl > nohup.out"
- Make note of the end date of the indexing run
- On successful completion of the Indexer, start the metadata Indexer. Use command
"nohup ./metadata.pl > nohup.out"
Post-Indexing Analysis
- Make note of number of files imported into Zoom repo during the run. Run in browser,
"http://webmin-host:port/get?zm_username=zoom-username&zm_password=zoom-password&data=data&op=list&only-facets=true&source=zoom-path-to-check"
- Make note of number of rejected files. Use command
"find <rejected-assets-dir> -type l | wc -l"
on the Indexer machine
- Make note of number of files unaccounted for. Take the difference in count before start and after completion
- Make note of number of files that do not have Native File Path applied. Run in browser,
"http://webmin-host:port/get?zm_username=zoom-username&zm_password=zoom-password&data=data&op=find&prop-name=ZPIG_Native File Path&contains=false&source=zoom-project&path=/home/evolphin/zoom-deploy/post-indexing-analysis/blankMetadata.txt"
- Make note of number of files that have Native File Path that is not resolving correctly on disk. Run in browser
"http://webmin-host:port/get?zm_username=zoom-username&zm_password=zoom-password&data=data&op=find&prop-name=ZPIG_Native File Path&check-exists=true&tpm-prefix=TPM:&tpm-org=source-file-root&get-absent=false&skip-purged=true&source=zoom-project&path=/home/evolphin/zoom-deploy/post-indexing-analysis/invalidMetadataList.txt"
- Make note of number of 0-byte files. Run in browser,
"http://webmin-host:port/get?zm_username=zoom-username&zm_password=zoom-password&data=data&op=find-on-size&threshold=0&get-lesser=true&source=zoom-project&path=/home/evolphin/zoom-deploy/post-indexing-analysis/zero.txt"
- Make note of number of source files containing trailing spaces. In the list of source file paths fetched during start, pattern match for trailing spaces and fetch count
- Make note of the number of new files that were added after initial symlink creation in staging 1. Take the difference in source file count before start and after completion
- Make note of number of files which were symlinked but got deleted before indexing could complete. In the list of source file paths fetched during start, check for file paths that do not resolve to disk
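The last three checks work on the source file list saved before the run (sourceList.txt). A minimal sketch of how they could be scripted follows. It assumes one path per line, no newline characters inside file names, and that relative paths are checked from the same directory in which the list was generated. All function names are hypothetical.

```python
import os
from typing import Iterable, List


def read_path_list(list_file: str) -> List[str]:
    """Read one path per line, keeping any trailing spaces intact."""
    with open(list_file, encoding="utf-8", errors="surrogateescape") as fh:
        return [line.rstrip("\n") for line in fh if line.strip()]


def paths_with_trailing_space(paths: Iterable[str]) -> List[str]:
    """Paths whose file name ends in a space character."""
    return [p for p in paths if p.endswith(" ")]


def paths_missing_on_disk(paths: Iterable[str]) -> List[str]:
    """Paths from the saved list that no longer resolve on disk."""
    return [p for p in paths if not os.path.exists(p)]


def newly_added(before: Iterable[str], after: Iterable[str]) -> List[str]:
    """Paths present in the later listing but not in the earlier one."""
    return sorted(set(after) - set(before))
```

The length of each returned list gives the count to record. For newly_added, the second argument would be a fresh listing of the source taken after the run completes.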