ID: 5259

Advanced Curator Configuration with XML Specification

Advanced guide to the XML specification of the Curator service configuration

Minimum Zoom version supported: 6.0

This document describes the various configuration parameters of Curator.

The basic configuration that must be performed involves the following aspects. The associated config fields are also given for each step.

  • Enable or disable file content indexing.
    • Config field: fileContentIndex
  • Enable or disable non-content indexing.
    • Config field: nonFileContentIndex
  • Enable or disable file content searching.
    • Config field: fileContentSearch
  • Enable or disable non-content searching.
    • Config field: nonFileContentSearch
  • Indexing and searching are enabled through separate fields so that searching is not disrupted while the initial indexes are being created for existing assets.
  • Set the Curator service machine host name or IP.
    • Config field: host
  • Change, if required, the path to the Solr database directory on the Curator service machine.
    • Config field: solrDataDir
  • Set the Zoom server replicas which this Curator server can connect to for indexing.
    • Only if Zoom HADR is enabled.
    • Config field: hadrServers
    • This allows deploying multiple Curator servers for different Zoom server clusters.
  • Adjust the frequency for sending email notifications about failed indexing.
    • Config field: failedDocsNotification
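
The staged rollout that the separate index and search fields make possible can be sketched as follows. This is an illustrative Python model only: the four field names and their defaults come from this document, while the `FtsFlags` class and the `non_content_search_backend` helper are hypothetical and not part of Zoom or Curator.

```python
from dataclasses import dataclass


@dataclass
class FtsFlags:
    """Hypothetical holder for the four enable/disable fields.

    All four default to False, as documented for Curator.
    """
    fileContentIndex: bool = False
    nonFileContentIndex: bool = False
    fileContentSearch: bool = False
    nonFileContentSearch: bool = False


def non_content_search_backend(flags: FtsFlags) -> str:
    """Return which service answers non-content searches."""
    return "curator" if flags.nonFileContentSearch else "zoom"


# Stage 1: build indexes for existing assets while searches stay on Zoom.
stage1 = FtsFlags(fileContentIndex=True, nonFileContentIndex=True)

# Stage 2: once indexing has caught up, switch searching over to Curator.
stage2 = FtsFlags(True, True, True, True)
```

In stage 1 users keep searching through Zoom itself, so the initial indexing run does not interrupt them.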

The Curator configuration is stored in the ftsSpec section of the server.xml file in the Zoom_Installation_Directory/conf/ directory. Editing the XML file directly is useful when the Web Admin Control Panel cannot be used, or when advanced parameters need to be changed.
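
Before editing the file by hand, it can help to list the settings currently present in the ftsSpec section. The Python sketch below is one possible way to do that. It assumes ftsSpec is an XML element and makes no claim about whether its fields are stored as attributes or as child elements, so it collects both. The `SAMPLE` fragment and the `read_fts_fields` helper are hypothetical and do not reproduce the real server.xml layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: only the ftsSpec name and the field names come from
# this document; the attribute-based layout shown here is an assumption.
SAMPLE = """
<server>
  <ftsSpec host="127.0.0.1" fileContentIndex="false"/>
</server>
"""


def read_fts_fields(xml_text: str) -> dict:
    """Collect ftsSpec settings, whether stored as attributes or child elements."""
    root = ET.fromstring(xml_text)
    spec = root if root.tag == "ftsSpec" else root.find(".//ftsSpec")
    if spec is None:
        raise ValueError("no ftsSpec section found")
    fields = dict(spec.attrib)
    for child in spec:
        # Leaf child elements that carry text are treated as simple fields.
        if len(child) == 0 and child.text and child.text.strip():
            fields.setdefault(child.tag, child.text.strip())
    return fields


fields = read_fts_fields(SAMPLE)
```

To inspect a real installation, pass the text of Zoom_Installation_Directory/conf/server.xml to `read_fts_fields` instead of `SAMPLE`.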

Commonly used fields

  • fileContentIndex
    • Enable or disable file content indexing.
    • If true, indexing for file contents will resume from the last indexed revision.
    • If false, no further content indexing will be scheduled.
    • Default value is false.
  • nonFileContentIndex
    • Enable or disable non-content indexing.
    • If true, indexing for non-contents will resume from the last indexed revision.
    • If false, no further non-content indexing will be scheduled.
    • Default value is false.
  • fileContentSearch
    • Enable or disable file content searching.
    • If true, users will be able to search within file text content. Asset Browser’s Content Search and Search Everywhere will search content.
    • If false, file content search will be disabled. Asset Browser’s Content Search won’t work and Search Everywhere will skip content search.
    • Default value is false.
  • nonFileContentSearch
    • Enable or disable non-content searching.
    • If true, all non-content search will be performed through the Curator service.
    • If false, all non-content search will be performed within the Zoom service itself.
    • Default value is false.
  • host
    • Curator service machine host name or IP.
    • This host name must be accessible from the Zoom service.
    • Default value is 127.0.0.1.
  • solrDataDir
    • Path to the Solr database directory on the Curator service machine, where the indexing information of the Zoom repository files is stored.
    • Default value is Zoom_Installation_Directory/db/solr-db.
  • failedDocsNotification
    • Frequency for sending email notifications to the super-admins about documents which failed to get indexed.
    • The value can be any one of:
      • daily
      • hourly
      • weekly
      • disable
  • hadrServers
    • This field needs to be added or configured only when Zoom HADR is set up.
    • Zoom HADR provides High-Availability and Disaster Recovery using multiple Zoom servers. The Curator service can also take advantage of this.
    • This field informs Curator of the Zoom servers it can connect to for indexing.
      It is recommended, though not essential, that the listed servers are in decreasing order of their HADR priority.
    • The Zoom servers are listed as ftswebserverspec fields.
    • Each ftswebserverspec contains the required details of one Zoom server. It consists of the following fields:
      • name
        • Name of the Zoom server
        • It must be the same as that in the Zoom server’s server.xml name field.
      • host
        • Zoom server machine host name or IP
        • This host name must be accessible from the Curator service.
      • port
        • Port number of Zoom server’s web service
        • Default: 8443.
      • sslPort
        • SSL port number of Zoom server’s web service.
        • Default: 9443
      • ssl
        • Flag to enable use of SSL to connect to the Zoom server.
        • The Zoom server must have SSL enabled for its web service for this to work.
    • See the FTS Spec sample at the end of this document for a complete example.
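
The hadrServers and failedDocsNotification rules above can be modelled with a small Python sketch. The field names, the default ports, and the allowed notification values are taken from this document. The `ZoomServerEntry` class, the `check_notification` helper, and the server names `zoom-1` and `zoom-2` are hypothetical. Using sslPort whenever ssl is set is an assumption about how the ssl flag is applied.

```python
from dataclasses import dataclass

# Allowed values for failedDocsNotification, as listed in this document.
ALLOWED_NOTIFICATION = {"daily", "hourly", "weekly", "disable"}


@dataclass
class ZoomServerEntry:
    """Hypothetical mirror of one ftswebserverspec entry."""
    name: str
    host: str
    ssl: bool            # no default is documented, so it is required here
    port: int = 8443     # documented default
    sslPort: int = 9443  # documented default

    def connect_port(self) -> int:
        # Assumption: the SSL port is used when the ssl flag is set.
        return self.sslPort if self.ssl else self.port


def check_notification(value: str) -> str:
    """Reject values outside the documented set."""
    if value not in ALLOWED_NOTIFICATION:
        raise ValueError(f"unsupported failedDocsNotification: {value!r}")
    return value


# Entries listed in decreasing order of HADR priority, as recommended.
hadr_servers = [
    ZoomServerEntry(name="zoom-1", host="192.168.0.141", ssl=True),
    ZoomServerEntry(name="zoom-2", host="192.168.0.142", ssl=False),
]
```

Each `name` must match the name field in the corresponding Zoom server's own server.xml, so the placeholder names here would need to be replaced with real ones.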

Seldom used fields

  • port
    • Curator service network port.
    • Default value is 8983.
  • sslPort
    • Curator service network port for SSL requests.
    • Default value is 8984.
  • sslEnabled
    • If true, Curator will accept both SSL and non-SSL requests.
    • If false, Curator will accept only non-SSL requests.
    • Default value is false
  • logLevel
    • Level of logging detail
    • Default value is INFO.

Rarely used fields for advanced performance tuning

  • maxThreadCount
    • Maximum number of threads that will run in parallel to index documents.
    • Default value is 8.
  • maxRetryCount
    • Maximum attempts Curator makes to index a document.
    • If Curator fails to index a document, then upon exceeding the number of retry attempts, that document will be marked as FAILED.
    • Documents that fail to index may be viewed in the Web Admin Console and re-attempted on request.
    • Default value is 5.
  • waitTimeoutForCoresToLoad
    • Maximum time (in milliseconds) that the Curator service will wait for the indexes to load into memory.
    • If Curator fails to load the indexes, this can be set to a higher value. This may be required as the indexes grow in size.
    • Default value is 60000 milliseconds, i.e. 1 minute.
  • maxHighlightResult
    • Maximum number of highlighted snippets of search term occurrences in a document.
    • Default value is 100.
  • commitFrequency
    • Time interval (in milliseconds) after which a hard commit of the Solr index database is performed.
    • It is recommended that this is set higher than 15 seconds, i.e. 15000 milliseconds.
    • Default value is 25000 milliseconds, i.e. 25 seconds.
  • curatorConnectRetries
    • Maximum number of attempts Zoom will make to connect to Curator.
    • Default value is 10.
  • curatorConnectRetryGap
    • Time interval (in seconds) to wait before attempting to reconnect to Curator.
    • Default value is 5 seconds.
  • schedulerFrequency
    • Time interval (in milliseconds) at which Curator will fetch the latest changes from the Zoom server.
    • Default value is 30000 milliseconds.
  • maxRrnsPerQuery
    • Maximum number of repository revisions (RRN) to be fetched for indexing in a single query.
    • Default value is 100.
  • maxFuidsPerQuery
    • Maximum number of assets to be fetched to index in a single query.
    • Default value is 10000.
  • maxMetadataTxnsPerQuery
    • Maximum number of non-checkin metadata operations to be fetched for indexing in a single query.
    • Default value is 500.
  • curatorConnectionTimeout
    • Time (in seconds) after which Curator will be considered unavailable for searching.
    • Default value is 120 seconds, i.e. 2 minutes.
    • Example: suppose Curator goes down at 4:00:00 PM. A search query performed at 4:00:20 PM cannot connect to Curator, so it falls back to the internal Zoom search where possible (non-content search only). Until the timeout expires, every search query first attempts to connect to Curator, which adds a slight delay to the overall search time. Once the configured 2 minutes have elapsed, i.e. at about 4:02:00 PM, Zoom deems Curator unavailable. Further search requests do not attempt to connect to Curator at all, which saves time, until Curator itself contacts Zoom after coming back up.
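
The curatorConnectionTimeout timeline can be expressed as a small decision function. The Python sketch below only illustrates the behaviour described above and is not Zoom's actual implementation. The `search_plan` function and its return strings are hypothetical.

```python
def search_plan(seconds_since_curator_down: float,
                curator_connection_timeout: int = 120) -> str:
    """Describe how a search issued while Curator is down would be handled.

    The default timeout of 120 seconds matches the documented default.
    """
    if seconds_since_curator_down < curator_connection_timeout:
        # Still inside the window: Curator is tried first, then the query
        # falls back to the internal Zoom search (non-content only).
        return "try Curator, then fall back to Zoom search"
    # Window elapsed: Curator is deemed unavailable and is skipped until it
    # contacts Zoom again after coming back up.
    return "skip Curator, use Zoom search"


# Curator went down at 4:00:00 PM; a query arrives 20 seconds later.
plan_at_20s = search_plan(20)

# A query arriving at 4:02:00 PM, once the 120-second window has elapsed.
plan_at_120s = search_plan(120)
```

A lower timeout shortens the period during which searches pay the extra connection attempt, while a higher one tolerates longer Curator restarts before Zoom stops trying.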

Other rarely used fields

  • proxyPort
    • Currently not used.
  • solrBaseContext
    • Curator service’s internal Solr web API URL base path.
    • Default value is /solr.
  • curatorEndPoint
    • Curator service’s web API URL base path, used by Zoom to connect to it.
    • Default value is /curator.
  • zoomEndPoint
    • Zoom service’s web API URL base path used by Curator.
    • Default value is /zoom/solr.
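
To show how the host, port, and base path fields might fit together, the Python sketch below composes a base URL from the documented defaults. The exact way Zoom and Curator assemble their URLs is not stated in this document, so the composition rule (scheme, host, port, then base path) and the `curator_base_url` helper are assumptions for illustration only.

```python
def curator_base_url(host: str = "127.0.0.1",
                     port: int = 8983,
                     ssl_port: int = 8984,
                     use_ssl: bool = False,
                     end_point: str = "/curator") -> str:
    """Compose a base URL from the documented default field values.

    Assumption: the URL is scheme://host:port followed by the base path.
    """
    scheme = "https" if use_ssl else "http"
    chosen_port = ssl_port if use_ssl else port
    return f"{scheme}://{host}:{chosen_port}{end_point}"


plain_url = curator_base_url()
ssl_url = curator_base_url(use_ssl=True)
```

Passing `end_point="/solr"` would give the corresponding address for the solrBaseContext default under the same assumption.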

Note: This sample uses the following HADR setup:

  • Zoom service hosts –
    1. 192.168.0.141
    2. 192.168.0.142
    3. 192.168.0.143
    4. 192.168.0.144
  • Preview service hosts –
    1. 192.168.0.145
    2. 192.168.0.146
    3. 192.168.0.147
  • Curator host(s) –
    1. 192.168.0.148

FTS Spec