ZooKeeper Deployment
Configuration and Deployment
The ZooKeeper Administrator's Guide contains all the information required to configure and deploy ZooKeeper. In short, follow these steps:
Set up a cluster with 2*F + 1 nodes, where F is the number of nodes that can fail without affecting cluster availability.
Note: A dedicated SSD device for the transaction log is key to consistently good performance. Putting the log on a busy device degrades performance.
Configure SSL authentication (if required).
Define ongoing data directory cleanup.
Set up supervision to enable cluster self-healing.
Enable support for TTL nodes.
This feature is used by DLQ; if it isn't enabled, unused data accumulates in ZooKeeper. If you use the official ZooKeeper Docker image, you can enable TTL node support by adding the following section to docker-compose.yml:
docker-compose.yml
environment:
  - "JVMFLAGS=-Dzookeeper.extendedTypesEnabled=true"
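The 2*F + 1 sizing rule from the first step can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function names are made up for this example and are not part of any ZooKeeper or Genesis API.

```python
def ensemble_size(failures_to_tolerate: int) -> int:
    """Number of nodes needed to survive the given number of node failures (2*F + 1)."""
    if failures_to_tolerate < 0:
        raise ValueError("failures_to_tolerate must be non-negative")
    return 2 * failures_to_tolerate + 1


def quorum_size(nodes: int) -> int:
    """Minimum number of live nodes that still forms a strict majority."""
    if nodes < 1:
        raise ValueError("nodes must be positive")
    return nodes // 2 + 1


def max_failures(nodes: int) -> int:
    """How many nodes can fail while a majority quorum remains."""
    return nodes - quorum_size(nodes)


for n in (3, 4, 5):
    print(f"{n} nodes: quorum={quorum_size(n)}, tolerated failures={max_failures(n)}")
```

Note that a 4-node cluster tolerates the same single failure as a 3-node cluster, which is why odd cluster sizes are the usual choice.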
Persistent Data
A standalone ZooKeeper requires persistent data. It stores temporary runtime state (autodiscovery information, locks, sequences, dead letter queue metainformation) but does not store master/transactional data.
For multi-region disaster recovery scenarios, you don't need to replicate standalone ZooKeeper volumes from one region to another. The state is recovered from the deployed applications and checkpointed state in Cassandra.
You can deploy a standalone ZooKeeper on Kubernetes or dedicated nodes; there are no specific Genesis restrictions. The default option for EIS is Kubernetes.
Hardware Requirements
We recommend 3 instances for the standalone ZooKeeper. The hardware requirements for each instance are as follows:
- 4GB RAM
- 4 vCPU
- 100GB SSD
Kubernetes deployment resource constraint example
resources:
  limits:
    cpu: 3500m
    memory: 3072Mi
  requests:
    cpu: 2500m
    memory: 2560Mi
volumeMounts:
  - name: data
    mountPath: /data
  - name: config
volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes:
        - "ReadWriteOnce"
      resources:
        requests:
          storage: "100Gi"
      storageClassName: "managed-premium"
Monitoring
Use either of the following options to monitor ZooKeeper (both manually and automatically):
The Four Letter Words ZooKeeper commands
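As an illustration of the four-letter-word approach, the sketch below sends a command such as `ruok` or `mntr` to a server's client port over a plain TCP socket and reads the reply until the server closes the connection. The helper names are made up for this example. Recent ZooKeeper versions only answer commands that are enabled through the `4lw.commands.whitelist` server setting, so check that setting if a command returns nothing useful.

```python
import socket


def four_letter_word(host: str, port: int, command: str, timeout: float = 5.0) -> str:
    """Send a four-letter-word command to a ZooKeeper server and return its reply."""
    if len(command) != 4:
        raise ValueError("expected a four-letter command such as 'ruok' or 'mntr'")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(output: str) -> dict:
    """Parse the tab-separated key/value lines of 'mntr' output into a dict."""
    metrics = {}
    for line in output.splitlines():
        key, sep, value = line.partition("\t")
        if sep:  # skip lines that are not key<TAB>value pairs
            metrics[key.strip()] = value.strip()
    return metrics
```

For example, `four_letter_word("localhost", 2181, "ruok")` is expected to return `imok` from a healthy server (the host and port here are placeholders), and `parse_mntr(four_letter_word("localhost", 2181, "mntr"))` yields a dictionary that a scheduled check can compare against alert thresholds.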
Manual Data Management
For manual CRUD operations, use either of the following tools:
ZooKeeper Command Line Interface
Both of them are client-side tools, so a developer can install them on the workstation and use them to connect to the cluster.
ZooKeeper in Genesis
Dead Letter Queue
Dead Letter Queue uses ZooKeeper for keeping queues’ statuses (persistent znodes) and information about distributed locks (ephemeral znodes).
Queues’ statuses are kept in /Genesis/DeadLetterQueue/Queues and they contain JSON that describes the queue:
DLQ Attributes
Attribute name | Correct values | Meaning |
---|---|---|
status | FAILED, PENDING | Current status of a queue. See the DLQ page to find information about queues’ statuses. |
consumerGroup | Can be any string, but usually, it’s a fully qualified name of the message handler. | Stream message consumer group name. |
correlationId | Can be any string, but usually, it’s a UUID. | Message correlation ID, defined by MessageKey. |
streamName | Can be any string. | Name of the message’s stream. See Mapping between Kafka concepts and Stream Messaging API for details. |
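When inspecting queue znodes by hand, a small check against the attribute table above can catch malformed payloads. The sketch below is a minimal example that assumes all four attributes are present as JSON strings; that requirement, the function name, and the sample values are assumptions made for illustration, not a description of the DLQ implementation.

```python
import json

ALLOWED_STATUSES = {"FAILED", "PENDING"}
# Assumption: every attribute from the table is mandatory and stored as a string.
REQUIRED_FIELDS = ("status", "consumerGroup", "correlationId", "streamName")


def validate_queue_payload(raw: bytes) -> list:
    """Return a list of problems found in a queue znode payload (empty if none)."""
    try:
        payload = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        return [f"payload is not valid JSON: {exc}"]
    if not isinstance(payload, dict):
        return ["payload is not a JSON object"]
    problems = []
    for field in REQUIRED_FIELDS:
        if not isinstance(payload.get(field), str):
            problems.append(f"missing or non-string attribute: {field}")
    status = payload.get("status")
    if isinstance(status, str) and status not in ALLOWED_STATUSES:
        problems.append(f"unexpected status: {status}")
    return problems


# Hypothetical payload, for illustration only.
sample = json.dumps({
    "status": "FAILED",
    "consumerGroup": "com.example.OrderHandler",
    "correlationId": "123e4567-e89b-12d3-a456-426614174000",
    "streamName": "orders",
}).encode("utf-8")
print(validate_queue_payload(sample))
```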
Distributed locks are represented by empty znodes:
Entity Linking
Entity Linking Service in the Domain Framework uses ZooKeeper for external link resolution, and stores URLs to endpoints for loading entities by the links. These links are stored in the path /Genesis/EntityLinks as persistent znodes with URLs as data:
The link should be resolvable and the endpoint should be up and running.
External Model
External Model plugin in Modeling API uses ZooKeeper to store and broadcast information about models that is needed for the model discovery service. The info is stored as JSON data in persistent znodes in the path /Genesis/External:
External Model Attributes
Attribute name | Correct values | Meaning |
---|---|---|
name | String | Entity DSL type name. |
version | String with integer | Model version |
resourceClass | String with the fully qualified name of a class | Subclass of ModeledResource that represents the entity in runtime |
location | URL | URL that is used to load the instance of the modeled resource. The link should be resolvable and the endpoint should be up and running. |
Jobs
Jobs use ZooKeeper to implement executor locks for job executors. They store the lock owner (application name) data in persistent znodes in the path /Genesis/Lock/Executor:
Kafka Connector
Kafka Connector uses ZooKeeper to implement distributed locks on Kafka topic deployment, which ensure that no more than one deployer deploys the same Kafka topic at the same time. The locks are persistent znodes without data in the path /Genesis/Lock/Kafka:
Metrics API
Metrics API stores health check info for application nodes as persistent znodes with JSON data in the path /Genesis/HealthCheck:
The payload is the result of com.codahale.metrics.health.HealthCheckRegistry#runHealthChecks() and it is defined individually for every application by using MetricRegistryFactory (see Codahale Metrics documentation for details). By default, infrastructure provides metrics for Lifecycle Framework and Stream Messaging.
Stream Messaging
Stream Messaging stores stream message metadata and checkpoint info as JSON data in persistent znodes in the path /Genesis/Streams:
Stream Messaging Attributes
Attribute name | Correct values | Meaning |
---|---|---|
streamName | Any string | Name of the message’s stream. See Mapping between Kafka concepts and Stream Messaging API for details. |
envelopeType -> typeName | Fully qualified class name | The class name of Stream message envelope class. It should be a subclass of MessageEnvelope. |
envelopeType -> parent -> typeName | Fully qualified class name | The class name of the parent class of the message envelope class. It can go several levels deep recursively until it comes to MessageEnvelope that is always the last parent. |
applicationName | String | Name of the application that published the metadata, taken from the genesis.app.name Spring property. |
checkpointInfo -> partition | Integer | Kafka Topic partition number. |
checkpointInfo -> offset | Long | The offset of the record in the topic/partition. |
messageMetadata -> hostname | String | Name of the Kafka channel created by DeploymentConfiguration#getHostChannel using StreamNameFormatter. |
messageMetadata -> messageType -> typeName | Fully qualified class name | The class name of Stream message payload class. It should be a subclass of MessageBody. |
messageMetadata -> messageType -> parent -> typeName | Fully qualified class name | The class name of the parent class of the message payload. It can go several levels deep recursively until it comes to MessageBody that is always the last parent. |
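The recursive `typeName`/`parent` attributes in the table above form a chain of class names. When reading such a payload, it can help to flatten the chain and confirm that it ends at the expected base class. The sketch below assumes the nesting is stored as JSON objects with `typeName` and `parent` keys, as the table suggests; the helper names and the package names in the sample are placeholders, not the real Genesis class names.

```python
def type_chain(type_info: dict) -> list:
    """Follow nested 'parent' objects and return the type names from child to root."""
    chain = []
    node = type_info
    while isinstance(node, dict):
        chain.append(node.get("typeName"))
        node = node.get("parent")
    return chain


def ends_in(chain: list, simple_name: str) -> bool:
    """Check that the last type in the chain has the given simple class name."""
    return bool(chain) and str(chain[-1]).rsplit(".", 1)[-1] == simple_name


# Hypothetical envelopeType value; package names are placeholders.
envelope_type = {
    "typeName": "com.example.OrderEnvelope",
    "parent": {
        "typeName": "com.example.BaseEnvelope",
        "parent": {"typeName": "com.example.MessageEnvelope"},
    },
}

chain = type_chain(envelope_type)
print(chain, ends_in(chain, "MessageEnvelope"))
```

The same helper applies to `messageMetadata -> messageType`, where the chain is expected to end at MessageBody.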
Sequence Generator
Sequence Generator stores the current values of the generators as persistent znodes in paths like /genesis-sequence/[keyspace]/[name] (see SequenceGenerator Javadoc for details), for instance:
The payload contains the current values of the sequences that are generated by invoking methods of SequenceGenerator.
Be aware that removing znodes doesn't break generators: in that case a generator restores itself from the Cassandra checkpoint, which guarantees that values generated after recovery are unique. The only visible consequence of removing sequence generators from ZooKeeper is a gap between the last value generated before the removal and the first value generated after the restoration. This gap won't be greater than the value of the genesis.sequence.checkpoint.size Spring placeholder, which is 1000 by default.
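The bound on the gap can be illustrated with a toy model. The sketch below is not the SequenceGenerator implementation; it assumes a common block-reservation scheme in which a durable checkpoint (standing in for Cassandra) is advanced by the checkpoint size before values from the new block are handed out, while the current value (standing in for the znode) is volatile. The class and method names are hypothetical.

```python
class CheckpointedSequence:
    """Toy model of a sequence that recovers from a durable checkpoint."""

    def __init__(self, checkpoint_size: int = 1000, checkpoint: int = 0):
        self.checkpoint_size = checkpoint_size
        self.checkpoint = checkpoint  # durable upper bound of the reserved block
        self.current = checkpoint     # volatile current value

    def next_value(self) -> int:
        if self.current >= self.checkpoint:
            # Reserve the next block durably before using any value from it.
            self.checkpoint += self.checkpoint_size
        self.current += 1
        return self.current

    def recover(self) -> None:
        """Simulate losing the current value: resume from the durable checkpoint."""
        self.current = self.checkpoint


seq = CheckpointedSequence(checkpoint_size=1000)
last_before = [seq.next_value() for _ in range(1234)][-1]
seq.recover()
first_after = seq.next_value()
print(last_before, first_after, first_after - last_before)
```

In this model every value issued before the loss is at most the checkpoint, so restarting from the checkpoint never repeats a value, and the skipped range is never larger than one block.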
Time Shifter
Time Shifter uses ZooKeeper to store current time shift info. The info is stored as JSON data in persistent znodes in the path /Genesis/Timeshift:
Time Shift Info Attributes
Attribute name | Correct values | Meaning |
---|---|---|
shiftId | String | Unique identifier of the shift operation |
originalTime | String with ISO 8601 datetime | Original, non-shifted datetime of the server |
beforeShift | String with ISO 8601 datetime | Datetime before shift occurred |
afterShift | String with ISO 8601 datetime | Datetime after shift occurred |
Troubleshooting
Performance and Consistency Problems
When profiling shows that ZooKeeper is a bottleneck for all parts of the system, or when read/write operations are not consistent, check the Things to Avoid section of the ZooKeeper Administrator's Guide.
Infinite Reconnect Loop
When the application is stuck in a reconnect loop, spamming connected/disconnected messages in the logs, it is usually a symptom of one of the following problems:
- ZooKeeper I/O or network issues combined with a low session timeout.
  Solution:
  - Check the ZooKeeper server network and disk latency.
  - Increase the session timeout to match your cluster performance.
- The application is trying to download a large chunk of data and exceeds the session timeout.
  Solution:
  - Check the ZooKeeper /Genesis subfolders for large nodes or a large number of children to find the cause.
  - Increase the session timeout and the jute.maxbuffer property.
- The application’s JVM is not healthy and is unable to process a request in time.
  Solution:
  - Check application memory and CPU consumption.
  - Increase application heap size if needed.
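The search for large nodes or nodes with many children can be automated. The sketch below walks a znode tree through two caller-supplied functions, so it does not depend on a particular client library; the function name and the default thresholds are arbitrary choices for illustration. For reference, the ZooKeeper documentation gives the default jute.maxbuffer as just under 1 MB.

```python
def find_suspect_znodes(root, get_children, get_data,
                        max_bytes=512 * 1024, max_children=1000):
    """Return (path, data size, child count) for znodes exceeding either threshold.

    get_children(path) must return a list of child names and
    get_data(path) must return the node data as bytes (or None).
    """
    suspects = []
    stack = [root]
    while stack:
        path = stack.pop()
        children = get_children(path)
        size = len(get_data(path) or b"")
        if size > max_bytes or len(children) > max_children:
            suspects.append((path, size, len(children)))
        for child in children:
            stack.append(path.rstrip("/") + "/" + child)
    return suspects
```

With the third-party kazoo client, for example, the accessors could be `zk.get_children` and `lambda p: zk.get(p)[0]` on a connected `KazooClient`; this pairing is a suggestion and has not been tested against a Genesis cluster.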
ExternalModelNotAvailableException Errors in Logs
An ExternalModelNotAvailableException in the application logs during startup or event processing means that the external model was not registered or resolved properly. This is usually a symptom of one of the following:
- An unresolvable hostname is configured in the NM_HOST environment variable or the HOSTNAME system property.
  Solution:
  - Check the contents of the /Genesis/External ZooKeeper folder to find the invalid configuration.
  - Update the configuration so that it points to a resolvable node.
- The target application has not started yet.
  Solution: Check the Kubernetes startup/readiness probe configuration to avoid exposing unstarted apps.
- The target application was undeployed. You can ignore such errors because they are handled by the retry mechanism.
External Link Parsing Problems
JsonValueFormatError
The application fails to parse external links, throwing JsonValueFormatError:
- A link has an invalid structure.
  Solution: Make sure that the link structure corresponds to the link schema.
- The link target is unregistered (only applicable to cross-microservice link resolution).
  Solution: Check that the link model is registered in /Genesis/EntityLinks.
InvocationError
The application fails to resolve a link, throwing a "Cannot resolve external link" InvocationError:
- The link target is unresolvable.
  Solution:
  - Check the URI in the /Genesis/EntityLinks link model node.
  - Check the NM_HOST environment variable and the HOSTNAME system property of the targeted facade application.
Related Pages
Platform Javadocs
- ModeledResource
- StreamNameFormatter
- MessageBody
- MessageEnvelope
- MetricRegistryFactory
- DeploymentConfiguration
Genesis Documentation
- ZooKeeper Connector: includes Disaster Recovery (DR) details
- Kafka Connector
- Lifecycle
- Entity Linking Services
- Domain Framework
- Domain Framework Entity DSL
- Metrics API
- Modeling API
- [Plugins] External Model
- Mapping between Kafka concepts and Stream Messaging API
- Dead Letter Queue
- Configure SSL Authentication in ZooKeeper
- Define and schedule Jobs
- Stream Messaging
ZooKeeper Documentation
- Things to avoid when configuring ZooKeeper
- ZooKeeper JMX
- ZooKeeper Administrator's Guide
- Connecting to ZooKeeper
- ZooKeeper cluster long term maintenance
- TTL nodes support