Thursday, January 19, 2017

ZooKeeper: Knowing the Basics

ZooKeeper can be run in two different modes:[1]
  • Standalone
    • is convenient for evaluation, development, and testing
  • Replicated
    • should be used in production
In this article, we will focus on running ZooKeeper in replicated mode and on understanding its basics.

ZooKeeper Service 

Apache ZooKeeper can be used in distributed applications (e.g., Yahoo! Message Broker) to enable highly reliable distributed coordination.  One example is the high availability of Spark standalone masters.[2] A standalone Spark Master can run with recovery mode enabled, using ZooKeeper to recover state among the available pool of masters. Another example is HDFS NameNode HA, which allows a cluster to be configured so that the NameNode is not a single point of failure.[10] In this case, HDFS relies on ZooKeeper to manage the details of failover.

ZooKeeper Functionalities

ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical name space of data registers (or znodes), much like a file system. Here are the high-level descriptions of its functionalities:
  • Provides semantics similar to Google's Chubby for coordinating distributed systems; as a consistent and highly available key-value store, it is well suited to serve as a cluster configuration store and directory of services
  • Is a centralized coordination service for
    • maintaining configuration data:
      • status information
      • configuration (e.g., security rules)
      • location information
    • naming
    • providing distributed synchronization
    • providing group services
  • Provides the following capabilities
    • Consensus
    • Group management
    • Presence protocols
  • Offers client bindings for a number of languages
    • In the release
      • ships with C, Java, Perl and Python client bindings
    • From the community
      • see ZooKeeper Client Bindings[8] for a list of client bindings that are available from the community but not yet included in the release
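
The shared hierarchical name space of znodes described at the start of this section can be pictured with a toy in-memory model. This is only a sketch to illustrate the idea: the class and method names (ZnodeTree, create, get, set, ls) are made up for this example and are not the ZooKeeper API, and nothing here is replicated or persistent.

```python
# Toy in-memory model of a znode tree. Illustration only, not ZooKeeper code.
from dataclasses import dataclass, field


@dataclass
class Znode:
    data: bytes = b""
    version: int = 0                      # bumped on every successful set()
    children: dict = field(default_factory=dict)


class ZnodeTree:
    def __init__(self):
        self.root = Znode()

    def _walk(self, path):
        """Return the znode at an absolute path such as '/app/config'."""
        node = self.root
        for part in filter(None, path.split("/")):
            node = node.children[part]    # KeyError if the path does not exist
        return node

    def create(self, path, data=b""):
        parent_path, _, name = path.rpartition("/")
        parent = self._walk(parent_path)  # the parent must already exist
        if name in parent.children:
            raise KeyError(f"{path} already exists")
        parent.children[name] = Znode(data)

    def get(self, path):
        node = self._walk(path)
        return node.data, node.version

    def set(self, path, data, version=-1):
        """Conditional update; version=-1 means 'match any version'."""
        node = self._walk(path)
        if version != -1 and version != node.version:
            raise ValueError("version mismatch")
        node.data = data
        node.version += 1

    def ls(self, path):
        return sorted(self._walk(path).children)


tree = ZnodeTree()
tree.create("/app")
tree.create("/app/config", b"security=strict")
tree.create("/app/workers")
tree.set("/app/config", b"security=relaxed", version=0)
print(tree.ls("/app"))          # child names under /app
print(tree.get("/app/config"))  # (data, version) after one update
```

Unlike a file in a file system, every node in this model can hold both data and children, which is what makes the same tree usable for configuration, naming, and group membership.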

ZooKeeper Architecture

When run in replicated mode, the ZooKeeper service comprises an ensemble of servers.  A ZooKeeper cluster, or ensemble, consists of a leader node and followers. The leader is chosen by consensus within the ensemble. If the leader fails, another node is elected as leader.
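
Consensus in the ensemble rests on a majority of servers agreeing. As a rough sketch, assuming the default majority quorum, the arithmetic below shows how many servers must be up for a given ensemble size and how many failures can be tolerated. The function names are chosen for this example only.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(f"ensemble={n} quorum={quorum_size(n)} "
          f"tolerated_failures={tolerated_failures(n)}")
```

A four-server ensemble tolerates no more failures than a three-server one, which is why odd ensemble sizes are the usual choice.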

The design of the ensemble requires that all servers know about each other.  ZooKeeper servers maintain an in-memory image of the data tree, along with transaction logs and snapshots in a persistent store.  The downside of an in-memory database is that the size of the database ZooKeeper can manage is limited by memory.

ZooKeeper Servers

To start ZooKeeper you need a configuration file which governs ZooKeeper's behavior. Here is a sample in conf/zoo.cfg:
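
A minimal sketch of such a file for a three-server ensemble can be generated as follows. The hostnames zoo1 to zoo3 and the data directory are placeholders assumed for this example, and the timing values (tickTime, initLimit, syncLimit) follow the sample in the ZooKeeper Getting Started guide[1]; adjust all of them for a real deployment. The helper names are hypothetical.

```python
import tempfile
from pathlib import Path


def render_zoo_cfg(data_dir, servers, client_port=2181):
    """Return zoo.cfg text; servers maps server number -> hostname."""
    lines = [
        "tickTime=2000",
        f"dataDir={data_dir}",
        f"clientPort={client_port}",
        "initLimit=5",
        "syncLimit=2",
    ]
    for server_id, host in sorted(servers.items()):
        # Two ports per server: peer connections, then leader election.
        lines.append(f"server.{server_id}={host}:2888:3888")
    return "\n".join(lines) + "\n"


def write_myid(data_dir, server_id):
    """Write the myid file that tells a server which server.X it is."""
    path = Path(data_dir) / "myid"
    path.write_text(f"{server_id}\n", encoding="ascii")
    return path


servers = {1: "zoo1", 2: "zoo2", 3: "zoo3"}   # placeholder hostnames
print(render_zoo_cfg("/var/lib/zookeeper", servers))

# A temporary directory stands in for dataDir so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    print(write_myid(tmp, 2).read_text())
```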


The entries of the form server.X list the servers that make up the ZooKeeper service. When the server starts up, it knows which server it is by looking for the file myid in the data directory (i.e., dataDir). That file contains the server number, in ASCII.

Note the two port numbers after each server name: "2888" and "3888". Peers use the former port to connect to other peers. Such a connection is necessary so that peers can communicate, for example, to agree upon the order of updates. More specifically, a ZooKeeper server uses this port to connect followers to the leader. When a new leader arises, a follower opens a TCP connection to the leader using this port. Because the default leader election also uses TCP, a second port is currently required for leader election. This is the second port in the server entry.
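
To make the layout of a server entry concrete, here is a small sketch that splits a line such as server.1=zoo1:2888:3888 into its parts. The ServerEntry type and parse_server_entry function are hypothetical names for this example; the sketch ignores anything after the second port and does not handle IPv6 addresses.

```python
from typing import NamedTuple


class ServerEntry(NamedTuple):
    server_id: int       # the X in server.X, matching the myid file
    host: str
    peer_port: int       # followers connect to the leader on this port
    election_port: int   # used for leader election


def parse_server_entry(line: str) -> ServerEntry:
    key, _, value = line.strip().partition("=")
    prefix, _, server_id = key.partition(".")
    if prefix != "server" or not server_id.isdigit():
        raise ValueError(f"not a server.X entry: {line!r}")
    # Raises ValueError if fewer than three colon-separated fields exist.
    host, peer_port, election_port = value.split(":")[:3]
    return ServerEntry(int(server_id), host, int(peer_port), int(election_port))


print(parse_server_entry("server.1=zoo1:2888:3888"))
```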

For details of the other configuration parameters, see the ZooKeeper Administrator's Guide.[5]

ZooKeeper Clients

Each client connects to only one ZooKeeper server at a time. The client maintains a TCP connection through which it sends requests, gets responses, gets watch events, and sends heartbeats. If the TCP connection to the server breaks, the client will connect to a different server.
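
The failover behavior can be sketched at the TCP level. ZooKeeper clients are normally given a comma-separated list of host:port pairs; the sketch below parses such a list and opens a connection to the first server that accepts one, trying them in random order. It only opens a socket and does not speak the ZooKeeper session protocol, and the function names are made up for this example.

```python
import random
import socket


def parse_connect_string(connect_string):
    """Turn 'zoo1:2181,zoo2:2181' into [('zoo1', 2181), ('zoo2', 2181)]."""
    servers = []
    for item in connect_string.split(","):
        host, _, port = item.strip().rpartition(":")
        servers.append((host, int(port)))
    return servers


def connect_to_any(connect_string, timeout=2.0):
    """Return a TCP socket to the first reachable server in the list."""
    servers = parse_connect_string(connect_string)
    random.shuffle(servers)               # spread clients across servers
    for host, port in servers:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError:
            continue                      # unreachable, try the next one
    raise ConnectionError("no ZooKeeper server reachable")


print(parse_connect_string("zoo1:2181, zoo2:2181, zoo3:2181"))
```

If the returned socket later breaks, calling connect_to_any again with the same string mirrors what a real client library does when it reconnects to a different server.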

In ZooKeeper's configuration file, clientPort (e.g., 2181) specifies the port on which the server listens for client connections.  There are two ways to connect to the ZooKeeper service:
  • telnet or nc[6] 
  • ZooKeeper Command Line Interface (CLI)[7] 

telnet or nc

You can issue commands to ZooKeeper via telnet or nc at the client port.  Each command is composed of four letters.  For example, the ruok command tests whether the server is running in a non-error state. The server responds with imok if it is running; otherwise it does not respond at all.

$ echo ruok | nc <host> 2181
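
The same check can be scripted. The following is a minimal sketch that sends a four-letter command over a plain TCP socket and reads the reply until the server closes the connection. The function names are hypothetical, and newer ZooKeeper releases may require the command to be whitelisted in the server configuration before it is answered.

```python
import socket


def four_letter_word(host, port, command, timeout=2.0):
    """Send a four-letter command and return the server's full reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:                 # server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("ascii", errors="replace")


def is_ok(host, port=2181):
    """True if the server answers ruok with imok, False otherwise."""
    try:
        return four_letter_word(host, port, "ruok") == "imok"
    except OSError:                       # refused, timed out, unreachable
        return False
```

For example, is_ok("localhost") returns True when a healthy server is listening on the default client port used in this article.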

For the full list of commands, see the ZooKeeper Administrator's Guide.[5]

ZooKeeper Command Line Interface (CLI)

The ZooKeeper CLI[7] is used to interact with the ZooKeeper ensemble for development purposes. It is useful for debugging and for experimenting with different options.

To perform ZooKeeper CLI operations, first start your ZooKeeper server (“bin/ start”) and then the ZooKeeper client (“bin/”).

$ ./current/zookeeper-client/bin/
Connecting to localhost:2181

[zk: localhost:2181(CONNECTED) 0] help
ZooKeeper -server host:port cmd args
        stat path [watch]
        set path data [version]
        ls path [watch]

[zk: localhost:2181(CONNECTED) 1]

For information on installing the client side libraries, refer to the Bindings section of the ZooKeeper Programmer's Guide.


  1. Getting Started (Zookeeper)
  2. Spark Standalone - Using ZooKeeper for High-Availability of Master
  3. How to run zookeeper's commands from bash?
  4. Apache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API Examples
  5. ZooKeeper Administrator's Guide
  6. [Apache ZooKeeper] command line Guide
  7. ZooKeeper - CLI
  8. ZooKeeper Client Bindings
  9. ZooKeeper Programmer's Guide
  10. Blueprint Support for HA Clusters
  11. All Cloud-related articles on Xml and More
