Why use ZooKeeper?
ZooKeeper lets distributed processes coordinate with each other through a shared hierarchical name space, organized much like a standard file system. The name space consists of data registers - called znodes, in ZooKeeper parlance - and these are similar to files and directories.

Unlike a typical file system, which is designed for storage, ZooKeeper data is kept in-memory, which means ZooKeeper can achieve high throughput and low latency. The ZooKeeper implementation puts a premium on high performance, high availability, and strictly ordered access. The performance aspects of ZooKeeper mean it can be used in large distributed systems.

The reliability aspects keep it from being a single point of failure, and the strict ordering means that sophisticated synchronization primitives can be implemented at the client. ZooKeeper is replicated: like the distributed processes it coordinates, ZooKeeper itself is intended to be replicated over a set of hosts called an ensemble. The servers that make up the ZooKeeper service must all know about each other. They maintain an in-memory image of state, along with transaction logs and snapshots in a persistent store.

As long as a majority of the servers are available, the ZooKeeper service will be available. Clients connect to a single ZooKeeper server and maintain a TCP connection through which they send requests, get responses, get watch events, and send heartbeats. If the TCP connection to the server breaks, the client will connect to a different server.

ZooKeeper is ordered: it stamps each update with a number that reflects the order of all ZooKeeper transactions.
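The majority rule mentioned above can be made concrete with a little arithmetic. This is a back-of-the-envelope sketch, not ZooKeeper code: an ensemble of N servers needs a strict majority, floor(N/2) + 1 servers, to stay available.

```python
# Toy illustration (not ZooKeeper code): majority quorum arithmetic.
# An ensemble of N servers stays available as long as a strict
# majority -- floor(N / 2) + 1 servers -- can still talk to each other.

def quorum_size(ensemble_size: int) -> int:
    """Minimum number of servers that must be up for the service to work."""
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority stays alive."""
    return ensemble_size - quorum_size(ensemble_size)

for n in (3, 5, 7):
    print(n, quorum_size(n), tolerated_failures(n))
```

This arithmetic is why ensembles usually have an odd number of servers: 5 servers tolerate 2 failures, while 6 servers still tolerate only 2, so the sixth server buys no extra fault tolerance.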

Subsequent operations can use this global ordering to implement higher-level abstractions, such as synchronization primitives. ZooKeeper is fast, especially in "read-dominant" workloads. ZooKeeper applications run on thousands of machines, and it performs best where reads are more common than writes, at ratios of around 10:1.

The name space provided by ZooKeeper is much like that of a standard file system.

Every node in ZooKeeper's name space is identified by a path. Unlike standard file systems, each node in a ZooKeeper namespace can have data associated with it as well as children. It is like having a file system that allows a file to also be a directory. ZooKeeper was designed to store coordination data: status information, configuration, location information, and so on.
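To make the "file that is also a directory" idea concrete, here is a toy in-memory model. The names and structure are illustrative only, not ZooKeeper's real API: every node is addressed by a slash-separated path and can hold data bytes and children at the same time.

```python
# Toy in-memory model of a ZooKeeper-style namespace (not the real API):
# each node holds data bytes AND can have named children.

class Znode:
    def __init__(self, data=b""):
        self.data = data
        self.children = {}   # name -> Znode

root = Znode()

def create(path, data=b""):
    """Create a node at `path`; parent nodes must already exist."""
    parts = path.strip("/").split("/")
    node = root
    for name in parts[:-1]:
        node = node.children[name]
    node.children[parts[-1]] = Znode(data)

create("/app1", b"config for app1")       # a "file" holding data ...
create("/app1/p_1", b"worker 1 status")   # ... that is also a "directory"
```

In the real service the same duality holds: /app1 can store configuration bytes while also parenting /app1/p_1, /app1/p_2, and so on.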

We use the term znode to make it clear that we are talking about ZooKeeper data nodes. Znodes maintain a stat structure that includes version numbers for data changes, ACL changes, and timestamps, to allow cache validations and coordinated updates. Each time a znode's data changes, the version number increases. For instance, whenever a client retrieves data it also receives the version of the data.
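These version numbers enable conditional updates. The sketch below is a simplified model, not the real client library, but it mirrors the documented semantics of setData: the client passes the version it last read, the write fails if the znode has changed since then, and a version of -1 skips the check.

```python
# Toy sketch of version-checked (compare-and-set) updates on a znode.

class VersionedZnode:
    def __init__(self, data=b""):
        self.data = data
        self.version = 0

    def set_data(self, data, expected_version):
        # -1 means "don't check"; otherwise the versions must match.
        if expected_version not in (-1, self.version):
            raise ValueError("BadVersion: znode changed since last read")
        self.data = data
        self.version += 1          # every successful write bumps the version
        return self.version

z = VersionedZnode(b"old")
v = z.version                      # client reads data, remembers version 0
z.set_data(b"new", v)              # succeeds; version becomes 1
```

A second writer still holding version 0 would now get a BadVersion error, which is how clients detect that their cached copy is stale and coordinate updates safely.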

The data stored at each znode in a namespace is read and written atomically. Reads get all the data bytes associated with a znode, and a write replaces all the data. ZooKeeper also has the notion of ephemeral nodes. These znodes exist as long as the session that created the znode is active. When the session ends, the znode is deleted. Ephemeral nodes are useful when you want to implement [tbd]. ZooKeeper supports the concept of watches. Clients can set a watch on a znode.

A watch will be triggered and removed when the znode changes. When a watch is triggered, the client receives a packet saying that the znode has changed. If the connection between the client and one of the ZooKeeper servers is broken, the client will receive a local notification.
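The "triggered and removed" behavior makes watches one-shot. Here is a toy model of that behavior (illustrative only, not the real API): a watch set during a read fires on the next change and is then gone, so a client that wants further notifications must set a new watch.

```python
# Toy sketch of one-shot watches: a watch fires once on the next
# change to the node and is removed as it fires.

class WatchedNode:
    def __init__(self, data=b""):
        self.data = data
        self._watchers = []

    def get_data(self, watch=None):
        # A read may register a watch for the next change.
        if watch is not None:
            self._watchers.append(watch)
        return self.data

    def set_data(self, data):
        self.data = data
        # Remove all watchers first, then notify them exactly once.
        watchers, self._watchers = self._watchers, []
        for w in watchers:
            w("NodeDataChanged")

events = []
node = WatchedNode(b"v1")
node.get_data(watch=events.append)
node.set_data(b"v2")   # fires the watch once
node.set_data(b"v3")   # no watch is registered any more, nothing fires
```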

ZooKeeper also acts as a naming service - similar to DNS, but for the nodes of a distributed system. This mechanism helps with automatic failure recovery when connecting to other distributed applications such as Apache HBase. Distributed applications offer a lot of benefits, but they also pose complex and hard-to-crack challenges, and the ZooKeeper framework provides mechanisms to overcome them.

Race conditions and deadlocks are handled using a fail-safe synchronization approach. Another major problem is inconsistency of data, which ZooKeeper resolves with atomicity. Apache HBase, for instance, relies on ZooKeeper in this way for configuration management.

It also helps ensure that your application runs consistently; this approach can be used in MapReduce, for example, to coordinate job queues. A session has a timeout period, decided by the client. On session expiry, ephemeral nodes are deleted. To keep sessions alive, the client sends pings, also known as heartbeats. The client library takes care of heartbeats and session management. Though failover is handled automatically by the client library, the application cannot remain agnostic of server reconnections, because an operation might fail while switching to another server.
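The session mechanics described above - a client-chosen timeout, heartbeats keeping the session alive, and ephemeral znodes being deleted on expiry - can be modeled with a toy sketch (the class and method names are illustrative, not the real client library):

```python
# Toy model of session liveness: a session expires when no heartbeat
# arrives within the timeout, and expiry deletes its ephemeral znodes.

class ToySession:
    def __init__(self, tree, timeout, now=0.0):
        self.tree = tree              # shared dict: path -> data
        self.timeout = timeout        # timeout period, decided by the client
        self.last_heartbeat = now
        self.ephemerals = []

    def create_ephemeral(self, path, data=b""):
        self.tree[path] = data
        self.ephemerals.append(path)

    def ping(self, now):              # the client library sends these for you
        self.last_heartbeat = now

    def check_expiry(self, now):
        if now - self.last_heartbeat > self.timeout:
            for path in self.ephemerals:
                self.tree.pop(path, None)   # ephemerals die with the session
            self.ephemerals.clear()
            return True
        return False

tree = {}
s = ToySession(tree, timeout=10.0)
s.create_ephemeral("/workers/w1", b"alive")
s.ping(now=8.0)
print(s.check_expiry(now=15.0), "/workers/w1" in tree)  # False True
print(s.check_expiry(now=19.0), "/workers/w1" in tree)  # True False
```

This pairing of heartbeats with ephemeral nodes is what makes liveness tracking work: when a worker process dies and stops pinging, its znode disappears and everyone watching finds out.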

Let us say there are many servers that can respond to a request, and many clients that might want the service. From time to time some of the servers will go down. How can all of the clients keep track of the available servers? It is very easy using ZooKeeper as a central agency: the clients simply query ZooKeeper for the most recent list of servers. Let's take the case of two servers and a client. ZooKeeper provides the following guarantees:

- Sequential consistency: updates from any particular client are applied in the order in which they were sent.
- Atomicity: updates either succeed or fail; there are no partial results.
- Single system image: a client will see the same view of the system regardless of the server it connects to; a new server will not accept connections until it has caught up.
- Durability: once an update has succeeded, it will persist and will not be undone.
- Timeliness: rather than allow a client to see very stale data, a server would prefer to shut down.

If you are using the asynchronous API, the client provides a handle to a function that will be called once ZooKeeper finishes the operation.

Whenever we execute a usual function or method, control does not move to the next step until the function has finished. Such methods are called synchronous. But when you need to do some work in the background, you need asynchronous (async) functions. When you call an async function, control moves to the next step immediately - in other words, the function returns right away while the work starts in the background.

Say you want to create a znode: if you call an async API function of ZooKeeper, the znode is created in the background while control immediately moves to the next step. Async functions are very useful for doing work in parallel. Now suppose something is running in the background and you want to be notified when the work is done.
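Here is a minimal sketch of this async-with-callback pattern in plain Python, using a background thread. Note that create_znode_async and its callback are hypothetical names used for illustration, not ZooKeeper's actual API:

```python
# Toy sketch of the async-with-callback pattern: the call returns
# immediately, the work runs in a background thread, and a callback
# is invoked when the work completes.

import threading

def create_znode_async(path, on_done):
    def work():
        result = f"created {path}"   # pretend this is the server round-trip
        on_done(result)              # the callback fires when work finishes
    t = threading.Thread(target=work)
    t.start()
    return t                         # control returns to the caller at once

done = threading.Event()
results = []

def callback(result):
    results.append(result)
    done.set()

create_znode_async("/app1/task", callback)
done.wait(timeout=5)                 # the caller is free to do other work here
print(results)                       # ['created /app1/task']
```

The real ZooKeeper client libraries follow the same shape: each asynchronous operation accepts a completion callback that the library invokes once the server has replied.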

In such cases, you define a function that you want called once the work is done. This function is passed as an argument to the async API call and is known as a callback. Similar to triggers in databases, ZooKeeper provides watches. The objective of watches is to get notified when a znode changes in some way. Watches are triggered only once; if you want recurring notifications, you will have to re-register the watch. Read operations such as exists, getChildren, and getData may set watches.

Watches are triggered by write operations: create, delete, and setData. Access control operations do not participate in watches. An ACL is a combination of an authentication scheme, an identity for that scheme, and a set of permissions. There are many use cases for ZooKeeper; the most common ones are:

- Building a reliable configuration service
- A distributed lock service, in which only a single process may hold the lock at a time

It is equally important to know when not to use ZooKeeper.

All data is loaded into RAM on every node, and there is network load from replicating all data to all nodes, so ZooKeeper is not suited to storing large amounts of data. In this post, we briefly covered some of the nitty-gritty details of the various concepts in ZooKeeper. In the next post, we will bolster our understanding of ZooKeeper with the help of a case study, where we will be able to appreciate its significance in a given distributed computing scenario.

To know more about CloudxLab courses, here you go!


