Hector (API)
Hector is a high-level client API for Apache Cassandra. Named after Hector, a warrior of Troy in Greek mythology, it is a substitute for the low-level Thrift client that ships with Cassandra;[2] Hector fully encapsulates the Thrift API.[3] It is also available through a Maven repository.[4]
History
Because Cassandra ships with the low-level Thrift protocol, there was room for an interface better suited to application developers. Hector was developed by Ran Tavory as a high-level interface that compensates for the shortcomings of Thrift. It is released under the MIT License, which permits use, modification, and redistribution.
Features
The high-level features of Hector are:[2]
- A high-level object-oriented interface to Cassandra, mainly inspired by the Cassandra-java-client. The API is defined in the Keyspace interface.
- Connection pooling: In high-scale applications, DAOs typically perform a large number of reads and writes. Opening a new connection for each request is too expensive, and a sufficiently fast client can easily run out of available sockets. Hector provides connection pooling and a framework that manages the details.
- Failover support: Cassandra is a distributed data store in which hosts (nodes) may go down, so Hector provides its own failover policies:
Type | Comment |
---|---|
FAIL_FAST | Fails immediately on error, without trying another host |
ON_FAIL_TRY_ONE_NEXT_AVAILABLE | Tries one more host before giving up |
ON_FAIL_TRY_ALL_AVAILABLE | Tries all available hosts before giving up |
- JMX support: Hector exposes many important runtime metrics through JMX, such as the number of available connections, the number of idle connections, and error statistics.
- Load balancing: Simple load balancing was added in version 0.5.0-6.[5]
- Supports the command design pattern to allow clients to concentrate on their business logic and let Hector take care of the required plumbing.
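
The three failover policies listed in the table above differ only in how many further hosts are tried after a failure. The following is a minimal sketch of that idea, not Hector's implementation: the `FailoverSketch` enum, its `run` method, and the use of plain host-name strings are all hypothetical names introduced for illustration.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for the failover policies; not Hector's actual class. */
enum FailoverSketch {
    FAIL_FAST(0),
    ON_FAIL_TRY_ONE_NEXT_AVAILABLE(1),
    ON_FAIL_TRY_ALL_AVAILABLE(Integer.MAX_VALUE);

    private final int maxRetries;

    FailoverSketch(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    /**
     * Runs the operation against the hosts in order, moving to the next host
     * on failure until the retry budget or the host list is exhausted.
     */
    <T> T run(List<String> hosts, Function<String, T> operation) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("no hosts");
        }
        // One initial attempt plus the allowed retries, capped by the host count.
        long attempts = Math.min((long) maxRetries + 1, hosts.size());
        RuntimeException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return operation.apply(hosts.get(i));
            } catch (RuntimeException e) {
                last = e; // remember the failure and try the next host, if any
            }
        }
        throw last;
    }
}
```

With three hosts of which only the last is up, `ON_FAIL_TRY_ALL_AVAILABLE` would succeed in this sketch, while the other two policies would rethrow the error from the last host they tried.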
Availability metrics
Hector exposes availability counters and statistics through JMX.[6]
Load balancing
Hector provides two load balancing policies through the LoadBalancingPolicy interface. The default, RoundRobinBalancingPolicy, is a simple round-robin distribution algorithm. LeastActiveBalancingPolicy routes requests to the pools with the lowest number of active connections, ensuring a good spread of utilisation across the cluster.[7]
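
To make the difference between the two strategies concrete, here is a minimal sketch under stated assumptions. It does not use Hector's classes: `HostPoolSketch`, `BalancingSketch`, `RoundRobinSketch`, and `LeastActiveSketch` are hypothetical names, and a pool is reduced to a host name plus a counter of active connections.

```java
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a per-host connection pool. */
final class HostPoolSketch {
    final String host;
    final AtomicInteger active = new AtomicInteger();

    HostPoolSketch(String host) {
        this.host = host;
    }
}

/** Hypothetical policy interface: choose the pool that serves the next request. */
interface BalancingSketch {
    HostPoolSketch pick(List<HostPoolSketch> pools);
}

/** Cycles through the pools in list order. */
final class RoundRobinSketch implements BalancingSketch {
    private final AtomicInteger counter = new AtomicInteger();

    public HostPoolSketch pick(List<HostPoolSketch> pools) {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), pools.size());
        return pools.get(index);
    }
}

/** Picks the pool that currently has the fewest active connections. */
final class LeastActiveSketch implements BalancingSketch {
    public HostPoolSketch pick(List<HostPoolSketch> pools) {
        return pools.stream()
                .min(Comparator.comparingInt((HostPoolSketch p) -> p.active.get()))
                .orElseThrow(() -> new IllegalStateException("no pools"));
    }
}
```

Round-robin needs no knowledge of the pools' state, which keeps it cheap; the least-active variant reads a counter per pool on every request but adapts when one host is slower than the others.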
Pooling
The ExhaustedPolicy determines how the underlying client connection pools are controlled. Currently, three options are available:[8]
Type | Comment |
---|---|
WHEN_EXHAUSTED_FAIL | Fails acquisition when no more clients are available |
WHEN_EXHAUSTED_GROW | The pool is automatically increased to react to load increases |
WHEN_EXHAUSTED_BLOCK | Block on acquisition until a client becomes available (the default) |
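
The three behaviours in the table can be illustrated with a small pool that hands out integer client IDs. This is a sketch under stated assumptions rather than Hector's pool: `ExhaustedSketch` and `ClientPoolSketch` are hypothetical names, and clients are plain integers instead of real connections.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

/** Hypothetical mirror of the three exhausted-pool behaviours. */
enum ExhaustedSketch { WHEN_EXHAUSTED_FAIL, WHEN_EXHAUSTED_GROW, WHEN_EXHAUSTED_BLOCK }

/** Hypothetical pool; clients are represented by integer IDs. */
final class ClientPoolSketch {
    private final Deque<Integer> idle = new ArrayDeque<>();
    private final ExhaustedSketch policy;
    private int created;

    ClientPoolSketch(int initialSize, ExhaustedSketch policy) {
        this.policy = policy;
        for (int i = 0; i < initialSize; i++) {
            idle.push(created++);
        }
    }

    /** Hands out an idle client, applying the policy when none is idle. */
    synchronized int borrow() {
        while (idle.isEmpty()) {
            switch (policy) {
                case WHEN_EXHAUSTED_FAIL:
                    throw new NoSuchElementException("pool exhausted");
                case WHEN_EXHAUSTED_GROW:
                    return created++; // create an extra client on demand
                default: // WHEN_EXHAUSTED_BLOCK: wait until release() is called
                    try {
                        wait();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("interrupted while waiting", e);
                    }
            }
        }
        return idle.pop();
    }

    /** Returns a client to the pool and wakes any blocked borrowers. */
    synchronized void release(int client) {
        idle.push(client);
        notifyAll();
    }

    /** Total number of clients created so far. */
    synchronized int size() {
        return created;
    }
}
```

In this sketch, failing keeps resource use bounded at the cost of errors under load, growing trades memory and sockets for availability, and blocking applies back-pressure to callers.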
Code examples
As an example, the following implements a simple distributed hash table over Cassandra.

    /**
     * Insert a new value keyed by key
     * @param key Key for the value
     * @param value the String value to insert
     */
    public void insert(final String key, final String value) throws Exception {
        execute(new Command<Void>() {
            public Void execute(final Keyspace ks) throws Exception {
                ks.insert(key, createColumnPath(COLUMN_NAME), bytes(value));
                return null;
            }
        });
    }

    /**
     * Get a string value.
     * @return The string value; null if no value exists for the given key.
     */
    public String get(final String key) throws Exception {
        return execute(new Command<String>() {
            public String execute(final Keyspace ks) throws Exception {
                try {
                    return string(ks.getColumn(key, createColumnPath(COLUMN_NAME)).getValue());
                } catch (NotFoundException e) {
                    return null;
                }
            }
        });
    }

    /**
     * Delete a key from cassandra
     */
    public void delete(final String key) throws Exception {
        execute(new Command<Void>() {
            public Void execute(final Keyspace ks) throws Exception {
                ks.remove(key, createColumnPath(COLUMN_NAME));
                return null;
            }
        });
    }
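
The listing above calls an `execute` helper that is not shown. The following self-contained sketch illustrates how such a command runner might separate business logic from plumbing. Everything in it is an assumption made for illustration: `KeyspaceSketch` is an in-memory stand-in for a keyspace, `CommandSketch` and `CommandRunnerSketch` are hypothetical names, and checked exceptions are omitted for brevity.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a keyspace. */
final class KeyspaceSketch {
    private final Map<String, String> rows = new HashMap<>();

    void insert(String key, String value) { rows.put(key, value); }

    String get(String key) { return rows.get(key); }

    void remove(String key) { rows.remove(key); }
}

/** A unit of work to run against a keyspace. */
interface CommandSketch<T> {
    T execute(KeyspaceSketch ks);
}

/** Runs commands, keeping acquire/release bookkeeping out of the caller's code. */
final class CommandRunnerSketch {
    private final KeyspaceSketch keyspace = new KeyspaceSketch();
    int acquired; // counts simulated "borrow a client" steps
    int released; // counts simulated "return the client" steps

    <T> T execute(CommandSketch<T> command) {
        acquired++; // stands in for borrowing a client from a pool
        try {
            return command.execute(keyspace);
        } finally {
            released++; // always runs, even if the command throws
        }
    }
}
```

A caller would then write only the operation itself, for example `runner.execute(ks -> ks.get("k"))`, and leave resource handling to the runner.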
References
1. "Releases · hector-client/Hector". GitHub.
2. Ran Tavory. "Hector – a Java Cassandra client". PrettyPrint.me. Retrieved 2011-03-23.
   Out of the box Cassandra provides a raw thrift client, which is OK, but lacks many features essential to real world clients. I've built Hector to fill this gap.
   Here are the high level features of Hector, currently hosted at github.
   - A high-level object oriented interface to cassandra.
   - Failover support.
   - Connection pooling.
   - JMX support.
   - Support for the Command design pattern to allow clients to concentrate on their business logic and let hector take care of the required plumbing.
3. "Hector Client for Apache Cassandra: Encapsulation of Thrift API" (PDF). DataStax. Retrieved 2011-04-12.
   Hector now completely encapsulates the Thrift API so developers have to deal only with the Hector client using familiar design patterns. The original API is still available for existing users to transition their current projects as well as for those who are comfortable working with Thrift.
4. "Hector Client for Apache Cassandra: Fully Mavenized" (PDF). DataStax. Retrieved 2011-04-12.
   Since the beta release of Cassandra 0.7.0, Riptano has been offering maven repository access for dependencies required for Cassandra usage via Hector.
5. Ran Tavory. "Load balancing and improved failover in Hector". PrettyPrint.me. Retrieved 2011-03-23.
   I've added a very simple load balancing feature, as well as improved failover behavior to Hector. Hector is a Java Cassandra client, to read more about it please see my previous post Hector – a Java Cassandra client. In version 0.5.0-6 I added poor-man's load balancing as well as improved failover behavior.
6. "Hector Client for Apache Cassandra: Availability of Metrics" (PDF). DataStax. Retrieved 2011-04-12.
   To facilitate smoother operations and better awareness of performance characteristics, Hector exposes both availability counters and, optionally, performance statistics through JMX.
7. "Hector Client for Apache Cassandra: Basic Load Balancing" (PDF). DataStax. Retrieved 2011-04-12.
   Hector provides for pluggable load balancing through the LoadBalancingPolicy interface. Out of the box, two basic implementations are provided: LeastActiveBalancingPolicy (the default) and RoundRobinBalancingPolicy. LeastActiveBalancingPolicy routes requests to the pools with the lowest number of active connections. This ensures a good spread of utilization across the cluster by sending requests to the machine that has the fewest connections. RoundRobinBalancingPolicy implements a simple round-robin distribution algorithm.
8. "Hector Client for Apache Cassandra: Configuration of Pooling" (PDF). DataStax. Retrieved 2011-04-12.
   The behavior of the underlying pools of client connections can be controlled by the ExhaustedPolicy. […]