HBase
HBase is a clone of Google’s Bigtable, originally created for use with Hadoop (it’s actually a subproject of the Apache Hadoop project). In the way that Google’s Bigtable uses the Google File System (GFS), HBase provides database capabilities for Hadoop, allowing you to use it as a source or sink for MapReduce jobs. Unlike some other columnar databases that provide eventual consistency, HBase is strongly consistent.
Perhaps it is interesting to note that Microsoft is a contributor to HBase, following their acquisition of Powerset.
- Website: http://hbase.apache.org
- Orientation: Columnar
- Created: HBase was created at Powerset in 2007 and later donated to Apache.
- Implementation language: Java
- Distributed: Yes. You can run HBase in standalone, pseudodistributed, or fully distributed mode. Pseudodistributed mode means that you have several instances of HBase, but they’re all running on the same host.
- Storage: HBase provides Bigtable-like capabilities on top of the Hadoop File System.
- Schema: HBase supports unstructured and partially structured data. To do so, data is organized into column families (a term that appears in discussions of Apache Cassandra). You address an individual record, called a “cell” in HBase, with a combination of row key, column family, cell qualifier, and timestamp. As opposed to RDBMS, in which you must define your table well in advance, with HBase you can simply name a column family and then allow the cell qualifiers to be determined at runtime. This lets you be very flexible and supports an agile approach to development.
- Client: You can interact with HBase via Thrift, a RESTful service gateway, Protobuf (see “Additional Features” below), or an extensible JRuby shell.
- Open source: Yes (Apache License)
- Production use: HBase has been used at Adobe since 2008. It is also used at Twitter, Mahalo, StumbleUpon, Ning, Hulu, World Lingo, Detikcom in Indonesia, and Yahoo!.
- Additional features: Because HBase is part of the Hadoop project, it features tight integration with Hadoop. There is a set of convenience classes that allow you to easily execute MapReduce jobs using HBase as the backing data store.
No comments:
Post a Comment