
HBase Read Merge


Last updated 5 years ago


The KeyValue cells corresponding to one row can be in multiple places: row cells already persisted are in HFiles, recently updated cells are in the MemStore, and recently read cells are in the BlockCache. So when you read a row, how does the system get the corresponding cells to return? A read merges KeyValues from the BlockCache, MemStore, and HFiles in the following steps:

  1. First, the scanner looks for the row cells in the BlockCache, the read cache. Recently read KeyValues are cached here, and the least recently used ones are evicted when memory is needed.

  2. Next, the scanner looks in the MemStore, the in-memory write cache containing the most recent writes.

  3. If the scanner does not find all of the row cells in the MemStore and BlockCache, HBase uses the BlockCache indexes and bloom filters to load into memory the HFiles that may contain the target row cells.
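The three steps above can be sketched as a toy model in Python. This is only an illustration under stated assumptions, not HBase's actual implementation: `Cell`, `HFile`, `might_contain`, `merge_newest`, and `read_row` are hypothetical names, the bloom filter is replaced by an exact membership check (a real bloom filter can report false positives), and the newest timestamp is assumed to win when the same column appears in more than one place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    timestamp: int
    value: str


@dataclass
class HFile:
    cells: list

    def might_contain(self, row):
        # Stand-in for a bloom filter: exact here, probabilistic in practice.
        return any(c.row == row for c in self.cells)


def merge_newest(result, cells, row):
    """Merge cells of `row` into `result`, keeping the newest cell per column."""
    for c in cells:
        if c.row != row:
            continue
        current = result.get(c.column)
        if current is None or c.timestamp > current.timestamp:
            result[c.column] = c


def read_row(row, columns, block_cache, memstore, hfiles):
    """Return ({column: value}, number of HFiles loaded) for the requested columns."""
    result = {}
    merge_newest(result, block_cache, row)   # step 1: read cache
    merge_newest(result, memstore, row)      # step 2: write cache
    files_loaded = 0
    # Step 3: only go to HFiles if some requested column is still missing.
    if not all(col in result for col in columns):
        for hfile in hfiles:
            if hfile.might_contain(row):
                files_loaded += 1
                merge_newest(result, hfile.cells, row)
    values = {col: c.value for col, c in result.items() if col in columns}
    return values, files_loaded


block_cache = [Cell("r1", "cf:a", 1, "old-a")]
memstore = [Cell("r1", "cf:a", 3, "new-a")]
hfiles = [
    HFile([Cell("r1", "cf:b", 2, "b")]),
    HFile([Cell("r2", "cf:a", 1, "x")]),
]
values, files_loaded = read_row("r1", ["cf:a", "cf:b"], block_cache, memstore, hfiles)
print(values, files_loaded)
```

In this example the MemStore value for `cf:a` replaces the older cached one, and only the first HFile is loaded because the second contains no cells for `r1`.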

There may be many HFiles per MemStore, so a single read may have to examine multiple files, which can hurt performance. This is called read amplification.
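To make read amplification concrete, the following minimal sketch counts how many files a read has to examine. It assumes each HFile is modeled simply as the set of row keys it holds, and that bloom filters let the read skip files that do not contain the row; `files_examined` and `compact` are hypothetical helpers, with `compact` standing in for what a compaction achieves by merging files.

```python
def files_examined(hfile_row_sets, row, use_bloom=True):
    """Count the HFiles a read for `row` has to open."""
    if not use_bloom:
        # Without a filter, every file is a candidate.
        return len(hfile_row_sets)
    return sum(1 for rows in hfile_row_sets if row in rows)


def compact(hfile_row_sets):
    """Merge all files into one, as a compaction would."""
    return [set().union(*hfile_row_sets)]


hfile_rows = [{"r1", "r2"}, {"r1", "r3"}, {"r1"}, {"r4"}]
before = files_examined(hfile_rows, "r1")
without_bloom = files_examined(hfile_rows, "r1", use_bloom=False)
after = files_examined(compact(hfile_rows), "r1")
print(before, without_bloom, after)
```

Here the row `r1` is spread over three of the four files, so the read touches three files with the filter and four without it; once the files are merged, it touches one.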