HBase 0.94 apparently adds a whole lot of new metrics

The relevant JIRA issue is here:
https://issues.apache.org/jira/browse/HBASE-5533

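Monitoring HBase over JMX is, I think, a fairly common practice.
The horse book (HBase: The Definitive Guide) also covers it in section 10.4, "JMX". You edit hbase-env.sh to open a JMX port, then connect to that port to fetch the various metrics.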
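
For reference, the hbase-env.sh change looks roughly like the following. This is a sketch modeled on the commented-out JMX lines that ship in hbase-env.sh. The port number 10102 and the choice to turn off SSL and authentication are assumptions for a throwaway test cluster, not settings taken from my environment.

```shell
# Common JMX options: no SSL and no authentication.
# (Assumption: acceptable only on a closed test network.)
export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"

# Open a JMX port on the region server JVM (10102 is an example value).
export HBASE_REGIONSERVER_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"
```

The region server has to be restarted to pick this up. The port chosen here is the one that goes into the JMX URL passed to the tool.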

The book introduces a tool called JMXToolkit for fetching the metrics.
Cloning the source, building it, and printing the help looks like this.

$ git clone https://github.com/larsgeorge/jmxtoolkit.git
$ cd jmxtoolkit/
$ ant
$ java -jar build/hbase-0.20.4-dev-jmxtoolkit.jar -h
Usage: JMXToolkit [-a <action>] [-c <user>] [-p <password>] [-u url] [-f <config>] [-o <object>]
 [-e regexp] [-i <extends>] [-q <attr-oper>] [-w <check>] [-m <message>] [-x] [-l] [-v] [-h]

	-a <action>	Action to perform, can be one of the following (default: query)

			create	Scan a JMX object for available attributes
			query	Query a set of attributes from the given objects
			check	Checks a given value to be in a valid range (see -w below)
			encode	Helps creating the encoded messages (see -m and -w below)
			walk	Walk the entire remote object list

	-c <user>	The user role to authenticate with (default: controlRole)
	-p <password>	The password to authenticate with (default: password)
	-u <url>	The JMX URL (default: service:jmx:rmi:///jndi/rmi://localhost:10001/jmxrmi)
	-f <config>	The config file to use (default: none)
	-o <object>	The JMX object query (default: none)
	-e <regexp>	The regular expression to match (default: none)
	-i <extends>	The name of the object that is inherited from (default: none)
	-q <attr-oper>	The attribute or operation to query (default: none)
	-w <check>	Used with -a check to define thresholds (default: none)

		Format: <ok-exitcode>[:<ok-message>] \
		        |<warn-exitcode>:[<warn-message>]:<warn-value>[:<warn-comparator>] \
		        [|<error-exitcode>:[<error-message>]:<error-value>[:<error-comparator>]]

		Example for Nagios and DFS used (in %):

		        0:OK%3A%20%7B0%7D|2:WARN%3A%20%7B0%7D:80:>=|1:FAIL%3A%20%7B0%7D:95:>

		Notes: Messages are URL-encoded to allow for any character being used. The current value
		       can be placed with {0} in the message. Allowed comparators: <,<=,=,==,>=,>

	-m <message>	The message to encode for further use (default: none)
	-x		Output config to console (do not write back to -f <config>)
	-l		Ignore missing attributes, do not throw an error
	-v		Verbose output
	-h		Prints this help

Fetching the metrics of an HBase region server with jmxtoolkit looks like this.

$ java -jar hbase-0.20.4-dev-jmxtoolkit.jar -u "service:jmx:rmi:///jndi/rmi://<hostname>:<port>/jmxrmi" -o "hadoop:name=RegionServerStatistics,service=RegionServer" | tr ' ' '\n'
totalStaticBloomSizeKB:0
totalStaticIndexSizeKB:0
blockCacheFree:415338528
compactionSizeNumOps:0
compactionSizeAvgTime:0
compactionSizeMinTime:-1
compactionSizeMaxTime:0
numPutsWithoutWAL:0
memstoreSizeMB:0
regions:1
blockCacheCount:0
blockCacheHitRatio:0
flushQueueSize:0
atomicIncrementTimeNumOps:0
atomicIncrementTimeAvgTime:0
atomicIncrementTimeMinTime:-1
atomicIncrementTimeMaxTime:0
fsReadLatencyNumOps:0
fsReadLatencyAvgTime:0
fsReadLatencyMinTime:-1
fsReadLatencyMaxTime:0
blockCacheHitCachingRatio:0
blockCacheHitCount:0
hdfsBlocksLocalityIndex:100
mbInMemoryWithoutWAL:0
writeRequestsCount:0
slowHLogAppendTimeNumOps:0
slowHLogAppendTimeAvgTime:0
slowHLogAppendTimeMinTime:-1
slowHLogAppendTimeMaxTime:0
compactionTimeNumOps:0
compactionTimeAvgTime:0
compactionTimeMinTime:-1
compactionTimeMaxTime:0
fsWriteLatencyNumOps:0
fsWriteLatencyAvgTime:0
fsWriteLatencyMinTime:-1
fsWriteLatencyMaxTime:0
blockCacheSize:3436512
readRequestsCount:0
rootIndexSizeKB:0
blockCacheHitRatioPastNPeriods:0
fsPreadLatencyNumOps:0
fsPreadLatencyAvgTime:0
fsPreadLatencyMinTime:-1
fsPreadLatencyMaxTime:0
flushTimeNumOps:0
flushTimeAvgTime:0
flushTimeMinTime:-1
flushTimeMaxTime:0
checksumFailuresCount:0
blockCacheMissCount:0
blockCacheHitCachingRatioPastNPeriods:0
slowHLogAppendCount:0
fsWriteSizeNumOps:0
fsWriteSizeAvgTime:0
fsWriteSizeMinTime:-1
fsWriteSizeMaxTime:0
storefiles:1
regionSplitFailureCount:0
blockCacheEvictedCount:0
storefileIndexSizeMB:0
fsSyncLatencyNumOps:0
fsSyncLatencyAvgTime:0
fsSyncLatencyMinTime:-1
fsSyncLatencyMaxTime:0
stores:1
compactionQueueSize:0
flushSizeNumOps:0
flushSizeAvgTime:0
flushSizeMinTime:-1
flushSizeMaxTime:0
regionSplitSuccessCount:0
fsWriteLatencyHistogram_num_ops:0
fsWriteLatencyHistogram_min:0
fsWriteLatencyHistogram_max:0
fsWriteLatencyHistogram_mean:0.0
fsWriteLatencyHistogram_std_dev:0.0
fsWriteLatencyHistogram_median:0.0
fsWriteLatencyHistogram_75th_percentile:0.0
fsWriteLatencyHistogram_95th_percentile:0.0
fsWriteLatencyHistogram_99th_percentile:0.0
requests:0.0
fsPreadLatencyHistogram_num_ops:0
fsPreadLatencyHistogram_min:0
fsPreadLatencyHistogram_max:0
fsPreadLatencyHistogram_mean:0.0
fsPreadLatencyHistogram_std_dev:0.0
fsPreadLatencyHistogram_median:0.0
fsPreadLatencyHistogram_75th_percentile:0.0
fsPreadLatencyHistogram_95th_percentile:0.0
fsPreadLatencyHistogram_99th_percentile:0.0
fsReadLatencyHistogram_num_ops:0
fsReadLatencyHistogram_min:0
fsReadLatencyHistogram_max:0
fsReadLatencyHistogram_mean:0.0
fsReadLatencyHistogram_std_dev:0.0
fsReadLatencyHistogram_median:0.0
fsReadLatencyHistogram_75th_percentile:0.0
fsReadLatencyHistogram_95th_percentile:0.0
fsReadLatencyHistogram_99th_percentile:0.0
updatesBlockedSeconds_num_ops:31482
updatesBlockedSeconds_min:0
updatesBlockedSeconds_max:0
updatesBlockedSeconds_mean:0.0
updatesBlockedSeconds_std_dev:0.0
updatesBlockedSeconds_median:0.0
updatesBlockedSeconds_75th_percentile:0.0
updatesBlockedSeconds_95th_percentile:0.0
updatesBlockedSeconds_99th_percentile:0.0
updatesBlockedSecondsHighWater_num_ops:31482
updatesBlockedSecondsHighWater_min:0
updatesBlockedSecondsHighWater_max:0
updatesBlockedSecondsHighWater_mean:0.0
updatesBlockedSecondsHighWater_std_dev:0.0
updatesBlockedSecondsHighWater_median:0.0
updatesBlockedSecondsHighWater_75th_percentile:0.0
updatesBlockedSecondsHighWater_95th_percentile:0.0
updatesBlockedSecondsHighWater_99th_percentile:0.0
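
With this many lines it helps to narrow the output down to a few metrics. Below is a minimal sketch of a filter for the `name:value` output shown above. `filter_metrics` is a hypothetical helper name of my own, not part of JMXToolkit, and the sample input is just three lines copied from the output.

```shell
# Hypothetical helper: read "name:value" pairs (space- or newline-separated)
# from stdin and print only those whose NAME matches the given regex.
filter_metrics() {
  tr ' ' '\n' | awk -F: -v pat="$1" '$1 ~ pat { print $1, $2 }'
}

# Sample input taken from the output above; in practice, pipe the
# JMXToolkit output into filter_metrics instead of using printf.
printf '%s\n' \
  'fsReadLatencyHistogram_mean:0.0' \
  'fsReadLatencyHistogram_95th_percentile:0.0' \
  'regions:1' | filter_metrics '_(95|99)th_percentile$'
```

The regex is matched against the metric name only, so the `$` anchor works even though each line ends with the value.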

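There are so many that I have no idea what to monitor or how (sweat). On top of that, histograms and percentiles show up, as in fsReadLatencyHistogram_95th_percentile, so it looks like I need to study some statistics...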
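
As a note to self on what a percentile means, here is a small sketch using the nearest-rank method: sort the samples and take the value at rank ceil(p/100 * N). This is only an illustration of the concept. I have not checked how HBase computes its histogram values internally, and the latency numbers below are made up.

```shell
# Nearest-rank percentile. Reads one number per line from stdin and
# prints the value at rank ceil(p * N / 100) of the sorted samples.
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1          # no samples, nothing to report
      rank = p * NR / 100
      idx = int(rank)
      if (idx < rank) idx++        # round up (ceil)
      if (idx < 1) idx = 1
      print v[idx]
    }'
}

# Ten made-up read latencies in milliseconds, with one slow outlier.
printf '%s\n' 12 15 11 14 13 250 16 12 13 14 | percentile 95
```

For these ten samples the 95th percentile is 250 while the median (`percentile 50`) is 13, which is presumably why the high percentiles are worth watching: they expose slow outliers that the middle of the distribution hides.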