HBase 0.94 apparently adds a whole lot of metrics
The relevant JIRA issue is below.
https://issues.apache.org/jira/browse/HBASE-5533
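Monitoring HBase over JMX is, I think, common practice.
The "horse book" (HBase: The Definitive Guide) also covers it in section 10.4, JMX. You edit hbase-env.sh to open a JMX port, then connect to that port to fetch the various metrics.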
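As a rough sketch of that hbase-env.sh change, the fragment below follows the commented-out JMX lines that ship in the stock hbase-env.sh. The port numbers 10101 and 10102 are just example values, and turning off SSL and authentication is an assumption that only makes sense on a trusted test network.

```shell
# Sketch of JMX settings for hbase-env.sh (ports are example values).
# SSL and authentication are disabled here for brevity; do not expose this
# on an untrusted network.
export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"

# Give the master and the region server their own JMX ports.
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"
```

After restarting the daemons, the port chosen for the region server is the one to put into the JMX URL passed to the monitoring tool.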
It introduces a tool called JMXToolkit for fetching the metrics.
Getting the source, building it, and printing the help looks like this.
$ git clone https://github.com/larsgeorge/jmxtoolkit.git
$ cd jmxtoolkit/
$ ant
$ java -jar build/hbase-0.20.4-dev-jmxtoolkit.jar -h
Usage: JMXToolkit [-a <action>] [-c <user>] [-p <password>] [-u url] [-f <config>] [-o <object>] [-e regexp] [-i <extends>] [-q <attr-oper>] [-w <check>] [-m <message>] [-x] [-l] [-v] [-h]
  -a <action>     Action to perform, can be one of the following (default: query)
                    create   Scan a JMX object for available attributes
                    query    Query a set of attributes from the given objects
                    check    Checks a given value to be in a valid range (see -w below)
                    encode   Helps creating the encoded messages (see -m and -w below)
                    walk     Walk the entire remote object list
  -c <user>       The user role to authenticate with (default: controlRole)
  -p <password>   The password to authenticate with (default: password)
  -u <url>        The JMX URL (default: service:jmx:rmi:///jndi/rmi://localhost:10001/jmxrmi)
  -f <config>     The config file to use (default: none)
  -o <object>     The JMX object query (default: none)
  -e <regexp>     The regular expression to match (default: none)
  -i <extends>    The name of the object that is inherited from (default: none)
  -q <attr-oper>  The attribute or operation to query (default: none)
  -w <check>      Used with -a check to define thresholds (default: none)
                  Format: <ok-exitcode>[:<ok-message>] \
                          |<warn-exitcode>:[<warn-message>]:<warn-value>[:<warn-comparator>] \
                          [|<error-exitcode>:[<error-message>]:<error-value>[:<error-comparator>]]
                  Example for Nagios and DFS used (in %):
                          0:OK%3A%20%7B0%7D|2:WARN%3A%20%7B0%7D:80:>=|1:FAIL%3A%20%7B0%7D:95:>
                  Notes: Messages are URL-encoded to allow for any character being used.
                         The current value can be placed with {0} in the message.
                         Allowed comparators: <,<=,=,==,>=,>
  -m <message>    The message to encode for further use (default: none)
  -x              Output config to console (do not write back to -f <config>)
  -l              Ignore missing attributes, do not throw an error
  -v              Verbose output
  -h              Prints this help
Fetching an HBase region server's metrics with jmxtoolkit looks like this.
$ java -jar hbase-0.20.4-dev-jmxtoolkit.jar -u "service:jmx:rmi:///jndi/rmi://<hostname>:<port>/jmxrmi" -o "hadoop:name=RegionServerStatistics,service=RegionServer" | tr ' ' '\n'
totalStaticBloomSizeKB:0
totalStaticIndexSizeKB:0
blockCacheFree:415338528
compactionSizeNumOps:0
compactionSizeAvgTime:0
compactionSizeMinTime:-1
compactionSizeMaxTime:0
numPutsWithoutWAL:0
memstoreSizeMB:0
regions:1
blockCacheCount:0
blockCacheHitRatio:0
flushQueueSize:0
atomicIncrementTimeNumOps:0
atomicIncrementTimeAvgTime:0
atomicIncrementTimeMinTime:-1
atomicIncrementTimeMaxTime:0
fsReadLatencyNumOps:0
fsReadLatencyAvgTime:0
fsReadLatencyMinTime:-1
fsReadLatencyMaxTime:0
blockCacheHitCachingRatio:0
blockCacheHitCount:0
hdfsBlocksLocalityIndex:100
mbInMemoryWithoutWAL:0
writeRequestsCount:0
slowHLogAppendTimeNumOps:0
slowHLogAppendTimeAvgTime:0
slowHLogAppendTimeMinTime:-1
slowHLogAppendTimeMaxTime:0
compactionTimeNumOps:0
compactionTimeAvgTime:0
compactionTimeMinTime:-1
compactionTimeMaxTime:0
fsWriteLatencyNumOps:0
fsWriteLatencyAvgTime:0
fsWriteLatencyMinTime:-1
fsWriteLatencyMaxTime:0
blockCacheSize:3436512
readRequestsCount:0
rootIndexSizeKB:0
blockCacheHitRatioPastNPeriods:0
fsPreadLatencyNumOps:0
fsPreadLatencyAvgTime:0
fsPreadLatencyMinTime:-1
fsPreadLatencyMaxTime:0
flushTimeNumOps:0
flushTimeAvgTime:0
flushTimeMinTime:-1
flushTimeMaxTime:0
checksumFailuresCount:0
blockCacheMissCount:0
blockCacheHitCachingRatioPastNPeriods:0
slowHLogAppendCount:0
fsWriteSizeNumOps:0
fsWriteSizeAvgTime:0
fsWriteSizeMinTime:-1
fsWriteSizeMaxTime:0
storefiles:1
regionSplitFailureCount:0
blockCacheEvictedCount:0
storefileIndexSizeMB:0
fsSyncLatencyNumOps:0
fsSyncLatencyAvgTime:0
fsSyncLatencyMinTime:-1
fsSyncLatencyMaxTime:0
stores:1
compactionQueueSize:0
flushSizeNumOps:0
flushSizeAvgTime:0
flushSizeMinTime:-1
flushSizeMaxTime:0
regionSplitSuccessCount:0
fsWriteLatencyHistogram_num_ops:0
fsWriteLatencyHistogram_min:0
fsWriteLatencyHistogram_max:0
fsWriteLatencyHistogram_mean:0.0
fsWriteLatencyHistogram_std_dev:0.0
fsWriteLatencyHistogram_median:0.0
fsWriteLatencyHistogram_75th_percentile:0.0
fsWriteLatencyHistogram_95th_percentile:0.0
fsWriteLatencyHistogram_99th_percentile:0.0
requests:0.0
fsPreadLatencyHistogram_num_ops:0
fsPreadLatencyHistogram_min:0
fsPreadLatencyHistogram_max:0
fsPreadLatencyHistogram_mean:0.0
fsPreadLatencyHistogram_std_dev:0.0
fsPreadLatencyHistogram_median:0.0
fsPreadLatencyHistogram_75th_percentile:0.0
fsPreadLatencyHistogram_95th_percentile:0.0
fsPreadLatencyHistogram_99th_percentile:0.0
fsReadLatencyHistogram_num_ops:0
fsReadLatencyHistogram_min:0
fsReadLatencyHistogram_max:0
fsReadLatencyHistogram_mean:0.0
fsReadLatencyHistogram_std_dev:0.0
fsReadLatencyHistogram_median:0.0
fsReadLatencyHistogram_75th_percentile:0.0
fsReadLatencyHistogram_95th_percentile:0.0
fsReadLatencyHistogram_99th_percentile:0.0
updatesBlockedSeconds_num_ops:31482
updatesBlockedSeconds_min:0
updatesBlockedSeconds_max:0
updatesBlockedSeconds_mean:0.0
updatesBlockedSeconds_std_dev:0.0
updatesBlockedSeconds_median:0.0
updatesBlockedSeconds_75th_percentile:0.0
updatesBlockedSeconds_95th_percentile:0.0
updatesBlockedSeconds_99th_percentile:0.0
updatesBlockedSecondsHighWater_num_ops:31482
updatesBlockedSecondsHighWater_min:0
updatesBlockedSecondsHighWater_max:0
updatesBlockedSecondsHighWater_mean:0.0
updatesBlockedSecondsHighWater_std_dev:0.0
updatesBlockedSecondsHighWater_median:0.0
updatesBlockedSecondsHighWater_75th_percentile:0.0
updatesBlockedSecondsHighWater_95th_percentile:0.0
updatesBlockedSecondsHighWater_99th_percentile:0.0
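Since the output is one name:value pair per metric, plain text tools can cut it down to a handful of attributes. The sketch below keeps only the 95th and 99th percentile entries of the file system latency histograms. The function name latency_percentiles is made up for this example and is not part of JMXToolkit, and which metrics are actually worth watching is a separate question.

```shell
# Hypothetical helper (not part of JMXToolkit): read "name:value" lines on
# stdin and keep only the 95th/99th percentile latency histogram entries.
latency_percentiles() {
  grep -E '^fs(Read|Pread|Write)LatencyHistogram_(95th|99th)_percentile:'
}

# Example with a few lines copied from the output above.
printf '%s\n' \
  'regions:1' \
  'fsReadLatencyHistogram_median:0.0' \
  'fsReadLatencyHistogram_95th_percentile:0.0' \
  'fsWriteLatencyHistogram_99th_percentile:0.0' | latency_percentiles
```

In practice the function would sit at the end of the pipeline, after the tr ' ' '\n' step.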
There are so many that I have no idea what to monitor or how (sweat). Histograms and percentiles show up too, as in fsReadLatencyHistogram_95th_percentile, so I guess I need to study some statistics...
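As a first step on the statistics side: a 95th percentile is roughly the value that 95% of the observed samples fall at or below. The sketch below computes it with the simple nearest-rank method. The helper name percentile is made up for this example, and HBase's own histogram code may well compute its percentiles differently (for instance with sampling or interpolation), so treat this only as an illustration of the idea.

```shell
# Hypothetical helper: nearest-rank percentile of the numbers on stdin
# (one per line). The argument is an integer percentile from 1 to 100.
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }                      # remember the sorted values
    END {
      if (NR == 0) exit 1               # no input, nothing to report
      rank = int((p * NR + 99) / 100)   # ceil(p * NR / 100) for integer p
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# With the values 1..100, the 95th percentile by nearest rank is 95.
seq 1 100 | percentile 95
```

Unlike an average, a high percentile is not pulled down by the many fast operations, which is presumably why the latency metrics are exposed this way.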