hiveserver2とFull GC

昔からhiveserver2はあんまり安定してない気がしますが、最近ちょっと不調でたまに下記のようなエラーが出て接続できないときがありました。

hiveserver2 log

2017-11-10 01:18:38,430 ERROR [CuratorFramework-0]: curator.ConnectionState (ConnectionState.java:checkTimeouts(201)) - Connection timed out for connection string (...) and timeout (15000) / elapsed (27689)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198)
        at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
        at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:793)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:779)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$400(CuratorFrameworkImpl.java:58)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:265)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
2017-11-10 01:20:20,112 INFO  [HiveServer2-Handler-Pool: Thread-7717854]: thrift.ThriftCLIService (ThriftCLIService.java:OpenSession(316)) - Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V1
2017-11-10 01:20:15,609 INFO  [main-SendThread(...)]: zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server ... Will not attempt to authenticate using SASL (unknown error)
2017-11-10 01:19:57,850 INFO  [org.apache.hadoop.hive.common.JvmPauseMonitor$Monitor@1b8aaeab]: common.JvmPauseMonitor (JvmPauseMonitor.java:run(193)) - Detected pause in JVM or host machine (eg GC): pause of approximately 4750ms
GC pool 'PS MarkSweep' had collection(s): count=3 time=16270ms
2017-11-10 01:19:47,440 WARN  [ResponseProcessor for block BP-1300916934-10.133.205.214-1498027028735:blk_1430709533_370841656]: hdfs.DFSClient (DFSOutputStream.java:run(952)) - DFSOutputStream ResponseProcessor exception  for block BP-1300916934-10.133.205.214-1498027028735:blk_1430709533_370841656
java.io.EOFException: Premature EOF: no length prefix available
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2464)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:843)

https://community.hortonworks.com/questions/51735/hiveserver-2-zookeeper-discovery-connection-timeou.html
と同じっぽいんですが、GCに時間がかかりZookeeperとのsessionがtimeoutしたのかなと。

確かにこの時間帯にmap数が1万を超えるでかいジョブがあって、なんでそんなmap数なのかというとfluent-plugin-webhdfsで書き込むときにappend noにしていてそのせいでファイル数が多いからですね。

append trueにするとArrayIndexOutOfBoundsExceptionがでることがあってnoにしてます。
http://community.cloudera.com/t5/Storage-Random-Access-HDFS/Cannot-get-a-file-on-HDFS-becouse-of-quot-java-lang/td-p/53169
https://issues.apache.org/jira/browse/HDFS-11797

困ったなあと思ってて、設定いじるのもいいけど、それより1回のバッチでどかんとやるよりこまめにバッチ回して1回で処理するファイル数を減らす方がいいだろうということでそうしました。
その結果Full GCも起きなくなってSTWもなくなりめでたしめでたし。

HDP 2.6だとデフォルトではgcログが取られないので、下記を追加して取るようにしました。

      if [ "$SERVICE" = "hiveserver2" ]; then
        export HADOOP_OPTS="$HADOOP_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:/var/log/hive/gc-${SERVICE}.log-`date +'%Y%m%d%H%M'`"
      fi

https://community.hortonworks.com/content/supportkb/49404/how-to-setup-gc-log-for-hiveserver2.html

バッチ変える前は下記のようにFull GCが発生してました。2回起きてますね。

2017-11-16T00:52:22.671+0900: 47249.284: [GC (Allocation Failure) [PSYoungGen: 6918137K->72176K(6012928K)] 7981039K->2023320K(8730624K), 0.5683705 secs] [Times: user=3.29 sys=12.50, real=0.57 secs]
2017-11-16T00:52:32.438+0900: 47259.050: [GC (Allocation Failure) [PSYoungGen: 6012912K->524788K(6465536K)] 7964056K->2807209K(9183232K), 0.1574991 secs] [Times: user=3.64 sys=0.65, real=0.15 secs]
2017-11-16T00:52:32.595+0900: 47259.208: [Full GC (Ergonomics) [PSYoungGen: 524788K->0K(6465536K)] [ParOldGen: 2282420K->2185571K(5553664K)] 2807209K->2185571K(12019200K), [Metaspace: 102986K->101087K(1150976K)], 4.4533082 secs] [Times: user=112.93 sys=0.78, real=4.46 secs]
2017-11-16T00:52:47.234+0900: 47273.847: [GC (Allocation Failure) [PSYoungGen: 5940736K->524790K(5734400K)] 8126307K->2964402K(11288064K), 0.1517864 secs] [Times: user=3.36 sys=0.82, real=0.15 secs]
2017-11-16T00:52:55.600+0900: 47282.213: [GC (Allocation Failure) [PSYoungGen: 5734390K->890342K(6099968K)] 8174002K->3661994K(11653632K), 0.2500997 secs] [Times: user=3.94 sys=3.00, real=0.25 secs]
2017-11-16T00:53:03.919+0900: 47290.532: [GC (Allocation Failure) [PSYoungGen: 6099942K->890364K(4796416K)] 8871594K->4367760K(10350080K), 0.3153721 secs] [Times: user=6.27 sys=2.49, real=0.31 secs]
2017-11-16T00:53:10.345+0900: 47296.958: [GC (Allocation Failure) [PSYoungGen: 4796412K->1405980K(5312512K)] 8273808K->4883376K(10866176K), 0.1721397 secs] [Times: user=4.61 sys=0.15, real=0.18 secs]
2017-11-16T00:53:16.567+0900: 47303.180: [GC (Allocation Failure) [PSYoungGen: 5312028K->1677820K(4380672K)] 8789424K->5401912K(9934336K), 0.2496912 secs] [Times: user=6.51 sys=0.42, real=0.25 secs]
2017-11-16T00:53:21.011+0900: 47307.623: [GC (Allocation Failure) [PSYoungGen: 4380668K->2038332K(4741632K)] 8104760K->5762424K(10295296K), 0.2283467 secs] [Times: user=6.27 sys=0.06, real=0.23 secs]
2017-11-16T00:57:50.867+0900: 47577.479: [GC (Allocation Failure) [PSYoungGen: 4741180K->2134990K(4578816K)] 8465272K->5907156K(10132480K), 0.9619241 secs] [Times: user=4.24 sys=22.59, real=0.96 secs]
2017-11-16T00:59:40.294+0900: 47686.906: [GC (Allocation Failure) [PSYoungGen: 4465102K->1578231K(4660224K)] 8237268K->5951856K(10213888K), 0.7729100 secs] [Times: user=6.00 sys=15.55, real=0.78 secs]
2017-11-16T00:59:47.430+0900: 47694.043: [GC (Allocation Failure) [PSYoungGen: 3908343K->651882K(4660224K)] 8281968K->6069434K(10213888K), 0.3658849 secs] [Times: user=7.25 sys=2.90, real=0.36 secs]
2017-11-16T00:59:47.796+0900: 47694.409: [Full GC (Ergonomics) [PSYoungGen: 651882K->441154K(4660224K)] [ParOldGen: 5417552K->5553660K(9667584K)] 6069434K->5994814K(14327808K), [Metaspace: 101091K->100992K(1150976K)], 11.4661588 secs] [Times: user=297.51 sys=2.60, real=11.47 secs]
2017-11-16T01:00:04.212+0900: 47710.825: [GC (Allocation Failure) [PSYoungGen: 2771266K->109925K(4660224K)] 8324926K->6103217K(14327808K), 0.2945256 secs] [Times: user=7.64 sys=0.53, real=0.30 secs]
2017-11-16T01:00:09.922+0900: 47716.534: [GC (Allocation Failure) [PSYoungGen: 2440037K->205643K(4660224K)] 8433329K->6206800K(14327808K), 0.1023913 secs] [Times: user=2.02 sys=0.72, real=0.10 secs]

GCViewerだとこんな感じ。日付も出してくれると嬉しい。。

Ambari Metricsだとこんな感じ。上がヒープ使用量で下がGCによる停止時間です。


Heap Usageの黄色のグラフがMemHeapUsedなんですけど、これってallocateじゃないかなあ。
あと粒度が5分おきでかつ停止時間の数値が累計なのでちょっとわかりにくい。