Hive 0.11で遭遇した問題をメモっておく

Hive 0.11.0へのupgrageを行うため開発環境でいろいろ試しているのだがすんなりとはいかなかった部分があるのでメモっておく。

問題が再現する最小のクエリを準備するのが面倒なのでそこは略してスタックトレースだけのっけときます。


HiveとHBaseを連携した状態でHBaseのデータをHiveから外部テーブルとして扱いかつjoinがあるようなクエリで下記のようなエラーが発生した。

org.apache.hadoop.hive.ql.parse.SemanticException: Big Table Alias is null
        at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.genMapJoinLocalWork(MapJoinProcessor.java:219)
        at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.genMapJoinOpAndLocalWork(MapJoinProcessor.java:244)
        at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.convertTaskToMapJoinTask(CommonJoinResolver.java:303)
        at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.processCurrentTask(CommonJoinResolver.java:552)
        at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.dispatch(CommonJoinResolver.java:743)
        at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
        at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194)
        at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139)
        at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:112)
        at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8399)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8741)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:707)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
org.apache.hadoop.hive.ql.parse.SemanticException: Generate New MapJoin Opertor Exeception Big Table Alias is null
        at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.genMapJoinOpAndLocalWork(MapJoinProcessor.java:254)
        at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.convertTaskToMapJoinTask(CommonJoinResolver.java:303)
        at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.processCurrentTask(CommonJoinResolver.java:552)
        at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.dispatch(CommonJoinResolver.java:743)
        at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
        at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194)
        at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139)
        at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:112)
        at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8399)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8741)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:707)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
FAILED: SemanticException Generate Map Join Task Error: Generate New MapJoin Opertor Exeception Big Table Alias is null

スタックトレースだけなら
https://issues.apache.org/jira/browse/HIVE-4342
と同じっぽいけど別にUNIONつかったクエリじゃなくて遭遇した。ていうかHive 0.11でなおってるはずだし。

Hiveのソース読むとhive.auto.convert.join.noconditionaltaskをfalseにすればエラーになった箇所を通らなそうなのでやってみたらうまくいった。


これもHBaseがらみでinsert〜selectしたら下記のエラーが出た。

java.lang.NegativeArraySizeException: -1
        at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:148)
        at org.apache.hadoop.hbase.mapreduce.TableSplit.readFields(TableSplit.java:133)
        at org.apache.hadoop.hive.hbase.HBaseSplit.readFields(HBaseSplit.java:53)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:150)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
        at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:396)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)

https://issues.apache.org/jira/browse/HIVE-4520
が該当のJIRAっぽい。

Hiveと連携するHBaseクライアントのバージョンが0.94.4だと発生したけどHBase 0.94.9だと発生しなかった。


以下はHBaseと関係無くてjoinのなかにサブクエリを書くと発生した。

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:162)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:198)
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:212)
        at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1377)
        at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1381)
        at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1381)
        at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1381)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:611)
        at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
        ... 8 more
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:186)
        ... 15 more

スタックトレースだけなら
https://issues.apache.org/jira/browse/HIVE-4650
と同じ。

別にHortonWorksを使っているわけじゃないけど遭遇した。

JIRAのコメントを見るとhive.auto.convert.join = falseにすると大丈夫だったとあったので試したら確かに大丈夫だった。


どうもHive 0.11でmapjoinを自動的にやるようにオプションが変わっているっぽいのだが、そこでことごとく問題に遭遇している感じ。