Delayed execution of EXPLAIN queries in Prestosql clusters

I have two kinds of Prestosql clusters: one on AWS instances and one on Kubernetes. Prestosql on K8s has a strange problem with EXPLAIN queries: they take about 2-3 minutes, compared with 2-3 seconds on the instance-based cluster.

The query stays in WAITING_FOR_RESOURCES for about 2 minutes and then executes very quickly. There is also an exception in the server logs:

2020-12-23T05:25:01.930Z    ERROR   Query-20201223_052431_00004_pxqak-276   io.prestosql.cost.CachingStatsProvider  Error occurred when computing stats for query 20201223_052431_00004_pxqak
io.prestosql.spi.PrestoException: HIVE_METASTORE_ERROR
    at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastore.getMetastorePartitionColumnStatistics(ThriftHiveMetastore.java:461)
    at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastore.getPartitionColumnStatistics(ThriftHiveMetastore.java:438)
    at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastore.getPartitionStatistics(ThriftHiveMetastore.java:389)
    at io.prestosql.plugin.hive.metastore.thrift.BridgingHiveMetastore.getPartitionStatistics(BridgingHiveMetastore.java:110)
    at io.prestosql.plugin.hive.metastore.cache.CachingHiveMetastore.lambda$loadPartitionColumnStatistics$6(CachingHiveMetastore.java:360)
    at java.base/java.lang.Iterable.forEach(Iterable.java:75)
    at io.prestosql.plugin.hive.metastore.cache.CachingHiveMetastore.loadPartitionColumnStatistics(CachingHiveMetastore.java:353)
    at io.prestosql.plugin.hive.metastore.cache.CachingHiveMetastore.access$100(CachingHiveMetastore.java:89)
    at io.prestosql.plugin.hive.metastore.cache.CachingHiveMetastore$1.loadAll(CachingHiveMetastore.java:179)
    at com.google.common.cache.CacheLoader$1.loadAll(CacheLoader.java:207)
    at io.prestosql.cost.JoinStatsRule.doCalculate(JoinStatsRule.java:81)
    at io.prestosql.cost.JoinStatsRule.doCalculate(JoinStatsRule.java:48)
    at io.prestosql.cost.SimpleStatsRule.calculate(SimpleStatsRule.java:39)
    at io.prestosql.cost.ComposableStatsCalculator.calculateStats(ComposableStatsCalculator.java:82)
    at io.prestosql.cost.ComposableStatsCalculator.calculateStats(ComposableStatsCalculator.java:70)
    at io.prestosql.cost.CachingStatsProvider.getGroupStats(CachingStatsProvider.java:103)
    at io.prestosql.cost.CachingStatsProvider.getStats(CachingStatsProvider.java:72)
    at io.prestosql.cost.JoinStatsRule.doCalculate(JoinStatsRule.java:81)
    at io.prestosql.cost.JoinStatsRule.doCalculate(JoinStatsRule.java:48)
    at io.prestosql.cost.SimpleStatsRule.calculate(SimpleStatsRule.java:39)
    at io.prestosql.cost.ComposableStatsCalculator.calculateStats(ComposableStatsCalculator.java:82)
    at io.prestosql.cost.ComposableStatsCalculator.calculateStats(ComposableStatsCalculator.java:70)
    at io.prestosql.cost.CachingStatsProvider.getGroupStats(CachingStatsProvider.java:103)
    at io.prestosql.cost.CachingStatsProvider.getStats(CachingStatsProvider.java:72)
    at io.prestosql.cost.CostCalculatorWithEstimatedExchanges.calculateJoinExchangeCost(CostCalculatorWithEstimatedExchanges.java:233)
    at io.prestosql.cost.CostCalculatorWithEstimatedExchanges.calculateJoinCostWithoutOutput(CostCalculatorWithEstimatedExchanges.java:208)
    at io.prestosql.sql.planner.iterative.rule.DetermineJoinDistributionType.getJoinNodeWithCost(DetermineJoinDistributionType.java:180)
    at io.prestosql.sql.planner.iterative.rule.DetermineJoinDistributionType.addJoinsWithDifferentDistributions(DetermineJoinDistributionType.java:116)
    at io.prestosql.sql.planner.iterative.rule.DetermineJoinDistributionType.getCostBasedJoin(DetermineJoinDistributionType.java:98)
    at io.prestosql.sql.planner.iterative.rule.DetermineJoinDistributionType.apply(DetermineJoinDistributionType.java:74)
    at io.prestosql.sql.planner.iterative.rule.DetermineJoinDistributionType.apply(DetermineJoinDistributionType.java:49)
    at io.prestosql.sql.planner.iterative.IterativeOptimizer.transform(IterativeOptimizer.java:165)
    at io.prestosql.sql.planner.iterative.IterativeOptimizer.exploreNode(IterativeOptimizer.java:140)
    at io.prestosql.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:105)
    at io.prestosql.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:190)
    at com.google.common.cache.LocalCache.loadAll(LocalCache.java:4058)
    at com.google.common.cache.LocalCache.getAll(LocalCache.java:4021)
    at com.google.common.cache.LocalCache$LocalLoadingCache.getAll(LocalCache.java:4972)
    at io.prestosql.plugin.hive.metastore.cache.CachingHiveMetastore.getAll(CachingHiveMetastore.java:255)
    at io.prestosql.plugin.hive.metastore.cache.CachingHiveMetastore.getPartitionStatistics(CachingHiveMetastore.java:330)
    at io.prestosql.plugin.hive.metastore.cache.CachingHiveMetastore.lambda$loadPartitionColumnStatistics$6(CachingHiveMetastore.java:360)
    at java.base/java.lang.Iterable.forEach(Iterable.java:75)
    at io.prestosql.plugin.hive.metastore.cache.CachingHiveMetastore.loadPartitionColumnStatistics(CachingHiveMetastore.java:353)
    at io.prestosql.plugin.hive.metastore.cache.CachingHiveMetastore.access$100(CachingHiveMetastore.java:89)
    at io.prestosql.plugin.hive.metastore.cache.CachingHiveMetastore$1.loadAll(CachingHiveMetastore.java:179)
    at com.google.common.cache.CacheLoader$1.loadAll(CacheLoader.java:207)
    at com.google.common.cache.LocalCache.loadAll(LocalCache.java:4058)
    at com.google.common.cache.LocalCache.getAll(LocalCache.java:4021)
    at com.google.common.cache.LocalCache$LocalLoadingCache.getAll(LocalCache.java:4972)
    at io.prestosql.plugin.hive.metastore.cache.CachingHiveMetastore.getAll(CachingHiveMetastore.java:255)
    at io.prestosql.plugin.hive.metastore.cache.CachingHiveMetastore.getPartitionStatistics(CachingHiveMetastore.java:330)
    at io.prestosql.plugin.hive.HiveMetastoreClosure.getPartitionStatistics(HiveMetastoreClosure.java:88)
    at io.prestosql.plugin.hive.metastore.SemiTransactionalHiveMetastore.getPartitionStatistics(SemiTransactionalHiveMetastore.java:256)
    at io.prestosql.plugin.hive.statistics.MetastoreHiveStatisticsProvider.getPartitionsStatistics(MetastoreHiveStatisticsProvider.java:126)
    at io.prestosql.plugin.hive.statistics.MetastoreHiveStatisticsProvider.lambda$new$0(MetastoreHiveStatisticsProvider.java:104)
    at io.prestosql.plugin.hive.statistics.MetastoreHiveStatisticsProvider.getTableStatistics(MetastoreHiveStatisticsProvider.java:146)
    at io.prestosql.plugin.hive.HiveMetadata.getTableStatistics(HiveMetadata.java:695)
    at io.prestosql.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:107)
    at io.prestosql.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:190)
    at io.prestosql.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:107)
    at io.prestosql.sql.planner.iterative.IterativeOptimizer.optimize(IterativeOptimizer.java:96)
    at io.prestosql.sql.planner.LogicalPlanner.plan(LogicalPlanner.java:196)
    at io.prestosql.sql.analyzer.QueryExplainer.getLogicalPlan(QueryExplainer.java:182)
    at io.prestosql.sql.analyzer.QueryExplainer.getPlan(QueryExplainer.java:121)
    at io.prestosql.sql.rewrite.ExplainRewrite$Visitor.getQueryPlan(ExplainRewrite.java:137)
    at io.prestosql.sql.rewrite.ExplainRewrite$Visitor.visitExplain(ExplainRewrite.java:115)
    at io.prestosql.sql.rewrite.ExplainRewrite$Visitor.visitExplain(ExplainRewrite.java:65)
    at io.prestosql.sql.tree.Explain.accept(Explain.java:80)
    at io.prestosql.sql.tree.AstVisitor.process(AstVisitor.java:27)
    at io.prestosql.sql.rewrite.ExplainRewrite.rewrite(ExplainRewrite.java:62)
    at io.prestosql.sql.rewrite.StatementRewrite.rewrite(StatementRewrite.java:57)
    at io.prestosql.sql.analyzer.Analyzer.analyze(Analyzer.java:80)
    at io.prestosql.sql.analyzer.Analyzer.analyze(Analyzer.java:75)
    at io.prestosql.execution.SqlQueryExecution.analyze(SqlQueryExecution.java:221)
    at io.prestosql.execution.SqlQueryExecution.<init>(SqlQueryExecution.java:180)
    at io.prestosql.execution.SqlQueryExecution.<init>(SqlQueryExecution.java:97)
    at io.prestosql.execution.SqlQueryExecution$SqlQueryExecutionFactory.createQueryExecution(SqlQueryExecution.java:732)
    at io.prestosql.dispatcher.LocalDispatchQueryFactory.lambda$createDispatchQuery$0(LocalDispatchQueryFactory.java:119)
    at io.prestosql.$gen.Presto_330____20201223_050837_2.call(Unknown Source)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: MetaException(message:null)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_statistics_req_result$get_partitions_statistics_req_resultStandardScheme.read(ThriftHiveMetastore.java)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_statistics_req_result$get_partitions_statistics_req_resultStandardScheme.read(ThriftHiveMetastore.java)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_statistics_req_result.read(ThriftHiveMetastore.java)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partitions_statistics_req(ThriftHiveMetastore.java:4013)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partitions_statistics_req(ThriftHiveMetastore.java:4000)
    at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastoreClient.getPartitionColumnStatistics(ThriftHiveMetastoreClient.java:227)
    at io.prestosql.plugin.hive.metastore.thrift.FailureAwareThriftMetastoreClient.lambda$getPartitionColumnStatistics$16(FailureAwareThriftMetastoreClient.java:191)
    at io.prestosql.plugin.hive.metastore.thrift.FailureAwareThriftMetastoreClient.runWithHandle(FailureAwareThriftMetastoreClient.java:394)
    at io.prestosql.plugin.hive.metastore.thrift.FailureAwareThriftMetastoreClient.getPartitionColumnStatistics(FailureAwareThriftMetastoreClient.java:191)
    at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastore.lambda$getMetastorePartitionColumnStatistics$15(ThriftHiveMetastore.java:453)
    at io.prestosql.plugin.hive.metastore.thrift.ThriftMetastoreApiStats.lambda$wrap$0(ThriftMetastoreApiStats.java:42)
    at io.prestosql.plugin.hive.util.RetryDriver.run(RetryDriver.java:130)
    at io.prestosql.plugin.hive.metastore.thrift.ThriftHiveMetastore.getMetastorePartitionColumnStatistics(ThriftHiveMetastore.java:451)
    ... 156 more
    Suppressed: MetaException(message:null)
        ... 170 more
    Suppressed: MetaException(message:null)
        ... 170 more
    Suppressed: MetaException(message:null)
        ... 170 more
    Suppressed: MetaException(message:null)
        ... 170 more
    Suppressed: MetaException(message:null)
        ... 170 more
    Suppressed: MetaException(message:null)
        ... 170 more
    Suppressed: MetaException(message:null)
        ... 170 more
    Suppressed: MetaException(message:null)
        ... 170 more
    Suppressed: MetaException(message:null)
        ... 170 more

I tried changing the values of hive.metastore.partition-batch-size.max and hive.metastore-cache-ttl.


person brickman    schedule 23.12.2020


Answers (1)


It looks like, in the slow deployment, the metastore call get_partitions_statistics_req fails for some reason and is retried. The retries probably account for all of the waiting time. Because Presto by default ignores such failures when computing statistics, the query eventually runs.

The failure happens on the Hive side, so you need to check the metastore logs to find its cause, because the cause is not propagated to the Presto side (the trace only shows MetaException(message:null)).

On the Presto side, you can still apply some configuration changes as a workaround:

  • disable statistics for the Hive connector with the hive.table-statistics-enabled configuration property
  • reduce the time spent retrying metastore calls with the hive.metastore.thrift.client.max-retry-time configuration property
  • make your queries fail loudly with the global configuration property optimizer.ignore-stats-calculator-failures=false (probably not what you want)
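
As a rough sketch, the three options above might look like this in the configuration files. The file paths, the catalog name "hive", and the 10s value are assumptions for illustration, not taken from the question; you would normally pick one option rather than all three, and changes to these files typically require a restart.

```properties
# etc/catalog/hive.properties (assumes the Hive catalog is named "hive")

# Option 1: do not fetch table/partition statistics from the metastore at all.
hive.table-statistics-enabled=false

# Option 2: cap the total time spent retrying a failing metastore call.
# 10s is an illustrative value only; tune it for your environment.
hive.metastore.thrift.client.max-retry-time=10s

# etc/config.properties (coordinator)

# Option 3: fail the query instead of silently ignoring stats failures.
optimizer.ignore-stats-calculator-failures=false
```

Options 1 and 2 only hide or shorten the symptom; the failing get_partitions_statistics_req call still needs to be diagnosed in the metastore logs.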
person Piotr Findeisen    schedule 26.12.2020