Deploying to YARN with Spring Cloud Data Flow

When I deploy a stream to a remote YARN cluster, I get the following error in the YARN UI:

Diagnostics: File file:///dataflow/apps/stream/app/application.properties does not exist

This file exists on the Data Flow server side and contains the following:

#Thu Dec 01 10:32:39 CET 2016
spring.yarn.applicationVersion=app
spring.cloud.deployer.yarn.version=1.0.2.RELEASE

spring.hadoop.resourceManagerHost=hmaprb.my-domain.com

As far as I understand, the error occurs because the deployed container also tries to access this configuration file. What I can't figure out is at which point this configuration file was supposed to be copied to YARN.

The answer may be obvious, but this is very hard to debug without knowing it. In case it helps, here are the YARN ResourceManager logs:

2016-12-06 12:20:44,439 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new applicationId: 148106
2016-12-06 12:20:44,539 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application with id 148106 submitted by user tcozien
2016-12-06 12:20:44,539 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing application with id application_1478697416091_148106
2016-12-06 12:20:44,539 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1478697416091_148106 State change from NEW to NEW_SAVING
2016-12-06 12:20:44,539 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1478697416091_148106
2016-12-06 12:20:44,539 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=tcozien  IP=10.191.40.250    OPERATION=Submit Application Request    TARGET=ClientRMService  RESULT=SUCCESS  APPID=application_1478697416091_148106
2016-12-06 12:20:44,593 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Storing info for app: application_1478697416091_148106 at: /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMAppRoot/application_1478697416091_148106/application_1478697416091_148106
2016-12-06 12:20:44,683 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1478697416091_148106 State change from NEW_SAVING to SUBMITTED
2016-12-06 12:20:44,716 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Accepted application application_1478697416091_148106 from user: tcozien, in queue: default, currently num of applications: 5
2016-12-06 12:20:44,717 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1478697416091_148106 State change from SUBMITTED to ACCEPTED
2016-12-06 12:20:44,717 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering app attempt : appattempt_1478697416091_148106_000001
2016-12-06 12:20:44,717 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1478697416091_148106_000001 State change from NEW to SUBMITTED
2016-12-06 12:20:44,717 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added Application Attempt appattempt_1478697416091_148106_000001 to scheduler from user: tcozien
2016-12-06 12:20:44,717 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1478697416091_148106_000001 State change from SUBMITTED to SCHEDULED
2016-12-06 12:20:45,349 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e49_1478697416091_148106_01_000001 Container Transitioned from NEW to ALLOCATED
2016-12-06 12:20:45,349 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=tcozien  OPERATION=AM Allocated Container    TARGET=SchedulerApp RESULT=SUCCESS  APPID=application_1478697416091_148106  CONTAINERID=container_e49_1478697416091_148106_01_000001
2016-12-06 12:20:45,349 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned container container_e49_1478697416091_148106_01_000001 of capacity <memory:2048, vCores:1, disks:0.0> on host hmaprb.my-domain.com:41610, which has 25 containers, <memory:51200, vCores:25, disks:12.0> used and <memory:71680, vCores:5, disks:3.0> available after allocation
2016-12-06 12:20:45,349 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Sending NMToken for nodeId : hmaprb.my-domain.com:41610 for container : container_e49_1478697416091_148106_01_000001
2016-12-06 12:20:45,349 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e49_1478697416091_148106_01_000001 Container Transitioned from ALLOCATED to ACQUIRED
2016-12-06 12:20:45,349 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Clear node set for appattempt_1478697416091_148106_000001
2016-12-06 12:20:45,349 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Storing attempt: AppId: application_1478697416091_148106 AttemptId: appattempt_1478697416091_148106_000001 MasterContainer: Container: [ContainerId: container_e49_1478697416091_148106_01_000001, NodeId: hmaprb.my-domain.com:41610, NodeHttpAddress: hmaprb.my-domain.com:8042, Resource: <memory:2048, vCores:1, disks:0.0>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.11.129.57:41610 }, ]
2016-12-06 12:20:45,349 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1478697416091_148106_000001 State change from SCHEDULED to ALLOCATED_SAVING
2016-12-06 12:20:45,350 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Storing info for attempt: appattempt_1478697416091_148106_000001 at: /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMAppRoot/application_1478697416091_148106/appattempt_1478697416091_148106_000001
2016-12-06 12:20:45,464 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1478697416091_148106_000001 State change from ALLOCATED_SAVING to ALLOCATED
2016-12-06 12:20:45,464 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching masterappattempt_1478697416091_148106_000001
2016-12-06 12:20:45,465 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up container Container: [ContainerId: container_e49_1478697416091_148106_01_000001, NodeId: hmaprb.my-domain.com:41610, NodeHttpAddress: hmaprb.my-domain.com:8042, Resource: <memory:2048, vCores:1, disks:0.0>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.11.129.57:41610 }, ] for AM appattempt_1478697416091_148106_000001
2016-12-06 12:20:45,465 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command to launch container container_e49_1478697416091_148106_01_000001 : $JAVA_HOME/bin/java,,-Dspring.config.location=servers.yml,-jar,spring-cloud-deployer-yarn-appdeployerappmaster-@[email protected],--spring.cloud.deployer.yarn.appmaster.artifact=/dataflow//artifacts/cache/,1><LOG_DIR>/Appmaster.stdout,2><LOG_DIR>/Appmaster.stderr
2016-12-06 12:20:45,465 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Create AMRMToken for ApplicationAttempt: appattempt_1478697416091_148106_000001
2016-12-06 12:20:45,465 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Creating password for appattempt_1478697416091_148106_000001
2016-12-06 12:20:45,484 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done launching container Container: [ContainerId: container_e49_1478697416091_148106_01_000001, NodeId: hmaprb.my-domain.com:41610, NodeHttpAddress: hmaprb.my-domain.com:8042, Resource: <memory:2048, vCores:1, disks:0.0>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.11.129.57:41610 }, ] for AM appattempt_1478697416091_148106_000001
2016-12-06 12:20:45,484 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1478697416091_148106_000001 State change from ALLOCATED to LAUNCHED
2016-12-06 12:20:46,347 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e49_1478697416091_148106_01_000001 Container Transitioned from ACQUIRED to RUNNING
2016-12-06 12:20:51,547 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e49_1478697416091_148106_01_000001 Container Transitioned from RUNNING to COMPLETED
2016-12-06 12:20:51,547 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Completed container: container_e49_1478697416091_148106_01_000001 in state: COMPLETED event:FINISHED
2016-12-06 12:20:51,547 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=tcozien  OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS  APPID=application_1478697416091_148106  CONTAINERID=container_e49_1478697416091_148106_01_000001
2016-12-06 12:20:51,547 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Released container container_e49_1478697416091_148106_01_000001 of capacity <memory:2048, vCores:1, disks:0.0> on host hmaprb.my-domain.com:41610, which currently has 29 containers, <memory:59392, vCores:29, disks:14.5> used and <memory:63488, vCores:1, disks:0.5> available, release resources=true
2016-12-06 12:20:51,547 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application attempt appattempt_1478697416091_148106_000001 released container container_e49_1478697416091_148106_01_000001 on node: host: hmaprb.my-domain.com:41610 #containers=29 available=<memory:63488, vCores:1, disks:0.5> used=<memory:59392, vCores:29, disks:14.5> with event: FINISHED
2016-12-06 12:20:51,547 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1478697416091_148106_000001 with final state: FAILED, and exit status: -1000
2016-12-06 12:20:51,547 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1478697416091_148106_000001 State change from LAUNCHED to FINAL_SAVING
2016-12-06 12:20:51,547 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1478697416091_148106_000001 at: /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMAppRoot/application_1478697416091_148106/appattempt_1478697416091_148106_000001
2016-12-06 12:20:51,741 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Unregistering app attempt : appattempt_1478697416091_148106_000001
2016-12-06 12:20:51,742 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Application finished, removing password for appattempt_1478697416091_148106_000001
2016-12-06 12:20:51,742 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1478697416091_148106_000001 State change from FINAL_SAVING to FAILED
2016-12-06 12:20:51,742 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1478697416091_148106 with final state: FAILED
2016-12-06 12:20:51,742 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1478697416091_148106 State change from ACCEPTED to FINAL_SAVING
2016-12-06 12:20:51,742 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1478697416091_148106
2016-12-06 12:20:51,742 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application appattempt_1478697416091_148106_000001 is done. finalState=FAILED
2016-12-06 12:20:51,742 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1478697416091_148106 requests cleared
2016-12-06 12:20:51,742 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for app: application_1478697416091_148106 at: /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMAppRoot/application_1478697416091_148106/application_1478697416091_148106
2016-12-06 12:20:51,907 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1478697416091_148106 failed 1 times due to AM Container for appattempt_1478697416091_148106_000001 exited with  exitCode: -1000
For more detailed output, check application tracking page:http://hmaprb.my-domain.com:8088/cluster/app/application_1478697416091_148106Then, click on links to logs of each attempt.
2016-12-06 12:20:51,907 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1478697416091_148106 State change from FINAL_SAVING to FAILED
2016-12-06 12:20:51,907 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=tcozien  OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE  DESCRIPTION=App failed with state: FAILED   PERMISSIONS=Application application_1478697416091_148106 failed 1 times due to AM Container for appattempt_1478697416091_148106_000001 exited with  exitCode: -1000
For more detailed output, check application tracking page:http://hmaprb.my-domain.com:8088/cluster/app/application_1478697416091_148106Then, click on links to logs of each attempt.
Failing this attempt. Failing the application.  APPID=application_1478697416091_148106
2016-12-06 12:20:51,907 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1478697416091_148106,name=scdstream:app:offer,user=tcozien,queue=root.tcozien,state=FAILED,trackingUrl=http://hmaprb.my-domain.com:8088/cluster/app/application_1478697416091_148106,appMasterHost=N/A,startTime=1481026844538,finishTime=1481026851742,finalStatus=FAILED,memorySeconds=12701,vcoreSeconds=6,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=<memory:0\, vCores:0\, disks:0.0>,applicationType=DATAFLOW

Asked by Alexandre FILLATRE on 06.12.2016


Answers (1)


I would check what the HDFS fsUri is set to in servers.yml, because file:///dataflow/apps/stream/app/application.properties is wrong: the file should be looked up on HDFS, not on the local disk. Hadoop falls back to the local file system when no default file system is configured, so I think that is where the error comes from.
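As a rough sketch of what to look for, the relevant part of servers.yml might look something like the fragment below. The `spring.hadoop.resourceManagerHost` value is taken from the question. The `fsUri` value is a placeholder, since the host name and port 8020 are assumptions and not something known about this cluster. The log paths under /var/mapr suggest this may be a MapR cluster, in which case the URI would presumably use the `maprfs` scheme instead of `hdfs`; that too is a guess.

```yaml
# servers.yml (fragment): illustrative values only
spring:
  hadoop:
    # Default file system used to resolve /dataflow/... paths.
    # Placeholder host and port; replace with the cluster's real file system URI.
    # If this is missing, Hadoop resolves paths against file:///.
    fsUri: hdfs://namenode.my-domain.com:8020
    # Taken from the application.properties shown in the question.
    resourceManagerHost: hmaprb.my-domain.com
```

With a non-local fsUri in place, the path in the diagnostics message should no longer start with file:///.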

person Janne Valkealahti    schedule 08.12.2016