In a busy environment, your MapReduce tasks may often be killed to release cluster resources for higher-priority applications.
By default, when preemption is enabled (yarn.resourcemanager.scheduler.monitor.enable is set to true in yarn-site.xml), the Capacity Scheduler monitors resources every 3 seconds and kills selected containers if they do not terminate gracefully within 15 seconds of receiving a terminate request.
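For reference, the default behavior described above corresponds roughly to the yarn-site.xml fragment below. This is a sketch rather than a copy of any real cluster configuration. The monitoring interval property name (yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval) does not appear in the text above; it is taken from the Capacity Scheduler preemption settings and should be checked against the documentation for your Hadoop version.

```xml
<!-- Sketch of the preemption settings with their default values -->
<property>
  <!-- Turns on the scheduler monitor that drives preemption -->
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
<property>
  <!-- Assumed property name: how often resources are checked, in milliseconds (3 seconds) -->
  <name>yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval</name>
  <value>3000</value>
</property>
<property>
  <!-- Grace period between the terminate request and the kill, in milliseconds (15 seconds) -->
  <name>yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill</name>
  <value>15000</value>
</property>
```

Only the first property has to be set explicitly to enable preemption; the other two are shown so the 3-second and 15-second defaults are visible in one place.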
These default settings may be too aggressive, and you can change them to allow MapReduce tasks to run longer before preemption. In the example above, the Map task was killed 3 times, and 2 of those times it was killed after only about 30 seconds of execution.
You can edit yarn-site.xml and set yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill to a higher value (time in milliseconds; the default is 15000) to allow your MapReduce tasks to complete their work:
<property>
  <name>yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill</name>
  <value>300000</value>
</property>