author | chris meyers <chris.meyers.fsu@gmail.com> | 2018-12-17 21:38:39 +0100 |
---|---|---|
committer | chris meyers <chris.meyers.fsu@gmail.com> | 2019-01-02 18:17:28 +0100 |
commit | db2bb19d659329d77322fd9d4e42e46c8fa3c960 | |
tree | eb56e122f9c7f19cc6bf4e70552b1411d61e0da9 | |
parent | Merge pull request #2917 from AlanCoding/relaunch_sjt | |
add docs for task manager node decider
-rw-r--r-- | docs/capacity.md | 2 |
-rw-r--r-- | docs/task_manager_system.md | 2 |
2 files changed, 3 insertions, 1 deletions
diff --git a/docs/capacity.md b/docs/capacity.md
index ceee3c42c3..26372790e6 100644
--- a/docs/capacity.md
+++ b/docs/capacity.md
@@ -13,7 +13,7 @@ other Groups.
 
 Instance Groups (not Instances themselves) can be assigned to be used by Jobs at various levels (see clustering.md).
 When the Task Manager is preparing its graph to determine which Group a Job will run on it will commit the capacity of
-an Instance Group to a job that hasn't or isn't ready to start yet.
+an Instance Group to a job that hasn't or isn't ready to start yet. (see task_manager_system.md)
 
 Finally, if only one Instance is available, in smaller configurations, for a Job to run the Task Manager will allow that
 Job to run on the Instance even if it would push the Instance over capacity. We do this as a way to guarantee that Jobs
diff --git a/docs/task_manager_system.md b/docs/task_manager_system.md
index 697474b02d..5f9cd9a09d 100644
--- a/docs/task_manager_system.md
+++ b/docs/task_manager_system.md
@@ -28,6 +28,8 @@ The `schedule()` function is ran (a) periodically by a background task and (b) o
 | successful | Job finished with ansible-playbook return code 0. |
 | failed | Job finished with ansible-playbook return code other than 0. |
 | error | System failure. |
+### Node Affinity Decider
+The Task Manager decides what exact node a job will run on. It does so by considering user-configured (1) group execution policy and (2) capacity. First, the set of groups on which a job _can_ run on is constructed (see clustering.md). The groups are traversed until a node within that group is found. The node with the largest remaining capacity that is idle is chosen first. If there are no idle nodes, then the node with the largest remaining capacity >= the job capacity requirements is chosen.
 ## Code Composition
 The main goal of the new task manager is to run in our HA environment. This translates to making the task manager logic run on any tower node. 
 To support this we need to remove any reliance on state between task manager schedule logic runs. We had a secondary goal in mind of designing the task manager to have limited/no access to the database for the future federation feature. This secondary requirement combined with performance needs led us to create partial models that wrap dict database model data.
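
The selection rule described under "Node Affinity Decider" can be sketched roughly as follows. This is only an illustration of the documented behavior, not the AWX implementation: the `Node` class, its fields, and the `pick_node_in_group` / `pick_node` helpers are hypothetical names made up for this example, and "idle" is assumed to mean "no jobs currently running on the node". The sketch also assumes, following the wording of the added paragraph, that an idle node is taken regardless of whether the job fits within its remaining capacity.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence


@dataclass
class Node:
    """Hypothetical stand-in for an instance; not an AWX model."""
    name: str
    capacity: int
    consumed: int = 0       # capacity already committed to jobs
    running_jobs: int = 0   # assumption: zero running jobs means "idle"

    @property
    def remaining(self) -> int:
        return self.capacity - self.consumed

    @property
    def idle(self) -> bool:
        return self.running_jobs == 0


def pick_node_in_group(nodes: Sequence[Node], job_impact: int) -> Optional[Node]:
    """Apply the two documented rules within a single group."""
    # Rule 1: prefer the idle node with the largest remaining capacity.
    idle = [n for n in nodes if n.idle]
    if idle:
        return max(idle, key=lambda n: n.remaining)
    # Rule 2: otherwise take the node with the largest remaining capacity
    # that still covers the job's capacity requirement.
    fitting = [n for n in nodes if n.remaining >= job_impact]
    if fitting:
        return max(fitting, key=lambda n: n.remaining)
    return None


def pick_node(groups: Iterable[Sequence[Node]], job_impact: int) -> Optional[Node]:
    """Traverse the candidate groups in order until one yields a node."""
    for nodes in groups:
        node = pick_node_in_group(nodes, job_impact)
        if node is not None:
            return node
    return None
```

For example, given a first group whose nodes are all busy with 10 and 5 units remaining, and a second group containing an idle node, a job needing 20 units skips the first group and lands on the idle node in the second. The order of `groups` stands in for the group execution policy, which the sketch takes as already resolved.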