add docs for task manager node decider

author: chris meyers <chris.meyers.fsu@gmail.com> 2018-12-17 21:38:39 +0100
committer: chris meyers <chris.meyers.fsu@gmail.com> 2019-01-02 18:17:28 +0100
commit: db2bb19d659329d77322fd9d4e42e46c8fa3c960 (patch)
tree: eb56e122f9c7f19cc6bf4e70552b1411d61e0da9
parent: Merge pull request #2917 from AlanCoding/relaunch_sjt (diff)
download: awx-db2bb19d659329d77322fd9d4e42e46c8fa3c960.tar.xz
awx-db2bb19d659329d77322fd9d4e42e46c8fa3c960.zip
2 files changed, 3 insertions, 1 deletions
diff --git a/docs/capacity.md b/docs/capacity.md
index ceee3c42c3..26372790e6 100644
--- a/docs/capacity.md
+++ b/docs/capacity.md
@@ -13,7 +13,7 @@ other Groups.
 
 Instance Groups (not Instances themselves) can be assigned to be used by Jobs at various levels (see clustering.md).
 When the Task Manager is preparing its graph to determine which Group a Job will run on it will commit the capacity of
-an Instance Group to a job that hasn't or isn't ready to start yet.
+an Instance Group to a job that hasn't or isn't ready to start yet. (see task_manager_system.md)
 
 Finally, if only one Instance is available, in smaller configurations, for a Job to run the Task Manager will allow that
 Job to run on the Instance even if it would push the Instance over capacity. We do this as a way to guarantee that Jobs
diff --git a/docs/task_manager_system.md b/docs/task_manager_system.md
index 697474b02d..5f9cd9a09d 100644
--- a/docs/task_manager_system.md
+++ b/docs/task_manager_system.md
@@ -28,6 +28,8 @@ The `schedule()` function is ran (a) periodically by a background task and (b) o
 | successful | Job finished with ansible-playbook return code 0.                                                                  |
 | failed     | Job finished with ansible-playbook return code other than 0.                                                       |
 | error      | System failure.                                                                                                    |
+### Node Affinity Decider
+The Task Manager decides what exact node a job will run on. It does so by considering user-configured (1) group execution policy and (2) capacity. First, the set of groups on which a job _can_ run on is constructed (see clustering.md). The groups are traversed until a node within that group is found. The node with the largest remaining capacity that is idle is chosen first. If there are no idle nodes, then the node with the largest remaining capacity >= the job capacity requirements is chosen.
 
 ## Code Composition
 The main goal of the new task manager is to run in our HA environment. This translates to making the task manager logic run on any tower node. To support this we need to remove any reliance on state between task manager schedule logic runs. We had a secondary goal in mind of designing the task manager to have limited/no access to the database for the future federation feature. This secondary requirement combined with performance needs led us to create partial models that wrap dict database model data.
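
Note on the Node Affinity Decider paragraph added in this patch: the selection rule it describes can be sketched as below. This is only an illustration under stated assumptions, not the AWX implementation. The names `Node`, `pick_node`, `job_impact`, `consumed` and `running_jobs` are hypothetical; "idle" is assumed to mean "no jobs currently running"; and an idle node is chosen without checking whether the job fits, because that is how the paragraph reads.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical stand-in for an instance; not an AWX model.
    name: str
    capacity: int          # total capacity of the node
    consumed: int = 0      # capacity already committed to jobs
    running_jobs: int = 0  # assumption: zero running jobs means "idle"

    @property
    def remaining(self) -> int:
        return self.capacity - self.consumed

    @property
    def idle(self) -> bool:
        return self.running_jobs == 0


def pick_node(groups: List[List[Node]], job_impact: int) -> Optional[Node]:
    """Traverse groups in order and return the first suitable node, or None."""
    for group in groups:
        # Prefer idle nodes; among them, take the largest remaining capacity.
        idle_nodes = [n for n in group if n.idle]
        if idle_nodes:
            return max(idle_nodes, key=lambda n: n.remaining)
        # Otherwise take the busy node with the largest remaining capacity
        # that still covers the job's capacity requirement.
        fitting = [n for n in group if n.remaining >= job_impact]
        if fitting:
            return max(fitting, key=lambda n: n.remaining)
    return None  # no group can take the job right now


groups = [
    [Node("a", capacity=100, consumed=60, running_jobs=2),
     Node("b", capacity=100, consumed=30, running_jobs=1)],
    [Node("c", capacity=50)],
]
small = pick_node(groups, job_impact=20)  # first group has a busy node that fits
large = pick_node(groups, job_impact=80)  # falls through to the second group
```

In this sketch the group order carries the execution-policy preference, so a later group is only consulted when no node in an earlier group qualifies.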