From 517933318586943a11d376848f21fdda9580770e Mon Sep 17 00:00:00 2001
From: TVo
Date: Thu, 15 Feb 2024 07:19:52 -0700
Subject: Added mesh ingress content to instances chapter. (#14854)

* Added mesh ingress content to instances chapter.

* Changes to incorp feedback from @TheRealHaoLiu.

* Added mesh ingress content to instances chapter.

* Line 75

Co-authored-by: Seth Foster

* Line 117

Co-authored-by: Seth Foster

* Line 126

Co-authored-by: Seth Foster

* Wording changes and update graphics

---------

Co-authored-by: Seth Foster
---
 docs/docsite/rst/administration/instances.rst      | 208 ++++++++++++++-------
 docs/docsite/rst/common/images/download-icon.png   | Bin 0 -> 3735 bytes
 .../images/instances-execution-node-download.png   | Bin 0 -> 104216 bytes
 ...nces-job-template-using-remote-execution-ig.png | Bin 0 -> 93609 bytes
 .../rst/common/images/instances_associate_peer.png | Bin 44348 -> 136925 bytes
 .../images/instances_awx_task_pods_hopnode.png     | Bin 67956 -> 52871 bytes
 .../rst/common/images/instances_create_details.png | Bin 44222 -> 108531 bytes
 .../rst/common/images/instances_create_new.png     | Bin 65543 -> 119424 bytes
 .../rst/common/images/instances_health_check.png   | Bin 79525 -> 145062 bytes
 .../images/instances_health_check_pending.png      | Bin 63401 -> 90308 bytes
 .../rst/common/images/instances_install_bundle.png | Bin 42781 -> 116533 bytes
 .../rst/common/images/instances_list_view.png      | Bin 101346 -> 212807 bytes
 .../images/instances_mesh_ingress_topology.png     | Bin 0 -> 61091 bytes
 .../rst/common/images/instances_peers_tab.png      | Bin 47186 -> 123023 bytes
 .../topology-viewer-instance-with-errors.png       | Bin 84474 -> 243956 bytes
 docs/docsite/rst/userguide/glossary.rst            | 13 --
 16 files changed, 136 insertions(+), 85 deletions(-)
 create mode 100644 docs/docsite/rst/common/images/download-icon.png
 create mode 100644 docs/docsite/rst/common/images/instances-execution-node-download.png
 create mode 100644 docs/docsite/rst/common/images/instances-job-template-using-remote-execution-ig.png
 create mode 100644 docs/docsite/rst/common/images/instances_mesh_ingress_topology.png

(limited to 'docs')

diff --git a/docs/docsite/rst/administration/instances.rst b/docs/docsite/rst/administration/instances.rst
index f25957f04e..07a2eccf8c 100644
--- a/docs/docsite/rst/administration/instances.rst
+++ b/docs/docsite/rst/administration/instances.rst
@@ -51,81 +51,116 @@ Prerequisites
 - To manage instances from the AWX user interface, you must have System Administrator or System Auditor permissions.
-Manage instances
------------------
+Common topologies
------------------
-Click **Instances** from the left side navigation menu to access the Instances list.
+Instances make up the network of devices that communicate with one another. They are the building blocks of an automation mesh. These building blocks serve as nodes in a mesh topology. There are several kinds of instances:
-.. image:: ../common/images/instances_list_view.png
-    :alt: List view of instances in AWX
-
-The Instances list displays all the current nodes in your topology, along with relevant details:
++-----------+-----------------------------------------------------------------------------------------------------------------+
+| Node Type | Description                                                                                                     |
++===========+=================================================================================================================+
+| Control   | Nodes that run persistent Ansible Automation Platform services, and delegate jobs to hybrid and execution nodes |
++-----------+-----------------------------------------------------------------------------------------------------------------+
+| Hybrid    | Nodes that run persistent Ansible Automation Platform services and execute jobs                                 |
+|           | (not applicable to operator-based installations)                                                                |
++-----------+-----------------------------------------------------------------------------------------------------------------+
+| Hop       | Used for relaying across the mesh only                                                                          |
++-----------+-----------------------------------------------------------------------------------------------------------------+
+| Execution | Nodes that run jobs delivered from control nodes (jobs submitted from the user’s Ansible automation)            |
++-----------+-----------------------------------------------------------------------------------------------------------------+
-- **Host Name**
+Simple topology
+~~~~~~~~~~~~~~~~
-.. _node_statuses:
+One of the ways to expand job capacity is to create a standalone execution node that can be added to run alongside the Kubernetes deployment of AWX. These machines will not be a part of the AWX Kubernetes cluster. The control nodes running in the cluster will connect and submit work to these machines via Receptor. The machines are registered in AWX as type "execution" instances, meaning they will only be used to run AWX jobs, not dispatch work or handle web requests as control nodes do.
-- **Status** indicates the state of the node:
+Hop nodes can be added to sit between the control plane of AWX and standalone execution nodes. These machines will not be a part of the AWX Kubernetes cluster and they will be registered in AWX as node type "hop", meaning they will only handle inbound and outbound traffic for otherwise unreachable nodes in a different or more strict network.
-  - **Installed**: a node that has successfully installed and configured, but has not yet passed the periodic health check
-  - **Ready**: a node that is available to run jobs or route traffic between nodes on the mesh. This replaces the previously “Healthy” node state used in the mesh topology
-  - **Provisioning**: a node that is in the process of being added to a current mesh, but is awaiting the job to install all of the packages (currently not yet supported and is subject to change in a future release)
-  - **Deprovisioning**: a node that is in the process of being removed from a current mesh and is finishing up jobs currently running on it
-  - **Unavailable**: a node that did not pass the most recent health check, indicating connectivity or receptor problems
-  - **Provisioning Failure**: a node that failed during provisioning (currently not yet supported and is subject to change in a future release)
-  - **De-provisioning Failure**: a node that failed during deprovisioning (currently not yet supported and is subject to change in a future release)
+Below is an example of an AWX task pod with two execution nodes. Traffic to execution node 2 flows through a hop node that is setup between it and the control plane.
-- **Node Type** specifies whether the node is a control, hop, execution node, or hybrid (not applicable to operator-based installations). See :term:`node` for further detail.
-- **Capacity Adjustment** allows you to adjust the number of forks in your nodes
-- **Used Capacity** indicates how much capacity has been used
-- **Actions** allow you to enable or disable the instance to control whether jobs can be assigned to it
+.. image:: ../common/images/instances_awx_task_pods_hopnode.png
+    :alt: AWX task pod with a hop node between the control plane of AWX and standalone execution nodes.
-From this page, you can add, remove or run health checks on your nodes. Use the check boxes next to an instance to select it to remove or run a health check against. When a button is grayed-out, you do not have permission for that particular action. Contact your Administrator to grant you the required level of access. If you are able to remove an instance, you will receive a prompt for confirmation, like the one below:
-.. image:: ../common/images/instances_delete_prompt.png
-    :alt: Prompt for deleting instances in AWX.
+An example of a simple topology may look like the following:
-.. note::
+.. list-table::
+    :widths: 20 30 10 20 15
+    :header-rows: 1
-    You can still remove an instance even if it is active and jobs are running on it. AWX will attempt to wait for any jobs running on this node to complete before actually removing it.
+    * - Instance type
+      - Hostname
+      - Listener port
+      - Peers from control nodes
+      - Peers
+    * - Control plane
+      - awx-task-65d6d96987-mgn9j
+      - 27199
+      - True
+      - []
+    * - Hop node
+      - awx-hop-node
+      - 27199
+      - True
+      - []
+    * - Execution node
+      - awx-example.com
+      - n/a
+      - False
+      - ["hop node"]
-Click **Remove** to confirm.
-.. _health_check:
-If running a health check on an instance, at the top of the Details page, a message displays that the health check is in progress.
+Mesh topology
+~~~~~~~~~~~~~~
-.. image:: ../common/images/instances_health_check.png
-    :alt: Health check for instances in AWX
+Mesh ingress is a feature that allows remote nodes to connect inbound to the control plane. This is especially useful when creating remote nodes in restricted networking environments that disallow inbound traffic.
-Click **Reload** to refresh the instance status.
-.. note::
+.. image:: ../common/images/instances_mesh_ingress_topology.png
+    :alt: Mesh ingress architecture showing the peering relationship between nodes.
-    Health checks are ran asynchronously, and may take up to a minute for the instance status to update, even with a refresh. The status may or may not change after the health check. At the bottom of the Details page, a timer/clock icon displays next to the last known health check date and time stamp if the health check task is currently running.
-    .. image:: ../common/images/instances_health_check_pending.png
-        :alt: Health check for instance still in pending state.
+An example of a topology that uses mesh ingress may look like the following:
-The example health check shows the status updates with an error on node 'one':
+.. list-table::
+    :widths: 20 30 10 20 15
+    :header-rows: 1
-.. image:: ../common/images/topology-viewer-instance-with-errors.png
-    :alt: Health check showing an error in one of the instances.
+    * - Instance type
+      - Hostname
+      - Listener port
+      - Peers from control nodes
+      - Peers
+    * - Control plane
+      - awx-task-xyz
+      - 27199
+      - True
+      - []
+    * - Hop node
+      - awx-hop-node
+      - 27199
+      - True
+      - []
+    * - Execution node
+      - awx-example.com
+      - n/a
+      - False
+      - ["hop node"]
+In order to create a mesh ingress for AWX, see the `Mesh Ingress `_ chapter of the AWX Operator Documentation for information on setting up this type of topology. The last step is to create a remote execution node and add the execution node to an instance group in order for it to be used in your job execution. Whatever execution environment image used to run a playbook needs to be accessible for your remote execution node. Everything you are using in your playbook also needs to be accessible from this remote execution node.
-Add an instance
-----------------
+.. image:: ../common/images/instances-job-template-using-remote-execution-ig.png
+    :alt: Job template using the instance group with the execution node to run jobs.
-One of the ways to expand capacity is to create an instance. Standalone execution nodes can be added to run alongside the Kubernetes deployment of AWX. These machines will not be a part of the AWX Kubernetes cluster. The control nodes running in the cluster will connect and submit work to these machines via Receptor. The machines are registered in AWX as type "execution" instances, meaning they will only be used to run AWX jobs, not dispatch work or handle web requests as control nodes do.
-Hop nodes can be added to sit between the control plane of AWX and standalone execution nodes. These machines will not be a part of the AWX Kubernetes cluster and they will be registered in AWX as node type "hop", meaning they will only handle inbound and outbound traffic for otherwise unreachable nodes in a different or more strict network.
-Below is an example of an AWX task pod with two execution nodes. Traffic to execution node 2 flows through a hop node that is setup between it and the control plane.
+.. _ag_instances_add:
-.. image:: ../common/images/instances_awx_task_pods_hopnode.png
-    :alt: AWX task pod with a hop node between the control plane of AWX and standalone execution nodes.
+Add an instance
+----------------
-To create an instance in AWV:
+To create an instance in AWX:
 1. Click **Instances** from the left side navigation menu of the AWX UI.
@@ -147,17 +182,6 @@ An instance has several attributes that may be configured:
 - Check the **Managed by Policy** box to allow policy to dictate how the instance is assigned.
 - Check the **Peers from control nodes** box to allow control nodes to peer to this instance automatically. Listener port needs to be set if this is enabled or the instance is a peer.
-In the example diagram above, the configurations are as follows:
-
-+------------------+---------------+--------------------------+--------------+
-| instance name    | listener_port | peers_from_control_nodes | peers        |
-+==================+===============+==========================+==============+
-| execution node 1 | 27199         | true                     | []           |
-+------------------+---------------+--------------------------+--------------+
-| hop node         | 27199         | true                     | []           |
-+------------------+---------------+--------------------------+--------------+
-| execution node 2 | null          | false                    | ["hop node"] |
-+------------------+---------------+--------------------------+--------------+
 3. Once the attributes are configured, click **Save** to proceed.
@@ -193,7 +217,7 @@ Upon successful creation, the Details of the one of the created instances opens.
     all:
       hosts:
         remote-execution:
-          ansible_host: 18.206.206.34
+          ansible_host:
           ansible_user: # user provided
           ansible_ssh_private_key_file: ~/.ssh/id_rsa
@@ -234,32 +258,72 @@ You can remove an instance by clicking **Remove** in the Instances page, or by s
 10. To view a graphical representation of your updated topology, refer to the :ref:`ag_topology_viewer` section of this guide.
-Using a custom Receptor CA
----------------------------
+Manage instances
+-----------------
-The control nodes on the K8S cluster will communicate with execution nodes via mutual TLS TCP connections, running via Receptor. Execution nodes will verify incoming connections by ensuring the x509 certificate was issued by a trusted Certificate Authority (CA).
+Click **Instances** from the left side navigation menu to access the Instances list.
-You may choose to provide your own CA for this validation. If no CA is provided, AWX operator will automatically generate one using OpenSSL.
+.. image:: ../common/images/instances_list_view.png
+    :alt: List view of instances in AWX
-Given custom ``ca.crt`` and ``ca.key`` stored locally, run the following:
+The Instances list displays all the current nodes in your topology, along with relevant details:
-::
+- **Host Name**
-    kubectl create secret tls awx-demo-receptor-ca \
-      --cert=/path/to/ca.crt --key=/path/to/ca.key
+.. _node_statuses:
-The secret should be named ``{AWX Custom Resource name}-receptor-ca``. In the above, the AWX Custom Resource name is "awx-demo". Replace "awx-demo" with your AWX Custom Resource name.
+- **Status** indicates the state of the node:
-If this secret is created after AWX is deployed, run the following to restart the deployment:
+  - **Installed**: a node that has successfully installed and configured, but has not yet passed the periodic health check
+  - **Ready**: a node that is available to run jobs or route traffic between nodes on the mesh. This replaces the previously “Healthy” node state used in the mesh topology
+  - **Provisioning**: a node that is in the process of being added to a current mesh, but is awaiting the job to install all of the packages (currently not yet supported and is subject to change in a future release)
+  - **Deprovisioning**: a node that is in the process of being removed from a current mesh and is finishing up jobs currently running on it
+  - **Unavailable**: a node that did not pass the most recent health check, indicating connectivity or receptor problems
+  - **Provisioning Failure**: a node that failed during provisioning (currently not yet supported and is subject to change in a future release)
+  - **De-provisioning Failure**: a node that failed during deprovisioning (currently not yet supported and is subject to change in a future release)
-::
+- **Node Type** specifies whether the node is a control, hop, execution node, or hybrid (not applicable to operator-based installations). See :term:`node` for further detail.
+- **Capacity Adjustment** allows you to adjust the number of forks in your nodes
+- **Used Capacity** indicates how much capacity has been used
+- **Actions** allow you to enable or disable the instance to control whether jobs can be assigned to it
+
+From this page, you can add, remove or run health checks on your nodes. Use the check boxes next to an instance to select it to remove or run a health check against. When a button is grayed-out, you do not have permission for that particular action. Contact your Administrator to grant you the required level of access. If you are able to remove an instance, you will receive a prompt for confirmation, like the one below:
+
+.. image:: ../common/images/instances_delete_prompt.png
+    :alt: Prompt for deleting instances in AWX.
-    kubectl rollout restart deployment awx-demo
+.. note::
+
+    You can still remove an instance even if it is active and jobs are running on it. AWX will attempt to wait for any jobs running on this node to complete before actually removing it.
+
+Click **Remove** to confirm.
+
+.. _health_check:
+If running a health check on an instance, at the top of the Details page, a message displays that the health check is in progress.
+
+.. image:: ../common/images/instances_health_check.png
+    :alt: Health check for instances in AWX
+
+Click **Reload** to refresh the instance status.
 .. note::
-    Changing the receptor CA will sever connections to any existing execution nodes. These nodes will enter an *Unavailable* state, and jobs will not be able to run on them. You will need to download and re-run the install bundle for each execution node. This will replace the TLS certificate files with those signed by the new CA. The execution nodes will then appear in a *Ready* state after a few minutes.
+    Health checks are ran asynchronously, and may take up to a minute for the instance status to update, even with a refresh. The status may or may not change after the health check. At the bottom of the Details page, a timer/clock icon displays next to the last known health check date and time stamp if the health check task is currently running.
+
+    .. image:: ../common/images/instances_health_check_pending.png
+        :alt: Health check for instance still in pending state.
+
+The example health check shows the status updates with an error on node 'one':
+
+.. image:: ../common/images/topology-viewer-instance-with-errors.png
+    :alt: Health check showing an error in one of the instances.
+
+
+Using a custom Receptor CA
+---------------------------
+
+Refer to the AWX Operator Documentation, `Custom Receptor CA `_ for detail.
 Using a private image for the default EE
diff --git a/docs/docsite/rst/common/images/download-icon.png b/docs/docsite/rst/common/images/download-icon.png
new file mode 100644
index 0000000000..2529682406
Binary files /dev/null and b/docs/docsite/rst/common/images/download-icon.png differ
diff --git a/docs/docsite/rst/common/images/instances-execution-node-download.png b/docs/docsite/rst/common/images/instances-execution-node-download.png
new file mode 100644
index 0000000000..b605bb12a7
Binary files /dev/null and b/docs/docsite/rst/common/images/instances-execution-node-download.png differ
diff --git a/docs/docsite/rst/common/images/instances-job-template-using-remote-execution-ig.png b/docs/docsite/rst/common/images/instances-job-template-using-remote-execution-ig.png
new file mode 100644
index 0000000000..1c9173b5f8
Binary files /dev/null and b/docs/docsite/rst/common/images/instances-job-template-using-remote-execution-ig.png differ
diff --git a/docs/docsite/rst/common/images/instances_associate_peer.png b/docs/docsite/rst/common/images/instances_associate_peer.png
index 397d7c2916..60cfad6122 100644
Binary files a/docs/docsite/rst/common/images/instances_associate_peer.png and b/docs/docsite/rst/common/images/instances_associate_peer.png differ
diff --git a/docs/docsite/rst/common/images/instances_awx_task_pods_hopnode.png b/docs/docsite/rst/common/images/instances_awx_task_pods_hopnode.png
index c9b65e64dc..5682c34be3 100644
Binary files a/docs/docsite/rst/common/images/instances_awx_task_pods_hopnode.png and b/docs/docsite/rst/common/images/instances_awx_task_pods_hopnode.png differ
diff --git a/docs/docsite/rst/common/images/instances_create_details.png b/docs/docsite/rst/common/images/instances_create_details.png
index 0ceefefaf5..2e282d5d19 100644
Binary files a/docs/docsite/rst/common/images/instances_create_details.png and b/docs/docsite/rst/common/images/instances_create_details.png differ
diff --git a/docs/docsite/rst/common/images/instances_create_new.png b/docs/docsite/rst/common/images/instances_create_new.png
index 5e69c6f7f5..288c2287e5 100644
Binary files a/docs/docsite/rst/common/images/instances_create_new.png and b/docs/docsite/rst/common/images/instances_create_new.png differ
diff --git a/docs/docsite/rst/common/images/instances_health_check.png b/docs/docsite/rst/common/images/instances_health_check.png
index 0918b5f635..1086237e1b 100644
Binary files a/docs/docsite/rst/common/images/instances_health_check.png and b/docs/docsite/rst/common/images/instances_health_check.png differ
diff --git a/docs/docsite/rst/common/images/instances_health_check_pending.png b/docs/docsite/rst/common/images/instances_health_check_pending.png
index 3a8cd7a29d..c19fc94fe9 100644
Binary files a/docs/docsite/rst/common/images/instances_health_check_pending.png and b/docs/docsite/rst/common/images/instances_health_check_pending.png differ
diff --git a/docs/docsite/rst/common/images/instances_install_bundle.png b/docs/docsite/rst/common/images/instances_install_bundle.png
index e45f79da85..3041d6b097 100644
Binary files a/docs/docsite/rst/common/images/instances_install_bundle.png and b/docs/docsite/rst/common/images/instances_install_bundle.png differ
diff --git a/docs/docsite/rst/common/images/instances_list_view.png b/docs/docsite/rst/common/images/instances_list_view.png
index e66eb45726..5da391986e 100644
Binary files a/docs/docsite/rst/common/images/instances_list_view.png and b/docs/docsite/rst/common/images/instances_list_view.png differ
diff --git a/docs/docsite/rst/common/images/instances_mesh_ingress_topology.png b/docs/docsite/rst/common/images/instances_mesh_ingress_topology.png
new file mode 100644
index 0000000000..63144158ae
Binary files /dev/null and b/docs/docsite/rst/common/images/instances_mesh_ingress_topology.png differ
diff --git a/docs/docsite/rst/common/images/instances_peers_tab.png b/docs/docsite/rst/common/images/instances_peers_tab.png
index c75c17de8d..ae145bb93a 100644
Binary files a/docs/docsite/rst/common/images/instances_peers_tab.png and b/docs/docsite/rst/common/images/instances_peers_tab.png differ
diff --git a/docs/docsite/rst/common/images/topology-viewer-instance-with-errors.png b/docs/docsite/rst/common/images/topology-viewer-instance-with-errors.png
index 214e65301f..f1e0163b55 100644
Binary files a/docs/docsite/rst/common/images/topology-viewer-instance-with-errors.png and b/docs/docsite/rst/common/images/topology-viewer-instance-with-errors.png differ
diff --git a/docs/docsite/rst/userguide/glossary.rst b/docs/docsite/rst/userguide/glossary.rst
index c0cf2749b0..f55659f69c 100644
--- a/docs/docsite/rst/userguide/glossary.rst
+++ b/docs/docsite/rst/userguide/glossary.rst
@@ -90,19 +90,6 @@ Glossary
     Node
         A node corresponds to entries in the instance database model, or the ``/api/v2/instances/`` endpoint, and is a machine participating in the cluster / mesh. The unified jobs API reports ``awx_node`` and ``execution_node`` fields. The execution node is where the job runs, and AWX node interfaces between the job and server functions.
-        +-----------+-----------------------------------------------------------------------------------------------------------------+
-        | Node Type | Description                                                                                                     |
-        +-----------+-----------------------------------------------------------------------------------------------------------------+
-        | Control   | Nodes that run persistent Ansible Automation Platform services, and delegate jobs to hybrid and execution nodes |
-        +-----------+-----------------------------------------------------------------------------------------------------------------+
-        | Hybrid    | Nodes that run persistent Ansible Automation Platform services and execute jobs                                 |
-        |           | (not applicable to operator-based installations)                                                                |
-        +-----------+-----------------------------------------------------------------------------------------------------------------+
-        | Hop       | Used for relaying across the mesh only                                                                          |
-        +-----------+-----------------------------------------------------------------------------------------------------------------+
-        | Execution | Nodes that run jobs delivered from control nodes (jobs submitted from the user’s Ansible automation)            |
-        +-----------+-----------------------------------------------------------------------------------------------------------------+
-
     Notification Template
         An instance of a notification type (Email, Slack, Webhook, etc.) with a name, description, and a defined configuration.
--
cgit v1.2.3