Hey DevOps, DevSecOps and SRE newbies! Here I'm sharing the learnings I've picked up from running playbooks every day. These are the best and newest tips I've found for improving Ansible playbook executions, sorted and listed below.
Planning and designing automation with Ansible
- The most common DevOps tool used for planning and design is a Confluence page.
- The design document must contain a clear "Objective" section that describes why you want to automate and which area the automation covers.
- For tracking purposes, always create an entry in a ticketing tool; the preferred tool is Jira.
- The design can be broken down into two levels:
  - High-level design, where we detail what each task needs to cover
  - Low-level design, where we discuss each task in depth, along with its possible constraints
- Discuss the use of global variables (extra vars in the AWX UI, host_vars, group_vars, etc.) and whether each one is really necessary.
- Identify the inputs an AWX/Tower Job Template needs to meet the overall objective; if a single Job Template is not sufficient, chain several together with a Workflow Template.
- Treat every valid combination of Job Template options as a test case.
- References: each design document may need some research, which can include your company's internal Confluence pages or external Ansible/AWX technical knowledge articles that support the overall objective.
Playbook Tricks and tips
- Ansible playbook directory layout: it is preferable to always keep the playbooks in the project root directory, because Ansible and AWX/Tower read variables from directories that sit alongside the playbook, such as group_vars/ and host_vars/.
- Playbook description: first things first, always add a description comment that explains how the playbook works. Another SRE/DevOps engineer should be able to run it without asking you how.
- In the description, also list the extra variables needed to run the playbook on the happy path, and state clearly which variables are mandatory and which are optional.
- When listing the extra variables, put all mandatory variables first, then the optional ones.
- It is also good to list the tags used in the playbook and their purpose, so the end user can easily select tasks to run or skip tasks under certain conditions.
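A minimal sketch of such a description header follows. The playbook name `reboot_servers.yml`, the tags `precheck` and `reboot`, and the host group `web_servers` are hypothetical; `targets` and `block_reboot` are the variable names used elsewhere in this post.

```yaml
---
# Playbook   : reboot_servers.yml  (hypothetical example)
# Description: Checks connectivity to the target hosts, then reboots them
#              unless block_reboot is set to true.
#
# Extra variables
#   Mandatory:
#     targets      - host or group pattern to run against, e.g. web_servers
#   Optional:
#     block_reboot - true/false, skip the reboot step (default: false)
#
# Tags
#   precheck - connectivity checks only (--tags precheck)
#   reboot   - reboot step (--skip-tags reboot to skip it)
#
# Happy path:
#   ansible-playbook reboot_servers.yml -e "targets=web_servers"

- name: Reboot Servers
  hosts: "{{ targets }}"
  gather_facts: false
  become: true
  tasks:
    - name: Check Connectivity Before Reboot
      ansible.builtin.ping:
      tags: precheck

    # Skipped when block_reboot=true is passed as an extra variable.
    - name: Reboot Target Hosts
      ansible.builtin.reboot:
      when: not (block_reboot | default(false) | bool)
      tags: reboot
```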
- Add a comment before every newly introduced task that explains in detail what the task does.
- When writing tasks, always give each task a name that briefly explains what it does.
- Task names should be in Title Case.
- Sometimes you might copy a task from another playbook; re-check that its name is still appropriate.
- If a task contains critical logic, you must add a comment above it that describes its purpose.
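Here is a small sketch of named, Title Case tasks, each with a comment above it and an explicit warning on the task that carries critical logic. The host group `app_servers` and the paths `/var/log/myapp` and `/var/backups/myapp-logs.tar.gz` are hypothetical, and the archive step assumes the community.general collection is installed.

```yaml
- name: Rotate Application Logs
  hosts: app_servers          # hypothetical host group
  gather_facts: false
  become: true
  tasks:
    # Find application log files that are seven days old or older.
    - name: Find Log Files Older Than Seven Days
      ansible.builtin.find:
        paths: /var/log/myapp   # hypothetical log directory
        patterns: "*.log"
        age: 7d
      register: old_logs

    # CRITICAL: remove: true deletes the original log files once they are
    # archived. Double-check paths/patterns above before changing this task.
    - name: Archive And Remove Old Log Files
      community.general.archive:
        path: "{{ old_logs.files | map(attribute='path') | list }}"
        dest: /var/backups/myapp-logs.tar.gz
        format: gz
        remove: true
      when: old_logs.matched > 0
```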
- Manage facts: Ansible's gather_facts directive triggers time-consuming operations, so it is good practice to disable it when your play does not specifically need facts.
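As a sketch, the first play below skips fact gathering entirely. The second play still needs one fact, so it collects only the minimal subset on demand with the setup module's `gather_subset` option; `'!all'` limits collection to the minimum subset, which includes the distribution facts. The host group `web_servers` and the nginx service are hypothetical.

```yaml
# Play 1: no facts are used, so fact gathering is switched off.
- name: Restart Web Service
  hosts: web_servers          # hypothetical host group
  gather_facts: false
  become: true
  tasks:
    - name: Restart Nginx Service
      ansible.builtin.service:
        name: nginx
        state: restarted

# Play 2: only the OS distribution is needed, so gather the minimal
# fact subset instead of the full default collection.
- name: Report OS Distribution
  hosts: web_servers
  gather_facts: false
  tasks:
    - name: Gather Minimal Facts
      ansible.builtin.setup:
        gather_subset:
          - "!all"

    - name: Show OS Distribution
      ansible.builtin.debug:
        msg: "{{ inventory_hostname }} runs {{ ansible_facts['distribution'] }}"
```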
- Don'ts during execution: avoid the LIMIT option as much as possible, because some plays may be skipped when you set LIMIT on AWX templates. A better option is to set the play's hosts: value to a variable such as targets, with a default of 'all' or 'localhost' depending on what your play needs. LIMIT also causes problems when a play uses hostvars: facts are not collected for hosts outside the limit, so their hostvars are missing.
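A minimal sketch of that pattern follows. Because `hosts:` is templated before the inventory is processed, `targets` should be supplied as an extra variable (for AWX, in the template's extra variables or a survey) rather than in inventory vars. The play and task names are illustrative.

```yaml
# targets replaces the AWX LIMIT field: pass e.g. -e targets=web_servers.
# When targets is not set, the play falls back to every host ('all').
- name: Collect Uptime Report
  hosts: "{{ targets | default('all') }}"
  gather_facts: false
  tasks:
    - name: Read Uptime
      ansible.builtin.command: uptime
      register: uptime_out
      changed_when: false     # read-only command, never reports "changed"
```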
- I also recommend explicitly setting "any_errors_fatal: true" in every play where we expect to catch ERRORs during execution.
- If you have already defined a play with any_errors_fatal: false, then DON'T also define ignore_errors in the same play.
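A short sketch of a play with `any_errors_fatal: true` set at play level follows. The host group `web_servers` and the nginx commands are hypothetical.

```yaml
- name: Deploy Configuration
  hosts: web_servers          # hypothetical host group
  gather_facts: false
  become: true
  # If any host fails a task, Ansible finishes the current task and then
  # ends the play on all hosts instead of continuing on the healthy ones.
  any_errors_fatal: true
  tasks:
    # No ignore_errors here: a failed syntax check must stop the play.
    - name: Validate Nginx Configuration
      ansible.builtin.command: nginx -t
      changed_when: false

    - name: Reload Nginx Service
      ansible.builtin.service:
        name: nginx
        state: reloaded
```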
- When you build email notification logic in a play for the successful flow, always make sure you also have a failure email. You can send the failure email to a smaller target audience.
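One way to guarantee both emails is a block/rescue pair, sketched below. The success email is the last task in the block, and the failure email sits in rescue. `smtp_host`, `success_recipients` and `failure_recipients` are hypothetical variables. The example assumes the community.general collection (for `community.general.mail`) and RHEL-family hosts (for `ansible.builtin.dnf`).

```yaml
- name: Apply Updates With Email Notification
  hosts: app_servers          # hypothetical host group
  gather_facts: false
  become: true
  tasks:
    - name: Update Packages And Notify
      block:
        - name: Upgrade All Packages
          ansible.builtin.dnf:
            name: "*"
            state: latest

        # Reached only if every task above succeeded.
        - name: Send Success Email
          community.general.mail:
            host: "{{ smtp_host }}"            # hypothetical variable
            port: 25
            to: "{{ success_recipients }}"     # hypothetical variable
            subject: "Patching succeeded on {{ inventory_hostname }}"
            body: "All packages were updated on {{ inventory_hostname }}."
          delegate_to: localhost
          become: false

      rescue:
        # Runs when any task in the block fails; smaller audience.
        - name: Send Failure Email
          community.general.mail:
            host: "{{ smtp_host }}"
            port: 25
            to: "{{ failure_recipients }}"     # hypothetical variable
            subject: "Patching FAILED on {{ inventory_hostname }}"
            body: "Task '{{ ansible_failed_task.name }}' failed: {{ ansible_failed_result.msg | default('no message') }}"
          delegate_to: localhost
          become: false

        # Re-raise the error so the AWX job is still marked as failed.
        - name: Fail After Sending Failure Email
          ansible.builtin.fail:
            msg: "Patching failed on {{ inventory_hostname }}"
```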
- Encrypt sensitive data used by a playbook with the ansible-vault command.
- Always double-check your inventory with the ansible-inventory command and its --list and --graph options.
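A sketch of keeping a secret in a vault-encrypted vars file follows. The file `group_vars/all/vault.yml`, the variable `vault_db_password`, the template `db.conf.j2` and the destination path are all hypothetical. In AWX, you would attach a Vault credential to the Job Template instead of using `--ask-vault-pass`.

```yaml
# group_vars/all/vault.yml (hypothetical) is created encrypted with:
#   ansible-vault create group_vars/all/vault.yml
# and contains, for example:
#   vault_db_password: "..."
# Ansible loads it automatically because group_vars/ sits next to the playbook.
#
# Run from the CLI with:
#   ansible-playbook site.yml --ask-vault-pass
- name: Configure Database Client
  hosts: app_servers          # hypothetical host group
  gather_facts: false
  become: true
  tasks:
    - name: Write Database Credentials File
      ansible.builtin.template:
        src: db.conf.j2               # hypothetical template
        dest: /etc/myapp/db.conf      # hypothetical path
        mode: "0600"
      vars:
        db_password: "{{ vault_db_password }}"
      # Keep the decrypted secret out of the job output.
      no_log: true
```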
- In production, once all possible use cases have been tested in non-production, it is better to reduce the verbosity or suppress the logs. You can do this at task level by adding `no_log: "{{ log_suppression | default(true) }}"` to the task definition. Here log_suppression is an Ansible variable that can be overridden at execution time with either 'true' or 'false'.
- In when conditions that validate numbers, use the int filter and compare against a numeric value (x | int == 0) instead of a character value (x == '0').
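For example, in the sketch below the task output is hidden by default and can be shown during non-production testing with `-e log_suppression=false`. The variables `app_user` and `app_password_hash` are hypothetical.

```yaml
- name: Create Service Accounts
  hosts: app_servers          # hypothetical host group
  gather_facts: false
  become: true
  tasks:
    - name: Create Application User
      ansible.builtin.user:
        name: "{{ app_user }}"                # hypothetical variable
        password: "{{ app_password_hash }}"   # hypothetical pre-hashed password
      # Output is hidden unless log_suppression=false is passed at run time.
      no_log: "{{ log_suppression | default(true) }}"
```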
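A registered command's `stdout` is always a string. The sketch below therefore converts it with the int filter before the numeric comparison. The zombie-process check is only an illustrative command for Linux hosts with procps `ps`.

```yaml
- name: Check For Zombie Processes
  hosts: app_servers          # hypothetical host group
  gather_facts: false
  tasks:
    # grep -c prints the match count; "|| true" keeps rc at 0 when it is 0.
    - name: Count Zombie Processes
      ansible.builtin.shell: ps -eo stat= | grep -c '^Z' || true
      register: zombie_count
      changed_when: false

    # Compare the number 0, not the string '0'.
    - name: Report Healthy Host
      ansible.builtin.debug:
        msg: "No zombie processes on {{ inventory_hostname }}"
      when: (zombie_count.stdout | int) == 0
```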
- While working with the lineinfile module, a race condition can occur when many hosts write to the same file at the same time (for example, a task delegated to localhost that runs in parallel across the play's forks). It can be fixed with the throttle option.
Set Ansible's "throttle" keyword to 1 on the task that executes lineinfile. From the Ansible documentation:
"The throttle keyword limits the number of workers for a particular task. It can be set at the block and task level.
Use throttle to restrict tasks that may be CPU-intensive or interact with a rate-limiting API"
Example:
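- name: Update Hostname and Uptime in Days in CSV File
  lineinfile:
    path: /tmp/uptime-report.csv
    line: "{{ inventory_hostname }},{{ box_uptime }}"
    create: true          # create the CSV if it does not exist yet
  throttle: 1             # only one worker writes to the file at a time
  delegate_to: localhost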
AWX Admin Tricks
- When you install AWX, prefer the latest-minus-one (N-1) release over the newest one; it tends to be more stable and consistent during installation.
- If you are using existing AWX/Ansible Job Templates to test the various automation requirements/enhancements we work on:
  - We usually need to alter the Job Templates temporarily to perform testing (e.g. we change the Project, add Limits or Skip Tags, or set EXTRA variables such as 'block_reboot' or 'log_suppression').
  - Problem: there is a good chance we forget to undo some or all of these temporary changes, which results in broken jobs or incomplete runs.
  - Solution (best practice): always create a copy of the Job Template you want to use for testing, alter that copy as needed, and delete the temporary copy once testing is finished.
  - Making a copy of a Job Template is trivial and ensures we do not break anything in the existing system.
- AWX Smart Inventory creation is simple: the only thing you need to know is how to write a regex (regular expression) that matches the hosts you want from the existing host list. AWX then builds the new inventory out of those existing inventory hosts. Example host filters:
  - To get all hosts: name.regex:.*
  - To get hosts whose names start with ec2: name.regex:^ec2
  - To get hosts whose names contain 'prod': name.regex:prod
- Some recent AWX versions have odd behavior! In AWX 17.1.0, creating an inventory from source control is allowed only when the inventory YAML file has the executable permission set.
## To set the executable permission in git
git ls-files --stage          # shows each file's mode (100755 = executable)
git update-index --chmod=+x 'name-of-inventory-file'
git commit -m "made a file executable"
git push
- AWX Workflow flow control: it is better to link nodes with "On Success", because we usually want to stop the remaining actions (in my scenario, reboots) when a failure is encountered; otherwise the failure may affect multiple or all of the remaining servers.
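If you manage workflows as code, an "On Success" link can be expressed with the `success_nodes` option of the awx.awx collection's `workflow_job_template_node` module, roughly as sketched below. The workflow "Patch And Reboot", the Job Templates "Patch Servers" and "Reboot Servers", and the organization "Default" are hypothetical and assumed to exist already. Controller credentials are assumed to come from the CONTROLLER_HOST, CONTROLLER_USERNAME and CONTROLLER_PASSWORD environment variables.

```yaml
# Requires the awx.awx collection; all names below are hypothetical.
- name: Build Patch Workflow
  hosts: localhost
  gather_facts: false
  tasks:
    # Create the child node first so the parent node can reference it.
    - name: Add Reboot Node
      awx.awx.workflow_job_template_node:
        workflow_job_template: "Patch And Reboot"
        organization: Default
        identifier: reboot-servers
        unified_job_template: "Reboot Servers"

    # Linked as a success node only: reboots never run if patching fails.
    - name: Add Patch Node
      awx.awx.workflow_job_template_node:
        workflow_job_template: "Patch And Reboot"
        organization: Default
        identifier: patch-servers
        unified_job_template: "Patch Servers"
        success_nodes:
          - reboot-servers
```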