Skip to content

Commit

Permalink
Avoid false positive for Ansible failure (#21193)
Browse files Browse the repository at this point in the history
* Allow all kind of Ansible error to propagate

Issue is about Ansible deployment stage for jobs using
qe-sap-deployment.
Avoid that Ansible error that are not `Failure:` or `Fatal:` result in
false positive. Propagate error to test module.
  • Loading branch information
mpagot authored Feb 17, 2025
1 parent 316bb4c commit 060533b
Show file tree
Hide file tree
Showing 3 changed files with 54 additions and 69 deletions.
25 changes: 18 additions & 7 deletions lib/qesapdeployment.pm
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,6 @@ our @EXPORT = qw(
qesap_ansible_script_output
qesap_ansible_fetch_file
qesap_ansible_reg_module
qesap_ansible_error_detection
qesap_create_ansible_section
qesap_remote_hana_public_ips
qesap_wait_for_ssh
Expand Down Expand Up @@ -2427,9 +2426,13 @@ sub qesap_az_diagnostic_log {
qesap_terrafom_ansible_deploy_retry( error_log=>$error_log )
error_log - ansible error log file name
Retry to deploy terraform + ansible
Return 0: we manage the failure properly
Return 1: something went wrong or we do not know what to do with the failure
Retry to deploy terraform + ansible. This function is only expected to be called if a previous `qesap.py`
execution returns a non zero exit code. If this function is called after a successful execution,
the qesap_ansible_error_detection will not find anything wrong in the log, wrongly concluding that
an unknown error is in the log and skipping the retry and this function will return 1.
.
Return 0: this function manage the failure properly, perform a retry and retry was a successful deployment
Return 1: something went wrong or this function does not know what to do with the failure
=over
Expand All @@ -2445,6 +2448,7 @@ sub qesap_terrafom_ansible_deploy_retry {
foreach (qw(error_log provider)) { croak "Missing mandatory $_ argument" unless $args{$_}; }

my $detected_error = qesap_ansible_error_detection(error_log => $args{error_log});
record_info('qesap_terrafom_ansible_deploy_retry', "detected error:$detected_error");
my @ret;
# 3: no sudo password
if ($detected_error eq 3) {
Expand Down Expand Up @@ -2505,6 +2509,13 @@ sub qesap_terrafom_ansible_deploy_retry {
record_info('ANSIBLE RETRY PASS');
$detected_error = 0;
}
else {
# qesap_ansible_error_detection return:
# - 0 (unknown error),
# - 1 (generic fatal, not something we can solve by retry) or something else...
# Mark both as a failure in the retry, allowing error to be propagated.
$detected_error = 1;
}
return $detected_error;
}

Expand All @@ -2515,8 +2526,8 @@ sub qesap_terrafom_ansible_deploy_retry {
Inspect the provided Ansible log and search for known issue in the log
Also provide a nice record_info to summarize the error
Return:
- 0: no errors
- 1: unknown generic error
- 0: unable to detect errors
- 1: generic fatal error
- 2: reboot timeout
- 3: no sudo password
Expand Down Expand Up @@ -2561,7 +2572,7 @@ sub qesap_ansible_error_detection {
qesap_test_postfail()
Post fail tasks suitable for post_fail_hook of the test modules.
This API is mainly designed for qesap regresstion test modules.
This API is mainly designed for qesap regression test modules.
=over
Expand Down
95 changes: 34 additions & 61 deletions t/09_qesapdeployment.t
Original file line number Diff line number Diff line change
Expand Up @@ -534,9 +534,9 @@ subtest '[qesap_file_find_string] success' => sub {
my $qesap = Test::MockModule->new('qesapdeployment', no_auto => 1);
my @calls;
# internally the function is using grep to search for a specific
# error string. Here the result of the grep.
my $log = 'ERROR OUTPUT: "msg": "Timed out waiting for last boot time check (timeout=600)",';
# Create a mock to replace the script_run
# error string. Here is an example of grep result.
# 'ERROR OUTPUT: "msg": "Timed out waiting for last boot time check (timeout=600)",';

# The mock will return, within the function under test,
# the result of the grep. grep return 0 in case of string match
$qesap->redefine(script_run => sub { push @calls, $_[0]; return 0; });
Expand Down Expand Up @@ -1228,17 +1228,46 @@ subtest '[qesap_add_server_to_hosts]' => sub {
ok((any { qr/sed.*\/etc\/hosts/ } @calls), 'AWS Region matches');
};

subtest '[qesap_terrafom_ansible_deploy_retry] no Ansible failures, no retry' => sub {
subtest '[qesap_terrafom_ansible_deploy_retry] no or unknown Ansible failures, no retry, error' => sub {
# Simulate to call the qesap_terraform_ansible_deploy_retry but
# error_detection does not find and known error in the log. It is something could
# happen if this function is called after a failure of some kind error_detection
# does not know, or if calling this function after a successful deployment.
my $qesap = Test::MockModule->new('qesapdeployment', no_auto => 1);
my $qesap_execute_calls = 0;

# 0: unable to detect errors
$qesap->redefine(qesap_ansible_error_detection => sub { return 0; });
$qesap->redefine(qesap_execute => sub { $qesap_execute_calls++; return; });
$qesap->redefine(record_info => sub { note(join(' ', 'RECORD_INFO -->', @_)); });

my $ret = qesap_terrafom_ansible_deploy_retry(error_log => 'CORAL', provider => 'NEMO');

ok $ret eq 0, "Return of qesap_terrafom_ansible_deploy_retry '$ret' is expected 0";
ok $ret eq 1, "Return of qesap_terrafom_ansible_deploy_retry '$ret' is expected 1";
ok $qesap_execute_calls eq 0, "qesap_execute() never called (qesap_execute_calls: $qesap_execute_calls expected 0)";
};

subtest '[qesap_terrafom_ansible_deploy_retry] no or unknown Ansible failures, no retry, error. More layers' => sub {
# Like previous test but only mock testapi
my $qesap = Test::MockModule->new('qesapdeployment', no_auto => 1);
my $qesap_execute_calls = 0;
my @calls;

# Simulate we never find the string in the Ansible log file
# Simulate grep within qesap_file_find_string, within qesap_ansible_error_detection.
$qesap->redefine(script_run => sub { push @calls, $_[0]; return 1; });

$qesap->redefine(script_output => sub {
push @calls, shift;
return ''; });

$qesap->redefine(qesap_execute => sub { $qesap_execute_calls++; return; });
$qesap->redefine(record_info => sub { note(join(' ', 'RECORD_INFO -->', @_)); });

my $ret = qesap_terrafom_ansible_deploy_retry(error_log => 'CORAL', provider => 'NEMO');

note("\n C--> " . join("\n C--> ", @calls));
ok $ret eq 1, "Return of qesap_terrafom_ansible_deploy_retry '$ret' is expected 1";
ok $qesap_execute_calls eq 0, "qesap_execute() never called (qesap_execute_calls: $qesap_execute_calls expected 0)";
};

Expand Down Expand Up @@ -1298,62 +1327,6 @@ subtest '[qesap_terrafom_ansible_deploy_retry] reboot timeout Ansible failures'
ok $qesap_execute_calls eq 3, "qesap_execute() never called (qesap_execute_calls: $qesap_execute_calls expected 3)";
};

subtest '[qesap_ansible_error_detection] no error' => sub {
my $qesap = Test::MockModule->new('qesapdeployment', no_auto => 1);
my @input;

# Simulate we never find the string in the Ansible log file
$qesap->redefine(qesap_file_find_string => sub {
my (%args) = @_;
push @input, $args{file};
return 0; });
$qesap->redefine(script_output => sub {
push @input, shift;
return ''; });
$qesap->redefine(record_info => sub { note(join(' ', 'RECORD_INFO -->', @_)); });

my $ret = qesap_ansible_error_detection(error_log => 'SHELL');

note("\n I--> " . join("\n I--> ", @input));
ok $ret eq 0, "If no error detected return '$ret' is 0";
ok((any { qr/SHELL/ } @input), 'At least one of the serach is on the provided input file SHELL');
};

subtest '[qesap_ansible_error_detection] generic error' => sub {
my $qesap = Test::MockModule->new('qesapdeployment', no_auto => 1);

# The specific search with qesap_file_find_string
# return nothing, but ...
$qesap->redefine(qesap_file_find_string => sub { return 0; });

# The generic one return something
$qesap->redefine(script_output => sub { return 'ALGA'; });
$qesap->redefine(record_info => sub { note(join(' ', 'RECORD_INFO -->', @_)); });

my $ret = qesap_ansible_error_detection(error_log => 'SHELL');
ok $ret eq 1, "If generic error detected return '$ret' is 1";
};

subtest '[qesap_ansible_error_detection] specific error' => sub {
my $qesap = Test::MockModule->new('qesapdeployment', no_auto => 1);

# The specific search with qesap_file_find_string
# return nothing, but ...
$qesap->redefine(qesap_file_find_string => sub {
my (%args) = @_;
if ($args{search_string} eq 'Timed out waiting for last boot time check') {
return 1;
}
return 0; });

# The generic one return something
$qesap->redefine(script_output => sub { return ''; });
$qesap->redefine(record_info => sub { note(join(' ', 'RECORD_INFO -->', @_)); });

my $ret = qesap_ansible_error_detection(error_log => 'SHELL');
ok $ret eq 2, "If generic error detected return '$ret' is 2";
};

subtest '[qesap_test_postfail]' => sub {
my $qesap = Test::MockModule->new('qesapdeployment', no_auto => 1);

Expand Down
3 changes: 2 additions & 1 deletion tests/sles4sap/qesapdeployment/deploy.pm
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ sub run {
logname => 'qesap_exec_ansible.log.txt',
verbose => 1,
timeout => 3600);
record_info('ANSIBLE RESULT', "ret0:$ret[0] ret1:$ret[1]");
my $find_cmd = join(' ',
'find',
'/tmp/results/',
Expand All @@ -57,7 +58,7 @@ sub run {
#enter_cmd("rm $log");
}
if ($ret[0]) {
# Retry to deploy terraform + ansible
record_info("Retry to deploy terraform + ansible");
if (qesap_terrafom_ansible_deploy_retry(error_log => $ret[1], provider => $provider)) {
die "Retry failed, original ansible return: $ret[0]";
}
Expand Down

0 comments on commit 060533b

Please sign in to comment.