Hi. We (cc @nimrodshn) are trying out cluster-operator according to the README, in `fake=false` mode.
`MachineSet` and `Machine` objects are being created, but no AWS instances appear.
The `Machine` status remains at:
```yaml
status:
  lastUpdated: null
  providerStatus: null
```
Looking at the pod logs, it seems the AWS credentials didn't make it into openshift-ansible:
```
TASK [openshift_aws : fetch master instances] **********************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_aws/tasks/setup_master_group.yml:10
Wednesday 11 July 2018  07:50:04 +0000 (0:00:00.033)       0:00:03.289 ********
Using module file /usr/lib/python2.7/site-packages/ansible/modules/cloud/amazon/ec2_instance_facts.py
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: default
<127.0.0.1> EXEC /bin/sh -c '/usr/bin/python2 && sleep 0'
FAILED - RETRYING: fetch master instances (20 retries left).Result was: {
    "attempts": 1,
    "changed": false,
    "instances": [],
    "invocation": {
        "module_args": {
            "aws_access_key": null,
            "aws_secret_key": null,
            "ec2_url": null,
            "filters": {
                "instance-state-name": "running",
                "tag:clusterid": "nshneor-gfv8m",
                "tag:host-type": "master"
            },
            "instance_ids": [],
            "profile": null,
            "region": "us-east-1",
            "security_token": null,
            "validate_certs": true
        }
    },
    "retries": 21
}
```
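Note that `aws_access_key: null` in `module_args` only means the key was not passed as an explicit task argument; the Ansible EC2 modules then fall back to the process environment (`AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`). One check we could try next (a sketch on our side, not something we have run yet; the pod name is copied from the `oc describe` output below) is to confirm the variables actually reach the container's environment:

```shell
# Hypothetical check (not run here): list AWS_* variables inside the container.
#   oc exec master-nshneor-gfv8m-nqts5-gcnk8 -- env | grep '^AWS_'
# The same filter, demonstrated on a captured env dump:
printf 'PATH=/usr/bin\nAWS_ACCESS_KEY_ID=AKIAEXAMPLE\nAWS_SECRET_ACCESS_KEY=xxxx\n' | grep '^AWS_'
```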
The pod has `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` set:
```
nshneor@dhcp-2-169 ~/workspace/go/src/github.com/openshift/cluster-operator (master) $ oc describe pods master-nshneor-gfv8m-nqts5-gcnk8
Name:           master-nshneor-gfv8m-nqts5-gcnk8
Namespace:      myproject
Node:           localhost/10.35.2.169
Start Time:     Wed, 11 Jul 2018 10:46:12 +0300
Labels:         controller-uid=798a762e-84de-11e8-a192-28d2448581b1
                job-name=master-nshneor-gfv8m-nqts5
Annotations:    openshift.io/scc=restricted
Status:         Running
IP:             172.17.0.4
Controlled By:  Job/master-nshneor-gfv8m-nqts5
Containers:
  install-masters:
    Container ID:   docker://31a09cd730e09b7e739654cc0fdc497a2d2e569f1142ceba566a38599b993e99
    Image:          cluster-operator-ansible:canary
    Image ID:       docker://sha256:2f0c518288260d1f0026dcc12129fa359b4909c4fbdaab83680d7e62fe295e25
    Port:           <none>
    State:          Running
      Started:      Wed, 11 Jul 2018 10:49:59 +0300
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Wed, 11 Jul 2018 10:48:02 +0300
      Finished:     Wed, 11 Jul 2018 10:49:45 +0300
    Ready:          True
    Restart Count:  2
    Environment:
      INVENTORY_FILE:             /ansible/inventory/hosts
      ANSIBLE_HOST_KEY_CHECKING:  False
      OPTS:                       -vvv --private-key=/ansible/ssh/privatekey.pem -e @/ansible/inventory/vars
      AWS_ACCESS_KEY_ID:          <set to the key 'awsAccessKeyId' in secret 'nshneor-aws-creds'>      Optional: false
      AWS_SECRET_ACCESS_KEY:      <set to the key 'awsSecretAccessKey' in secret 'nshneor-aws-creds'>  Optional: false
      PLAYBOOK_FILE:              /usr/share/ansible/openshift-ansible/playbooks/cluster-operator/aws/install_masters.yml
    Mounts:
      /ansible/inventory/ from inventory (rw)
      /ansible/ssh/ from sshkey (rw)
      /ansible/ssl/ from sslkey (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from cluster-installer-token-fvrqc (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          True
  PodScheduled   True
Volumes:
  inventory:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      master-nshneor-gfv8m-nqts5
    Optional:  false
  sshkey:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  nshneor-ssh-key
    Optional:    false
  sslkey:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  nshneor-certs
    Optional:    false
  cluster-installer-token-fvrqc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-installer-token-fvrqc
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type     Reason                 Age               From                Message
  ----     ------                 ----              ----                -------
  Normal   Scheduled              4m                default-scheduler   Successfully assigned master-nshneor-gfv8m-nqts5-gcnk8 to localhost
  Normal   SuccessfulMountVolume  4m                kubelet, localhost  MountVolume.SetUp succeeded for volume "inventory"
  Normal   SuccessfulMountVolume  4m                kubelet, localhost  MountVolume.SetUp succeeded for volume "sshkey"
  Normal   SuccessfulMountVolume  4m                kubelet, localhost  MountVolume.SetUp succeeded for volume "sslkey"
  Normal   SuccessfulMountVolume  4m                kubelet, localhost  MountVolume.SetUp succeeded for volume "cluster-installer-token-fvrqc"
  Warning  BackOff                36s               kubelet, localhost  Back-off restarting failed container
  Normal   Pulled                 24s (x3 over 4m)  kubelet, localhost  Container image "cluster-operator-ansible:canary" already present on machine
  Normal   Created                23s (x3 over 4m)  kubelet, localhost  Created container
```
The secrets do exist:
```
Name:         nshneor-aws-creds
Namespace:    myproject
Labels:       <none>
Annotations:
Type:         Opaque

Data
====
awsAccessKeyId:      20 bytes
awsSecretAccessKey:  40 bytes


Name:         nshneor-ssh-key
Namespace:    myproject
Labels:       app=cluster-operator
Annotations:
Type:         Opaque

Data
====
ssh-privatekey:  1674 bytes
```
How can we troubleshoot it further?
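One more check we are considering (an assumption on our side, not yet verified): confirm the secret values decode to the expected credentials, since Kubernetes stores secret `data` base64-encoded, and a stray trailing newline or wrong key name would silently yield broken variables:

```shell
# Hypothetical command (not run here): decode the access key id from the secret.
#   oc get secret nshneor-aws-creds -o jsonpath='{.data.awsAccessKeyId}' | base64 -d
# The base64 round-trip itself, for reference (placeholder key id):
printf 'AKIAEXAMPLE' | base64 | base64 -d
```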