Status of Cluster creation stay in Deploying state #884

Open · sbusso opened this issue Feb 6, 2025 · 11 comments

@sbusso commented Feb 6, 2025

Bug description

I deployed a 1-instance cluster to Hetzner and the operation stayed in the in_progress state even though the deployment had finished.

TASK [deploy-finish : Connection info] *****************************************
ok: [10.0.1.1] => {
    "msg": {
        "address": {
            "primary": "10.0.1.2"
        },
        "password": "redacted",
        "port": "6432",
        "superuser": "postgres"
    }
}
PLAY RECAP *********************************************************************
10.0.1.1                   : ok=125  changed=60   unreachable=0    failed=0    skipped=446  rescued=0    ignored=0   
localhost                  : ok=36   changed=14   unreachable=0    failed=0    skipped=204  rescued=0    ignored=0 
[screenshot]

Expected behavior

Status should change to deployed.

Steps to reproduce

  1. Create a 1-instance cluster to deploy on Hetzner

Installation method

Console (UI)

System info

Autobase console deployed to a Docker Swarm cluster using the individual images for ui/api/db

Additional info

No response

@sbusso added the bug and needs triage labels Feb 6, 2025
@vitabaks (Owner) commented Feb 6, 2025

Please attach the API service log.

example:

docker exec autobase-console cat /var/log/supervisor/pg-console-api-stdout.log | gzip > autobase-console-api-stdout.log.gz
docker exec autobase-console cat /var/log/supervisor/pg-console-api-stderr.log | gzip > autobase-console-api-stderr.log.gz

@sbusso (Author) commented Feb 6, 2025

Please attach the API service log.

example:

docker exec autobase-console cat /var/log/supervisor/autobase-console-api-stdout.log | gzip > autobase-console-api-stdout.log.gz
docker exec autobase-console cat /var/log/supervisor/autobase-console-api-stderr.log | gzip > autobase-console-api-stderr.log.gz

I believe there is no supervisor on the autobase-console-api image, but here are the errors I can see from the container logs:

2025-02-06T08:45:52.413828596Z 2025-02-06T08:45:52Z ERR failed to get containers status | app=pg_console version=2.1.0 module=log_watcher cid=08dfd4e3-a341-4846-95e3-bcaec838d8fa operation_id=1 error=Error response from daemon: No such container: 9ff2f13b6c98637729dd256662046aa85149da2924f209a3c77d3519caefe1f9
    
2025-02-06T08:45:52.414570774Z 2025-02-06T08:45:52Z ERR failed to get containers status | app=pg_console version=2.1.0 module=log_watcher cid=ba0c3742-204b-4204-a95a-5680d7e09980 operation_id=2 error=Error response from daemon: No such container: daae9291a4641d258b8ac295cae95aa9ae10121470b3bc894cc86424690469a8

This is after a second attempt; when I try a third time, I get a third error for a third container. Is it trying to track the automation container's status?
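
If it is polling Docker for the container ID it stored with the operation, then once that container is removed the daemon will answer with exactly this "No such container" error. A rough sketch of that kind of check (hypothetical code using the Docker Go SDK, not the actual pg_console source):

package main

import (
	"context"
	"fmt"
	"os"

	"github.com/docker/docker/client"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: check <container-id>")
		return
	}
	// Connect to the local Docker daemon via the mounted docker.sock.
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Hypothetical: the container ID saved with the operation (DockerCode).
	containerID := os.Args[1]

	_, err = cli.ContainerInspect(context.Background(), containerID)
	switch {
	case client.IsErrNotFound(err):
		// This is the "No such container: <id>" case seen in the logs above.
		fmt.Println("automation container is gone")
	case err != nil:
		fmt.Println("docker error:", err)
	default:
		fmt.Println("automation container still exists")
	}
}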

@vitabaks (Owner) commented Feb 6, 2025

@ngurban Could you take a look at this?

@vitabaks (Owner) commented Feb 6, 2025

@sbusso Could you attach the full log? This would help us with the diagnostics.

@sbusso (Author) commented Feb 6, 2025

Here are the logs of another try, with the vars from the frontend redacted:

log.json

@sbusso (Author) commented Feb 6, 2025

When the automation container is created, the API gets the right container ID:

{"log":"{\"level\":\"info\",\"app\":\"pg_console\",\"version\":\"2.1.0\",\"cid\":\"9a6bc197-91c6-4ae7-ae3d-e60ddf9d83b4\",\"operation\":{\"ID\":5,\"ProjectID\":34,\"ClusterID\":7,\"DockerCode\":\"a2f997d0b153d821776f5edb28f2d4d0a7d475ae8cb813e56b0d9489bf400b27\",\"Cid\":\"9a6bc197-91c6-4ae7-ae3d-e60ddf9d83b4\",\"Type\":\"deploy\",\"Status\":\"in_progress\",\"Log\":null,\"CreatedAt\":\"2025-02-06T12:20:48.764705Z\",\"UpdatedAt\":null},\"time\":\"2025-02-06T12:20:48Z\",\"message\":\"operation was created\"}\n","stream":"stdout","time":"2025-02-06T12:20:48.766871737Z"}

But when the container is destroyed, the API loses track of it and throws an error, whether the operation was successful or failed.

@sbusso (Author) commented Feb 6, 2025

The only reference I found is

err = lw.dockerManager.RemoveContainer(opCtx, xdocker.InstanceID(op.DockerCode))

where it looks like the log watcher destroys the container when the operation finishes, but the operation status is never updated.
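
If that is the case, I would expect the watcher to persist a terminal status before removing the container, something along these lines (hypothetical types, not the actual pg_console code):

package watcher

import (
	"context"
	"fmt"
)

// Hypothetical types, only to illustrate the ordering; not the real pg_console API.
type Operation struct {
	ID         int64
	DockerCode string
	Status     string // "in_progress", "success", "failed"
}

type DockerManager interface {
	RemoveContainer(ctx context.Context, id string) error
}

type OperationStore interface {
	UpdateStatus(ctx context.Context, id int64, status string) error
}

// finishOperation records the final status first, so removing (or losing)
// the automation container can no longer leave the operation in "in_progress".
func finishOperation(ctx context.Context, dm DockerManager, store OperationStore, op *Operation, succeeded bool) error {
	status := "failed"
	if succeeded {
		status = "success"
	}
	if err := store.UpdateStatus(ctx, op.ID, status); err != nil {
		return fmt.Errorf("update operation %d: %w", op.ID, err)
	}
	op.Status = status
	if err := dm.RemoveContainer(ctx, op.DockerCode); err != nil {
		return fmt.Errorf("remove container %s: %w", op.DockerCode, err)
	}
	return nil
}

With that ordering, losing the container could no longer leave the operation stuck in in_progress.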

@vitabaks (Owner) commented Feb 7, 2025

Thanks, we'll take a look at it.

It seems that this problem is not always reproducible:

[screenshots]

@sbusso Could you share your instructions on how to start the Autobase console? Have you mounted a directory with the Ansible JSON log?

  --volume /tmp/ansible:/tmp/ansible \

Example:

cat /tmp/ansible/postgres-cluster-01.json

...
{
    "time": "2025-02-07T10:05:14.355137",
    "summary": {
        "localhost": {
            "ok": 36,
            "failures": 0,
            "unreachable": 0,
            "changed": 10,
            "skipped": 204,
            "rescued": 0,
            "ignored": 0
        },
        "10.0.1.1": {
            "ok": 125,
            "failures": 0,
            "unreachable": 0,
            "changed": 60,
            "skipped": 446,
            "rescued": 0,
            "ignored": 0
        }
    },
    "status": "success"
}
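
Roughly speaking, the API only needs the final summary entry from that file to mark the operation as finished. A simplified sketch of reading it (not the real code, just to illustrate why the mounted path matters; field names are taken from the example above):

package anslog

import (
	"encoding/json"
	"os"
)

type playbookSummary struct {
	Time    string                `json:"time"`
	Summary map[string]hostCounts `json:"summary"`
	Status  string                `json:"status"` // "success" or "failed"
}

type hostCounts struct {
	OK          int `json:"ok"`
	Failures    int `json:"failures"`
	Unreachable int `json:"unreachable"`
	Changed     int `json:"changed"`
	Skipped     int `json:"skipped"`
	Rescued     int `json:"rescued"`
	Ignored     int `json:"ignored"`
}

// FinalStatus reads the JSON log written by the automation container
// (e.g. /tmp/ansible/postgres-cluster-01.json) and returns the status of
// the last summary entry. Assumes the file is a stream of JSON objects.
func FinalStatus(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		// If the /tmp/ansible volume is not shared with the API container,
		// the log is never found and the status never changes.
		return "", err
	}
	defer f.Close()

	dec := json.NewDecoder(f)
	status := ""
	for {
		var entry playbookSummary
		if err := dec.Decode(&entry); err != nil {
			break // io.EOF, or a value that is not a summary entry
		}
		if entry.Status != "" {
			status = entry.Status
		}
	}
	return status, nil
}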

@sbusso (Author) commented Feb 7, 2025

OK, the path /tmp/ansible was the issue. I had a lot of trouble with it because I deployed to a cluster and the path was not available without creating it manually on each node, so I had used /tmp instead. It is working after updating the mount point.

Could this path use a named volume shared by the containers instead of a hardcoded host path?

Here is the config I use. I can submit it with a Caddy config in a PR over the weekend.

services:
  pg-console-api:
    image: autobase/console_api:latest
    container_name: pg-console-api
    restart: unless-stopped

    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /tmp/ansible:/tmp/ansible
    depends_on:
      - pg-console-db

    environment:
      - PG_CONSOLE_API_URL=${PG_CONSOLE_API_URL}
      - PG_CONSOLE_AUTHORIZATION_TOKEN=${PG_CONSOLE_AUTH_TOKEN}
      - PG_CONSOLE_DB_HOST=pg-console-db
      - PG_CONSOLE_LOGGER_LEVEL=${PG_CONSOLE_LOGGER_LEVEL:-INFO}
    networks:
      - pg-console
      - caddy

  pg-console-ui:
    image: autobase/console_ui:latest
    container_name: pg-console-ui
    restart: unless-stopped
    labels:
      caddy: ${PG_CONSOLE_DOMAIN}
      caddy.@api.path: /api/v1/*
      caddy.0_reverse_proxy: "@api pg-console-api:8080"
      caddy.1_reverse_proxy: "{{upstreams 80}}"

    environment:
      - PG_CONSOLE_API_URL=${PG_CONSOLE_API_URL}
      - PG_CONSOLE_AUTHORIZATION_TOKEN=${PG_CONSOLE_AUTH_TOKEN}
    networks:
      - pg-console
      - caddy
  pg-console-db:
    image: autobase/console_db:latest
    container_name: pg-console-db
    restart: unless-stopped
    volumes:
      - console_postgres:/var/lib/postgresql
    networks:
      - pg-console

volumes:
  console_postgres:


networks:
  pg-console:
  caddy:
    name: caddy
    external: true

@ngurban commented Feb 7, 2025

@ngurban Could you take a look at this?

Hi, @sbusso! Can you turn on the TRACE log level and collect the log one more time?

@vitabaks removed the bug label Feb 7, 2025
@vitabaks changed the title from "[Bug] Status of Cluster creation stay in Deploying state" to "Status of Cluster creation stay in Deploying state" Feb 7, 2025
@vitabaks added the question label Feb 7, 2025
@vitabaks (Owner) commented Feb 7, 2025

Could this path use a named volume shared by the containers instead of a hardcoded host path?

I think so. The automation container’s log needs to be accessible to the API container.

@vitabaks removed the question label Feb 8, 2025