Status of Cluster creation stay in Deploying state #884

Open · sbusso opened this issue Feb 6, 2025 · 11 comments

@sbusso commented Feb 6, 2025

Bug description

I deployed a 1-instance cluster to Hetzner and the operation stayed in the in_progress state even though the deployment had finished.

TASK [deploy-finish : Connection info] *****************************************
ok: [10.0.1.1] => {
    "msg": {
        "address": {
            "primary": "10.0.1.2"
        },
        "password": "redacted",
        "port": "6432",
        "superuser": "postgres"
    }
}
PLAY RECAP *********************************************************************
10.0.1.1                   : ok=125  changed=60   unreachable=0    failed=0    skipped=446  rescued=0    ignored=0   
localhost                  : ok=36   changed=14   unreachable=0    failed=0    skipped=204  rescued=0    ignored=0 
[screenshot]

Expected behavior

Status should change to deployed.

Steps to reproduce

  1. Create a 1-instance cluster to deploy on Hetzner

Installation method

Console (UI)

System info

Autobase console deployed to a Docker Swarm cluster using the individual images for ui/api/db

Additional info

No response

@sbusso added the bug and needs triage labels Feb 6, 2025
@vitabaks (Owner) commented Feb 6, 2025

Please attach the API service log.

example:

docker exec autobase-console cat /var/log/supervisor/pg-console-api-stdout.log | gzip > autobase-console-api-stdout.log.gz
docker exec autobase-console cat /var/log/supervisor/pg-console-api-stderr.log | gzip > autobase-console-api-stderr.log.gz

@sbusso (Author) commented Feb 6, 2025

Please attach the API service log.

example:

docker exec autobase-console cat /var/log/supervisor/autobase-console-api-stdout.log | gzip > autobase-console-api-stdout.log.gz
docker exec autobase-console cat /var/log/supervisor/autobase-console-api-stderr.log | gzip > autobase-console-api-stderr.log.gz

I believe there is no supervisor on the autobase-console-api image, but here are the errors I can see from the container logs:

2025-02-06T08:45:52.413828596Z 2025-02-06T08:45:52Z ERR failed to get containers status | app=pg_console version=2.1.0 module=log_watcher cid=08dfd4e3-a341-4846-95e3-bcaec838d8fa operation_id=1 error=Error response from daemon: No such container: 9ff2f13b6c98637729dd256662046aa85149da2924f209a3c77d3519caefe1f9
    
2025-02-06T08:45:52.414570774Z 2025-02-06T08:45:52Z ERR failed to get containers status | app=pg_console version=2.1.0 module=log_watcher cid=ba0c3742-204b-4204-a95a-5680d7e09980 operation_id=2 error=Error response from daemon: No such container: daae9291a4641d258b8ac295cae95aa9ae10121470b3bc894cc86424690469a8

This is after a second attempt; when I try a third time, I get a third error for a third container. Is it trying to track the automation container's status?
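
If it is polling Docker for the container ID it stored with the operation, then once that container is removed the daemon will answer with exactly this "No such container" error. A rough sketch of that kind of check (hypothetical code using the Docker Go SDK, not the actual pg_console source):

package main

import (
	"context"
	"fmt"
	"os"

	"github.com/docker/docker/client"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: check <container-id>")
		return
	}
	// Connect to the local Docker daemon via the mounted docker.sock.
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Hypothetical: the container ID saved with the operation (DockerCode).
	containerID := os.Args[1]

	_, err = cli.ContainerInspect(context.Background(), containerID)
	switch {
	case client.IsErrNotFound(err):
		// This is the "No such container: <id>" case seen in the logs above.
		fmt.Println("automation container is gone")
	case err != nil:
		fmt.Println("docker error:", err)
	default:
		fmt.Println("automation container still exists")
	}
}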

@vitabaks (Owner) commented Feb 6, 2025

@ngurban Could you take a look at this?

@vitabaks (Owner) commented Feb 6, 2025

@sbusso Could you attach the full log? This would help us with the diagnostics.

@sbusso (Author) commented Feb 6, 2025

Here are the logs of another try, with the vars from the frontend redacted:

log.json

@sbusso (Author) commented Feb 6, 2025

When the automation container is created, the API gets the right container ID:

{"log":"{\"level\":\"info\",\"app\":\"pg_console\",\"version\":\"2.1.0\",\"cid\":\"9a6bc197-91c6-4ae7-ae3d-e60ddf9d83b4\",\"operation\":{\"ID\":5,\"ProjectID\":34,\"ClusterID\":7,\"DockerCode\":\"a2f997d0b153d821776f5edb28f2d4d0a7d475ae8cb813e56b0d9489bf400b27\",\"Cid\":\"9a6bc197-91c6-4ae7-ae3d-e60ddf9d83b4\",\"Type\":\"deploy\",\"Status\":\"in_progress\",\"Log\":null,\"CreatedAt\":\"2025-02-06T12:20:48.764705Z\",\"UpdatedAt\":null},\"time\":\"2025-02-06T12:20:48Z\",\"message\":\"operation was created\"}\n","stream":"stdout","time":"2025-02-06T12:20:48.766871737Z"}

But when the container is destroyed, the API loses track of it and throws an error, whether the operation was successful or failed.

@sbusso (Author) commented Feb 6, 2025

The only reference I found is

err = lw.dockerManager.RemoveContainer(opCtx, xdocker.InstanceID(op.DockerCode))

where it looks like the log watcher destroys the container when the operation finishes, but the operation status is never updated.
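
If that is the case, I would expect the watcher to persist a terminal status before removing the container, something along these lines (hypothetical types, not the actual pg_console code):

package watcher

import (
	"context"
	"fmt"
)

// Hypothetical types, only to illustrate the ordering; not the real pg_console API.
type Operation struct {
	ID         int64
	DockerCode string
	Status     string // "in_progress", "success", "failed"
}

type DockerManager interface {
	RemoveContainer(ctx context.Context, id string) error
}

type OperationStore interface {
	UpdateStatus(ctx context.Context, id int64, status string) error
}

// finishOperation records the final status first, so removing (or losing)
// the automation container can no longer leave the operation in "in_progress".
func finishOperation(ctx context.Context, dm DockerManager, store OperationStore, op *Operation, succeeded bool) error {
	status := "failed"
	if succeeded {
		status = "success"
	}
	if err := store.UpdateStatus(ctx, op.ID, status); err != nil {
		return fmt.Errorf("update operation %d: %w", op.ID, err)
	}
	op.Status = status
	if err := dm.RemoveContainer(ctx, op.DockerCode); err != nil {
		return fmt.Errorf("remove container %s: %w", op.DockerCode, err)
	}
	return nil
}

With that ordering, losing the container could no longer leave the operation stuck in in_progress.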

@vitabaks (Owner) commented Feb 7, 2025

Thanks, we'll take a look at it.

It seems that this problem is not always reproducible:

[screenshots]

@sbusso Could you share your instructions on how to start the Autobase console? Have you mounted a directory with the Ansible JSON log?

  --volume /tmp/ansible:/tmp/ansible \

Example:

cat /tmp/ansible/postgres-cluster-01.json

...
{
    "time": "2025-02-07T10:05:14.355137",
    "summary": {
        "localhost": {
            "ok": 36,
            "failures": 0,
            "unreachable": 0,
            "changed": 10,
            "skipped": 204,
            "rescued": 0,
            "ignored": 0
        },
        "10.0.1.1": {
            "ok": 125,
            "failures": 0,
            "unreachable": 0,
            "changed": 60,
            "skipped": 446,
            "rescued": 0,
            "ignored": 0
        }
    },
    "status": "success"
}
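
Roughly speaking, the API only needs the final summary entry from that file to mark the operation as finished. A simplified sketch of reading it (not the real code, just to illustrate why the mounted path matters; field names are taken from the example above):

package anslog

import (
	"encoding/json"
	"os"
)

type playbookSummary struct {
	Time    string                `json:"time"`
	Summary map[string]hostCounts `json:"summary"`
	Status  string                `json:"status"` // "success" or "failed"
}

type hostCounts struct {
	OK          int `json:"ok"`
	Failures    int `json:"failures"`
	Unreachable int `json:"unreachable"`
	Changed     int `json:"changed"`
	Skipped     int `json:"skipped"`
	Rescued     int `json:"rescued"`
	Ignored     int `json:"ignored"`
}

// FinalStatus reads the JSON log written by the automation container
// (e.g. /tmp/ansible/postgres-cluster-01.json) and returns the status of
// the last summary entry. Assumes the file is a stream of JSON objects.
func FinalStatus(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		// If the /tmp/ansible volume is not shared with the API container,
		// the log is never found and the status never changes.
		return "", err
	}
	defer f.Close()

	dec := json.NewDecoder(f)
	status := ""
	for {
		var entry playbookSummary
		if err := dec.Decode(&entry); err != nil {
			break // io.EOF, or a value that is not a summary entry
		}
		if entry.Status != "" {
			status = entry.Status
		}
	}
	return status, nil
}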

@sbusso (Author) commented Feb 7, 2025

OK, the path /tmp/ansible was the issue. I had a lot of trouble with it because I deployed to a cluster and the path was not available without creating it manually on each node, so I had used /tmp instead. It is working after updating the mount point.

Could this path use a named volume shared by the containers instead of a hardcoded host path?

Here is the config I use. I can submit it with a Caddy config in a PR over the weekend.

services:
  pg-console-api:
    image: autobase/console_api:latest
    container_name: pg-console-api
    restart: unless-stopped

    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /tmp/ansible:/tmp/ansible
    depends_on:
      - pg-console-db

    environment:
      - PG_CONSOLE_API_URL=${PG_CONSOLE_API_URL}
      - PG_CONSOLE_AUTHORIZATION_TOKEN=${PG_CONSOLE_AUTH_TOKEN}
      - PG_CONSOLE_DB_HOST=pg-console-db
      - PG_CONSOLE_LOGGER_LEVEL=${PG_CONSOLE_LOGGER_LEVEL:-INFO}
    networks:
      - pg-console
      - caddy

  pg-console-ui:
    image: autobase/console_ui:latest
    container_name: pg-console-ui
    restart: unless-stopped
    labels:
      caddy: ${PG_CONSOLE_DOMAIN}
      caddy.@api.path: /api/v1/*
      caddy.0_reverse_proxy: "@api pg-console-api:8080"
      caddy.1_reverse_proxy: "{{upstreams 80}}"

    environment:
      - PG_CONSOLE_API_URL=${PG_CONSOLE_API_URL}
      - PG_CONSOLE_AUTHORIZATION_TOKEN=${PG_CONSOLE_AUTH_TOKEN}
    networks:
      - pg-console
      - caddy
  pg-console-db:
    image: autobase/console_db:latest
    container_name: pg-console-db
    restart: unless-stopped
    volumes:
      - console_postgres:/var/lib/postgresql
    networks:
      - pg-console

volumes:
  console_postgres:


networks:
  pg-console:
  caddy:
    name: caddy
    external: true

@ngurban commented Feb 7, 2025

@ngurban Could you take a look at this?

Hi, @sbusso! Can you turn on the TRACE log level and collect the log one more time?

@vitabaks removed the bug label Feb 7, 2025
@vitabaks changed the title from "[Bug] Status of Cluster creation stay in Deploying state" to "Status of Cluster creation stay in Deploying state" Feb 7, 2025
@vitabaks added the question label Feb 7, 2025
@vitabaks (Owner) commented Feb 7, 2025

Could this path use a named volume shared by the containers instead of a hardcoded host path?

I think so. The automation container’s log needs to be accessible to the API container.

@vitabaks removed the question label Feb 8, 2025