retry deadline exceeded #900

Open
mburring opened this issue Mar 11, 2025 · 2 comments

Comments

@mburring

Describe the bug

We have two separate Icinga instances running identical configurations, and on both of them icingadb randomly crashes with a 'retry deadline exceeded' error.

Both of these installations are single master.

To Reproduce

Appears random

Expected behavior

That it doesn't happen

Your Environment

Include as many relevant details about the environment you experienced the problem in

  • Icinga DB version: 1.2.1-1+ubuntu20.04
  • Icinga 2 version: 2.14.5-1+ubuntu20.04
  • Operating System and version: Ubuntu 20.04

Additional context

● icingadb.service - Icinga DB
     Loaded: loaded (/lib/systemd/system/icingadb.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Tue 2025-03-11 01:50:01 AEDT; 10h ago
    Process: 1112676 ExecStart=/usr/sbin/icingadb --config /etc/icingadb/config.yml (code=exited, status=1/FAILURE)
   Main PID: 1112676 (code=exited, status=1/FAILURE)

Mar 11 01:49:01 master1 icingadb[1112676]: heartbeat: Waiting for Icinga heartbeat
Mar 11 01:49:20 master1 icingadb[1112676]: history-sync: Synced 5 notification history items
Mar 11 01:49:20 master1 icingadb[1112676]: history-sync: Synced 36 state history items
Mar 11 01:49:40 master1 icingadb[1112676]: history-sync: Synced 33 state history items
Mar 11 01:49:40 master1 icingadb[1112676]: history-sync: Synced 4 notification history items
Mar 11 01:50:00 master1 icingadb[1112676]: history-sync: Synced 4 notification history items
Mar 11 01:50:00 master1 icingadb[1112676]: history-sync: Synced 32 state history items
Mar 11 01:50:01 master1 icingadb[1112676]: retry deadline exceeded
                                                                           github.com/icinga/icingadb/pkg/icingadb.(*HA).controller
                                                                                   github.com/icinga/icingadb/pkg/icingadb/ha.go:166
                                                                           runtime.goexit
                                                                                   runtime/asm_amd64.s:1700
                                                                           HA aborted
                                                                           github.com/icinga/icingadb/pkg/icingadb.(*HA).abort.func1
                                                                                   github.com/icinga/icingadb/pkg/icingadb/ha.go:134
                                                                           sync.(*Once).doSlow
                                                                                   sync/once.go:76
                                                                           sync.(*Once).Do
                                                                                   sync/once.go:67
                                                                           github.com/icinga/icingadb/pkg/icingadb.(*HA).abort
                                                                                   github.com/icinga/icingadb/pkg/icingadb/ha.go:132
                                                                           github.com/icinga/icingadb/pkg/icingadb.(*HA).controller
                                                                                   github.com/icinga/icingadb/pkg/icingadb/ha.go:166
                                                                           runtime.goexit
                                                                                   runtime/asm_amd64.s:1700
                                                                           HA exited with an error
                                                                           main.run
                                                                                   github.com/icinga/icingadb/cmd/icingadb/main.go:336
                                                                           main.main
                                                                                   github.com/icinga/icingadb/cmd/icingadb/main.go:37
                                                                           runtime.main
                                                                                   runtime/proc.go:272
                                                                           runtime.goexit
                                                                                   runtime/asm_amd64.s:1700
Mar 11 01:50:01 master1 systemd[1]: icingadb.service: Main process exited, code=exited, status=1/FAILURE
Mar 11 01:50:01 master1 systemd[1]: icingadb.service: Failed with result 'exit-code'.
@oxzi
Member

oxzi commented Mar 11, 2025

Thanks for posting this issue.

Could you please provide the complete Icinga DB log from program start to crash with extended systemd journald fields? Please use either --output verbose or --output json as described here, https://icinga.com/docs/icinga-db/latest/doc/03-Configuration/#systemd-journald-fields.
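As a rough illustration of such an export, here is a minimal sketch that only assembles the `journalctl` command line. The helper name `journalctl_command` and the `--until` value are made up for this example; `--unit`, `--output`, `--since`, `--until` and `--no-pager` are standard `journalctl` options. `--no-pager` is included so that long lines are not cut off by the pager.

```python
import shlex


def journalctl_command(unit, since=None, until=None, output="verbose"):
    """Build a journalctl argv list that prints unpaged entries for one unit.

    Hypothetical helper for illustration; it only builds the command and
    does not run it.
    """
    if output not in ("verbose", "json"):
        raise ValueError("output must be 'verbose' or 'json'")
    argv = ["journalctl", "--no-pager", "--unit", unit, "--output", output]
    if since:
        argv += ["--since", since]
    if until:
        argv += ["--until", until]
    return argv


# Example time bound chosen for illustration only; adjust it to the crash in question.
cmd = journalctl_command("icingadb.service", until="2025-03-11 01:51:00")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` on the affected host.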

Furthermore, could you please post a redacted version of your Icinga DB configuration and tell us which SQL database server you are using, including its version?

The logs start with the following line:

Mar 11 01:49:01 master1 icingadb[1112676]: heartbeat: Waiting for Icinga heartbeat

Is your Icinga 2 healthy? And how about your Redis?

@mburring
Author

mburring commented Mar 11, 2025

Tue 2025-03-11 01:49:01.583741 AEDT [s=c37c97a5b0b04c05a06ab5e1eabff2a7;i=103ff6ad;b=f52e39aa9f7542ed859a9e8f612e52c2;m=10197455a31;t=62f>
    _SELINUX_CONTEXT=unconfined
    _BOOT_ID=f52e39aa9f7542ed859a9e8f612e52c2
    _MACHINE_ID=ed883a1210f14cbbae74c1b3fde55dc9
    _HOSTNAME=master1
    _TRANSPORT=journal
    _SYSTEMD_SLICE=system.slice
    _CAP_EFFECTIVE=0
    SYSLOG_IDENTIFIER=icingadb
    _PID=1112676
    _UID=116
    _GID=120
    _COMM=icingadb
    _EXE=/usr/sbin/icingadb
    _CMDLINE=/usr/sbin/icingadb --config /etc/icingadb/config.yml
    _SYSTEMD_CGROUP=/system.slice/icingadb.service
    _SYSTEMD_UNIT=icingadb.service
    _SYSTEMD_INVOCATION_ID=507ea6ed389d4e6d9f92ccce7ee098da
    PRIORITY=4
    MESSAGE=heartbeat: Waiting for Icinga heartbeat
    _SOURCE_REALTIME_TIMESTAMP=1741618141583741
Tue 2025-03-11 01:49:20.584464 AEDT [s=c37c97a5b0b04c05a06ab5e1eabff2a7;i=103ff6ae;b=f52e39aa9f7542ed859a9e8f612e52c2;m=101986747a8;t=62f>
    PRIORITY=6
    _SELINUX_CONTEXT=unconfined
    _BOOT_ID=f52e39aa9f7542ed859a9e8f612e52c2
    _MACHINE_ID=ed883a1210f14cbbae74c1b3fde55dc9
    _HOSTNAME=master1
    _TRANSPORT=journal
    _SYSTEMD_SLICE=system.slice
    _CAP_EFFECTIVE=0
    SYSLOG_IDENTIFIER=icingadb
    _PID=1112676
    _UID=116
    _GID=120
    _COMM=icingadb
    _EXE=/usr/sbin/icingadb
    _CMDLINE=/usr/sbin/icingadb --config /etc/icingadb/config.yml
    _SYSTEMD_CGROUP=/system.slice/icingadb.service
    _SYSTEMD_UNIT=icingadb.service
    _SYSTEMD_INVOCATION_ID=507ea6ed389d4e6d9f92ccce7ee098da
    MESSAGE=history-sync: Synced 5 notification history items
    _SOURCE_REALTIME_TIMESTAMP=1741618160584464
Tue 2025-03-11 01:49:20.584481 AEDT [s=c37c97a5b0b04c05a06ab5e1eabff2a7;i=103ff6af;b=f52e39aa9f7542ed859a9e8f612e52c2;m=10198674b7e;t=62f>
    PRIORITY=6
    _SELINUX_CONTEXT=unconfined
    _BOOT_ID=f52e39aa9f7542ed859a9e8f612e52c2
    _MACHINE_ID=ed883a1210f14cbbae74c1b3fde55dc9
    _HOSTNAME=master1
    _TRANSPORT=journal
    _SYSTEMD_SLICE=system.slice
    _CAP_EFFECTIVE=0
    SYSLOG_IDENTIFIER=icingadb
    _PID=1112676
    _UID=116
    _GID=120
    _COMM=icingadb
    _EXE=/usr/sbin/icingadb
    _CMDLINE=/usr/sbin/icingadb --config /etc/icingadb/config.yml
    _SYSTEMD_CGROUP=/system.slice/icingadb.service
    _SYSTEMD_UNIT=icingadb.service
    _SYSTEMD_INVOCATION_ID=507ea6ed389d4e6d9f92ccce7ee098da
    MESSAGE=history-sync: Synced 36 state history items
    _SOURCE_REALTIME_TIMESTAMP=1741618160584481
Tue 2025-03-11 01:49:40.585094 AEDT [s=c37c97a5b0b04c05a06ab5e1eabff2a7;i=103ff6b0;b=f52e39aa9f7542ed859a9e8f612e52c2;m=10199987701;t=62f>
    PRIORITY=6
    _SELINUX_CONTEXT=unconfined
    _BOOT_ID=f52e39aa9f7542ed859a9e8f612e52c2
    _MACHINE_ID=ed883a1210f14cbbae74c1b3fde55dc9
    _HOSTNAME=master1
    _TRANSPORT=journal
    _SYSTEMD_SLICE=system.slice
    _CAP_EFFECTIVE=0
    SYSLOG_IDENTIFIER=icingadb
    _PID=1112676
    _UID=116
    _GID=120
    _COMM=icingadb
    _EXE=/usr/sbin/icingadb
    _CMDLINE=/usr/sbin/icingadb --config /etc/icingadb/config.yml
    _SYSTEMD_CGROUP=/system.slice/icingadb.service
    _SYSTEMD_UNIT=icingadb.service
    _SYSTEMD_INVOCATION_ID=507ea6ed389d4e6d9f92ccce7ee098da
    MESSAGE=history-sync: Synced 33 state history items
    _SOURCE_REALTIME_TIMESTAMP=1741618180585094
Tue 2025-03-11 01:49:40.585790 AEDT [s=c37c97a5b0b04c05a06ab5e1eabff2a7;i=103ff6b1;b=f52e39aa9f7542ed859a9e8f612e52c2;m=101999879ac;t=62f>
    PRIORITY=6
    _SELINUX_CONTEXT=unconfined
    _BOOT_ID=f52e39aa9f7542ed859a9e8f612e52c2
    _MACHINE_ID=ed883a1210f14cbbae74c1b3fde55dc9
    _HOSTNAME=master1
    _TRANSPORT=journal
    _SYSTEMD_SLICE=system.slice
    _CAP_EFFECTIVE=0
    SYSLOG_IDENTIFIER=icingadb
    _PID=1112676
    _UID=116
    _GID=120
    _COMM=icingadb
    _EXE=/usr/sbin/icingadb
    _CMDLINE=/usr/sbin/icingadb --config /etc/icingadb/config.yml
    _SYSTEMD_CGROUP=/system.slice/icingadb.service
    _SYSTEMD_UNIT=icingadb.service
    _SYSTEMD_INVOCATION_ID=507ea6ed389d4e6d9f92ccce7ee098da
    MESSAGE=history-sync: Synced 4 notification history items
    _SOURCE_REALTIME_TIMESTAMP=1741618180585790
Tue 2025-03-11 01:50:00.584382 AEDT [s=c37c97a5b0b04c05a06ab5e1eabff2a7;i=103ff6b7;b=f52e39aa9f7542ed859a9e8f612e52c2;m=1019ac9a142;t=62f>
    PRIORITY=6
    _SELINUX_CONTEXT=unconfined
    _BOOT_ID=f52e39aa9f7542ed859a9e8f612e52c2
    _MACHINE_ID=ed883a1210f14cbbae74c1b3fde55dc9
    _HOSTNAME=master1
    _TRANSPORT=journal
    _SYSTEMD_SLICE=system.slice
    _CAP_EFFECTIVE=0
    SYSLOG_IDENTIFIER=icingadb
    _PID=1112676
    _UID=116
    _GID=120
    _COMM=icingadb
    _EXE=/usr/sbin/icingadb
    _CMDLINE=/usr/sbin/icingadb --config /etc/icingadb/config.yml
    _SYSTEMD_CGROUP=/system.slice/icingadb.service
    _SYSTEMD_UNIT=icingadb.service
    _SYSTEMD_INVOCATION_ID=507ea6ed389d4e6d9f92ccce7ee098da
    MESSAGE=history-sync: Synced 4 notification history items
    _SOURCE_REALTIME_TIMESTAMP=1741618200584382
Tue 2025-03-11 01:50:00.584424 AEDT [s=c37c97a5b0b04c05a06ab5e1eabff2a7;i=103ff6b8;b=f52e39aa9f7542ed859a9e8f612e52c2;m=1019ac9a2d0;t=62f>
    PRIORITY=6
    _SELINUX_CONTEXT=unconfined
    _BOOT_ID=f52e39aa9f7542ed859a9e8f612e52c2
    _MACHINE_ID=ed883a1210f14cbbae74c1b3fde55dc9
    _HOSTNAME=master1
    _TRANSPORT=journal
    _SYSTEMD_SLICE=system.slice
    _CAP_EFFECTIVE=0
    SYSLOG_IDENTIFIER=icingadb
    _PID=1112676
    _UID=116
    _GID=120
    _COMM=icingadb
    _EXE=/usr/sbin/icingadb
    _CMDLINE=/usr/sbin/icingadb --config /etc/icingadb/config.yml
    _SYSTEMD_CGROUP=/system.slice/icingadb.service
    _SYSTEMD_UNIT=icingadb.service
    _SYSTEMD_INVOCATION_ID=507ea6ed389d4e6d9f92ccce7ee098da
    MESSAGE=history-sync: Synced 32 state history items
    _SOURCE_REALTIME_TIMESTAMP=1741618200584424
Tue 2025-03-11 01:50:01.583004 AEDT [s=c37c97a5b0b04c05a06ab5e1eabff2a7;i=103ff6b9;b=f52e39aa9f7542ed859a9e8f612e52c2;m=1019ad8de32;t=62f>
    _SELINUX_CONTEXT=unconfined
    _BOOT_ID=f52e39aa9f7542ed859a9e8f612e52c2
    _MACHINE_ID=ed883a1210f14cbbae74c1b3fde55dc9
    _HOSTNAME=master1
    _TRANSPORT=journal
    _SYSTEMD_SLICE=system.slice
    _CAP_EFFECTIVE=0
    SYSLOG_IDENTIFIER=icingadb
    _PID=1112676
    _UID=116
    _GID=120
    _COMM=icingadb
    _EXE=/usr/sbin/icingadb
    _CMDLINE=/usr/sbin/icingadb --config /etc/icingadb/config.yml
    _SYSTEMD_CGROUP=/system.slice/icingadb.service
    _SYSTEMD_UNIT=icingadb.service
    _SYSTEMD_INVOCATION_ID=507ea6ed389d4e6d9f92ccce7ee098da
    PRIORITY=2
    MESSAGE=retry deadline exceeded
            github.com/icinga/icingadb/pkg/icingadb.(*HA).controller
                github.com/icinga/icingadb/pkg/icingadb/ha.go:166
            runtime.goexit
                runtime/asm_amd64.s:1700
            HA aborted
            github.com/icinga/icingadb/pkg/icingadb.(*HA).abort.func1
                github.com/icinga/icingadb/pkg/icingadb/ha.go:134
            sync.(*Once).doSlow
                sync/once.go:76
            sync.(*Once).Do
                sync/once.go:67
            github.com/icinga/icingadb/pkg/icingadb.(*HA).abort
                github.com/icinga/icingadb/pkg/icingadb/ha.go:132
            github.com/icinga/icingadb/pkg/icingadb.(*HA).controller
                github.com/icinga/icingadb/pkg/icingadb/ha.go:166
            runtime.goexit
                runtime/asm_amd64.s:1700
            HA exited with an error
            main.run
                github.com/icinga/icingadb/cmd/icingadb/main.go:336
            main.main
                github.com/icinga/icingadb/cmd/icingadb/main.go:37
            runtime.main
                runtime/proc.go:272
            runtime.goexit
                runtime/asm_amd64.s:1700
    _SOURCE_REALTIME_TIMESTAMP=1741618201583004

database:
  host: xxx
  port: 3306
  database: icingadb
  user: icingadb
  password: xxx
  tls: False
  ca: /usr/local/share/ca-certificates/xxx.crt
redis:
  host: localhost
  port: 6379
  password: xxx
  tls: true
  insecure: true 
logging:
  level: info
retention:
  history-days: 10
  sla-days: 10
  options:
    acknowledgement: 90
    comment: 365
    downtime: 90
    flapping: 10
    notification: 10
    state: 10

mariadb-server: 1:10.3.39-0ubuntu0.20.04.2

When this happens, on both instances, icinga and redis are still running with nothing of note in their logs. The workaround has been to restart the icingadb service, but within a few days to a week the error occurs again. The 3 services are all running on the same host, and the host itself is not under heavy load.
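
One detail that can be read off the journal output above: the `_SOURCE_REALTIME_TIMESTAMP` of the "Waiting for Icinga heartbeat" entry and that of the "retry deadline exceeded" entry are almost exactly 60 seconds apart. The sketch below only does that arithmetic on the two values copied from the log. Whether this interval corresponds to a timeout inside Icinga DB is an open question here, not something this thread establishes.

```python
from datetime import datetime, timedelta, timezone

# _SOURCE_REALTIME_TIMESTAMP values (microseconds since the Unix epoch),
# copied verbatim from the journal entries in this comment.
heartbeat_wait_us = 1741618141583741  # "heartbeat: Waiting for Icinga heartbeat"
crash_us = 1741618201583004           # "retry deadline exceeded"

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc(microseconds):
    """Convert a journald realtime timestamp to an aware UTC datetime."""
    return EPOCH + timedelta(microseconds=microseconds)


gap_us = crash_us - heartbeat_wait_us
gap_seconds = gap_us / 1_000_000

print("heartbeat wait:", to_utc(heartbeat_wait_us).isoformat())
print("crash:         ", to_utc(crash_us).isoformat())
print("gap in seconds:", gap_seconds)
```

The UTC times are 11 hours behind the AEDT times shown in the journal.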
