Public visor losing transports / can't establish stcpr transport to public visor #1936
When I check back on the number of transports on that public visor now, there are very few: just over a dozen.
This is certainly not the desired behavior. There is no reason for such a significant reduction in the number of transports to a public visor over time. If a transport fails, remote visors should attempt to autoconnect to that visor again. When I check the service discovery, the public visor is still registered there.
I further tested setting a persistent transport to the public visor, instead of using the public autoconnect or manually attempting transport creation.
My local visor was unable to create a stcpr transport to the public visor.
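For reference, a persistent transport is pinned in the visor's config file. A minimal sketch, assuming the `persistent_transports` field and this entry shape (field names as I recall them; verify against the config your skywire version generates):

```json
{
  "persistent_transports": [
    {
      "pk": "03773464102a7fcb9021962a4a6cf3c6a23739a8720e8b43280b1fd078cfbd6bfb",
      "type": "stcpr"
    }
  ]
}
```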
When I attempt to create a sudph transport from my local visor to the public visor, it reports success, but when I attempt to use that transport in a route to access the proxy server, I'm never able to connect to anything via the proxy server connection. When I restart the visor as a non-public visor and try again, I'm able to access the proxy server again immediately, without issues or errors.
I tested this issue and found that ALL the transports that dropped from the public visor belong to visors that are offline in the uptime tracker (UT). In any case, we should investigate why a transport to the public visor is never re-created after it drops.
They don't make one because the public visor stops being able to accept them, as I've been telling you. If the remote visor went offline and came back, it should still attempt to create a transport to the public visor again. I think it is attempting to do so, but it can't, because the public visor stops being able to accept transports. If you can solve the issue of the visor not accepting transports, then of course we can do away with the shutdown logic here. But ideally the shutdown logic just never engages, because the visor remains able to accept transports.
I'm updating this ticket with comments you sent me on telegram @mrpalide
We should consider resetting this limit frequently, at least for public visors, perhaps every 10 minutes. And we should space out connection attempts to public visors so that they happen only once per minute or so. In fact, I think even this logic may still need further refinement or some rate-limiting mechanism. I propose the following:
Good ideas on how to handle these issues. I believe that should resolve more than just this issue; it should resolve the non-transportability issue. Once that is resolved, we can probably remove the transport-ability checker, or else make it run only behind a flag.
On starting visor 03773464102a7fcb9021962a4a6cf3c6a23739a8720e8b43280b1fd078cfbd6bfb as a public visor, it gets a lot of transports initially (over 200), but this number dwindles over time to fewer than a dozen. I've observed this a few times.
When I try to use a proxy server over a route that goes through said visor, the bandwidth seems low, but this may be unrelated or anomalous.
I'm unsure whether this is an issue with the deployment, as this visor is run inside the deployment.