diff --git a/docs/.vuepress/config.js b/docs/.vuepress/config.js
index 326535cfa..243907c32 100644
--- a/docs/.vuepress/config.js
+++ b/docs/.vuepress/config.js
@@ -208,11 +208,22 @@ module.exports = {
            '/how-to/nat-configuration',
            '/how-to/kubo-rpc-tls-auth',
            '/how-to/kubo-garbage-collection',
-            '/how-to/troubleshooting',
+            '/how-to/troubleshooting-kubo',
            '/how-to/webtransport',
            '/install/run-ipfs-inside-docker',
          ]
        },
+        {
+          title: 'Troubleshooting',
+          sidebarDepth: 1,
+          collapsable: true,
+          children: [
+            '/how-to/troubleshooting',
+            '/how-to/troubleshooting-kubo',
+            '/reference/diagnostic-tools',
+            '/how-to/nat-configuration',
+          ]
+        },
        {
          title: 'Manage files',
          sidebarDepth: 1,
@@ -273,7 +284,7 @@ module.exports = {
          collapsable: true,
          children: [
            '/how-to/gateway-best-practices',
-            '/how-to/gateway-troubleshooting',
+            '/how-to/troubleshooting',
          ]
        },
        {
diff --git a/docs/.vuepress/redirects b/docs/.vuepress/redirects
index e31980549..96bb819cb 100644
@@ -45,6 +45,7 @@
 /how-to/run-ipfs-inside-docker /install/run-ipfs-inside-docker
 /how-to/ipfs-updater /install/command-line
 /how-to/websites-on-ipfs/link-a-domain /how-to/websites-on-ipfs/custom-domains
+/how-to/gateway-troubleshooting /how-to/troubleshooting
 /install/command-line-quick-start/ /how-to/command-line-quick-start
 /install/js-ipfs/ https://github.com/ipfs/helia/wiki
 /introduction/ /concepts
diff --git a/docs/concepts/lifecycle.md b/docs/concepts/lifecycle.md
index b091835e9..59a07c5df 100644
--- a/docs/concepts/lifecycle.md
+++ b/docs/concepts/lifecycle.md
@@ -22,7 +22,7 @@ For example, merkleizing a static web application into a UnixFS DAG looks like t
 
 ## 2. Providing
 
-Once the input data has been merkleized and addressed by a CID, the node announces itself as a provider of the CID(s) to the IPFS network, thereby creating a public mapping between the CID and the node. This is typically known as **providing**, other names for this step are **publishing** and **advertising**.
+Once the input data has been merkleized and addressed by a CID, the node announces itself as a provider of the CID(s) to the IPFS network, thereby creating a public mapping between the CID and the node. This is typically known as **providing**; other names for this step are **publishing**, **advertising**, and **reproviding**, the last of which emphasizes the continuous nature of the process in which a node advertises provider records.
 
 IPFS nodes announce CID(s) to either the [DHT](./dht.md) or the [IPNI](./ipni.md) — the two content routing systems supported by [IPFS Mainnet](./glossary.md#mainnet).
 
diff --git a/docs/concepts/public-utilities.md b/docs/concepts/public-utilities.md
index 6203ab304..a347bb37b 100644
--- a/docs/concepts/public-utilities.md
+++ b/docs/concepts/public-utilities.md
@@ -97,6 +97,15 @@ To increase resilience and implementation diversity, as of 2024, the IPFS Founda
 
 `/dnsaddr/va1.bootstrap.libp2p.io/p2p/12D3KooWKnDdG3iXw9eTFijk3EWSunZcFi54Zka4wmtqtt6rPxc8`.
 
+
+## IPFS Check
+
+[IPFS Check](https://check.ipfs.network) is a tool for debugging retrieval by CID. It works by routing CIDs via the DHT and IPNI, and then probing retrieval from the providers for a given CID over both Bitswap and HTTP (depending on the provider's support).
+
+The IPFS Foundation provides a hosted version of IPFS Check as a public good, which is available at [check.ipfs.network](https://check.ipfs.network).
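+
+For a quick, scriptable approximation of the routing half of what IPFS Check does, you can query the public delegated routing endpoint described elsewhere on this page for providers of a CID. This is only a sketch of the first step: it finds providers, but does not probe retrieval from them, and `<CID>` is a placeholder for the CID you want to check:
+
+```shell
+curl "https://delegated-ipfs.dev/routing/v1/providers/<CID>"
+```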
+ +The backend is open source at [`ipfs/ipfs-check`](https://github.com/ipfs/ipfs-check), and can be run self-hosted, ideally on a remote server in order to get an external perspective. + ## Frequently Asked Questions (FAQs) ### How is the ipfs.io gateway different from other gateways? diff --git a/docs/how-to/gateway-troubleshooting.md b/docs/how-to/gateway-troubleshooting.md deleted file mode 100644 index 1dd62b7d3..000000000 --- a/docs/how-to/gateway-troubleshooting.md +++ /dev/null @@ -1,144 +0,0 @@ ---- -title: Troubleshooting -description: Learn how to troubleshoot common issues with IPFS HTTP Gateways ---- - -# Troubleshooting HTTP Gateways - -IPFS HTTP Gateways are an HTTP-based service allowing browsers, tools and software to retrieve IPFS content with HTTP. When using HTTP Gateways, developers may need to troubleshoot issues like a: - -- CID not being retrievable via public IPFS gateways. -- CID being slow to load. - -This page summarizes the different ways to troubleshoot common issues. To learn more about the concepts behind IPFS gateways, including how they work, available providers, types and FAQs, see [IPFS Gateway](../concepts/ipfs-gateway.md). - -## General advice - -In general, slow retrieval or timeouts while fetching a CID from an IPFS gateway is typically related to one of the following: - -- The gateway itself. -- The provider of the CID might be unreachable or down. -- You (or the provider) are not providing your CIDs to the IPFS network via the DHT or the network indexer, so it is not discoverable. -- Network latency between the client and the gateway, or the gateway and the provider. - -::: -When troubleshooting IPFS gateways, ensure that you are familiar with [how gateways work](../concepts/ipfs-gateway.md), as this will make the process quicker and easier. -::: - -To further narrow down the root cause, use one of the following methods: - -- If you want an automated, browser based tool that does the majority of the diagnosing and debugging for you, use [pl-diagnose](#debug-with-pl-diagnose). -- If you are running an IPFS Kubo node, you can [manually debug using kubo and IPFS check](#debug-manually). - -## Debug with pl-diagnose - -The pl-diagnose tool is a browser-based software application that automates a large part of the process described in [Debug manually](#debug-manually). Specifically, this tool can help you answer these questions: - -- Is a given piece of content, identified with a with a certain CID available on the IPFS network, and which peers does the DHT list as hosts for that content? -- Which addresses are listed in the DHT for a given IPFS node? -- Is an IPFS node accessible by other peers? -- Is specific content available from an IPFS node? - -To use the tool, do the following: - -1. Navigate to the [application page](https://pl-diagnose.on.fleek.co/#/diagnose/access-content). -1. In the **Backend URL** field, enter the address of the node you are trying to check. -1. In the menu, select from one of the options depending on your specific needs: - - - **Is my content on the DHT?** - Given a CID on the node you are checking, determine if is listed in the DHT. - - **Is my peer in the DHT?** - Given a public network address of a node, determine if the node is listed in the DHT. - - **Is my node accessible by other peers?** - Given a public network address of a node, determine if that node is reachable by peers. - - **Is my node serving the content?** - Determine if the node is actually serving the content. 
- - -## Debug manually - -This procedure assumes that you have the latest version of kubo installed. To debug manually: - -1. Open up a terminal window. - -1. Using kubo, determine if any peers are advertising the `` you are requesting: - - ```shell - ipfs routing findprovs - ``` - - **If providers are found in DHT**, their Peer IDs are returned. Example output: - - ``` - 12D3KooWChhhfGdB9GJy1GbhghAAKCUR99oCymMEVS4eUcEy67nt - 12D3KooWJkNYFckQGPdBF57kVCLdkqZb1ZmZXAphe9MZkSh16UfP - QmQzqxhK82kAmKvARFZSkUVS6fo9sySaiogAnx5EnZ6ZmC - 12D3KooWSH5uLrYe7XSFpmnQj1NCsoiGeKSRCV7T5xijpX2Po2aT - ``` - - In this case, complete the steps described in [Providers returned](#providers-returned). - - **If no providers were returned**, the cause of your problem might be content publishing. Complete the steps described in [No providers returned](#no-providers-returned). - -### Providers returned - -If providers were found in the DHT, do the following: - -1. In the terminal, retrieve the network addresses of one of the peers returned using its ``: - - ```shell - ipfs id -f '' - ``` - - Upon success, you'll see a list of addresses like: - - ``` - /ip4/145.40.90.155/tcp/4001/p2p/12D3KooWSH5uLrYe7XSFpmnQj1NCsoiGeKSRCV7T5xijpX2Po2aT - /ip4/145.40.90.155/tcp/4002/ws/p2p/12D3KooWSH5uLrYe7XSFpmnQj1NCsoiGeKSRCV7T5xijpX2Po2aT - ip6/2604:1380:45e1:2700::d/tcp/4001/p2p/12D3KooWSH5uLrYe7XSFpmnQj1NCsoiGeKSRCV7T5xijpX2Po2aT - /ip6/2604:1380:45e1:2700::d/tcp/4002/ws/p2p/12D3KooWSH5uLrYe7XSFpmnQj1NCsoiGeKSRCV7T5xijpX2Po2aT - ``` - -1. Note the returned addresses, as you'll use them in step 4. -1. Navigate to [IPFS Check](https://check.ipfs.network/). -1. Enter the following information: - - In the **CID** field, enter the `` you are requesting. - - In the **Multiaddr field**, enter one of the peer addresses noted in step 2. -1. Click **Run Test**. - - If the test is unsuccessful, complete the steps described in [No providers returned](#no-providers-returned). - -### No providers returned - -If no providers are returned, the issue may lie in the content publishing lifecycle, specifically _reprovider runs_, the continuous process in which a node advertises provider records. _Provider records_ are mappings of CIDs to network addresses, and have an expiration time of 48 hours, which accounts for provider churn. Generally speaking, as more files are added to an IPFS node, the longer reprovide runs take. When a reprovide run takes longer than 48 hours (the expiration time for provider records), CIDs will no longer be discoverable. - -::: -You can learn more about the content publishing lifecycle in [How IPFS works](../concepts/how-ipfs-works.md). -::: - -With this in mind, if no providers are returned, do the following: - -1. First, determine how long a reprovide run takes: - - ```shell - ipfs stats provide - ``` - - The output should look something like: - - ```shell - TotalProvides: 7k (7,401) - AvgProvideDuration: 271.271ms - LastReprovideDuration: 13m16.104781s - LastReprovideBatchSize: 1k (1,858) - ``` - -2. Note the value for `LastReprovideDuration`. If it is close to 48 hours, select one of the following options, keeping in mind that each has tradeoffs: - - - **Enable the [Accelerated DHT Client](https://github.com/ipfs/go-ipfs/blob/master/docs/experimental-features.md#accelerated-dht-client) in Kubo**. This configuration improves content publishing times significantly by maintaining more connections to peers and a larger routing table and batching advertising of provider records. 
However, this performance boost comes at the cost of increased resource consumption. - - - **Change the reprovider strategy from `all` to either `pinned` or `roots`.** In both cases, only provider records for explicitly pinned content are advertised. Differences and tradeoffs are noted below: - - The `pinned` strategy will advertise both the root CIDs and child block CIDs (the entire DAG) of explicitly pinned content. - - The `roots` strategy will only advertise the root CIDs of pinned content, reducing the total number of provides in each run. This strategy is the most efficient, but should be done with caution, as it will limit discoverability only to root CIDs. In other words, if you are adding folders of files to an IPFS node, only the CID for the pinned folder will be advertised. All the blocks will still be retrievable with Bitswap once a connection to the node is established. - -3. Manually trigger a reprovide run: - - ```shell - ipfs bitswap reprovide - ``` diff --git a/docs/how-to/images/helia-identify.gif b/docs/how-to/images/helia-identify.gif new file mode 100644 index 000000000..f4a85cb22 Binary files /dev/null and b/docs/how-to/images/helia-identify.gif differ diff --git a/docs/how-to/images/ipfs-check-cid-result-nat.jpg b/docs/how-to/images/ipfs-check-cid-result-nat.jpg new file mode 100644 index 000000000..6318a425a Binary files /dev/null and b/docs/how-to/images/ipfs-check-cid-result-nat.jpg differ diff --git a/docs/how-to/images/ipfs-check-cid-results.jpg b/docs/how-to/images/ipfs-check-cid-results.jpg new file mode 100644 index 000000000..2604b2af3 Binary files /dev/null and b/docs/how-to/images/ipfs-check-cid-results.jpg differ diff --git a/docs/how-to/images/ipfs-check-peer-result.jpg b/docs/how-to/images/ipfs-check-peer-result.jpg new file mode 100644 index 000000000..441835f45 Binary files /dev/null and b/docs/how-to/images/ipfs-check-peer-result.jpg differ diff --git a/docs/how-to/images/ipfs-check-peer-wss-maddr-result.jpg b/docs/how-to/images/ipfs-check-peer-wss-maddr-result.jpg new file mode 100644 index 000000000..7e2108885 Binary files /dev/null and b/docs/how-to/images/ipfs-check-peer-wss-maddr-result.jpg differ diff --git a/docs/how-to/troubleshooting-kubo.md b/docs/how-to/troubleshooting-kubo.md new file mode 100644 index 000000000..5a59397aa --- /dev/null +++ b/docs/how-to/troubleshooting-kubo.md @@ -0,0 +1,238 @@ +--- +title: Troubleshooting Kubo +description: "If you're running into problems with Kubo, use this page to debug your issues and find a solution quickly." +--- + +# Troubleshooting Kubo + +If you're running into problems providing or retrieving content with Kubo, use this page to debug your issues and find a solution quickly. + +:::tip +You can use [IPFS Check](https://check.ipfs.network) to help troubleshoot your Kubo node and get an external perspective on your Kubo node's network reachability. See the [Troubleshooting retrieval](./troubleshooting.md#debug-with-ipfs-check) page for more information. +::: + +## Check that your Kubo daemon is running + +If you're getting unexpected behavior when trying to run common commands such as `ipfs get ` returning `Error: merkledag: not found`, the issue is likely that your daemon isn't running. + +This can be remedied by running `ipfs daemon`, and using a different terminal to interact with the daemon. + +## Kubo is running slowly + +Commands like `ipfs ls` are going to the network to try and find data. 
If, for some reason, that data is not _routable_, then Kubo will just keep looking for who has the data forever. Common reasons for data not being _routable_ are that:
+
+- There are no providers for the CID.
+- There are providers for the CID, but they are not reachable over the network (due to NAT-related issues, firewalls, etc.).
+- The provider for the CID has not yet announced the data in a way that your node can find it.
+
+You can take a look at what's going on with Bitswap using `ipfs bitswap stat` to help you determine if you're stuck looking for data. If the data you are looking for is perpetually in the `wantlist`, then your node may be experiencing one of the common reasons listed above.
+
+Some functions also have flags like `--stream` or `--progress` that can help you see incremental updates. For logging behavior, there is `ipfs log`, where `ipfs log level` can help you inspect subsystems further.
+
+You can pass a timeout flag to basically all Kubo commands if you're concerned about your CLI not responding quickly enough when the data just isn't available on the network.
+
+## File transfers
+
+To start, make sure that Kubo is running on both machines. To verify, run `ipfs id` on each machine and check if the `Addresses` field has anything in it. If it says `null`, then your node is not online, and you will need to run `ipfs daemon`.
+
+Now, let's call the node with the file you want to transfer 'node A' and the node you want to get the file to 'node B'. On `node a`, add the file to Kubo using the `ipfs add` command. This will print out the CID of the content you added. Now, on `node b`, you can fetch the content using `ipfs get <CID>`.
+
+```shell
+# On A
+ipfs add myfile.txt
+> added bafkreihdwdcefgh4dqkjv67uzcmw7ojee6xedzdetojuzjevtenxquvyku myfile.txt
+
+# On B
+ipfs get bafkreihdwdcefgh4dqkjv67uzcmw7ojee6xedzdetojuzjevtenxquvyku
+> Saving file(s) to bafkreihdwdcefgh4dqkjv67uzcmw7ojee6xedzdetojuzjevtenxquvyku
+> 13 B / 13 B [=====================================================] 100.00% 1s
+```
+
+If that worked and your node downloaded the file, then congratulations! You just used Kubo to move files across the internet! But, if that `ipfs get` command is hanging with no output, continue reading.
+
+### Checking for existing connections
+
+The first thing to do is to double-check that both nodes are, in fact, running and online. To do this, run `ipfs id` on each machine. If both nodes show some addresses (like the example below), then your nodes are online.
+
+```json
+{
+  "ID": "12D3KooWMkNK8zgTQvtinDY8nuKmMAPBi3fBmvZj6W5huokJxekm",
+  "PublicKey": "CAESILFGFWHUCrCI/5gZbFejCt7X+ORxckMvKyMY6klvwPwm",
+  "Addresses": [
+    "/ip4/127.0.0.1/tcp/4001/p2p/12D3KooWMkNK8zgTQvtinDY8nuKmMAPBi3fBmvZj6W5huokJxekm",
+    "/ip4/127.0.0.1/udp/4001/quic-v1/p2p/12D3KooWMkNK8zgTQvtinDY8nuKmMAPBi3fBmvZj6W5huokJxekm",
+    "/ip4/127.0.0.1/udp/4001/quic-v1/webtransport/certhash/uEiADD1J9gKOoRM-XvC9EYkbDCe97dwwjVNaheeQ4C1X8Iw/certhash/uEiA6LFi0_EAMHJUX9F9D8BmBiblrH0qrZNAWJqRmpa0rPw/p2p/12D3KooWMkNK8zgTQvtinDY8nuKmMAPBi3fBmvZj6W5huokJxekm",
+    "/ip4/127.0.0.1/udp/4001/webrtc-direct/certhash/uEiAFVMBmTvM0f0DWr_kmRgi_QKrWQfRoI8rel0JxOugIkg/p2p/12D3KooWMkNK8zgTQvtinDY8nuKmMAPBi3fBmvZj6W5huokJxekm",
+    "/ip4/79.193.32.60/tcp/51684/p2p/12D3KooWMkNK8zgTQvtinDY8nuKmMAPBi3fBmvZj6W5huokJxekm",
+    "/ip4/79.193.32.60/udp/51684/quic-v1/p2p/12D3KooWMkNK8zgTQvtinDY8nuKmMAPBi3fBmvZj6W5huokJxekm",
+    "/ip4/79.193.32.60/udp/51684/quic-v1/webtransport/certhash/uEiADD1J9gKOoRM-XvC9EYkbDCe97dwwjVNaheeQ4C1X8Iw/certhash/uEiA6LFi0_EAMHJUX9F9D8BmBiblrH0qrZNAWJqRmpa0rPw/p2p/12D3KooWMkNK8zgTQvtinDY8nuKmMAPBi3fBmvZj6W5huokJxekm",
+    "/ip4/79.193.32.60/udp/51684/webrtc-direct/certhash/uEiAFVMBmTvM0f0DWr_kmRgi_QKrWQfRoI8rel0JxOugIkg/p2p/12D3KooWMkNK8zgTQvtinDY8nuKmMAPBi3fBmvZj6W5huokJxekm"
+  ],
+  "AgentVersion": "kubo/0.35.0/Homebrew",
+  "Protocols": [
+    "/ipfs/bitswap",
+    "/ipfs/bitswap/1.0.0",
+    "/ipfs/bitswap/1.1.0",
+    "/ipfs/bitswap/1.2.0",
+    "/ipfs/id/1.0.0",
+    "/ipfs/id/push/1.0.0",
+    "/ipfs/kad/1.0.0",
+    "/ipfs/lan/kad/1.0.0",
+    "/ipfs/ping/1.0.0",
+    "/libp2p/autonat/1.0.0",
+    "/libp2p/autonat/2/dial-back",
+    "/libp2p/autonat/2/dial-request",
+    "/libp2p/circuit/relay/0.2.0/hop",
+    "/libp2p/circuit/relay/0.2.0/stop",
+    "/libp2p/dcutr",
+    "/x/"
+  ]
+}
+```
+
+Next, check to see if the nodes have a connection to each other. You can do this by running `ipfs swarm peers` on one node and checking for the other node's peer ID in the output. If the two nodes _are_ connected, and the `ipfs get` command is still hanging, then something unexpected is going on, and Kubo maintainers recommend filing an issue about it. If they are not connected, then let's try and debug why. (Note: you can skip to [Manually connecting `node a` to `node b`](#manually-connecting-node-a-to-node-b) if you just want things to work. However, going through the debugging process and reporting what happened to the Kubo maintainers is helpful to us to understand common pitfalls that people run into).
+
+### Checking for providers in the DHT and IPNI
+
+When requesting content with Kubo, nodes search the DHT and the IPNI for 'provider records' to see who has what content. To test this manually, use the `ipfs routing findprovs <CID>` command on `node b` to make sure that `node b` is able to find `node a` as a provider for the content:
+
+```shell
+ipfs routing findprovs <CID>
+```
+
+You should see the peer ID of `node a` printed out.
+
+If this command returns nothing (or returns IDs that are not `node a`), then there is no record of `node a` being a provider for the CID. This can happen if the data was added while `node a` did not have a daemon running.
+
+If this happens, you can run the `ipfs routing provide <CID>` command on `node a` to announce to the network that you have that CID:
+
+```shell
+ipfs routing provide <CID>
+```
+
+Then try running the `ipfs get` command again; `node b` should now be able to find `node a` as a provider for the content.
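+
+Putting those steps together, a quick end-to-end sanity check might look like the following sketch (it reuses the example CID from the file transfer above; substitute your own CID):
+
+```shell
+# On node a: announce the CID to the routing system
+ipfs routing provide bafkreihdwdcefgh4dqkjv67uzcmw7ojee6xedzdetojuzjevtenxquvyku
+
+# On node b: confirm node a now appears as a provider, then retry the fetch
+ipfs routing findprovs bafkreihdwdcefgh4dqkjv67uzcmw7ojee6xedzdetojuzjevtenxquvyku
+ipfs get bafkreihdwdcefgh4dqkjv67uzcmw7ojee6xedzdetojuzjevtenxquvyku
+```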
+
+If `node a`'s peer ID showed up in the initial `findprovs` call or manually providing the CID didn't resolve the problem, then it's likely that `node b` is unable to make a connection to `node a`.
+
+### Checking multiaddrs
+
+In the case where `node b` simply cannot establish a connection to `node a`, despite knowing that it needs to, the likely culprit is a NAT-related issue. When `node b` learns that it needs to connect to `node a`, it checks the DHT for addresses for `node a`, and then starts trying to connect to them. We can check those addresses by running `ipfs routing findpeer <peer ID>` on `node b`. This command should return a list of addresses for `node a`. If it doesn't return any addresses, then you should try running the manual providing command from the previous steps. Example output of addresses might look something like this:
+
+```shell
+/ip4/147.28.186.157/tcp/4001
+/ip6/2604:1380:4642:6600::3/udp/4001/quic-v1/webtransport/certhash/uEiBvH7itEeFeNCMSlB1H0uq8pfhd3_1UgFc9TCbdfMF9pA/certhash/uEiCCxQXfyMXHWRgcooayE0BaQwjKtBmiJ50EznK8zQtBxw
+/ip4/147.28.186.157/udp/4001/webrtc-direct/certhash/uEiASP_-_GKr5tkR9sOeyWhG6GWoWpzhszTPLQBxMhiBrXw
+/ip4/147.28.186.157/udp/4001/quic-v1
+/ip6/2604:1380:4642:6600::3/udp/4001/quic-v1
+/dns6/2604-1380-4642-6600--3.k51qzi5uqu5dj2m5y8ah51jqi9nl0f45fi3m6gtnwyj15k8vjlj541lgqpyq2k.libp2p.direct/tcp/4001/tls/ws
+/ip6/2604:1380:4642:6600::3/udp/4001/webrtc-direct/certhash/uEiASP_-_GKr5tkR9sOeyWhG6GWoWpzhszTPLQBxMhiBrXw
+/dns4/147-28-186-157.k51qzi5uqu5dj2m5y8ah51jqi9nl0f45fi3m6gtnwyj15k8vjlj541lgqpyq2k.libp2p.direct/tcp/4001/tls/ws
+/ip6/2604:1380:4642:6600::3/tcp/4001
+/ip4/147.28.186.157/udp/4001/quic-v1/webtransport/certhash/uEiBvH7itEeFeNCMSlB1H0uq8pfhd3_1UgFc9TCbdfMF9pA/certhash/uEiCCxQXfyMXHWRgcooayE0BaQwjKtBmiJ50EznK8zQtBxw
+```
+
+In this case, we can see IPv4, IPv6, and AutoTLS DNS multiaddrs, with support for TCP, QUIC, WebTransport, WebRTC-direct, and Secure WebSockets (with a TLS certificate).
+
+If one of the addresses in the list matches your public IP, then the network knows a valid external address for your node.
+
+If you see a list of multiaddrs, you can try to connect to `node a` from `node b` directly with the `ipfs swarm connect <multiaddr>` command. If the connection fails, the likely culprit is NAT: if the router of `node a` supports UPnP or NAT-PMP, you can try enabling them and retry the process. Otherwise, you can try manually connecting `node a` to `node b`.
+
+### Checking connectivity with the identify protocol
+
+To check if your node can connect to `node a`, try running the `ipfs id` command with the peer ID of `node a` on `node b`:
+
+```shell
+ipfs id <peer ID>
+```
+
+This command will resolve the PeerID to the multiaddrs of `node a`, connect to the node, and run the identify protocol.
+
+If successful, you should see the peer ID of `node a` in the output, and the `Addresses` field should not be empty.
+
+To see the multiaddr used for the connection, run:
+
+```shell
+ipfs swarm peers -v | grep <peer ID>
+```
+
+## Go debugging
+
+When you see ipfs doing something (using lots of CPU, memory, or otherwise being weird), the first thing you want to do is gather all the relevant profiling information.
+
+There's a command (`ipfs diag profile`) that will do this for you and bundle the results up into a zip file, ready to be attached to a bug report.
+
+If you feel intrepid, you can dump this information and investigate it yourself:
+
+1. 
goroutine dump: + + ```shell + curl localhost:5001/debug/pprof/goroutine\?debug=2 > ipfs.stacks + ``` + +1. 30-second cpu profile: + + ```shell + curl localhost:5001/debug/pprof/profile > ipfs.cpuprof + ``` + +1. heap trace dump: + + ```shell + curl localhost:5001/debug/pprof/heap > ipfs.heap + ``` + +1. memory statistics. In JSON see `memstats` object: + + ```shell + curl localhost:5001/debug/vars > ipfs.vars + ``` + +1. System information: + + ```shell + ipfs diag sys > ipfs.sysinfo + ``` + +### Analyzing the stack dump + +The first thing to look for is hung goroutines - any goroutine that's been stuck for over a minute will note that in the trace. It looks something like: + +```shell +goroutine 2306090 [semacquire, 458 minutes]: +sync.runtime_Semacquire(0xc8222fd3e4) + /home/whyrusleeping/go/src/runtime/sema.go:47 +0x26 +sync.(*Mutex).Lock(0xc8222fd3e0) + /home/whyrusleeping/go/src/sync/mutex.go:83 +0x1c4 +gx/ipfs/QmedFDs1WHcv3bcknfo64dw4mT1112yptW1H65Y2Wc7KTV/yamux.(*Session).Close(0xc8222fd340, 0x0, 0x0) + /home/whyrusleeping/gopkg/src/gx/ipfs/QmedFDs1WHcv3bcknfo64dw4mT1112yptW1H65Y2Wc7KTV/yamux/session.go:205 +0x55 +gx/ipfs/QmWSJzRkCMJFHYUQZxKwPX8WA7XipaPtfiwMPARP51ymfn/go-stream-muxer/yamux.(*conn).Close(0xc8222fd340, 0x0, 0x0) + /home/whyrusleeping/gopkg/src/gx/ipfs/QmWSJzRkCMJFHYUQZxKwPX8WA7XipaPtfiwMPARP51ymfn/go-stream-muxer/yamux/yamux.go:39 +0x2d +gx/ipfs/QmZK81vcgMhpb2t7GNbozk7qzt6Rj4zFqitpvsWT9mduW8/go-peerstream.(*Conn).Close(0xc8257a2000, 0x0, 0x0) + /home/whyrusleeping/gopkg/src/gx/ipfs/QmZK81vcgMhpb2t7GNbozk7qzt6Rj4zFqitpvsWT9mduW8/go-peerstream/conn.go:156 +0x1f2 + created by gx/ipfs/QmZK81vcgMhpb2t7GNbozk7qzt6Rj4zFqitpvsWT9mduW8/go-peerstream.(*Conn).GoClose + /home/whyrusleeping/gopkg/src/gx/ipfs/QmZK81vcgMhpb2t7GNbozk7qzt6Rj4zFqitpvsWT9mduW8/go-peerstream/conn.go:131 +0xab +``` + +At the top, you can see that this goroutine (number 2306090) has been waiting to acquire a semaphore for 458 minutes. That seems bad. Looking at the rest of the trace, we see the exact line it's waiting on is line 47 of runtime/sema.go. That's not particularly helpful, so we move on. Next, we see that call was made by line 205 of yamux/session.go in the `Close` method of `yamux.Session`. This one appears to be the issue. + +Given that information, look for another goroutine that might be holding the semaphore in question in the rest of the stack dump. + +There are a few different reasons that goroutines can be hung: + +- `semacquire` means we're waiting to take a lock or semaphore. +- `select` means that the goroutine is hanging in a select statement, and none of the cases are yielding anything. +- `chan receive` and `chan send` are waiting for a channel to be received from or sent on, respectively. +- `IO wait` generally means that we are waiting on a socket to read or write data, although it *can* mean we are waiting on a very slow filesystem. + +If you see any of those tags _without_ a `, X minutes` suffix, that generally means there isn't a problem -- you just caught that goroutine in the middle of a short wait for something. If the wait time is over a few minutes, that either means that goroutine doesn't do much, or something is pretty wrong. + +If you see a lot of goroutines, consider using [stackparse](https://github.com/whyrusleeping/stackparse) to filter, sort, and summarize them. + +### Analyzing the CPU Profile + +The go team wrote an [excellent article on profiling go programs](http://blog.golang.org/profiling-go-programs). 
If you've already gathered the above information, you can skip down to where they start talking about `go tool pprof`. My go-to method of analyzing these is to run the `web` command, which generates an SVG dotgraph and opens it in your browser. This is the quickest way to easily point out where the hot spots in the code are. + +### Analyzing vars and memory statistics + +The output is JSON formatted and includes badger store statistics, the command line run, and the output from Go's [runtime.ReadMemStats](https://golang.org/pkg/runtime/#ReadMemStats). The [MemStats](https://golang.org/pkg/runtime/#MemStats) has useful information about memory allocation and garbage collection. + diff --git a/docs/how-to/troubleshooting.md b/docs/how-to/troubleshooting.md index 855e9afeb..6927c016c 100644 --- a/docs/how-to/troubleshooting.md +++ b/docs/how-to/troubleshooting.md @@ -1,173 +1,288 @@ --- -title: Troubleshooting -description: "If you're running into problems with Kubo, use this page to debug your issues and find a solution quickly." +title: Troubleshooting IPFS +description: Learn how to troubleshoot common issues with retrieval and providing in IPFS by identifying causes and failure modes with content routing, transfer protocols, and more. --- + -## Check that your Kubo daemon is running +# Troubleshooting IPFS -If you're getting unexpected behavior when trying to run common commands such as `ipfs get ` returning `Error: merkledag: not found`, the issue is likely that your daemon isn't running. This can be remedied by running `ipfs daemon`, and using a different terminal to interact with the daemon. +From a high level, troubleshooting IPFS typically comes down to finding the root cause of a problem in one of the following operations: -## Kubo is running slowly +- [**Retrieval**](#troubleshooting-retrieval) - Retrieving data by CID from other peers in the network. +- [**Providing**](#troubleshooting-providing) - Providing data to other peers in the network. -Commands like `ipfs ls` are going to the network to try and find data. If for some reason, that data is not _findable_ then Kubo will just keep looking for who has the data forever. Common reasons for data not being _findable_ are that: +In both cases, the failure modes can be attributed to the following: -- Nobody online has it. -- There is one node that has the data, but it's behind a NAT. -- The node that has it has not yet advertised the data in a way that your node can find it. +- **Content routing**: providers for a CID cannot be found in the DHT or the IPNI. +- **Network connectivity**: a connection to provider is not possible, either because the provider is not online, or because the provider is not reachable over the network. -You can take a look at what's going on with Bitswap using `ipfs bitswap stat` to help you determine if you're stuck looking for data. If the data you are looking for is perpetually in the `wantlist` then your node may be experiencing one of the common reasons listed above. +This guide outlines techniques to troubleshoot and identify the root cause of common issues with retrieval and providing. -Some functions also have flags like `--stream` or `--progress` that can help you see incremental updates. For logging behavior, there is `ipfs log`, where `ipfs log level` can help you inspect subsystems further. 
+For the purposes of this guide, we will use the following tools: +- [IPFS Check](https://check.ipfs.network) - A browser-based debugging tool that can help you identify the root cause of a problem with retrieval. +- [Kubo](https://github.com/ipfs/kubo) - A command-line debugging tool that can help you identify the root cause of a problem with retrieval. +- [Helia Identify tool](https://ipfs.fyi/identify) - A browser-based tool to run libp2p identify with a given peer id, testing whether the peer is dialable from a browser. +- [Public Delegated Routing Endpoint](../concepts/public-utilities.md#delegated-routing-endpoint) at `https://delegated-ipfs.dev/routing/v1` - which can be used to find providers for a CID. -You can pass a timeout flag to basically all Kubo commands if you're concerned about your CLI not responding quickly enough when the data just isn't available on the network. +## Troubleshooting retrieval -## File transfers +In this section, you will learn to troubleshoot common issues with retrieval. For a more detailed overview of the retrieval process, see [the lifecycle of data in IPFS](../concepts/lifecycle.md#3-retrieving). -To start, make sure that Kubo is running on both machines. To verify, run `ipfs id` on each machine and check if the `Addresses` field has anything in it. If it says `null`, then your node is not online, and you will need to run `ipfs daemon`. -Now, let's call the node with the file you want to transfer node 'A' and the node you want to get the file to node 'B'. On `node a`, add the file to Kubo using the `ipfs add` command. This will print out the multihash of the content you added. Now, on `node b`, you can fetch the content using `ipfs get `. +::: callout +If you are troubleshooting retrieval from a public recursive IPFS gateway, keep in mind that the gateway is just another IPFS node and an additional point of failure that you commonly have no insight into. This can make it harder to troubleshoot, because it's not clear whether the problem is with the gateway or the provider node. -```shell -# On A -ipfs add myfile.txt -> added QmZJ1xT1T9KYkHhgRhbv8D7mYrbemaXwYUkg7CeHdrk1Ye myfile.txt - -# On B -ipfs get QmZJ1xT1T9KYkHhgRhbv8D7mYrbemaXwYUkg7CeHdrk1Ye -> Saving file(s) to QmZJ1xT1T9KYkHhgRhbv8D7mYrbemaXwYUkg7CeHdrk1Ye -> 13 B / 13 B [=====================================================] 100.00% 1s -``` +We therefore recommended using Kubo or IPFS Check to troubleshoot retrieval, which give you direct insight into the retrievability of the data by CID. +::: -If that worked and your node downloaded the file, then congratulations! You just used Kubo to move files across the internet! But, if that `ipfs get` command is hanging, with no output, continue reading. - -### Checking for existing connections - -The first thing to do is to double-check that both nodes are, in fact, running and online. To do this, run `ipfs id` on each machine. If both nodes show some addresses (like the example below), then your nodes are online. 
-
-```json
-{
-  "ID": "12D3KooWRaeAw2oromYUN5rAjYQ6KhqvXiWg8KuxeU9YWv7v3Ewa",
-  "PublicKey": "CAASp[...]P2nfUUIR3AgMBAAE=",
-  "Addresses": [
-    "/ip4/127.0.0.1/tcp/4001/p2p/12D3KooWRaeAw2oromYUN5rAjYQ6KhqvXiWg8KuxeU9YWv7v3Ewa",
-    "/ip4/127.0.0.1/udp/4001/quic-v1/p2p/12D3KooWRaeAw2oromYUN5rAjYQ6KhqvXiWg8KuxeU9YWv7v3Ewa",
-    "/ip4/192.168.2.131/tcp/4001/p2p/12D3KooWRaeAw2oromYUN5rAjYQ6KhqvXiWg8KuxeU9YWv7v3Ewa",
-    "/ip4/192.168.2.131/udp/4001/quic-v1/p2p/12D3KooWRaeAw2oromYUN5rAjYQ6KhqvXiWg8KuxeU9YWv7v3Ewa"
-  ],
-  "AgentVersion": "kubo/0.29.0-dev/",
-  "ProtocolVersion": "ipfs/0.1.0"
-}
-```
+### What causes failure to retrieve data by CID?
-Next, check to see if the nodes have a connection to each other. You can do this by running `ipfs swarm peers` on one node and checking for the other node's peer ID in the output. If the two nodes _are_ connected, and the `ipfs get` command is still hanging, then something unexpected is going on, and Kubo maintainers recommend filing an issue about it. If they are not connected, then let's try and debug why. (Note: you can skip to [Manually connecting `node a` to `node b`](#manually-connecting-node-a-to-node-b) if you just want things to work. However, going through the debugging process and reporting what happened to the Kubo maintainers is helpful to us to understand common pitfalls that people run into).
+When failing to fetch the data for a given CID, there are a few main classes of errors that may be the cause:
-### Checking providers
+- Content routing: providers for the CID cannot be found in the DHT or the IPNI:
+  - Because there are no providers for the CID.
+  - Because the providers aren't announcing the CID to the DHT or IPNI.
+  - Because there are problems with the [DHT](https://discuss.ipfs.tech/t/incident-report-increased-latency-on-the-amino-dht/17338) or the [IPNI](https://blog.ipfs.tech/newsletter-205/#ipni-service-update).
+- Connectivity:
+  - The provider is offline or unreachable over the network due to NAT or firewall issues.
+  - The provider is not dialable from browsers:
+    - Because the provider doesn't have a public IP.
+    - Because the provider doesn't support browser transports like Secure WebSockets, WebTransport, or WebRTC.
-When requesting content with Kubo, nodes search the DHT for 'provider records' to see who has what content. Let's manually do that on `node b` to make sure that `node b` is able to determine that `node a` has the data. Run `ipfs dht findprovs `. We expect to see the peer ID of `node a` printed out. If this command returns nothing (or returns IDs that are not `node a`), then no record of A having the data exists on the network. This can happen if the data is added while `node a` does not have a daemon running. If this happens, you can run `ipfs routing provide ` on `node a` to announce to the network that you have that hash. Then if you restart the `ipfs get` command, `node b` should now be able to tell that `node a` has the content it wants. If `node a`'s peer ID showed up in the initial `findprovs` call or manually providing the hash didn't resolve the problem, then it's likely that `node b` is unable to make a connection to `node a`.
+In the next section, you will learn how to determine the root cause with IPFS Check.
-### Checking addresses
+### Troubleshooting retrieval with IPFS Check
-In the case where `node b` simply cannot form a connection to `node a`, despite knowing that it needs to, the likely culprit is a bad NAT.
When `node b` learns that it needs to connect to `node a`, it checks the DHT for addresses for `node a`, and then starts trying to connect to them. We can check those addresses by running `ipfs routing findpeer ` on `node b`. This command should return a list of addresses for `node a`. If it doesn't return any addresses, then you should try running the manual providing command from the previous steps. Example output of addresses might look something like this:
+[IPFS Check](https://check.ipfs.network) is a web app that helps you troubleshoot retrieval by CID.
-```shell
-/ip4/127.0.0.1/tcp/4001
-/ip4/127.0.0.1/udp/4001/quic-v1
-/ip4/192.168.2.133/tcp/4001
-/ip4/192.168.2.133/udp/4001/quic-v1
-/ip4/88.157.217.196/tcp/63674
-/ip4/88.157.217.196/udp/63674/quic-v1
-```
+It helps you answer the following questions:
+
+1. How many providers for this CID could be found on IPFS Mainnet?
+1. In which routing system was each of those providers found, the Amino DHT or the IPNI?
+1. Is the data for the CID retrievable from the providers that are announcing it?
+1. Is the data for the CID retrievable over Bitswap and/or HTTP?
+1. What multiaddresses and network transports are used to connect to successful providers for a CID?
+1. Was NAT hole punching necessary to retrieve the data?
+
+IPFS Check consists of a frontend that interacts with a backend. The backend is a set of Go libraries that are used to query the DHT and IPNI, and to probe retrieval from the providers for a given CID. The frontend is a web app that allows you to interact with the backend and see the results.
+
+### IPFS Check modes of operation
+
+IPFS Check supports two modes of operation:
+
+1. **Multi-provider check**: you pass a CID, and IPFS Check will search for providers both in the IPNI and the DHT, and return the retrievability results for multiple providers.
+2. **Provider-specific check**: you pass a CID and a provider's multiaddr or Peer ID (with `/p2p/` prepended).
+
+### Provider-specific checks with IPFS Check
+
+1. Navigate to the [IPFS Check](https://check.ipfs.network/) tool.
+2. In the **CID** field, enter the CID you are trying to check.
+3. In the **Multiaddr** field, enter the Peer ID or full multiaddr of the IPFS peer you are trying to check.
+4. Click **Run Test**.
+
+The **Multiaddr** field can be either:
+- Just the Peer ID, with `/p2p/` prepended, e.g. `/p2p/12D3KooWBgwLwbTX5YYgASx8sqv49WBhy9gzLCLFVCP9jshfVdC5`. IPFS Check will resolve the Peer ID to find the full multiaddr.
+- The full multiaddr, e.g. `/ip4/1.1.1.1/tcp/4001/p2p/12D3KooWBgwLwbTX5YYgASx8sqv49WBhy9gzLCLFVCP9jshfVdC5`.
+
+For example, the output will look as follows when doing a Peer ID-specific check for a CID:
+
+![ipfs-check provider-specific check](images/ipfs-check-peer-result.jpg)
+
+Looking at the output, you can tell the following:
+
+- The provider and the CID were routable via the DHT.
+- The provider is online, and the data for the CID is retrievable over Bitswap.
+- The provider was reachable over IPv6 with the QUIC transport, and also supports Secure WebSockets (the multiaddr with `dns4.../...libp2p.direct/tls/...`) and WebTransport.
+- No NAT hole punching was necessary to retrieve the data; you can tell because there is a single connection multiaddr in the output, and it doesn't contain `p2p-circuit`.
+
+You can also test a specific multiaddr and transport combination by entering the full multiaddr in the **Multiaddr** field.
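+
+If you have Kubo installed, one way to get a provider's full multiaddrs to paste into the **Multiaddr** field is to run identify against its Peer ID (the Peer ID below is the illustrative one from the example above; substitute the provider you are testing):
+
+```shell
+ipfs id 12D3KooWBgwLwbTX5YYgASx8sqv49WBhy9gzLCLFVCP9jshfVdC5
+```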
+For example, this is what the IPFS Check output looks like when testing the Secure WebSockets multiaddr:
+
+![ipfs-check multiaddr check](images/ipfs-check-peer-wss-maddr-result.jpg)
+
+Since the Secure WebSockets multiaddr is also supported by browsers, you can test connectivity to the provider directly from a browser (rather than from the IPFS Check backend, as in this example) with the [Helia Identify tool](#debug-browser-connectivity-with-helia-identify).
+
+### Multi-provider checks with IPFS Check
+
+In this mode, IPFS Check will search for providers both in the IPNI and the DHT, and return the retrievability results for multiple providers.
+
+1. Navigate to the [IPFS Check](https://check.ipfs.network/) tool.
+2. In the **CID** field, enter the CID you are trying to check.
+3. Click **Run Test**.
-In this case, we can see a localhost (127.0.0.1) address, a LAN address (the 192.168._._ one), and another address. If this third address matches your external IP, then the network knows a valid external address for your node. At this point, it's safe to assume that your node has a difficult to traverse NAT situation. If this is the case, you can try to enable UPnP or NAT-PMP on the router of `node a` and retry the process. Otherwise, you can try manually connecting `node a` to `node b`.
+
+The output will look as follows:
-### Manually connecting `node a` to `node b`
+
+![ipfs-check multi-provider check](images/ipfs-check-cid-results.jpg)
-On `node b` run `ipfs id` and take one of the _multiaddrs_ that contains its public IP address, and then on `node a` run `ipfs swarm connect `. You can also try using a relayed connection. If that _still_ doesn't work, then you should either join IRC and ask for help there or file an issue on GitHub.
+
+Looking at the output, you can tell the following:
-If this manual step _did_ work, then you likely have an issue with NAT traversal, and IPFS cannot figure out how to make it through. Please report situations like this to us so we can work on fixing them.
+
+- There are 9 working providers for the CID.
+- Some providers were found in the IPNI, some in the DHT.
+- Some providers are providing the data with HTTP (the first result), and others with Bitswap over a libp2p QUIC connection (the second result).
-## Go debugging
+
+### Identifying NAT hole punching
-When you see ipfs doing something (using lots of CPU, memory, or otherwise being weird), the first thing you want to do is gather all the relevant profiling information.
+
+When using IPFS Check, you can identify whether NAT hole punching was necessary to connect to a provider by looking at the connection multiaddrs in the output. If there are two connection multiaddrs and one of them contains `p2p-circuit`, hole punching was used, as in the following example:
-There's a command (`ipfs diag profile`) that will do this for you and bundle the results up into a zip file, ready to be attached to a bug report.
+
+![ipfs-check multi-provider check with NAT hole punching](images/ipfs-check-cid-result-nat.jpg)
-If you feel intrepid, you can dump this information and investigate it yourself:
+
+This is because when a provider peer is behind NAT, it will acquire a circuit relay reservation as part of the [NAT hole punching process (DCUtR)](https://blog.ipfs.tech/2022-01-20-libp2p-hole-punching/).
-1. 
goroutine dump: +If NAT traversal is necessary to connect to a provider, and you are also behind NAT, there's a chance that NAT hole punching will fail for you, because unlike the IPFS Check backend which has a public IP, allowing DCUtR to leverage dialback for direct connection, when two peers are behind NAT, they cannot dial back to each other, and require hole punching, which is not guaranteed to be successful. - ```shell - curl localhost:5001/debug/pprof/goroutine\?debug=2 > ipfs.stacks - ``` +### IPFS Check video guide -1. 30-second cpu profile: +The following video gives an overview of how to use IPFS Check and its different modes of operation. - ```shell - curl localhost:5001/debug/pprof/profile > ipfs.cpuprof - ``` +@[youtube](XeNOQDOrdC0) -1. heap trace dump: +## Debug browser connectivity with Helia Identify - ```shell - curl localhost:5001/debug/pprof/heap > ipfs.heap - ``` +[Helia Identify](https://ipfs.fyi/identify) is a browser-based tool to run libp2p identify with a given peer id, testing whether the peer is dialable from a browser. This is useful to test whether a provider is reachable from a browser, which is a common cause of browser-based retrieval failures. -1. memory statistics. In JSON see `memstats` object: +The following gif shows how to use Helia Identify to test whether a provider is reachable from a browser, by entering a Peer ID in the input field and clicking the **Identify** button. - ```shell - curl localhost:5001/debug/vars > ipfs.vars - ``` +![helia identify](images/helia-identify.gif) -1. System information: +## Troubleshooting with Kubo - ```shell - ipfs diag sys > ipfs.sysinfo - ``` +This procedure assumes that you have the latest version of kubo installed. To debug manually: -### Analyzing the stack dump +1. Open up a terminal window. -The first thing to look for is hung goroutines - any goroutine that's been stuck for over a minute will note that in the trace. It looks something like: +1. Using kubo, determine if any peers are advertising the `` you are requesting: + + ```shell + ipfs routing findprovs + ``` + + **If providers are found**, their Peer IDs are returned. Example output: + + ``` + 12D3KooWSvjCTS6w6f6nyJQ615p4ipiW3L7BTbt9XvpR6Kxi385m + 12D3KooWDCNa4MmDPHr3916gpk2PcQJbJXyKxfByTL6UBmSwBM2H + 12D3KooWDEYGGZAH4v1Hu75nqyF4vnN8UyfgCCwerTD98F1Z8Q1z + 12D3KooWHr9MZJVKwe7tZyD6Z8uRcZFQ7XUqhM2nQvpeQxDyAN4E + 12D3KooWGLyBGRMdNQe5KnkeT2g3QYp7uM71tpn77somfRHaWmmS + ``` + + In this case, complete the steps described in [Providers returned](#providers-returned). + + **If no providers were returned**, the cause of your problem might be content publishing. Complete the steps described in [No providers returned](#no-providers-returned). + +### Providers returned + +If providers were found, do the following: + +1. In the terminal, retrieve the network addresses of one of the peers returned using its ``: + + ```shell + ipfs id -f '' + ``` + + Upon success, you'll see a list of addresses like: + + ``` + /ip4/145.40.90.155/tcp/4001/p2p/12D3KooWSH5uLrYe7XSFpmnQj1NCsoiGeKSRCV7T5xijpX2Po2aT + /ip4/145.40.90.155/tcp/4002/ws/p2p/12D3KooWSH5uLrYe7XSFpmnQj1NCsoiGeKSRCV7T5xijpX2Po2aT + ip6/2604:1380:45e1:2700::d/tcp/4001/p2p/12D3KooWSH5uLrYe7XSFpmnQj1NCsoiGeKSRCV7T5xijpX2Po2aT + /ip6/2604:1380:45e1:2700::d/tcp/4002/ws/p2p/12D3KooWSH5uLrYe7XSFpmnQj1NCsoiGeKSRCV7T5xijpX2Po2aT + ``` + +2. Note the returned addresses, as you'll use them in step 4. +3. Navigate to [IPFS Check](https://check.ipfs.network/). +4. 
Enter the following information: + - In the **CID** field, enter the `` you are requesting. + - In the **Multiaddr field**, enter one of the peer addresses noted in step 2. +5. Click **Run Test**. + +### No providers returned + +If no providers are returned, it could be due to one of the following reasons: + +- All providers for the CID are currently offline. +- There is a problem with the content routing system (either the DHT or IPNI). +- The provider is having trouble announcing the CID to the DHT or IPNI. +- The provider is not online. + +To get an additional confirmation that the CID is not being advertised, you can try the delegated routing endpoint at `https://delegated-ipfs.dev/routing/v1` with the CID. ```shell -goroutine 2306090 [semacquire, 458 minutes]: -sync.runtime_Semacquire(0xc8222fd3e4) - /home/whyrusleeping/go/src/runtime/sema.go:47 +0x26 -sync.(*Mutex).Lock(0xc8222fd3e0) - /home/whyrusleeping/go/src/sync/mutex.go:83 +0x1c4 -gx/ipfs/QmedFDs1WHcv3bcknfo64dw4mT1112yptW1H65Y2Wc7KTV/yamux.(*Session).Close(0xc8222fd340, 0x0, 0x0) - /home/whyrusleeping/gopkg/src/gx/ipfs/QmedFDs1WHcv3bcknfo64dw4mT1112yptW1H65Y2Wc7KTV/yamux/session.go:205 +0x55 -gx/ipfs/QmWSJzRkCMJFHYUQZxKwPX8WA7XipaPtfiwMPARP51ymfn/go-stream-muxer/yamux.(*conn).Close(0xc8222fd340, 0x0, 0x0) - /home/whyrusleeping/gopkg/src/gx/ipfs/QmWSJzRkCMJFHYUQZxKwPX8WA7XipaPtfiwMPARP51ymfn/go-stream-muxer/yamux/yamux.go:39 +0x2d -gx/ipfs/QmZK81vcgMhpb2t7GNbozk7qzt6Rj4zFqitpvsWT9mduW8/go-peerstream.(*Conn).Close(0xc8257a2000, 0x0, 0x0) - /home/whyrusleeping/gopkg/src/gx/ipfs/QmZK81vcgMhpb2t7GNbozk7qzt6Rj4zFqitpvsWT9mduW8/go-peerstream/conn.go:156 +0x1f2 - created by gx/ipfs/QmZK81vcgMhpb2t7GNbozk7qzt6Rj4zFqitpvsWT9mduW8/go-peerstream.(*Conn).GoClose - /home/whyrusleeping/gopkg/src/gx/ipfs/QmZK81vcgMhpb2t7GNbozk7qzt6Rj4zFqitpvsWT9mduW8/go-peerstream/conn.go:131 +0xab +curl "https://delegated-ipfs.dev/routing/v1/providers/" ``` -At the top, you can see that this goroutine (number 2306090) has been waiting to acquire a semaphore for 458 minutes. That seems bad. Looking at the rest of the trace, we see the exact line it's waiting on is line 47 of runtime/sema.go. That's not particularly helpful, so we move on. Next, we see that call was made by line 205 of yamux/session.go in the `Close` method of `yamux.Session`. This one appears to be the issue. +If the CID is not being advertised, you will see an empty array in the response. + +If that happens, check the [IPNI website](https://cid.contact/) to rule out an issue with the IPNI. + +Broadly speaking, the Amino DHT is more resilient to outages, so it's less likely to be the cause of the issue. A more likely cause is that the provider is having trouble announcing the CID to the DHT. + +If you are the provider for the CID, see the next section on [troubleshooting providing](#troubleshooting-providing). + +If you are not the provider for the CID and you cannot find any providers for the CID there's not much more you can do. If you have a copy of the content or a `.car` file, you can provide it to the network by importing it into Kubo with `ipfs dag import .car`. + +## Troubleshooting providing + +In this section, you will learn to troubleshoot common issues with providing. For a more detailed overview of the providing process, see [the lifecycle of data in IPFS](../concepts/lifecycle.md#2-providing). + +If no providers are returned, the issue may lie in the content providing lifecycle, specifically _reprovider runs_, the continuous process in which a node advertises provider records. 
_Provider records_ are mappings of CIDs to network addresses, and have an expiration time of 48 hours, which accounts for provider churn. Generally speaking, as more files are added to an IPFS node, the longer reprovide runs take. When a reprovide run takes longer than 48 hours (the expiration time for provider records), CIDs will no longer be discoverable. -Given that information, look for another goroutine that might be holding the semaphore in question in the rest of the stack dump. +With this in mind, if no providers are returned, do the following: -There are a few different reasons that goroutines can be hung: +1. First, determine how long a reprovide run takes: -- `semacquire` means we're waiting to take a lock or semaphore. -- `select` means that the goroutine is hanging in a select statement, and none of the cases are yielding anything. -- `chan receive` and `chan send` are waiting for a channel to be received from or sent on, respectively. -- `IO wait` generally means that we are waiting on a socket to read or write data, although it *can* mean we are waiting on a very slow filesystem. + ```shell + ipfs stats provide + ``` -If you see any of those tags _without_ a `, X minutes` suffix, that generally means there isn't a problem -- you just caught that goroutine in the middle of a short wait for something. If the wait time is over a few minutes, that either means that goroutine doesn't do much, or something is pretty wrong. + The output should look something like: -If you see a lot of goroutines, consider using [stackparse](https://github.com/whyrusleeping/stackparse) to filter, sort, and summarize them. + ```shell + TotalProvides: 7k (7,401) + AvgProvideDuration: 271.271ms + LastReprovideDuration: 13m16.104781s + LastReprovideBatchSize: 1k (1,858) + ``` -### Analyzing the CPU Profile +2. Note the value for `LastReprovideDuration`. If it is close to 48 hours, select one of the following options, keeping in mind that each has tradeoffs: -The go team wrote an [excellent article on profiling go programs](http://blog.golang.org/profiling-go-programs). If you've already gathered the above information, you can skip down to where they start talking about `go tool pprof`. My go-to method of analyzing these is to run the `web` command, which generates an SVG dotgraph and opens it in your browser. This is the quickest way to easily point out where the hot spots in the code are. + - **Enable the [Accelerated DHT Client](https://github.com/ipfs/go-ipfs/blob/master/docs/experimental-features.md#accelerated-dht-client) in Kubo**. This configuration improves content providing times significantly by maintaining more connections to peers and a larger routing table and batching advertising of provider records. However, this performance boost comes at the cost of increased resource consumption, most notably network connections to other peers, and can lead to degraded network performance in home networks. -### Analyzing vars and memory statistics + - **Change the reprovider strategy from `all` to either `pinned` or `roots`.** In both cases, only provider records for explicitly pinned content are advertised. Differences and tradeoffs are noted below: + - The `pinned` strategy will advertise both the root CIDs and child block CIDs (the entire DAG) of explicitly pinned content. + - The `roots` strategy will only advertise the root CIDs of pinned content, reducing the total number of provides in each run. 
This strategy is the most efficient, but should be done with caution, as it will limit discoverability only to root CIDs. In other words, if you are adding folders of files to an IPFS node, only the CID for the pinned folder will be advertised. All the blocks will still be retrievable with Bitswap once a connection to the node is established. -The output is JSON formatted and includes badger store statistics, the command line run, and the output from Go's [runtime.ReadMemStats](https://golang.org/pkg/runtime/#ReadMemStats). The [MemStats](https://golang.org/pkg/runtime/#MemStats) has useful information about memory allocation and garbage collection. +3. Manually trigger a reprovide run: + ```shell + ipfs routing reprovide + ``` diff --git a/docs/reference/diagnostic-tools.md b/docs/reference/diagnostic-tools.md index aece5f559..9b19622e0 100644 --- a/docs/reference/diagnostic-tools.md +++ b/docs/reference/diagnostic-tools.md @@ -37,6 +37,10 @@ Each error type output by the tool can indicate a solution to your problem: Learn more about CID concepts, including components and versions in the [content addressing concepts guide](../concepts/content-addressing.md). ::: +## Helia Identify + +[Helia Identify](https://ipfs.fyi/identify) is a browser-based tool to run libp2p identify with Peer IDs / multiaddrs, testing whether an IPFS peer is Web friendly, i.e. whether it can be connected to from a browser. This is useful to test whether content can be directly retrieved from a provider node. + ## IPFS Gateway Checker :::warning