
Command stuck without any return when device not returning command result #369

Closed
sc108-lee opened this issue Jun 3, 2022 · 5 comments

@sc108-lee (Contributor)

Currently I don't have an environment that reproduces this.
When I am able to reproduce it, I will do more analysis.

  1. Device stop (no return)
  2. nvme show-regs ( skip nvme_get_property )
  3. blocking "nvme_scan" ( it's issuing identify command internally)

It was "nvme_scan" that blocked, not nvme_get_property.
I just remembered: I had blocked nvme_get_property so it would not issue a command when that problem happens.

I tried reading with an old version of nvme-cli that just reads the mapped registers, without nvme_scan & nvme_get_property, and that works fine.

Originally posted by @sc108-lee in linux-nvme/nvme-cli#1541 (comment)

@ikegami-t (Contributor)

ikegami-t commented Jun 23, 2023

  1. Device stop (no return)

Usually a stuck command is caused by the device not responding, and the driver returns an error after a timeout (60 seconds or so). Is the behavior mentioned in this issue the same as, or similar to, that of other commands?

  1. nvme show-regs ( skip nvme_get_property )
  2. blocking "nvme_scan" ( it's issuing identify command internally)

The command is implemented as below: nvme_scan is executed first, then nvme_get_properties is called. But this issue mentions a different order. Why or how is nvme_get_property skipped? By the way, as far as I checked, I did not find any code that issues the identify command in nvme_scan.

static int show_registers(int argc, char **argv, struct command *cmd, struct plugin *plugin)
{
...
	r = nvme_scan(NULL);
...
	bar = mmap_registers(r, dev);
	if (!bar) {
		err = nvme_get_properties(dev_fd(dev), &bar);
...

(Add)
The issue was created for the PR linux-nvme/nvme-cli#1541, so the behavior depends on that PR's changes. Also, the issue was created on June 3, 2022, and the implementation may have changed since then, so is the issue actually resolved now?

@sc108-lee (Contributor, Author)

sc108-lee commented Jun 27, 2023

As far as I remember, 1 & 3 were the main problems in reading registers.
The registers should be readable even if the device does not return identify data.

  1. It's an unusual device state (hard to reproduce); no timeout occurred, it just hung.

  2. On the latest master it looks like mmap_registers runs first and nvme_get_properties only when it fails, so I think there is no need to care about it.

  3. I do not follow all the changes, but when I checked,
    it still issues identify commands while reading registers via show-regs.
    See the nvme traces below.

From nvme_scan:
[root@localhost nvme-cli]# cat /sys/kernel/debug/tracing/trace
nvme-22758 [007] ..... 5530.767779: nvme_setup_cmd: nvme0: qid=0, cmdid=24588, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_admin_identify cns=0, ctrlid=0)
-0 [000] d.h1. 5530.768280: nvme_sq: nvme0: qid=0, head=12, tail=12
-0 [000] ..s1. 5530.768288: nvme_complete_rq: nvme0: qid=0, cmdid=24588, res=0x0, retries=0, flags=0x2, status=0x0
nvme-22758 [007] ..... 5530.768317: nvme_setup_cmd: nvme0: qid=0, cmdid=24589, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_admin_identify cns=3, ctrlid=0)
-0 [000] d.h1. 5530.768380: nvme_sq: nvme0: qid=0, head=13, tail=13
-0 [000] ..s1. 5530.768381: nvme_complete_rq: nvme0: qid=0, cmdid=24589, res=0x0, retries=0, flags=0x2, status=0x2002

Below is from issuing nvme show-regs:

master (a1d59599178a962241380b6cd62eeefd5a316816)
[root@localhost nvme-cli]# cat /sys/kernel/debug/tracing/trace
nvme-21024 [003] ..... 4662.243644: nvme_setup_cmd: nvme0: qid=0, cmdid=23, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_admin_identify cns=0, ctrlid=0)
-0 [007] d.h1. 4662.244135: nvme_sq: nvme0: qid=0, head=7, tail=7
-0 [007] ..s1. 4662.244143: nvme_complete_rq: nvme0: qid=0, cmdid=23, res=0x0, retries=0, flags=0x2, status=0x0
nvme-21024 [003] ..... 4662.244172: nvme_setup_cmd: nvme0: qid=0, cmdid=4116, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_admin_identify cns=3, ctrlid=0)
-0 [007] d.h1. 4662.244235: nvme_sq: nvme0: qid=0, head=8, tail=8
-0 [007] ..s1. 4662.244236: nvme_complete_rq: nvme0: qid=0, cmdid=4116, res=0x0, retries=0, flags=0x2, status=0x2002
nvme-21024 [003] ..... 4662.244345: nvme_setup_cmd: nvme0: qid=0, cmdid=4117, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_admin_identify cns=0, ctrlid=0)
-0 [007] d.h1. 4662.244816: nvme_sq: nvme0: qid=0, head=9, tail=9
-0 [007] ..s1. 4662.244817: nvme_complete_rq: nvme0: qid=0, cmdid=4117, res=0x0, retries=0, flags=0x2, status=0x0
nvme-21024 [003] ..... 4662.244821: nvme_setup_cmd: nvme0: qid=0, cmdid=4118, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_admin_identify cns=3, ctrlid=0)
-0 [007] d.h1. 4662.244881: nvme_sq: nvme0: qid=0, head=10, tail=10
-0 [007] ..s1. 4662.244882: nvme_complete_rq: nvme0: qid=0, cmdid=4118, res=0x0, retries=0, flags=0x2, status=0x2002

The old version (v2.1-rc0) issues these commands:
[root@localhost nvme-cli]# cat /sys/kernel/debug/tracing/trace
nvme-16547 [000] ..... 3915.439690: nvme_setup_cmd: nvme0: qid=0, cmdid=0, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_admin_identify cns=0, ctrlid=0)
-0 [002] d.h1. 3915.440197: nvme_sq: nvme0: qid=0, head=30, tail=30
-0 [002] ..s1. 3915.440204: nvme_complete_rq: nvme0: qid=0, cmdid=0, res=0x0, retries=0, flags=0x2, status=0x0
nvme-16547 [000] ..... 3915.440233: nvme_setup_cmd: nvme0: qid=0, cmdid=1, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_admin_identify cns=3, ctrlid=0)
-0 [002] d.h1. 3915.440307: nvme_sq: nvme0: qid=0, head=31, tail=31
-0 [002] ..s1. 3915.440313: nvme_complete_rq: nvme0: qid=0, cmdid=1, res=0x0, retries=0, flags=0x2, status=0x2002
nvme-16547 [000] ..... 3915.440480: nvme_setup_cmd: nvme0: qid=0, cmdid=2, nsid=4, flags=0x0, meta=0x0, cmd=(nvme_fabrics_type_property_get attrib=1, ofst=0x0)
-0 [002] d.h1. 3915.442450: nvme_sq: nvme0: qid=0, head=0, tail=0
-0 [002] ..s1. 3915.442454: nvme_complete_rq: nvme0: qid=0, cmdid=2, res=0x0, retries=0, flags=0x2, status=0x1
nvme-16547 [000] ..... 3915.442991: nvme_setup_cmd: nvme0: qid=0, cmdid=3, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_admin_identify cns=0, ctrlid=0)
-0 [002] d.h1. 3915.443483: nvme_sq: nvme0: qid=0, head=1, tail=1
-0 [002] ..s1. 3915.443490: nvme_complete_rq: nvme0: qid=0, cmdid=3, res=0x0, retries=0, flags=0x2, status=0x0
nvme-16547 [000] ..... 3915.443524: nvme_setup_cmd: nvme0: qid=0, cmdid=4096, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_admin_identify cns=3, ctrlid=0)
-0 [002] d.h1. 3915.443602: nvme_sq: nvme0: qid=0, head=2, tail=2
-0 [002] ..s1. 3915.443608: nvme_complete_rq: nvme0: qid=0, cmdid=4096, res=0x0, retries=0, flags=0x2, status=0x2002

@ikegami-t (Contributor)

Thanks for sharing the information about the issue behavior.
It seems the repeatedly issued identify commands were caused by an infinite loop or some other unexpected software behavior in the kernel driver, not by nvme-cli's nvme_scan or the show-regs command.

@igaw (Collaborator)

igaw commented Dec 8, 2023

When #754 lands, we should get rid of any commands being posted via the scanning of the topology.

As discussed in linux-nvme/nvme-cli#2048, the next step is to try to cut down on the dependency even more.

@hreinecke (Collaborator)

And the pull request has been merged, so we can close this issue.
