Skip to content

Commit e20e81a

Browse files
Niklas CasselDamien Le Moal
Niklas Cassel
authored and
Damien Le Moal
committed
ata: libata-core: do not issue non-internal commands once EH is pending
While the ATA specification states that a device should return command aborted for all commands queued after the device has entered error state, since ATA only keeps the sense data for the latest command (in non-NCQ case), we really don't want to send block layer commands to the device after it has entered error state. (Only ATA EH commands should be sent, to read the sense data etc.) Currently, scsi_queue_rq() will check if scsi_host_in_recovery() (state is SHOST_RECOVERY), and if so, it will _not_ issue a command via: scsi_dispatch_cmd() -> host->hostt->queuecommand() (ata_scsi_queuecmd()) -> __ata_scsi_queuecmd() -> ata_scsi_translate() -> ata_qc_issue() Before commit e494f6a ("[SCSI] improved eh timeout handler"), when receiving a TFES error IRQ, the call chain looked like this: ahci_error_intr() -> ata_port_abort() -> ata_do_link_abort() -> ata_qc_complete() -> ata_qc_schedule_eh() -> blk_abort_request() -> blk_rq_timed_out() -> q->rq_timed_out_fn() (scsi_times_out()) -> scsi_eh_scmd_add() -> scsi_host_set_state(shost, SHOST_RECOVERY) Which meant that as soon as an error IRQ was serviced, SHOST_RECOVERY would be set. However, after commit e494f6a ("[SCSI] improved eh timeout handler"), scsi_times_out() will instead call scsi_abort_command() which will queue delayed work, and the worker function scmd_eh_abort_handler() will call scsi_eh_scmd_add(), which calls scsi_host_set_state(shost, SHOST_RECOVERY). So now, after the TFES error IRQ has been serviced, we need to wait for the SCSI workqueue to run its work before SHOST_RECOVERY gets set. It is worth noting that, even before commit e494f6a ("[SCSI] improved eh timeout handler"), we could receive an error IRQ from the time when scsi_queue_rq() checks scsi_host_in_recovery(), to the time when ata_scsi_queuecmd() is actually called. In order to handle both the delayed setting of SHOST_RECOVERY and the window where we can receive an error IRQ, add a check against ATA_PFLAG_EH_PENDING (which gets set when servicing the error IRQ), inside ata_scsi_queuecmd() itself, while holding the ap->lock. (Since the ap->lock is held while servicing IRQs.) Fixes: e494f6a ("[SCSI] improved eh timeout handler") Signed-off-by: Niklas Cassel <[email protected]> Tested-by: John Garry <[email protected]> Signed-off-by: Damien Le Moal <[email protected]>
1 parent 1ff3635 commit e20e81a

File tree

1 file changed

+10
-0
lines changed

1 file changed

+10
-0
lines changed

drivers/ata/libata-scsi.c

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3964,9 +3964,19 @@ static inline ata_xlat_func_t ata_get_xlat_func(struct ata_device *dev, u8 cmd)
39643964

39653965
int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev)
39663966
{
3967+
struct ata_port *ap = dev->link->ap;
39673968
u8 scsi_op = scmd->cmnd[0];
39683969
ata_xlat_func_t xlat_func;
39693970

3971+
/*
3972+
* scsi_queue_rq() will defer commands if scsi_host_in_recovery().
3973+
* However, this check is done without holding the ap->lock (a libata
3974+
* specific lock), so we can have received an error irq since then,
3975+
* therefore we must check if EH is pending, while holding ap->lock.
3976+
*/
3977+
if (ap->pflags & (ATA_PFLAG_EH_PENDING | ATA_PFLAG_EH_IN_PROGRESS))
3978+
return SCSI_MLQUEUE_DEVICE_BUSY;
3979+
39703980
if (unlikely(!scmd->cmd_len))
39713981
goto bad_cdb_len;
39723982

0 commit comments

Comments
 (0)