@gmgunter gmgunter commented Dec 31, 2020

This PR adds several interrelated classes for error-handling.

  • ErrorCode provides a mechanism for type-erasure of error code enumeration types, allowing for interoperability of different error codes. It stores an integer value and a pointer to an object of type ErrorCategory.
  • ErrorCategory is an abstract base class for specific error categories. Each derived class provides support for a unique error code enum type to ErrorCode.
  • Error stores an error code along with information about the source location where the error occurred (filename and line number).
  • Expected<T> is a wrapper that may contain an object of type T or an Error. It provides an alternative to traditional error handling mechanisms (such as exceptions and error codes) when used as a return type for operations which may fail. On success, the returned object contains the expected result. In case of failure, it instead contains an object that describes the error encountered.
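For readers who want a concrete picture of how the four classes fit together, here is a minimal host-side C++17 sketch. It is not the code in this PR: the class names and `DomainError::DivisionByZero` come from the description above, but every member signature, the `DomainErrorCategory` class, the `domain_error_category()` accessor, the explicit `file`/`line` constructor arguments, and the `safe_divide` helper are assumptions made for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>

// Abstract base class: one derived class per error code enum type.
class ErrorCategory {
public:
    virtual ~ErrorCategory() = default;
    virtual const char* name() const noexcept = 0;
    virtual std::string description(int value) const = 0;
};

enum class DomainError { DivisionByZero = 1 };

// Hypothetical category that supports the DomainError enum.
class DomainErrorCategory final : public ErrorCategory {
public:
    const char* name() const noexcept override { return "DomainError"; }
    std::string description(int value) const override {
        switch (static_cast<DomainError>(value)) {
            case DomainError::DivisionByZero: return "division by zero";
        }
        return "unknown domain error";
    }
};

// Hypothetical accessor returning the single shared category instance.
inline const ErrorCategory& domain_error_category() {
    static const DomainErrorCategory instance;
    return instance;
}

// Type-erased error code: an integer value plus a pointer to its category.
class ErrorCode {
public:
    ErrorCode(DomainError e)
        : value_(static_cast<int>(e)), category_(&domain_error_category()) {}
    int value() const noexcept { return value_; }
    const ErrorCategory& category() const noexcept { return *category_; }
    std::string description() const { return category_->description(value_); }
private:
    int value_;
    const ErrorCategory* category_;
};

// Error code plus the source location where the error occurred.
// Assumption: the location is passed in explicitly in this sketch.
class Error {
public:
    Error(ErrorCode code, const char* file, int line)
        : code_(code), file_(file), line_(line) {}
    const ErrorCode& error_code() const noexcept { return code_; }
    const char* file() const noexcept { return file_; }
    int line() const noexcept { return line_; }
private:
    ErrorCode code_;
    const char* file_;
    int line_;
};

// Holds either a T (success) or an Error (failure).
template <typename T>
class Expected {
public:
    Expected(T value) : storage_(std::move(value)) {}
    Expected(Error error) : storage_(std::move(error)) {}
    bool has_value() const noexcept { return std::holds_alternative<T>(storage_); }
    // Invalid access throws, mirroring the fallback mechanism described in the PR.
    const T& value() const {
        if (!has_value()) throw std::runtime_error(error().error_code().description());
        return std::get<T>(storage_);
    }
    const Error& error() const {
        assert(!has_value());
        return std::get<Error>(storage_);
    }
private:
    std::variant<T, Error> storage_;
};

// Hypothetical operation that may fail.
inline Expected<double> safe_divide(double a, double b) {
    if (b == 0.0) return Error(DomainError::DivisionByZero, __FILE__, __LINE__);
    return a / b;
}
```

A caller would check `has_value()` before reading `value()`, or inspect `error().file()`, `error().line()`, and `error().error_code()` on failure. Using `std::variant` for storage is a convenience of the sketch and says nothing about how the PR stores the result.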

I'm posting this implementation for posterity, but note that there are some significant compatibility issues with CUDA code that I'd initially overlooked. Check out the (forthcoming) v2 implementation instead.

  1. The fallback mechanism -- whereby invalid access to an Expected<T>::value() causes an exception to be thrown -- doesn't work in device code (obviously, since exceptions aren't available on the device).

    The best you can do is something like

        const auto error = Error(DomainError::DivisionByZero);

        // error.throw_exception(); // XXX not possible in device code!

        // Device code has no `stderr`; device-side `printf` is the only output routine available.
        printf("%s:%d: %s\n", error.file(), error.line(), error.error_code().description());
        __trap(); // Abort kernel execution and raise `cudaErrorLaunchFailure` on the host
  2. ErrorCategory relies on dynamic polymorphism, which has significant limitations in CUDA code. Instances of derived error categories created on the host may not be safely passed to the device and vice versa. As an alternative, ErrorCode could achieve runtime polymorphic behavior using a variant/visitor idiom, which has no such limitations. However, this makes extending ErrorCode much more intrusive -- you need to modify the class definition to plug in additional error code types.
  3. Device code may not obtain a pointer to an error category instance created on the host, even if the object is a compile-time constant.
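The variant/visitor alternative mentioned in item 2 could look roughly like the sketch below. It is host-side C++17 using `std::variant` purely to illustrate the idiom; device code would need a device-compatible variant type instead, and this is not the v2 implementation. The `OutOfRange` enum and the `describe` overloads are hypothetical names introduced for the example.

```cpp
#include <string>
#include <variant>

enum class DomainError { DivisionByZero = 1 };
enum class OutOfRange { IndexTooLarge = 1 };  // hypothetical second error enum

// Plain overloads replace the virtual ErrorCategory::description() call.
inline const char* describe(DomainError e) {
    switch (e) {
        case DomainError::DivisionByZero: return "division by zero";
    }
    return "unknown domain error";
}

inline const char* describe(OutOfRange e) {
    switch (e) {
        case OutOfRange::IndexTooLarge: return "index too large";
    }
    return "unknown out-of-range error";
}

class ErrorCode {
public:
    // The intrusive part: every supported enum type must be listed here,
    // so adding a new error enum means editing this class.
    using Storage = std::variant<DomainError, OutOfRange>;

    ErrorCode(DomainError e) : storage_(e) {}
    ErrorCode(OutOfRange e) : storage_(e) {}

    // Dispatch on the stored alternative; no vtable and no category pointer.
    const char* description() const {
        return std::visit([](auto e) { return describe(e); }, storage_);
    }

    int value() const {
        return std::visit([](auto e) { return static_cast<int>(e); }, storage_);
    }

    // Codes from different enums compare unequal even if their integers match.
    friend bool operator==(const ErrorCode& a, const ErrorCode& b) {
        return a.storage_ == b.storage_;
    }

private:
    Storage storage_;
};
```

Because the object holds its enum by value rather than pointing at a category instance, nothing in it refers to host-only memory, which is the property items 2 and 3 are after. The cost is the closed `Storage` list.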
