Skip to content

server: sanitize the handling of APIv1 errors #80907

@knz

Description

@knz

Today we use an internal helper function serverError or call status.Errorf manually in the API handlers that serve the DB console.

This is problematic: this is extra complexity that is not idiomatic go, and people "forget" about calling these conversion functions.

We want something simpler.

This logic is made extra complicated by the fact the "APIv1" handlers
are defined as protobuf services, available to HTTP clients
using grpc-gateway. So we cannot construct HTTP errors directly:
we have to construct gRPC "*status.Error" objects, which will
be converted internally by grpc-gateway into HTTP status responses.

In an ideal world, we could make the situation simpler by letting
the API handlers return errors of any type, and then install
a gRPC unary interceptor server-side to convert these "any errors"
into a suitable grpc status.Status.

Unfortunately, we are not in such an ideal world: the gRPC service
that serves the APIv1 endpoints also serves the KV endpoints and
other CockroachDB internal APIs. These internal APIs return
protobuf-encodable error payloads that should be preserved as-is
for the benefit of internal API clients. So we cannot use an unary
interceptor to do this error conversion indiscriminately.

We are thus left with the following uncomfortable choice:

  • either pre-massage the errors manually into a status.Error
    inside each API handler (such as done with the current functions
    defined below).

    The drawback here is that the person implementing an API
    handler can "forget" to call the error conversion API, with
    complicated consequences:

    • the payload of the error object flows unredacted to the HTTP
      client, resulting in a potential data leak. (This is serious)
    • the grpc status code becomes code.Unknown, which becomes
      HTTP status 500 at the grpc-gateway boundary. This is sad
      because oftentimes a different HTTP status would be better
      suited.
  • or, we define a unary interceptor that "does the right thing"
    by performing redaction of the error object, and compute
    a grpc status code suitable given the actual error object.

    However, this unary interceptor must then maintain a list
    of which API endpoints should be given this treatment:
    all the "internal APIs" should be excluded.

  • or, we could pull out the APIv1 tree out of the gRPC system
    and stop defining it as a protobuf service, and bypass
    the complexity of grpc-gateway internally.

    This is what we wanted to achieve with the APIv2 tree but
    unfortunately the APIv1 tree is still lagging behind.

Related: #56208

Jira issue: CRDB-15456

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-error-handlingError messages, error propagations/annotationsA-server-architectureRelates to the internal APIs and src org for server codeC-bugCode not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.T-observability

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions