Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Valid inputs for nodes parameter. #146

Closed
anntzer opened this issue Jan 29, 2019 · 12 comments
Closed

Valid inputs for nodes parameter. #146

anntzer opened this issue Jan 29, 2019 · 12 comments
Assignees

Comments

@anntzer
Copy link
Contributor

anntzer commented Jan 29, 2019

Very minor usability request: It would be nice if the Curve constructor directly accepted nested lists as input for the nodes array, calling np.asarray on them as needed.

In [4]: bezier.Curve.from_nodes([[1, 2, 1], [3, 4, 5]])
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-016ca81d741a> in <module>
----> 1 bezier.Curve.from_nodes([[1, 2, 1], [3, 4, 5]])

~/.local/lib/python3.7/site-packages/bezier/curve.py in from_nodes(cls, nodes, _copy)
    114             Curve: The constructed curve.
    115         """
--> 116         _, num_nodes = nodes.shape
    117         degree = cls._get_degree(num_nodes)
    118         return cls(nodes, degree, _copy=_copy)

AttributeError: 'list' object has no attribute 'shape'

Also, note that the example in the docs for the Curve constructor

>>> import bezier
>>> nodes = np.asfortranarray([
...     [0.0, 0.625, 1.0],
...     [0.0, 0.5  , 0.5],
... ])
>>> curve = bezier.Curve(nodes, degree=2)
>>> curve
<Curve (degree=2, dimension=2)>

is a bit confusing, as it suggests that the input must be in Fortran order, even though this is not the case (AFAICT).

@dhermes
Copy link
Owner

dhermes commented Jan 30, 2019

Thanks for stopping by @anntzer! I love hearing from users.

I have thought about adding a "list of lists" as an acceptable input and I think it's a reasonable request. I'm somewhat inclined to "guard" the __init__ constructor for one specific type and instead offer a factory constructor, e.g.

curve = bezier.Curve.from_free([[0.0, 1.5, 3.5], [2.0, 1.5, 0.0]])

Then from_free could use isinstance(nodes, collections.abc.Sequence) or just rely on the numpy.ndarray constructor to validate the input when not isintance(nodes, np.ndarray).

(My inclination to "guard": I worry a bit that polymorphic inputs make code harder to reason about and makes APIs harder to use, but maybe I'm wrong in that worry.)


Nodes are expected to be in Fortran order since all of the low-level algorithms access the raw array underlying the ndarray. The base class constructor for Curve and Surface handles converting the input to Fortran-ordered if the user doesn't pass it in that way. (I should probably raise an exception if _copy=False but nodes..flags.f_contiguous is not True.)

@dhermes dhermes self-assigned this Jan 30, 2019
@anntzer
Copy link
Contributor Author

anntzer commented Jan 30, 2019

My personal view on polymorphic constructors is that if you need to put in conditional branches that interpret the inputs in a wildly different manner, then it's better to split in multiple constructors, but here you don't need to, you can just pass everything through np.asarray. So (for some unrelated class) I'd have Foo.from_ints(1, 2) and Foo.from_array([1, 2]) or Foo.from_array(np.array([1, 2])).
Likewise I don't really like parameters that are clearly redundant with others so I'd have made the degree parameter to the "main" constructor optional as it clearly just serves to sanity-check the nodes input.
OTOH all that is purely esthetic taste so it's up to you.

Given that the constructor will always handle the fortranordering, I would not bother with showing that info in the docs (I think f-order arrays are quite rare in "normal" numpy usage). Likewise, unless I am mistaken, the copy will never actually copy arrays of size more than 15 (for degree-4 surfaces), which is tiny enough to not worry about the cost of the copy? So I'd just not expose that argument publically. (Otherwise, as a user, I start to wonder: so is it OK for me to pass copy=False and mutate the array after the fact? Are there cache invalidation problems or something else? yada yada)

Again, not a huge deal either way. The quality of the library (especially the documentation of the C-API) is top notch and I am just nitpicking.

@dhermes
Copy link
Owner

dhermes commented Jan 31, 2019

TL;DR: Indeed, punting to np.ndarray will cover this.


Likewise I don't really like parameters that are clearly redundant with others so I'd have made the degree parameter to the "main" constructor optional as it clearly just serves to sanity-check the nodes input.

I could go either way on that. When I did it, I was on a "don't do any computation in the real constructor" kick. For Curve, the computation is d = num_nodes - 1 but for Surface we need to actually solve a quadratic (2d + 3)^2 = 8 num_nodes + 1, which is a very expensive™ operation.


I think f-order arrays are quite rare in "normal" numpy usage

Yes and no. The default ordering provided by the library is C, but most computational kernels are still written in Fortran or were ported (e.g. via f2c) from Fortran.


Likewise, unless I am mistaken, the copy will never actually copy arrays of size more than 15 (for degree-4 surfaces)

Well most useful things (e.g. evaluate) support arbitrarily high degree, the following methods may raise UnsupportedDegree:

  • Curve.reduce_ (for degree other than 1, 2, 3, 4)
  • Curve.intersect() or Surface.intersect() with the algebraic intersection strategy (if one of the two objects has degree other than 0, 1, 2, 3)
  • Surface.area() or CurvedPolygon.area() (for degree other than 1, 2, 3, 4)
  • Surface.is_valid (for degree other than 1, 2, 3)

It's worth noting that _copy is the argument, not copy. I.e. user beware, this is not a public constructor argument.

The copy serves a few purposes:

  • Makes a bezier-owned version of the data so that the user doesn't accidentally modify the nodes after passing them to the constructor
  • Makes sure the nodes are contiguous in memory (for now the Fortran code assumes arrays are contiguous, though I suppose I could use strided arrays someday)
  • Make sure the nodes are in Fortran order

@anntzer
Copy link
Contributor Author

anntzer commented Jan 31, 2019

Yes and no. The default ordering provided by the library is C, but most computational kernels are still written in Fortran or were ported (e.g. via f2c) from Fortran.

Let's say "if someone just wants to use some library in drive-by mode and generates their inputs using standard numpy constructs, these inputs will likely (IMO) be in C-order".

Well most useful things (e.g. evaluate) support arbitrarily high degree, the following methods may raise UnsupportedDegree.

Ah, I see. The docs are a bit unclear there; it may be nice to explicitly document the supported degrees for each method (I don't think this info is available anywhere?).

It's worth noting that _copy is the argument, not copy. I.e. user beware, this is not a public constructor argument.

I guess the question is whether the end user passing in _copy=False is supported. If not, but you need to use that for internal performance reasons, perhaps just move that to some private constructor like Curve._fast_no_copy?

Again, none of this is really that important :)

@anntzer
Copy link
Contributor Author

anntzer commented Jan 31, 2019

Oh and by the way, now that we're talking about design points:
I think it is much more common for "list-of-points" inputs to have shape (npts, ndim) (e.g. https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.KDTree.html, https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.Delaunay.html) rather than (ndim, npts) as this library's nodes currently do.

Perhaps a bit too late to change that though...

@anntzer
Copy link
Contributor Author

anntzer commented Feb 4, 2019

And while we're at it: it is also slightly jarring that the argument order of evaluate_hodograph and get_curvature (for example) are not consistent: the former takes s before the nodes, whereas the latter takes the nodes before s.

@dhermes
Copy link
Owner

dhermes commented Feb 4, 2019

Perhaps a bit too late to change that though...

Not too late, I'm open to it. However, that's a C-vs.-Fortran ordering issue. I originally had them that way, but the algorithms more often grab an entire point (x, y or x, y, z values) all at once then they grab all x-values, hence making the points contiguous in memory in Fortran order was a performance optimization.

@anntzer
Copy link
Contributor Author

anntzer commented Feb 4, 2019

Sorry if I'm being dense, but wouldn't it work if the API was expecting (n, 2) arrays in C-order (from the point of view of the user)? It "just happens" that these are also (2, n) arrays in F-order, so everything goes well?

dhermes added a commit that referenced this issue Sep 1, 2019
Towards #146. This ensures arrays are Fortran-ordered and have floating
point values. However, it doesn't address the request (in #146) for
(n, 2) in C order <--> (2, n) in F order.
@dhermes
Copy link
Owner

dhermes commented Sep 1, 2019

@anntzer I partially satisfied this request in 68f7dc7. Sorry for dragging my feet for so long.

Some things that are pending from our discussion:

  • I have yet update any documentation to show a "regular" C-ordered np.array() or a list of lists or an array of integers as possible inputs
  • I haven't addressed the (n, 2) in C-order <--> (2, n) in F-order equivalence

@anntzer
Copy link
Contributor Author

anntzer commented Sep 1, 2019

No worries :)
Let me know if there's anything I can help with.

dhermes added a commit that referenced this issue Oct 23, 2019
Towards #146.

Before, these factory constructor methods **assumed** `nodes` had
a shape, which was not required by the documentation.
@dhermes
Copy link
Owner

dhermes commented Oct 24, 2019

I actually fixed the from_nodes() issue in f5c7869 (I thought I had gotten it in 68f7dc7 but referred to .shape before making sure the "sequence of sequences" was converted to an ndarray first.

@dhermes
Copy link
Owner

dhermes commented Oct 24, 2019

The core part of this issue (i.e. "accept sequence of sequence instead of an explicit NumPy array") is complete. I've moved the two remaining unhandled suggestions into #153 and #154

@dhermes dhermes closed this as completed Oct 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants