Skip to content

Support array.array('f', ...) -> Vec<f32> #19

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ehiggs opened this issue Jul 15, 2015 · 13 comments
Closed

Support array.array('f', ...) -> Vec<f32> #19

ehiggs opened this issue Jul 15, 2015 · 13 comments

Comments

@ehiggs
Copy link
Contributor

ehiggs commented Jul 15, 2015

rust-cpython can convert Python lists to Rust vectors, but it can't yet convert arrays created by the array module. It also doesn't support numpy arrays, which is probably a bigger project, but also valuable goal.

For example, I have an example where I am writing a dot product function:

mod dot {
pub fn dot(a : &Vec<f32>, b : &Vec<f32>) -> f32 {                               
    let mut out = 0.;                                                           
    for (x, y) in a.iter().zip(b.iter()) {                                      
        out += x*y;                                                             
    }                                                                           
    out                                                                         
}                                                                               

} // mod 
fn dot<'p>(py: Python<'p>, args: &PyTuple<'p>) -> PyResult<'p, f32> {           
    if args.len() < 2 {
// TODO: write a macro for making PyErr so it's less painful.                                                         
        let msg = "dot takes two arrays of float";                              
        let pyerr = PyErr::new_lazy_init(py.get_type::<exc::ValueError>(), Some(msg.to_py_object(py).into_object()));
        return Err(pyerr);                                                      
    }                                                                           
    let arg0 = match args.get_item(0).extract::<Vec<f32>>() {                   
        Ok(x) => x,                                                             
        Err(_) => {                                                             
            let msg = "Could not convert argument 0 to array of float";         
            let pyerr = PyErr::new_lazy_init(py.get_type::<exc::ValueError>(), Some(msg.to_py_object(py).into_object()));
            return Err(pyerr);                                                  
        }                                                                       
    };                                                                          

    let arg1 = match args.get_item(1).extract::<Vec<f32>>() {                   
        Ok(x) => x,                                                             
        Err(_) => {                                                             
            let msg = "Could not convert argument 0 to array of float";         
            let pyerr = PyErr::new_lazy_init(py.get_type::<exc::ValueError>(), Some(msg.to_py_object(py).into_object()));
            return Err(pyerr);                                                  
        }                                                                       
    };                                                                          

    if arg0.len() != arg1.len() {                                               
            let msg = "arg0 and arg1 must be the same length";                  
            let pyerr = PyErr::new_lazy_init(py.get_type::<exc::ValueError>(), Some(msg.to_py_object(py).into_object()));
            return Err(pyerr);                                                  
    }                                                                           

    Ok(dot::dot(&arg0, &arg1))                                                  
}  

py_module_initializer!(librust_python_example, |_py, m| {                       
    try!(m.add("__doc__", "Dot product"));                                                     
    try!(m.add("dot", py_fn!(dot)));                                            
    Ok(())                                                                      
});                            

It works pretty well for a toy example:

In [1]: def dot(a, b):
    return sum(map(lambda x: x[0]*x[1], zip(a,b)))
   ...: 

In [2]: import numpy as np

In [3]: import librust_python_example as r

In [4]: timeit dot([0,2,3], [1,2,3])
100000 loops, best of 3: 1.86 µs per loop

In [5]: timeit np.dot([0,2,3], [1,2,3])
100000 loops, best of 3: 3.89 µs per loop

In [6]: timeit r.dot([0,2,3], [1,2,3])
1000000 loops, best of 3: 920 ns per loop

However, it won't convert numpy arrays or array.arrays:

In [8]: r.dot(np.arange(3), np.arange(3))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-0e9a2e3217bb> in <module>()
----> 1 r.dot(np.arange(3), np.arange(3))

ValueError: Could not convert argument 0 to array of float

In [9]: import array
In [10]: r.dot(array.array('f', [0,2,3]), array.array('f', [1,2,3]))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-3eab6b6f9184> in <module>()
----> 1 r.dot(array.array('f', [0,2,3]), array.array('f', [1,2,3]))

ValueError: Could not convert argument 0 to array of float
@ehiggs ehiggs changed the title Support array.array('f', ...) -> Vec<f32> Support array.array('f', ...) -> Vec<f32> Jul 15, 2015
@dgrunwald
Copy link
Owner

impl FromPyObject for Vec is currently restricted to lists to mirror the ToPyObject impl.
I agree it's probably better to be flexible and accept arbitrary sequences. Of course, the abstract sequence protocol is going to be relatively slow (has to allocate a python object for each sequence element).

The buffer protocol could be used to optimize extracting a Vec<f32> to a simple memcpy, but that's not possible in the generic FromPyObject implementation. So this optimization would have to be a separate function call, at least until rust gets impl specialization.

@ehiggs
Copy link
Contributor Author

ehiggs commented Jul 24, 2015

Maybe I misunderstood the approach. Are Sequence, Mapping, Iterator, and Buffer protocols supported? If these are supported then I don't think we need to copy data into a Vec and the bug title is completely misleading. Sorry for that.

So in this case, I think it makes sense to make the Iterator protocol map to something like std::iter::IntoIterator. Mapping can support std::iter::IntoIterator<(K, V)> as well as std::ops::Index. And I'm not sure which traits would be appropriate for Sequence or Buffer. They probably map to traits that make them look like Vectors.

@dgrunwald
Copy link
Owner

Note: the Sequence, Mapping, Iterator and Buffer protocols are not yet supported in the safe rust-cpython API. The unsafe APIs are available in pythonXY-sys, so we can still use them internally in rust-cpython. I haven't yet thought about how to expose these APIs to safe Rust. Using ops::Index not possible because it's required to return borrowed references. We'll probably end up implementing IntoIterator, though there's some trickery involved with error handling.

On the "extract to Vec<T>" topic: we already support extracting a python object into Vec<T> for arbitrary T. (currently this only works for python lists, but that can be easily extended for any type supporting the Sequence protocol)
Supporting arbitrary T means we need a PyObject for each item in the sequence so that impl ExtractPyObject for Vec<T> can call impl ExtractPyObject for T. That's what I meant with the abstract sequence protocol being slow -- it'll have to allocate a PyObject for every float in the array.

The buffer protocol allows accessing the data without allocations. But that's not possible for arbitrary T, but only for primitive types like i32, f32 etc. But without impl specialization, it's not possible to implement ExtractPyObject for some primitive T when there's already a general implementation.

My recent change from FromPyObject to ExtractPyObject is intended to allow extracting as &[T] without necessarily having to copy the memory into a Vec<T>. (the change allows connecting the &PyObject's lifetime to the extracted value's lifetime)
But the same issue with impl specialization applies here.

Of course, an easy workaround to the impl specialization problem is to have the ExtractPyObject trait always use the slow sequence protocol; and require the user to call a different (less general) function to use the buffer protocol.

@ehiggs
Copy link
Contributor Author

ehiggs commented Jul 24, 2015

Thanks for the explanations.

So, if I understand correctly, ExtractPyObject can only work in converting Python objects into the appropriate Rust objects without impl specializations if there are multiple versions of the function for defining which conversion to use. e.g. ExtractPyObject, ExtractPyBuffer, ExtractPyMapping.

That doesn't seem too bad. It probably makes sense to have these traits anyway and ExtractPyObject could refer to them if/when impl specialization arrives. I would add ExtractPySequence to the list so it maps onto the actual Python APIs.

Perhaps another option is to allow for a template argument defining the template extractor.

    let arg0 = match args.get_item(0).extract::<Vec<f32>, PyObjectExtractor>() {                   
    //...                                                                 
    };      
    let arg1 = match args.get_item(0).extract::<Vec<f32>, PySequenceExtractor() {
    // ...
    };

This allows for functions like extract which can default to PyObjectExtractor and extract_by or extract_with that takes an explicit extractor.

I smashed together some code to do this, but it hits a compiler bug. It's probably not terribly useful, but it might illustrate the idea I'm trying to get across:

diff --git a/src/conversion.rs b/src/conversion.rs
index 771fb3e..0d256d5 100644
--- a/src/conversion.rs
+++ b/src/conversion.rs
@@ -89,12 +89,32 @@ pub trait ToPyObject<'p> {
 ///
 /// In cases where the result does not depend on the `'prepared` lifetime,
 /// the inherent method `PyObject::extract()` can be used.
+
+pub trait PyExtractor<'python, 'source, 'prepared> {
+    type Prepared;
+    fn extract<T>(prepared: &'prepared Self::Prepared) -> PyResult<'python, T>
+        where T: PythonObjectWithCheckedDowncast<'python> ;
+}
+
+pub struct PyObjectExtractor;
+
+impl <'python, 'source, 'prepared> PyExtractor<'python, 'source, 'prepared> 
+    for PyObjectExtractor {
+    type Prepared = &'source PyObject<'python>;
+
+    #[inline]
+    fn extract<T>(&&ref obj: &'prepared Self::Prepared) -> PyResult<'python, T> 
+        where T: PythonObjectWithCheckedDowncast<'python> {
+        Ok(try!(obj.clone().cast_into()))
+    }
+}
+
 pub trait ExtractPyObject<'python, 'source, 'prepared> {
     type Prepared;

     fn prepare_extract(obj: &'source PyObject<'python>) -> PyResult<'python, Self::Prepared>;

-    fn extract(prepared: &'prepared Self::Prepared) -> PyResult<'python, Self>;
+    fn extract<E: PyExtractor<'python, 'source, 'prepared>>(prepared: &'prepared Self::Prepared) -> PyResult<'python, Self>;
 }

 impl <'python, 'source, 'prepared, T> ExtractPyObject<'python, 'source, 'prepared>
@@ -108,8 +128,8 @@ impl <'python, 'source, 'prepared, T> ExtractPyObject<'python, 'source, 'prepare
     }

     #[inline]
-    fn extract(&&ref obj: &'prepared Self::Prepared) -> PyResult<'python, T> {
-        Ok(try!(obj.clone().cast_into()))
+    fn extract<E: PyExtractor<'python, 'source, 'prepared>>(&&ref obj: &'prepared Self::Prepared) -> PyResult<'python, T> {
+        E::extract::<Self::Prepared>(&obj)
     }
 }

It also would need the obvious changes in src/objects/{num|list|string|mod}.rs.

So that's an idea on how to approach the extraction without impl specialization.

@ehiggs
Copy link
Contributor Author

ehiggs commented Sep 6, 2016

@arielb1 claims that rust-lang/rust#22066 seems to be working in 1.10 so maybe we can return to this.

@dgrunwald
Copy link
Owner

The new logic is:
When building with --feature nightly, extracting a Vec<T> first tries the buffer protocol to see if there is a single-dimensional buffer of type T.
If that doesn't work (or if using stable Rust), the sequence protocol is used.

@Eh2406
Copy link

Eh2406 commented Jan 20, 2017

So does this also give us raw access to numpy arrays!? I think they have buffer protocol.

Also if I am reading the sequence protocol corectly; we can use PySequence_Size to get capacity for vec::with_capacity yes?

@dgrunwald
Copy link
Owner

dgrunwald commented Jan 20, 2017

Yes, this should give access to numpy arrays without going through the python runtime for each element.
So far I only tested with simple one-dimensional array.array (and similar); but the API should in theory also work with more complex buffers.

On PySequence_Size: I'm not sure if that's guaranteed to be a fast call, or even guaranteed to be implemented at all.
Looking at the PySequence_List implementation, it uses PyObject_LengthHint instead.
PySequence_List also has specialized code paths for when the sequence is a list or tuple, to avoid the overhead of going through the iterator.

@Eh2406
Copy link

Eh2406 commented Jan 20, 2017

That is so cool! Thank you!

Can we support buffer -> &[T] or &mut [T]?

@dgrunwald
Copy link
Owner

There's as_slice()/as_mut_slice(), which gives you &[ReadOnlyCell<T>] or &[Cell<T>].
The cells are necessary, because it's possible for the buffer contents to get modified by Python code while the reference is alive:

let buf = PyBuffer::get(py, obj).unwrap();
let slice = buf.as_slice::<f32>(py);
let old_val = slice[0].get();
obj.call_method("modify_stuff", NoArgs, None);
let new_val = slice[0].get(); // without the cell, it would be UB if obj.modify_stuff() changed the buffer contents

@Eh2406
Copy link

Eh2406 commented Jan 20, 2017

As usual you already have it covered! 👍

@aomader
Copy link

aomader commented Apr 20, 2017

@dgrunwald What about multi-dimensional numpy arrays?

@dgrunwald
Copy link
Owner

@b52: PyBuffer should support multi-dimensional arrays in theory; but I've never tested that part.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants