Mailman 3 December 2012 - NumPy-Discussion

dtype "reduction"
by Nicolas Rougier Jan. 15, 2013

Hi all,

I'm looking for a way to "reduce" dtype1 into dtype2 (when it is possible, of course). Is there some easy way to do that, by any chance?

    dtype1 = np.dtype([
        ('vertex', [('x', 'f4'), ('y', 'f4'), ('z', 'f4')]),
        ('normal', [('x', 'f4'), ('y', 'f4'), ('z', 'f4')]),
        ('color',  [('r', 'f4'), ('g', 'f4'), ('b', 'f4'), ('a', 'f4')])])

    dtype2 = np.dtype([
        ('vertex', 'f4', 3),
        ('normal', 'f4', 3),
        ('color',  'f4', 4)])

Nicolas
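One possible approach to the question above, offered as a sketch rather than an established NumPy facility: because both dtypes describe the same 40 bytes per record, an array of dtype1 can be reinterpreted as dtype2 with `ndarray.view`. The helper `reduce_dtype` below is a hypothetical name, not a NumPy function; it assumes every top-level field is a nested record whose sub-fields are contiguous and share one base type.

```python
import numpy as np

dtype1 = np.dtype([
    ('vertex', [('x', 'f4'), ('y', 'f4'), ('z', 'f4')]),
    ('normal', [('x', 'f4'), ('y', 'f4'), ('z', 'f4')]),
    ('color',  [('r', 'f4'), ('g', 'f4'), ('b', 'f4'), ('a', 'f4')])])


def reduce_dtype(dt):
    """Collapse each nested record of identical sub-fields into a sub-array.

    Hypothetical helper: raises ValueError when a field cannot be collapsed.
    """
    fields = []
    for name in dt.names:
        sub = dt.fields[name][0]              # dtype of this top-level field
        if sub.names is None:
            raise ValueError(f"field {name!r} is not a nested record")
        bases = {sub.fields[n][0] for n in sub.names}
        if len(bases) != 1:
            raise ValueError(f"field {name!r} mixes several base types")
        fields.append((name, bases.pop(), (len(sub.names),)))
    reduced = np.dtype(fields)
    if reduced.itemsize != dt.itemsize:       # padding would break the view
        raise ValueError("memory layouts differ; a view is not possible")
    return reduced


dtype2 = reduce_dtype(dtype1)

a = np.zeros(2, dtype=dtype1)
a['vertex']['x'] = 1.0
a['vertex']['y'] = 2.0
b = a.view(dtype2)    # same memory, no copy; b['vertex'] has shape (2, 3)
```

Since `b` is a view, writing to `b['vertex'][:, 2]` also changes `a['vertex']['z']`.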

ANN: NumPy 1.7.0rc1 release
by Ondřej Čertík Jan. 9, 2013

Hi,

I'm pleased to announce the availability of the first release candidate of NumPy 1.7.0 (1.7.0rc1).

Sources and binary installers can be found at
https://sourceforge.net/projects/numpy/files/NumPy/1.7.0rc1/

We have fixed all issues known to us since the 1.7.0b2 release. The only remaining issue is a documentation improvement:

https://github.com/numpy/numpy/issues/561

Please test this release and report any issues on the numpy-discussion mailing list. If there are no more problems, we'll release the final version soon. I'll wait at least a week; please write me an email if you need more time for testing.

I would like to thank Sebastian Berg, Ralf Gommers, Han Genuit, Nathaniel J. Smith, Jay Bourque, Gael Varoquaux, Mark Wiebe, Matthew Brett, Skipper Seabold, Peter Cock, Charles Harris, Frederic, Gabriel, Luis Pedro Coelho, Pauli Virtanen, Travis E. Oliphant and cgohlke for sending patches and fixes for this release since 1.7.0b2.

Cheers,
Ondrej

P.S. Source code is uploaded to sourceforge, and I'll upload the rest of the Windows and Mac binaries in a few hours as they finish building.

Numpy speed ups to simple tasks - final findings and suggestions
by Raul Cota Jan. 5, 2013

Hello,

On Dec/2/2012 I sent an email about some meaningful speed problems I was facing when porting our core program from Numeric (Python 2.2) to Numpy (Python 2.6). Some of our tests went from 30 seconds to 90 seconds, for example. I saw interest from some people on this list, and I left the topic saying I would do a more complete profile of the program and report back anything meaningful. It took me quite a while to get through things because I ended up having to figure out how to create a Visual Studio project that I could debug and compile from the IDE.

First, the obvious: everything that relies heavily on Numpy for speed (mid to large arrays) is pretty much the same speed when compared to Numeric. The areas that are considerably slower in Numpy vs. Numeric are the trivial tasks that we end up using either for convenience (small arrays) or because scalar types such as 'float64' propagate everywhere throughout the program and creep into several of our data structures.

This email is really only relevant to people stuck with doing trivial operations with Numpy who want a meaningful speed boost. I focused on float64.

* NOTE: I ended up doing everything in Numpy 1.6.2 as opposed to using the latest stuff. I am going to guess all my findings still apply, but I will not be able to confirm that until later.

=========================================================

In this email I include,

1) Main bottlenecks I found, which I list and refer to as (a), (b) and (c).
2) The benchmark tests I made and their speed ups
3) Details on the actual changes to the C code

=========================================================

Summary of conclusions,

- Our code is finally running as fast as it used to, after some changes in Numpy and also some minor changes in our code. Half of our problems were caused by instantiating small arrays several times, which is fairly slow in Numpy. The other half of our problems were caused by the slow math performance of Numpy scalars.
  We did find a particular Python function in our code that was a big candidate to be rewritten in C and just got it done.

- In Numpy I did four sets of changes in the source code. I believe three of them are relevant to everyone using Numpy and one of them is probably not going to be very popular.

- The main speed up is in float64 scalar operations and creation of small arrays from lists or tuples. The speed up in small array operations is only marginal, but I believe there is potential to get them at least twice as fast.

=========================================================

1) By profiling the program I found three generic types of bottlenecks in Numpy that were affecting our code,

a) Operations that result in Python internally raising an error,
   e.g. PyObject_GetAttrString(obj, "__array_priority__") when __array_priority__ is not an attribute of obj

b) Creation / destruction of scalar array types. In some places this was happening unnecessarily.

c) Ufuncs trying to figure out the proper type for an operation (e.g. if I multiply a float64 array by a float64 array, a fair amount of time is spent deciding that it should use float64)

I came up with specific changes to address (a) and (b). I gave up on (c) for now since I couldn't think of a way to speed it up without a large re-write, and I really don't know the Numpy code (never saw it before this work).
=========================================================

2) The tests I did were (some are Python natives for reference),

     1) Array * Array
     2) PyFloat * Array
     3) Float64 * Array
     4) PyFloat + Array
     5) Float64 + Array
     6) PyFloat * PyFloat
     7) Float64 * Float64
     8) PyFloat * Float64
     9) PyFloat * vector1[1]
    10) PyFloat + Float64
    11) PyFloat < Float64
    12) if PyFloat < Float64:
    13) Create array from list
    14) Assign PyFloat to all
    15) Assign Float64 to all
    16) Float64 * pyFloat * pyFloat * pyFloat * pyFloat
    17) Float64 * Float64 * Float64 * Float64 * Float64
    18) Float64 ** 2
    19) PyFloat ** 2

where

    Array -> Numpy array of float64 of two elements (vector1 = array( [2.0, 3.1] )).
    PyFloat -> pyFloat = 3.1
    Float64 -> Numpy scalar 'float64' (scalarFloat64 = vector1[1])
    Create array from list -> newVec = array([0.2, 0.3], dtype="float64")
    Assign PyFloat to all -> vector1[:] = pyFloat
    Assign Float64 to all -> vector1[:] = scalarFloat64

I ran every test 100000 times and timed it in seconds.

These are the base timings with the original Numpy

        TIME[s]  TEST
     1)  0.2003  Array * Array
     2)  0.2502  PyFloat * Array
     3)  0.2689  Float64 * Array
     4)  0.2469  PyFloat + Array
     5)  0.2640  Float64 + Array
     6)  0.0055  PyFloat * PyFloat
     7)  0.0278  Float64 * Float64
     8)  0.0778  PyFloat * Float64
     9)  0.0893  PyFloat * vector1[1]
    10)  0.0767  PyFloat + Float64
    11)  0.0532  PyFloat < Float64
    12)  0.0543  if PyFloat < Float64 :
    13)  0.6788  Create array from list
    14)  0.0708  Assign PyFloat to all
    15)  0.0775  Assign Float64 to all
    16)  0.2994  Float64 * pyFloat * pyFloat * pyFloat * pyFloat
    17)  0.1053  Float64 * Float64 * Float64 * Float64 * Float64
    18)  0.0918  Float64 ** 2
    19)  0.0156  pyFloat ** 2

- Test (13) is the operation that takes the longest overall
- PyFloat * Float64 is 14 times slower than PyFloat * PyFloat

By addressing bottleneck (a) I got the following ratios of time (BaseTime/NewTime), i.e. RATIO > 1 means GOOD.

         RATIO   TEST
     1)   1.1    Array * Array
     2)   1.1    PyFloat * Array
     3)   1.1    Float64 * Array
     4)   1.1    PyFloat + Array
     5)   1.2    Float64 + Array
     6)   1.0    PyFloat * PyFloat
     7)   1.7    Float64 * Float64
     8)   2.8    PyFloat * Float64
     9)   2.1    PyFloat * vector1[1]
    10)   2.8    PyFloat + Float64
    11)   3.3    PyFloat < Float64
    12)   3.3    if PyFloat < Float64:
    13)   3.2    Create array from list
    14)   1.2    Assign PyFloat to all
    15)   1.2    Assign Float64 to all
    16)   2.9    Float64 * pyFloat * pyFloat * pyFloat * pyFloat
    17)   1.7    Float64 * Float64 * Float64 * Float64 * Float64
    18)   2.4    Float64 ** 2
    19)   1.0    pyFloat ** 2

The speed up in Tests (13) and (16) resulted in a big speed boost in our code.

Keeping the changes above, and addressing (b) in a way that did not change the data types of the return values, I got the following ratios of time (BaseTime/NewTime), i.e. RATIO > 1 means GOOD.

         RATIO   TEST
     1)   1.1    Array * Array
     2)   1.1    PyFloat * Array
     3)   1.2    Float64 * Array
     4)   1.1    PyFloat + Array
     5)   1.2    Float64 + Array
     6)   1.0    PyFloat * PyFloat
     7)   1.7    Float64 * Float64
     8)   4.3    PyFloat * Float64
     9)   3.1    PyFloat * vector1[1]
    10)   4.4    PyFloat + Float64
    11)   9.3    PyFloat < Float64
    12)   9.2    if PyFloat < Float64 :
    13)   3.2    Create array from list
    14)   1.2    Assign PyFloat to all
    15)   1.2    Assign Float64 to all
    16)   4.7    Float64 * pyFloat * pyFloat * pyFloat * pyFloat
    17)   1.8    Float64 * Float64 * Float64 * Float64 * Float64
    18)   2.4    Float64 ** 2
    19)   1.0    pyFloat ** 2

- Scalar operations are quite a bit faster, but PyFloat * Float64 is 2.9 times slower than PyFloat * PyFloat

I decided to then tackle (b) even further by changing things like PyFloat * Float64 to return a PyFloat as opposed to a Float64. This is the change that I don't think is going to be very popular.
This is what I got,

         RATIO   TEST
     1)   1.1    Array * Array
     2)   1.1    PyFloat * Array
     3)   1.2    Float64 * Array
     4)   1.1    PyFloat + Array
     5)   1.2    Float64 + Array
     6)   1.0    PyFloat * PyFloat
     7)   3.2    Float64 * Float64
     8)   8.1    PyFloat * Float64
     9)   4.1    PyFloat * vector1[1]
    10)   8.3    PyFloat + Float64
    11)   9.4    PyFloat < Float64
    12)   9.2    if PyFloat < Float64 :
    13)   3.2    Create array from list
    14)   1.2    Assign PyFloat to all
    15)   1.2    Assign Float64 to all
    16)  17.3    Float64 * pyFloat * pyFloat * pyFloat * pyFloat
    17)   3.3    Float64 * Float64 * Float64 * Float64 * Float64
    18)   2.4    Float64 ** 2
    19)   1.0    pyFloat ** 2

- Test (16) shows how only one Float64 spoils the speed of trivial math. Now imagine the effect in hundreds of lines like that.
- Even Test (17), which uses only Float64, got faster
- Test (18) Float64 ** 2 is still returning a float64 in this run.

Regarding bottleneck (c), deciding the type of UFunc: I hacked a version for testing purposes to check the potential speed up (some dirty changes in generate_umath.py). This version avoided the overhead of the calls to find the matching ufunc. The ratio of speed up for something like Array * Array was only 1.6. This was not too exciting, so I walked away for now.

=========================================================

3) These are the actual changes to the C code,

For bottleneck (a)

In general,
- avoid calls to PyObject_GetAttrString when I know the type is List, None, Tuple, Float, Int, String or Unicode
- avoid calls to PyObject_GetBuffer when I know the type is List, None or Tuple

a.1) In arrayobject.h after the line

    #include "npy_interrupt.h"

I added a couple of #define

    //Check for exact native types that for sure do not
    //support array related methods. Useful for faster checks when
    //validating if an object supports these methods
    #define ISEXACT_NATIVE_PYTYPE(op) (PyList_CheckExact(op) || (Py_None == op) || PyTuple_CheckExact(op) || PyFloat_CheckExact(op) || PyInt_CheckExact(op) || PyString_CheckExact(op) || PyUnicode_CheckExact(op))

    //Check for exact native types that for sure do not
    //support buffer protocol. Useful for faster checks when
    //validating if an object supports the buffer protocol.
    #define NEVERSUPPORTS_BUFFER_PROTOCOL(op) ( PyList_CheckExact(op) || (Py_None == op) || PyTuple_CheckExact(op) )

a.2) In common.c above the line

    if ((ip=PyObject_GetAttrString(op, "__array_interface__"))!=NULL) {

I added

    if (ISEXACT_NATIVE_PYTYPE(op)){
        ip = NULL;
    }
    else{

and close the } before the line

    #if !defined(NPY_PY3K)

In common.c above the line

    if (PyObject_HasAttrString(op, "__array__")) {

I added

    if (ISEXACT_NATIVE_PYTYPE(op)){
    }
    else{

and close the } before the line

    #if defined(NPY_PY3K)

In common.c above the line

    if (PyObject_GetBuffer(op, &buffer_view, PyBUF_FORMAT|PyBUF_STRIDES

I added

    if ( NEVERSUPPORTS_BUFFER_PROTOCOL(op) ){
    }
    else{

and close the } before the line

    #endif

a.3) In ctors.c above the line

    if ((e = PyObject_GetAttrString(s, "__array_struct__")) != NULL) {

I added

    if (ISEXACT_NATIVE_PYTYPE(s)){
        e = NULL;
    }
    else{

and close the } before the line

    n = PySequence_Size(s);

In ctors.c above the line

    attr = PyObject_GetAttrString(input, "__array_struct__");

I added

    if (ISEXACT_NATIVE_PYTYPE(input)){
        attr = NULL;
        return Py_NotImplemented;
    }
    else{

and close the } before the line

    if (!NpyCapsule_Check(attr)) {

In ctors.c above the line

    inter = PyObject_GetAttrString(input, "__array_interface__");

I added

    if (ISEXACT_NATIVE_PYTYPE(input)){
        inter = NULL;
        return Py_NotImplemented;
    }
    else{

and close the } before the line

    if (!PyDict_Check(inter)) {

In ctors.c above the line

    array_meth = PyObject_GetAttrString(op, "__array__");

I added

    if (ISEXACT_NATIVE_PYTYPE(op)){
        array_meth = NULL;
        return Py_NotImplemented;
    }
    else{

and close the } before the line

    if (context == NULL) {

In ctors.c above the line

    if (PyObject_GetBuffer(s, &buffer_view, PyBUF_STRIDES) == 0 ||

I added

    if ( NEVERSUPPORTS_BUFFER_PROTOCOL(s) ){
    }
    else{

and close the } before the line

    #endif

a.4) In multiarraymodule.c above the line

    ret = PyObject_GetAttrString(obj, "__array_priority__");

I added

    if (ISEXACT_NATIVE_PYTYPE(obj)){
        ret = NULL;
    }
    else{

and close the } before the line

    if (PyErr_Occurred()) {

For bottleneck (b)

b.1) I noticed that PyFloat * Float64 resulted in an unnecessary "on the fly" conversion of the PyFloat into a Float64 to extract its underlying C double value. This happened in the function _double_convert_to_ctype, which comes from the pattern _@name@_convert_to_ctype.

I ended up splitting _@name@_convert_to_ctype into two sections: one for double types, where I extract the C value directly if it passes the check to PyFloat_CheckExact (it could be extended for other types), and one for the rest of the types.

In scalarmathmodule.c.src I added,

    /**begin repeat
     * #name = double#
     * #Name = Double#
     * #NAME = DOUBLE#
     * #PYCHECKEXACT = PyFloat_CheckExact#
     * #PYEXTRACTCTYPE = PyFloat_AS_DOUBLE#
     */
    static int
    _@name@_convert_to_ctype(PyObject *a, npy_@name@ *arg1)
    {
        PyObject *temp;

        if (@PYCHECKEXACT@(a)){
            *arg1 = @PYEXTRACTCTYPE@(a);
            return 0;
        }
        ... The rest of this function is the implementation of the original
        _@name@_convert_to_ctype(PyObject *a, npy_@name@ *arg1)

The original implementation of _@name@_convert_to_ctype does not include double anymore, i.e.

    /**begin repeat
     * #name = byte, ubyte, short, ushort, int, uint, long, ulong, longlong,
     *         ulonglong, half, float, longdouble, cfloat, cdouble, clongdouble#
     * #Name = Byte, UByte, Short, UShort, Int, UInt, Long, ULong, LongLong,
     *         ULongLong, Half, Float, LongDouble, CFloat, CDouble, CLongDouble#
     * #NAME = BYTE, UBYTE, SHORT, USHORT, INT, UINT, LONG, ULONG, LONGLONG,
     *         ULONGLONG, HALF, FLOAT, LONGDOUBLE, CFLOAT, CDOUBLE, CLONGDOUBLE#
     */
    static int
    _@name@_convert_to_ctype(PyObject *a, npy_@name@ *arg1)

b.2) This is the change that may not be very popular among Numpy users. I modified Float64 operations to return a Float instead of a Float64. I could not think of or see any ill effects, and I got a fairly decent speed boost.

In scalarmathmodule.c.src I modified the block to this,

    /**begin repeat
     * #name=(byte,ubyte,short,ushort,int,uint,long,ulong,longlong,ulonglong)*13,
     *       (half, float, double, longdouble, cfloat, cdouble, clongdouble)*6,
     *       (half, float, double, longdouble)*2#
     * #Name=(Byte,UByte,Short,UShort,Int,UInt,Long,ULong,LongLong,ULongLong)*13,
     *       (Half, Float, Double, LongDouble, CFloat, CDouble, CLongDouble)*6,
     *       (Half, Float, Double, LongDouble)*2#
     * #oper=add*10, subtract*10, multiply*10, divide*10, remainder*10,
     *       divmod*10, floor_divide*10, lshift*10, rshift*10, and*10,
     *       or*10, xor*10, true_divide*10,
     *       add*7, subtract*7, multiply*7, divide*7, floor_divide*7, true_divide*7,
     *       divmod*4, remainder*4#
     * #fperr=1*70,0*50,1*10,
     *        1*42,
     *        1*8#
     * #twoout=0*50,1*10,0*70,
     *         0*42,
     *         1*4,0*4#
     * #otyp=(byte,ubyte,short,ushort,int,uint,long,ulong,longlong,ulonglong)*12,
     *       float*4, double*6,
     *       (half, float, double, longdouble, cfloat, cdouble, clongdouble)*6,
     *       (half, float, double, longdouble)*2#
     * #OName=(Byte,UByte,Short,UShort,Int,UInt,Long,ULong,LongLong,ULongLong)*12,
     *        Float*4, Double*6,
     *        (Half, Float, Double, LongDouble, CFloat, CDouble, CLongDouble)*6,
     *        (Half, Float, Double, LongDouble)*2#
     * #OutUseName=(Byte,UByte,Short,UShort,Int,UInt,Long,ULong,LongLong,ULongLong)*12,
     *        Float*4, out*6,
     *        (Half, Float, out, LongDouble, CFloat, CDouble, CLongDouble)*6,
     *        (Half, Float, out, LongDouble)*2#
     * #AsScalarArr=(1,1,1,1,1,1,1,1,1,1)*12,
     *        1*4, 0*6,
     *        (1, 1, 0, 1, 1, 1, 1)*6,
     *        (1, 1, 0, 1)*2#
     * #RetValCreate=(PyArrayScalar_New,PyArrayScalar_New,PyArrayScalar_New,PyArrayScalar_New,PyArrayScalar_New,PyArrayScalar_New,PyArrayScalar_New,PyArrayScalar_New,PyArrayScalar_New,PyArrayScalar_New)*12,
     *        PyArrayScalar_New*4, PyFloat_FromDouble*6,
     *        (PyArrayScalar_New, PyArrayScalar_New, PyFloat_FromDouble, PyArrayScalar_New, PyArrayScalar_New, PyArrayScalar_New, PyArrayScalar_New)*6,
     *        (PyArrayScalar_New, PyArrayScalar_New, PyFloat_FromDouble, PyArrayScalar_New)*2#
     */
    #if !defined(CODEGEN_SKIP_@oper@_FLAG)
    static PyObject *
    @name@_@oper@(PyObject *a, PyObject *b)
    {
        ... Same as before and ends with...
    #else
        ret = @RetValCreate@(@OutUseName@);
        if (ret == NULL) {
            return NULL;
        }
        if (@AsScalarArr@)
            PyArrayScalar_ASSIGN(ret, @OName@, out);
    #endif
        return ret;
    }
    #endif
    /**end repeat**/

I still need to do the section for when there are two return values and the power function. I am not sure what else could be there.

=========================================================

That's about it. Sorry for the long email. I tried to summarize as much as possible. Let me know if you have any questions or if you want the actual files I modified.

Cheers,

Raul Cota
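For readers who want to reproduce a few of the measurements described in this email on their own NumPy build, a rough sketch using the standard-library `timeit` module follows. It is not the author's benchmark script (that was not posted); the function name `time_small_ops` and the choice of six tests are assumptions made here, while the variable names and the 100000 repetitions follow the email. Absolute timings will differ from the 2013 figures.

```python
import timeit

import numpy as np


def time_small_ops(number=100000):
    """Time a few scalar and small-array operations, in seconds per run of `number` loops."""
    vector1 = np.array([2.0, 3.1])
    namespace = {
        "np": np,
        "vector1": vector1,             # "Array" in the email
        "pyFloat": 3.1,                 # "PyFloat"
        "scalarFloat64": vector1[1],    # "Float64" (a NumPy scalar)
    }
    statements = {
        "Array * Array": "vector1 * vector1",
        "PyFloat * Array": "pyFloat * vector1",
        "PyFloat * PyFloat": "pyFloat * pyFloat",
        "Float64 * Float64": "scalarFloat64 * scalarFloat64",
        "PyFloat * Float64": "pyFloat * scalarFloat64",
        "Create array from list": "np.array([0.2, 0.3], dtype='float64')",
    }
    return {label: timeit.timeit(stmt, globals=namespace, number=number)
            for label, stmt in statements.items()}


if __name__ == "__main__":
    for label, seconds in time_small_ops().items():
        print(f"{seconds:8.4f}  {label}")
```

Dividing the "PyFloat * Float64" time by the "PyFloat * PyFloat" time gives the slowdown factor that the email reports as 14 for unmodified Numpy 1.6.2.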

3D array problem in Python
by Happyman Jan. 3, 2013

Hello

I have a 3-dimensional array on which I want to run a large computation. Everything works well if I use the ordinary way, which is unsuitable in Python, like the following:

    nums=32
    rows=120
    cols=150
    for k in range(0,nums):
        for i in range(0,rows):
            for j in range(0,cols):
                if float ( R[ k ] [ i ] [ j ] ) == 0.0:
                    val11[ i ] [ j ] = 0.0
                else:
                    val11[ i ] [ j ], val22[ i ][ j ] = integrate.quad( lambda x : F1(x)*F2(x) , 0 , pi)

But this calculation takes a very long time, let's say about 1 hour (theoretically)... Is there a better way to do this calculation easily and fast, such as [ F( i ) for i in xlist ] or something like that, rather than using a for loop?
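A sketch of one way to avoid the triple loop, under assumptions the post does not state: it supposes the integrand depends on the array value r = R[k, i, j] (as posted, the integrand ignores the indices, so a single `quad` call outside the loops would already suffice), and it swaps `integrate.quad` for fixed-order Gauss-Legendre quadrature so that all integrals are evaluated at once by broadcasting. The names `integrate_all` and `integrand` are hypothetical, and unlike `quad` this returns no error estimate.

```python
import numpy as np


def integrate_all(R, integrand, a=0.0, b=np.pi, order=32):
    """Integrate integrand(x, r) over [a, b] for every nonzero r in R.

    Entries where R == 0 are left at 0.0, mirroring the branch in the loop.
    """
    # Gauss-Legendre nodes and weights on [-1, 1], mapped onto [a, b].
    nodes, weights = np.polynomial.legendre.leggauss(order)
    x = 0.5 * (b - a) * nodes + 0.5 * (b + a)
    w = 0.5 * (b - a) * weights

    R = np.asarray(R, dtype=float)
    out = np.zeros(R.shape)
    mask = R != 0.0
    # values has shape (order, number_of_nonzero_entries)
    values = integrand(x[:, None], R[mask][None, :])
    out[mask] = w @ values
    return out


# Example with a made-up integrand whose exact integral is 2 * r.
R = np.arange(32 * 120 * 150, dtype=float).reshape(32, 120, 150) % 5
result = integrate_all(R, lambda x, r: r * np.sin(x))
```

The fixed order trades adaptivity for speed, so it suits smooth integrands; for difficult ones, calling `quad` only on the nonzero entries remains an option.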

Re: [Numpy-discussion] Manipulate neighboring points in 2D array
by deb Dec. 30, 2012

Thanks Zach for your interest

I was thinking about ndimage.generic_filter when I wrote about generic filter. For generic_filter I used a trivial function that returns .sum(), but I can't seem to make the code any faster than it is.

This is the code:
http://code.activestate.com/recipes/578390-snowflake-simulation-using-reite…

As a commenter suggested, I thought I would try to write it in numpy.

Interestingly, the first thing I tried before trying to use numpy was to replace the range() loops with xrange(), as xrange is considered faster and more efficient, but the result was that the code became twice as slow.

Anyway, I give up, and have concluded that my numpy skills are far below what I expected :D

> It's possible that some generic filter operations can be cast in
> terms of pure-numpy operations, or composed out of existing filters
> available in scipy.ndimage. If you can describe the filter operation
> you wish to perform, perhaps someone can make some suggestions.
> Alternately, scipy.ndimage.generic_filter can take an arbitrary
> python function. Though it's not really fast...
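As an illustration of the quoted suggestion that some filters can be cast in pure-numpy terms, here is a sketch of a neighbor sum built from shifted slices instead of `generic_filter` with a `.sum()` callback. The 8-neighbor footprint and the zero padding at the border are assumptions; the linked recipe may use a different neighborhood (a snowflake grid is often hexagonal), and `neighbor_sum` is a name chosen here.

```python
import numpy as np


def neighbor_sum(a):
    """Sum of the 8 surrounding cells for every cell; cells outside count as 0."""
    rows, cols = a.shape
    padded = np.pad(a, 1, mode="constant")    # one ring of zeros around a
    total = np.zeros((rows, cols), dtype=float)
    for di in (0, 1, 2):
        for dj in (0, 1, 2):
            if di == 1 and dj == 1:
                continue                       # skip the cell itself
            total += padded[di:di + rows, dj:dj + cols]
    return total


grid = np.ones((3, 3))
sums = neighbor_sum(grid)
```

Each of the eight additions is one vectorized pass over the array, so the Python-level loop runs eight times regardless of the array size.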

A small challenge
by Charles R Harris Dec. 29, 2012

Hi All, I propose a challenge: express the dtype grammar in EBNF. That's all. Chuck

Re: [Numpy-discussion] Manipulate neighboring points in 2D array
by deb Dec. 29, 2012

Thanks Zach

You are right. I needed a generic filter to update the current point, and not the neighbors as I wrote.

The initial code is a slow loop over 2D Python lists, which I'm trying to convert to numpy to make it useful. In that loop there is an inner loop for calculating the neighbors' properties, which confused me yesterday and misled me into searching for something that probably does not make sense.

It's clear now :)

Regards

Manipulate neighboring points in 2D array
by deb Dec. 27, 2012

Hi,

I have a 2D array, let's say: `np.random.random((100,100))`, and I want to do a simple manipulation on each point's neighbors, like dividing their values by 3.

So for each array value, x, and its neighbors n:

    n n n        n/3 n/3 n/3
    n x n   ->   n/3  x  n/3
    n n n        n/3 n/3 n/3

I searched a bit and found the scipy ndimage filters, but if I'm not wrong, there is no such function. Of course, me being wrong is quite possible, as I did not comprehend the whole ndimage module, but I tried generic filter, for example, and browsed the other functions.

Is there a better way to do the above manipulation than using a for loop over every array element?

TIA
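For a single point, the manipulation drawn above can be written with slicing alone. This is only a sketch: `scale_neighbors` is a hypothetical helper, and it deliberately handles one (i, j) at a time, because applying the same step to every point simultaneously is ambiguous (each cell is a neighbor of up to eight others).

```python
import numpy as np


def scale_neighbors(a, i, j, factor=1.0 / 3.0):
    """Return a copy of `a` with the neighbors of (i, j) multiplied by `factor`."""
    out = a.astype(float, copy=True)
    # Clip the 3x3 window at the array borders.
    r0, r1 = max(i - 1, 0), min(i + 2, a.shape[0])
    c0, c1 = max(j - 1, 0), min(j + 2, a.shape[1])
    center = out[i, j]
    out[r0:r1, c0:c1] *= factor
    out[i, j] = center            # the point itself stays unchanged
    return out


a = np.full((4, 4), 3.0)
scaled = scale_neighbors(a, 1, 1)
```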

numpy.testing.asserts and masked array
by Chao YUE Dec. 27, 2012

Dear all,

I found here
http://mail.scipy.org/pipermail/numpy-discussion/2009-January/039681.html
that one should use numpy.ma.testutils.assert_almost_equal for masked array assertions, but I cannot find the np.ma.testutils module. Am I going wrong somewhere? My numpy version is 1.6.2.

thanks!

Chao

--
***********************************************************************************
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16
************************************************************************************
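A likely explanation, offered with some caution: `numpy.ma.testutils` is a submodule that `numpy.ma` does not import automatically, so `np.ma.testutils` raises an AttributeError until the submodule is imported explicitly. The sketch below assumes the module is present in the installed NumPy, as it is in the releases the editor is aware of.

```python
import numpy as np
from numpy.ma.testutils import assert_almost_equal  # explicit submodule import

a = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, False, True])
b = np.ma.masked_array([1.0, 2.0, 99.0], mask=[False, False, True])

# Passes: the third element is masked in both arrays, so it is not compared.
assert_almost_equal(a, b)
```

After that import, the attribute path `np.ma.testutils.assert_almost_equal` also resolves.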

Pre-allocate array
by Nikolaus Rath Dec. 27, 2012

Hello,

I have an array that I know will need to grow to X elements. However, I will need to work with it before it's completely filled. I see two ways of doing this:

    bigarray = np.empty(X)
    current_size = 0
    for i in something:
        buf = produce_data(i)
        bigarray[current_size:current_size+len(buf)] = buf
        current_size += len(buf)
        # Do things with bigarray[:current_size]

This avoids having to allocate new buffers and copying data around, but I have to separately manage the current array size. Alternatively, I could do

    bigarray = np.empty(0)
    current_size = 0
    for i in something:
        buf = produce_data(i)
        bigarray.resize(len(bigarray)+len(buf))
        bigarray[-len(buf):] = buf
        # Do things with bigarray

This is much more elegant, but the resize() calls may have to copy data around.

Is there any way to tell numpy to allocate all the required memory while using only a part of it for the array? Something like:

    bigarray = np.empty(50, will_grow_to=X)
    bigarray.resize(X) # Guaranteed to work without copying stuff around

Thanks,

-Nikolaus
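One way to keep the first variant's single allocation while hiding the size bookkeeping is a thin wrapper that exposes the filled part as a view. The sketch below is an assumption-laden illustration, not a NumPy feature: `GrowingBuffer`, `extend` and `filled` are names invented here, and `np.empty` has no `will_grow_to` argument.

```python
import numpy as np


class GrowingBuffer:
    """Fixed-capacity array that tracks how many elements are in use."""

    def __init__(self, capacity, dtype=float):
        self._data = np.empty(capacity, dtype=dtype)   # allocated once
        self._size = 0

    def extend(self, values):
        values = np.asarray(values, dtype=self._data.dtype)
        end = self._size + len(values)
        if end > len(self._data):
            raise ValueError("capacity exceeded")
        self._data[self._size:end] = values
        self._size = end

    @property
    def filled(self):
        # A view on the used part: no data is copied.
        return self._data[:self._size]


buf = GrowingBuffer(5)
buf.extend([1, 2])
buf.extend([3])
current = buf.filled      # array([1., 2., 3.])
```

Because `filled` is a slice of the pre-allocated block, work on the partial array never triggers a reallocation; the cost is that the capacity X must be known up front, as in the original question.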
