Using the mean function in NumPy

I am puzzled about the mean function in NumPy.

Let me illustrate with an example.

First I create an array.

import numpy as np

t_a = np.array([
[1,2,3,4],
[5,6,7,8],
[9,10,11,12],
[13,14,15,16]])

I understand that I can get the mean for each column or each row by using axis=0 or axis=1.

print(np.mean(t_a, axis=0))
result: 7 8 9 10

print(np.mean(t_a, axis=1))
result: 2.5 6.5 10.5 14.5

The problem is when I want the mean value for only one of the rows.

I manage to get the result for one column by doing:
print(np.mean(t_a[:,2], axis=0))
result: 9

However, if I try to apply that to a row, it does not work.
I expected the line below to result in 10.5, but I get an error.

print(np.mean(t_a[:2,:], axis=1))

How come the column example works but the row example does not? I do not understand the logical difference.

Thank you.

Best regards,
Christer Eriksson

What error did you get?
You can help others help you by specifying the type of error, such as ValueError, TypeError, etc. Better still, paste the traceback so others can teach you how to read it.

For your code I did not get an error, but this output: array([2.5, 6.5])
You may have been trying to do np.mean(t_a[2,:], axis=1) rather than np.mean(t_a[:2,:], axis=1).
The latter selects the first 2 rows; the former selects the row with index 2, which is the 3rd row.
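
To make the difference concrete, here is a small sketch using the array from the question. The names first_two_rows and row_two are only for illustration, not something from the original post.

```python
import numpy as np

t_a = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16]])

# The slice :2 keeps rows 0 and 1, so the result is still 2-D.
first_two_rows = t_a[:2, :]
print(first_two_rows.shape)             # (2, 4)
print(np.mean(first_two_rows, axis=1))  # [2.5 6.5]

# The integer index 2 picks the single row at index 2 and drops that dimension.
row_two = t_a[2, :]
print(row_two.shape)                    # (4,)
print(np.mean(row_two))                 # 10.5
```

Printing the shape of each selection is usually the quickest way to see which of the two you actually wrote.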

Thank you, hanqi, for answering.

I ran the code shown below:

Start code

import numpy as np

t_a = np.array([
[1,2,3,4],
[5,6,7,8],
[9,10,11,12],
[13,14,15,16]])

print(np.mean(t_a[:,2], axis=0))

print(np.mean(t_a[2,:], axis=1))

End code

This is the output I get, an IndexError.


IndexError                                Traceback (most recent call last)
in ()
12 print(np.mean(t_a[:,2], axis=0))
13
---> 14 print(np.mean(t_a[2,:], axis=1))

/dataquest/system/env/python3/lib/python3.4/site-packages/numpy/core/fromnumeric.py in mean(a, axis, dtype, out, keepdims)
2955
2956 return _methods._mean(a, axis=axis, dtype=dtype,
-> 2957 out=out, **kwargs)
2958
2959

/dataquest/system/env/python3/lib/python3.4/site-packages/numpy/core/_methods.py in _mean(a, axis, dtype, out, keepdims)
55
56 is_float16_result = False
---> 57 rcount = _count_reduce_items(arr, axis)
58 # Make this warning show up first
59 if rcount == 0:

/dataquest/system/env/python3/lib/python3.4/site-packages/numpy/core/_methods.py in _count_reduce_items(arr, axis)
48 items = 1
49 for ax in axis:
---> 50 items *= arr.shape[ax]
51 return items
52

IndexError: tuple index out of range


It is the second print statement, the one with axis=1, that causes the error.

Edit: Whoops, I misspoke. Will fix it later.

At https://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html I found an example of mean in NumPy.

Examples

a = np.array([[1, 2], [3, 4]])
np.mean(a)
2.5

np.mean(a, axis=0)
array([2., 3.])

np.mean(a, axis=1)
array([1.5, 3.5])

I also ran the code myself, and from that it looks like:

axis=0 calculates down a column.
axis=1 calculates along a row.

That is the reason I use “np.mean(t_a[2,:], axis=1)” in the print statement: I would like to calculate over one row, the row with index 2.

In the first statement, it all looks like it works that way: t_a[:,2] selects all rows in the column with index 2, followed by axis=0 for calculation down a column.
print(np.mean(t_a[:,2], axis=0))
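
Another way to think about the axis argument, offered here as a sketch rather than an official definition: axis=k is the dimension that gets collapsed, so it disappears from the shape of the result. The small 2 by 3 array below is made up for illustration, so that the two axes have different lengths.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]], shape (2, 3)

col_means = np.mean(a, axis=0)   # collapses the 2 rows, one value per column
row_means = np.mean(a, axis=1)   # collapses the 3 columns, one value per row

print(col_means, col_means.shape)  # [1.5 2.5 3.5] (3,)
print(row_means, row_means.shape)  # [1. 4.] (2,)
```

Read this way, axis=1 only makes sense for an array that has at least two dimensions to begin with.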

Okay, so this might shed some light on why the below doesn’t work:

np.mean(t_a[2,:], axis=1)

To see why you’re getting an error, let’s take a closer look at what t_a[2,:] really is.

When you’re isolating a single row in the above fashion, what you’re really getting is the following:

print((t_a[2,:]).shape)
print(type(t_a[2,:]))

Output:

(4,)
<class 'numpy.ndarray'>

Even though you’re isolating a single row, and it might seem like you’re extracting horizontal values, what you’re actually getting is a one-dimensional NumPy array of 4 elements! Its shape (4,) means 4 entries along axis 0 and no second axis at all, so there is no axis 1 to average over. That is why axis=1 doesn’t work. You’d have to use axis=0, or leave the axis argument out entirely (the default, axis=None, averages over all elements). Either way, the correct answer of 10.5 is returned.

axis=1 would come into play if your selection kept both dimensions, for example by slicing a range of rows and a range of columns, because then you’d get a 2-dimensional object!
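
Here is a minimal sketch of the above that you can run yourself. It rebuilds the same values with np.arange, which is just a shortcut and not how the original array was written. Note that newer NumPy versions raise AxisError instead of a plain IndexError, but as far as I know AxisError is a subclass of IndexError, so the except clause below should catch both.

```python
import numpy as np

t_a = np.arange(1, 17).reshape(4, 4)   # same values as the array in the question

row = t_a[2, :]
print(row.ndim)                  # 1, so axis 0 is the only axis

try:
    np.mean(row, axis=1)
except IndexError as err:        # newer NumPy raises AxisError, a subclass of IndexError
    print("axis=1 failed:", err)

print(np.mean(row, axis=0))      # 10.5
print(np.mean(row))              # 10.5, axis=None averages every element

# Slicing with 2:3 keeps the selection 2-D, so axis=1 is valid again.
print(np.mean(t_a[2:3, :], axis=1))   # [10.5]
```

If you want to keep writing axis=1 for "along a row", the 2:3 slice in the last line is one way to do it, at the cost of getting a one-element array back instead of a single number.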


Thank you, blueberrypudding85, for explaining.

I think I am starting to understand what is happening.
The output (4,) from print((t_a[2,:]).shape) shows the 4 rows.

If I try to illustrate it, would it be something like:

[[9,],
[10,],
[11,],
[12,]]
If I do print(t_a[2,:]) I see [ 9 10 11 12], which looks like a row, but it really is four rows in one column.

Tricky.

Then it looks like NumPy builds the array from a selection differently in the case t_a[2,:] than in, for example, t_a[2:4,1:4].

In the second case it builds a row from values in different columns.

row2, col1 -> 10
row2, col2 -> 11 added to same row as 10
row2, col3 -> 12 added to same row as 10 and 11

We have the first row, [10 11 12]

row3, col1 -> 14
row3, col2 -> 15 added to same row as 14
row3, col3 -> 16 added to same row as 14 and 15

We have the second row, [14 15 16]

And together they give the final array,

[[10 11 12]
[14 15 16]]

In the case of t_a[2,:], each new column value generates a new row.

row2, col0 -> 9 added in a row.
row2, col1 -> 10 added in a new row, not the same as 9
row2, col2 -> 11 added in a new row, not the same as 9 and 10
row2, col3 -> 12 added in a new row, not the same as 9, 10 and 11

To me it looks like two different behaviours in how arrays are made, but maybe it is not?

I am trying to understand what happens in the background.

All the best,
Christer
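
One possible way to look at this question, as a sketch and not a description of NumPy internals: the selection rule is the same in both cases. Each integer index removes its dimension from the result, and each slice keeps its dimension. So t_a[2,:] is neither a row nor a column, just 4 values along one axis, and a real "four rows in one column" array would have shape (4, 1). The reshape call below is only there to show what that would look like.

```python
import numpy as np

t_a = np.arange(1, 17).reshape(4, 4)   # same values as the array in the question

print(t_a[2, :].shape)       # (4,)   integer index removes the row axis
print(t_a[2:3, :].shape)     # (1, 4) slice keeps it: one row of four columns
print(t_a[2:4, 1:4].shape)   # (2, 3) two slices, so both axes survive

# A true "four rows in one column" array has shape (4, 1) and has to be asked for.
column = t_a[2, :].reshape(-1, 1)
print(column.shape)          # (4, 1)
```

Seen this way, t_a[2,:] and t_a[2:4,1:4] follow one rule; the only difference is that the first uses an integer index where the second uses a slice.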