Dataframe slicing

How do I combine single columns (or rows) with a range of columns (or rows) when slicing a dataframe?

Suppose I have a dataframe with 10 columns and 20 rows. How can I slice the df so that I select all the rows for columns 1 and 3 and columns 6 to 10 inclusive?

Similarly, how can I select all columns for rows 2, 5, 7 and rows 15 to 20 inclusive?

1 Like

Hello,

Maybe something like df.iloc[:, [0, 2, 5, 6, 7, 8, 9]]. Based on the info you gave, I presume columns 6 to 10 refers to the sixth to tenth columns, and not the column indexed by 6 to the column indexed by 10.

If you want to use a slice for the sixth to tenth columns (5:10 in zero-based positions), you’ll need something a bit more complicated, as mentioned here. One (less elegant) option is df.iloc[:, [0, 2] + list(range(5, 10))], and you can find other methods in the previous link.


Similar to the above, df.iloc[[1, 4, 6, 14, 15, 16, 17, 18, 19]] for the rows.

Or df.iloc[[1, 4, 6] + list(range(14,20))].
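
To make the two list-plus-range variants above concrete, here is a minimal sketch on a made-up 20 by 10 frame. The data and the column names c1 to c10 are assumptions for illustration only; they are not from the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical 20-row, 10-column frame; the column names are invented for this example.
df = pd.DataFrame(
    np.arange(200).reshape(20, 10),
    columns=[f"c{i}" for i in range(1, 11)],
)

# Columns 1, 3 and 6 to 10 (counting from 1) are positions 0, 2 and 5..9 (counting from 0).
col_positions = [0, 2] + list(range(5, 10))
cols = df.iloc[:, col_positions]

# Rows 2, 5, 7 and 15 to 20 (counting from 1) are positions 1, 4, 6 and 14..19.
row_positions = [1, 4, 6] + list(range(14, 20))
rows = df.iloc[row_positions]
```

Note that range(5, 10) stops before 10, which is why the upper bound is one past the last position you want.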


Maybe numpy.r_ (doc) would be better.

import numpy as np

# select all rows, and the first, third, and sixth to tenth columns
df.iloc[:, np.r_[0, 2, 5:10]]

# select all columns, and the second, fifth, seventh, and fifteenth to twentieth rows
df.iloc[np.r_[1, 4, 6, 14:20]]
3 Likes

I haven’t tried it because my Python IDE is in the middle of an upgrade, but I think np.r_ returns a NumPy array rather than a list, so you might need to pass list(np.r_[stuff]) to df.iloc.

2 Likes

Yeah, I tried it in Jupyter, and it seems iloc does not require any list conversion for the NumPy array.

Here’s an example: Slicing.ipynb (38.4 KB)

Still a good tip, though, because some things in Python return a generator, an iterator, or a NumPy array rather than a list.
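
For anyone who wants to repeat the check about np.r_ and iloc without the notebook, something along these lines should do it. This is only a sketch on a made-up 20 by 10 frame (the data is an assumption, not the contents of the attached notebook).

```python
import numpy as np
import pandas as pd

# Hypothetical frame with default integer column labels 0..9.
df = pd.DataFrame(np.arange(200).reshape(20, 10))

# np.r_ concatenates the scalars and the slice into one 1-D NumPy array.
positions = np.r_[0, 2, 5:10]
is_array = isinstance(positions, np.ndarray)

# Select with the array directly, and again after converting it to a plain list.
from_array = df.iloc[:, positions]
from_list = df.iloc[:, positions.tolist()]

# True if both selections hold the same data and labels.
same = from_array.equals(from_list)
```

If same comes out True, the list conversion is harmless but not needed for iloc.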

3 Likes

Yeah, df.iloc[:, [0, 2, 5, 6, 7, 8, 9]] was the obvious option. But I understand that pandas does not provide a simple way to do this. I also checked this and got the impression that there is no simpler way. I believe this is a common use case, where one needs to select single columns plus a range of columns, single rows plus a range of rows, or even a combination of both, so I find it surprising that pandas does not provide an easier way to achieve it. Thanks for your response. It was helpful.
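
For the "combination of both" case, one option that seems to work is passing one np.r_ array per axis, since iloc applies the row and column indexers independently. A minimal sketch, again on a made-up 20 by 10 frame (the data is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the value at row position r, column position c is r * 10 + c.
df = pd.DataFrame(np.arange(200).reshape(20, 10))

# Rows 2, 5, 7 and 15 to 20; columns 1, 3 and 6 to 10 (all counting from 1).
row_pos = np.r_[1, 4, 6, 14:20]
col_pos = np.r_[0, 2, 5:10]

# iloc takes the row selection and the column selection separately,
# so the result is every chosen row crossed with every chosen column.
subset = df.iloc[row_pos, col_pos]
```

This differs from plain NumPy fancy indexing, where two index arrays of different lengths like these would not pair up.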

2 Likes

Yeah, it is surprising, considering that pandas relies on NumPy. They could’ve integrated numpy.r_ into its indexing.

Then again, considering how long pandas has existed and how popular it is, surely the pandas team has its own reasons for why the syntax is the way it is right now.

1 Like