Stratified Sampling, Screen 7 using groupby technique

Screen Link:

My Code:

import pandas as pd
import matplotlib.pyplot as plt
wnba = pd.read_csv('wnba.csv')
wnba['PPG'] = wnba['PTS']/wnba['Games Played']
wnba_pos = wnba.groupby('Pos').sample(n=10, random_state=0)

What I expected to happen:
Hello everyone. I decided to use the pandas groupby method to stratify this dataset by player position, and then the groupby.sample method as shown above. It worked when I tried it in my command prompt, but when I tried it here on Dataquest, I got the AttributeError shown below. I’m not sure why, since the same code ran fine locally.

I noticed that the two groupby objects are of different classes: the one in my command prompt is of class pandas.core.groupby.generic.DataFrameGroupBy, whereas the one here on Dataquest is of class pandas.core.groupby.DataFrameGroupBy. I would really appreciate some help with this, perhaps an explanation of why this happened and of the difference, if any, between the two types.

The output/error
AttributeError: Cannot access callable attribute ‘sample’ of ‘DataFrameGroupBy’ objects, try using the ‘apply’ method

You are using a different version of pandas compared to Dataquest. The version on your local machine is much newer. Below are ways to diagnose and resolve the issue:

Using .__version__ to check the version

Dataquest uses pandas version 0.22.0. You can check your pandas version using:

import pandas as pd 
print(pd.__version__)
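
As a rough sketch of how that version string could be used in code: GroupBy.sample was added in pandas 1.1.0, so comparing the installed version against (1, 1) tells you whether the call in the question can work. The helper name supports_groupby_sample is made up for this illustration and is not part of pandas.

```python
import pandas as pd


def supports_groupby_sample(version_string):
    """Hypothetical helper: True if this pandas version is 1.1 or newer."""
    # Keep only the major and minor parts, e.g. "0.22.0" -> (0, 22).
    major, minor = (int(part) for part in version_string.split(".")[:2])
    return (major, minor) >= (1, 1)


# The two versions discussed in this thread, then whatever is installed here.
print(supports_groupby_sample("0.22.0"))
print(supports_groupby_sample("1.1.4"))
print(supports_groupby_sample(pd.__version__))
```

Comparing tuples of integers avoids the trap of comparing version strings directly, where "0.22.0" would sort after "0.3.0".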

Here’s the documentation for pandas version 0.22.0.

Pandas also provides a utility function, pd.show_versions(), which reports the versions of Python, pandas, and its dependencies as well:

Dataquest’s setup:

print(pd.show_versions())
INSTALLED VERSIONS
------------------
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-128-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: None
pip: 9.0.3
setuptools: 43.0.0
Cython: None
numpy: 1.14.2
scipy: 0.18.0
pyarrow: None
xarray: None
IPython: 4.2.0
sphinx: None
patsy: 0.5.1
dateutil: 2.2
pytz: 2020.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 1.5.1
openpyxl: 2.2.6
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: 1.0b8
sqlalchemy: 1.3.20
pymysql: None
psycopg2: 2.5.4 (dt dec pq3 ext)
jinja2: 2.8
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
None

Using dir to check for methods in object

Use dir to check what methods are available for this object.

dir(wnba.groupby("Pos"))

Use dir to check whether a method named sample exists on the object wnba.groupby("Pos"):

o = wnba.groupby("Pos")
methods = {m for m in dir(o) if callable(getattr(o, m))}
print("sample" in methods)
>>> False

There’s no method sample for pandas.core.groupby.DataFrameGroupBy object in version 0.22.0 of pandas.

AttributeError: Cannot access callable attribute ‘sample’ of ‘DataFrameGroupBy’ objects

There is no callable method named sample in the DataFrameGroupBy class in this version, as verified above using dir.
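
One way to get the same kind of stratified sample on the older version is to sample each group with DataFrame.sample, which does exist in 0.22.0, and then concatenate the pieces. This is only a minimal sketch: the small DataFrame is made-up data standing in for wnba.csv, and n=2 is used instead of n=10 only because the toy data is tiny. The rows picked will not necessarily match what groupby(...).sample would pick with the same seed on a newer pandas.

```python
import pandas as pd

# Made-up rows standing in for wnba.csv (assumption: the real file has these columns).
wnba = pd.DataFrame({
    "Pos": ["G", "G", "G", "F", "F", "F", "C", "C", "C"],
    "PTS": [300, 250, 410, 180, 220, 365, 90, 150, 275],
    "Games Played": [30, 25, 34, 20, 22, 30, 15, 25, 28],
})
wnba["PPG"] = wnba["PTS"] / wnba["Games Played"]

# Sample each position separately, then stitch the samples back together.
samples = [group.sample(n=2, random_state=0) for _, group in wnba.groupby("Pos")]
wnba_pos = pd.concat(samples)

# Each position should contribute the same number of rows.
print(wnba_pos["Pos"].value_counts())
```

The error message hints at groupby(...).apply for the same job; the explicit loop is used here because it keeps the Pos column in the result regardless of pandas version.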

Setup virtual environment to replicate Dataquest platform environment

Setting up a virtual environment that mirrors the Dataquest platform environment will prevent incompatibility errors caused by differing package versions.

I recommend using pyenv rather than Anaconda, which is bloated. Install only what you need.

I just checked my pandas version and it is 1.1.4, while Dataquest’s is 0.22.0, like you already pointed out. That explains a lot. Thank you for your answers. I found them really helpful.
