Adding a Column to a Randomly Sampled DataFrame

My Code:

sample_ratings = [25630, 31612, 6835, 30684, 87952, 3832,
                  7, 48533, 10228, 180138, 4649, 12095,
                  99917, 58476, 12250, 1148, 71277, 13332,
                  145910, 20594]
sample['num_reviews'] = sample_ratings
print(sample.head(3))

print(sum(sample['num_reviews'] < 30))

What I expected to happen:
I expected the sample dataframe to include the number of reviews for each of the 20 films.
I also expected the second print to show the number of films with fewer than 30 reviews.

What actually happened:

108    Mechanic: Resurrection
206                  Warcraft
106                 Max Steel
Name: movie, dtype: object

TypeError                                 Traceback (most recent call last)
<ipython-input-26-457521aa49cb> in <module>()
----> 1 print(sum(sample['num_reviews'] < 30))

TypeError: unorderable types: list() < int()

The sample printed without the number of reviews. Also, the count of films with fewer than 30 reviews raised an error instead of being calculated.
FandangoMovieRatingsAnUpdate.ipynb (31.6 KB)


Hi!
I guess the problem is that you pass only a condition to the sum function:
sample['num_reviews'] < 30 is a condition,
so I guess you’d like to filter your df by this condition, like this:
sample[sample['num_reviews'] < 30]

If you prefer to do it in 2 lines, it would be as follows:
condition = sample['num_reviews'] < 30
print(len(sample[condition]))

I tried this, but it also gave me an error. After checking the data type, I see that sample is actually a Series. But when I enter

sample['num_reviews'] = sample.add(sample_ratings)

I now get the error:

TypeError: Can't convert 'int' object to str implicitly

So, if sample is a Series, you’ll need to convert it to a DataFrame first and then add the new column to it.
You can pass the sample Series to the pandas.DataFrame() constructor to create the DataFrame, and then assign the new column.
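
A minimal sketch of that conversion, in case it helps. The three titles and the index labels come from the output posted above; pairing them with the first three review counts is only an assumption made for illustration, and to_frame() is used here as an alternative to calling pandas.DataFrame() directly.

```python
import pandas as pd

# Stand-in for the sampled data: a Series of movie titles named "movie"
# (titles and index labels taken from the output in the question).
sample = pd.Series(
    ["Mechanic: Resurrection", "Warcraft", "Max Steel"],
    index=[108, 206, 106],
    name="movie",
)

# Check what kind of object we actually have before adding columns.
print(type(sample))

# Turn the Series into a one-column DataFrame; the column keeps the name "movie".
sample_df = sample.to_frame()

# Assumed pairing of review counts with titles, for illustration only.
sample_df["num_reviews"] = [25630, 31612, 6835]

print(sample_df)
```

The list is assigned by position, so it needs to have the same length as the DataFrame.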


It worked, thank you!
For anyone else who gets stuck on this, here is the code I used.

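# assumes pandas is already imported as pd and sample is the Series of sampled movie titles
# convert the Series to a DataFrame so that a new column can be added
sample_df = pd.DataFrame(data=sample)

# review counts for the 20 sampled films, in the same order as the sample
sample_ratings = [25630, 31612, 6835, 30684, 87952, 3832,
                  7, 48533, 10228, 180138, 4649, 12095,
                  99917, 58476, 12250, 1148, 71277, 13332,
                  145910, 20594]
sample_df['num_reviews'] = sample_ratings
print(sample_df.head(3))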
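
Once num_reviews is a real column, the count from the original question should work too. Here is a rough sketch using the same review counts; the DataFrame below is rebuilt with only the num_reviews column (the movie titles are left out) so that the snippet runs on its own.

```python
import pandas as pd

# Same review counts as above; the movie column is omitted in this sketch.
sample_ratings = [25630, 31612, 6835, 30684, 87952, 3832,
                  7, 48533, 10228, 180138, 4649, 12095,
                  99917, 58476, 12250, 1148, 71277, 13332,
                  145910, 20594]
sample_df = pd.DataFrame({"num_reviews": sample_ratings})

# Comparing a column to a number gives a boolean Series (True where < 30).
below_30 = sample_df["num_reviews"] < 30

# Summing booleans counts the True values.
count_below_30 = int(below_30.sum())
print(count_below_30)

# The same mask can also be used to look at the matching rows.
print(sample_df[below_30])
```

With these numbers, only one film (the one with 7 reviews) falls below 30.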