Unexpected output from explore_data() function

My Code:

from csv import reader

## Google Play Data Set ##

opened_file = open('googleplaystore.csv')
read_file = reader('opened_file')
android = list(read_file)
android_header = android[0]
android = android[1:]

## App Store Data Set ##

opened_file = open('AppleStore.csv')
read_file = reader('opened_file')
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]

def explore_data(dataset, start, end, rows_and_columns = False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print('\n') # adds an empty line after each row for legibility
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

explore_data(android, 0, 3, True)

explore_data(ios, 0, 3, True)

What I expected to happen:
I expected the function to print information from the two data sets, including the header and the first few rows, i.e.,

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']

['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']

Number of rows: 10841
Number of columns: 13

What actually happened:





Number of rows: 10
Number of columns: 1

The same output was generated for both the googleplaystore.csv and AppleStore.csv files. I have tried this in both the Jupyter notebook built into the DQ interface, and in a separate notebook via Anaconda (after downloading both .csv files), with the same results multiple times.

Hello @harper.ragin, welcome to the community!

The problem is in how the reader function is called. You're passing the string 'opened_file' to the function instead of the file object opened_file. reader iterates over whatever it is given, so it treats each character of that string as a separate line, and your dataset becomes a list of one-character rows. That is why you see 10 rows and 1 column: the string has 11 characters, and the first one (o) is split off as the header. The slice from 0 to 3 then covers the rows for the letters p, e, and n instead of real app data.
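To see this for yourself, here is a small sketch that feeds reader a string and then a file-like object. It uses io.StringIO with two made-up rows as a stand-in for the real CSV file, so the sample data is an assumption and is not taken from googleplaystore.csv:

```python
from csv import reader
from io import StringIO

# Passing a string: reader iterates over it character by character,
# so every character becomes its own one-column row.
rows_from_string = list(reader('opened_file'))
print(len(rows_from_string))   # one row per character of 'opened_file'
print(rows_from_string[:3])

# Passing a file-like object: reader iterates over real lines.
# StringIO stands in for open('googleplaystore.csv'); the rows are invented.
fake_file = StringIO('App,Category\nSome App,TOOLS\n')
rows_from_file = list(reader(fake_file))
print(rows_from_file)
```

The first list contains only single characters, while the second contains properly split rows.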

Run it without the quotation marks for both datasets:

read_file = reader(opened_file)
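If you want to avoid repeating the same five lines for each data set, one option is to wrap them in a small function. This is only a sketch: load_csv is a hypothetical helper name, not something from the course, and it assumes the file's first row is the header, as in your code:

```python
from csv import reader

def load_csv(path):
    """Read a CSV file and return (header, rows). Hypothetical helper."""
    # The with block closes the file automatically once it has been read.
    # newline='' is what the csv module documentation recommends for open().
    with open(path, newline='') as opened_file:
        rows = list(reader(opened_file))  # note: no quotes around opened_file
    return rows[0], rows[1:]
```

You would then call it as `android_header, android = load_csv('googleplaystore.csv')`, and the same way for AppleStore.csv.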

I hope this helps you.

@harper.ragin: You are trying to read the string "opened_file" and not the value saved in the variable opened_file. Simply remove the quotes to resolve the issue.

I wonder why I stared at your code and the answer for so long while the two looked identical to me :rofl:

Same thing here :joy: :joy: :joy: :sweat_smile:

Naturally, this hit me as well many hours later, just as I was falling asleep. Thank you so much!
