How do we decide which columns are relevant to our analysis? I am confused about how this decision is made for the AppleStore and GooglePlay datasets.
For the scope of this project, to answer our main question, we are interested in several factors:
- some indicators of popularity of the apps (all the columns with ratings, reviews, and installs)
- information about price, which should be 0 in our case (hence, any columns with price and currency, and also the
'Type' column, which shows whether an app is free or not)
- genres (all the columns with genres or categories of apps).
Luckily, the column names in both datasets are self-explanatory enough to identify all these factors. Otherwise, we could consult the corresponding data dictionaries of these datasets to explore the meaning of each column: Android apps and iOS apps.
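
To make the selection less of a guess, one option is to scan each header row for keywords tied to the three factors above. The following is only a minimal sketch: the two header lists are shortened, assumed examples (read the real ones from the first row of each CSV file), and the keyword lists and the `find_columns` helper are hypothetical names introduced here for illustration.

```python
# Assumed, shortened header rows for illustration only;
# in practice, take them from the first row of each CSV file.
android_header = ['App', 'Category', 'Rating', 'Reviews',
                  'Installs', 'Type', 'Price', 'Genres']
ios_header = ['id', 'track_name', 'currency', 'price',
              'rating_count_tot', 'user_rating', 'prime_genre']

# Keywords per factor (an assumption of this sketch, not a fixed rule).
FACTOR_KEYWORDS = {
    'popularity': ['rating', 'review', 'install'],
    'price': ['price', 'currency', 'type'],
    'genre': ['genre', 'categor'],
}


def find_columns(header, factor_keywords=FACTOR_KEYWORDS):
    """Map each factor to the (index, name) pairs of matching columns.

    A column matches a factor when its lowercased name contains
    at least one of that factor's keywords.
    """
    matches = {}
    for factor, keywords in factor_keywords.items():
        matches[factor] = [
            (index, name)
            for index, name in enumerate(header)
            if any(keyword in name.lower() for keyword in keywords)
        ]
    return matches


android_columns = find_columns(android_header)
ios_columns = find_columns(ios_header)

for factor, columns in android_columns.items():
    print(factor, columns)
```

Substring matching can over-select (for example, a content-rating column would also match 'rating'), so the result is best treated as a list of candidates to check by hand or against the data dictionaries.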