Covid 19 virus guided project

Guided project: Investigating COVID19 Virus trends

My Code:

covid_df_all_states_daily_sum <- covid_df_all_states_daily %>% group_by(Country_Region) %>% 
summarise( tested = sum(daily_tested), positive = sum(daily_positive), active = sum(active), hospitalized = sum(hospitalizedCurr)) %>% arrange(desc(tested))

covid_df_all_states_daily_sum

What I expected to happen:

# A tibble: 108 x 5
   Country_Region   tested positive  active hospitalized
   <fct>             <int>    <int>   <int>        <int>
 1 United States  17282363  1877179       0            0
 2 Russia         10542266   406368 6924890            0
 3 Italy           4091291   251710 6202214      1699003
 4 India           3692851    60959       0            0
 5 Turkey          2031192   163941 2980960            0
 6 Canada          1654779    90873   56454            0
 7 United Kingdom  1473672   166909       0            0
 8 Australia       1252900     7200  134586         6655
 9 Peru             976790    59497       0            0
10 Poland           928256    23987  538203            0
# ... with 98 more rows

What actually happened:

> covid_df_all_states_daily_sum
# A tibble: 0 x 5
# ... with 5 variables: Country_Region <chr>, tested <dbl>, positive <dbl>,
#   active <dbl>, hospitalized <dbl>

Hey @pankaj.pankajbisht.b,

I assume you’re following chalokwun’s answers for this one, but could you post the code that comes before this step so we can get an idea of what could have happened? My first guess is that a required package wasn’t loaded.

#Loading the file containing our dataset
covid_df <- read_csv('covid19.csv')

dim(covid_df) #Displaying the dimensions of our data

#Displaying the column names
vector_cols <- colnames(covid_df)
print(vector_cols)

#Displaying part of our dataset
head(covid_df)

#Loading the tibble library and displaying the summary of our dataset
library(tibble)
glimpse(covid_df) #The glimpse function gives us a summary of the dataset so that we have a clue about the data we are working with

#Isolating the rows we need from our dataset

library(dplyr)

#Filtering only rows related to All States
covid_df_all_states <- covid_df %>%
filter(Province_State == "ALL STATES") %>% select(-Province_State)
view(covid_df_all_states)
head(covid_df_all_states_daily)
#Isolating the columns we need from our dataset

covid_df_all_states_daily <- covid_df_all_states %>%
select(Date, Country_Region, active, hospitalizedCurr, daily_tested,
daily_positive) #Selecting data belonging to daily measures

covid_df_all_states_daily_sum <- covid_df_all_states_daily %>%
group_by(Country_Region) %>% summarise(tested = sum(daily_tested),
positive = sum(daily_positive),
active = sum(active),
hospitalized = sum(hospitalizedCurr)) %>%
arrange(desc(tested))

covid_df_all_states_daily_sum

Hey @pankaj.pankajbisht.b,

I think I have an idea of what happened. It’s one of two things:

  1. You may not have loaded the library that provides the pipe operators you saw @chalokwun use, which is called magrittr. You can fix this by loading the broader tidyverse package, which attaches the libraries needed here, including dplyr and readr, and makes the magrittr pipe available.

  2. You may have accidentally deleted or overwritten the earlier object (covid_df_all_states_daily), which would leave the summary with no rows.

Assuming you’ve been checking each step along the way, my guess is the second scenario, since the summary code is the same as what you posted originally.

Try the suggestions above and see whether the problem is still there.
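
One way to narrow this down is to check the row count after each step and see where the data disappears. Below is a minimal sketch of that idea; `toy_df` and its values are made up for illustration and only stand in for the real covid19.csv, and the labels in it are assumptions, not the labels in your file. It also shows that `filter()` compares strings exactly, so a label with different capitalisation matches nothing and produces a 0-row tibble without raising an error.

```r
library(dplyr)   # filter(), distinct(), pull(); also re-exports %>%
library(tibble)  # tibble()

# Hypothetical toy data standing in for covid19.csv (values are made up)
toy_df <- tibble(
  Country_Region = c("A", "A", "B"),
  Province_State = c("All States", "All States", "Region 1"),
  daily_tested   = c(10, 20, 5)
)

# Row count before any filtering
n_start <- nrow(toy_df)

# String comparison in filter() is case-sensitive:
# a label that differs only in capitalisation matches no rows
n_upper <- toy_df %>% filter(Province_State == "ALL STATES") %>% nrow()
n_exact <- toy_df %>% filter(Province_State == "All States") %>% nrow()

# List the labels actually present before filtering on one of them
labels <- toy_df %>% distinct(Province_State) %>% pull(Province_State)

print(c(start = n_start, upper = n_upper, exact = n_exact))
print(labels)
```

Running the same kind of check on your own objects (for example `nrow(covid_df_all_states)` right after the filter step) should show which step first returns zero rows.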

The covid_df_all_states and covid_df_all_states_daily variables were empty, so I deleted them and created them again. Thank you so much!
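
For anyone who lands here with the same problem, the fix described above can be sketched roughly as follows. The toy tibble is a hypothetical stand-in for the filtered data, and the empty object is created on purpose to simulate a stale variable left over from an earlier run. Strictly speaking, reassigning the name would overwrite it anyway; `rm()` just makes the clean-up explicit.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the filtered data (values are made up)
covid_df_all_states <- tibble(
  Country_Region = c("A", "B"),
  daily_tested   = c(1, 2)
)

# Simulate a stale, empty object left over from an earlier run
covid_df_all_states_daily <- covid_df_all_states[0, ]

# Remove the stale object, then rebuild it from its source
rm(covid_df_all_states_daily)
covid_df_all_states_daily <- covid_df_all_states %>%
  select(Country_Region, daily_tested)

# Stop early if the rebuilt object is still empty
stopifnot(nrow(covid_df_all_states_daily) > 0)
```

A guard like the `stopifnot()` line after each step makes an empty intermediate result fail loudly instead of passing silently into the summary.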