Feedback on Mission 3: Data Analysis Basics in R
Bold text in the below paragraph may be replaced by
followed by and
This workflow starts from a data collection, then the importation of data, following by the data cleaning and the exploration of its content (through transformation and visualization). The communication is the last step where some reports are produced to provide an interpretation and answers to the question.
- I love the data analysis workflow diagram and how it shows the application of the tidyverse packages in the data analysis process. It certainly gives the learner a sense about the data analysis process and can serve as a guide in carrying out analytic projects.
However, I think the diagram captures only the statistical approach to solving problems where you start with a question and then use data to answer the question. In some tasks, especially those related to machine learning, you may start with data and find out what the data is saying. For example, this video explains the statistical and machine learning approaches to solving data problems (start from the second minute). Also, IBM has introduced a data science methodology which follows the “top-down” approach of first starting with a question in data science projects. However, they also acknowledge the “bottom-up” approach of looking into large volumes of data to identify business goals suggested by the data.
So my suggestion is that the workflow diagram should not be characterized as the only approach to data analysis work.
R provided should be
R provided an impressive toolset to support the data analysis workflow
- Below, the opening phrase should be
Once we have installed packages,...
Once we installed packages, we won’t need to do so with each time we open our R session (something we’ll get more context for in our guided project where we’ll set an R programming environment on our own computer).
- The text below doesn’t read right:
A dataset is typically stored on our computer under a file.
- "name" at the end of the last line may be deleted as it is a bit confusing. "by assigning it to a variable" reads right for me.
readr package contains a function, read_csv(), that’s specifically for importing data in CSV format into R. We can store the imported data as a dataframe by assigning it to a variable name.
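To illustrate the quoted sentence, here is a minimal sketch of importing a CSV file and storing it by assignment. It assumes the readr package is installed; the temporary file, its contents, and the variable name jobs are made up for this example and are not the mission's dataset.

```r
library(readr)

# Hypothetical data written to a temporary CSV file, just to have something to import
csv_path <- tempfile(fileext = ".csv")
writeLines(c("job_id,min_salary", "1,50000", "2,65000"), csv_path)

# read_csv() returns a data frame (a tibble); assigning it to a variable stores it
jobs <- read_csv(csv_path)

print(jobs)
```

Here the data frame is simply "assigned to a variable" (jobs), which is why the shorter wording reads fine.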
- The second occurrence of "our" should be deleted.
As for data cleaning skills, we’ll cover them in step 2 of our journey in our the Dataquest data analysis R path.
- It is not clear which part of the learning section the instruction below refers to. Also, is monster_jobs supposed to be monster_jobs_clean? I think the instruction should read:
Compare the result to the captured table view of the monster_jobs_clean below.
Compare the result to the table view of the monster_jobs in the learning section.
- These 2 paragraphs may be combined to describe how the color parameter of qplot() will be used to differentiate salaries by job type. The following line should be deleted:
To do this we will use colors.
On the previous screen we visualized the salaries offered in data science. In this screen, let’s differentiate these salaries by type of job. To do this we will use colors.
To do so, we’ll use the same function, qplot(), adding the color parameter to specify how R should color our scatterplot.
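A short sketch of what the combined paragraph would be describing, assuming the ggplot2 package is installed. The data frame and its column names (job_id, min_salary, job_type) are hypothetical stand-ins, not the mission's actual data.

```r
library(ggplot2)

# Hypothetical stand-in for the mission's data; column names are assumptions
jobs <- data.frame(
  job_id     = 1:6,
  min_salary = c(50000, 65000, 42000, 80000, 55000, 70000),
  job_type   = c("Full Time", "Part Time", "Full Time",
                 "Contract", "Part Time", "Contract")
)

# Same qplot() call as for a plain scatterplot, plus color mapped to job type
salary_plot <- qplot(x = job_id, y = min_salary, data = jobs, color = job_type)

print(salary_plot)
```

Each dot is one job, and the color argument gives every job type its own color. As far as I know, recent ggplot2 releases mark qplot() as deprecated, so a warning may appear, but the call still works.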
- It seems the bold "the" should be deleted from the text below.
The result is a scatterplot where a dot represents the minimum salary of the a given job identifier with a color for each type of job.
- The data analysis workflow diagram does not show in the downloaded mission takeaway PDF document.
Feedback on Mission 4: Project - Install RStudio
- Repetition of "the", and also the hyperlink of "course thread in the community" refers back to STEP 3 and not to any community thread.
Users of different operating systems generally follow the same steps, but if you find an operating system-specific differences or have operating system-specific installation questions, feel free to reach out on the the course thread in the community.
- Below, point 2 is clearer if it reads:
Install RStudio by dragging the downloaded file into the Applications folder
Then, let’s follow the appropriate installation instructions for our computer’s operating system:
MAC OS X :
- Select the Mac OS X version.
- Install RStudio by dragging into Applications folder.
- "like below" is better than "like this one" in the text below.
Scroll down, and try typing a few expressions like this one. Press the enter key to see the result.
- I feel the opening phrase should read:
We'll see any objects we creat....
We’ll see any objects we created, like var_1, under values in the
- "let's" in the text below may be changed to "we have to" or
If we want to remove selected objects from the workspace, let’s select the grid view from the dropdown menu:
- "the" should be deleted from the following text
Make sure to download this dataset and store it in the your working directory .
- "we written" should be "we have written"
As we worked through the previous missions in this course, we written code in a text editor and console
- I feel the text below may not clearly explain to a beginner how to add comments to code. Adding another line may help, say:
To add a comment to your code, precede the comment with the hash symbol # (# like this).
As we write scripts, it’s good practice to get used to adding comments to them to explain our code (# like this)
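A tiny example could make this concrete for a beginner. The sketch below uses only base R; the variable name var_1 is borrowed from the earlier screen, and the values are arbitrary.

```r
# This whole line is a comment: R ignores everything after the hash symbol
var_1 <- 5 * 2  # a comment can also follow code on the same line

print(var_1)
```

Only the assignment and the print() call are executed; both comments are skipped.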
- "bring" should be deleted from the text below.
To bring help you practice more, here is another dataset prepared for you (the original version can be found here).
- Hyperlink of "course thread in the community" points back to STEP 9
Please, publish your findings in the course thread in the community.