Just wanted to show my work and share my thoughts on this project, which wraps up the entire Data Analyst in R pathway. I’ll probably give my thoughts on the whole R pathway in a later post. Going back to the project, I found it challenging from the standpoint of understanding the theory behind each step (particularly with regard to hyperparameter tuning). It was reassuring, though, when a few software engineers I know who work at a FinTech and at Shopify told me that it’s pretty much guesswork.
Obviously, what was covered in this section was a good intro to Machine Learning, but you’ll definitely need to do more work outside of it to really get a better understanding. My best help came from the YouTube channel of a data scientist working at RStudio, found here. Definitely worth checking out if you need to understand some concepts through a worked example.
Anyways, check out my work and let me know what you think or what I can work on going forward. It’ll be much appreciated.
Link to my GitHub
Hello @michael.hoang17, awesome work you’ve done here. I’m a fan of Jupyter Notebook myself, but I’ll try R Markdown someday.
Your read.csv Code
cars <- read.csv("C:/Users/micha/Downloads/imports-85.data", header = FALSE)
When doing a project, it is always advisable to use a relative path. Create a dedicated directory for your project, and refer to any file or directory inside it by its path relative to that directory.
├───Predicting Car Prices.Rmd
cars <- read.csv("datasets/imports-85.data", header = FALSE)
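To make the relative-path idea concrete, here is a minimal sketch. It assumes the `.Rmd` file and a `datasets` folder sit side by side in the project directory; the helper name `read_cars` is hypothetical and only there for illustration. When an R Markdown file is knitted, the working directory defaults to the folder containing the file, so a path like this should resolve on any machine that has the same project layout (for example, after cloning the repository from GitHub with the `datasets` folder committed).

```r
# Hypothetical helper: read the dataset via a path relative to the project folder.
# Assumes a layout like:
#   <project>/Predicting Car Prices.Rmd
#   <project>/datasets/imports-85.data
read_cars <- function(path = file.path("datasets", "imports-85.data")) {
  # Fail early with a message that shows where R is actually looking.
  if (!file.exists(path)) {
    stop("Cannot find '", path, "' relative to ", getwd())
  }
  read.csv(path, header = FALSE)
}
```

Calling `cars <- read_cars()` from the project directory then replaces the hard-coded `C:/Users/...` path. `file.path()` builds the path with the right separator, so the same line works on Windows, macOS, and Linux.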
This is very detailed work you’ve done. Kudos!
Thanks for sharing your project with us, Michael! And congratulations on completing the entire Data Analyst in R path.
I am looking forward to your future post where you give your thoughts on the entire R pathway. I am also curious to know more about your discussions with those software engineers that you mentioned here.
Thanks for checking it out, @info.victoromondi. I felt like I could’ve done something more with adding the categorical variables into the model (maybe use a generalized linear model), but will chalk that up to a “future me” thing. Having already done some exploration into Jupyter Notebook, I’ll say that having R Markdown in your back pocket is definitely something that you should have since the carryover b/t one and the other is not too bad.
Thanks for the suggestion about using a relative path instead of an absolute path. As for setting that up as suggested, is that something you would do by uploading the dataset to GitHub and then referring back to it in the actual file? I’ve never actually seen this process done step by step, so if you know of a video I can check out, that’d be much appreciated.
I’ll definitely give my thoughts on the entire R Pathway soon.
As for the discussions with those software engineers and other tech folks, they really are just informal chats that kind of happen at my place of work. Many of our clients happen to be tech folks working in Downtown Toronto at various levels, at everything from small-scale startups to multinational organizations (e.g. Google, Amazon, Shopify). Most of them know about me learning how to code (or really, code better), and I sometimes mention some of the struggles I run into, like the best ways to handle missing data, when data dredging goes wrong, how to clean up a length of code better than some roundabout way, etc. Usually, they sympathize with my struggles and/or give me some direction on how to approach a given problem, which (while helpful) sometimes leads to going down some rabbit hole.
In the case I had referred to with hyperparameter tuning, they basically told me that it’s more or less a common thing they run into and that they also don’t really know how it works, just that they try something, it somehow works, and they don’t question it. I’m assuming that’s something most programmers or software engineers come to realize comes with the territory.
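For anyone else who finds the "try something and see" part of hyperparameter tuning fuzzy, here is a minimal sketch of what that trial and error looks like when done systematically. It is not the code from the project: it uses only base R, the built-in `mtcars` data instead of the car prices dataset, and a hand-rolled k-nearest-neighbours regression. The names `knn_predict`, `cv_rmse`, and `k_grid` are made up for illustration. The idea is just to try each candidate `k`, score it with cross-validation, and keep the one with the lowest error.

```r
# Minimal sketch: grid search over k for k-nearest-neighbours regression.
# Base R only; mtcars stands in for the project's dataset (an assumption).
set.seed(1)

features <- scale(mtcars[, c("hp", "wt")])  # standardise so distances are comparable
target <- mtcars$mpg

# Predict each test row as the mean target of its k nearest training rows.
knn_predict <- function(train_x, train_y, test_x, k) {
  apply(test_x, 1, function(row) {
    dists <- sqrt(rowSums(sweep(train_x, 2, row)^2))  # Euclidean distance to every training row
    mean(train_y[order(dists)[seq_len(k)]])
  })
}

# Randomly assign each row to one of 4 cross-validation folds.
folds <- sample(rep(1:4, length.out = nrow(features)))

# Average RMSE across the 4 held-out folds for a given k.
cv_rmse <- function(k) {
  errs <- sapply(1:4, function(f) {
    test_idx <- folds == f
    preds <- knn_predict(features[!test_idx, , drop = FALSE], target[!test_idx],
                         features[test_idx, , drop = FALSE], k)
    sqrt(mean((preds - target[test_idx])^2))
  })
  mean(errs)
}

# The "guesswork", made explicit: try every k in the grid and compare scores.
k_grid <- 1:10
results <- data.frame(k = k_grid, rmse = sapply(k_grid, cv_rmse))
best_k <- results$k[which.min(results$rmse)]
print(results)
```

The value that ends up in `best_k` depends on the data and the random fold split, which is part of why tuning can feel like guesswork: the search is mechanical, but there is rarely a theoretical reason to expect one particular value to win.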