Hi,
Just completed the guided project. Any criticism on the analysis would be appreciated. I just want to know whether I’m heading down the right path.
Hello @jesmaxavier, you’ve done an amazing job on your project.
I really like how you’ve documented your work by explaining your findings.
Here are some things you can refactor:
Your Function Code
# Function name: print_full(a_list)
# Input: A python list
# Output: The full list
# Description: Jupyter notebooks by default do not display all the data for large datasets. It shows a few lines and summarizes
# the rest using ellipsis. This function helps to see the full python list
def print_full(a_list):
    pd.set_option('display.max_rows', len(a_list))
    print(a_list)
    pd.reset_option('display.max_rows')
It would be great if you could turn the comments above the function into the function’s docstring. You can check the Docstring Convention.
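As a rough sketch of that suggestion, the comment block could move inside the function as a docstring. The wording is taken from the original comments and the body is unchanged; the `import pandas as pd` line is an assumption about how the notebook imports pandas.

```python
import pandas as pd  # assumed import; the notebook is expected to have it already


def print_full(a_list):
    """Print the full contents of a_list without truncation.

    Jupyter notebooks by default do not display all the data for
    large datasets: they show a few lines and summarize the rest
    with an ellipsis. This function temporarily raises the pandas
    'display.max_rows' option so every item is printed.

    Input: the object to print (described as a Python list in the
    original comments).
    Output: None; the full contents are printed.
    """
    pd.set_option('display.max_rows', len(a_list))
    print(a_list)
    pd.reset_option('display.max_rows')
```

With the text in a docstring, `help(print_full)` and `print_full?` in Jupyter show the description, which plain comments do not.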
The function above prints the entire series. It would be better to print only a portion of the items, maybe the first 10 rows. You can use Series.head.
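A minimal sketch of that idea, using a made-up series (the `ages` name and its values are hypothetical, not taken from the project):

```python
import pandas as pd

# Hypothetical data standing in for a long column from the project.
ages = pd.Series(range(20, 120), name="age")

# head(10) returns only the first 10 items instead of the whole series.
first_ten = ages.head(10)
print(first_ten)
```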
Warnings from your code
C:\Users\maxen.x\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py:6287: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._update_inplace(new_data)
Remember to filter the warnings generated by the code. The main warning encountered in your code is SettingWithCopyWarning; DataQuest Direct has an article about it.
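One common way to avoid this warning at its source, rather than only hiding it, is to take an explicit copy of the filtered DataFrame before modifying it. The sketch below uses invented data and column names (`survey`, `institute`, `years_of_service` are all hypothetical), so treat it as an illustration rather than the fix for the exact cell that raised the warning.

```python
import pandas as pd

# Hypothetical survey data; names and values are illustrative only.
survey = pd.DataFrame({
    "institute": ["DETE", "TAFE", "DETE", "TAFE"],
    "years_of_service": [3.0, None, None, 7.0],
})

# .copy() makes the subset an independent DataFrame, so pandas no longer
# has to guess whether a later assignment targets a view or a copy.
dete = survey[survey["institute"] == "DETE"].copy()

# Assign the result back instead of modifying the column in place.
dete["years_of_service"] = dete["years_of_service"].fillna(0)
```

The original `survey` frame is left untouched; only the copied `dete` subset is changed.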
You are using Jupyter Notebook, which can render tables. When showing your DataFrames, you can use IPython’s display function instead of print() to pretty-print them as tables.
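A small sketch of the difference, with a made-up DataFrame (the `staff` name and its contents are hypothetical). Inside a notebook `display` is available without importing it; the explicit import and the fallback to `print` are only there so the snippet also runs as a plain script.

```python
import pandas as pd

try:
    from IPython.display import display
except ImportError:
    # Assumption: outside an IPython environment, fall back to plain printing.
    display = print

# Hypothetical data for illustration.
staff = pd.DataFrame({
    "institute": ["DETE", "TAFE", "DETE"],
    "dissatisfied": [True, False, True],
})

# In Jupyter this renders an HTML table; print(staff) would show plain text.
display(staff.head())
```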
@info.victoromondi thank you very much for your inputs and for the time you put in to review.
I was not aware that there were standards for commenting. I shall have to look into it.
With regard to printing the entire series, I thought it would help readers verify the analysis if they had doubts. It saves them the time of making assumptions or running the code.
I shall look into the article you mentioned. I had gone through the article in the project pages, but the warning still crept in. All the more reason for me to improve.
Your project looks awesome: very comprehensive structure, interesting style of writing, exhaustive markdown explanations, nice graphs and clean code. It was a great idea to compare the DETE and TAFE institutes’ employees on the same grouped bar plot.
Some comments from my side:
It’s better to re-run all the cells once the project is finished, so that their execution numbers are in order and start from 1.
I didn’t fully understand the necessity to introduce the function print_full(). It seems that a normal print() would be enough.
The code cells [258] and [259] can be removed.
The code cells [250], [261], [276], [280] can be written in one line.
Some code cells ([287], [248], [251], [254], [266]) contain commented-out drafts of code; it’s probably better to delete those as well.
Thanks @Elena_Kosourova. I’ve made a couple of changes to the code based on victoromondi’s comments; I just hadn’t uploaded the corrected copy. I shall apply your suggestions as well. I sincerely appreciate your comments. I needed that motivation.