GP_ Analyzing Wikipedia Pages (Using MapReduce)

Greetings!

Here goes the 11th guided project of my Dataquest journey.
This project has indeed helped me code faster and more efficiently. MapReduce is a tool that can save a lot of time when working through a huge dataset.

Do go through and comment :smile:

Cheers!
Analyzing Wikipedia Pages.ipynb (67.3 KB)

Hi @shubhkirti.prasad,

Congratulations on completing the 11th project, but you made me feel dizzy again! :sweat_smile::rofl: Probably, my brain is just too fragile :stuck_out_tongue_winking_eye: Actually, I haven’t done the same project yet, but my question is: those 2-3 super-long outputs in your project, are they really so necessary? Could we inspect only several lines of them or maybe general statistics? I suspect that it’s just another technical issue, could you please take a look and let me know? Thank you!
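
To show what I mean by inspecting only several lines, here is a minimal sketch. It assumes the long output is a list of lines; the names `preview` and `lines` are hypothetical and not taken from your notebook:

```python
def preview(lines, n=5):
    """Return the first n items plus a one-line summary of what was left out."""
    head = list(lines[:n])
    summary = f"... {len(lines) - len(head)} more lines not shown"
    return head, summary


# Hypothetical stand-in for a very long cell output.
lines = [f"line {i}" for i in range(1000)]

# Show only the first few lines and a short summary.
head, summary = preview(lines)
for line in head:
    print(line)
print(summary)
```

Something along these lines keeps the notebook readable while still showing what the data looks like.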

Hey @Elena_Kosourova

I have no clue why these glitches have popped up in my last two projects.
Sorry for the headaches :joy:.

I have edited the notebook and made sure that all those glitches are gone!

Thanks :grinning_face_with_smiling_eyes:

Hi @shubhkirti.prasad,

Now everything is OK and I’m back to consciousness, so I’m ready to give you my feedback! :grinning: :innocent: Your project looks great, as always! Clean and explicit code, an easy-to-follow step-by-step approach, and a nice way of emphasizing things, especially those outputs in bold, as I’ve already told you before. I still haven’t done this topic, but your work looks impressive anyway! :heart_eyes:

What I would suggest:

  • Be more consistent with the code comments. Your code is perfectly commented, by the way, so what I suggest here is mostly about style. For example, a code comment should go directly before the piece of code it relates to, not several lines in advance. Also, avoid overly wordy comments (like # First let’s create a function to get the whole file at once :, ## Let’s check how our function performed as compared to query in series:) and make them as concise as possible. Finally, use only one hash symbol for each comment line.
  • I noticed that you defined the function reducer several times. I know you were making iterative improvements, but at least 2 of those definitions are exactly the same.
  • It’s better to use quotation style only for real quotations.
  • For consecutive code cells without any output or markdown between them, it’s a good idea to combine them into one ([9] and [10], [17] and [18]).
  • You might consider making the conclusion a bit more informative and wordy (but not too much :yum:).
  • It’s better to use bullet or numbered lists only when you have 2+ items to list.
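
As a rough illustration of the first two points, here is a sketch in Python. The `map_reduce` helper, the `count_lines` mapper, the `add` reducer, and the sample data are all hypothetical, written only to show the comment style and a single reusable reducer; they are not taken from your notebook:

```python
from functools import reduce


def map_reduce(data, mapper, reducer):
    """Apply mapper to every item, then fold the results with reducer."""
    mapped = map(mapper, data)
    return reduce(reducer, mapped)


def count_lines(text):
    """Mapper: number of lines in one document."""
    return len(text.splitlines())


def add(total, value):
    """Reducer: defined once and reused for every query."""
    return total + value


# Hypothetical stand-in for the contents of the Wikipedia files.
documents = ["<html>\n<body>\n</body>\n</html>", "<html>\n</html>"]

# Count lines across all documents.
total_lines = map_reduce(documents, count_lines, add)
print(total_lines)
```

Each comment is short, uses a single hash, and sits directly above the line it describes, and the reducer is defined only once.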

Hope my suggestions were useful. Waiting for new cool projects from you, this time without headaches! :sweat_smile: :joy:

Hey @Elena_Kosourova,

Firstly, thanks a lot for your review! Your words are always helpful!

  • I am trying to get better at commenting my code more efficiently, and your views helped a lot!

I will try my best to include all your points in my next project!

Thanks again!

P.S. Hopefully without headaches this time :roll_eyes: :joy:

That’s great, @shubhkirti.prasad, looking forward to seeing your next projects! :grinning:
