Data Engineering: Parallel Processing: How many cores to use, and what I did!

So, I have been playing around with my local setup, mainly the collective power of the processor cores.
For some context, access this link.

Coming across parallel processing has been the best feeling so far!
The material has been thoughtfully written by DQ, but it lacks some very interesting details, which I hope to address in a separate topic.

To help my mission, I scraped 1GB+ worth of Wikiversity data for local processing, as the dataset on DQ’s end was only 55MB.

In addition, I made conscious decisions on a few things:

  • How many cores to use (with reference to my local setup of course).
  • Reading files in bytes mode, which spares me the trouble of decoding them with a specific encoding (chardet is there of course, but I didn’t see the need for it here).
  • Using generators instead of lists when returning the results of the mapping function.
  • Use of tuples (as needed) instead of a list.
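
For readers who want something concrete, the four choices above could look roughly like the sketch below. It is not the code from the project: the function names (`pick_worker_count`, `count_matches`, `iter_results`), the "leave one core free" rule, and the line-counting task are all assumptions made for illustration.

```python
import os
from multiprocessing import Pool


def pick_worker_count(reserve=1):
    """Leave `reserve` logical cores free for the OS; never go below 1.

    The reserve of 1 is an illustrative assumption, not a recommendation.
    """
    logical = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, logical - reserve)


def count_matches(args):
    """Map step: count lines containing `needle` in one file, read as bytes."""
    path, needle = args
    hits = 0
    with open(path, "rb") as handle:  # bytes mode: nothing is decoded
        for line in handle:
            if needle in line.lower():  # bytes.lower() only folds ASCII letters
                hits += 1
    return (os.path.basename(path), hits)  # small fixed record, so a tuple


def iter_results(paths, needle, workers=None):
    """Yield (name, count) tuples as workers finish, instead of building a list."""
    workers = workers or pick_worker_count()
    jobs = ((path, needle) for path in paths)  # generator of argument tuples
    with Pool(processes=workers) as pool:
        yield from pool.imap_unordered(count_matches, jobs)
```

A caller would consume it lazily, for example `sum(hits for _, hits in iter_results(paths, b"data"))`. On platforms that start workers with the spawn method, that call should sit under an `if __name__ == "__main__":` guard.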

I invite you to take a peek at my work.
When I started, it seemed like a mountainous task. Now that it’s complete, it feels like I’ve only just got my feet wet. I can see plenty more that could be done, but I’m happy with my first step.

Now, to my burning question:
Does it make sense to use more processes (I’m not talking about threads!) than there are logical cores available? I’ve found a couple of interesting discussions online, but they were about threads, and I would like to hear DQ’s thoughts.
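
One way to explore this question is to time the same workload with different process counts on the machine in question. The sketch below is only a measurement harness under stated assumptions: `burn` is a made-up CPU-bound placeholder, and the candidate counts (1, half, equal to, and double the logical core count) are arbitrary choices. It does not claim any particular outcome, and an I/O-heavy workload may behave differently from this one.

```python
import os
import time
from multiprocessing import Pool


def burn(n):
    """Hypothetical CPU-bound task: sum of squares below n."""
    return sum(i * i for i in range(n))


def candidate_worker_counts(logical=None):
    """Process counts to try: 1, half, equal to, and double the logical cores."""
    logical = logical or os.cpu_count() or 1
    return sorted({1, max(1, logical // 2), logical, logical * 2})


def time_pool(workers, tasks):
    """Wall-clock seconds to map `burn` over `tasks` using `workers` processes."""
    start = time.perf_counter()
    with Pool(processes=workers) as pool:
        pool.map(burn, tasks)
    return time.perf_counter() - start


def compare(tasks, counts=None):
    """Return a list of (workers, seconds) tuples, one per candidate count."""
    counts = counts or candidate_worker_counts()
    return [(workers, time_pool(workers, tasks)) for workers in counts]
```

Calling `compare([200_000] * 32)` from under an `if __name__ == "__main__":` guard gives one timing per process count. Repeating the run a few times helps, since a single measurement can be noisy.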

To all the learners and experts-in-the-making, I would love to hear your thoughts and any improvements that will benefit my understanding.

I appreciate your time!


This is very interesting, @veena.sanjeeve.line. Congratulations on being the first community member to share a Guided Project from the Data Engineering path! :tada:

Would love to see you write a topic, or better - an article on it! :grinning_face_with_smiling_eyes:


Wow! @nityesh, that’s some recognition!
I wasn’t expecting that, but I’m happy I was seen :grin:

Regarding that, I’m on it :slight_smile:
