Managing Python packages: how to

Hello,

In this lesson, it is explained that “the best way to install packages is to use the command line and a program called pip”. The exercise on the screen lets me practice with the requests library.

I am immediately trying to apply these new skills on my own laptop as well (in addition to the Dataquest playground on the right half of the screen), and, after some struggling to get pip to work in the first place, I now have the following:

What we can see on this screenshot:

  • pip does confirm to me that the requests package is installed
  • pip, on the other hand, claims not to find the packages matplotlib, pandas or seaborn
    However, I am quite sure that those are installed on my laptop as well; otherwise I could never have completed so many ‘guided projects’. The screenshot also shows that, according to Anaconda, matplotlib is installed.

Question: why would pip show the expected result for requests, but not for other packages?

More generally, I am not sure how to manage packages. Through some web searching, and from this post, I understood that there are apparently two main ways, using pip and using Anaconda, and that you had best not mix the two.

Okay, point taken. Do note, though, that while this lesson recommends pip, an earlier Dataquest lesson said “we strongly recommend installing the Anaconda distribution”. That is how I ended up in my current situation :wink:

So, in addition to my specific question above, any further advice on how best to manage my “Python work environment” (including packages) is very welcome!

Hi @jasperquak, I’ll attempt to answer your doubts.

If you enter pip list in WSL, is there any output? There should be a table of the installed packages and their corresponding versions.
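`pip list` only reports the packages that belong to the Python interpreter this particular pip is tied to. As a rough illustration, assuming Python 3.8 or newer, the standard-library `importlib.metadata` module can build a similar name/version table from inside Python; `installed_packages` is a hypothetical helper name, and the output is not formatted exactly like `pip list`.

```python
# Rough Python-side counterpart of `pip list`: the distributions visible to
# *this* interpreter only. Assumes Python 3.8+ (importlib.metadata).
from importlib.metadata import distributions


def installed_packages():
    """Return a dict mapping distribution name -> version for the running interpreter."""
    packages = {}
    for dist in distributions():
        meta = dist.metadata
        if "Name" in meta:  # skip entries with broken or missing metadata
            packages[meta["Name"]] = dist.version
    return packages


if __name__ == "__main__":
    # Print an alphabetical two-column table, similar in spirit to `pip list`
    for name, ver in sorted(installed_packages().items(), key=lambda kv: kv[0].lower()):
        print(f"{name:<30} {ver}")
```

Run with two different interpreters, this would be expected to print two different tables.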

As for whether to use pip or conda as your package manager, I would say it again depends on your use case.

As per this post and this other post, I’d say it depends on which libraries you rely on and which environment/language(s) you intend to use. For smaller projects that may not require a full-blown Jupyter notebook to document your work and later recall the steps you took, a simple Python script in your code editor of choice, with the packages installed using pip, might do the trick and reduce overhead. Otherwise, if you foresee needing a notebook or using languages other than Python, use conda as your package manager.

Of course, both pip and conda let you create virtual environments (with pip, via venv or virtualenv) to test your code or to install specific versions of packages for a particular project where there might be version conflicts, so I think it’s up to your personal preference. I’ve attached a summary of the key differences in the diagram below.

[Diagram: summary of the key differences between pip and conda]
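On the pip side, virtual environments come from the standard-library `venv` module, which is what `python -m venv <dir>` runs. A minimal sketch of creating one from Python, assuming Python 3; `create_env` and the directory you pass it are hypothetical:

```python
# Minimal sketch: create a virtual environment with the standard-library venv module.
# create_env and the directory passed to it are hypothetical examples.
import sys
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment in env_dir and return the path of its Python."""
    env_dir = Path(env_dir)
    # Symlink the interpreter on macOS/Linux (what `python -m venv` does there), copy on Windows
    venv.create(env_dir, with_pip=with_pip, symlinks=(sys.platform != "win32"))
    # Windows puts the interpreter in Scripts\, macOS/Linux (including WSL) in bin/
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"
```

After that, installing with the environment’s own interpreter (for example `<env>/bin/python -m pip install requests`, or `<env>\Scripts\python.exe -m pip install requests` on Windows) keeps those packages inside the environment. conda has its own equivalent in `conda create -n <name>`.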

I’ve also attached a video on virtual envs below which you may find helpful.


Hi @masterryan.prof , thank you for your reply and the links!

My current conclusion, after reading the posts that you referred to and some other websites, is that packages-installed-via-pip and packages-from-Anaconda are just two different sets of packages that can co-exist, whereas I had expected a single repository of packages. It also seems that the ‘list’ command in each of the two command prompts does not necessarily see the packages installed by the other.
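The two sets exist because pip and conda each manage the `site-packages` folder of one specific Python interpreter. If, as the question about WSL suggests, pip was run inside WSL, it belongs to WSL’s own Linux Python, which is separate from a Windows Anaconda install. The sketch below (standard library only, Python 3.8+) prints which interpreter is running, where it looks for packages, and which of the packages from the question it can see. The helper names are hypothetical, and the `conda-meta` check is a heuristic, not an official conda API.

```python
# Sketch: which interpreter is running, and where does it look for packages?
# pip and conda each manage the site-packages of a particular interpreter,
# so two interpreters mean two separate package sets.
import sys
import sysconfig
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def describe_interpreter():
    """Return basic facts about the running Python interpreter."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "site_packages": sysconfig.get_paths()["purelib"],
        # Heuristic: conda environments contain a conda-meta folder
        "looks_like_conda": (Path(sys.prefix) / "conda-meta").is_dir(),
    }


def installed_version(package):
    """Return the version of `package` visible to this interpreter, or None."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
    for pkg in ["requests", "matplotlib", "pandas", "seaborn"]:
        print(f"{pkg}: {installed_version(pkg) or 'not visible to this interpreter'}")
```

Run once from the WSL terminal and once from the Anaconda Prompt, it would be expected to report two different executables and `site-packages` paths. Along the same lines, `python -m pip list` (rather than a bare `pip list`) makes explicit which interpreter’s packages pip is listing.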

Some things that I did and saw on my laptop seem to confirm this as well. You typed: “if you enter pip list in WSL is there any output? There should be a table of the installed packages and their corresponding version.”
Yes, there was. It did not include matplotlib, pandas or seaborn, though. I was curious what would happen if I asked pip to install one of them, since I was (and still am) convinced I had these packages. I typed pip install seaborn, which gave this result:

So it installed seaborn and, along with it, matplotlib, pandas and many others, even though, to the best of my understanding, I already had them installed.
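`pip install` also installs whatever a package declares as its requirements, which is why seaborn brought matplotlib and pandas into this (separate) environment. A small sketch, assuming Python 3.8+, that reads those declared requirements with `importlib.metadata.requires`; `declared_dependencies` is a hypothetical helper:

```python
# Sketch: list the requirements a package declares for the running interpreter.
# Assumes Python 3.8+ (importlib.metadata); results depend on what is installed.
from importlib.metadata import PackageNotFoundError, requires


def declared_dependencies(package):
    """Return the requirement strings declared by `package`, or None if it is not installed."""
    try:
        # requires() returns None when a package declares no requirements
        return requires(package) or []
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    deps = declared_dependencies("seaborn")
    if deps is None:
        print("seaborn is not installed for this interpreter")
    else:
        for requirement in deps:
            print(requirement)
```

Some entries carry environment markers (the text after a `;`), which limit them to certain platforms or optional extras.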

I then ran pip list again, and now they were included in the list.
Then I also opened an Anaconda prompt and typed conda list there.
There, too, I got a list of packages.
Seeing the two lists side by side, I noticed this:

So there is some overlap between both lists of packages, but also a lot of differences. Besides, I see different version numbers in the two lists, e.g. for pandas and matplotlib.
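Comparing two long listings by eye gets tedious. Below is a sketch of a hypothetical helper that compares two `{name: version}` dictionaries, for example after copying the `pip list` and `conda list` output into Python (the parsing step is not shown). Names are normalized as described in PEP 503, since the same package can appear with different capitalization or `-`/`_` spelling; packages that are named differently in the two ecosystems will not be matched.

```python
# Sketch: compare two {package name: version} listings, e.g. from pip and conda.
# compare_listings and normalize are hypothetical helpers; parsing is not shown.
import re


def normalize(name):
    """Normalize a package name as in PEP 503: lowercase, runs of -, _ and . become -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def compare_listings(first, second):
    """Report packages unique to each listing and packages whose versions differ."""
    a = {normalize(name): ver for name, ver in first.items()}
    b = {normalize(name): ver for name, ver in second.items()}
    common = a.keys() & b.keys()
    return {
        "only_in_first": sorted(a.keys() - b.keys()),
        "only_in_second": sorted(b.keys() - a.keys()),
        "different_version": sorted(n for n in common if a[n] != b[n]),
    }
```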

Hence my conclusion that there is not one repository of packages, but different sets of packages on my laptop.

For someone who knows how all of this works this is maybe completely trivial, but for me all of this stuff is pretty new…

(And in all honesty: while Dataquest does a great job of teaching me to code Python in a Jupyter Notebook, all this “managing the environment” stuff (packages, Anaconda, distributions, command prompts, virtual environments etc etc etc) is mostly trial and error and a lot of web searching for me. I would love to see more structured courses on those topics, targeted at people without a computer science background. Anyway, I guess I learnt something again!)


That’s absolutely correct. Even a base/default install of each (i.e. installing pip from scratch and conda from scratch on the same machine) would give different tables, with many differences in packages and versions.

I guess so haha… maybe you could create a ticket to raise it as a suggestion to the team.

About your suggestion: systems and packages differ (DQ would have to keep updating their versions of conda and pip to keep up with the latest releases, and try to simulate learners’ desktop environments), so I can foresee incompatibilities between the packages installed in the lab exercises on the DQ platform and those on a learner’s desktop, which may cause some frustration. That said, I agree it might be good to show learners how to do this, be it through external documentation or videos about the different Python package managers.

On the one hand, I agree this would add convenience and value (i.e. links or resources provided in a lesson so that you “get what you paid for”). On the other hand, I would not say that any course (including those by DQ) is a definitive “full course/guide” or “be-all and end-all”. Googling and searching for answers is also a required skill for data scientists/analysts, or anyone working with computers, since we all run into issues and doubts, and the platforms, installed software and OSes of end users vary, so it may be infeasible for the learning platform we use to “predict” solutions to every possible problem.

Hope this helps! :slight_smile:


A belated thank you for this reply, @masterryan.prof! What you wrote makes sense to me, including your explanation of why it would be challenging to teach such topics via courses.

I’ll continue my journey by doing the courses, searching the web when I run into issues, trial and error… and occasionally posting a question on this forum if I cannot find the answer :grinning:.

Thanks again.
