Malaria Detection using Deep Learning

Hello DQ Community,

This is my first submission to DQ and I decided to go with a Deep Learning image classification project. I tried my best to really understand why I chose the architecture of the NN and not simply regurgitate other people’s projects. I love ML and data science in general so any feedback would be appreciated.

I hope I followed the upload guidelines correctly.

Many thanks,
Oz

Malaria classification using deep learning .ipynb (911.0 KB)


7 Likes

I am trying to install opencv with Anaconda, but I am getting a lot of error messages!

Anyway, great notebook!

Why are malaria cells class 0? I always thought the class we are interested in predicting is class 1.

You reached a good accuracy score. Is it enough to put you in the top 10 on Kaggle? (I am assuming this is a competition you entered, but maybe I am wrong.)

As a reader, I would have been more interested in seeing plots of the test predictions that are wrong. I would also have liked some comments about the features that best predict the malaria class. Is there a way with deep learning to plot feature importance? Though the features may only be pixels, it could be interesting to better understand what is happening in the “black box”.
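One way to find the wrong test predictions is to compare the predicted and true labels and keep the positions where they disagree; those positions can then be used to pull the matching images for plotting. Here is a minimal sketch. The names `y_true` and `y_pred` and the label values are made up for illustration and are not taken from the notebook:

```python
def misclassified_indices(y_true, y_pred):
    """Return the positions where the predicted label differs from the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]


# Hypothetical labels for illustration only (0 = malaria, 1 = healthy).
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1]

wrong = misclassified_indices(y_true, y_pred)
print(wrong)  # positions of the test images worth plotting
```

With the real test set, the returned positions would index into the array of test images.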

Also, maybe add the confusion matrix. It helps to distinguish between false positives and false negatives as discussed here.
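To make the idea concrete, here is a small sketch of the four confusion-matrix counts and the precision and recall derived from them. It assumes class 0 is the malaria class, as in the notebook; the label lists are invented for illustration. In practice `sklearn.metrics.confusion_matrix` gives the same counts.

```python
def confusion_counts(y_true, y_pred, positive=0):
    """Count TP, FP, FN, TN, treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # predicted malaria, really malaria
        elif p == positive:
            fp += 1  # predicted malaria, really healthy
        elif t == positive:
            fn += 1  # predicted healthy, really malaria
        else:
            tn += 1  # predicted healthy, really healthy
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels for illustration only (0 = malaria, 1 = healthy).
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive=0)
precision, recall = precision_recall(tp, fp, fn)
print(tp, fp, fn, tn, precision, recall)
```

For a medical screening task, the false negatives (infected cells predicted as healthy) are usually the count to watch most closely.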

Best
W.

3 Likes

Thanks Wilfred,

The comments were very helpful. I’ve made the changes and updated my code. I got a slightly lower accuracy this time around, but I see why you said I should add a confusion matrix to properly understand the precision as well. Overall I think it’s a solid beginner project, and any and all help is welcome.

Thank you again,
Oz

Malaria classification using deep learning (1).ipynb (1021.7 KB)


3 Likes

Very nice! Happy my comments helped. I agree, this is a solid project, and precision and recall are very good.
I appreciate the changes you made. Now, how do you read the feature importance? It looks strange to me: if I am not wrong, we can conclude that the most relevant pixels are those on the left border? How would you explain it?
I would have expected pixels on the cell border to be important too, but curiously the pixels inside the cell border are super uniform.
Also, I don’t understand the difference between image 1 and image 2.
Wait… I think I got it! You plot feature importance image by image? Maybe someone will disagree, but in my mind feature importances (the coef_ attribute of LogisticRegression, for example) apply to the whole dataset. Is there no such attribute that you could use with deep learning image classification? I mean a way to visualize where the “silly pixels” are, on average, over all your observations.
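One possible way to get such an averaged view is occlusion sensitivity: blank out one patch of the image at a time, record how much the model's score drops, and average the resulting maps over many images. This is a technique brought in for illustration, not what the notebook does. In the sketch below, `toy_predict` is a made-up stand-in for a trained model, and the images are tiny nested lists rather than real cell pictures:

```python
def occlusion_map(image, predict, patch=2, fill=0.0):
    """Score drop when each patch x patch block is replaced by `fill`."""
    height, width = len(image), len(image[0])
    base = predict(image)
    heat = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            occluded = [row[:] for row in image]  # copy so the input is untouched
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            drop = base - predict(occluded)
            for r in rows:
                for c in cols:
                    heat[r][c] = drop
    return heat


def mean_map(maps):
    """Average several equally sized maps pixel by pixel."""
    n = len(maps)
    height, width = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) / n for c in range(width)]
            for r in range(height)]


def toy_predict(image):
    """Hypothetical model: its score depends only on the top-left 2x2 block."""
    return sum(image[r][c] for r in range(2) for c in range(2)) / 4.0


# Two made-up 4x4 "images" with uniform brightness.
images = [
    [[1.0] * 4 for _ in range(4)],
    [[0.5] * 4 for _ in range(4)],
]
average = mean_map([occlusion_map(img, toy_predict) for img in images])
for row in average:
    print(row)
```

Because the toy model only looks at the top-left corner, only that corner shows a non-zero average drop. With a real network, `predict` would wrap the model's predicted probability for one class.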

I am not familiar with image classification, so maybe I am missing something.

Best
W.

2 Likes

Hi Wilfred,

Thanks again for the helpful feedback. I’ve gone back and tried to do what you said and look for feature importance. I did a little reading and I think I came up with a way to do so. Unfortunately, deep learning doesn’t provide the same feature importance attributes as classical ML classification algorithms, so there was no coef_ or ‘best_feature_’ attribute that I could use. Instead, I came across the ‘Canny’ edge detector (available in image libraries such as OpenCV and scikit-image, not in sklearn itself), which seems to at least distinguish significant features like cell boundaries and important intracellular parts.
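To show roughly what an edge detector picks up, here is a simplified sketch of the gradient step that Canny builds on: a Sobel gradient magnitude followed by a single threshold. It is not the full Canny algorithm (there is no smoothing, non-maximum suppression, or hysteresis), and the toy image is made up. For real images, a library call such as `cv2.Canny` or `skimage.feature.canny` would be the practical choice.

```python
def sobel_edges(image, threshold):
    """Return a 0/1 map: 1 where the Sobel gradient magnitude exceeds threshold.

    Border pixels are left at 0 because the 3x3 kernel does not fit there.
    """
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (image[r - 1][c + 1] + 2 * image[r][c + 1] + image[r + 1][c + 1]
                  - image[r - 1][c - 1] - 2 * image[r][c - 1] - image[r + 1][c - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (image[r + 1][c - 1] + 2 * image[r + 1][c] + image[r + 1][c + 1]
                  - image[r - 1][c - 1] - 2 * image[r - 1][c] - image[r - 1][c + 1])
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges[r][c] = 1
    return edges


# Toy 5x6 image: dark left half, bright right half (a vertical boundary).
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]
edges = sobel_edges(image, threshold=1.0)
for row in edges:
    print(row)
```

Note that an edge map like this describes the image itself, not what the network relies on, so it complements feature importance rather than replacing it.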

Anyway, here is a reupload. Slightly lower accuracy but still pretty good I think. I feel it’s more complete now. Anyway, do let me know what you think.

Many thanks,

Oz

Malaria Classifier Jupyter Notebook File.ipynb (1.1 MB)


4 Likes

Hi @animus.agbor,

Thank you for your efforts! I feel a little confused because my comments give you work every time :slight_smile:

So, indeed, it looks like there is no clear way to generalize the importance of features.

Below is an example from this course. The observations are also pictures, so naively I thought that you could highlight the same kind of pattern. You can see that the feature importances have been reshaped, and now we can read from the chart that some of the pixels are better predictors than others.
But the detection of malaria is a much more complex problem than this example: the infection anomaly can be anywhere in the cell!
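The reshaping step itself is simple. Here is a minimal sketch with made-up coefficients (not the course's data): a linear model gives one coefficient per pixel as a flat list, for example `coef_[0]` from a fitted binary LogisticRegression, and we fold that list back into the image grid to see which pixels carry the most weight.

```python
def reshape_to_image(coefficients, height, width):
    """Fold a flat list of per-pixel coefficients into height rows of width values."""
    if len(coefficients) != height * width:
        raise ValueError("number of coefficients must equal height * width")
    return [coefficients[r * width:(r + 1) * width] for r in range(height)]


def strongest_pixel(grid):
    """Return (row, col) of the coefficient with the largest absolute value."""
    return max(
        ((r, c) for r in range(len(grid)) for c in range(len(grid[0]))),
        key=lambda rc: abs(grid[rc[0]][rc[1]]),
    )


# Hypothetical coefficients for a 3x3 image, for illustration only.
coefs = [0.0, 0.1, 0.0,
         0.2, 0.9, 0.2,
         0.0, 0.1, 0.0]

grid = reshape_to_image(coefs, 3, 3)
print(grid)
print(strongest_pixel(grid))
```

This only works when each pixel always means the same thing across images, which is exactly what breaks down when the infection can sit anywhere in the cell.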

Best,
W.

1 Like

@WilfriedF:

Did you try creating a new environment? I previously faced similar issues when trying to install opencv into an existing (base) environment with tensorflow preinstalled (maybe because of compatibility issues).

Try:

conda create --name newenv
conda activate newenv

Thereafter you can proceed to install opencv and tensorflow in the new environment.

Hope this helps!

2 Likes

Wow @animus.agbor looks very good! :tada: Was busy so I missed this previously… Am personally very interested in image detection too but have been busy lately haha

Adding on, maybe you could do some more model hyperparameter tuning to improve your metrics?

1 Like

What would you recommend that I haven’t done yet? Any suggestions are welcome.

Kind regards

Oz

@animus.agbor perhaps you could try to reduce or increase network architectural complexity or adjust the number of nodes at each layer, or use a different optimiser or learning rate and see how these could improve your accuracy. I also haven’t heard of saving the model as a .v3 file (I usually do it as .h5). Do you have any reference for that which I can explore? Thanks!
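To make the tuning suggestion concrete, here is a minimal grid-search sketch over learning rate, layer size, and optimiser. Everything in it is hypothetical: `toy_evaluate` is a made-up stand-in for "train the network with these settings and return the validation accuracy", and the grid values are only examples.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Try every combination in `grid` and return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_evaluate(params):
    """Hypothetical scoring function; a real one would train and validate a model."""
    score = 0.9
    score -= abs(params["learning_rate"] - 0.001) * 10
    score -= abs(params["units"] - 64) / 1000
    if params["optimizer"] == "adam":
        score += 0.01
    return score


# Example search space; the values are illustrative, not recommendations.
grid = {
    "learning_rate": [0.01, 0.001, 0.0001],
    "units": [32, 64, 128],
    "optimizer": ["sgd", "adam"],
}

best_params, best_score = grid_search(toy_evaluate, grid)
print(best_params, best_score)
```

Each real evaluation means a full training run, so it is worth keeping the grid small or comparing on a validation set that is separate from the final test set.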