A pipeline package

I went through the Data pipelines course, which was great. I am now wondering: is there an existing package you would suggest that offers similar functionality, and possibly more?
I know about sklearn.pipeline… but it does not seem exactly the same thing.
Since I am not a data engineer myself, I'd rather rely on an existing tool than develop one myself. Among the things I'd love to be able to do are, for instance:

  • handle more complex functions with multiple inputs and outputs
  • plot the graph of tasks in the pipeline, displaying the function name, inputs and outputs of each
  • trace tasks across different modules within a package (I assume this is not immediate with the Pipeline class from the course)

Thanks for any answer.


You can combine FeatureUnion, ColumnTransformer and your own custom transformers with Pipeline in sklearn. This combination can handle multiple inputs, as well as parallel and sequential transformations of columns.
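As a rough illustration of that combination, here is a minimal sketch. The toy DataFrame, the column names (`height`, `width`, `color`) and the `RatioAdder` transformer are all made up for the example; only the sklearn classes themselves are real.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


class RatioAdder(TransformerMixin, BaseEstimator):
    """Hypothetical custom transformer: appends column0 / column1 as a new column."""

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        ratio = X[:, [0]] / X[:, [1]]
        return np.hstack([X, ratio])


# Toy data, invented for this example.
df = pd.DataFrame({
    "height": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "width": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "color": ["red", "blue", "green", "red", "blue", "green"],
})
y = [0, 0, 0, 1, 1, 1]

# Series: custom transformer followed by scaling (3 output columns).
scaled_with_ratio = Pipeline([
    ("ratio", RatioAdder()),
    ("scale", StandardScaler()),
])

# Parallel: two branches on the same numeric columns, outputs concatenated (3 + 1 columns).
numeric_features = FeatureUnion([
    ("scaled", scaled_with_ratio),
    ("pca", PCA(n_components=1)),
])

# Multiple inputs: route different columns to different transformers.
preprocess = ColumnTransformer(
    [
        ("num", numeric_features, ["height", "width"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["color"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

model = Pipeline([
    ("features", preprocess),
    ("clf", LogisticRegression()),
])
model.fit(df, y)

features = model.named_steps["features"].transform(df)
preds = model.predict(df)
print(features.shape)
```

Here the numeric columns go through two parallel branches, the categorical column is one-hot encoded separately, and everything is concatenated before the classifier. Note that this covers multiple input columns and branching, but sklearn pipelines still pass a single feature matrix from step to step, so it is not a general task graph.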
