How to run a pipeline in Python

I have code that creates a set of numbers (a report) for one group. I want the same code to create reports for multiple groups without my having to run it manually for each one.

Things I have tried:
I know I can write a for loop over the unique groups, but I am looking for a more robust way.

Please include specific code. There isn’t enough information to process what exactly is the problem.

Hi Ankit,

‘Pipeline’ has other connotations and is probably not what you’re after.
Take a look at ‘functions’ (google Python functions). Functions take parameters, and with parameters you can pass a different set of numbers (one per group) into the function. The function would then generate a report for each set of numbers.
If your data is coming from a file or files, you can look at using the ‘csv’ module, or pandas read_csv.
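As a rough illustration of the suggestion above, here is a minimal sketch that reads rows with the ‘csv’ module and passes each group's numbers into one function. The column names (group, value), the sample data, and the helper names load_rows_by_group and build_report are all made up for the example; the real report logic would replace the placeholder.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; in practice this would come from a file or a database.
SAMPLE_CSV = """group,value
A,10
A,20
B,5
B,7
B,9
"""


def load_rows_by_group(csv_file):
    """Read CSV rows and collect the numeric values for each group."""
    values_by_group = defaultdict(list)
    for row in csv.DictReader(csv_file):
        values_by_group[row["group"]].append(float(row["value"]))
    return dict(values_by_group)


def build_report(group, values):
    """Return a plain-text report for one group (placeholder report logic)."""
    total = sum(values)
    mean = total / len(values)
    return f"Report for {group}: n={len(values)}, total={total:.1f}, mean={mean:.1f}"


# The same function is called once per group, with that group's numbers as the parameter.
values_by_group = load_rows_by_group(io.StringIO(SAMPLE_CSV))
reports = {group: build_report(group, values) for group, values in values_by_group.items()}
```

To read from a real file, pass an open file object (for example open("data.csv", newline="")) instead of the io.StringIO wrapper.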

Have a go with those. If you would like more help post some specific code so we can better understand what you are trying to do.

Thanks

Hey @dcch,

I don’t have code yet; I am trying to work out a concept. I am very familiar with pandas, functions, and databases. Maybe I can explain the problem better:

  1. I have a very large database that contains data for more than 10,000 groups.
  2. Each group needs to be sent a report based on its data.
  3. The basic structure of the report is the same for every group.
  4. How do I create the reports for these groups as PDFs and send them by email? (I can outsource the sending.)

So far you have described only the overall structure. The way to solve your problem depends on several factors:

  1. Database structure
  2. Available resources
  3. How your reports are generated

If the data for the report can be produced within the database itself, then afterwards, depending on the total size of the table, you can load the result into pandas and form the final structure for each group there. Then, if speed is important, split the task between threads or between workers in celery/dramatiq.

If you want to structure this task as a pipeline, Dataquest has a lesson on that:

https://app.dataquest.io/m/266/multiple-dependency-pipeline

It can be described as a sequence of functions:

  1. Create a connection to the database
  2. Get the unique groups
  3. Generate a report for each group
  4. (or 3.1) Send a letter (email) for each group
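The sequence of functions above could be sketched roughly as follows. This is only an illustration: an in-memory SQLite database stands in for the real one, the table and column names (measurements, group_id, value) are invented, and send_letter just records the message instead of producing a PDF or sending an email.

```python
import sqlite3


def connect():
    """Step 1: open a connection (an in-memory SQLite database with made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurements (group_id TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO measurements VALUES (?, ?)",
        [("A", 1.0), ("A", 3.0), ("B", 10.0)],
    )
    return conn


def unique_groups(conn):
    """Step 2: return the distinct group identifiers in a stable order."""
    rows = conn.execute("SELECT DISTINCT group_id FROM measurements ORDER BY group_id")
    return [row[0] for row in rows]


def generate_report(conn, group_id):
    """Step 3: build a placeholder text report for one group."""
    count, total = conn.execute(
        "SELECT COUNT(*), SUM(value) FROM measurements WHERE group_id = ?",
        (group_id,),
    ).fetchone()
    return f"Group {group_id}: {count} rows, total {total}"


def send_letter(group_id, report, outbox):
    """Step 4: stand-in for emailing; only records what would be sent."""
    outbox.append((group_id, report))


def run_pipeline():
    """Run the steps in order for every group and return the recorded letters."""
    conn = connect()
    outbox = []
    try:
        for group_id in unique_groups(conn):
            send_letter(group_id, generate_report(conn, group_id), outbox)
    finally:
        conn.close()
    return outbox


outbox = run_pipeline()
```

Keeping each step in its own function means the report step or the sending step can be swapped out later without touching the rest.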

But as you can see, this is the same logic as the loop over each unique group that you mentioned.

Again, if the question is specifically about speed: instead of a loop, you can divide the tasks among processes, threads, or workers.

  1. Get a list of unique groups
  2. Divide it among the workers, instructing each one to generate reports and send letters
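One way to sketch these two steps with only the standard library is a thread pool from concurrent.futures. The function generate_and_send and the group names are placeholders; the real version would query the database, build the PDF, and send the email. Whether threads, processes, or celery/dramatiq workers fit best depends on where the time is actually spent, so treat this as a starting point only.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_and_send(group_id):
    """Placeholder worker: build the report for one group and 'send' it."""
    report = f"Report for group {group_id}"
    # A real worker would create the PDF and hand it to the mailer here.
    return group_id, report


def run_for_all(group_ids, max_workers=4):
    """Distribute the groups across worker threads and collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as the input groups.
        return dict(executor.map(generate_and_send, group_ids))


results = run_for_all(["A", "B", "C"])
```

ProcessPoolExecutor offers the same map interface if the report generation turns out to be CPU-bound rather than waiting on the database or the mail server.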