International Trade Analysis Project: Data Manipulation Problem

I don’t know how to approach a task. I have a Pandas DataFrame that looks like this:

I need to calculate some ‘gravity’ variables, for example remoteness:

1 / SUM ((GPD_trade_partner)/(Distance))

I have exporter-importer pair, the flow of trade from one country to the other. But to calculate the remoteness for a given country in a given year I need to know all its trading partners (all countries that country ‘A’ traded with: as an importer and as an exporter, without doubling them if there was both imports and exports with this country B). For example, to have a DataFrame like this:

Or some other way to calculate remoteness without creating new columns/ new data frame. For example, I can calculate the weighted distance for each row in the DataFrame with importer GDP and exporter GDP (plus two columns). But how can I sum up this weighted distance only for unique exporter-importer pairs?

hey @lclvtn

Welcome to the DQ Community.

Thank you for posting this question. I went the longer route as done in the Stack Overflow post below before I searched for it :grin:

The SO post here has the solution what you are looking for.

Basically this line:

df['check_string'] = df.apply(lambda row: ''.join(sorted([row[exporter], row[importer]])), axis=1)

This of course, adds a column to the dataframe, which can then be used in subset parameter of drop_duplicates() method.

So just out of curiosity, any reason why, you don’t want to create a new column, to achieve this?

Thank you so much! I will try that.
I am okay with the new column, I didn’t know how to approach the task at all.
I meant that I am looking for any solution creating new columns/ dataframe or not.

1 Like

hey @lclvtn

Oh okay. Then it’s cool. :slight_smile: Because I was trying to do it without adding a new column, basically chaining of multiple codes.

You can actually add this column only for this intermediate step and then once your final dataframe is ready, you can drop this new column itself.

I’ve done the ‘check_string’ solution, it worked, it dropped rows with duplicate importer-exporter pairs, but I still could not figure out how to calculate remoteness for each country (weighted distance to all trade partners). So I did that and it worked:

# the original dataframe was 'year', 'exporter', 'importer'
df2 = df[['year', 'importer', 'exporter']]
df.columns = ['year','country','partner']
df2.columns = ['year','country','partner']
df.append(df2, ignore_index=True)

Then I’ve added partner’s GDP and distance between countries and this is how I calculated remoteness:

remotness = df.groupby('country')['dist_w'].sum().reset_index()

hey @lclvtn


I took the attached approach. I am not sure though is the remoteness numbers are correct.

If you find the time and take a look, please let me know if I did something wrong here.


Remoteness.ipynb (14.9 KB)

Click here to view the jupyter notebook file in a new tab

The idea is to calculate remoteness for each country, not for each pair.

So this is the goal:
Screenshot 2020-05-30 at 17.05.16

Sorry, wrong notebook. Here is my solution with your sample code:
Remoteness.ipynb (9.4 KB)

Click here to view the jupyter notebook file in a new tab