Read excel file with cells containing reference to external excel workbooks

Is there any implemented function in openpyxl/panda/numpy to read real cell content containing a formula with reference to external workbooks in same path as the origin excel workbook?

As requested, here is compressed directory with some excel files and links between them: (30.8 KB)

Here are some more details from a post in this topic:

The desired result should be : =H8+E5+G7+‘C:[s2.xlsm]Sheet1’!$D$5+‘C:[s2.xlsm]Sheet1’!$D$1+‘C:\test_different_path[s3.xlsm]Sheet1’!$D$5. However, result actually is : =H8+E5+G7+[4]Sheet1!$D$5+[4]Sheet1!$D$1+[3]Sheet1!$D$5.
How to get the [s2.xlsm] and [s3.xlsm] instead of [4] and [3]?

1 Like

Hi @llonargarsa,

As per this link:

openpyxl seems to be reading the real cell content by default. This behaviour can be controlled by the data_only argument:


Hi Sahil, thanks so much for your answer.
Yes, openpyxl reads the formula by default. And that is what I need. However, as far as I understood from the link you sent me, that link is trying to get a value, rather than a formula. This is not what I need.

What I need is to get the formula when that formula is containing a link to an external workbook. Let me explain: I need to get a formula like this:
(1) =H8+E5+G7+‘s1’!$E$5+'s2.xlsm’Sheet1!$D$1+'s1.xlsm’Sheet1!$D$5

However, this is what openpyxl is providing:
(2) =H8+E5+G7+‘s1’!$E$5+[2]Sheet1!$D$1+[3]Sheet1!$D$5
How can I get the output (1), I mean, the real files ‘s2.xlsm’ and ‘s1.xlsm’ name instead of those indexes [2] and [3]?
Thanks in advance and have a nice day


Can you please share a compressed folder with some minimal data? This will help people investigate your problem.

I usually work in Linux and it’s hard for me to set things up.

In the zip file you shared a script very similar to the following.

from openpyxl import load_workbook

wb1 = load_workbook('s1.xlsm')
ws = wb1["Sheet1"]
c = ws['I9']
print (c.value)

It outputs


and you wanted the output to be


The solution I’m proposing here gets you close to your desired goal. You’ll need to tweak it to make it work exactly as you want.

Each Excel file contains references to external files. These can be accessed with wb1._external_links.
I won’t explore this object here, I’ll dive straight into where the information we want is:

>>> for idx, el in enumerate(wb1._external_links):
...     print(idx,
0 s2.xlsm
1 test_different_path/s3.xlsm
2 /test_different_path/s3.xlsm
3 /s2.xlsm

This is the information you want, only the indexes are off by one. That’s because Excel (and VBA) use 1-based indexing, so 3 above matches 4 in your result and so on.

You can now use this information to replace the numbers by the filepath appropriately.

I couldn’t figure out how exactly the filepaths above work due me being unable to open the files in Excel.

I hope this helps.


Hi Bruno,
Thanks so much and have a wonderful weekend !!


Hello Team, this is my first ask for assistance. Was wondering if anyone is having issues with the excel guided project not tracking progress. The 2nd, 3rd and 4th Guided Excel screen projects are not tracking my progress after I hit next. This is after I follow all instructions and save my work … I click next and no tick marks. Please help. Which also means no certificate :mortar_board: :mortar_board:

1 Like

Hi @llonargarsa,

It seems to be an issue with the platform. Could you please report it to the Technical Support Team of Dataquest? Just click the ? button in the upper-right corner of the platform screen where you detected this issue, click Report an Issue, describe the problem, and send the ticket. Thanks!

Happy to do that @Elena_Kosourova … much appreciated

1 Like