Thanks for surfacing a really difficult ValueError, one I couldn't even find an answer to on Stack Overflow (I had to use the Chinese site blog.csdn.net).
Beware of confusing two separate errors here, the second of which is a little harder to explain.
I will explain ValueError: Must have equal len keys and value when setting with an iterable first.
import pandas as pd

df = pd.DataFrame({'AAA': range(5), 'BBB': list('abcde'), 'CCC': ['a', 12, 1.2, ['alist'], (1, 2)]})
df
#df.loc[df.AAA >= 3, 'CCC'] = [1, 3, 3]  # ValueError: Must have equal len keys and value when setting with an iterable
df.loc[df.AAA >= 3, 'CCC'] = [1, 3]  # 2 rows matched, so the 2 values are assigned to them in order
df
You can run this example and see that [1,3] is successfully assigned but [1,3,3] fails. This is because the code expects to assign to 2 rows (in this example), so you should give exactly 2 values in the iterable. This is what "equal len keys and value" in the error refers to.
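To make the length rule concrete, here is a rough sketch (not a statement about pandas internals) that counts the rows the mask selects before assigning, and catches the error from the mismatched list. The names mask, n_rows, mismatch_error and matched are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "AAA": range(5),
    "BBB": list("abcde"),
    "CCC": ["a", 12, 1.2, ["alist"], (1, 2)],
})

mask = df.AAA >= 3           # boolean row selector
n_rows = int(mask.sum())     # number of rows the assignment will target

# Three values for n_rows selected rows: expected to fail with ValueError.
try:
    df.loc[mask, "CCC"] = [1, 3, 3]
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

# One value per selected row: assigned in order.
df.loc[mask, "CCC"] = [1, 3]
matched = df.loc[mask, "CCC"].tolist()
```

Checking n_rows against the length of the list before assigning is a cheap way to see which side of the "equal len" comparison is off.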
But I can see that assigning a list of values to a set of rows in a single column is not what you want to achieve; you want to put a list into one specific row and column. So let's move on to ValueError: setting an array element with a sequence.
Here is how to reproduce it:
df = pd.DataFrame(data = [[800.0]], columns=['column'], index=['index'])
df
df.dtypes
#df['column'] = df['column'].astype(object) # need this to prevent ValueError: setting an array element with a sequence.
df.loc['index', 'new_column'] = [400]
df
df.dtypes
Uncomment that type-casting line and the error is gone. There is a longer explanation of the point "you need the column to be object type to insert lists without this error" at https://stackoverflow.com/questions/33221655/valueerror-setting-an-array-element-with-a-sequence-for-pandas. What I still don't understand is how setting the dtype of an existing column can solve the error for a not-yet-existing new_column. My wild guess is that pandas sets the new column's dtype to the most common existing dtype, so fixing the existing column's dtype carried the fix over to the new column.
Another problem: when you change it to df.loc['index', 'new_column'] = [400,200], you again get ValueError: Must have equal len keys and value when setting with an iterable. That error disappears if you assign to the existing column (df.loc['index', 'column'] = [400,200]) rather than to new_column. So the hack is to create and fill new_column with placeholder values first (and make sure it is object dtype); then you can assign df.loc['index', 'new_column'] = [400,200] with no issues.
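The placeholder idea can be sketched roughly as below. Note one swap on my part: the final write uses DataFrame.at instead of .loc, because .at addresses a single cell, whereas how .loc treats a list value seems to depend on the pandas version and on the dtypes of the other columns. The None placeholder is also just an assumption; any filler value would do as long as the column ends up with object dtype.

```python
import pandas as pd

df = pd.DataFrame(data=[[800.0]], columns=["column"], index=["index"])

# Create the new column up front with a placeholder and an explicit object dtype,
# so that a cell is allowed to hold an arbitrary Python object such as a list.
df["new_column"] = pd.Series([None], index=df.index, dtype=object)

# Write the whole list into exactly one cell (swapped-in accessor: .at, not .loc).
df.at["index", "new_column"] = [400, 200]

cell = df.at["index", "new_column"]
```

After this, df.dtypes should show float64 for column and object for new_column, and the cell holds the list itself rather than one value per row.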
For your code, you can specify the columns you want up front, at DataFrame construction time (their dtypes will automatically be set to the most general type, object, because there are no values to infer them from):
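dic = {'A': [1, 2],
       'B': [1, 2, 3]}
#df = pd.DataFrame()  # without columns, the name_list assignment below raises the ValueError
df = pd.DataFrame(columns=['category', 'name_list'])  # both columns start out as object dtype
df.dtypes
for row, (k, v) in enumerate(dic.items()):
    df.loc[row, 'category'] = k   # creates the row and stores the key
    df.loc[row, 'name_list'] = v  # stores the whole list in a single cell
df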
If you had constructed the empty DataFrame without specifying columns, you would see that df.loc[row,"category"] = k runs successfully for the first row, but df.loc[row,"name_list"] = v then breaks with ValueError: Must have equal len keys and value when setting with an iterable. This is the same issue as the df.loc['index', 'new_column'] = [400,200] case described earlier.
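If the whole dictionary is available before you start, another option (a swapped-in approach, not the loop from the question) is to skip cell-by-cell assignment and construct the DataFrame in one call from two plain lists. This is only a minimal sketch, assuming the same dic; built is a made-up name.

```python
import pandas as pd

dic = {"A": [1, 2],
       "B": [1, 2, 3]}

# One row per dictionary entry: the key goes in "category",
# and the whole list goes in a single "name_list" cell.
built = pd.DataFrame({
    "category": list(dic.keys()),
    "name_list": list(dic.values()),
})
```

Because every cell of name_list is a Python list, that column holds the lists as objects, and no length matching against rows comes into play.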
To summarize:
- Generally, specify as much as you know about the data structure (types, shape) as early as possible when initializing it
- Be flexible about changing types (not only to solve API errors as in this case, but maybe also MemoryError and other issues in future)
Now I'm taking a break, mind blown. If anything is still not clear, ask away!