Storing a list in a pandas cell

Hey, I have a dictionary where each key is a category and each value is a list of the people in that category. I want to convert it into a DataFrame so that a single pandas cell holds the entire list for each category.

    name_category = pd.DataFrame()
    p=0

    for key, value in name_list.items():
       name_category.loc[p,'category']= key
       name_category.loc[p,'name_list']= value
       p=p+1

I am getting this error: ValueError: Must have equal len keys and value when setting with an iterable
I would really appreciate the help.

Could you provide a full runnable example?
This error looks like a problem with pandas versions.
You can use Python's built-in enumerate to avoid writing p=p+1.
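
A minimal sketch of the enumerate suggestion, in case it helps. The sample dictionary is made up for illustration (the real name_list was not posted), and the loop only collects the row number, key, and value to show the counter being handled for you:

```python
# Hypothetical stand-in for the asker's dictionary.
name_list = {"friends": ["alice", "bob"], "family": ["carol"]}

rows = []
# enumerate supplies the running row number p, so no manual p = p + 1 is needed.
for p, (key, value) in enumerate(name_list.items()):
    rows.append((p, key, value))
```

This only replaces the manual counter; it does not by itself change how the values are written into the DataFrame.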

Hi

This error can appear because your dictionary has lists of different lengths. I tried to reproduce the same error with this example:

dic = {"A" : [1,2,3] , "B" : [1,2] }  # has lists with different lengths
dic2 = {"A" : [1,2,3] , "B" : [1,2,4] } # correct dict

If I pass the correct dict to DataFrame, I get:

df2 = pd.DataFrame(dic2)
>>> df2
   A  B
0  1  1
1  2  2
2  3  4

But if I try to do the same with the variable dic:

dfw = pd.DataFrame(dic)
ValueError: arrays must all be same length

# or using a for loop:

df = pd.DataFrame()
p = 0
for k, v in dic.items():
    df.loc[p, "nom"] = k
    df.loc[p, "list"] = v
    p += 1

ValueError: Must have equal len keys and value when setting with an iterable

In future, you can use the constructor to create DataFrames by passing a dictionary as a parameter.
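
For the layout the question asks for (one row per category, with the whole list in one cell), a constructor-based sketch could look like the following. The sample data is hypothetical, and passing the dictionary's items as rows is just one way to do it, not necessarily what the poster above had in mind:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's name_list dictionary.
name_list = {"friends": ["alice", "bob", "carol"], "family": ["dave", "erin"]}

# Each (key, value) pair becomes one row, so the lists may differ in length
# and each list stays whole inside its cell.
name_category = pd.DataFrame(list(name_list.items()), columns=["category", "name_list"])
```

Because the lists are row values here rather than columns, the "same length" requirement of pd.DataFrame(dic) does not apply.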

Thanks for unlocking a really difficult ValueError. I couldn't even find the answer on Stack Overflow (I had to use a Chinese blog on blog.csdn.net).

Beware of confusing two separate errors here, the second of which is a little harder to explain.

I will explain ValueError: Must have equal len keys and value when setting with an iterable first.

df = pd.DataFrame({'AAA': range(5), 'BBB': list('abcde'), 'CCC': ['a', 12, 1.2, ['alist'], (1, 2)]})
df
#df.loc[df.AAA >= 3, 'CCC'] = [1, 3, 3] # ValueError: Must have equal len keys and value when setting with an iterable
df.loc[df.AAA >= 3, 'CCC'] = [1, 3]   # 2 rows matched, so assign to the 2 rows in order
df

You can run this example and see that [1, 3] is assigned successfully but [1, 3, 3] fails. This is because the code expects to assign to 2 rows (in this example), so you should give 2 values in the iterable. This is what "equal len keys and value" in the error message refers to.

But I can see that assigning a list of values to a set of rows in a single column is not what you want to achieve; you want to put a list in a single specific row and column. Now let's move on to ValueError: setting an array element with a sequence.

Here is how to reproduce it:

df = pd.DataFrame(data = [[800.0]], columns=['column'], index=['index'])

df
df.dtypes

#df['column'] = df['column'].astype(object)   # need this to prevent ValueError: setting an array element with a sequence.
df.loc['index', 'new_column'] = [400]
df
df.dtypes

Uncomment the type-casting line and the error is gone. There is a longer explanation of the point "you need the column to be object type to insert lists without this error" at https://stackoverflow.com/questions/33221655/valueerror-setting-an-array-element-with-a-sequence-for-pandas. What I still don't understand is how setting the dtype of an existing column can solve the error for a not-yet-existing new_column. My wild guess is that pandas sets the new column's dtype to the most common existing dtype, so fixing the existing column's dtype propagates the solution to the new column.

But another problem appears when you change it to df.loc['index', 'new_column'] = [400,200]: it is again ValueError: Must have equal len keys and value when setting with an iterable. The error disappears if you assign to the existing column, df.loc['index', 'column'] = [400,200], rather than to new_column. So the hack is to use some placeholder values to create and fill new_column first (and ensure it is object dtype); then you can assign df.loc['index', 'new_column'] = [400,200] with no issues.
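
A small sketch of that placeholder hack, reusing the one-row frame from above. Note one substitution: it writes the cell with df.at instead of df.loc, because .at always addresses exactly one cell; that is my choice for the sketch, not part of the original recipe. The None placeholder is also just an assumption about what "placeholder values" could be:

```python
import pandas as pd

df = pd.DataFrame(data=[[800.0]], columns=["column"], index=["index"])

# Create the new column with a placeholder first; None gives it object dtype.
df["new_column"] = None

# .at addresses a single cell, so the whole list is stored as one value.
df.at["index", "new_column"] = [400, 200]
```

The existing float column is left untouched, so only the column that needs to hold lists pays the cost of object dtype.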

For your code, you can specify the columns you want when constructing the DataFrame (their dtypes will automatically be set to the most general type, object, because there are no values to infer them from).

dic = {'A' : [1,2],
       "B" : [1,2,3] 
      }

#df = pd.DataFrame()
df = pd.DataFrame(columns=['category','name_list'])
df.dtypes


for row, (k ,v) in enumerate(dic.items()) :
    df.loc[row,"category"] = k
    df
    df.loc[row,"name_list"] = v
    df

If you had constructed the empty DataFrame without specifying columns, you would see that df.loc[row,"category"] = k runs successfully for the first row, and then df.loc[row,"name_list"] = v breaks with ValueError: Must have equal len keys and value when setting with an iterable. This is the same issue as the df.loc['index', 'new_column'] = [400,200] case described earlier.

To summarize:

  1. Generally, specify as much as you know about the data structure (types, shape) as early as possible when initializing it
  2. Be flexible about changing types (not only to solve API errors as in this case, but maybe also MemoryError and other issues in future)
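
As a rough illustration of point 1, here is one way to state the column dtypes explicitly when creating an empty frame, rather than relying on the default. The column names are the ones from the question; building the frame from empty Series is just one option I am assuming here:

```python
import pandas as pd

# Declare both columns up front with an explicit object dtype,
# so each cell can later hold an arbitrary Python object such as a list.
df = pd.DataFrame({
    "category": pd.Series(dtype="object"),
    "name_list": pd.Series(dtype="object"),
})
```

Checking df.dtypes right after construction is a cheap way to confirm the frame has the types you expect before filling it.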

Now I'm taking a break, mind blown. If anything is still not clear, ask away!