I am trying to unstack() data in a pandas DataFrame, but I keep getting the error below and I'm not sure why. Here is my code so far, with a sample of my data. I tried to fix it by removing all rows where voteId was not a number, but that did not work on my actual dataset. The error occurs both in an Anaconda notebook (where I am developing) and in my production environment when I deploy the code.
I could not figure out how to reproduce the error with my sample code, possibly because of a typecasting issue that doesn't exist when you instantiate the DataFrame the way I did in the sample.
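For reference, a minimal sketch of the kind of filter described above. It is not necessarily the exact code that was run: it assumes `pd.to_numeric` with `errors='coerce'` is used to detect non-numeric voteId values, and the small frame is hypothetical data modelled on the sample.

```python
import pandas as pd

# Hypothetical data modelled on the sample: 'aaa' and 'NaN' are not usable ids.
df = pd.DataFrame({
    'vote': [100, 50, 1, 23, 55, 67, 89, 44],
    'ballot1': ['a', 'b', 'a', 'a', 'b', 'a', 'c', 'c'],
    'voteId': [1, 2, 3, 4, 5, 'aaa', 7, 'NaN'],
})

# Values that cannot be parsed as numbers become NaN instead of raising.
numeric_id = pd.to_numeric(df['voteId'], errors='coerce')

# Keep only the rows with a usable id, and store the id with an integer dtype
# so the column is no longer a mixed-type object column.
keep = numeric_id.notna()
clean = df[keep].copy()
clean['voteId'] = numeric_id[keep].astype(int)
```

Checking `clean.dtypes` afterwards shows whether voteId is still an object column, which is one way to test the typecasting suspicion on the real dataset.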
import pandas as pd

# dataset simulating the likely input
# d = {'vote': [100, 50,1,23,55,67,89,44],
# 'vote2': [10, 2,18,26,77,99,9,40],
# 'ballot1': ['a','b','a','a','b','a','c','c'],
# 'voteId':[1,2,3,4,5,'aaa',7,'NaN']}
# df1=pd.DataFrame(d)
#########################################################
df1=df1.drop_duplicates(['voteId','ballot1'],keep='last')

s=df1[:10].set_index(['voteId','ballot1'],verify_integrity=True).unstack()
s.columns=s.columns.map('(ballot1={0[1]}){0[0]}'.format)
dflw=pd.DataFrame(s)
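To make the reshaping step concrete, here is a minimal sketch of what the set_index / unstack / columns.map lines do on clean input. The four-row frame is hypothetical (each voteId appears once per ballot1 value so that unstack has something to pivot) and is not taken from the real dataset.

```python
import pandas as pd

# Hypothetical clean input: two ids, each with one row per ballot1 value.
df = pd.DataFrame({
    'vote': [100, 50, 1, 23],
    'vote2': [10, 2, 18, 26],
    'ballot1': ['a', 'b', 'a', 'b'],
    'voteId': [1, 1, 2, 2],
})

# Pivot ballot1 out of the row index; columns become (value column, ballot1) tuples.
wide = df.set_index(['voteId', 'ballot1'], verify_integrity=True).unstack()

# Flatten each column tuple into one label: {0[0]} is the value column name,
# {0[1]} is the ballot1 value.
wide.columns = wide.columns.map('(ballot1={0[1]}){0[0]}'.format)

print(wide)
```

The result has one row per voteId and one flat column per combination of value column and ballot1 value.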
Full error message/stack trace:
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-10-0a520180a8d9> in <module>()
     22 df1=df1.drop_duplicates(['voteId','ballot1'],keep='last')
     23 
---> 24 s=df1[:10].set_index(['voteId','ballot1'],verify_integrity=True).unstack()
     25 s.columns=s.columns.map('(ballot1={0[1]}){0[0]}'.format)
     26 dflw=pd.DataFrame(s)

~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in unstack(self, level, fill_value)
   4567         """
   4568         from pandas.core.reshape.reshape import unstack
-> 4569         return unstack(self, level, fill_value)
   4570 
   4571     _shared_docs['melt'] = ("""

~/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/reshape.py in unstack(obj, level, fill_value)
    467     if isinstance(obj, DataFrame):
    468         if isinstance(obj.index, MultiIndex):
--> 469             return _unstack_frame(obj, level, fill_value=fill_value)
    470         else:
    471             return obj.T.stack(dropna=False)

~/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/reshape.py in _unstack_frame(obj, level, fill_value)
    480         unstacker = partial(_Unstacker, index=obj.index,
    481                             level=level, fill_value=fill_value)
--> 482         blocks = obj._data.unstack(unstacker)
    483         klass = type(obj)
    484         return klass(blocks)

~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in unstack(self, unstacker_func)
   4349             new_columns = new_columns[columns_mask]
   4350 
-> 4351         bm = BlockManager(new_blocks, [new_columns, new_index])
   4352         return bm
   4353 

~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in __init__(self, blocks, axes, do_integrity_check, fastpath)
   3035             self._consolidate_check()
   3036 
-> 3037         self._rebuild_blknos_and_blklocs()
   3038 
   3039     def make_empty(self, axes=None):

~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in _rebuild_blknos_and_blklocs(self)
   3127 
   3128         if (new_blknos == -1).any():
-> 3129             raise AssertionError("Gaps in blk ref_locs")
   3130 
   3131         self._blknos = new_blknos

AssertionError: Gaps in blk ref_locs