AssertionError: Gaps in blk ref_locs when unstack() dataframe

2024/10/14 13:19:12

I am trying to unstack() data in a Pandas dataframe, but I keep getting this error, and I'm not sure why. Here is my code so far with a sample of my data. My attempt to fix it was to remove all rows where voteId was not a number, which did not work with my actual dataset. This is happening both in an Anaconda notebook (where I am developing) and in my production env when I deploy the code.

I could not figure out how to reproduce the error in my sample code, possibly because of a typecasting issue that doesn't exist when you instantiate the dataframe the way I did in the sample.

# dataset simulating likely input
# d = {'vote': [100, 50, 1, 23, 55, 67, 89, 44],
#      'vote2': [10, 2, 18, 26, 77, 99, 9, 40],
#      'ballot1': ['a', 'b', 'a', 'a', 'b', 'a', 'c', 'c'],
#      'voteId': [1, 2, 3, 4, 5, 'aaa', 7, 'NaN']}
# df1 = pd.DataFrame(d)
#########################################################
df1 = df1.drop_duplicates(['voteId', 'ballot1'], keep='last')
s = df1[:10].set_index(['voteId', 'ballot1'], verify_integrity=True).unstack()
s.columns = s.columns.map('(ballot1={0[1]}){0[0]}'.format)
dflw = pd.DataFrame(s)
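
For reference, the attempted fix described above (dropping rows whose voteId is not a number) could look roughly like the sketch below. It assumes the sample data from the commented-out block and uses pd.to_numeric with errors='coerce'; the names clean and wide are illustrative only, and this is not guaranteed to avoid the error on the real dataset.

```python
import pandas as pd

# Sample data from the question (not the real dataset).
d = {'vote': [100, 50, 1, 23, 55, 67, 89, 44],
     'vote2': [10, 2, 18, 26, 77, 99, 9, 40],
     'ballot1': ['a', 'b', 'a', 'a', 'b', 'a', 'c', 'c'],
     'voteId': [1, 2, 3, 4, 5, 'aaa', 7, 'NaN']}
df1 = pd.DataFrame(d)

# Coerce voteId to numbers; values that cannot be parsed become NaN.
df1['voteId'] = pd.to_numeric(df1['voteId'], errors='coerce')

# Drop the rows whose voteId was not a number.
clean = df1.dropna(subset=['voteId'])

# Same reshaping steps as in the question.
wide = clean.set_index(['voteId', 'ballot1'], verify_integrity=True).unstack()
wide.columns = wide.columns.map('(ballot1={0[1]}){0[0]}'.format)
print(wide)
```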

Full error message/stack trace:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-10-0a520180a8d9> in <module>()
     22 df1=df1.drop_duplicates(['voteId','ballot1'],keep='last')
     23 
---> 24 s=df1[:10].set_index(['voteId','ballot1'],verify_integrity=True).unstack()
     25 s.columns=s.columns.map('(ballot1={0[1]}){0[0]}'.format)
     26 dflw=pd.DataFrame(s)

~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in unstack(self, level, fill_value)
   4567         """
   4568         from pandas.core.reshape.reshape import unstack
-> 4569         return unstack(self, level, fill_value)
   4570 
   4571     _shared_docs['melt'] = ("""

~/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/reshape.py in unstack(obj, level, fill_value)
    467     if isinstance(obj, DataFrame):
    468         if isinstance(obj.index, MultiIndex):
--> 469             return _unstack_frame(obj, level, fill_value=fill_value)
    470         else:
    471             return obj.T.stack(dropna=False)

~/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/reshape.py in _unstack_frame(obj, level, fill_value)
    480         unstacker = partial(_Unstacker, index=obj.index,
    481                             level=level, fill_value=fill_value)
--> 482         blocks = obj._data.unstack(unstacker)
    483         klass = type(obj)
    484         return klass(blocks)

~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in unstack(self, unstacker_func)
   4349         new_columns = new_columns[columns_mask]
   4350 
-> 4351         bm = BlockManager(new_blocks, [new_columns, new_index])
   4352         return bm
   4353 

~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in __init__(self, blocks, axes, do_integrity_check, fastpath)
   3035         self._consolidate_check()
   3036 
-> 3037         self._rebuild_blknos_and_blklocs()
   3038 
   3039     def make_empty(self, axes=None):

~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in _rebuild_blknos_and_blklocs(self)
   3127 
   3128         if (new_blknos == -1).any():
-> 3129             raise AssertionError("Gaps in blk ref_locs")
   3130 
   3131         self._blknos = new_blknos

AssertionError: Gaps in blk ref_locs

Answer

To find the actual data that triggered the exception, add extra debug output to pandas itself.

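Modify ~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py

Add these lines to BlockManager.__init__, after self.blocks and self.axes are set and before the call to self._rebuild_blknos_and_blklocs():

def __init__(self, ...):
    ...
    from pprint import pprint
    print('BlockManager blocks')
    pprint(self.blocks)
    print('BlockManager axes')
    pprint(self.axes)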

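You will see the frame that is being unstacked (this dump comes from the _unstack_frame print added to reshape.py, described next):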

_unstack_frame level -1 fill_value None
                 vote  vote2
ballot1 voteId              
NaN     xx      100.0   10.0
False   aaa      50.1    2.0
-1      \n        1.0   18.0
True    NaN      23.0   26.0
b       False    55.0   77.0
a       \        67.0   99.0
c                89.0    9.0
        8        44.0    NaN

Modify ~/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/reshape.py

Add these lines at the top of _unstack_frame:

def _unstack_frame(obj, level, fill_value=None):
    from pprint import pprint
    print('_unstack_frame level {} fill_value {} {}'.format(level, fill_value, type(obj)))
    pprint(obj)
    ...

You will see the block manager data (this dump comes from the BlockManager print added to internals.py above):

BlockManager blocks
(FloatBlock: slice(0, 16, 1), 16 x 8, dtype: float64,)
BlockManager axes
[MultiIndex(levels=[[u'vote', u'vote2'], [False, 8, u'\n', u' ', u'\\', u'aaa', u'xx']],
            labels=[[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1], [-1, 0, 1, 2, 3, 4, 5, 6, -1, 0, 1, 2, 3, 4, 5, 6]],
            names=[None, u'voteId']),
 Index([nan, -1, False, True, u'', u'a', u'b', u'c'], dtype='object', name=u'ballot1')]
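
The dumps above show index levels holding a mix of NaN, booleans, integers and strings. If patching the installed pandas files is not an option, a less invasive way to look for such values is to count the Python types in the columns before calling set_index(). The sketch below is only an alternative to the patching approach, not part of it: describe_index_types is a hypothetical helper and the small frame is made-up illustration data, not the real dataset.

```python
import pandas as pd

def describe_index_types(df, keys):
    """Count the Python types found in each would-be index column.

    Hypothetical helper for illustration; it only reports types and
    does not change the frame.
    """
    return {key: df[key].map(type).value_counts().to_dict() for key in keys}

# Made-up data with the kind of mixed values seen in the debug dump.
df = pd.DataFrame({'voteId': [1, 'aaa', float('nan'), False, '\n'],
                   'ballot1': ['a', 'b', -1, True, 'c'],
                   'vote': [100.0, 50.1, 1.0, 23.0, 55.0]})

type_counts = describe_index_types(df, ['voteId', 'ballot1'])
for key, counts in type_counts.items():
    print(key, counts)
```

Any column that reports more than one type is a candidate for cleaning before the reshape.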

I did trigger an exception with another example:

  File "/usr/lib64/python2.7/site-packages/pandas/core/internals.py", line 2902, in _rebuild_blknos_and_blklocs
    raise AssertionError("Gaps in blk ref_locs")
AssertionError: Gaps in blk ref_locs

The debugging output for that case:

BlockManager blocks
(FloatBlock: [-1, -1, -1], 3 x 2, dtype: float64,)
BlockManager axes
[Index([aaa, bbb, ccc], dtype='object'), Int64Index([0, 1], dtype='int64')]
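
To see why a block reported as [-1, -1, -1] leads to this assertion, here is a simplified pure-Python model of the check that _rebuild_blknos_and_blklocs performs: every item position must be claimed by some block, and any position left at -1 counts as a gap. This is a sketch of the idea, not the pandas source; find_gaps and its arguments are hypothetical names.

```python
def find_gaps(n_items, blocks_locs):
    """Return the item positions that no block claims.

    n_items: number of items on the manager's first axis.
    blocks_locs: one list of item positions per block (a stand-in for
    each block's manager locations; hypothetical, for illustration).
    """
    owner = [-1] * n_items  # -1 means "no block owns this position yet"
    for blkno, locs in enumerate(blocks_locs):
        for loc in locs:
            # Negative positions wrap around to the end, as array indexing does.
            owner[loc] = blkno
    return [pos for pos, blk in enumerate(owner) if blk == -1]

# Healthy case: one block covering positions 0, 1 and 2.
print(find_gaps(3, [[0, 1, 2]]))     # []

# Case from the debug output: one block whose locations are all -1.
print(find_gaps(3, [[-1, -1, -1]]))  # [0, 1]
```

In the second call all three locations land on the last position, so positions 0 and 1 are never claimed, which is the condition reported as "Gaps in blk ref_locs".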