TypeError: object of type numpy.int64 has no len()

2024/11/16 1:22:26

I am building a DataLoader from a Dataset in PyTorch.

I start by loading the DataFrame with every column's dtype set to np.float64:

result = pd.read_csv('dummy.csv', header=0, dtype=DTYPE_CLEANED_DF)

Here is my dataset class:

import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, result):
        headers = list(result)       # column names of the DataFrame
        headers.remove('classes')    # every column except the label is a feature
        self.x_data = result[headers]
        self.y_data = result['classes']
        self.len = self.x_data.shape[0]

    def __getitem__(self, index):
        x = torch.tensor(self.x_data.iloc[index].values, dtype=torch.float)
        y = torch.tensor(self.y_data.iloc[index], dtype=torch.float)
        return (x, y)

    def __len__(self):
        return self.len

Then I prepare the train_loader and test_loader:

full_dataset = MyDataset(result)  # wrap the DataFrame loaded above
train_size = int(0.5 * len(full_dataset))
test_size = len(full_dataset) - train_size
train_dataset, test_dataset = torch.utils.data.random_split(full_dataset, [train_size, test_size])
train_loader = DataLoader(dataset=train_dataset, batch_size=16, shuffle=True, num_workers=1)
test_loader = DataLoader(dataset=test_dataset)
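The index bookkeeping that random_split does for [train_size, test_size] can be sketched with only the Python standard library. split_indices is a hypothetical helper for illustration, random.Random.shuffle stands in for torch.randperm, and itertools.accumulate computes the running end offset of each split; this is a sketch of the idea, not PyTorch's implementation.

```python
import random
from itertools import accumulate


def split_indices(n_total, lengths, seed=None):
    """Hypothetical helper: stdlib sketch of the index bookkeeping in random_split."""
    if sum(lengths) != n_total:
        raise ValueError("Sum of input lengths does not equal the length of the input dataset!")
    rng = random.Random(seed)
    indices = list(range(n_total))
    rng.shuffle(indices)  # stand-in for torch.randperm; the list holds plain Python ints
    # accumulate(lengths) yields the end offset of each split, e.g. [5, 5] -> 5, 10
    return [indices[offset - length:offset]
            for offset, length in zip(accumulate(lengths), lengths)]


n_total = 10
train_size = int(0.5 * n_total)
test_size = n_total - train_size
train_idx, test_idx = split_indices(n_total, [train_size, test_size], seed=0)
print(train_idx, test_idx)
```

Each split is a contiguous slice of one shuffled permutation, which is why the splits never overlap and together cover every row exactly once.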

Here is my csv file

When I try to iterate over train_loader, it raises this error:

for i, (data, target) in enumerate(train_loader):
    print(i)

TypeError                                 Traceback (most recent call last)
<ipython-input-32-0b4921c3fe8c> in <module>
----> 1 for i , (data, target) in enumerate(train_loader):
      2     print(i)

/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
    635                 self.reorder_dict[idx] = batch
    636                 continue
--> 637             return self._process_next_batch(batch)
    638 
    639     next = __next__  # Python 2 compatibility

/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _process_next_batch(self, batch)
    656         self._put_indices()
    657         if isinstance(batch, ExceptionWrapper):
--> 658             raise batch.exc_type(batch.exc_msg)
    659         return batch
    660 

TypeError: Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 103, in __getitem__
    return self.dataset[self.indices[idx]]
  File "<ipython-input-27-107e03bc3c6a>", line 12, in __getitem__
    x = torch.tensor(self.x_data.iloc[index].values, dtype=torch.float)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py", line 1478, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py", line 2091, in _getitem_axis
    return self._get_list_axis(key, axis=axis)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py", line 2070, in _get_list_axis
    return self.obj._take(key, axis=axis)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py", line 2789, in _take
    verify=True)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/internals.py", line 4537, in take
    new_labels = self.axes[axis].take(indexer)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2195, in take
    return self._shallow_copy(taken)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/range.py", line 267, in _shallow_copy
    return self._int64index._shallow_copy(values, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/numeric.py", line 68, in _shallow_copy
    return self._shallow_copy_with_infer(values=values, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 538, in _shallow_copy_with_infer
    if not len(values) and 'dtype' not in kwargs:
TypeError: object of type 'numpy.int64' has no len()

Related issues:
https://github.com/pytorch/pytorch/issues/10165
https://github.com/pytorch/pytorch/pull/9237
https://github.com/pandas-dev/pandas/issues/21946

Question:
How can I work around this pandas issue?

Answer

Reference:
https://github.com/pytorch/pytorch/issues/9211

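Just add .tolist() to the indices line of torch.utils.data.random_split. Without it, randperm returns a torch.Tensor, so each index that Subset passes to MyDataset.__getitem__ is a zero-dimensional tensor rather than a Python int. pandas' .iloc then treats that key as list-like (note _get_list_axis in the traceback) and fails with the numpy.int64 error.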

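# Patched random_split; randperm, Subset and _accumulate are already in scope
# in the PyTorch module that defines it
def random_split(dataset, lengths):
    """
    Randomly split a dataset into non-overlapping new datasets of given lengths.

    Arguments:
        dataset (Dataset): Dataset to be split
        lengths (sequence): lengths of splits to be produced
    """
    if sum(lengths) != len(dataset):
        raise ValueError("Sum of input lengths does not equal the length of the input dataset!")

    # .tolist() turns the permutation tensor into a list of Python ints,
    # so Subset hands plain ints (not 0-dim tensors) to the wrapped dataset
    indices = randperm(sum(lengths)).tolist()
    return [Subset(dataset, indices[offset - length:offset]) for offset, length in zip(_accumulate(lengths), lengths)]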
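If editing the installed PyTorch source is not an option, an alternative that leaves the library untouched is to normalize the index inside the dataset itself. The sketch below is the question's MyDataset with one added int(index) call (and __len__ computed directly); it assumes the index arriving from Subset may be a zero-dimensional tensor, as with the unpatched random_split, and that the label column is named 'classes'. The toy DataFrame and the tensor-indexed Subset are illustrative only.

```python
import pandas as pd
import torch
from torch.utils.data import Dataset, Subset


class MyDataset(Dataset):
    def __init__(self, result):
        headers = list(result)
        headers.remove('classes')  # 'classes' is the label column, as in the question
        self.x_data = result[headers]
        self.y_data = result['classes']

    def __getitem__(self, index):
        # The index may be a 0-dim tensor (or a NumPy integer) instead of a Python int;
        # int() turns it into a plain int so .iloc does a scalar positional lookup.
        index = int(index)
        x = torch.tensor(self.x_data.iloc[index].values, dtype=torch.float)
        y = torch.tensor(self.y_data.iloc[index], dtype=torch.float)
        return x, y

    def __len__(self):
        return len(self.x_data)


# Toy data (illustrative only) and a Subset whose indices are a tensor,
# mimicking what the unpatched random_split builds from randperm.
df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [4.0, 5.0, 6.0], 'classes': [0.0, 1.0, 0.0]})
subset = Subset(MyDataset(df), torch.tensor([2, 0]))
x, y = subset[0]  # Subset passes tensor(2) to MyDataset.__getitem__
```

Because int() accepts Python ints, NumPy integers and single-element integer tensors alike, this version works whether or not random_split has been patched.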
https://en.xdnf.cn/q/71399.html