Vectorization: Not a valid collection

2024/10/7 22:19:21

I want to vectorize a text file containing my training corpus for the OneClassSVM classifier. For that I'm using CountVectorizer from the scikit-learn library. Here is my code:

def file_to_corpse(file_name, stop_words):
    array_file = []
    with open(file_name) as fd:
        corp = fd.readlines()
    array_file = np.array(corp)
    stwf = stopwords.words('french')
    for w in stop_words:
        stwf.append(w)
    vectorizer = CountVectorizer(decode_error='replace', stop_words=stwf, min_df=1)
    X = vectorizer.fit_transform(array_file)
    return X

When I run my function on my file (around 206346 lines), I get the following error, which I can't make sense of:

Traceback (most recent call last):
  File "svm.py", line 93, in <module>
    clf_svm.fit(training_data)
  File "/home/imane/anaconda/lib/python2.7/site-packages/sklearn/svm/classes.py", line 1028, in fit
    super(OneClassSVM, self).fit(X, np.ones(_num_samples(X)), sample_weight=sample_weight,
  File "/home/imane/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 122, in _num_samples
    " a valid collection." % x)
TypeError: Singleton array array(<536172x13800 sparse matrix of type '<type 'numpy.int64'>' with 1952637 stored elements in Compressed Sparse Row format>, dtype=object) cannot be considered a valid collection.

Can somebody please help me solve this problem? I've been stuck for a while :).

Answer

If you look at the source of _num_samples in sklearn/utils/validation.py (the last frame of your traceback), you can see that it checks whether this condition is true (x being your array):

if len(x.shape) == 0:

If so, it raises this exception:

TypeError("Singleton array %r cannot be considered a valid collection." % x)

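I suggest checking whether array_file, or the value returned by this function, has a non-empty shape (len(x.shape) > 0) before it reaches fit. The dtype=object in your error message shows that the sparse matrix has ended up wrapped inside a zero-dimensional NumPy object array, so it is also worth looking for a place where training_data goes through np.array(...) between file_to_corpse and clf_svm.fit.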
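To make that check concrete, here is a minimal sketch of what a "singleton" looks like and how to get the wrapped object back out. FakeSparseMatrix, is_singleton and unwrap are hypothetical names made up for this illustration; the stand-in class only mimics the shape attribute of the real sparse matrix so the snippet runs with NumPy alone, and the way your matrix got wrapped in your script is an assumption on my part.

```python
import numpy as np


class FakeSparseMatrix(object):
    """Hypothetical stand-in for the sparse matrix returned by fit_transform."""

    def __init__(self, n_rows, n_cols):
        self.shape = (n_rows, n_cols)


def is_singleton(x):
    """Mirror the quoted check: True when x has a zero-length shape."""
    return hasattr(x, "shape") and len(x.shape) == 0


def unwrap(x):
    """Return the object stored in a 0-d object array, otherwise x unchanged."""
    if isinstance(x, np.ndarray) and x.dtype == object and x.ndim == 0:
        return x.item()
    return x


X = FakeSparseMatrix(536172, 13800)

# Wrapping a non-array object this way gives a 0-d array holding that object.
wrapped = np.array(X, dtype=object)

print(is_singleton(X))        # the matrix itself has a 2-element shape
print(is_singleton(wrapped))  # the wrapper has shape ()
print(unwrap(wrapped).shape)  # the original object is still inside
```

If is_singleton(training_data) is true in your script, passing the unwrapped object (or, better, removing the np.array call that wrapped it) to clf_svm.fit should get you past this particular check.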

