Understanding DictVectorizer in scikit-learn?

2024/10/2 1:26:18

I'm exploring the different feature extraction classes that scikit-learn provides. Reading the documentation I did not understand very well what DictVectorizer can be used for? Other questions come to mind. For example, how can DictVectorizer be used for text classification?, i.e. how does this class help handle labelled textual data? Could anybody provide a short example apart from the example that I already read at the documentation web page?

Answer

say your feature space is length, width and height and you have had 3 observations; i.e. you measure length, width & height of 3 objects:

       length  width  height
obs.1       1      0       2
obs.2       0      1       1
obs.3       3      2       1

another way to show this is to use a list of dictionaries:

[{'height': 1, 'length': 0, 'width': 1},   # obs.2{'height': 2, 'length': 1, 'width': 0},   # obs.1{'height': 1, 'length': 3, 'width': 2}]   # obs.3

DictVectorizer goes the other way around; i.e given the list of dictionaries builds the top frame:

>>> from sklearn.feature_extraction import DictVectorizer
>>> v = DictVectorizer(sparse=False)
>>> d = [{'height': 1, 'length': 0, 'width': 1},
...      {'height': 2, 'length': 1, 'width': 0},
...      {'height': 1, 'length': 3, 'width': 2}]
>>> v.fit_transform(d)
array([[ 1.,  0.,  1.],   # obs.2[ 2.,  1.,  0.],   # obs.1[ 1.,  3.,  2.]])  # obs.3# height, len., width   
https://en.xdnf.cn/q/70901.html

Related Q&A

Parsing RSS with Elementtree in Python

How do you search for namespace-specific tags in XML using Elementtree in Python?I have an XML/RSS document like:<?xml version="1.0" encoding="UTF-8"?> <rss version=&quo…

String module object has no attribute join

So, I want to create a user text input box in Pygame, and I was told to look at a class module called inputbox. So I downloaded inputbox.py and imported into my main game file. I then ran a function in…

TypeError: the JSON object must be str, not Response with Python 3.4

Im getting this error and I cant figure out what the problem is:Traceback (most recent call last):File "C:/Python34/Scripts/ddg.py", line 8, in <module>data = json.loads(r)File "C:…

Redirect while passing message in django

Im trying to run a redirect after I check to see if the user_settings exist for a user (if they dont exist - the user is taken to the form to input and save them).I want to redirect the user to the app…

Django sorting by date(day)

I want to sort models by day first and then by score, meaning Id like to see the the highest scoring Articles in each day. class Article(models.Model):date_modified = models.DateTimeField(blank=True, n…

ImportError: No module named pynotify. While the module is installed

So this error keeps coming back.Everytime I try to tun the script it returns saying:Traceback (most recent call last):File "cli.py", line 11, in <module>import pynotify ImportError: No …

business logic in Django

Id like to know where to put code that doesnt belong to a view, I mean, the logic.Ive been reading a few similar posts, but couldnt arrive to a conclusion. What I could understand is:A View is like a …

Faster alternatives to Pandas pivot_table

Im using Pandas pivot_table function on a large dataset (10 million rows, 6 columns). As execution time is paramount, I try to speed up the process. Currently it takes around 8 secs to process the whol…

How can I temporarily redirect the output of logging in Python?

Theres already a question that answers how to do this regarding sys.stdout and sys.stderr here: https://stackoverflow.com/a/14197079/198348 But that doesnt work everywhere. The logging module seems to …

trouble with creating a virtual environment in Windows 8, python 3.3

Im trying to create a virtual environment in Python, but I always get an error no matter how many times I re-install python-setuptools and pip. My computer is running Windows 8, and Im using Python 3.3…