textcat - architecture extra fields not permitted

2024/10/1 7:44:29

I've been trying to practise what I've learned from this tutorial (https://realpython.com/sentiment-analysis-python/) using PyCharm.

And this line:

textcat.add_label("pos")

generated a warning: Cannot find reference 'add_label' in '(Doc) -> Doc | (Doc) -> Doc'

I understand that this is because "nlp.create_pipe()" generates a Doc, not a string. But (essentially because I don't know what to do in this case!) I ran the script anyway, and then I got an error from this line:

textcat = nlp.create_pipe("textcat", config={"architecture": "simple_cnn"})

Error msg:

raise ConfigValidationError(
thinc.config.ConfigValidationError:
Config validation error
textcat -> architecture   extra fields not permitted
{'nlp': <spacy.lang.en.English object at 0x0000015E74F625E0>, 'name': 'textcat', 'architecture': 'simple_cnn', 'model': {'@architectures': 'spacy.TextCatEnsemble.v2', 'linear_model': {'@architectures': 'spacy.TextCatBOW.v1', 'exclusive_classes': True, 'ngram_size': 1, 'no_output_layer': False}, 'tok2vec': {'@architectures': 'spacy.Tok2Vec.v2', 'embed': {'@architectures': 'spacy.MultiHashEmbed.v1', 'width': 64, 'rows': [2000, 2000, 1000, 1000, 1000, 1000], 'attrs': ['ORTH', 'LOWER', 'PREFIX', 'SUFFIX', 'SHAPE', 'ID'], 'include_static_vectors': False}, 'encode': {'@architectures': 'spacy.MaxoutWindowEncoder.v2', 'width': 64, 'window_size': 1, 'maxout_pieces': 3, 'depth': 2}}}, 'threshold': 0.5, '@factories': 'textcat'}
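The validation error says the "textcat" factory has no setting called "architecture". A quick way to see which settings it does accept is to look at the factory's default config. This is a small sketch, assuming spaCy 3.x is installed; it uses a blank English pipeline (spacy.blank) so no trained model has to be downloaded, and relies on Language.get_factory_meta, whose exact list of keys may differ slightly between 3.x releases.

```python
import spacy

# A blank pipeline is enough to look up factory metadata; no model download needed.
nlp = spacy.blank("en")

# FactoryMeta.default_config holds the settings the "textcat" factory accepts.
accepted = set(nlp.get_factory_meta("textcat").default_config)
print(sorted(accepted))

# The v2-era "architecture" key is not among them, hence "extra fields not permitted".
print("architecture" in accepted)
```

In v3 the network is chosen through the "model" setting instead: a config block with an "@architectures" entry, exactly like the default 'model' dict printed in the error above.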

I'm using:

  • PyCharm v: 2019.3.4
  • Python v: 3.8.6
  • spaCy v: 3.0.5
Answer

Man! Did that full spaCy upgrade really obliterate that tutorial or what...

There are a couple of things you might be able to work around. I haven't fully fixed that broken tutorial yet; it's on my to-do list. However, I did get around the exact issue you're having:

textcat = nlp.create_pipe("textcat", config={"architecture": "simple_cnn"})

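This create_pipe pattern has been deprecated, so you can add the component directly to the pipeline with add_pipe, passing the factory name. The v2-style "architecture" setting is also gone; the model is now configured through the "model" key. So one thing you could do is the following: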

import spacy
from spacy.pipeline.textcat import single_label_cnn_config
from thinc.api import Config  # single_label_cnn_config is a config string; Config parses it into a dict
# <more good code>
nlp = spacy.load("en_core_web_trf")
if "textcat" not in nlp.pipe_names:
    nlp.add_pipe("textcat", config=Config().from_str(single_label_cnn_config), last=True)
textcat = nlp.get_pipe("textcat")
textcat.add_label("pos")
textcat.add_label("neg")
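Two side notes, offered as a sketch rather than a definitive fix. First, add_pipe already returns the component it creates, so the separate get_pipe call is optional. Second, the PyCharm warning about add_label most likely comes from the type annotation: as far as I can tell, spaCy 3.0 annotates create_pipe/add_pipe as returning a generic Doc -> Doc callable, so the IDE cannot see add_label. It is only a static-analysis warning; typing.cast to TextCategorizer is one way to tell the IDE the real type. The example assumes spaCy 3.x and uses spacy.blank("en") with the default textcat config purely to avoid downloading en_core_web_trf.

```python
from typing import cast

import spacy
from spacy.pipeline import TextCategorizer

# Blank English pipeline: avoids downloading en_core_web_trf for this sketch.
nlp = spacy.blank("en")

# add_pipe takes the registered factory name and returns the created component.
# cast() only informs the IDE/type checker of the real type; it does nothing at runtime.
textcat = cast(TextCategorizer, nlp.add_pipe("textcat", last=True))
textcat.add_label("pos")
textcat.add_label("neg")

print(nlp.pipe_names)
print(textcat.labels)
```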

Let me know if this makes sense and helps. I'll try to revamp the tutorial entirely for the new spaCy in the coming weeks.

https://en.xdnf.cn/q/70982.html
