I am new to PyTorch and have recently been trying to work with Transformers. I am using pretrained tokenizers provided by HuggingFace.
I can download and run them successfully, but if I save a tokenizer and then try to load it again, an error occurs.
If I use AutoTokenizer.from_pretrained to download a tokenizer, it works:

    [1]: tokenizer = AutoTokenizer.from_pretrained('distilroberta-base')
         text = "Hello there"
         enc = tokenizer.encode_plus(text)
         enc.keys()
    Out[1]: dict_keys(['input_ids', 'attention_mask'])
But if I save it with tokenizer.save_pretrained("distilroberta-tokenizer") and then try to load it from that local directory, it fails:
    [2]: tmp = AutoTokenizer.from_pretrained('distilroberta-tokenizer')
    ---------------------------------------------------------------------------
    OSError                                   Traceback (most recent call last)
    /opt/conda/lib/python3.7/site-packages/transformers/configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
        238 resume_download=resume_download,
    --> 239 local_files_only=local_files_only,
        240 )

    /opt/conda/lib/python3.7/site-packages/transformers/file_utils.py in cached_path(url_or_filename, cache_dir, force_download, proxies, resume_download, user_agent, extract_compressed_file, force_extract, local_files_only)
        266 # File, but it doesn't exist.
    --> 267 raise EnvironmentError("file {} not found".format(url_or_filename))
        268 else:

    OSError: file distilroberta-tokenizer/config.json not found

    During handling of the above exception, another exception occurred:

    OSError                                   Traceback (most recent call last)
    <ipython-input-25-3bd2f7a79271> in <module>
    ----> 1 tmp = AutoTokenizer.from_pretrained("distilroberta-tokenizer")

    /opt/conda/lib/python3.7/site-packages/transformers/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
        193 config = kwargs.pop("config", None)
        194 if not isinstance(config, PretrainedConfig):
    --> 195 config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
        196
        197 if "bert-base-japanese" in pretrained_model_name_or_path:

    /opt/conda/lib/python3.7/site-packages/transformers/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
        194
        195 """
    --> 196 config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
        197
        198 if "model_type" in config_dict:

    /opt/conda/lib/python3.7/site-packages/transformers/configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
        250 f"- or '{pretrained_model_name_or_path}' is the correct path to a directory containing a {CONFIG_NAME} file\n\n"
        251 )
    --> 252 raise EnvironmentError(msg)
        253
        254 except json.JSONDecodeError:

    OSError: Can't load config for 'distilroberta-tokenizer'. Make sure that:

    - 'distilroberta-tokenizer' is a correct model identifier listed on 'https://huggingface.co/models'

    - or 'distilroberta-tokenizer' is the correct path to a directory containing a config.json file
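The traceback shows the call chain: AutoTokenizer.from_pretrained calls AutoConfig.from_pretrained, which calls PretrainedConfig.get_config_dict, and that expects a config.json inside the given directory before the "model_type" key is ever inspected. The snippet below is a simplified, hypothetical model of that local-directory lookup using only the standard library. load_model_type is an illustrative name, not a transformers function, and the real library code also handles model identifiers, downloads and caching.

```python
import json
import os


def load_model_type(pretrained_path: str) -> str:
    """Hypothetical, simplified model of the config lookup seen in the traceback.

    This is NOT the transformers implementation; it only covers a local directory.
    """
    # For a local directory, the config is expected at <dir>/config.json.
    if os.path.isdir(pretrained_path):
        config_file = os.path.join(pretrained_path, "config.json")
    else:
        config_file = pretrained_path

    # Mirrors the first error in the traceback: the file does not exist.
    if not os.path.isfile(config_file):
        raise OSError("file {} not found".format(config_file))

    with open(config_file, encoding="utf-8") as f:
        config_dict = json.load(f)

    # Once the config is loaded, the Auto* classes dispatch on "model_type".
    if "model_type" not in config_dict:
        raise ValueError("config.json has no 'model_type' key")
    return config_dict["model_type"]
```

In this model, a directory holding only the four files that save_pretrained wrote raises OSError, matching the first error in the traceback, while the same directory with a config.json containing a "model_type" key returns that value.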
It says 'config.json' is missing from the directory. When I check the directory, it contains only these files:

    [3]: !ls distilroberta-tokenizer
    Out[3]: merges.txt  special_tokens_map.json  tokenizer_config.json  vocab.json
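To double-check which expected files are absent, a small standard-library helper can compare a directory against a list of required names. This is a hypothetical sketch: missing_files is an illustrative name, and the list of required files is the caller's assumption (config.json is the one the traceback names; tokenizer_config.json is a different file).

```python
import os
from typing import Iterable, List


def missing_files(directory: str, required: Iterable[str]) -> List[str]:
    """Return the names in `required` that are not regular files in `directory`.

    Hypothetical helper for illustration; not part of transformers.
    """
    # os.path.isfile is False both for absent paths and for subdirectories.
    return sorted(
        name
        for name in required
        if not os.path.isfile(os.path.join(directory, name))
    )
```

For the directory listed above, missing_files("distilroberta-tokenizer", ["config.json", "tokenizer_config.json", "vocab.json", "merges.txt"]) would return ['config.json'].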
I know this problem has been posted before, but none of the existing answers work for me. I have also tried following the docs, but I still can't make it work.
Any help would be appreciated.