Check image urls using python-markdown

2024/10/8 4:23:56

On a website I'm creating I'm using Python-Markdown to format news posts. To avoid issues with dead links and HTTP-content-on-HTTPS-page problems I'm requiring editors to upload all images to the site and then embed them (I'm using a markdown editor which I've patched to allow easy embedding of those images using standard markdown syntax).

However, I'd like to enforce the no-external-images policy in my code.

One way would be to write a regex that extracts image URLs from the Markdown source code, or to run the source through the Markdown renderer and use a DOM parser to extract the src attribute of every img tag.
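A minimal sketch of the second option (render first, then inspect the resulting HTML). It uses the standard library's `html.parser` in place of a full DOM parser. The names `ImgSrcCollector` and `external_image_srcs` are illustrative, and "external" is assumed to mean any src that carries a URL scheme or a host.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names. Self-closing
        # tags (<img ... />) also end up here, because the default
        # handle_startendtag calls handle_starttag.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value is not None:
                    self.srcs.append(value)


def external_image_srcs(html):
    """Return the image srcs that do not point at this site (assumed policy)."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    external = []
    for src in collector.srcs:
        parsed = urlparse(src)
        # "http://x/...", "data:..." and protocol-relative "//x/..." are
        # all rejected; plain paths such as "/media/a.jpg" are local.
        if parsed.scheme or parsed.netloc:
            external.append(src)
    return external
```

Typical usage would be `external_image_srcs(markdown.markdown(post_source))`. Because it inspects the final HTML, this also catches `<img>` tags written as raw HTML inside the post, which a Markdown-level hook would not see.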

However, I'm curious if there's some way to hook into Python-Markdown to extract all image links or execute custom code (e.g. raising an exception if the link is external) during parsing.

Answer

One approach would be to intercept the <img> node at a lower level just after Markdown parses and constructs it:

```python
import re
from markdown import Markdown
# Written against the Python-Markdown 2.x API; ImagePattern is not available in 3.x.
from markdown.inlinepatterns import ImagePattern, IMAGE_LINK_RE

RE_REMOTEIMG = re.compile(r'^(http|https):.+')

class CheckImagePattern(ImagePattern):
    def handleMatch(self, m):
        node = ImagePattern.handleMatch(self, m)
        # check 'src' to ensure it is local
        src = node.attrib.get('src')
        if src and RE_REMOTEIMG.match(src):
            print('ILLEGAL: %s' % src)
            # or alternatively you could raise an error immediately
            # raise ValueError("illegal remote url: %s" % src)
        return node

DATA = '''
![Alt text](/path/to/img.jpg)
![Alt text](http://remote.com/path/to/img.jpg)
'''

mk = Markdown()
# patch in the customized image pattern matcher with url checking
mk.inlinePatterns['image_link'] = CheckImagePattern(IMAGE_LINK_RE, mk)
result = mk.convert(DATA)
print(result)
```

Output:

```
ILLEGAL: http://remote.com/path/to/img.jpg
<p><img alt="Alt text" src="/path/to/img.jpg" />
<img alt="Alt text" src="http://remote.com/path/to/img.jpg" /></p>
```
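The answer above targets the Python-Markdown 2.x API. On Python-Markdown 3.x, a comparable hook can be sketched as an extension that registers a Treeprocessor. This is a swapped-in technique rather than the answer's pattern subclass: it inspects the finished element tree after the built-in `inline` treeprocessor (priority 20) has created the `<img>` elements, and raises on the first remote src. The names `LocalImagesExtension`, `LocalImageTreeprocessor`, `ExternalImageError`, `is_external`, the registry name `local_images` and the priority 15 are hypothetical choices; Python 3 and Markdown 3.x are assumed.

```python
from urllib.parse import urlparse

import markdown
from markdown.extensions import Extension
from markdown.treeprocessors import Treeprocessor


class ExternalImageError(ValueError):
    """Raised when a post embeds an image that is not served by this site."""


def is_external(src):
    # Absolute URLs (http:, https:, data:, ...) and protocol-relative
    # URLs (//host/...) count as external; plain paths are local.
    parsed = urlparse(src)
    return bool(parsed.scheme or parsed.netloc)


class LocalImageTreeprocessor(Treeprocessor):
    """Reject any <img> element whose src points off-site."""

    def run(self, root):
        for img in root.iter("img"):
            src = img.get("src", "")
            if is_external(src):
                raise ExternalImageError("illegal remote image url: %s" % src)
        return None  # the tree is only inspected, never modified


class LocalImagesExtension(Extension):
    def extendMarkdown(self, md):
        # Higher priority runs first; 15 places this check after the
        # 'inline' treeprocessor (20) that builds the <img> elements.
        md.treeprocessors.register(LocalImageTreeprocessor(md), "local_images", 15)


DATA = """
![Alt text](/path/to/img.jpg)
![Alt text](http://remote.com/path/to/img.jpg)
"""

if __name__ == "__main__":
    try:
        markdown.markdown(DATA, extensions=[LocalImagesExtension()])
    except ExternalImageError as exc:
        print(exc)
```

Since the check runs on the tree rather than on one inline pattern, it also covers reference-style images (`![alt][ref]`), which are produced by a separate pattern that the answer's subclass does not touch. It does not see `<img>` tags written as raw HTML, because Python-Markdown stashes raw HTML and only reinserts it after the treeprocessors have run; if editors may enter raw HTML, a check on the rendered output is still needed.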
https://en.xdnf.cn/q/70156.html
