How to isolate titles from these image URLs?

2024/11/16 7:44:06

I have a list of image urls contained in 'images'. I am trying to isolate the title from these image urls so that I can display, on the html, the image (using the whole url) and the corresponding title.

So far I have this:

titles = [image[149:199].strip() for image in images]

This gives me the stripped title in the following format (I provide two examples to show the pattern)

le_Art_Project.jpg/220px-Rembrandt_van_Rijn_-_Self-Portrait_-_Google_Art_Project.jpg

and

cene_of_the_Prodigal_Son_-_Google_Art_Project.jpg/220px-Rembrandt_-_Rembrandt_and_Saskia_in_the_Scene_of_the_Prodigal_Son_-_Google_Art_Project.jpg

I would like to remove two parts of each string: from the start, everything up to and including 220px-, and from the end, _-_Google_Art_Project.jpg

I am new to Python and struggling with the syntax. Because I am doing this inside a loop over the images list, the string manipulation is not straightforward and I am unsure how to approach it.

The whole code for reference is below:

webscraper.py:

@app.route('/') #this is what we type into our browser to go to pages. we create these using routes
@app.route('/home')
def home():
    images = imagescrape()
    titles = [image[99:247].strip() for image in images]
    images_titles = zip(images, titles)
    return render_template('home.html', images=images, images_titles=images_titles)

What I've tried / am trying:

x = txt.strip("_-_Google_Art_Project.jpg")

Looking into strip - to get rid of the last part of the unwanted string.

I am unsure of how to combine this with getting rid of the leading string that I want to remove and also do so in the most elegant way given the structure/code I already have.

Visually, I am trying to remove the leading text as shown highlighted, as well as the last part of the string which is _-_Google_Art_Project.jpg.

Visual of HTML displayed:


UPDATE:

Based on an answer below - which is very helpful but doesn't quite perfectly solve it, I am trying this approach (without using the unquote import if possible and pure python string manipulation)

def titleextract(url):
    # return unquote(url[58:url.rindex("/",58)-8].replace('_',''))
    title = url[58:]
    return title

The above, returns:

Rembrandt_van_Rijn_-_Self-Portrait_-_Google_Art_Project.jpg/220px-Rembrandt_van_Rijn_-_Self-Portrait_-_Google_Art_Project.jpg

but I want:

Rembrandt_van_Rijn_-_Self-Portrait

or for the second title/image in the list:

Rembrandt_van_Rijn_-_Saskia_van_Uylenburgh%2C_the_Wife_of_the_Artist_-_Google_Art_Project.jpg/220px-Rembrandt_van_Rijn_-_Saskia_van_Uylenburgh%2C_the_Wife_of_the_Artist_-_Google_Art_Project.jpg

I want:

Rembrandt_van_Rijn_-_Saskia_van_Uylenburgh%2C_the_Wife_of_the_Artist
Answer

cene_of_the_Prodigal_Son_-_Google_Art_Project.jpg/220px-Rembrandt_-_Rembrandt_and_Saskia_in_the_Scene_of_the_Prodigal_Son_-_Google_Art_Project.jpg

You have this string and want to remove the leading and trailing parts. Let's say I have this stored in x

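# str has no lsplit method; split with maxsplit=1 cuts at the first "px-"
y = x.split("px-", 1)[1]
# split the already-trimmed y (not x) from the right and keep what comes before the suffix
z = y.rsplit("_-_Google_Art_Project.jpg", 1)[0]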

This makes a list with 2 elements: stuff before "px-" in the string, and stuff after. We're just grabbing the stuff after, since you wanted to remove the stuff before. If "px-" isn't always in the string, then we need to find something else to split on. Then we split on something towards the end, and grab the stuff before it.
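
As a minimal runnable sketch of the two splits described above: the URL below is a hypothetical example built from the strings in the question (the "b/bd" path segment is a placeholder, not taken from the real page). It also shows why str.strip is not a good fit for the trailing part: strip removes any of the given characters from both ends, not a fixed suffix.

```python
# Hypothetical thumbnail URL; the "b/bd" path segment is a placeholder.
x = ("https://upload.wikimedia.org/wikipedia/commons/thumb/b/bd/"
     "Rembrandt_van_Rijn_-_Self-Portrait_-_Google_Art_Project.jpg/"
     "220px-Rembrandt_van_Rijn_-_Self-Portrait_-_Google_Art_Project.jpg")

# Keep everything after the first "px-".
y = x.split("px-", 1)[1]

# Keep everything before the last occurrence of the unwanted suffix.
z = y.rsplit("_-_Google_Art_Project.jpg", 1)[0]
print(z)  # Rembrandt_van_Rijn_-_Self-Portrait

# strip() treats its argument as a set of characters, so it also eats
# the final "t" of "Portrait" here.
stripped = y.strip("_-_Google_Art_Project.jpg")
print(stripped)  # Rembrandt_van_Rijn_-_Self-Portrai
```

Passing 1 as the second argument limits each call to a single split, so the result always has at most two elements even if the separator appears more than once.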

Edit: Addressing comment on how to split in that loop.. I think you are referring to this: titles=[image[149:199].strip() for image in images]

List comps are great but sometimes it's easier to just write it out. Haven't tested this but here's the idea:

titles = []
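for image in images:
    # Split the whole URL: a fixed slice such as image[149:199] can cut the title short.
    cleaned_left = image.split("px-", 1)[1]
    # Trim the suffix from the already left-cleaned string.
    cleaned_title = cleaned_left.rsplit("_-_Google_Art_Project.jpg", 1)[0]
    titles.append(cleaned_title)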
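
If a list comprehension is still preferred, one option is to move the clean-up into a small helper and call it per image. The sketch below is only an illustration: extract_title, prettify and SUFFIX are hypothetical names, the URLs are placeholders modelled on the strings in the question, and it assumes every file name ends in _-_Google_Art_Project.jpg. The unquote step is optional and only needed if escapes such as %2C should be shown as readable characters.

```python
from urllib.parse import unquote

# Assumption: every thumbnail file name ends with this suffix.
SUFFIX = "_-_Google_Art_Project.jpg"


def extract_title(url):
    """Return the raw title part of a thumbnail URL (hypothetical helper)."""
    filename = url.rsplit("/", 1)[-1]      # last path segment, e.g. "220px-<title><suffix>"
    title = filename.split("px-", 1)[-1]   # drop the size prefix if it is present
    if title.endswith(SUFFIX):
        title = title[:-len(SUFFIX)]       # drop the fixed suffix
    return title


def prettify(title):
    """Optional: decode %XX escapes and show underscores as spaces."""
    return unquote(title).replace("_", " ")


# Placeholder URLs; the "b/bd" and "a/aa" path segments are made up.
images = [
    "https://upload.wikimedia.org/wikipedia/commons/thumb/b/bd/"
    "Rembrandt_van_Rijn_-_Self-Portrait_-_Google_Art_Project.jpg/"
    "220px-Rembrandt_van_Rijn_-_Self-Portrait_-_Google_Art_Project.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/thumb/a/aa/"
    "Rembrandt_van_Rijn_-_Saskia_van_Uylenburgh%2C_the_Wife_of_the_Artist_-_Google_Art_Project.jpg/"
    "220px-Rembrandt_van_Rijn_-_Saskia_van_Uylenburgh%2C_the_Wife_of_the_Artist_-_Google_Art_Project.jpg",
]

titles = [extract_title(image) for image in images]
images_titles = list(zip(images, titles))

for title in titles:
    print(title, "|", prettify(title))
```

Working on the last path segment avoids hard-coded offsets such as 58 or 149, which break as soon as a URL has a different length.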