Downloading books from a website with Python

2024/11/17 4:27:54

I'm downloading books from a website, and my code mostly runs smoothly, but when I try to open the downloaded PDF book on my PC, Adobe Acrobat Reader reports that the file is not a supported file type.

Error Image

Here is an image of the book format. I'm sure my code needs correcting, because the format of the books on the website is different from normal PDF files.

Book Format Image

Code:

import requests
from bs4 import BeautifulSoup

url = 'https://global.oup.com/education/support-learning-anywhere/key-resources-online/?region=international&utm_campaign=learninganywhere&utm_source=umbraco&utm_medium=display&utm_content=support_learning_key_resources&utm_team=int#Primary'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
table_data = soup.find_all('td')
books_url_list = []
for link in table_data:
    books_url = link.find('a')['href']
    books_url_list.append(books_url+'.pdf')
book = books_url_list[1]
book_response = requests.get(book)
with open('books.pdf', 'wb') as f:
    f.write(book_response.content)


Answer

I inspected the website's elements and found no '.pdf' files. You can inspect one of the book pages using the following link: https://en.calameo.com/read/000777721d10096b9e9ca?authid=gWc48kAQQoD0&region=international
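The saved file can be checked as well, independently of the browser. A real PDF starts with the bytes `%PDF-`, so reading the beginning of `books.pdf` shows whether the server actually returned a PDF. The sketch below is only a rough check; the function name `sniff_file_type` and the "html"/"unknown" labels are made up for illustration.

```python
def sniff_file_type(data):
    """Guess what a downloaded payload is from its first bytes.

    Returns "pdf", "html", or "unknown". This is a rough heuristic,
    not a full file-type detector.
    """
    head = data[:1024].lstrip()
    # Every PDF file begins with a "%PDF-" header followed by the version.
    if head.startswith(b"%PDF-"):
        return "pdf"
    # A web page saved under a .pdf name usually starts with one of these.
    if head.lower().startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"
```

For example, `sniff_file_type(open('books.pdf', 'rb').read())` returning "html" would mean that appending '.pdf' to the link only changed the file name, not the content the server sent.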

After inspecting the element, I found that the book is not a PDF. Each page is just an image in the page:

https://p.calameoassets.com/200406174654-2bfa9441783e162c8da42a712feda3e2/p1.svgz

https://p.calameoassets.com/200406174654-2bfa9441783e162c8da42a712feda3e2/p2.svgz

....

https://p.calameoassets.com/200406174654-2bfa9441783e162c8da42a712feda3e2/p98.svgz

And so on.

So you can write code to download these images instead.
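A minimal sketch of that idea follows. It rests on several assumptions: the base URL and the page count of 98 are copied from the URLs listed above and apply to that one book only, the helper names (`page_urls`, `fetch`, `save_pages`) are made up for illustration, and the server is assumed to answer plain GET requests. It uses the standard library's `urllib.request` so it has no extra dependencies; `requests.get(url).content` from the question's code would work the same way. `.svgz` is gzip-compressed SVG, so each page is decompressed before it is saved.

```python
import gzip
import urllib.request
from pathlib import Path

# Base URL taken from the page links listed in the answer (one specific book).
BASE_URL = "https://p.calameoassets.com/200406174654-2bfa9441783e162c8da42a712feda3e2"


def page_urls(base_url, page_count):
    """Build the p1.svgz ... pN.svgz URLs for one book."""
    return [f"{base_url}/p{n}.svgz" for n in range(1, page_count + 1)]


def fetch(url):
    """Download one URL and return the raw response bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def save_pages(urls, out_dir, fetch_page=fetch):
    """Save every page as p<N>.svg in out_dir and return the written paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for number, url in enumerate(urls, start=1):
        data = fetch_page(url)
        # gzip data starts with the bytes 1f 8b; decompress only if present,
        # in case the payload arrives already decompressed.
        if data[:2] == b"\x1f\x8b":
            data = gzip.decompress(data)
        path = out_dir / f"p{number}.svg"
        path.write_bytes(data)
        saved.append(path)
    return saved
```

Calling `save_pages(page_urls(BASE_URL, 98), "book_pages")` would then write `p1.svg` through `p98.svg` into a `book_pages` folder, provided the URLs are still valid. The result is a set of SVG images, one per page, not a PDF; combining them into a single PDF would be a separate step.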
