Python urllib2 or requests post method [duplicate]

2024/11/24 0:24:27

I understand in general how to make a POST request using urllib2 (encoding the data, etc.), but the problem is all the tutorials online use completely useless made-up example urls to show how to do it (someserver.com, coolsite.org, etc.), so I can't see the specific html that corresponds to the example code they use. Even python.org's own tutorial is totally useless in this regard.

I need to make a POST request to this url:

https://patentscope.wipo.int/search/en/search.jsf

The relevant part of the code is this (I think):

<form id="simpleSearchSearchForm" name="simpleSearchSearchForm" method="post" action="/search/en/search.jsf" enctype="application/x-www-form-urlencoded" style="display:inline">
<input type="hidden" name="simpleSearchSearchForm" value="simpleSearchSearchForm" />
<div class="rf-p " id="simpleSearchSearchForm:sSearchPanel" style="text-align:left;z-index:-1;"><div class="rf-p-hdr " id="simpleSearchSearchForm:sSearchPanel_header">

Or maybe it's this:

<input id="simpleSearchSearchForm:fpSearch" type="text" name="simpleSearchSearchForm:fpSearch" class="formInput" dir="ltr" style="width: 400px; height: 15px; text-align: left; background-image: url(&quot;https://patentscope.wipo.int/search/org.richfaces.resources/javax.faces.resource/org.richfaces.staticResource/4.5.5.Final/PackedCompressed/classic/org.richfaces.images/inputBackgroundImage.png&quot;); background-position: 1px 1px; background-repeat: no-repeat;">

If I want to encode JP2014084003 as the search term, what is the corresponding value in the html to use? input id? name? value?

Addendum: this answer does not answer my question, because it just repeats the information I've already looked at in the python docs page.

UPDATE:

I found this, and tried out the code in there, specifically:

import requestsheaders = {'User-Agent': 'Mozilla/5.0'}
payload = {'name':'simpleSearchSearchForm:fpSearch','value':'2014084003'}
link    = 'https://patentscope.wipo.int/search/en/search.jsf'
session = requests.Session()
resp    = session.get(link,headers=headers)
cookies = requests.utils.cookiejar_from_dict(requests.utils.dict_from_cookiejar(session.cookies))
resp    = session.post(link,headers=headers,data=payload,cookies =cookies)r = session.get(link)f = open('htmltext.txt','w')f.write(r.content)f.close()

I get a successful response (200) but the data, once again is simply the data in the original page, so I don't know whether I'm posting to the form correctly and there's something else I need to do to get it to return the data from the search results page, or if I'm still posting the data wrong.

And yes, I realize that this uses requests instead of urllib2, but all I want to be able to do is get the data.

Answer

This is not the most straight forward post request, if you look in developer tools or firebug you can see the formdata from a successful browser post:

enter image description here

All that is pretty straight forward bar the fact you see some : embedded in the keys which may be a bit confusing, simpleSearchSearchForm:commandSimpleFPSearch is the key and Search.

The only thing that you cannot hard code is javax.faces.ViewState, we need to make a request to the site and then parse that value which we can do with BeautifulSoup:

import requests
from bs4 import BeautifulSoupurl = "https://patentscope.wipo.int/search/en/search.jsf"data = {"simpleSearchSearchForm": "simpleSearchSearchForm","simpleSearchSearchForm:j_idt341": "EN_ALLTXT","simpleSearchSearchForm:fpSearch": "automata","simpleSearchSearchForm:commandSimpleFPSearch": "Search","simpleSearchSearchForm:j_idt406": "workaround"}
head = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"}with requests.Session() as s:# Get the cookies and the source to parse the Viewstate tokeninit = s.get(url)soup = BeautifulSoup(init.text, "lxml")val = soup.select_one("#j_id1:javax.faces.ViewState:0")["value"]# update post data dictdata["javax.faces.ViewState"] = valr = s.post(url, data=data, headers=head)print(r.text)

If we run the code above:

In [13]: import requestsIn [14]: from bs4 import BeautifulSoupIn [15]: url = "https://patentscope.wipo.int/search/en/search.jsf"In [16]: data = {"simpleSearchSearchForm": "simpleSearchSearchForm",....:         "simpleSearchSearchForm:j_idt341": "EN_ALLTXT",....:         "simpleSearchSearchForm:fpSearch": "automata",....:         "simpleSearchSearchForm:commandSimpleFPSearch": "Search",....:         "simpleSearchSearchForm:j_idt406": "workaround"}In [17]: head = {....:     "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"}In [18]: with requests.Session() as s:....:         init = s.get(url)....:         soup = BeautifulSoup(init.text, "lxml")....:         val = soup.select_one("#j_id1:javax.faces.ViewState:0")["value"]....:         data["javax.faces.ViewState"] = val....:         r = s.post(url, data=data, headers=head)....:         print("\n".join([s.text.strip() for s in BeautifulSoup(r.text,"lxml").select("span.trans-section")]))....:     Fuzzy genetic learning automata classifier
Fuzzy genetic learning automata classifier
FINITE AUTOMATA MANAGER
CELLULAR AUTOMATA MUSIC GENERATOR
CELLULAR AUTOMATA MUSIC GENERATOR
ANALOG LOGIC AUTOMATA
Incremental automata verification
Cellular automata music generator
Analog logic automata
Symbolic finite automata

You will see it matches the webpage. If you want to scrape sites you need to get familiar with developer tools/firebug etc.. to watch how the requests are made and then try to mimic. To get firebug open, right click on the page and select inspect element, click the network tab and submit your request. You just have to select the requests from the list then select whatever tab you want info on i.e params for out post request:

enter image description here

You may also find this answer useful on how to approach posting to a site.

https://en.xdnf.cn/q/120598.html

Related Q&A

How to fix Django NoReverseMatch Error

I have built a simple blog app with Django and I have had some issue with NoReverseMatch. Here are my problem screenshots: https://prnt.sc/imqx70 https://prnt.sc/imqwptHere is my code on Github: https:…

Selenium NoSuchElementException with driver.find_element_by_xpath

First question: How do I make python minimize chrome? Second question: When getting to the end page using the next button how do I tell python to go on.. and not give me an error?driver.get("htt…

Traverse Non-Binary Tree [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.Want to improve this question? Add details and clarify the problem by editing this post.Closed 9 years ago.Improve…

Muscle alignment in python

I have a problem with printing my output from muscle aligning in python. My code is:from Bio.Align.Applications import MuscleCommandline from StringIO import StringIO from Bio import AlignIOdef align_v…

why does my function is returning data type None??: Python datatype None

This is my python code for printing an absolute number. My function is returning type None. I am not getting what I have done wrong. Please help me. def n(num):if num<0:return (num*-1)no = input(&qu…

np.sum Not returning total counts

I am trying this code but it does not return total count for zero[x][y], in this case it should return 5 but all it displays 255 five time. THIS CODE IS FOR CONNECTED COMPONENTS AND ZERO IS ONE COMPONE…

Python Printing Dictionary Key and Value side by side

I want a program that prints Key and Value side by side for the following code:This is a Dictionary:d = {M: [Name1, Name2, Name3], F: [Name1,Name2,Name3]}I want the a program that prints in the followi…

can i use rusts match in python? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.This question does not appear to be about programming within the scope defined in the help center.Cl…

Consolidating Elements of a List [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.Want to improve this question? Update the question so it focuses on one problem only by editing this post.Closed 6…

html textarea via python using post function

<div><textarea cols="25" rows="12" name="box" id="box" tabindex="2">TEXT GOES HERE </textarea></div><div><p class=&quo…