I tried to use async and await inside the stdout handler of a spawned child process, but it didn't work. Please see the details below.
Expected output
*************
http://www.stevecostellolaw.com/*************
http://www.stevecostellolaw.com/personal-injury.html
http://www.stevecostellolaw.com/personal-injury.html*************
http://www.stevecostellolaw.com/#
http://www.stevecostellolaw.com/#*************
http://www.stevecostellolaw.com/home.html
http://www.stevecostellolaw.com/home.html*************
http://www.stevecostellolaw.com/about-us.html
http://www.stevecostellolaw.com/about-us.html*************
http://www.stevecostellolaw.com/
http://www.stevecostellolaw.com/*************
I expect this because each time the handler for the spawned child process reaches await, it should go back to the Python script, so that ************* is printed and then the next URL. Ignore the fact that each URL is printed twice here.
Output I am actually getting
C:\Users\ASUS\Desktop\searchermc>node app.js
server running on port 3000
DevTools listening on ws://127.0.0.1:52966/devtools/browser/933c20c7-e295-4d84-a4b8-eeb5888ecbbf
[3020:120:0402/105304.190:ERROR:device_event_log_impl.cc(214)] [10:53:04.188] USB: usb_device_handle_win.cc:1056 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[3020:120:0402/105304.190:ERROR:device_event_log_impl.cc(214)] [10:53:04.189] USB: usb_device_handle_win.cc:1056 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)*************
http://www.stevecostellolaw.com/
http://www.stevecostellolaw.com/personal-injury.html
http://www.stevecostellolaw.com/personal-injury.html
http://www.stevecostellolaw.com/#
http://www.stevecostellolaw.com/#
http://www.stevecostellolaw.com/home.html
http://www.stevecostellolaw.com/home.html
http://www.stevecostellolaw.com/about-us.html
http://www.stevecostellolaw.com/about-us.html
http://www.stevecostellolaw.com/
http://www.stevecostellolaw.com/*************
Please see the app.js code below.
// form submit request
app.post('/formsubmit', function(req, res){
    csvData = req.files.csvfile.data.toString('utf8');
    filteredArray = cleanArray(csvData.split(/\r?\n/))
    csvData = get_array_string(filteredArray)
    csvData = csvData.trim()
    var keywords = req.body.keywords
    keywords = keywords.trim()
    // Send request to python script
    var spawn = require('child_process').spawn;
    var process = spawn('python', ["./webextraction.py", csvData, keywords, req.body.full_search])
    var outarr = []
    // process.stdout.on('data', (data) => {
    //     console.log(`stdout: ${data}`);
    // });
    process.stdout.on('data', async function(data){
        console.log("\n ************* ")
        console.log(data.toString().trim())
        await outarr.push(data.toString().trim())
        console.log("\n ************* ")
    });
});
This is the Python function that prints each URL when the if condition matches:
# Function for searching keyword start
def search_keyword(href, search_key):
    extension_list = ['mp3', 'jpg', 'exe', 'jpeg', 'png', 'pdf', 'vcf']
    if(href.split('.')[-1] not in extension_list):
        try:
            content = selenium_calling(href)
            soup = BeautifulSoup(content,'html.parser')
            search_string = re.sub("\s+"," ", soup.body.text)
            search_string = search_string.lower()
            res = [ele for ele in search_key if(ele.lower() in search_string)]
            outstr = getstring(res)
            outstr = outstr.lstrip(", ")
            if(len(res) > 0):
                print(href)
                found_results.append(href)
                href_key_dict[href] = outstr
                return 1
            else:
                notfound_results.append(href)
        except Exception as err:
            pass
I want to do all this because the Python script takes a long time to execute, so the request gives a timeout error each time. That is why I am thinking of getting the intermediate output of the Python script in my Node.js script. You can see the error I am getting in the image below.
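To make the goal concrete, here is a rough sketch of the behaviour I am after: handling each URL as soon as the Python script prints it. It rests on two assumptions. First, a 'data' event delivers an arbitrary chunk of bytes, not one line per print(), so the chunks have to be split into lines by hand. Second, Python buffers stdout when it is piped, so the script is started with the -u flag to make each print() arrive promptly. The names createLineSplitter, runExtraction and onUrl are hypothetical helpers made up for this sketch and are not part of my current app.js.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: turns arbitrary stdout chunks into complete lines
// and calls onLine once for every non-empty line.
function createLineSplitter(onLine) {
  let buffered = '';
  return function push(chunk) {
    buffered += chunk.toString();
    const parts = buffered.split(/\r?\n/);
    // The last part is an unfinished line (or ''), keep it for the next chunk.
    buffered = parts.pop();
    for (const line of parts) {
      const trimmed = line.trim();
      if (trimmed !== '') onLine(trimmed);
    }
  };
}

// Hypothetical wrapper around the spawn call from app.js.
function runExtraction(csvData, keywords, fullSearch, onUrl) {
  // '-u' asks Python not to buffer stdout (assumption: the script itself
  // does not already flush after each print).
  const child = spawn('python', ['-u', './webextraction.py', csvData, keywords, fullSearch]);
  child.stdout.on('data', createLineSplitter(onUrl));
  return child;
}
```

Inside the route handler this could be called as runExtraction(csvData, keywords, req.body.full_search, function(url){ outarr.push(url) }), so that outarr fills up one URL at a time while the Python script is still running.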