Question 1

I'm parsing data about car production with BeautifulSoup (see also my first question):

from bs4 import BeautifulSoup
import string

html = """
<h4>Production Capacity (year)</h4><div class="profile-area">Vehicle 1,140,000 units /year</div>
<h4>Output</h4><div class="profile-area">Vehicle 809,000 units ( 2016 ) </div><div class="profile-area">Vehicle 815,000 units ( 2015 ) </div><div class="profile-area">Vehicle 836,000 units ( 2014 ) </div><div class="profile-area">Vehicle 807,000 units ( 2013 ) </div><div class="profile-area">Vehicle 760,000 units ( 2012 ) </div><div class="profile-area">Vehicle 805,000 units ( 2011 ) </div>
"""
soup = BeautifulSoup(html, 'lxml')

for item in soup.select("div.profile-area"):
    produkz = item.text.strip()
    produkz = produkz.replace('\n', ':')
    prev_h4 = str(item.find_previous_sibling('h4'))
    if "Models" in prev_h4:
        models = produkz
    else:
        models = ""
    if "Capacity" in prev_h4:
        capacity = produkz
    else:
        capacity = ""
    if "( 2015 )" in produkz:
        prod15 = produkz
    else:
        prod15 = ""
    if "( 2016 )" in produkz:
        prod16 = produkz
    else:
        prod16 = ""
    if "( 2017 )" in produkz:
        prod17 = produkz
    else:
        prod17 = ""
    print(models + ';' + capacity + ';' + prod15 + ';' + prod16 + ';' + prod17)

My problem is that each pass of the loop over the matching elements ("div.profile-area") resets the variables, so the values from earlier elements are lost and every element is printed on its own line:

;Vehicle 1,140,000 units /year;;;;;;
;;;;;;Vehicle 809,000 units ( 2016 );
;;;;;Vehicle 815,000 units ( 2015 );;
;;;;Vehicle 836,000 units ( 2014 );;;
;;;Vehicle 807,000 units ( 2013 );;;;
;;Vehicle 760,000 units ( 2012 );;;;;
;;;;;;;

My desired result is:

;Vehicle 1,140,000 units /year;Vehicle 760,000 units ( 2012 );Vehicle 807,000 units ( 2013 );Vehicle 836,000 units ( 2014 );Vehicle 815,000 units ( 2015 );Vehicle 809,000 units ( 2016 );

I would be glad if you could show me a better way to structure my code. Thanks in advance.

Answer 1

This is my solution. You need to handle each element tag and parse its text accordingly. I went a bit beyond your question and show a more flexible way to access each data value. Hope it helps.

import re
from bs4 import BeautifulSoup

html_doc = """
<h4>Production Capacity (year)</h4><div class="profile-area">Vehicle 1,140,000 units /year</div>
<h4>Output</h4><div class="profile-area">Vehicle 809,000 units ( 2016 ) </div><div class="profile-area">Vehicle 815,000 units ( 2015 ) </div><div class="profile-area">Vehicle 836,000 units ( 2014 ) </div><div class="profile-area">Vehicle 807,000 units ( 2013 ) </div><div class="profile-area">Vehicle 760,000 units ( 2012 ) </div><div class="profile-area">Vehicle 805,000 units ( 2011 ) </div>"""

soup = BeautifulSoup(html_doc, 'html.parser')
h4_elements = soup.find_all('h4')
profile_areas = soup.find_all('div', attrs={'class': 'profile-area'})
print('\n')
print("++++++++++++++++++++++++++++++++++++")
print("Element counts")
print("++++++++++++++++++++++++++++++++++++")
print("Total H4: {}".format(len(h4_elements)))
print("++++++++++++++++++++++++++++++++++++")
print("Total profile-area: {}".format(len(profile_areas)))
print("++++++++++++++++++++++++++++++++++++")
print('\n')

for i in h4_elements:
    print("++++++++++++++++++++++++++++++++++++")
    print(i.text.rstrip().lstrip())
    print("++++++++++++++++++++++++++++++++++++")
    del profile_areas[0]
    for j in profile_areas:
        raw = re.sub('[^A-Za-z0-9]+', ' ', j.text.replace(',', '').lstrip().rstrip())
        raw = raw.rstrip()
        el = raw.split(' ')
        print('Type: {} '.format(el[0]))
        print('Sold: {} {} '.format(el[1], el[2]))
        print('Year: {} '.format(el[3]))
        print("++++++++++++++++++++++++++++++++++++")

The output is the following:

++++++++++++++++++++++++++++++++++++
Production Capacity (year)
++++++++++++++++++++++++++++++++++++
Type: Vehicle
Sold: 809000 units
Year: 2016
++++++++++++++++++++++++++++++++++++
Type: Vehicle
Sold: 815000 units
Year: 2015
++++++++++++++++++++++++++++++++++++
Type: Vehicle
Sold: 836000 units
Year: 2014
++++++++++++++++++++++++++++++++++++
Type: Vehicle
Sold: 807000 units
Year: 2013
++++++++++++++++++++++++++++++++++++
Type: Vehicle
Sold: 760000 units
Year: 2012
++++++++++++++++++++++++++++++++++++
Type: Vehicle
Sold: 805000 units
Year: 2011
++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++
Output
++++++++++++++++++++++++++++++++++++
Type: Vehicle
Sold: 815000 units
Year: 2015
++++++++++++++++++++++++++++++++++++
Type: Vehicle
Sold: 836000 units
Year: 2014
++++++++++++++++++++++++++++++++++++
Type: Vehicle
Sold: 807000 units
Year: 2013
++++++++++++++++++++++++++++++++++++
Type: Vehicle
Sold: 760000 units
Year: 2012
++++++++++++++++++++++++++++++++++++
Type: Vehicle
Sold: 805000 units
Year: 2011
++++++++++++++++++++++++++++++++++++
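
The listing above prints every value, but it does not build the single semicolon-separated row asked for in the question, and the figures are not tied to the h4 they actually follow. One possible way to get that row is to collect the values while looping and to print only once after the loop. The following is a minimal sketch under a few assumptions: the helper name build_row and the YEARS list are made up for illustration, the section names "Models", "Capacity" and "Output" are taken from the question, and html.parser is used instead of lxml only to avoid an extra dependency.

```python
import re
from bs4 import BeautifulSoup

html = """
<h4>Production Capacity (year)</h4><div class="profile-area">Vehicle 1,140,000 units /year</div>
<h4>Output</h4><div class="profile-area">Vehicle 809,000 units ( 2016 ) </div>
<div class="profile-area">Vehicle 815,000 units ( 2015 ) </div>
<div class="profile-area">Vehicle 836,000 units ( 2014 ) </div>
<div class="profile-area">Vehicle 807,000 units ( 2013 ) </div>
<div class="profile-area">Vehicle 760,000 units ( 2012 ) </div>
<div class="profile-area">Vehicle 805,000 units ( 2011 ) </div>
"""

# Hypothetical column selection: the years shown in the desired result.
YEARS = ["2012", "2013", "2014", "2015", "2016"]


def build_row(markup, years=YEARS):
    """Collect all values first, then assemble one semicolon-separated row."""
    soup = BeautifulSoup(markup, "html.parser")
    models = ""
    capacity = ""
    output_by_year = {}

    for div in soup.find_all("div", class_="profile-area"):
        text = div.get_text(strip=True)
        # The nearest preceding <h4> sibling names the section of this div.
        heading = div.find_previous_sibling("h4")
        title = heading.get_text(strip=True) if heading else ""

        if "Models" in title:
            models = text
        elif "Capacity" in title:
            capacity = text
        elif "Output" in title:
            # Pull the four-digit year out of "( 2016 )" and key the text by it.
            match = re.search(r"\(\s*(\d{4})\s*\)", text)
            if match:
                output_by_year[match.group(1)] = text

    # Missing values become empty fields, so the column count stays fixed.
    fields = [models, capacity] + [output_by_year.get(y, "") for y in years]
    return ";".join(fields) + ";"


row = build_row(html)
print(row)
```

Because the variables are assigned only when a matching div is found and are never reset inside the loop, a later div no longer wipes out an earlier value. Keying the output figures by year also makes the column order independent of the order of the divs in the page.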

Python: How to access and iterate over a list of div class element using (BeautifulSoup)
