I am trying to read the text data from the URL in the code below, but it throws an error:
ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2
url="https://cdn.upgrad.com/UpGrad/temp/d934844e-5182-4b58-b896-4ba2a499aa57/companies.txt"
c=pd.read_csv(url, encoding='utf-8')
It seems there were some issues, possibly encoding-related, with pd.read_csv(): it never split the rows into columns. Splitting the text manually works:
#!/usr/bin/env python3
import sys

import requests
import pandas as pd

url = "https://cdn.upgrad.com/UpGrad/temp/d934844e-5182-4b58-b896-4ba2a499aa57/companies.txt"
r = requests.get(url)
df = None
if r.status_code == 200:
    # Split the response into lines, then each line into tab-separated fields.
    rows = r.text.split('\r\n')
    header = rows[0].split('\t')
    data = []
    for n in range(1, len(rows)):
        cols = rows[n].split('\t')
        data.append(cols)
    df = pd.DataFrame(columns=header, data=data)
else:
    print("error: unable to load {}".format(url))
    sys.exit(-1)
print(df.shape)
print(df.head(2))

$ ./test.py
(66369, 10)
permalink name homepage_url category_list status country_code state_code region city founded_at
0 /Organization/-Fame #fame http://livfame.com Media operating IND 16 Mumbai Mumbai
1 /Organization/-Qounter :Qounter http://www.qounter.com Application Platforms|Real Time|Social Network... operating USA DE DE - Other Delaware City 04-09-2014
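
The manual split on '\t' above suggests the file is tab-delimited. If so, pd.read_csv may be able to parse it directly when given sep='\t', since its default separator is a comma. A minimal sketch under that assumption: the sample text is built from the two rows printed above instead of being downloaded, and `sample` and `df_sample` are illustrative names only.

```python
import io

import pandas as pd

# Small tab-delimited sample assembled from the two rows shown in the output
# above (assumption: the real file uses the same layout).
sample = "\n".join([
    "permalink\tname\thomepage_url\tcategory_list\tstatus\t"
    "country_code\tstate_code\tregion\tcity\tfounded_at",
    "/Organization/-Fame\t#fame\thttp://livfame.com\tMedia\toperating\t"
    "IND\t16\tMumbai\tMumbai\t",
    "/Organization/-Qounter\t:Qounter\thttp://www.qounter.com\t"
    "Application Platforms|Real Time|Social Network\toperating\t"
    "USA\tDE\tDE - Other\tDelaware City\t04-09-2014",
])

# sep='\t' tells pandas to split fields on tabs instead of commas.
df_sample = pd.read_csv(io.StringIO(sample), sep="\t")

print(df_sample.shape)
print(df_sample.head(2))
```

For the real file the analogous call would be pd.read_csv(url, sep='\t'). This is untested against the actual URL; if it raises a decoding error, a different encoding argument may be needed.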