Removing Characters from Python Output

2024/10/5 14:53:26

I did a lot of work trying to remove characters such as u, u', u", [, (, ), /, ' and " from my Spark Python output, because they are causing problems for my further work. Please focus on that.

I have input like this:

(u"(u'[25145,   12345678'", 0.0)
(u"(u'[25146,   25487963'", 43.0)

When I applied my code to sum the results, it gave me output like this:

(u'(u"(u\'[54879,    5125478\'"', 0.0)
(u"(u'[25145,   25145879'", 11.0)
(u'(u"(u\'[56897,    22548793\'"', 0.0)

So I want to remove all the characters like (u'(u"(u\'["'').

I want output like

54879,5125478,0.0
25145,25145879,11.0

The code I tried is:

from pyspark import SparkContext
import os
import sys

sc = SparkContext("local", "aggregate")
file1 = sc.textFile("hdfs://localhost:9000/data/first/part-00000")
file2 = sc.textFile("hdfs://localhost:9000/data/second/part-00000")
file3 = file1.union(file2).coalesce(1).map(lambda line: line.split(','))
result = file3.map(lambda x: ((x[0]+', '+x[1],float(x[2][:-1])))).reduceByKey(lambda a,b:a+b).coalesce(1)
result.saveAsTextFile("hdfs://localhost:9000/Test1")

Answer

I think your only problem is that you have to reformat your result before saving it to the file, i.e. something like:

result.map(lambda x:x[0]+','+str(x[1])).saveAsTextFile("hdfs://localhost:9000/Test1")
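
The extra characters come from writing tuples as text: each element is saved as its string representation, so the parentheses and quotes of the tuple end up in the file (and Python 2 adds the u prefix for unicode strings). If the input files already contain such lines, the key also has to be cleaned when reading them back. A minimal sketch of that idea in plain Python follows; clean_line and format_pair are hypothetical helper names, not part of the original code, and the regular expression assumes every line holds exactly two non-negative integer ids followed by one plain decimal value.

```python
import re

# Assumption: ids and values are non-negative and written without exponents,
# e.g. 25145 or 11.0. Negative or scientific-notation numbers would need a
# different pattern.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def clean_line(line):
    """Turn a line such as (u"(u'[25145,   12345678'", 0.0) into (key, value)."""
    first, second, value = _NUMBER.findall(line)
    return first + "," + second, float(value)


def format_pair(pair):
    """Build the plain text line to save, instead of the tuple's repr."""
    key, total = pair
    return key + "," + str(total)


# What gets written when a tuple is saved as-is: parentheses and quotes included.
raw = str(("25145,25145879", 11.0))

# What gets written after cleaning the input line and formatting the result.
pair = clean_line('(u"(u\'[25145,   25145879\'", 11.0)')
print(raw)
print(format_pair(pair))
```

In the Spark job these helpers would presumably replace the split and the final save, along the lines of file1.union(file2).map(clean_line).reduceByKey(lambda a, b: a + b).map(format_pair).saveAsTextFile(...), so that both the keys and the saved lines stay free of the tuple punctuation.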
