How to get unique values from comma-separated strings

2024/10/5 20:41:52

I have data like this. The strings are separated by commas.

"India1,India2,myIndia     "
"Where,Here,Here   "
"Here,Where,India,uyete"
"AFD,TTT"

What I am trying to do is put them all in one column (one under the other), so it becomes this:

India1
India2
myIndia
Where
Here
Here
Here
Where
India
uyete
AFD
TTT

Then I keep only the unique ones, which leads to this:

India1
India2
myIndia
Where
Here
India
uyete
AFD
TTT
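
The two steps above (flatten, then keep the first occurrence of each value) can be sketched in plain Python. This is only a minimal sketch, assuming the rows are already in memory as strings without the surrounding quotes; the names `rows`, `flat` and `unique` are made up for illustration. It relies on `dict` keeping insertion order, which is guaranteed from Python 3.7 on.

```python
# Sample rows from the question, with the trailing spaces kept on purpose.
rows = [
    "India1,India2,myIndia     ",
    "Where,Here,Here   ",
    "Here,Where,India,uyete",
    "AFD,TTT",
]

# Step 1: split every row on commas and strip stray whitespace.
flat = [item.strip() for row in rows for item in row.split(",")]

# Step 2: drop duplicates while keeping first-seen order.
unique = list(dict.fromkeys(flat))

print("\n".join(unique))
```

A plain `set(flat)` would also remove the duplicates, but it does not keep the original order.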

I have the original data in a .txt file, and I have tried to use numpy for this.

This is my code

#!/usr/bin/python
import numpy as np
# give a name to my data
file_name = 'path to my data/test.txt'
# set my output
with open ( 'output.txt' , 'w' ) as out:
    # read all the lines
    for n , line in enumerate ( open ( file_name ).readlines ( ) ):
        # split each string from another one by a comma
        item1 = file_name.split ( ',' )
        myList = ','.join ( map ( str , item1 ) )
        item2 = np.unique ( myList , return_inverse=True )
        # save the data into out
        out.write ( item2 )

I was getting TypeError: expected a character buffer object

I searched for it and found several posts, such as "TypeError: expected a character buffer object - while trying to save integer to textfile".

Adding out.seek ( 0 ) gave the same error.

Changing the call to out.write ( str(item2 )), thanks to the post "TypeError: expected a character buffer object", removes the error. However, the output shows this:

(array(['/path to the file/test.txt'], dtype='|S29'), array([0]))
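
The output is the path itself because the loop calls `file_name.split(',')`, which splits the path string rather than `line`, so `np.unique` only ever sees that one string. The earlier TypeError comes from passing a non-string to `write()`. A possible fix is sketched below without NumPy, assuming each line is wrapped in double quotes as in the sample. The helper name `unique_items` is hypothetical, and a temporary directory stands in for the real paths.

```python
import os
import tempfile


def unique_items(in_path, out_path):
    """Write each distinct comma-separated item of in_path to out_path, one per line."""
    seen = {}  # dict keys keep insertion order (Python 3.7+)
    with open(in_path) as src:
        for line in src:
            # Split the line itself (not the file name) after removing
            # the newline and the surrounding double quotes.
            for item in line.strip().strip('"').split(","):
                item = item.strip()
                if item:
                    seen.setdefault(item, None)
    with open(out_path, "w") as out:
        # write() needs a string, so join the items first.
        out.write("\n".join(seen) + "\n")
    return list(seen)


# Demo with the sample data written to a temporary directory.
sample = (
    '"India1,India2,myIndia     "\n'
    '"Where,Here,Here   "\n'
    '"Here,Where,India,uyete"\n'
    '"AFD,TTT"\n'
)
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "test.txt")
    out_path = os.path.join(tmp, "output.txt")
    with open(in_path, "w") as f:
        f.write(sample)
    result = unique_items(in_path, out_path)
    with open(out_path) as f:
        written = f.read()
print(result)
```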

Below is a solution that I tried to use:

import csv
data = []
def remove_quotes(file):
    for line in file:
        yield line.strip ( '"\n' )
with open ( 'test.txt' ) as f:
    reader = csv.reader ( remove_quotes ( f ) )
    for row in reader:
        data.extend ( row )

There is no error, but no data is generated either.
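
One likely reason nothing appears is that this snippet only fills `data` and never prints or writes it. Below is a minimal sketch of how the same approach could be finished, assuming the sample lines from the question. `io.StringIO` stands in for `test.txt` so that the example is self-contained.

```python
import csv
import io


def remove_quotes(lines):
    # Drop the surrounding double quotes and the newline from each line.
    for line in lines:
        yield line.strip('"\n')


# Stand-in for open('test.txt'); the content mirrors the question's sample.
f = io.StringIO(
    '"India1,India2,myIndia     "\n'
    '"Where,Here,Here   "\n'
    '"Here,Where,India,uyete"\n'
    '"AFD,TTT"\n'
)

data = []
for row in csv.reader(remove_quotes(f)):
    # Strip the padding that was inside the quotes.
    data.extend(field.strip() for field in row)

# Keep the first occurrence of each value, then actually show the result.
unique = list(dict.fromkeys(data))
print("\n".join(unique))
```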

Answer

stack.txt below contains this:

"India1,India2,myIndia"
"Where,Here,Here"
"Here,Where,India,uyete"
"AFD,TTT"

Here you go:

from collections import OrderedDict
with open("stack.txt", "r") as f:
    # read your data in from the gist site and strip off any new-line characters
    data = [eval(line.strip()) for line in f.readlines()]
# get individual words into a list
individual_elements = [word for row in data for word in row.split(",")]
# remove duplicates and preserve order
uniques = OrderedDict.fromkeys(individual_elements)
# convert from OrderedDict object to plain list
final = [word for word in uniques]
print(final)

Which yields this:

['India1', 'India2', 'myIndia', 'Where', 'Here', 'India', 'uyete', 'AFD', 'TTT']

Edit: To get your desired output, just print the list in the format you want:

print("\n".join(final))

Which is equivalent, from an output standpoint, to this:

for x in final:
    print(x)

Which yields this:

India1
India2
myIndia
Where
Here
India
uyete
AFD
TTT
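
The answer uses `eval` to remove the quotes, which executes whatever text the file contains. A hedged variant, assuming every line is a valid double-quoted Python string literal as in `stack.txt`, swaps in `ast.literal_eval` and a plain `dict` (insertion-ordered from Python 3.7 on) in place of `OrderedDict`. The inline `lines` list stands in for the file.

```python
import ast

# Stand-in for the lines of stack.txt.
lines = [
    '"India1,India2,myIndia"\n',
    '"Where,Here,Here"\n',
    '"Here,Where,India,uyete"\n',
    '"AFD,TTT"\n',
]

# literal_eval only parses literals, so it cannot run arbitrary code.
data = [ast.literal_eval(line.strip()) for line in lines]

# Flatten into individual words, then keep the first occurrence of each.
individual_elements = [word for row in data for word in row.split(",")]
final = list(dict.fromkeys(individual_elements))
print(final)
```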
