Truncating column width in pandas

2024/10/13 13:21:04

I'm reading large CSV files into pandas, and some of them have string columns that are thousands of characters wide. Is there a quick way to limit the width of a column, i.e. keep only the first 100 characters?

Answer

If you can read the whole thing into memory, you can use the .str accessor, which applies string operations to the whole column at once:

>>> import pandas as pd
>>> df = pd.read_csv("toolong.csv")
>>> df
   a                       b  c
0  1  1256378916212378918293  2

[1 rows x 3 columns]
>>> df["b"] = df["b"].str[:10]
>>> df
   a           b  c
0  1  1256378916  2

[1 rows x 3 columns]
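For a version that runs without the file, here is a minimal sketch of the same slice. The in-memory CSV text stands in for toolong.csv, and max_width is a made-up name for the limit; passing dtype=str for the column is my assumption to make sure the values are read as text, since the .str accessor only works on string data.

```python
import io

import pandas as pd

# Stand-in for "toolong.csv"; the contents are invented for illustration.
csv_text = "a,b,c\n1,1256378916212378918293,2\n"

# Read column "b" as text so that the .str accessor can be used on it.
df = pd.read_csv(io.StringIO(csv_text), dtype={"b": str})

max_width = 10  # hypothetical limit; the question asks for 100
df["b"] = df["b"].str[:max_width]  # keep only the first max_width characters

print(df)
```

Values shorter than the limit are left unchanged by the slice, so the same line is safe to run on columns of mixed width.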

Also note that you can get a Series of the string lengths using:

>>> df["b"].str.len()
0    10
Name: b, dtype: int64
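The lengths are handy for checking how wide a column really is before and after truncating. A small sketch on made-up data (the Series contents and the 100-character limit are assumptions for illustration):

```python
import pandas as pd

# Invented data: one short value, one very long value, one missing value.
s = pd.Series(["short", "x" * 250, None], name="b")

lengths = s.str.len()             # the missing value stays missing
longest_before = int(lengths.max())   # max() skips missing values

truncated = s.str[:100]           # slicing also leaves the missing value alone
longest_after = int(truncated.str.len().max())

print(longest_before, longest_after)
```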

I was originally wondering if

>>> pd.read_csv("toolong.csv", converters={"b": lambda x: x[:5]})
   a      b  c
0  1  12563  2

[1 rows x 3 columns]

would be better, but I don't actually know whether the converters are called row by row or after the fact on the whole column.
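One way to find out is to record what the converter receives. The sketch below uses invented CSV text and a hypothetical helper, first_five, that logs each argument before slicing it. If the converter is applied per cell, the log ends up with one raw string per data row rather than a single column-like object.

```python
import io

import pandas as pd

# Invented three-row file standing in for "toolong.csv".
csv_text = "a,b,c\n1,abcdefghij,2\n3,klmnopqrst,4\n5,uvwxyz,6\n"

calls = []  # every argument the converter is called with


def first_five(value):
    """Hypothetical converter: log the argument, then keep 5 characters."""
    calls.append(value)
    return value[:5]


df = pd.read_csv(io.StringIO(csv_text), converters={"b": first_five})

print(len(calls), [type(v).__name__ for v in calls])
print(df["b"].tolist())
```

A per-cell Python call is usually slower than the vectorized .str slice on a large file, though it does avoid keeping the full-width strings in the resulting DataFrame.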

