I'm reading large CSV files into pandas, some of them with string columns thousands of characters long. Is there a quick way to limit the width of a column, i.e. keep only the first 100 characters?
If you can read the whole thing into memory, you can use the .str
accessor for vectorized string operations:
>>> df = pd.read_csv("toolong.csv")
>>> df
   a                       b  c
0  1  1256378916212378918293  2

[1 rows x 3 columns]
>>> df["b"] = df["b"].str[:10]
>>> df
   a           b  c
0  1  1256378916  2

[1 rows x 3 columns]
Also note that you can get a Series with lengths using
>>> df["b"].str.len()
0 10
Name: b, dtype: int64
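For anyone who wants to try this without a file on disk, here is a minimal self-contained sketch of the same idea. The inline CSV text, the `max_width` name, and the `dtype={"b": str}` argument are my own assumptions for illustration; they are not part of the original `toolong.csv` example.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file with one long string column.
csv_text = "a,b,c\n1,abcdefghijklmnopqrstuvwxyz,2\n3,short,4\n"

# Force column "b" to be read as text so the .str accessor is available.
df = pd.read_csv(io.StringIO(csv_text), dtype={"b": str})

max_width = 10  # assumed limit; the question asks for 100

# Vectorized slice: keep at most the first max_width characters of each value.
df["b"] = df["b"].str[:max_width]

# Series of string lengths after truncation, useful as a sanity check.
lengths = df["b"].str.len()
```

Values shorter than the limit are left unchanged, so the slice is safe to apply to the whole column.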
I was originally wondering if
>>> pd.read_csv("toolong.csv", converters={"b": lambda x: x[:5]})
   a      b  c
0  1  12563  2

[1 rows x 3 columns]
would be better, but I don't actually know whether the converters are called row-by-row or after the fact on the whole column.
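One way to check what a converter actually receives is to record its arguments. The following is only a rough experiment, not a statement about pandas internals: the inline CSV text, the `truncate` helper, and the `seen` list are hypothetical names I made up, and the exact call order may depend on the pandas version and parser engine.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file with one long string column.
csv_text = "a,b,c\n1,abcdefghijklmnopqrstuvwxyz,2\n3,short,4\n"

seen = []  # every argument the converter is called with


def truncate(value):
    """Record the raw argument, then keep only its first 5 characters."""
    seen.append(value)
    return value[:5]


df = pd.read_csv(io.StringIO(csv_text), converters={"b": truncate})

# Inspect what the converter was handed: individual cells or a whole column?
print(seen)
print(df)
```

If `seen` contains individual strings rather than a Series, the converter is being applied cell by cell, which means a Python-level call per value; the `.str[:n]` slice after loading avoids writing that per-value function yourself.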