Reading tab-delimited data without a header in pandas

2024/9/25 22:21:28

I'm having trouble using pandas to open tab-delimited data without headers.

My test data (actually contains 200 lines, of which I am showing the first 10):

Tag19184    CTAAC   hffef   1   a   36  -   chr1    10006   0   36M 36
Tag19184    CTAAC   hffef   1   a   36  -   chr1    10012   0   36M 36
Tag19184    CTAAC   hffef   1   a   36  -   chr1    10018   0   36M 36
Tag19184    CTAAC   hffef   1   a   36  -   chr1    10024   0   36M 36
Tag19184    CTAAC   hffef   1   a   36  -   chr1    10030   0   36M 36
Tag19184    CTAAC   hffef   1   a   36  -   chr1    10036   0   36M 36
Tag19184    CTAAC   hffef   1   a   36  -   chr1    10042   0   36M 36
Tag20198    CTAAC   hffef   1   a   36  -   chr1    10048   0   36M 36
Tag20198    CTAAC   hffef   1   a   36  -   chr1    10054   0   36M 36
Tag45093    CTAAC   hffef   1   a   36  -   chr1    10060   0   36M 36

My code:

import pandas as pd
df = pd.read_csv('in_test.txt', sep='\t', header=None)
print(df)

However, I get the following output, which I don't think I can use to further process data (?):

<class 'pandas.core.frame.DataFrame'>
Int64Index: 200 entries, 0 to 199
Data columns:
X.1     200  non-null values
X.2     200  non-null values
X.3     200  non-null values
X.4     200  non-null values
X.5     200  non-null values
X.6     200  non-null values
X.7     200  non-null values
X.8     200  non-null values
X.9     200  non-null values
X.10    200  non-null values
X.11    200  non-null values
X.12    200  non-null values
dtypes: int64(5), object(7)

The tutorial here suggests that print df should just give me the corresponding data frame. What am I doing wrong?

Answer

I think the file is being read correctly, but:

  1. The summary view is what older versions of pandas print for a large DataFrame (see the question: change pandas 0.13.0 "print dataframe" to print dataframe like in earlier versions), so updating pandas should solve it.
  2. You can use the IPython notebook, where DataFrames show up as HTML tables.
  3. You can use df.head(5) (similar to R's head) to get the first few rows and check that your DataFrame is correct.
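
As a quick check along the lines of point 3, here is a minimal sketch in Python 3 syntax. It assumes a recent pandas, and it rebuilds two rows of the question's sample in memory with real tab characters instead of reading in_test.txt; the names `rows` and `sample` are only for illustration.

```python
import io

import pandas as pd

# Two rows copied from the question's sample, joined with real tab characters.
# (Illustrative stand-in for in_test.txt, which is not available here.)
rows = [
    ["Tag19184", "CTAAC", "hffef", "1", "a", "36", "-", "chr1", "10006", "0", "36M", "36"],
    ["Tag19184", "CTAAC", "hffef", "1", "a", "36", "-", "chr1", "10012", "0", "36M", "36"],
]
sample = "\n".join("\t".join(row) for row in rows) + "\n"

# header=None tells pandas that the first line is data, not column names.
df = pd.read_csv(io.StringIO(sample), sep="\t", header=None)

print(df.shape)    # (number of rows, number of columns)
print(df.head(5))  # the first rows, rather than the summary view
print(df.dtypes)   # one inferred dtype per column
```

With the real file, passing 'in_test.txt' in place of the io.StringIO object should behave the same way. If the shape and dtypes look right, the data was parsed as intended and only the printed representation differed.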