Python: handling a large set of data. Scipy or Rpy? And how?

2024/10/1 5:25:02

In my python environment, the Rpy and Scipy packages are already installed.

The problem I want to tackle is such:

1) A huge set of financial data are stored in a text file. Loading into Excel is not possible

2) I need to sum a certain fields and get the totals.

3) I need to show the top 10 rows based on the totals.

Which package (Scipy or Rpy) is best suited for this task?

If so, could you provide me some pointers (e.g. documentation or online example) that can help me to implement a solution?

Speed is a concern. Ideally scipy and Rpy can handle the large files when even when the files are so large that they cannot be fitted into memory

Answer

Neither Rpy or Scipy is necessary, although numpy may make it a bit easier. This problem seems ideally suited to a line-by-line parser. Simply open the file, read a row into a string, scan the row into an array (see numpy.fromstring), update your running sums and move to the next line.

https://en.xdnf.cn/q/71000.html

Related Q&A

Jupyter notebook - cant import python functions from other folders

I have a Jupyter notebook, I want to use local python functions from other folders in my computer. When I do import to these functions I get this error: "ModuleNotFoundError: No module named xxxxx…

Can pandas plot a time-series without trying to convert the index to Periods?

When plotting a time-series, I observe an unusual behavior, which eventually results in not being able to format the xticks of the plot. It seems that pandas internally tries to convert the index into …

pip install syntax for allowing insecure

I tried to run$pip install --upgrade --allow-insecure setuptoolsbut it doesnt seem to work? is my syntax wrong?this is on ubuntu 13.10 I need --allow-insecure as I havent been able to the get the co…

how do I determine the locations of the points after perspective transform, in the new image plane?

Im using OpenCV+Python+Numpy and I have three points in the image, I know the exact locations of those points.(P1, P2);N1I am going to transform the image to another view, (for example I am transformin…

How to do a simple Gaussian mixture sampling and PDF plotting with NumPy/SciPy?

I add three normal distributions to obtain a new distribution as shown below, how can I do sampling according to this distribution in python?import matplotlib.pyplot as plt import scipy.stats as ss im…

Python dict.get() or None scenario [duplicate]

This question already has answers here:Truth value of a string in python(4 answers)Closed 7 years ago.I am attempting to access a dictionarys values based on a list of keys I have. If the key is not pr…

p-values from ridge regression in python

Im using ridge regression (ridgeCV). And Ive imported it from: from sklearn.linear_model import LinearRegression, RidgeCV, LarsCV, Ridge, Lasso, LassoCVHow do I extract the p-values? I checked but rid…

AutoTokenizer.from_pretrained fails to load locally saved pretrained tokenizer (PyTorch)

I am new to PyTorch and recently, I have been trying to work with Transformers. I am using pretrained tokenizers provided by HuggingFace.I am successful in downloading and running them. But if I try to…

How to scroll down in an instagram pop-up frame with Selenium

I have a python script using selenium to go to a given Instagram profile and iterate over the users followers. On the instagram website when one clicks to see the list of followers, a pop-up opens with…

Get starred messages from GMail using IMAP4 and python

I found many dummy info about working with IMAP, but I didnt understand how to use it for my purposes. I found how I can get ALL messages from mailbox and ALL SEEN messages, but how should I work with …