Pandas Merge two DataFrames without some columns

2024/10/18 15:18:45

Context

I'm trying to merge two big CSV files together.

Problem

Let's say I have one Pandas DataFrame like the following...

EntityNum    foo   ...
------------------------
1001.01      100
1002.02       50
1003.03      200

And another one like this...

EntityNum    a_col    b_col
-----------------------------------
1001.01      alice        7  
1002.02        bob        8
1003.03        777        9

I'd like to join them like this:

EntityNum    foo    a_col
----------------------------
1001.01      100    alice
1002.02       50      bob
1003.03      200      777

Keep in mind that I don't want b_col in the final result. How do I accomplish this with Pandas?

In SQL, I would probably do something like:

SELECT t1.*, t2.a_col
FROM table_1 AS t1
LEFT JOIN table_2 AS t2
  ON t1.EntityNum = t2.EntityNum;

Search

I know it is possible to use merge. This is what I've tried:

import pandas as pd

df_a = pd.read_csv(path_a, sep=',')
df_b = pd.read_csv(path_b, sep=',')
df_c = pd.merge(df_a, df_b, on='EntityNum')

But I'm stuck on how to keep the unwanted columns, such as b_col, out of the final DataFrame.

Answer

You can first select the relevant columns from each DataFrame via their labels (e.g. df_a[['EntityNum', 'foo']]) and then merge those subsets:

df_a[['EntityNum', 'foo']].merge(df_b[['EntityNum', 'a_col']], on='EntityNum', how='left')

Note that the default behavior for merge is an inner join, which is why how='left' is passed explicitly here to mirror the SQL LEFT JOIN above.
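For illustration, here is a minimal, self-contained sketch of the same approach. The inline DataFrame construction is hypothetical, used only so the snippet runs with the sample data from the question; in practice df_a and df_b would come from read_csv:

import pandas as pd

# Sample data mirroring the question (built inline only so the
# example runs without the CSV files).
df_a = pd.DataFrame({'EntityNum': [1001.01, 1002.02, 1003.03],
                     'foo': [100, 50, 200]})
df_b = pd.DataFrame({'EntityNum': [1001.01, 1002.02, 1003.03],
                     'a_col': ['alice', 'bob', 777],
                     'b_col': [7, 8, 9]})

# Select only the columns you want from each side, then left-merge on the key.
df_c = df_a[['EntityNum', 'foo']].merge(df_b[['EntityNum', 'a_col']],
                                        on='EntityNum', how='left')
print(df_c)
#    EntityNum  foo  a_col
# 0    1001.01  100  alice
# 1    1002.02   50    bob
# 2    1003.03  200    777

An equivalent alternative is to merge everything and drop the unwanted column afterwards, e.g. df_a.merge(df_b, on='EntityNum', how='left').drop(columns=['b_col']), which can be simpler when only a column or two needs to be excluded.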
