Question 1

I wish to select some specific rows based on two column values. For example:

import numpy as np
import pandas as pd

d = {'user': [1., 2., 3., 4], 'item': [5., 6., 7., 8.], 'f1': [9., 16., 17., 18.], 'f2': [4, 5, 6, 5], 'f3': [4, 5, 5, 8]}
df = pd.DataFrame(d)
print(df)

Out:
   f1  f2  f3  item  user
0   9   4   4     5     1
1  16   5   5     6     2
2  17   6   5     7     3
3  18   5   8     8     4

I want to select the rows based on the values of 'user' and 'item'. Given a 2D NumPy array that stores the [user, item] value pairs:

samples = np.array([[1,5],[3,7],[3,7],[2,6]]) 
Out: 
array([[1, 5],[3, 7],[3, 7],[2, 6]])

Then the expected output is:

Out:
   f1  f2  f3  item  user
0   9   4   4     5     1
2  17   6   5     7     3
2  17   6   5     7     3
1  16   5   5     6     2

My final objective is then to get a 2D NumPy array that stores the values of all columns except item and user, which is:

Out: 
array([[9, 4, 4],[17, 6, 5],[17, 6, 5],[16, 5, 5]])

That is, the values of columns f1, f2 and f3.

How can I do this?

Answer

If you make samples a DataFrame with columns user and item, you can obtain the desired values with an inner join. By default, pd.merge joins on all columns that samples and df have in common, which in this case are user and item. Hence,

result = pd.merge(samples, df, how='inner')

yields

   user  item  f1  f2  f3
0     1     5   9   4   4
1     3     7  17   6   5
2     3     7  17   6   5
3     2     6  16   5   5
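A variant of the same merge, offered as a sketch rather than as part of the original answer, spells out the join keys with on= instead of relying on the shared-column default, and passes pd.merge's validate='many_to_one' option so pandas raises an error if any (user, item) pair occurred more than once in df (which would otherwise silently produce extra rows):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'user': [1., 2., 3., 4], 'item': [5., 6., 7., 8.],
                   'f1': [9., 16., 17., 18.], 'f2': [4, 5, 6, 5], 'f3': [4, 5, 5, 8]})
samples = pd.DataFrame(np.array([[1, 5], [3, 7], [3, 7], [2, 6]]),
                       columns=['user', 'item'])

# Join explicitly on the key columns; validate='many_to_one' checks that the
# (user, item) pairs are unique in df (the right side), while repeated pairs
# in samples (the left side) are allowed and simply repeat the matching row.
result = pd.merge(samples, df, on=['user', 'item'], how='inner',
                  validate='many_to_one')
print(result)
```

Note that an inner join drops any sample pair that has no match in df, so the output can have fewer rows than samples.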

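Putting it all together:

import numpy as np
import pandas as pd

d = {'user': [1., 2., 3., 4], 'item': [5., 6., 7., 8.], 'f1': [9., 16., 17., 18.], 'f2': [4, 5, 6, 5], 'f3': [4, 5, 5, 8]}
df = pd.DataFrame(d)
samples = np.array([[1, 5], [3, 7], [3, 7], [2, 6]])
# Turn the [user, item] pairs into a DataFrame so they can be merged with df
samples = pd.DataFrame(samples, columns=['user', 'item'])
# Inner join on the shared columns (user, item); matching rows of df repeat as needed
result = pd.merge(samples, df, how='inner')
# Keep only the feature columns and convert them to a NumPy array
result = result[['f1', 'f2', 'f3']]
result = result.values
print(result)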

yields

[[  9.   4.   4.]
 [ 17.   6.   5.]
 [ 17.   6.   5.]
 [ 16.   5.   5.]]
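An alternative that avoids the merge, sketched here as a suggestion and not taken from the original answer, is to index df by the (user, item) pair and look the samples up with .loc. Because set_index moves user and item into the index, the remaining columns are exactly "all columns except item and user", so they need not be listed by name. The sketch assumes each (user, item) pair is unique in df, and converts the integer samples to floats to match df's float-valued key columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'user': [1., 2., 3., 4], 'item': [5., 6., 7., 8.],
                   'f1': [9., 16., 17., 18.], 'f2': [4, 5, 6, 5], 'f3': [4, 5, 5, 8]})
samples = np.array([[1, 5], [3, 7], [3, 7], [2, 6]])

# Move the key columns into a MultiIndex; only f1, f2, f3 remain as columns.
indexed = df.set_index(['user', 'item'])

# One (user, item) tuple per sample row, as floats to match the index levels.
keys = [(float(u), float(i)) for u, i in samples]

# .loc with a list of tuples returns one row per key, in the given order,
# repeating a row whenever its key is repeated.
result = indexed.loc[keys].to_numpy()
print(result)
```

Unlike the inner merge, which silently drops unmatched pairs, recent pandas versions raise a KeyError from .loc when a requested pair is missing from df.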

Select pandas frame rows based on two columns values
