python - Efficiently select rows that match one of several values in Pandas DataFrame
problem
Given data in a pandas DataFrame like the following:
name     amount
---------------
alice    100
bob      50
charlie  200
alice    30
charlie  10
I want to select all rows where name is one of several values in a collection, {alice, bob}:
name     amount
---------------
alice    100
bob      50
alice    30
question
What is an efficient way to do this in pandas?
Options as I see them:
- Loop through the rows, handling the logic in Python
- Select once per name and merge the results, with many statements like the following:
    pd.concat([df[df.name == specific_name] for specific_name in names])  # combine the per-name selections
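The two options above can be sketched as runnable code against the example data. This is only an illustration of what the question describes, not a recommendation: the variable names (`looped`, `pieces`, `stitched`) are made up for the sketch, and `pd.concat` is assumed here as the "merge" step.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["alice", "bob", "charlie", "alice", "charlie"],
    "amount": [100, 50, 200, 30, 10],
})
names = {"alice", "bob"}

# Option 1: loop through the rows and apply the membership test in Python.
kept_labels = [label for label, row in df.iterrows() if row["name"] in names]
looped = df.loc[kept_labels]

# Option 2: select once per name, then stitch the pieces back together.
# sorted() only makes the iteration order deterministic; sort_index()
# restores the original row order, which concat alone does not preserve.
pieces = [df[df["name"] == specific_name] for specific_name in sorted(names)]
stitched = pd.concat(pieces).sort_index()
```

Option 1 runs the membership test in the Python interpreter once per row, while option 2 scans the whole column once per name, so the cost of each grows with a different dimension of the problem.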
What are the performance trade-offs here? When is one solution better than the others? What solutions am I missing?
While the example above uses strings, my actual job matches on 10-100 integers over millions of rows, so fast NumPy operations may be relevant.
You can use the isin Series method:
    In [11]: df['name'].isin(['alice', 'bob'])
    Out[11]:
    0     True
    1     True
    2    False
    3     True
    4    False
    Name: name, dtype: bool

    In [12]: df[df.name.isin(['alice', 'bob'])]
    Out[12]:
        name  amount
    0  alice     100
    1    bob      50
    3  alice      30
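For anyone who wants to reproduce the session, here is a self-contained sketch of the same `isin` selection, followed by an integer variant closer to the workload described in the question. The integer frame (`big`, its `key` column, and the `wanted` values) is invented for illustration, and `np.isin` on the underlying array is shown only as one possible NumPy-level equivalent; no timing claims are made.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["alice", "bob", "charlie", "alice", "charlie"],
    "amount": [100, 50, 200, 30, 10],
})

# Boolean mask: True where the name is one of the wanted values.
mask = df["name"].isin(["alice", "bob"])
selected = df[mask]   # rows that match
rest = df[~mask]      # inverting the mask gives the rows that do not

# Hypothetical integer workload: a million rows, 50 wanted integer keys.
rng = np.random.default_rng(0)
big = pd.DataFrame({"key": rng.integers(0, 1000, size=1_000_000)})
wanted = np.arange(0, 1000, 20)

# The same selection through pandas and through NumPy directly.
by_pandas = big[big["key"].isin(wanted)]
by_numpy = big[np.isin(big["key"].to_numpy(), wanted)]
```

Both integer selections return the same rows; which one is faster on real data is something to measure (for example with `%timeit`) rather than assume.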