[Python, 데이터분석] 데이터 프레임 합치기, 조인 예제(DataFrame Join, Merge)
| 데이터 프레임 조인하기(DataFrame Join)
판다스(Pandas)에서는 데이터 프레임간에 SQL문의 테이블 간 조인 연산처럼 데이터 프레임을 합칠 수 있는 기능을 지원합니다.
아래 그림에 나와있는 모든 조인 연산을 지원합니다.
<출처 : https://www.dofactory.com/sql/join>
| 데이터 프레임 조인 예제(DataFrame Join Example)
import numpy as np
import pandas as pd
df = pd.DataFrame([{'Name': 'Chris', 'Item Purchased': 'Sponge', 'Cost': 22.50},
{'Name': 'Kevyn', 'Item Purchased': 'Kitty Litter', 'Cost': 2.50},
{'Name': 'Filip', 'Item Purchased': 'Spoon', 'Cost': 5.00}],
index=['Store 1', 'Store 1', 'Store 2'])
df['Date'] = ['December 1', 'January 1', 'mid-May']
df['Delivered'] = True
df['Feedback'] = ['Positive', None, 'Negative']
print(df)
'''
Cost Item Purchased Name Date Delivered Feedback
Store 1 22.5 Sponge Chris December 1 True Positive
Store 1 2.5 Kitty Litter Kevyn January 1 True None
Store 2 5.0 Spoon Filip mid-May True Negative
'''
adf = df.reset_index() # 인덱스 재설정 원래 인덱스를 index column으로 빼고 0~N으로 인덱스 대체
adf['Date'] = pd.Series({0:'December 1', 2:'mid-May'})
print(adf)
'''
index Cost Item Purchased ... Date Delivered Feedback
0 Store 1 22.5 Sponge ... December 1 True Positive
1 Store 1 2.5 Kitty Litter ... NaN True None
2 Store 2 5.0 Spoon ... mid-May True Negative
'''
staff_df = pd.DataFrame([{'Name': 'Kelly', 'Role': 'Director of HR'},
{'Name': 'Sally', 'Role': 'Course liasion'},
{'Name': 'James', 'Role': 'Grader'}])
staff_df = staff_df.set_index('Name') # Name 컬럼을 인덱스로 뺌
print(staff_df)
'''
Role
Name
Kelly Director of HR
Sally Course liasion
James Grader
'''
student_df = pd.DataFrame([{'Name': 'James', 'School': 'Business'},
{'Name': 'Mike', 'School': 'Law'},
{'Name': 'Sally', 'School': 'Engineering'}])
student_df = student_df.set_index('Name')
print(student_df)
'''
School
Name
James Business
Mike Law
Sally Engineering
'''
# staff_df 와 student_df를 인덱스를 기준으로 outer_join
outer_join = pd.merge(staff_df, student_df, how='outer', left_index=True, right_index=True)
print(outer_join)
'''
Role School
Name
James Grader Business
Kelly Director of HR NaN
Mike NaN Law
Sally Course liasion Engineering
'''
# staff_df 와 student_df를 인덱스를 기준으로 inner_join
inner_join = pd.merge(staff_df, student_df, how='inner', left_index=True, right_index=True)
print(inner_join)
'''
Role School
Name
Sally Course liasion Engineering
James Grader Business
'''
# staff_df 와 student_df를 인덱스를 기준으로 left_join
left_join = pd.merge(staff_df, student_df, how='left', left_index=True, right_index=True)
print(left_join)
'''
Role School
Name
Kelly Director of HR NaN
Sally Course liasion Engineering
James Grader Busines
'''
# staff_df 와 student_df를 인덱스를 기준으로 right_join
right_join = pd.merge(staff_df, student_df, how='right', left_index=True, right_index=True)
print(right_join)
'''
Role School
Name
James Grader Business
Mike NaN Law
Sally Course liasion Engineering
'''
staff_df = staff_df.reset_index()
student_df = student_df.reset_index()
# Name 컬럼을 기준으로 left 조인
left_join2 = pd.merge(staff_df, student_df, how='left', left_on='Name', right_on='Name')
print(left_join2)
'''
Name Role School
0 Kelly Director of HR NaN
1 Sally Course liasion Engineering
2 James Grader Business
'''
staff_df = pd.DataFrame([{'First Name': 'Kelly', 'Last Name': 'Desjardins', 'Role': 'Director of HR'},
{'First Name': 'Sally', 'Last Name': 'Brooks', 'Role': 'Course liasion'},
{'First Name': 'James', 'Last Name': 'Wilde', 'Role': 'Grader'}])
student_df = pd.DataFrame([{'First Name': 'James', 'Last Name': 'Hammond', 'School': 'Business'},
{'First Name': 'Mike', 'Last Name': 'Smith', 'School': 'Law'},
{'First Name': 'Sally', 'Last Name': 'Brooks', 'School': 'Engineering'}])
# 복수의 컬럼을 기준으로 join
multiple_join = pd.merge(staff_df, student_df, how='inner', left_on=['First Name','Last Name'], \
right_on=['First Name','Last Name'])
print(multiple_join)
'''
First Name Last Name Role School
0 Sally Brooks Course liasion Engineering
'''
참고자료 : https://www.coursera.org/learn/python-data-analysis
출처: https://engkimbs.tistory.com/689?category=763908 [새로비]