
Data Visualization_10 Chart Types_Bubble chart

km1n 2022. 1. 6. 17:09

Types of Python visualization charts

1. Column/Bar chart
2. Dual Axis, Pareto chart
3. Pie chart
4. Line chart
5. Scatter chart
6. Bubble chart
7. Heat map
8. Histogram
9. Box plot
10. Geo chart

 

 

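Example scenario: for an emergency at the 2017 Boston Marathon, where should 5 ambulances be stationed 2 hours after the start?
Let's find which of the 8 checkpoints (5K, 10K, 15K, 20K, 25K, 30K, 35K, 40K) are the best candidates.
The goal is to find the points where the most runners are gathered.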

 

import numpy as np
import pandas as pd

# Load the 2017 results and drop the columns that are not needed
marathon_results_2017 = pd.read_csv('./data_boston/marathon_results_2017.csv')
marathon_results_2017.info()
marathon_2017 = marathon_results_2017.drop(['Unnamed: 0', 'Bib', 'Citizen', 'Unnamed: 9', 'Proj Time'], axis='columns')
marathon_2017.head()

 

Converting to seconds: to_timedelta

# Apply pd.to_timedelta(), then .astype('m8[s]').astype(np.int64)
# -> convert each time to seconds, then cast to int
time_columns = ['5K', '10K', '15K', '20K', 'Half', '25K', '30K', '35K', '40K', 'Pace', 'Official Time']
for col in time_columns:
    marathon_2017[col] = pd.to_timedelta(marathon_2017[col]).astype('m8[s]').astype(np.int64)
marathon_2017.info()
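
To make the conversion step easier to check in isolation, here is a minimal sketch on a tiny hand-made table. The `sample` values and the `to_seconds` helper are hypothetical and are not taken from the real dataset; the sketch uses `.dt.total_seconds()`, which is an alternative spelling of the same conversion, not the expression used above.

```python
import pandas as pd

# Hypothetical split times in 'H:MM:SS' form (illustration only)
sample = pd.DataFrame({
    '5K': ['0:15:25', '0:16:02'],
    '10K': ['0:30:28', '0:32:10'],
})

def to_seconds(column: pd.Series) -> pd.Series:
    """Parse time strings into timedeltas and return whole seconds as int64."""
    return pd.to_timedelta(column).dt.total_seconds().astype('int64')

# apply() runs the helper once per column
sample_seconds = sample.apply(to_seconds)
print(sample_seconds)
```

After this step every split is a plain integer, so it can be compared directly with a threshold such as 7200 seconds.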

 

Let's work out each runner's position (Latitude, Longitude) about 2 hours in, and see where the runners are concentrated.

check_time = 7200    # 2 hours, in seconds
Lat = 0  # initial value
Long = 0 # initial value
Location = ''

# [Latitude, Longitude] of the 5K, 10K, 15K, 20K, 25K, 30K, 35K, 40K checkpoints
points = [[42.247835,-71.474357], [42.274032,-71.423979], [42.282364,-71.364801], [42.297870,-71.284260],
          [42.324830,-71.259660], [42.345680,-71.215169], [42.352089,-71.124947], [42.351510,-71.086980]]

 

# 206.py

from tqdm.notebook import tqdm  # tqdm_notebook is deprecated in recent tqdm releases

# %%time
# Walk through all ~26,000 rows and decide where each runner is at check_time.
# iterrows() yields one (index, row) pair per runner.
locations = []  # collect rows in a list; DataFrame.append was removed in pandas 2.0
for index, record in tqdm(marathon_2017.iterrows(), total=len(marathon_2017)):  # for each runner
    if (record['40K'] < check_time):
        Lat = points[7][0]
        Long = points[7][1]
    elif (record['35K'] < check_time):
        Lat = points[6][0]
        Long = points[6][1]
    elif (record['30K'] < check_time):
        Lat = points[5][0]
        Long = points[5][1]
    elif (record['25K'] < check_time):
        Lat = points[4][0]
        Long = points[4][1]
    elif (record['20K'] < check_time):
        Lat = points[3][0]
        Long = points[3][1]
    elif (record['15K'] < check_time):
        Lat = points[2][0]
        Long = points[2][1]
    elif (record['10K'] < check_time):
        Lat = points[1][0]
        Long = points[1][1]
    elif (record['5K'] < check_time):
        Lat = points[0][0]
        Long = points[0][1]
    else:  # 5K not reached yet: fall back to the 5K point
        Lat = points[0][0]
        Long = points[0][1]
    locations.append({'Lat': Lat, 'Long': Long})
marathon_location = pd.DataFrame(locations, columns=['Lat', 'Long'])
print(len(marathon_location))
marathon_location
# Count the runners at each (Lat, Long) checkpoint
marathon_location.groupby(['Lat', 'Long']).size()
marathon_count = marathon_location.groupby(['Lat', 'Long']).size().reset_index(name='Count')
marathon_count

> About 2 hours in, the runners are spread across roughly 7 locations.
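
The row-by-row loop above is easy to read but slow on tens of thousands of rows. As a hedged alternative, the same "furthest checkpoint passed before check_time" rule can be sketched with NumPy array operations. The `toy` table and the `furthest_checkpoint` helper are hypothetical names made up for this illustration; the coordinates are the ones listed in `points` above.

```python
import numpy as np
import pandas as pd

check_time = 7200  # 2 hours, in seconds
splits = ['5K', '10K', '15K', '20K', '25K', '30K', '35K', '40K']
points = np.array([[42.247835, -71.474357], [42.274032, -71.423979],
                   [42.282364, -71.364801], [42.297870, -71.284260],
                   [42.324830, -71.259660], [42.345680, -71.215169],
                   [42.352089, -71.124947], [42.351510, -71.086980]])

# Hypothetical runners (split times in seconds), for illustration only
toy = pd.DataFrame(
    [[900, 1800, 2700, 3600, 4500, 5400, 6300, 7300],       # passed 35K within 2 hours
     [1500, 3000, 4500, 6000, 7500, 9000, 10500, 12000],    # passed 20K within 2 hours
     [8000, 9000, 10000, 11000, 12000, 13000, 14000, 15000]],  # passed nothing yet
    columns=splits)

def furthest_checkpoint(df: pd.DataFrame, limit: int) -> np.ndarray:
    """Index (0 = 5K ... 7 = 40K) of the furthest split reached before `limit`."""
    passed = df[splits].to_numpy() < limit
    # argmax on the reversed columns finds the first True seen from the 40K side
    furthest = passed.shape[1] - 1 - passed[:, ::-1].argmax(axis=1)
    # Runners who passed nothing fall back to the 5K point, as in the loop's else branch
    furthest[~passed.any(axis=1)] = 0
    return furthest

idx = furthest_checkpoint(toy, check_time)
toy_location = pd.DataFrame(points[idx], columns=['Lat', 'Long'])
toy_count = toy_location.groupby(['Lat', 'Long']).size().reset_index(name='Count')
print(toy_count)
```

Applied to `marathon_2017` instead of `toy`, this should give the same grouping as the loop, assuming the split columns have already been converted to seconds.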

 

 

Drawing a bubble chart (map): an applied version of the scatter chart

import matplotlib.pyplot as plt

# Set the figure size
plt.figure(figsize=(20, 10))

# Draw the scatter chart; the bubble size (s) is the runner count at each point
plt.scatter(marathon_count.Lat, marathon_count.Long, s=marathon_count.Count, alpha=0.5)

# Add the title and axis labels
plt.title('Runner locations at the 2-hour mark')
plt.xlabel('Latitude')
plt.ylabel('Longitude')

# Annotate each location with its Count value
for i, txt in enumerate(marathon_count.Count):
    plt.annotate(txt, (marathon_count.Lat[i], marathon_count.Long[i]), fontsize=20)

plt.show()
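
To connect the chart back to the ambulance question, one simple reading is to station the 5 ambulances at the five most crowded checkpoints. The sketch below shows that ranking step only. The `Count` values are hypothetical placeholders, not the real results, and the `checkpoints` and `top5` names are introduced here for illustration.

```python
import pandas as pd

labels = ['5K', '10K', '15K', '20K', '25K', '30K', '35K', '40K']
points = [[42.247835, -71.474357], [42.274032, -71.423979],
          [42.282364, -71.364801], [42.297870, -71.284260],
          [42.324830, -71.259660], [42.345680, -71.215169],
          [42.352089, -71.124947], [42.351510, -71.086980]]

# Lookup table that maps each coordinate pair back to its checkpoint name
checkpoints = pd.DataFrame(points, columns=['Lat', 'Long'])
checkpoints['Checkpoint'] = labels

# Hypothetical counts shaped like marathon_count (illustration only)
toy_count = checkpoints[['Lat', 'Long']].iloc[:7].assign(
    Count=[120, 900, 4200, 9800, 7600, 2500, 300])

# Attach the checkpoint names, then keep the 5 largest counts
top5 = (toy_count.merge(checkpoints, on=['Lat', 'Long'])
                 .nlargest(5, 'Count')[['Checkpoint', 'Count']])
print(top5)
```

With the real `marathon_count` in place of `toy_count`, the same two steps would list the five busiest checkpoints at the 2-hour mark.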