3.2 GeoPandas - Population of EU countries (continued)
We continue from the previous exercise.
Loading the data
In [1]:
import pandas as pd
filename = "demo_r_pjangrp3$defaultview_linear.csv.gz"
nuts_data = pd.read_csv(filename)
nuts_data.head()
Out[1]:
| | DATAFLOW | LAST UPDATE | freq | sex | unit | age | geo | TIME_PERIOD | OBS_VALUE | OBS_FLAG | CONF_STATUS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ESTAT:DEMO_R_PJANGRP3$DEFAULTVIEW(1.0) | 14/10/25 23:00:00 | A:Annual | F:Females | NR:Number | TOTAL:Total | AL:Albania | 2015 | 1424597 | NaN | NaN |
| 1 | ESTAT:DEMO_R_PJANGRP3$DEFAULTVIEW(1.0) | 14/10/25 23:00:00 | A:Annual | F:Females | NR:Number | TOTAL:Total | AL:Albania | 2016 | 1417141 | NaN | NaN |
| 2 | ESTAT:DEMO_R_PJANGRP3$DEFAULTVIEW(1.0) | 14/10/25 23:00:00 | A:Annual | F:Females | NR:Number | TOTAL:Total | AL:Albania | 2017 | 1423050 | NaN | NaN |
| 3 | ESTAT:DEMO_R_PJANGRP3$DEFAULTVIEW(1.0) | 14/10/25 23:00:00 | A:Annual | F:Females | NR:Number | TOTAL:Total | AL:Albania | 2018 | 1431715 | NaN | NaN |
| 4 | ESTAT:DEMO_R_PJANGRP3$DEFAULTVIEW(1.0) | 14/10/25 23:00:00 | A:Annual | F:Females | NR:Number | TOTAL:Total | AL:Albania | 2019 | 1432833 | NaN | NaN |
The table lacks the geometric part of the NUTS unit description. We download the NUTS geodata in OGC GeoPackage format from the Eurostat website and load it with the read_file() function of the GeoPandas library.
In [2]:
import geopandas as gpd
nuts = gpd.read_file("NUTS_RG_20M_2024_3035.gpkg")
nuts.head()
Out[2]:
| | NUTS_ID | LEVL_CODE | CNTR_CODE | NAME_LATN | NUTS_NAME | MOUNT_TYPE | URBN_TYPE | COAST_TYPE | geometry |
|---|---|---|---|---|---|---|---|---|---|
| 0 | AL011 | 3 | AL | Dibër | Dibër | NaN | NaN | NaN | POLYGON ((5169746.713 2142724.535, 5198279.451... |
| 1 | AL012 | 3 | AL | Durrës | Durrës | NaN | NaN | NaN | POLYGON ((5118781.291 2103409.375, 5141704.829... |
| 2 | AL013 | 3 | AL | Kukës | Kukës | NaN | NaN | NaN | POLYGON ((5200419.912 2147880.017, 5198279.451... |
| 3 | AL014 | 3 | AL | Lezhë | Lezhë | NaN | NaN | NaN | POLYGON ((5112296.856 2131907.92, 5108760.398 ... |
| 4 | AL015 | 3 | AL | Shkodër | Shkodër | NaN | NaN | NaN | POLYGON ((5100056.794 2130793.954, 5098686.356... |
We select the features at NUTS level 0.
In [3]:
nuts0 = nuts[nuts["LEVL_CODE"] == 0]
nuts0.head()
Out[3]:
| | NUTS_ID | LEVL_CODE | CNTR_CODE | NAME_LATN | NUTS_NAME | MOUNT_TYPE | URBN_TYPE | COAST_TYPE | geometry |
|---|---|---|---|---|---|---|---|---|---|
| 79 | SI | 0 | SI | Slovenija | Slovenija | NaN | NaN | NaN | POLYGON ((4786947.85 2658724.94, 4788640.346 2... |
| 80 | SK | 0 | SK | Slovensko | Slovensko | NaN | NaN | NaN | POLYGON ((5019481.071 2968215.317, 5028815.352... |
| 111 | TR | 0 | TR | Türkiye | Türkiye | NaN | NaN | NaN | MULTIPOLYGON (((6366766.004 2448944.963, 63694... |
| 137 | UA | 0 | UA | Ukraina | Україна | NaN | NaN | NaN | MULTIPOLYGON (((5885782.205 3503801.659, 58987... |
| 138 | XK | 0 | XK | Kosovo* | Kosovo* | NaN | NaN | NaN | POLYGON ((5254073.84 2267297.916, 5255677.894 ... |
We display the selected features in an interactive map using the explore() method.
In [4]:
nuts0.explore()
Out[4]:
[Interactive map: NUTS level 0 units]
Task: Compare the result with the plot() method.
Adding population information
We reuse the population_by_nuts() function from the previous exercise.
In [5]:
def population_by_nuts(df, year, level=0, sex='T:Total', nuts=None):
    # Keep rows for the given year and sex, all ages, whose NUTS code length
    # matches the level (2 characters at level 0, one more per level).
    data = df[(df["TIME_PERIOD"] == year) & (df["sex"] == sex) & (df["age"] == 'TOTAL:Total') &
              (df["geo"].apply(lambda x: len(x.split(":")[0]) == level+2))].copy()
    # The NUTS code is the part of "geo" before the colon, e.g. "AL:Albania" -> "AL".
    data["NUTS_ID"] = data["geo"].apply(lambda x: x.split(":")[0])
    if nuts is not None:
        # Optionally keep only codes matching a regular expression (anchored at the start).
        data = data[data["NUTS_ID"].str.match(nuts)]
    return data[["NUTS_ID", "OBS_VALUE"]]

population_by_nuts(nuts_data, 2024).head()
Out[5]:
| | NUTS_ID | OBS_VALUE |
|---|---|---|
| 38761 | AT | 9158750 |
| 39241 | BE | 11817096 |
| 39879 | BG | 6445481 |
| 40249 | CH | 8962258 |
| 40599 | CY | 966365 |
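The level filter inside population_by_nuts() relies on the structure of NUTS codes: a country code has 2 characters and each lower level appends one more, so a code at level n has n + 2 characters. The following is a minimal sketch of the same test on a small made-up table, written with vectorized string methods instead of apply(). The sample rows and the helper name `nuts_level` are illustrative assumptions, not part of the exercise data.

```python
import pandas as pd

# Made-up sample in the same "CODE:Name" shape as the Eurostat "geo" column.
sample = pd.DataFrame({
    "geo": ["CZ:Czechia", "CZ0:Česko", "CZ01:Praha", "CZ010:Hlavní město Praha"],
})

def nuts_level(geo):
    """Return the NUTS level implied by the code length (hypothetical helper)."""
    code = geo.str.split(":").str[0]   # text before the colon
    return code.str.len() - 2          # 2 characters -> level 0, 3 -> level 1, ...

sample["NUTS_ID"] = sample["geo"].str.split(":").str[0]
sample["level"] = nuts_level(sample["geo"])
print(sample[["NUTS_ID", "level"]])
```

Filtering with `sample[sample["level"] == 0]` would then select the same kind of rows as the length test in population_by_nuts().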
In [6]:
popu = population_by_nuts(nuts_data, 2024)
nuts0_popu = pd.merge(nuts0, popu, how='left', left_on='NUTS_ID', right_on='NUTS_ID')
nuts0_popu.head()
Out[6]:
| | NUTS_ID | LEVL_CODE | CNTR_CODE | NAME_LATN | NUTS_NAME | MOUNT_TYPE | URBN_TYPE | COAST_TYPE | geometry | OBS_VALUE |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SI | 0 | SI | Slovenija | Slovenija | NaN | NaN | NaN | POLYGON ((4786947.85 2658724.94, 4788640.346 2... | 2123949.0 |
| 1 | SK | 0 | SK | Slovensko | Slovensko | NaN | NaN | NaN | POLYGON ((5019481.071 2968215.317, 5028815.352... | 5424687.0 |
| 2 | TR | 0 | TR | Türkiye | Türkiye | NaN | NaN | NaN | MULTIPOLYGON (((6366766.004 2448944.963, 63694... | 85372377.0 |
| 3 | UA | 0 | UA | Ukraina | Україна | NaN | NaN | NaN | MULTIPOLYGON (((5885782.205 3503801.659, 58987... | NaN |
| 4 | XK | 0 | XK | Kosovo* | Kosovo* | NaN | NaN | NaN | POLYGON ((5254073.84 2267297.916, 5255677.894 ... | NaN |
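The left join keeps every NUTS unit from the geometry table, so units without a population record (here UA and XK) get NaN, and the integer OBS_VALUE column becomes float as a side effect. A small sketch with made-up stand-in tables shows one way to list the unmatched units, using the `indicator` option of pd.merge:

```python
import pandas as pd

# Made-up stand-ins for the geometry table and the population table.
regions = pd.DataFrame({"NUTS_ID": ["SI", "SK", "UA"]})
population = pd.DataFrame({"NUTS_ID": ["SI", "SK"], "OBS_VALUE": [2123949, 5424687]})

# how="left" keeps every region; indicator=True adds a "_merge" column
# telling whether a matching population row was found.
merged = pd.merge(regions, population, how="left", on="NUTS_ID", indicator=True)

missing = merged.loc[merged["_merge"] == "left_only", "NUTS_ID"].tolist()
print(missing)                      # codes present only in the left table
print(merged["OBS_VALUE"].dtype)    # float, because of the NaN
```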
We display the features colored by population.
In [7]:
nuts0_popu.explore(column='OBS_VALUE', legend=False, cmap='OrRd')
Out[7]:
[Interactive map: NUTS level 0 units colored by population, 2024]
We define a new function that automates the computation.
In [8]:
def nuts_population(df_geo, df_data, year, level):
    # Geometries of the requested NUTS level.
    nuts_level = df_geo[df_geo["LEVL_CODE"] == level]
    # Total population of the same level and year.
    popu = population_by_nuts(df_data, year, level)
    # Left join keeps units without population data (OBS_VALUE is NaN for them).
    return pd.merge(nuts_level, popu, how='left', left_on='NUTS_ID', right_on='NUTS_ID')

nuts_popu = nuts_population(nuts, nuts_data, 2024, 1)
nuts_popu.explore(column='OBS_VALUE', legend=False, cmap='OrRd')
Out[8]:
[Interactive map: NUTS level 1 units colored by population, 2024]
In [9]:
nuts_population(nuts, nuts_data, 2015, 2).explore(column='OBS_VALUE', legend=False, cmap='OrRd')
Out[9]:
[Interactive map: NUTS level 2 units colored by population, 2015]
Population density
In [10]:
nuts0_popu.area.head()
Out[10]:
0    1.995113e+10
1    4.895014e+10
2    7.820335e+11
3    5.990874e+11
4    1.109603e+10
dtype: float64
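We compute the population density (persons per km²). The geometries use EPSG:3035, whose units are metres, so .area returns m² and dividing by 1e6 converts it to km².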
In [11]:
(nuts0_popu["OBS_VALUE"] / (nuts0_popu.area / 1e6)).head(10)
Out[11]:
0    106.457555
1    110.820661
2    109.167156
3           NaN
4           NaN
5           NaN
6    109.142504
7           NaN
8    383.947907
9     57.678605
dtype: float64
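As a quick check of the unit conversion, the first row (Slovenia) can be recomputed by hand from the values printed above. This sketch assumes the area is in square metres, as the 3035 suffix in the GeoPackage file name (EPSG:3035) suggests, and uses the area as rounded in the output, so the last digits may differ slightly.

```python
# Values for Slovenia taken from the outputs above.
population = 2_123_949      # OBS_VALUE for 2024
area_m2 = 1.995113e10       # .area in square metres (rounded as displayed)

area_km2 = area_m2 / 1e6    # 1 km² = 1,000,000 m²
density = population / area_km2

print(f"{area_km2:.2f} km², {density:.2f} persons per km²")
```

The result agrees with the 106.457555 in the first row up to the rounding of the displayed area.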
In [12]:
nuts0_popu["density"] = nuts0_popu["OBS_VALUE"] / (nuts0_popu.area / 1e6)
nuts0_popu.head()
Out[12]:
| | NUTS_ID | LEVL_CODE | CNTR_CODE | NAME_LATN | NUTS_NAME | MOUNT_TYPE | URBN_TYPE | COAST_TYPE | geometry | OBS_VALUE | density |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SI | 0 | SI | Slovenija | Slovenija | NaN | NaN | NaN | POLYGON ((4786947.85 2658724.94, 4788640.346 2... | 2123949.0 | 106.457555 |
| 1 | SK | 0 | SK | Slovensko | Slovensko | NaN | NaN | NaN | POLYGON ((5019481.071 2968215.317, 5028815.352... | 5424687.0 | 110.820661 |
| 2 | TR | 0 | TR | Türkiye | Türkiye | NaN | NaN | NaN | MULTIPOLYGON (((6366766.004 2448944.963, 63694... | 85372377.0 | 109.167156 |
| 3 | UA | 0 | UA | Ukraina | Україна | NaN | NaN | NaN | MULTIPOLYGON (((5885782.205 3503801.659, 58987... | NaN | NaN |
| 4 | XK | 0 | XK | Kosovo* | Kosovo* | NaN | NaN | NaN | POLYGON ((5254073.84 2267297.916, 5255677.894 ... | NaN | NaN |
In [13]:
nuts0_popu.dropna(subset=["OBS_VALUE"], inplace=True)
nuts0_popu.explore(column='density', legend=True, cmap='OrRd')
Out[13]:
[Interactive map: population density of NUTS level 0 units, 2024]
In [14]:
vmax = nuts0_popu["density"].drop(nuts0_popu["density"].idxmax()).max()
nuts0_popu.explore(column='density', legend=True, cmap='OrRd', vmax=vmax)
Out[14]:
[Interactive map: population density of NUTS level 0 units, 2024, color scale capped at the second-highest density]
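The vmax in the last cell is the second-highest density: the row with the maximum is dropped before taking max(), so a single very dense unit does not stretch the color scale. A sketch on made-up values compares this with two alternatives that are not used in the exercise; the 0.95 quantile is an arbitrary choice for illustration.

```python
import pandas as pd

# Made-up densities (persons per km²) with one extreme outlier.
density = pd.Series([106.5, 110.8, 109.2, 383.9, 57.7, 1700.0],
                    index=["A", "B", "C", "D", "E", "F"])

# Same approach as in the cell above: drop the row holding the maximum,
# then take the maximum of the rest.
vmax_second = density.drop(density.idxmax()).max()

# Shortcut that gives the same value here: the smaller of the two largest values.
vmax_nlargest = density.nlargest(2).iloc[-1]

# Alternative: cap at a quantile, which also copes with several outliers.
vmax_quantile = density.quantile(0.95)

print(vmax_second, vmax_nlargest, vmax_quantile)
```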
Time series
In [15]:
level_code = 0
# Start from the geometries of the chosen level, indexed by NUTS code.
nuts_dens = nuts[nuts["LEVL_CODE"] == level_code].copy()
nuts_dens.set_index("NUTS_ID", inplace=True)
years = nuts_data["TIME_PERIOD"].unique()
for year in years:
    print(f"{year}...")
    nuts_popu_year = nuts_population(nuts, nuts_data, year, level_code)
    nuts_popu_year.set_index("NUTS_ID", inplace=True)
    # The assignment aligns on the NUTS_ID index; area is in m², so divide by 1e6 to get km².
    nuts_dens[f"density_{year}"] = nuts_popu_year["OBS_VALUE"] / (nuts_popu_year.area / 1e6)
2015... 2016... 2017... 2018... 2019... 2020... 2021... 2022... 2023... 2024...
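The loop builds one density column per year by filtering and merging again for every year. As an alternative sketch, not the approach used in this exercise, a long table can be reshaped to one column per year in a single step with pivot(). The numbers and the `area_km2` series below are made up, and the table is assumed to be already filtered to one sex and age group.

```python
import pandas as pd

# Made-up long table: one row per unit and year.
long_df = pd.DataFrame({
    "NUTS_ID": ["SI", "SI", "SK", "SK"],
    "TIME_PERIOD": [2023, 2024, 2023, 2024],
    "OBS_VALUE": [2_100_000, 2_120_000, 5_430_000, 5_420_000],
})
# Made-up areas in km², indexed by NUTS code.
area_km2 = pd.Series({"SI": 20_000.0, "SK": 49_000.0})

# One row per unit, one column per year.
wide = long_df.pivot(index="NUTS_ID", columns="TIME_PERIOD", values="OBS_VALUE")

# Divide every year column by the unit's area (aligned on NUTS_ID) and rename the columns.
density = wide.div(area_km2, axis=0).add_prefix("density_")
print(density)
```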
In [16]:
nuts_dens.head()
Out[16]:
| NUTS_ID | LEVL_CODE | CNTR_CODE | NAME_LATN | NUTS_NAME | MOUNT_TYPE | URBN_TYPE | COAST_TYPE | geometry | density_2015 | density_2016 | density_2017 | density_2018 | density_2019 | density_2020 | density_2021 | density_2022 | density_2023 | density_2024 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SI | 0 | SI | Slovenija | Slovenija | NaN | NaN | NaN | POLYGON ((4786947.85 2658724.94, 4788640.346 2... | 103.396326 | 103.462187 | 103.547746 | 103.597116 | 104.300234 | 105.049715 | 105.707122 | 105.617051 | 106.107851 | 106.457555 |
| SK | 0 | SK | Slovensko | Slovensko | NaN | NaN | NaN | POLYGON ((5019481.071 2968215.317, 5028815.352... | 110.752469 | 110.852632 | 111.038352 | 111.197228 | 111.346380 | 111.498616 | 111.537595 | 111.025461 | 110.904522 | 110.820661 |
| TR | 0 | TR | Türkiye | Türkiye | NaN | NaN | NaN | MULTIPOLYGON (((6366766.004 2448944.963, 63694... | 99.351115 | 100.687566 | 102.060676 | 103.333836 | 104.859802 | 106.331753 | 106.919151 | 108.282150 | 109.048460 | 109.167156 |
| UA | 0 | UA | Ukraina | Україна | NaN | NaN | NaN | MULTIPOLYGON (((5885782.205 3503801.659, 58987... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| XK | 0 | XK | Kosovo* | Kosovo* | NaN | NaN | NaN | POLYGON ((5254073.84 2267297.916, 5255677.894 ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
In [17]:
def show_density(df, year, base="density"):
    # Build the column name, e.g. "density_2024" or "density_trend_2022".
    column = f"{base}_{year}"
    return df.explore(column, legend=True, cmap='OrRd')

show_density(nuts_dens, 2024)
Out[17]:
[Interactive map: population density of NUTS level 0 units, 2024]
In [18]:
for year in years[1:]:
    print(f"{year}...")
    prev_year = year - 1
    # Year-over-year change in density; assumes the years are consecutive.
    nuts_dens[f"density_trend_{year}"] = nuts_dens[f"density_{year}"] - nuts_dens[f"density_{prev_year}"]
nuts_dens.head()
2016... 2017... 2018... 2019... 2020... 2021... 2022... 2023... 2024...
Out[18]:
| NUTS_ID | LEVL_CODE | CNTR_CODE | NAME_LATN | NUTS_NAME | MOUNT_TYPE | URBN_TYPE | COAST_TYPE | geometry | density_2015 | density_2016 | ... | density_2024 | density_trend_2016 | density_trend_2017 | density_trend_2018 | density_trend_2019 | density_trend_2020 | density_trend_2021 | density_trend_2022 | density_trend_2023 | density_trend_2024 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SI | 0 | SI | Slovenija | Slovenija | NaN | NaN | NaN | POLYGON ((4786947.85 2658724.94, 4788640.346 2... | 103.396326 | 103.462187 | ... | 106.457555 | 0.065861 | 0.085559 | 0.049371 | 0.703118 | 0.749481 | 0.657406 | -0.090070 | 0.490799 | 0.349704 |
| SK | 0 | SK | Slovensko | Slovensko | NaN | NaN | NaN | POLYGON ((5019481.071 2968215.317, 5028815.352... | 110.752469 | 110.852632 | ... | 110.820661 | 0.100163 | 0.185720 | 0.158876 | 0.149152 | 0.152237 | 0.038978 | -0.512133 | -0.120939 | -0.083861 |
| TR | 0 | TR | Türkiye | Türkiye | NaN | NaN | NaN | MULTIPOLYGON (((6366766.004 2448944.963, 63694... | 99.351115 | 100.687566 | ... | 109.167156 | 1.336450 | 1.373110 | 1.273160 | 1.525967 | 1.471951 | 0.587398 | 1.362999 | 0.766310 | 0.118696 |
| UA | 0 | UA | Ukraina | Україна | NaN | NaN | NaN | MULTIPOLYGON (((5885782.205 3503801.659, 58987... | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| XK | 0 | XK | Kosovo* | Kosovo* | NaN | NaN | NaN | POLYGON ((5254073.84 2267297.916, 5255677.894 ... | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 27 columns
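Each density_trend_<year> column holds the change in density relative to the previous year. The same year-over-year differences can be sketched with DataFrame.diff on a table that has one column per year. The values below are made up. Note that diff always uses the neighbouring column, so a gap in the years would go unnoticed, unlike the explicit year - 1 lookup.

```python
import pandas as pd

# Made-up densities, one column per year.
density = pd.DataFrame(
    {2022: [105.6, 111.0], 2023: [106.1, 110.9], 2024: [106.5, 110.8]},
    index=pd.Index(["SI", "SK"], name="NUTS_ID"),
)

# diff(axis=1) subtracts the previous column from each column;
# the first year has no predecessor, so its all-NaN column is dropped.
trend = density.diff(axis=1).iloc[:, 1:].add_prefix("density_trend_")
print(trend.round(3))
```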
In [19]:
show_density(nuts_dens, 2022, "density_trend")
Out[19]:
[Interactive map: change in population density between 2021 and 2022, NUTS level 0]
In [20]:
def show_density_diff(df, start, end):
    # Adds the difference as a new column to df (df is modified in place).
    column = f"density_trend_{end}_{start}"
    df[column] = df[f"density_{end}"] - df[f"density_{start}"]
    # Cap the color scale at the mean difference so that a few large values do not dominate.
    return df.explore(column, legend=True, cmap='OrRd', vmax=df[column].mean())

show_density_diff(nuts_dens, 2015, 2024)
Out[20]:
[Interactive map: change in population density between 2015 and 2024, NUTS level 0]
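The density difference can be negative as well as positive, and with vmax set to the mean every above-average unit gets the same darkest color. One possible alternative, which is not what the exercise does, is a color scale centred on zero combined with a diverging colormap. The sketch below only computes the limits on made-up values; `symmetric_limits` is a hypothetical helper and the unit "XX" is invented.

```python
import pandas as pd

def symmetric_limits(values):
    """Return (vmin, vmax) centred on zero (hypothetical helper)."""
    limit = values.abs().max()   # NaN values are skipped by default
    return -limit, limit

# Made-up density changes; None stands for a unit without data.
change = pd.Series([3.06, 0.07, 9.82, -4.5, None],
                   index=["SI", "SK", "TR", "XX", "UA"])

vmin, vmax = symmetric_limits(change)
print(vmin, vmax)
# The limits could then be passed on, for example:
# df.explore(column, legend=True, cmap="RdBu_r", vmin=vmin, vmax=vmax)
```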