This project used World Bank development indicators to explore trends in population growth, urbanisation, and fertility rates across continents and income groups. The objective was to uncover insights about global development patterns over time using Python.
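I loaded the data with pd.read_csv() and inspected its structure with .info() and .head() to understand column types and missing data. I then used groupby() and mean() to aggregate indicators by continent and income level.
# Import the libraries used below
import numpy as np
import pandas as pd
import seaborn as sns
# Plot the BirthRate versus Internet Users categorised by Income Group (Figure 1)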
vis1 = sns.lmplot(
    data=data,
    x='BirthRate',
    y='InternetUsers',
    fit_reg=False,
    hue='IncomeGroup',
    height=10
)
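The loading, inspection, and aggregation steps described earlier can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the CSV is replaced by a small in-memory table, the values are placeholders and not World Bank figures, and only the column names that appear elsewhere in this write-up are reused.

```python
import io

import pandas as pd

# Toy rows standing in for the real CSV file. The values are placeholders,
# not World Bank figures; the country codes are hypothetical.
csv_text = """CountryCode,IncomeGroup,BirthRate,InternetUsers
AAA,High income,10.0,80.0
BBB,High income,12.0,90.0
CCC,Low income,40.0,5.0
DDD,Low income,,3.0
"""

data = pd.read_csv(io.StringIO(csv_text))

# Inspect structure: dtypes, non-null counts, and the first rows.
data.info()
print(data.head())

# Count missing values per column before aggregating.
missing_per_column = data.isna().sum()

# Mean of each indicator per income group (NaN values are skipped by default).
group_means = data.groupby("IncomeGroup")[["BirthRate", "InternetUsers"]].mean()
print(group_means)
```

Because mean() skips missing values by default, a group average can rest on fewer rows than the group contains, so it is worth checking the missing-value counts alongside the aggregates.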
# Create the dataframe
country_data = pd.DataFrame({
    'CountryName': np.array(Countries_2012_Dataset),
    'CountryCode': np.array(Codes_2012_Dataset),
    'CountryRegion': np.array(Regions_2012_Dataset)
})
# Merge country data to the original dataframe (Table 1)
merged_data = pd.merge(
    left=data,
    right=country_data,
    how='inner',
    on='CountryCode'
)
merged_data.head()
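An inner merge silently drops any country whose code appears in only one of the two frames, so it can help to check what is lost before relying on the merged table. The sketch below uses hypothetical miniature versions of data and country_data (the codes and region labels are made up) to show one way of doing that with the indicator and validate options of pd.merge.

```python
import pandas as pd

# Hypothetical miniature versions of `data` and `country_data`.
data = pd.DataFrame({
    "CountryCode": ["AAA", "BBB", "CCC"],
    "BirthRate": [10.0, 20.0, 30.0],
})
country_data = pd.DataFrame({
    "CountryCode": ["AAA", "BBB", "DDD"],
    "CountryRegion": ["Region 1", "Region 2", "Region 3"],
})

# An outer merge with indicator=True adds a `_merge` column recording whether
# each key was found in the left frame, the right frame, or both.
check = pd.merge(data, country_data, how="outer", on="CountryCode", indicator=True)
unmatched = check.loc[check["_merge"] != "both", "CountryCode"].tolist()
print("Codes that an inner merge would drop:", unmatched)

# The inner merge keeps only keys present in both frames.
# validate="one_to_one" raises an error if a CountryCode is duplicated
# on either side, which would otherwise multiply rows.
merged = pd.merge(
    data, country_data, how="inner", on="CountryCode", validate="one_to_one"
)
print(merged)
```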
# Create a data frame with the life expectancy
life_exp_data = pd.DataFrame({
    'CountryCode': np.array(Country_Code),
    'LifeExp1960': np.array(Life_Expectancy_At_Birth_1960),
    'LifeExp2013': np.array(Life_Expectancy_At_Birth_2013)
})
# Merge the data frame with the life expectancy
merged_data1 = pd.merge(
    left=merged_data,
    right=life_exp_data,
    how='inner',
    on='CountryCode'
)
# Explore the dataset (Table 2)
merged_data1.head()
# Plot the BirthRate versus LifeExpectancy categorised by Country Region in 1960 (Figure 3)
vis3 = sns.lmplot(
    data=merged_data1,
    x='BirthRate',
    y='LifeExp1960',
    fit_reg=False,
    hue='CountryRegion',
    height=10
)
# Plot the BirthRate versus LifeExpectancy categorised by Country Region in 2013 (Figure 4)
vis4 = sns.lmplot(
    data=merged_data1,
    x='BirthRate',
    y='LifeExp2013',
    fit_reg=False,
    hue='CountryRegion',
    height=10
)
Filtering and reshaping the dataset for multi-variable analysis was complex due to inconsistent column names and missing data. I solved this by methodically renaming columns and using .dropna() to exclude incomplete records while maintaining dataset integrity.
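As a rough illustration of that cleaning step, the sketch below renames inconsistently labelled columns and then drops incomplete rows. The raw column names, the mapping, and the values are assumptions made for the example, not the names found in the original files.

```python
import pandas as pd

# Hypothetical raw frame with inconsistent column names and gaps.
raw = pd.DataFrame({
    "Country Code": ["AAA", "BBB", "CCC"],
    "Birth rate": [10.0, None, 30.0],
    "Internet users": [80.0, 50.0, None],
})

# Map each raw name to a consistent one (this mapping is an assumption).
clean = raw.rename(columns={
    "Country Code": "CountryCode",
    "Birth rate": "BirthRate",
    "Internet users": "InternetUsers",
})

# Drop only the rows missing one of the variables being analysed,
# and report how many rows were lost.
complete = clean.dropna(subset=["BirthRate", "InternetUsers"])
rows_dropped = len(clean) - len(complete)
print(f"Dropped {rows_dropped} of {len(clean)} rows")
```

Passing subset to .dropna() limits the check to the columns needed for a given plot, so a gap in an unrelated column does not remove an otherwise usable record.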