Global Population Trends Exploration

This project used World Bank development indicators to explore trends in population growth, urbanisation, and fertility rates across continents and income groups. The objective was to uncover insights about global development patterns over time using Python.

💻 Tech Stack:

Python for data handling and exploration

Pandas for data manipulation

NumPy for array handling when building DataFrames

Matplotlib and Seaborn for visualisations

🧪 Data Pipeline:

Load & inspect data: Loaded the dataset using pd.read_csv() and inspected its structure with .info() and .head() to understand column types and missing data.

Cleaning: Renamed columns, removed irrelevant rows, and addressed missing values for smoother analysis.

Initial Exploration: Examined fertility rates, population growth, and urban population across income levels and continents.

Grouping & Summarisation: Used groupby() and mean() to aggregate indicators by continent and income level.

Visualisation: Created scatter plots, line plots, and box plots to reveal relationships between population metrics and economic status.
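The loading, cleaning and grouping steps above can be sketched end to end as follows. This is a minimal illustration, not the project's notebook: the inline CSV, its column names and all of its values are hypothetical, and the "World" row stands in for the aggregate rows (world, regional and income totals) that World Bank extracts include alongside individual countries. Treating rows without a continent as aggregates is an assumption made for this sketch.

```python
import io

import pandas as pd

# Illustrative values only (NOT real World Bank figures); the column names
# are hypothetical and may differ from the project's actual CSV.
RAW_CSV = """Country Name,Country Code,Continent,Income Group,Fertility rate,Population growth
Country A,AAA,Africa,Low income,5.0,3.0
Country B,BBB,Africa,Lower middle income,4.0,2.0
Country C,CCC,Europe,High income,1.5,0.5
Country D,DDD,Europe,High income,1.7,0.3
Country E,EEE,Asia,Lower middle income,2.5,
World,WLD,,,2.3,1.0
"""


def load_and_summarise(csv_text: str) -> pd.DataFrame:
    # 1. Load and inspect: .info() prints dtypes and non-null counts per column.
    df = pd.read_csv(io.StringIO(csv_text))
    df.info()
    print(df.head())

    # 2. Clean: shorter column names, then drop aggregate rows
    #    (assumed here to be the rows with no continent, such as "World").
    df = df.rename(columns={
        "Country Name": "CountryName",
        "Country Code": "CountryCode",
        "Income Group": "IncomeGroup",
        "Fertility rate": "FertilityRate",
        "Population growth": "PopGrowth",
    })
    df = df.dropna(subset=["Continent", "IncomeGroup"])

    # 3. Group and summarise: mean() skips NaN values by default.
    return (
        df.groupby(["Continent", "IncomeGroup"])[["FertilityRate", "PopGrowth"]]
        .mean()
    )


summary = load_and_summarise(RAW_CSV)
print(summary)
```

Because mean() ignores missing values, Country E still contributes its fertility rate even though its population growth is blank; its group's PopGrowth mean is simply NaN.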

📊 Code Snippets & Visualisations:

import numpy as np
import pandas as pd
import seaborn as sns

# Assumes `data` (the indicator DataFrame loaded earlier) and the lists
# Countries_2012_Dataset, Codes_2012_Dataset, Regions_2012_Dataset, Country_Code,
# Life_Expectancy_At_Birth_1960 and Life_Expectancy_At_Birth_2013 are already defined.

# Plot BirthRate versus InternetUsers, categorised by Income Group (Figure 1)
vis1 = sns.lmplot(
    data=data, x='BirthRate', y='InternetUsers',
    fit_reg=False, hue='IncomeGroup', height=10
)

# Build a country lookup table: name, code and region
country_data = pd.DataFrame({
    'CountryName': np.array(Countries_2012_Dataset),
    'CountryCode': np.array(Codes_2012_Dataset),
    'CountryRegion': np.array(Regions_2012_Dataset)
})

# Merge the country data into the original DataFrame (Table 1);
# the inner join keeps only codes present in both tables
merged_data = pd.merge(left=data, right=country_data, how='inner', on='CountryCode')
merged_data.head()

# Plot BirthRate versus InternetUsers, categorised by Country Region (Figure 2)
vis2 = sns.lmplot(
    data=merged_data, x='BirthRate', y='InternetUsers',
    fit_reg=False, hue='CountryRegion', height=10
)

# Build a DataFrame with life expectancy at birth in 1960 and 2013
life_exp_data = pd.DataFrame({
    'CountryCode': np.array(Country_Code),
    'LifeExp1960': np.array(Life_Expectancy_At_Birth_1960),
    'LifeExp2013': np.array(Life_Expectancy_At_Birth_2013)
})

# Merge the life expectancy data into the merged DataFrame
merged_data1 = pd.merge(left=merged_data, right=life_exp_data, how='inner', on='CountryCode')

# Explore the dataset (Table 2)
merged_data1.head()

# Plot BirthRate versus life expectancy in 1960, categorised by Country Region (Figure 3)
vis3 = sns.lmplot(
    data=merged_data1, x='BirthRate', y='LifeExp1960',
    fit_reg=False, hue='CountryRegion', height=10
)

# Plot BirthRate versus life expectancy in 2013, categorised by Country Region (Figure 4)
vis4 = sns.lmplot(
    data=merged_data1, x='BirthRate', y='LifeExp2013',
    fit_reg=False, hue='CountryRegion', height=10
)
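The merges in the code above use how='inner', which silently drops any CountryCode that appears in only one table. One way to check what an inner join would discard is an outer merge with indicator=True, sketched below on made-up toy frames that stand in for `data` and `country_data`. The validate="one_to_one" argument additionally raises a MergeError if a code is duplicated on either side.

```python
import pandas as pd

# Toy frames standing in for the project's `data` and `country_data`;
# the codes, birth rates and regions are made up for illustration.
data = pd.DataFrame({
    "CountryCode": ["AAA", "BBB", "CCC"],
    "BirthRate": [35.0, 12.0, 20.0],
})
country_data = pd.DataFrame({
    "CountryCode": ["AAA", "BBB", "DDD"],
    "CountryRegion": ["Africa", "Europe", "Asia"],
})

# Outer merge + indicator=True labels each row 'both', 'left_only' or
# 'right_only', exposing rows that an inner join would drop.
check = pd.merge(
    data, country_data, how="outer", on="CountryCode",
    indicator=True, validate="one_to_one",
)
unmatched = check.loc[check["_merge"] != "both", ["CountryCode", "_merge"]]
print(unmatched)

# The inner join keeps only codes present in both frames.
merged = pd.merge(data, country_data, how="inner", on="CountryCode")
print(merged)
```

Here CCC (no region) and DDD (no indicators) would both vanish from the inner join; listing them first makes it easier to decide whether the loss matters for the plots.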

Table 1: Merged DataFrame (first rows of merged_data)

Table 2: Merged DataFrame with life expectancy (first rows of merged_data1)

Figure 1: BirthRate versus Internet Users, categorised by Income Group

Figure 2: BirthRate versus Internet Users, categorised by Country Region

Figure 3: BirthRate versus Life Expectancy in 1960, categorised by Country Region

Figure 4: BirthRate versus Life Expectancy in 2013, categorised by Country Region

🌟 Key Insights:

Countries with lower income levels showed higher fertility rates and population growth.

Urban population tends to correlate with income level, especially in developed regions.

Africa stands out with higher fertility rates and population growth compared to other continents.
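A claim such as "lower income levels showed higher fertility" can be backed with a number as well as a plot. The sketch below is one possible check, not the analysis used in the project: it orders the World Bank's four income groups, reports the median birth rate per group, and computes a Spearman rank correlation (Pearson correlation of the ranks, so SciPy is not needed). The birth-rate values are made up; the column names mirror the code above but are assumptions here.

```python
import pandas as pd

# Illustrative values only; the income labels follow the World Bank's
# four income groups, but the birth rates are made up.
INCOME_ORDER = ["Low income", "Lower middle income",
                "Upper middle income", "High income"]

df = pd.DataFrame({
    "IncomeGroup": ["Low income", "Low income", "Lower middle income",
                    "Upper middle income", "High income", "High income"],
    "BirthRate": [40.0, 36.0, 28.0, 18.0, 11.0, 12.0],
})

# Median birth rate per income group, listed from poorest to richest.
medians = df.groupby("IncomeGroup")["BirthRate"].median().reindex(INCOME_ORDER)

# Spearman rank correlation = Pearson correlation of (average-tied) ranks.
# Encoding income groups as 0..3 tests for a monotonic trend without
# assuming a linear one.
income_rank = df["IncomeGroup"].map({g: i for i, g in enumerate(INCOME_ORDER)})
rho = income_rank.rank().corr(df["BirthRate"].rank())

print(medians)
print(f"Spearman rho = {rho:.2f}")
```

A rho close to -1 indicates that birth rate falls consistently as income group rises; a value near 0 would suggest the visual pattern is weak.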

🧗🏾 Challenge Faced:

Filtering and reshaping the dataset for multi-variable analysis was complex due to inconsistent column names and missing data. I solved this by methodically renaming columns and using .dropna() to exclude incomplete records while maintaining dataset integrity.
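The two fixes described above can also be done programmatically. The sketch below is an alternative to renaming columns by hand, not the method used in the project: a hypothetical helper, normalise_column, converts messy headers into consistent snake_case names, and dropna(subset=...) removes only rows missing the columns a given plot needs, rather than every row with any gap. The column names and values are made up.

```python
import re

import pandas as pd


def normalise_column(name: str) -> str:
    # Hypothetical helper: lower-case, collapse non-alphanumerics to "_".
    # "Birth rate, crude (per 1,000 people)" -> "birth_rate_crude_per_1_000_people"
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


# Toy frame with inconsistently named columns and gaps (values are made up).
df = pd.DataFrame({
    " Country Code": ["AAA", "BBB", "CCC", "DDD"],
    "Birth Rate": [35.0, None, 20.0, 15.0],
    "Internet users (%)": [5.0, 80.0, None, 60.0],
    "Urban Pop": [None, 75.0, 40.0, 70.0],
})
df.columns = [normalise_column(c) for c in df.columns]

# A blanket dropna() removes every row with any gap: only DDD survives.
all_complete = df.dropna()

# Dropping only on the columns a given plot needs keeps more countries.
birth_vs_internet = df.dropna(subset=["birth_rate", "internet_users"])

print(list(df.columns))
print(len(all_complete), len(birth_vs_internet))
```

Country AAA is missing only its urban population, so it is kept for a birth-rate versus internet-users plot but would be lost under a blanket dropna().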
