HR Salary Prediction: Model Comparison

This project implemented and compared five different regression algorithms to predict employee salaries based on position levels within an organization. I built a comprehensive machine learning pipeline comparing linear regression, polynomial regression, support vector regression (SVR), decision tree regression, and random forest regression to identify the optimal model for HR compensation analysis.

💻 Tech Stack:

Python for machine learning model development and comparison

Scikit-learn for multiple regression algorithms, feature scaling, and model training

Pandas for dataset loading and initial data exploration

Matplotlib for data visualisations

NumPy for numerical operations and grid generation

🧪 Data Pipeline:

Data Preparation: Loaded the position-salary dataset with pd.read_csv(), then took the position levels as the feature matrix with iloc[:, 1:-1] and the salaries as the target with iloc[:, -1]. The first column, which holds the position titles, was excluded.

Linear regression baseline: Implemented a standard LinearRegression() model, fitted with .fit(X, y), to establish a straight-line baseline for predicting salary from position level.

Polynomial feature engineering: Applied PolynomialFeatures(degree=4) to expand the single position level feature into polynomial terms (x, x², x³, x⁴, plus the constant bias column that PolynomialFeatures adds by default), creating a richer feature space to capture non-linear salary progression patterns.

Support Vector Regression: Implemented feature scaling using StandardScaler() for both X and y variables, then trained an SVR model with an RBF kernel to handle non-linear relationships while managing the different scales between position levels and salary amounts.

Decision Tree Regression: Built a DecisionTreeRegressor() model that creates hierarchical decision rules to predict salaries, capturing complex non-linear patterns without requiring feature scaling.

Random Forest Regression: Implemented RandomForestRegressor() with multiple decision trees to reduce overfitting and improve prediction stability through ensemble learning.

SVR Inverse scaling: Applied sc_X.inverse_transform() and sc_y.inverse_transform() to convert scaled predictions back to original salary units, with proper reshaping using .reshape(-1, 1) for visualization.

Model Visualisations: Created scatter plots with plt.scatter() for actual data points and plt.plot() for the linear regression line, showing the limitation of straight-line salary prediction. Generated similar visualizations for the polynomial model using lin_reg_2.predict(poly_reg.fit_transform(X)) to display the curved relationship between position levels and salaries.

Model comparison: Made direct salary predictions for position level 6.5 using both lin_reg.predict([[6.5]]) and lin_reg_2.predict(poly_reg.fit_transform([[6.5]])) to compare model outputs for intermediate position levels.
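
The linear baseline, the degree-4 polynomial model, and the level 6.5 comparison described above can be sketched as follows. This is a minimal illustration, not the project's exact notebook: the X and y arrays are made-up stand-ins for Position_Salaries.csv, and the names linear_pred and poly_pred are introduced here for readability. The sketch calls poly_reg.transform() on the new point, which is the conventional call after fitting; for PolynomialFeatures it yields the same terms as fit_transform().

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Made-up stand-in for Position_Salaries.csv (assumption, not the real data):
# ten position levels with salaries that grow non-linearly.
X = np.arange(1, 11).reshape(-1, 1)      # position levels, shape (10, 1)
y = 40_000 + 1_000 * X.ravel() ** 3      # hypothetical salaries

# Straight-line baseline
lin_reg = LinearRegression()
lin_reg.fit(X, y)

# Degree-4 polynomial: expand the single feature, then fit a linear model on it
poly_reg = PolynomialFeatures(degree=4)
X_poly = poly_reg.fit_transform(X)       # columns: 1, x, x^2, x^3, x^4
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)

# Compare both models at an intermediate position level
level = [[6.5]]
linear_pred = lin_reg.predict(level)[0]
poly_pred = lin_reg_2.predict(poly_reg.transform(level))[0]

print(f"Linear prediction at level 6.5:     {linear_pred:,.0f}")
print(f"Polynomial prediction at level 6.5: {poly_pred:,.0f}")
```

With the real dataset, only the two lines that build X and y need to change; the rest of the sketch uses the same variable names as the pipeline description.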

📊 Code Snippets & Visualisations:

    # Importing libraries
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd

    # Importing the dataset
    dataset = pd.read_csv('Position_Salaries.csv')
    X = dataset.iloc[:, 1:-1].values  # position levels (2-D feature matrix)
    y = dataset.iloc[:, -1].values    # salaries (1-D target)

    # Training the Decision Tree Regression model
    from sklearn.tree import DecisionTreeRegressor
    regressor = DecisionTreeRegressor(random_state=0)
    regressor.fit(X, y)

    # Predicting a new result
    print(regressor.predict([[6.5]]))

    # Visualising the results (Figure 5)
    # Step of 0.01 (adjusted from 0.1) to increase the resolution of the curve
    X_grid = np.arange(X.min(), X.max(), 0.01).reshape(-1, 1)
    plt.scatter(X, y, color='red')
    plt.plot(X_grid, regressor.predict(X_grid), color='blue')
    plt.title('HR Salary Predictions (Decision Tree Regression)')
    plt.xlabel('Position level')
    plt.ylabel('Salary')
    plt.show()

    # Evaluating the model performance
    from sklearn.metrics import r2_score
    # Since the model was trained on the whole dataset, we evaluate on the whole dataset
    y_pred = regressor.predict(X)
    print(r2_score(y, y_pred))
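
The random forest step from the pipeline follows the same fit, predict, and evaluate pattern as the decision tree code. The sketch below is illustrative only: the data is a made-up stand-in for Position_Salaries.csv, and n_estimators=10 and random_state=0 are assumed settings, since the write-up does not state which values were used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Made-up stand-in for Position_Salaries.csv (assumption, not the real data)
X = np.arange(1, 11).reshape(-1, 1)      # position levels
y = 40_000 + 1_000 * X.ravel() ** 3      # hypothetical salaries

# n_estimators and random_state are assumed values for illustration
forest = RandomForestRegressor(n_estimators=10, random_state=0)
forest.fit(X, y)

# Prediction for an intermediate position level
forest_pred = forest.predict([[6.5]])[0]

# R² on the training data, since no train-test split is used
forest_r2 = r2_score(y, forest.predict(X))

print(f"Random forest prediction at level 6.5: {forest_pred:,.0f}")
print(f"R² on the full dataset: {forest_r2:.3f}")
```

Because each tree predicts a value seen in training and the forest averages them, the prediction always stays within the range of the training salaries.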

**Figure 1** Linear Regression

**Figure 1a** Model Evaluation – Linear Regression

**Figure 2** Polynomial Regression

**Figure 2a** Model Evaluation – Polynomial Regression

**Figure 3a** Model Evaluation – SVR

**Figure 4** Random Forest

**Figure 4a** Model Evaluation – Random Forest

**Figure 5** Decision Tree

**Figure 5a** Model Evaluation – Decision Tree

🌟 Key Insights:

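N.B.: Because the dataset is small, no train-test split was used. Every model was trained and evaluated on the full dataset, so the evaluation scores measure how well each model fits the training data rather than how well it generalises.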

Linear regression could not capture the steep, non-linear rise in executive compensation at higher position levels

Polynomial regression successfully modeled smooth non-linear salary curves typical in corporate hierarchies

SVR with proper scaling handled the high salary variance effectively while maintaining smooth predictions

Decision tree regression captured salary jumps at specific position levels but risked overfitting

Random forest regression provided stable predictions by averaging multiple decision trees, reducing variance

🧗🏾 Challenge Faced:

The SVR model visualization presented scaling complications because support vector regression requires feature scaling for optimal performance, but the visualization needed to display results in original salary units. The challenge was handling the forward and inverse transformations correctly. This was resolved by implementing a multi-step process: using sc_X.transform(X_grid) to scale the grid for SVR prediction, then applying sc_y.inverse_transform() to convert predictions back to actual salary values, with careful attention to array reshaping using .reshape(-1, 1) to maintain proper dimensionality throughout the scaling pipeline. This approach ensured accurate model performance while maintaining interpretable visualizations in original salary units.
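
The forward and inverse transformations can be sketched as below. This is a minimal illustration under stated assumptions: the data is a made-up stand-in for Position_Salaries.csv, and predict_salary is a hypothetical helper introduced here to keep the scale, predict, and unscale steps in one place.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Made-up stand-in for Position_Salaries.csv (assumption, not the real data)
X = np.arange(1, 11).reshape(-1, 1)      # position levels
y = 40_000 + 1_000 * X.ravel() ** 3      # hypothetical salaries

# Separate scalers for features and target; StandardScaler needs 2-D input,
# so y is reshaped into a column before scaling and flattened afterwards.
sc_X = StandardScaler()
sc_y = StandardScaler()
X_scaled = sc_X.fit_transform(X)
y_scaled = sc_y.fit_transform(y.reshape(-1, 1)).ravel()

regressor = SVR(kernel="rbf")
regressor.fit(X_scaled, y_scaled)


def predict_salary(levels):
    """Hypothetical helper: scale levels, predict, return original salary units."""
    levels = np.asarray(levels, dtype=float).reshape(-1, 1)
    scaled_pred = regressor.predict(sc_X.transform(levels))
    # SVR returns a 1-D array; inverse_transform expects a column
    return sc_y.inverse_transform(scaled_pred.reshape(-1, 1)).ravel()


# Dense grid in original units for a smooth curve
X_grid = np.arange(X.min(), X.max(), 0.1).reshape(-1, 1)
y_grid = predict_salary(X_grid)

print(predict_salary([[6.5]]))
```

X_grid and y_grid are both in original units, so they can be passed straight to plt.plot(X_grid, y_grid) alongside plt.scatter(X, y) without any further conversion.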
