Stock Market Analysis

This project analyzed historical stock price data for major companies and the S&P 500 index to understand price movements, correlations, and daily return patterns. I built an interactive dashboard using Python's data science stack to visualize both raw and normalized stock performance alongside risk metrics.

💻 Tech Stack:

Python for data manipulation and analysis
Pandas for loading, cleaning, and transforming the price data
Matplotlib & Seaborn for static visualisations
Plotly for interactive charts and dashboards
NumPy & SciPy for statistical computations

🧪 Data Pipeline:

Explore data: Loaded stock price data using pd.read_csv() and explored the dataset structure with .info(), .describe(), and .head() to understand the time series format and identify key stocks. Checked for missing values using .isnull().sum() and calculated basic statistics like mean returns and standard deviation to assess data completeness and variability.
Price Normalisation: Created a custom normalize() function that divides each stock's prices by its first value, so every series starts at 1 and relative performance can be compared fairly across different price ranges.
Daily Returns Calculations: Built a daily_return() function using nested loops to compute percentage daily returns: ((current_price - previous_price) / previous_price) * 100 for each stock.
Visualisation: Developed reusable plotting functions show_plot() and interactive_plot() to create both static matplotlib charts and interactive Plotly visualizations for raw prices, normalized prices, and daily returns.
Correlation Analysis: Generated a correlation matrix using .corr() and visualized it with a Seaborn heatmap to identify relationships between stock movements.
Distribution Analysis: Created histograms and compiled distribution plots using Plotly's create_distplot() to analyze the statistical properties of daily returns.
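
The exploration step above can be sketched as follows. This is a minimal illustration rather than the project's actual loading code: the CSV content, the ticker names (AAA, BBB, INDEX), and the prices are made-up stand-ins, and the in-memory text replaces the real file so the example runs on its own.

```python
import io

import pandas as pd

# Made-up sample rows standing in for the real CSV; tickers and prices are illustrative only
csv_text = """Date,AAA,BBB,INDEX
2020-01-02,100.0,50.0,3000.0
2020-01-03,102.0,,3030.0
2020-01-06,101.0,51.0,3015.0
2020-01-07,104.0,52.5,3060.0
"""

# parse_dates converts the Date column to datetime64 so it behaves as a time axis
stocks_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

stocks_df.info()             # column dtypes and non-null counts
print(stocks_df.head())      # first rows, to check the layout
print(stocks_df.describe())  # summary statistics per column

# Number of missing values in each column
missing_per_column = stocks_df.isnull().sum()
print(missing_per_column)
```

With a real file, the `io.StringIO(...)` argument would simply be replaced by the path to the CSV.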

📊 Code Snippets & Visualisations:

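import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns

# stocks_df is the DataFrame loaded with pd.read_csv(): 'Date' is the first column,
# followed by one price column per stock

# Static line chart of every price column against Date
def show_plot(df, title):
    df.plot(x='Date', figsize=(12, 8), linewidth=3, title=title)
    plt.xlabel('Date')
    plt.ylabel('Price')
    plt.grid()
    plt.show()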

# Plot the data (Figure 1)
show_plot(stocks_df, 'STOCKS DATA')

# Normalized Stock Data (Figure 2)
def normalize(df):
    x = df.copy()
    for i in x.columns[1:]:  # skip the Date column
        x[i] = x[i] / x[i].iloc[0]  # divide by the first price so each series starts at 1
    return x

normalize(stocks_df)

# Create Interactive chart of Stock Data (Figure 3)
def interactive_plot(df, title):
    fig = px.line(title=title)
    for i in df.columns[1:]:
        fig.add_scatter(x=df['Date'], y=df[i], name=i)
    fig.update_layout(
        xaxis_title="Date",
        yaxis_title="Price"
    )
    fig.show()

interactive_plot(stocks_df, 'STOCKS DATA')

# Create Interactive chart of Normalized Stock Data (Figure 4)
interactive_plot(normalize(stocks_df), 'NORMALIZED STOCKS DATA')

# Calculate stocks daily returns
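def daily_return(df):
    df_daily_return = df.copy()
    for i in df.columns[1:]:  # loop through the price columns (skip Date)
        for j in range(1, len(df)):  # loop through rows, starting at the second
            # .loc writes into the copy in one step; chained [i][j] assignment may not
            df_daily_return.loc[j, i] = ((df[i][j] - df[i][j - 1]) / df[i][j - 1]) * 100
        df_daily_return.loc[0, i] = 0  # the first row has no previous day, so its return is 0
    return df_daily_return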

# Get the daily returns (Figure 5)
stocks_daily_return = daily_return(stocks_df)
stocks_daily_return

interactive_plot(stocks_daily_return, 'Stocks Daily returns')

# Daily Return Correlation
cm = stocks_daily_return.drop(columns=['Date']).corr()
cm

# Heatmap showing correlations (Figure 6)
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, cmap='RdYlGn')  # `annot=True` displays values on the heatmap
plt.show()

# Histogram of daily returns (Figure 7)
stocks_daily_return.hist(bins=50, figsize=(20, 10))
plt.show()

Figure 4: Normalized stock data

🌟 Key Insights:

Stock movements are highly correlated across companies, indicating that broader market forces often drive price trends rather than company-specific factors.
Daily returns fluctuate far more than overall price trends suggest, revealing short-term volatility that long-term averages tend to conceal.
Volatility patterns differ by stock: some show consistently wider swings in daily returns, signalling higher risk and potential reward than more stable peers.
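
One way to put a number on the volatility insight is to rank stocks by the standard deviation of their daily returns. The sketch below assumes a returns table shaped like `stocks_daily_return` (a Date column plus one column per stock); the helper name `volatility_ranking` and the sample values for the hypothetical tickers AAA and BBB are illustrative, not project results.

```python
import pandas as pd


def volatility_ranking(daily_returns):
    """Return each stock's daily-return standard deviation, most volatile first."""
    numeric = daily_returns.drop(columns=["Date"])
    return numeric.std().sort_values(ascending=False)


# Made-up percentage returns for two hypothetical tickers
demo_returns = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"]),
    "AAA": [0.0, 1.0, -1.0, 0.5],
    "BBB": [0.0, 3.0, -4.0, 2.0],
})

ranking = volatility_ranking(demo_returns)
print(ranking)  # BBB is listed first because its returns swing more widely
```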

🧗🏾 Challenge Faced:

The daily returns calculation initially produced incorrect values for the first row of each stock. Debugging showed the cause: the first entry has no previous day to calculate a return from. The fix was to explicitly set the first day's return to 0 once the loop over the rows finishes, so that every row holds a valid percentage return.
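
The same first-row issue shows up in a loop-free version of the calculation, which may make it easier to see. The sketch below is an alternative to the nested loops, not the project's implementation: it uses pandas' pct_change(), which returns NaN for the first row because there is no previous value, and then sets that row to 0. The function name `daily_return_vectorized` and the sample prices are assumptions.

```python
import pandas as pd


def daily_return_vectorized(df):
    """Percentage daily returns for every column except the first (Date), without loops."""
    out = df.copy()
    price_cols = df.columns[1:]  # assumes Date is the first column
    # pct_change() computes (current - previous) / previous for each row
    pct = df[price_cols].pct_change() * 100
    pct.iloc[0] = 0  # the first row has no previous day, so its NaN becomes 0
    out[price_cols] = pct
    return out


# Made-up prices for a hypothetical ticker
prices = pd.DataFrame({
    "Date": ["2020-01-02", "2020-01-03", "2020-01-06"],
    "AAA": [100.0, 110.0, 99.0],
})

returns = daily_return_vectorized(prices)
print(returns)
```

Only the first row is overwritten here, so a NaN caused by a genuinely missing price later in the series stays visible instead of being silently turned into a 0% return.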
