Timesheet Analyses (Dashboard)

📦 0. Load Excel Files

In [8]:
# Import libs
import pandas as pd

# Load Excel files
pulse_df = pd.read_excel("./data/project_pulse_tracker.xlsx")
praise_df = pd.read_excel("./data/praise_story_tracker.xlsx")

print("Project Pulse")
display(pulse_df.head())

print("Praise Story")
display(praise_df.head())
Project Pulse
STAGE STARTED AT ENDED AT DURATION (H) COST ($) REMARKS
0 Analysis 2025-04-05 18:30:00 2025-04-05 19:00:00 0.500000 13.000000 Planning
1 Analysis 2025-04-05 23:33:00 2025-04-05 23:37:00 0.066667 1.733333 Setup environment
2 Implementation 2025-04-06 07:30:00 2025-04-06 08:26:00 0.933333 24.266667 Setup project
3 Implementation 2025-04-06 08:26:00 2025-04-06 10:00:00 1.566667 40.733333 Add roles to User module
4 Implementation 2025-04-06 10:43:00 2025-04-06 11:36:00 0.883333 22.966667 Backend of Project module completed
Praise Story
STAGE STARTED AT ENDED AT DURATION (H) COST ($) REMARKS
0 Analysis 2025-05-06 20:28:00 2025-05-06 21:00:00 0.533333 13.866667 Logo
1 Analysis 2025-05-07 11:47:00 2025-05-07 12:25:00 0.633333 16.466667 Screen Plan
2 Analysis 2025-05-07 12:25:00 2025-05-07 12:33:00 0.133333 3.466667 Project Plan
3 Implementation 2025-05-12 10:10:00 2025-05-12 11:25:00 1.250000 32.500000 Testimony migration/Status helper
4 Implementation 2025-05-12 19:45:00 2025-05-12 22:04:00 2.316667 60.233333 Testimony backend module
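
The later cells rely on STARTED AT being a datetime column. pd.read_excel usually infers this from the Excel cell types, but the columns can also be parsed explicitly at load time with the parse_dates argument; a minimal sketch, using the same file paths and column names as above:

In [ ]:
# Parse the timestamp columns up front so later cells can use the .dt accessors directly
date_cols = ["STARTED AT", "ENDED AT"]
pulse_df = pd.read_excel("./data/project_pulse_tracker.xlsx", parse_dates=date_cols)
praise_df = pd.read_excel("./data/praise_story_tracker.xlsx", parse_dates=date_cols)

print(pulse_df.dtypes)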

📊 1. Stage-Level Summary

Goal: Understand how time and cost are distributed across different project stages. Method: Group by STAGE, then aggregate total DURATION (H) and COST ($).

In [9]:
# Import libs
import pandas as pd

# Load Excel files
pulse_df = pd.read_excel("./data/project_pulse_tracker.xlsx")
praise_df = pd.read_excel("./data/praise_story_tracker.xlsx")

# Group and aggregate total duration and cost by stage for both timesheets
pulse_summary = pulse_df.groupby("STAGE")[["DURATION (H)", "COST ($)"]].sum().reset_index()
praise_summary = praise_df.groupby("STAGE")[["DURATION (H)", "COST ($)"]].sum().reset_index()

# Display the results to the user
print("Pulse Summary")
display(pulse_summary.head())

print("Praise Summary")
display(praise_summary.head())
Pulse Summary
STAGE DURATION (H) COST ($)
0 Analysis 0.566667 14.733333
1 Implementation 72.133333 1875.466667
2 Testing 14.216667 369.633333
Praise Summary
STAGE DURATION (H) COST ($)
0 Analysis 1.300000 33.800000
1 Design 6.650000 172.900000
2 Implementation 52.966667 1377.133333
3 Testing 4.283333 111.366667
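
A quick chart makes the stage split easier to compare than the raw tables; a minimal sketch over the pulse_summary and praise_summary frames above, assuming matplotlib is available in the environment:

In [ ]:
import matplotlib.pyplot as plt

# One panel per project: total duration and cost per stage
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
pulse_summary.set_index("STAGE")[["DURATION (H)", "COST ($)"]].plot.bar(ax=axes[0], title="Pulse: totals by stage")
praise_summary.set_index("STAGE")[["DURATION (H)", "COST ($)"]].plot.bar(ax=axes[1], title="Praise: totals by stage")
plt.tight_layout()
plt.show()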

🪜 2. Cumulative Burndown

Goal: Track overall resource consumption over time. Method: Sort by STARTED AT and compute the cumulative sum of DURATION (H) and COST ($) over time.

In [4]:
# Import libs
import pandas as pd

# Load Excel files
pulse_df = pd.read_excel("./data/project_pulse_tracker.xlsx")
praise_df = pd.read_excel("./data/praise_story_tracker.xlsx")

# Ensure STARTED AT is in datetime format
pulse_df["STARTED AT"] = pd.to_datetime(pulse_df["STARTED AT"])
praise_df["STARTED AT"] = pd.to_datetime(praise_df["STARTED AT"])

# Sort by STARTED AT
pulse_sorted = pulse_df.sort_values("STARTED AT").copy()
praise_sorted = praise_df.sort_values("STARTED AT").copy()

# Compute cumulative sums
pulse_sorted["CUMULATIVE DURATION (H)"] = pulse_sorted["DURATION (H)"].cumsum()
pulse_sorted["CUMULATIVE COST ($)"] = pulse_sorted["COST ($)"].cumsum()

praise_sorted["CUMULATIVE DURATION (H)"] = praise_sorted["DURATION (H)"].cumsum()
praise_sorted["CUMULATIVE COST ($)"] = praise_sorted["COST ($)"].cumsum()

# Keep only relevant columns for display
pulse_cumulative = pulse_sorted[["STARTED AT", "CUMULATIVE DURATION (H)", "CUMULATIVE COST ($)"]]
praise_cumulative = praise_sorted[["STARTED AT", "CUMULATIVE DURATION (H)", "CUMULATIVE COST ($)"]]

# Display the results to the user
print("Pulse Cumulative")
display(pulse_cumulative.head())

print("Praise Cumulative")
display(praise_cumulative.head())
Pulse Cumulative
STARTED AT CUMULATIVE DURATION (H) CUMULATIVE COST ($)
0 2025-04-05 18:30:00 0.500000 13.000000
1 2025-04-05 23:33:00 0.566667 14.733333
2 2025-04-06 07:30:00 1.500000 39.000000
3 2025-04-06 08:26:00 3.066667 79.733333
4 2025-04-06 10:43:00 3.950000 102.700000
Praise Cumulative
STARTED AT CUMULATIVE DURATION (H) CUMULATIVE COST ($)
0 2025-05-06 20:28:00 0.533333 13.866667
1 2025-05-07 11:47:00 1.166667 30.333333
2 2025-05-07 12:25:00 1.300000 33.800000
3 2025-05-12 10:10:00 2.550000 66.300000
4 2025-05-12 19:45:00 4.866667 126.533333
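
Plotting the cumulative totals against time gives the actual burndown view; a minimal sketch that reuses the pulse_cumulative and praise_cumulative frames above and assumes matplotlib is available:

In [ ]:
import matplotlib.pyplot as plt

# Cumulative hours over time for both projects on a single chart
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(pulse_cumulative["STARTED AT"], pulse_cumulative["CUMULATIVE DURATION (H)"], label="Pulse")
ax.plot(praise_cumulative["STARTED AT"], praise_cumulative["CUMULATIVE DURATION (H)"], label="Praise")
ax.set_xlabel("Date")
ax.set_ylabel("Cumulative duration (h)")
ax.set_title("Cumulative Burndown")
ax.legend()
plt.tight_layout()
plt.show()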

📅 3. Daily Activity Trend

Goal: Identify which days were the most active. Method: Extract date from STARTED AT, then group by date and sum DURATION (H).

In [6]:
# Import libs
import pandas as pd

# Load Excel files
pulse_df = pd.read_excel("./data/project_pulse_tracker.xlsx")
praise_df = pd.read_excel("./data/praise_story_tracker.xlsx")

# Ensure STARTED AT is in datetime format, then extract the date
pulse_df["STARTED AT"] = pd.to_datetime(pulse_df["STARTED AT"])
praise_df["STARTED AT"] = pd.to_datetime(praise_df["STARTED AT"])
pulse_df["DATE"] = pulse_df["STARTED AT"].dt.date
praise_df["DATE"] = praise_df["STARTED AT"].dt.date

# Group by date and sum duration
pulse_daily_activity = pulse_df.groupby("DATE")["DURATION (H)"].sum().reset_index()
praise_daily_activity = praise_df.groupby("DATE")["DURATION (H)"].sum().reset_index()

# Rename for clarity
pulse_daily_activity.columns = ["DATE", "TOTAL DURATION (H)"]
praise_daily_activity.columns = ["DATE", "TOTAL DURATION (H)"]

# Display the results to the user
print("Pulse Daily Activity")
display(pulse_daily_activity.head())

print("Praise Daily Activity")
display(praise_daily_activity.head())
Pulse Daily Activity
DATE TOTAL DURATION (H)
0 2025-04-05 0.566667
1 2025-04-06 5.433333
2 2025-04-07 0.750000
3 2025-04-09 5.683333
4 2025-04-10 5.733333
Praise Daily Activity
DATE TOTAL DURATION (H)
0 2025-05-06 0.533333
1 2025-05-07 0.766667
2 2025-05-12 3.566667
3 2025-05-13 1.816667
4 2025-05-14 4.066667
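
The daily totals are easier to scan as bars; a minimal sketch over the pulse_daily_activity and praise_daily_activity frames above, assuming matplotlib is available:

In [ ]:
import matplotlib.pyplot as plt

# Hours per day, one panel per project, to spot the most active days
fig, axes = plt.subplots(2, 1, figsize=(12, 8), sharey=True)
axes[0].bar(pulse_daily_activity["DATE"].astype(str), pulse_daily_activity["TOTAL DURATION (H)"])
axes[0].set_title("Pulse: hours per day")
axes[1].bar(praise_daily_activity["DATE"].astype(str), praise_daily_activity["TOTAL DURATION (H)"])
axes[1].set_title("Praise: hours per day")
for ax in axes:
    ax.tick_params(axis="x", rotation=90)
plt.tight_layout()
plt.show()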

🕐 4. Hourly Productivity Heatmap

Goal: Determine peak working hours. Method: Extract hour from STARTED AT, then plot frequency or total duration per hour.

In [10]:
# Import libs
import pandas as pd

# Load Excel files
pulse_df = pd.read_excel("./data/project_pulse_tracker.xlsx")
praise_df = pd.read_excel("./data/praise_story_tracker.xlsx")

# Ensure STARTED AT is in datetime format, then extract the hour
pulse_df["STARTED AT"] = pd.to_datetime(pulse_df["STARTED AT"])
praise_df["STARTED AT"] = pd.to_datetime(praise_df["STARTED AT"])
pulse_df["HOUR"] = pulse_df["STARTED AT"].dt.hour
praise_df["HOUR"] = praise_df["STARTED AT"].dt.hour

# Group by hour and sum duration
pulse_hourly_activity = pulse_df.groupby("HOUR")["DURATION (H)"].sum().reset_index()
praise_hourly_activity = praise_df.groupby("HOUR")["DURATION (H)"].sum().reset_index()

# Rename columns for clarity
pulse_hourly_activity.columns = ["HOUR", "TOTAL DURATION (H)"]
praise_hourly_activity.columns = ["HOUR", "TOTAL DURATION (H)"]

# Display the results to the user
print("Pulse Hourly Activity")
display(pulse_hourly_activity.head())

print("Praise Hourly Activity")
display(praise_hourly_activity.head())
Pulse Hourly Activity
HOUR TOTAL DURATION (H)
0 0 0.266667
1 5 5.350000
2 7 0.933333
3 8 2.966667
4 9 0.783333
Praise Hourly Activity
HOUR TOTAL DURATION (H)
0 0 2.100000
1 9 14.483333
2 10 1.250000
3 11 13.833333
4 12 6.683333
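
The tables above give totals per starting hour; an actual heatmap needs a second dimension, such as weekday. A minimal sketch for the Pulse data (swap in praise_df for Praise), assuming matplotlib is available:

In [ ]:
import matplotlib.pyplot as plt

# Pivot to an hour-of-day x weekday grid of total hours, then render it as a heatmap
pulse_df["WEEKDAY"] = pulse_df["STARTED AT"].dt.day_name()
heat = pulse_df.pivot_table(index="HOUR", columns="WEEKDAY", values="DURATION (H)", aggfunc="sum", fill_value=0)

fig, ax = plt.subplots(figsize=(8, 6))
im = ax.imshow(heat.values, aspect="auto", cmap="Blues")
ax.set_xticks(range(len(heat.columns)))
ax.set_xticklabels(heat.columns, rotation=45, ha="right")
ax.set_yticks(range(len(heat.index)))
ax.set_yticklabels(heat.index)
ax.set_xlabel("Weekday")
ax.set_ylabel("Hour started")
ax.set_title("Pulse: hours logged by hour and weekday")
fig.colorbar(im, ax=ax, label="Total duration (h)")
plt.tight_layout()
plt.show()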

🔍 5. Keyword Trends in Remarks

Goal: See when certain types of tasks (e.g., “backend”, “migrate”) were done. Method: Search REMARKS for specific keywords and track their frequency over time.

In [12]:
# Import libs
import pandas as pd

# Load Excel files
pulse_df = pd.read_excel("./data/project_pulse_tracker.xlsx")
praise_df = pd.read_excel("./data/praise_story_tracker.xlsx")

# Define a list of relevant keywords to track
keywords = ["backend", "migrate", "frontend", "design", "status", "testing", "helper", "plan"]

# Function to track keyword frequency by date
def keyword_trends(df, keywords):
    df["DATE"] = pd.to_datetime(df["STARTED AT"]).dt.date
    trend_data = []
    for keyword in keywords:
        keyword_df = df[df["REMARKS"].str.contains(keyword, case=False, na=False)]
        daily_counts = keyword_df.groupby("DATE").size().reset_index(name="COUNT")
        daily_counts["KEYWORD"] = keyword
        trend_data.append(daily_counts)
    return pd.concat(trend_data, ignore_index=True)

# Get trends for both Pulse and Praise timesheets
pulse_keyword_trends = keyword_trends(pulse_df, keywords)
praise_keyword_trends = keyword_trends(praise_df, keywords)

# Display the results to the user
print("Pulse Keyword Trends")
display(pulse_keyword_trends.head())

print("Praise Keyword Trends")
display(praise_keyword_trends.head())
Pulse Keyword Trends
DATE COUNT KEYWORD
0 2025-04-06 1 backend
1 2025-04-06 1 status
2 2025-04-12 2 status
3 2025-04-13 2 status
4 2025-05-05 1 status
Praise Keyword Trends
DATE COUNT KEYWORD
0 2025-05-12 1 backend
1 2025-05-13 3 backend
2 2025-05-14 2 backend
3 2025-05-12 1 status
4 2025-05-12 1 helper
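
Pivoting the long format into a DATE x KEYWORD matrix makes the trend easier to read at a glance (and easier to plot later):

In [ ]:
# Reshape the keyword trends into a DATE x KEYWORD count matrix
pulse_keyword_matrix = pulse_keyword_trends.pivot_table(index="DATE", columns="KEYWORD", values="COUNT", aggfunc="sum", fill_value=0)
praise_keyword_matrix = praise_keyword_trends.pivot_table(index="DATE", columns="KEYWORD", values="COUNT", aggfunc="sum", fill_value=0)

print("Pulse Keyword Matrix")
display(pulse_keyword_matrix.head())

print("Praise Keyword Matrix")
display(praise_keyword_matrix.head())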

📈 6. Weekday Productivity

Goal: Analyze work patterns across days of the week. Method: Extract weekday from STARTED AT and group by it.

In [13]:
# Import libs
import pandas as pd

# Load Excel files
pulse_df = pd.read_excel("./data/project_pulse_tracker.xlsx")
praise_df = pd.read_excel("./data/praise_story_tracker.xlsx")

# Ensure STARTED AT is in datetime format, then extract the weekday name
pulse_df["STARTED AT"] = pd.to_datetime(pulse_df["STARTED AT"])
praise_df["STARTED AT"] = pd.to_datetime(praise_df["STARTED AT"])
pulse_df["WEEKDAY"] = pulse_df["STARTED AT"].dt.day_name()
praise_df["WEEKDAY"] = praise_df["STARTED AT"].dt.day_name()

# Group by weekday and sum duration
pulse_weekday_productivity = pulse_df.groupby("WEEKDAY")["DURATION (H)"].sum().reset_index()
praise_weekday_productivity = praise_df.groupby("WEEKDAY")["DURATION (H)"].sum().reset_index()

# Optional: Sort by actual weekday order
weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
pulse_weekday_productivity["WEEKDAY"] = pd.Categorical(pulse_weekday_productivity["WEEKDAY"], categories=weekday_order, ordered=True)
praise_weekday_productivity["WEEKDAY"] = pd.Categorical(praise_weekday_productivity["WEEKDAY"], categories=weekday_order, ordered=True)

pulse_weekday_productivity.sort_values("WEEKDAY", inplace=True)
praise_weekday_productivity.sort_values("WEEKDAY", inplace=True)

# Display the results to the user
print("Pulse Weekday Productivity")
display(pulse_weekday_productivity.head())

print("Praise Weekday Productivity")
display(praise_weekday_productivity.head())
Pulse Weekday Productivity
WEEKDAY DURATION (H)
1 Monday 18.600000
5 Tuesday 1.100000
6 Wednesday 5.683333
4 Thursday 16.400000
0 Friday 10.150000
Praise Weekday Productivity
WEEKDAY DURATION (H)
1 Monday 14.800000
5 Tuesday 6.416667
6 Wednesday 11.533333
4 Thursday 14.833333
0 Friday 11.466667
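
Merging the two weekday tables puts both projects on one chart; a minimal sketch, assuming matplotlib is available:

In [ ]:
import matplotlib.pyplot as plt

# Combine both projects into one table and plot the weekday totals side by side
weekday_compare = pulse_weekday_productivity.merge(
    praise_weekday_productivity, on="WEEKDAY", how="outer", suffixes=(" PULSE", " PRAISE")
)
weekday_compare.set_index("WEEKDAY").fillna(0).plot.bar(figsize=(10, 5), title="Hours logged per weekday")
plt.ylabel("Total duration (h)")
plt.tight_layout()
plt.show()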

🔁 7. Task Flow Duration

Goal: Identify slow- versus fast-moving stages. Method: Compare average DURATION (H) per stage or per remark keyword (a keyword-level sketch follows the stage averages below).

In [14]:
# Import libs
import pandas as pd

# Load Excel files
pulse_df = pd.read_excel("./data/project_pulse_tracker.xlsx")
praise_df = pd.read_excel("./data/praise_story_tracker.xlsx")

# Calculate average duration per STAGE
pulse_stage_avg_duration = pulse_df.groupby("STAGE")["DURATION (H)"].mean().reset_index()
praise_stage_avg_duration = praise_df.groupby("STAGE")["DURATION (H)"].mean().reset_index()

# Rename for clarity
pulse_stage_avg_duration.columns = ["STAGE", "AVG DURATION (H)"]
praise_stage_avg_duration.columns = ["STAGE", "AVG DURATION (H)"]

# Display the results to the user
print("Pulse Stage Avg Duration")
display(pulse_stage_avg_duration.head())

print("Praise Stage Avg Duration")
display(praise_stage_avg_duration.head())
Pulse Stage Avg Duration
STAGE AVG DURATION (H)
0 Analysis 0.283333
1 Implementation 1.472109
2 Testing 1.184722
Praise Stage Avg Duration
STAGE AVG DURATION (H)
0 Analysis 0.433333
1 Design 3.325000
2 Implementation 2.037179
3 Testing 2.141667
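
For the remark-keyword variant of the same comparison, the keyword list and matching approach from section 5 can be reused; a minimal sketch (the keyword_avg_duration helper is hypothetical):

In [ ]:
# Average duration of tasks whose REMARKS mention each keyword (keywords list defined in section 5)
def keyword_avg_duration(df, keywords):
    rows = []
    for keyword in keywords:
        matches = df[df["REMARKS"].str.contains(keyword, case=False, na=False)]
        if not matches.empty:
            rows.append({"KEYWORD": keyword, "AVG DURATION (H)": matches["DURATION (H)"].mean()})
    return pd.DataFrame(rows)

print("Pulse Keyword Avg Duration")
display(keyword_avg_duration(pulse_df, keywords))

print("Praise Keyword Avg Duration")
display(keyword_avg_duration(praise_df, keywords))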

🧮 8. Project Comparison

Goal: Compare time and cost patterns between “Pulse” and “Praise” projects. Method: Generate side-by-side bar plots of total/average time & cost.

In [15]:
# Import libs
import pandas as pd
import matplotlib.pyplot as plt

# Load Excel files
pulse_df = pd.read_excel("./data/project_pulse_tracker.xlsx")
praise_df = pd.read_excel("./data/praise_story_tracker.xlsx")

# Compute total and average metrics for Pulse and Praise
summary_comparison = pd.DataFrame({
    "Project": ["Pulse", "Praise"],
    "Total Duration (H)": [pulse_df["DURATION (H)"].sum(), praise_df["DURATION (H)"].sum()],
    "Total Cost ($)": [pulse_df["COST ($)"].sum(), praise_df["COST ($)"].sum()],
    "Average Duration (H)": [pulse_df["DURATION (H)"].mean(), praise_df["DURATION (H)"].mean()],
    "Average Cost ($)": [pulse_df["COST ($)"].mean(), praise_df["COST ($)"].mean()],
})

# Plotting side-by-side bar charts
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Total comparison
summary_comparison.set_index("Project")[["Total Duration (H)", "Total Cost ($)"]].plot.bar(ax=axes[0])
axes[0].set_title("Total Time and Cost Comparison")
axes[0].set_ylabel("Total")

# Average comparison
summary_comparison.set_index("Project")[["Average Duration (H)", "Average Cost ($)"]].plot.bar(ax=axes[1])
axes[1].set_title("Average Time and Cost per Task")
axes[1].set_ylabel("Average")

plt.tight_layout()
plt.show()
[Figure: side-by-side bar charts of "Total Time and Cost Comparison" (left) and "Average Time and Cost per Task" (right) for the Pulse and Praise projects]

