# Timesheet Analyses (Dashboard)

## 📦 0. Load Excel Files
# Import libs
import pandas as pd
# Load Excel files
pulse_df = pd.read_excel("./data/project_pulse_tracker.xlsx")
praise_df = pd.read_excel("./data/praise_story_tracker.xlsx")
print("Project Pulse")
display(pulse_df.head())
print("Praise Story")
display(praise_df.head())
Project Pulse
|   | STAGE | STARTED AT | ENDED AT | DURATION (H) | COST ($) | REMARKS |
|---|---|---|---|---|---|---|
| 0 | Analysis | 2025-04-05 18:30:00 | 2025-04-05 19:00:00 | 0.500000 | 13.000000 | Planning |
| 1 | Analysis | 2025-04-05 23:33:00 | 2025-04-05 23:37:00 | 0.066667 | 1.733333 | Setup environment |
| 2 | Implementation | 2025-04-06 07:30:00 | 2025-04-06 08:26:00 | 0.933333 | 24.266667 | Setup project |
| 3 | Implementation | 2025-04-06 08:26:00 | 2025-04-06 10:00:00 | 1.566667 | 40.733333 | Add roles to User module |
| 4 | Implementation | 2025-04-06 10:43:00 | 2025-04-06 11:36:00 | 0.883333 | 22.966667 | Backend of Project module completed |
Praise Story
|   | STAGE | STARTED AT | ENDED AT | DURATION (H) | COST ($) | REMARKS |
|---|---|---|---|---|---|---|
| 0 | Analysis | 2025-05-06 20:28:00 | 2025-05-06 21:00:00 | 0.533333 | 13.866667 | Logo |
| 1 | Analysis | 2025-05-07 11:47:00 | 2025-05-07 12:25:00 | 0.633333 | 16.466667 | Screen Plan |
| 2 | Analysis | 2025-05-07 12:25:00 | 2025-05-07 12:33:00 | 0.133333 | 3.466667 | Project Plan |
| 3 | Implementation | 2025-05-12 10:10:00 | 2025-05-12 11:25:00 | 1.250000 | 32.500000 | Testimony migration/Status helper |
| 4 | Implementation | 2025-05-12 19:45:00 | 2025-05-12 22:04:00 | 2.316667 | 60.233333 | Testimony backend module |
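Several later cells call `.dt` accessors on STARTED AT, which assumes `read_excel` parsed the column as a datetime. A small defensive coercion guards against string-typed dates; sketched here on a synthetic frame, since the workbooks are not bundled with this document:

```python
import pandas as pd

# Synthetic stand-in for a timesheet whose date columns came back as
# plain strings (hypothetical data, not from the real trackers)
df = pd.DataFrame({
    "STAGE": ["Analysis", "Implementation"],
    "STARTED AT": ["2025-04-05 18:30:00", "2025-04-06 07:30:00"],
    "ENDED AT": ["2025-04-05 19:00:00", "2025-04-06 08:26:00"],
})

# Coerce both timestamp columns to datetime64 so .dt accessors work later
for col in ("STARTED AT", "ENDED AT"):
    df[col] = pd.to_datetime(df[col])

print(df.dtypes)
```

Applied to `pulse_df` and `praise_df` right after loading, this makes every downstream `.dt` call safe regardless of how Excel stored the cells.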
## 📊 1. Stage-Level Summary

**Goal:** Understand how time and cost are distributed across different project stages.
**Method:** Group by STAGE, then aggregate total DURATION (H) and COST ($).
# Import libs
import pandas as pd
# Load Excel files
pulse_df = pd.read_excel("./data/project_pulse_tracker.xlsx")
praise_df = pd.read_excel("./data/praise_story_tracker.xlsx")
# Group and aggregate total duration and cost by stage for both timesheets
pulse_summary = pulse_df.groupby("STAGE")[["DURATION (H)", "COST ($)"]].sum().reset_index()
praise_summary = praise_df.groupby("STAGE")[["DURATION (H)", "COST ($)"]].sum().reset_index()
# Display the results to the user
print("Pulse Summary")
display(pulse_summary.head())
print("Praise Summary")
display(praise_summary.head())
Pulse Summary
|   | STAGE | DURATION (H) | COST ($) |
|---|---|---|---|
| 0 | Analysis | 0.566667 | 14.733333 |
| 1 | Implementation | 72.133333 | 1875.466667 |
| 2 | Testing | 14.216667 | 369.633333 |
Praise Summary
|   | STAGE | DURATION (H) | COST ($) |
|---|---|---|---|
| 0 | Analysis | 1.300000 | 33.800000 |
| 1 | Design | 6.650000 | 172.900000 |
| 2 | Implementation | 52.966667 | 1377.133333 |
| 3 | Testing | 4.283333 | 111.366667 |
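A natural follow-up to the stage totals is each stage's share of the whole, which makes the Implementation-heavy split obvious at a glance. A sketch on a toy summary frame with illustrative numbers (the real `pulse_summary` would drop in directly):

```python
import pandas as pd

# Toy stage summary mirroring the shape of pulse_summary (illustrative numbers)
summary = pd.DataFrame({
    "STAGE": ["Analysis", "Implementation", "Testing"],
    "DURATION (H)": [1.0, 6.0, 3.0],
    "COST ($)": [26.0, 156.0, 78.0],
})

# Each stage's percentage share of total time and cost
summary["DURATION %"] = 100 * summary["DURATION (H)"] / summary["DURATION (H)"].sum()
summary["COST %"] = 100 * summary["COST ($)"] / summary["COST ($)"].sum()

print(summary)
```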
## 🪜 2. Cumulative Burndown

**Goal:** Track overall resource consumption over time.
**Method:** Sort by STARTED AT and compute the cumulative sum of DURATION (H) and COST ($) over time.
# Import libs
import pandas as pd
# Load Excel files
pulse_df = pd.read_excel("./data/project_pulse_tracker.xlsx")
praise_df = pd.read_excel("./data/praise_story_tracker.xlsx")
# Ensure STARTED AT is in datetime format
pulse_df["STARTED AT"] = pd.to_datetime(pulse_df["STARTED AT"])
praise_df["STARTED AT"] = pd.to_datetime(praise_df["STARTED AT"])
# Sort by STARTED AT
pulse_sorted = pulse_df.sort_values("STARTED AT").copy()
praise_sorted = praise_df.sort_values("STARTED AT").copy()
# Compute cumulative sums
pulse_sorted["CUMULATIVE DURATION (H)"] = pulse_sorted["DURATION (H)"].cumsum()
pulse_sorted["CUMULATIVE COST ($)"] = pulse_sorted["COST ($)"].cumsum()
praise_sorted["CUMULATIVE DURATION (H)"] = praise_sorted["DURATION (H)"].cumsum()
praise_sorted["CUMULATIVE COST ($)"] = praise_sorted["COST ($)"].cumsum()
# Keep only relevant columns for display
pulse_cumulative = pulse_sorted[["STARTED AT", "CUMULATIVE DURATION (H)", "CUMULATIVE COST ($)"]]
praise_cumulative = praise_sorted[["STARTED AT", "CUMULATIVE DURATION (H)", "CUMULATIVE COST ($)"]]
# Display the results to the user
print("Pulse Cumulative")
display(pulse_cumulative.head())
print("Praise Cumulative")
display(praise_cumulative.head())
Pulse Cumulative
|   | STARTED AT | CUMULATIVE DURATION (H) | CUMULATIVE COST ($) |
|---|---|---|---|
| 0 | 2025-04-05 18:30:00 | 0.500000 | 13.000000 |
| 1 | 2025-04-05 23:33:00 | 0.566667 | 14.733333 |
| 2 | 2025-04-06 07:30:00 | 1.500000 | 39.000000 |
| 3 | 2025-04-06 08:26:00 | 3.066667 | 79.733333 |
| 4 | 2025-04-06 10:43:00 | 3.950000 | 102.700000 |
Praise Cumulative
|   | STARTED AT | CUMULATIVE DURATION (H) | CUMULATIVE COST ($) |
|---|---|---|---|
| 0 | 2025-05-06 20:28:00 | 0.533333 | 13.866667 |
| 1 | 2025-05-07 11:47:00 | 1.166667 | 30.333333 |
| 2 | 2025-05-07 12:25:00 | 1.300000 | 33.800000 |
| 3 | 2025-05-12 10:10:00 | 2.550000 | 66.300000 |
| 4 | 2025-05-12 19:45:00 | 4.866667 | 126.533333 |
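For a smoother burndown curve, the unevenly spaced entries can first be bucketed to one point per calendar day with `resample` before cumulating; a minimal sketch on synthetic timestamps:

```python
import pandas as pd

# Synthetic entries with uneven timestamps (hypothetical data)
df = pd.DataFrame({
    "STARTED AT": pd.to_datetime([
        "2025-04-05 18:30", "2025-04-05 23:33", "2025-04-06 07:30",
    ]),
    "DURATION (H)": [0.5, 0.1, 0.9],
})

# Bucket to one total per calendar day, then cumulate
daily = df.set_index("STARTED AT")["DURATION (H)"].resample("D").sum()
burndown = daily.cumsum()
print(burndown)
```

The resulting series has one evenly spaced point per day, which plots more cleanly than the raw per-entry cumulative columns.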
## 📅 3. Daily Activity Trend

**Goal:** Identify which days were the most active.
**Method:** Extract the date from STARTED AT, then group by date and sum DURATION (H).
# Import libs
import pandas as pd
# Load Excel files
pulse_df = pd.read_excel("./data/project_pulse_tracker.xlsx")
praise_df = pd.read_excel("./data/praise_story_tracker.xlsx")
# Ensure STARTED AT is datetime, then extract the calendar date
pulse_df["STARTED AT"] = pd.to_datetime(pulse_df["STARTED AT"])
praise_df["STARTED AT"] = pd.to_datetime(praise_df["STARTED AT"])
pulse_df["DATE"] = pulse_df["STARTED AT"].dt.date
praise_df["DATE"] = praise_df["STARTED AT"].dt.date
# Group by date and sum duration
pulse_daily_activity = pulse_df.groupby("DATE")["DURATION (H)"].sum().reset_index()
praise_daily_activity = praise_df.groupby("DATE")["DURATION (H)"].sum().reset_index()
# Rename for clarity
pulse_daily_activity.columns = ["DATE", "TOTAL DURATION (H)"]
praise_daily_activity.columns = ["DATE", "TOTAL DURATION (H)"]
# Display the results to the user
print("Pulse Daily Activity")
display(pulse_daily_activity.head())
print("Praise Daily Activity")
display(praise_daily_activity.head())
Pulse Daily Activity
|   | DATE | TOTAL DURATION (H) |
|---|---|---|
| 0 | 2025-04-05 | 0.566667 |
| 1 | 2025-04-06 | 5.433333 |
| 2 | 2025-04-07 | 0.750000 |
| 3 | 2025-04-09 | 5.683333 |
| 4 | 2025-04-10 | 5.733333 |
Praise Daily Activity
|   | DATE | TOTAL DURATION (H) |
|---|---|---|
| 0 | 2025-05-06 | 0.533333 |
| 1 | 2025-05-07 | 0.766667 |
| 2 | 2025-05-12 | 3.566667 |
| 3 | 2025-05-13 | 1.816667 |
| 4 | 2025-05-14 | 4.066667 |
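To pull out the single busiest day rather than scanning the table, `idxmax` on the grouped column does it in one step; sketched on a synthetic daily table:

```python
import pandas as pd

# Synthetic daily activity table shaped like pulse_daily_activity
daily = pd.DataFrame({
    "DATE": pd.to_datetime(["2025-04-05", "2025-04-06", "2025-04-07"]).date,
    "TOTAL DURATION (H)": [0.5, 5.4, 0.75],
})

# Row with the highest total duration = busiest day
busiest = daily.loc[daily["TOTAL DURATION (H)"].idxmax()]
print(busiest["DATE"], busiest["TOTAL DURATION (H)"])
```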
## 🕐 4. Hourly Productivity Heatmap

**Goal:** Determine peak working hours.
**Method:** Extract the hour from STARTED AT, then sum total duration per hour for plotting.
# Import libs
import pandas as pd
# Load Excel files
pulse_df = pd.read_excel("./data/project_pulse_tracker.xlsx")
praise_df = pd.read_excel("./data/praise_story_tracker.xlsx")
# Ensure STARTED AT is datetime, then extract the hour
pulse_df["STARTED AT"] = pd.to_datetime(pulse_df["STARTED AT"])
praise_df["STARTED AT"] = pd.to_datetime(praise_df["STARTED AT"])
pulse_df["HOUR"] = pulse_df["STARTED AT"].dt.hour
praise_df["HOUR"] = praise_df["STARTED AT"].dt.hour
# Group by hour and sum duration
pulse_hourly_activity = pulse_df.groupby("HOUR")["DURATION (H)"].sum().reset_index()
praise_hourly_activity = praise_df.groupby("HOUR")["DURATION (H)"].sum().reset_index()
# Rename columns for clarity
pulse_hourly_activity.columns = ["HOUR", "TOTAL DURATION (H)"]
praise_hourly_activity.columns = ["HOUR", "TOTAL DURATION (H)"]
# Display the results to the user
print("Pulse Hourly Activity")
display(pulse_hourly_activity.head())
print("Praise Hourly Activity")
display(praise_hourly_activity.head())
Pulse Hourly Activity
|   | HOUR | TOTAL DURATION (H) |
|---|---|---|
| 0 | 0 | 0.266667 |
| 1 | 5 | 5.350000 |
| 2 | 7 | 0.933333 |
| 3 | 8 | 2.966667 |
| 4 | 9 | 0.783333 |
Praise Hourly Activity
|   | HOUR | TOTAL DURATION (H) |
|---|---|---|
| 0 | 0 | 2.100000 |
| 1 | 9 | 14.483333 |
| 2 | 10 | 1.250000 |
| 3 | 11 | 13.833333 |
| 4 | 12 | 6.683333 |
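The section title promises a heatmap; the missing piece is a weekday × hour grid, which `pivot_table` produces directly and which can then be fed to `seaborn.heatmap` or `imshow`. A sketch on synthetic rows (the real `pulse_df` slots in once STARTED AT is datetime):

```python
import pandas as pd

# Synthetic timesheet rows (hypothetical data)
df = pd.DataFrame({
    "STARTED AT": pd.to_datetime([
        "2025-04-07 09:00", "2025-04-07 14:00", "2025-04-08 09:30",
    ]),
    "DURATION (H)": [1.0, 2.0, 0.5],
})
df["WEEKDAY"] = df["STARTED AT"].dt.day_name()
df["HOUR"] = df["STARTED AT"].dt.hour

# Weekday x hour grid of total hours worked; this is the heatmap matrix
heat = df.pivot_table(index="WEEKDAY", columns="HOUR",
                      values="DURATION (H)", aggfunc="sum", fill_value=0)
print(heat)
```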
## 🔍 5. Keyword Trends in Remarks

**Goal:** See when certain types of tasks (e.g., "backend", "migrate") were done.
**Method:** Search REMARKS for specific keywords and track their frequency over time.
# Import libs
import pandas as pd
# Load Excel files
pulse_df = pd.read_excel("./data/project_pulse_tracker.xlsx")
praise_df = pd.read_excel("./data/praise_story_tracker.xlsx")
# Define a list of relevant keywords to track
keywords = ["backend", "migrate", "frontend", "design", "status", "testing", "helper", "plan"]
# Function to track keyword frequency by date
def keyword_trends(df, keywords):
    df = df.copy()  # work on a copy so the caller's frame is not mutated
    df["DATE"] = pd.to_datetime(df["STARTED AT"]).dt.date
    trend_data = []
    for keyword in keywords:
        keyword_df = df[df["REMARKS"].str.contains(keyword, case=False, na=False)]
        daily_counts = keyword_df.groupby("DATE").size().reset_index(name="COUNT")
        daily_counts["KEYWORD"] = keyword
        trend_data.append(daily_counts)
    return pd.concat(trend_data, ignore_index=True)
# Get trends for both Pulse and Praise timesheets
pulse_keyword_trends = keyword_trends(pulse_df, keywords)
praise_keyword_trends = keyword_trends(praise_df, keywords)
# Display the results to the user
print("Pulse Keyword Trends")
display(pulse_keyword_trends.head())
print("Praise Keyword Trends")
display(praise_keyword_trends.head())
Pulse Keyword Trends
|   | DATE | COUNT | KEYWORD |
|---|---|---|---|
| 0 | 2025-04-06 | 1 | backend |
| 1 | 2025-04-06 | 1 | status |
| 2 | 2025-04-12 | 2 | status |
| 3 | 2025-04-13 | 2 | status |
| 4 | 2025-05-05 | 1 | status |
Praise Keyword Trends
|   | DATE | COUNT | KEYWORD |
|---|---|---|---|
| 0 | 2025-05-12 | 1 | backend |
| 1 | 2025-05-13 | 3 | backend |
| 2 | 2025-05-14 | 2 | backend |
| 3 | 2025-05-12 | 1 | status |
| 4 | 2025-05-12 | 1 | helper |
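For plotting, the long date/keyword table is easier to use in wide form, one column per keyword with missing days as zero; a sketch on a synthetic trend table shaped like `pulse_keyword_trends`:

```python
import pandas as pd

# Synthetic long-form trend table (hypothetical counts)
trends = pd.DataFrame({
    "DATE": ["2025-04-06", "2025-04-06", "2025-04-12"],
    "COUNT": [1, 1, 2],
    "KEYWORD": ["backend", "status", "status"],
})

# Wide form: one column per keyword, absent date/keyword pairs filled with 0
wide = trends.pivot_table(index="DATE", columns="KEYWORD",
                          values="COUNT", fill_value=0)
print(wide)
```

Each column of `wide` can then be drawn as its own line in a single `wide.plot()` call.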
## 📈 6. Weekday Productivity

**Goal:** Analyze work patterns across days of the week.
**Method:** Extract the weekday from STARTED AT and group by it.
# Import libs
import pandas as pd
# Load Excel files
pulse_df = pd.read_excel("./data/project_pulse_tracker.xlsx")
praise_df = pd.read_excel("./data/praise_story_tracker.xlsx")
# Ensure STARTED AT is datetime, then extract the weekday name
pulse_df["STARTED AT"] = pd.to_datetime(pulse_df["STARTED AT"])
praise_df["STARTED AT"] = pd.to_datetime(praise_df["STARTED AT"])
pulse_df["WEEKDAY"] = pulse_df["STARTED AT"].dt.day_name()
praise_df["WEEKDAY"] = praise_df["STARTED AT"].dt.day_name()
# Group by weekday and sum duration
pulse_weekday_productivity = pulse_df.groupby("WEEKDAY")["DURATION (H)"].sum().reset_index()
praise_weekday_productivity = praise_df.groupby("WEEKDAY")["DURATION (H)"].sum().reset_index()
# Optional: Sort by actual weekday order
weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
pulse_weekday_productivity["WEEKDAY"] = pd.Categorical(pulse_weekday_productivity["WEEKDAY"], categories=weekday_order, ordered=True)
praise_weekday_productivity["WEEKDAY"] = pd.Categorical(praise_weekday_productivity["WEEKDAY"], categories=weekday_order, ordered=True)
pulse_weekday_productivity.sort_values("WEEKDAY", inplace=True)
praise_weekday_productivity.sort_values("WEEKDAY", inplace=True)
# Display the results to the user
print("Pulse Weekday Productivity")
display(pulse_weekday_productivity.head())
print("Praise Weekday Productivity")
display(praise_weekday_productivity.head())
Pulse Weekday Productivity
|   | WEEKDAY | DURATION (H) |
|---|---|---|
| 1 | Monday | 18.600000 |
| 5 | Tuesday | 1.100000 |
| 6 | Wednesday | 5.683333 |
| 4 | Thursday | 16.400000 |
| 0 | Friday | 10.150000 |
Praise Weekday Productivity
|   | WEEKDAY | DURATION (H) |
|---|---|---|
| 1 | Monday | 14.800000 |
| 5 | Tuesday | 6.416667 |
| 6 | Wednesday | 11.533333 |
| 4 | Thursday | 14.833333 |
| 0 | Friday | 11.466667 |
## 🔁 7. Task Flow Duration

**Goal:** Identify slow- vs. fast-moving stages.
**Method:** Compare average DURATION (H) per stage or per remark keyword.
# Import libs
import pandas as pd
# Load Excel files
pulse_df = pd.read_excel("./data/project_pulse_tracker.xlsx")
praise_df = pd.read_excel("./data/praise_story_tracker.xlsx")
# Calculate average duration per STAGE
pulse_stage_avg_duration = pulse_df.groupby("STAGE")["DURATION (H)"].mean().reset_index()
praise_stage_avg_duration = praise_df.groupby("STAGE")["DURATION (H)"].mean().reset_index()
# Rename for clarity
pulse_stage_avg_duration.columns = ["STAGE", "AVG DURATION (H)"]
praise_stage_avg_duration.columns = ["STAGE", "AVG DURATION (H)"]
# Display the results to the user
print("Pulse Stage Avg Duration")
display(pulse_stage_avg_duration.head())
print("Praise Stage Avg Duration")
display(praise_stage_avg_duration.head())
Pulse Stage Avg Duration
|   | STAGE | AVG DURATION (H) |
|---|---|---|
| 0 | Analysis | 0.283333 |
| 1 | Implementation | 1.472109 |
| 2 | Testing | 1.184722 |
Praise Stage Avg Duration
|   | STAGE | AVG DURATION (H) |
|---|---|---|
| 0 | Analysis | 0.433333 |
| 1 | Design | 3.325000 |
| 2 | Implementation | 2.037179 |
| 3 | Testing | 2.141667 |
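The tables in section 0 suggest COST ($) is derived from DURATION (H) at a flat rate (e.g. 0.5 h → $13.00, i.e. $26/h). That relationship is worth verifying per row before comparing cost patterns across projects; a sketch on synthetic rows mirroring the duration/cost pairs seen above:

```python
import pandas as pd

# Synthetic rows mirroring the duration/cost pairs in the trackers
df = pd.DataFrame({
    "DURATION (H)": [0.5, 1.25, 2.0],
    "COST ($)": [13.0, 32.5, 52.0],
})

# Implied hourly rate per entry; a single unique value confirms a flat rate
df["RATE ($/H)"] = df["COST ($)"] / df["DURATION (H)"]
print(df["RATE ($/H)"].unique())
```

If the rate column is constant, the cost analyses are just scaled duration analyses, and any divergence between the two projects' cost patterns must come from time alone.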
## 🧮 8. Project Comparison

**Goal:** Compare time and cost patterns between the "Pulse" and "Praise" projects.
**Method:** Generate side-by-side bar plots of total and average time and cost.
# Import libs
import pandas as pd
import matplotlib.pyplot as plt
# Load Excel files
pulse_df = pd.read_excel("./data/project_pulse_tracker.xlsx")
praise_df = pd.read_excel("./data/praise_story_tracker.xlsx")
# Compute total and average metrics for Pulse and Praise
summary_comparison = pd.DataFrame({
"Project": ["Pulse", "Praise"],
"Total Duration (H)": [pulse_df["DURATION (H)"].sum(), praise_df["DURATION (H)"].sum()],
"Total Cost ($)": [pulse_df["COST ($)"].sum(), praise_df["COST ($)"].sum()],
"Average Duration (H)": [pulse_df["DURATION (H)"].mean(), praise_df["DURATION (H)"].mean()],
"Average Cost ($)": [pulse_df["COST ($)"].mean(), praise_df["COST ($)"].mean()],
})
# Plotting side-by-side bar charts
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
# Total comparison
summary_comparison.set_index("Project")[["Total Duration (H)", "Total Cost ($)"]].plot.bar(ax=axes[0])
axes[0].set_title("Total Time and Cost Comparison")
axes[0].set_ylabel("Total")
# Average comparison
summary_comparison.set_index("Project")[["Average Duration (H)", "Average Cost ($)"]].plot.bar(ax=axes[1])
axes[1].set_title("Average Time and Cost per Task")
axes[1].set_ylabel("Average")
plt.tight_layout()
plt.show()