Introduction

This report presents a structured overview of the 11 Fitabase/Bellabeat data files collected between 2016-03-12 and 2016-04-11.
Its goal is to document, for each file, a general summary, the data structure, and the analytical value.

Overview of the 11 Bellabeat data files

| File | Temporal granularity | Data type | Key variables |
|---|---|---|---|
| dailyActivity_merged.csv | Daily | Overall activity | Daily steps, calories, intensity, distance |
| heartrate_seconds_merged.csv | Second | Heart rate | bpm, timestamp |
| hourlyCalories_merged.csv | Hourly | Energy expenditure | Calories per hour |
| hourlyIntensities_merged.csv | Hourly | Activity intensity | Total intensity, average intensity |
| hourlySteps_merged.csv | Hourly | Walking activity | Steps per hour |
| minuteCaloriesNarrow_merged.csv | Minute | Fine-grained energy expenditure | Calories per minute |
| minuteIntensitiesNarrow_merged.csv | Minute | Activity intensity | Intensity per minute |
| minuteMETsNarrow_merged.csv | Minute | Expenditure / effort (METs) | METs |
| minuteSleep_merged.csv | Minute | Sleep (stages) | Sleep level, timestamp |
| minuteStepsNarrow_merged.csv | Minute | Steps per minute | Steps per minute |
| weightLogInfo_merged.csv | Event / manual entry | Weight and BMI | Weight, BMI, is_manual |

This document serves as a technical reference to guide exploratory data analysis (EDA), data preparation, visualisation, and more advanced analytical work on the Bellabeat dataset.

1 File: dailyActivity_merged.csv

→ Link to profiling report

1.1 General summary

This file provides daily activity measurements in wide format, with one row per user and per date. The dataset is structurally complete, with no missing values. It is directly usable for EDA and for analysing overall trends.

Several patterns should be checked before analysis: days with sedentary minutes equal to 1,440 (the full 24 hours, which suggests the device was not worn), very high active minutes, atypical active distances, and a strong concentration of zero values.
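The checks listed above can be sketched as a simple row filter. This is a minimal illustration on invented rows, not the profiling report's method: the column names SedentaryMinutes and VeryActiveMinutes, the sample values, and the 180-minute threshold are assumptions.

```python
import csv
import io

# Hypothetical excerpt. Column names other than Id, ActivityDate and
# TotalSteps are assumptions about the file header.
SAMPLE = """Id,ActivityDate,TotalSteps,SedentaryMinutes,VeryActiveMinutes
1,4/12/2016,13162,728,25
1,4/13/2016,0,1440,0
2,4/12/2016,0,1440,0
2,4/13/2016,9500,700,210
"""


def flag_rows(rows, max_very_active=180):
    """Return (Id, ActivityDate, reasons) for rows that deserve a closer look."""
    flagged = []
    for row in rows:
        reasons = []
        if int(row["SedentaryMinutes"]) == 1440:
            reasons.append("sedentary for the full 24 h")
        if int(row["TotalSteps"]) == 0:
            reasons.append("zero steps")
        if int(row["VeryActiveMinutes"]) > max_very_active:  # arbitrary threshold
            reasons.append("very high active minutes")
        if reasons:
            flagged.append((row["Id"], row["ActivityDate"], reasons))
    return flagged


flagged = flag_rows(csv.DictReader(io.StringIO(SAMPLE)))
for user_id, day, reasons in flagged:
    print(user_id, day, "; ".join(reasons))
```

Flagged rows are candidates for review rather than automatic deletion; a day with 1,440 sedentary minutes and zero steps most plausibly means the tracker was not worn.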

1.2 Data structure

The file is in wide format: there is a single row per person and per day, and every daily attribute is stored in its own column.

Each row represents a given user (Id) for a specific day (ActivityDate), and all daily measurements are stored in columns:

  • TotalSteps, TotalDistance, TrackerDistance
  • Distances by activity level
  • Active / sedentary minutes
  • Calories

1.3 Analytical value of the file

This file is central to answering key business and analytics questions:

Understand overall physical activity

  • Number of steps, daily distances, overall intensity.

Study activity levels

  • Minutes spent in each activity zone → useful to segment behaviour.

Analyse calories burned

  • Allows linking energy expenditure ↔︎ activity.

Detect temporal patterns

With ActivityDate, it is possible to analyse:

  • Weekly evolution,
  • Possible seasonality,
  • Behaviour differences on weekdays vs weekends.
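As an illustration of the weekday vs weekend comparison, the following sketch averages daily steps by day type. The records are invented, and the M/D/YYYY date format is an assumption about how ActivityDate is stored.

```python
from datetime import datetime
from statistics import mean

# Hypothetical (ActivityDate, TotalSteps) pairs.
records = [
    ("4/15/2016", 9000),   # Friday
    ("4/16/2016", 4000),   # Saturday
    ("4/17/2016", 6000),   # Sunday
    ("4/18/2016", 11000),  # Monday
]


def weekday_weekend_means(records, fmt="%m/%d/%Y"):
    """Average steps separately for weekdays (Mon-Fri) and weekends (Sat-Sun)."""
    groups = {"weekday": [], "weekend": []}
    for date_text, steps in records:
        day = datetime.strptime(date_text, fmt)
        key = "weekend" if day.weekday() >= 5 else "weekday"
        groups[key].append(steps)
    return {key: mean(values) for key, values in groups.items() if values}


means = weekday_weekend_means(records)
print(means)
```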


2 File: heartrate_seconds_merged.csv

→ Link to profiling report

2.1 General summary

The file has a long format: each row corresponds to a per-second measurement with a user identifier, a timestamp, and a heart rate value. The sample is limited to fourteen users, generating more than half a million records.

The data are raw and the temporal coverage varies between users. Values are complete and plausible, but require integrity checks.

This dataset supports fine-grained analysis of heart rate variation, temporal aggregation, and cross-analysis with daily activity.

2.2 Data structure

The file is in long format.

Each row represents a per-second heart rate measurement for a user:

  • Id: user identifier (14 distinct users),
  • Time: timestamp (date + time + second),
  • Value: heart rate (36 to 185 bpm).

This generates a very large volume: 510,597 distinct timestamps.

This long format is suitable for detailed temporal tracking, computing aggregations (per minute / hour / day), and detecting activity or rest patterns.
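A per-minute aggregation of the per-second readings might look like the sketch below. The rows are invented and the timestamp format is an assumption; the same bucketing idea extends to hours or days by truncating the timestamp further.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical (Id, Time, Value) rows; the timestamp format is assumed.
rows = [
    ("1", "4/12/2016 7:21:00 AM", 97),
    ("1", "4/12/2016 7:21:05 AM", 102),
    ("1", "4/12/2016 7:21:10 AM", 105),
    ("1", "4/12/2016 7:22:00 AM", 95),
]


def minute_means(rows, fmt="%m/%d/%Y %I:%M:%S %p"):
    """Average heart rate per (user, minute) bucket."""
    buckets = defaultdict(list)
    for user_id, time_text, bpm in rows:
        minute = datetime.strptime(time_text, fmt).replace(second=0)
        buckets[(user_id, minute)].append(bpm)
    return {key: mean(values) for key, values in sorted(buckets.items())}


per_minute = minute_means(rows)
for (user_id, minute), bpm in per_minute.items():
    print(user_id, minute.isoformat(), round(bpm, 1))
```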

2.3 Analytical value of the file

Understand daily physiology

Heart rate variations make it possible to observe:

  • Sleep–wake cycles,
  • Activity peaks,
  • Post-exercise recovery,
  • Rest periods.

Essential complement to activity data

By combining this file with dailyActivity_merged.csv, one can analyse:

  • The relationship between activity intensity and heart rate,
  • Physiological consistency of measured activity levels.

Temporal analysis

The dataset allows:

  • Minute, hourly and daily aggregations,
  • Construction of time series,
  • Circadian analysis (daily biological rhythms).


3 File: hourlyCalories_merged.csv

→ Link to profiling report

3.1 General summary

The file has a long format, where each row represents hourly calorie expenditure linked to a user identifier and a timestamp. The sample includes 34 users and more than 700 distinct hourly timestamps.

Hourly coverage varies from one user to another. Integrity checks should focus on the uniqueness of the Id + ActivityHour pair, value consistency, temporal continuity, and correct parsing of timestamps.
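These integrity checks can be sketched in a few lines: parse the timestamps, look for repeated (Id, ActivityHour) keys, and report gaps longer than one hour. The rows and the timestamp format are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

FMT = "%m/%d/%Y %I:%M:%S %p"  # assumed timestamp format

# Hypothetical (Id, ActivityHour, Calories) rows.
rows = [
    ("1", "4/12/2016 12:00:00 AM", 81),
    ("1", "4/12/2016 1:00:00 AM", 61),
    ("1", "4/12/2016 1:00:00 AM", 61),  # duplicate key
    ("1", "4/12/2016 4:00:00 AM", 59),  # 2:00 and 3:00 are missing
]


def check_hourly(rows, fmt=FMT):
    """Return duplicated (Id, hour) keys and per-user gaps longer than one hour."""
    parsed = [(user_id, datetime.strptime(text, fmt)) for user_id, text, _ in rows]
    counts = Counter(parsed)
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    keys = sorted(counts)
    gaps = [
        (id_a, t_a, t_b)
        for (id_a, t_a), (id_b, t_b) in zip(keys, keys[1:])
        if id_a == id_b and t_b - t_a > timedelta(hours=1)
    ]
    return duplicates, gaps


duplicates, gaps = check_hourly(rows)
print("duplicates:", duplicates)
print("gaps:", gaps)
```

A timestamp that does not match the assumed format raises ValueError in strptime, which doubles as the parsing check. The same function applies to the other hourly files, since they share the (Id, ActivityHour) key.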

This dataset allows analysis of daily cycles, peaks in energy expenditure, and hourly behaviours, especially when combined with steps or heart rate.

3.2 Data structure

The file is in long format.

Each row corresponds to:

  • A user (Id),
  • A specific hour (ActivityHour, character to be converted to datetime),
  • The calories burned during that hour (Calories).

Key characteristics:

  • 34 distinct users,
  • 755 distinct hourly timestamps,
  • 444 distinct calorie values.

This long format is optimal for hourly temporal analysis: daily patterns, activity peaks, hourly behaviours, etc.

3.3 Analytical value of the file

This data is particularly useful for:

Analysing daily rhythms

  • Peaks in calorie expenditure,
  • Activity habits by hour,
  • Comparison between weekdays and weekends.

Combining with other files

With hourlySteps_merged.csv:

  • Steps → physical effort,
  • Calories → energy expenditure.

With heartrate_seconds_merged.csv:

  • Cross-analyse heart rate and hourly calories,
  • Detect correlations between heart rate and calories burned.

Identifying behaviour patterns

  • Sedentary vs active users,
  • Distribution of activity load throughout the day,
  • Creation of energy-expenditure profiles (grouping users by their daily activity level).


4 File: hourlyIntensities_merged.csv

→ Link to profiling report

4.1 General summary

The file has a long format, with each record linking a user identifier, an hourly timestamp, and two aggregated intensity measures.

The dataset is complete, with no missing values, and is based on minute-by-minute scoring aggregated at the hour level. Values are concentrated around low intensity levels, reflecting mainly sedentary behaviour with occasional higher-intensity episodes.

Recommended checks include the uniqueness of Id + ActivityHour pairs, consistency between TotalIntensity and AverageIntensity, temporal continuity, and identification of outliers.

The file allows analysis of daily rhythms, detection of sedentary or active behaviour, and correlation with steps, calories, or heart rate.

4.2 Data structure

The file is in long format.

Each row represents:

  • A user (Id, 34 people),
  • A specific hour (ActivityHour, character to be converted to datetime),
  • The total activity intensity during that hour (TotalIntensity),
  • The average per-minute intensity (AverageIntensity).

Column profiles show:

  • TotalIntensity ranges from 0 to 180,
  • AverageIntensity ranges from 0 to 3.

The ratio 180 / 3 = 60 indicates that Fitbit encodes intensity minute by minute, likely on a 0–3 scale, then aggregates over 60 minutes.

This file therefore provides an aggregated measure of hourly physical effort.
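The aggregation inferred above (60 minute-level scores on a 0–3 scale, summed into TotalIntensity and averaged into AverageIntensity) can be written out as follows. The function name and sample values are illustrative, and the rule itself is the report's inference rather than documented Fitbit behaviour.

```python
def aggregate_hour(minute_intensities):
    """Aggregate 60 minute-level scores (0-3) into (total, average) for one hour."""
    if len(minute_intensities) != 60:
        raise ValueError("expected exactly 60 minute-level values")
    if any(value not in (0, 1, 2, 3) for value in minute_intensities):
        raise ValueError("intensity scores must be 0, 1, 2 or 3")
    total = sum(minute_intensities)
    return total, total / 60


# Hypothetical hour: 45 sedentary minutes, 10 light, 5 vigorous.
minutes = [0] * 45 + [1] * 10 + [3] * 5
total, average = aggregate_hour(minutes)
print(total, round(average, 4))

# Upper bound: an hour spent entirely at the top level.
print(aggregate_hour([3] * 60))
```

The second call reproduces the observed maxima (180 and 3), which is what makes the 60-minute interpretation plausible. The same relation, AverageIntensity ≈ TotalIntensity / 60, can be used as a row-level consistency check.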

4.3 Analytical value of the file

This file is central for hourly behavioural analysis:

Understand daily activity patterns

  • Most active hours,
  • Periods of low activity,
  • Morning and evening routines.

Identify sedentary behaviour

  • Many hours with AverageIntensity = 0 → strong sedentariness.

Cross with:

  • hourlySteps_merged.csv → intensity vs number of steps,
  • hourlyCalories_merged.csv → intensity vs energy expenditure,
  • heartrate_seconds_merged.csv → intensity vs heart rate (physiological correlation).

User segmentation

Profile construction:

  • Highly active users: frequent intense hours,
  • Moderately active users: intermittent activity,
  • Sedentary users: almost zero intensity.

Support for circadian-rhythm analysis

Identify:

  • Energy peaks,
  • Rest periods,
  • Indirect sleep–wake patterns (very low intensity at night).


5 File: hourlySteps_merged.csv

→ Link to profiling report

5.1 General summary

The file has a long format, where each record corresponds to a user identifier, an hourly timestamp, and a step count. The dataset is complete, with no missing values, and its coverage is similar to the other hourly files. The distribution is heavily concentrated on low step volumes, with a few high-activity peaks.

Recommended integrity checks include the uniqueness of Id + ActivityHour pairs, continuity of time series, identification of outliers, and consistency with calories and intensity.

This file is useful for analysing hourly routines, detecting sedentary behaviour, and segmenting users, especially when combined with intensity, calorie, or heart-rate data.

5.2 Data structure

The file is in long format.

Each row corresponds to:

  • An Id (user),
  • A specific hour (ActivityHour, stored as text),
  • The number of steps performed during that hour (StepTotal).

Key characteristics from the report:

  • 34 users, as in the other hourly files,
  • 755 distinct hourly timestamps,
  • No missing values.

This regular structure (“one row = one hour for one user”) is well suited to analysing daily behaviour.

5.3 Analytical value of the file

This file is central to understanding hourly behaviour.

Analyse daily routines

  • Morning or evening activity peaks,
  • Walking during lunch breaks,
  • Extended periods of sedentariness.

User segmentation

  • Highly active users,
  • Moderately active users,
  • Sedentary users.

Cross with:

  • hourlyIntensities_merged.csv → determine whether steps correspond to light or vigorous activity,
  • hourlyCalories_merged.csv → calories burned by hour as a function of steps,
  • dailyActivity_merged.csv → rebuild the daily total from hourly data.
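Rebuilding the daily total from hourly data, the last cross-check listed above, could be sketched as follows. All rows are invented and both date formats are assumptions.

```python
from collections import defaultdict
from datetime import datetime

HOUR_FMT = "%m/%d/%Y %I:%M:%S %p"  # assumed format of ActivityHour
DAY_FMT = "%m/%d/%Y"               # assumed format of ActivityDate

# Hypothetical rows: (Id, ActivityHour, StepTotal) and (Id, ActivityDate, TotalSteps).
hourly_rows = [
    ("1", "4/12/2016 8:00:00 AM", 500),
    ("1", "4/12/2016 9:00:00 AM", 1200),
    ("1", "4/13/2016 8:00:00 AM", 300),
]
daily_rows = [("1", "4/12/2016", 1700), ("1", "4/13/2016", 350)]


def daily_from_hourly(hourly_rows, fmt=HOUR_FMT):
    """Sum hourly steps per (Id, ISO date)."""
    totals = defaultdict(int)
    for user_id, text, steps in hourly_rows:
        day = datetime.strptime(text, fmt).date().isoformat()
        totals[(user_id, day)] += steps
    return dict(totals)


def compare(rebuilt, daily_rows, fmt=DAY_FMT):
    """List (Id, date, rebuilt total, reported total) where the two disagree."""
    mismatches = []
    for user_id, text, reported in daily_rows:
        day = datetime.strptime(text, fmt).date().isoformat()
        total = rebuilt.get((user_id, day), 0)
        if total != reported:
            mismatches.append((user_id, day, total, reported))
    return mismatches


rebuilt = daily_from_hourly(hourly_rows)
mismatches = compare(rebuilt, daily_rows)
print(mismatches)
```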

Preparation for strong visualisations

  • Weekly heatmaps (days × hours),
  • Hourly trend charts,
  • Full circadian analysis.
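The data behind a weekly heatmap is a 7 × 24 grid of average steps. The sketch below builds that grid from invented rows with an assumed timestamp format, and leaves the plotting to whatever library is preferred.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%m/%d/%Y %I:%M:%S %p"  # assumed timestamp format

# Hypothetical (ActivityHour, StepTotal) rows.
rows = [
    ("4/12/2016 8:00:00 AM", 400),   # Tuesday
    ("4/19/2016 8:00:00 AM", 600),   # Tuesday, one week later
    ("4/16/2016 10:00:00 PM", 50),   # Saturday
]


def weekday_hour_grid(rows, fmt=FMT):
    """Return a 7 x 24 grid (row 0 = Monday) of mean steps; None where no data."""
    cells = defaultdict(list)
    for text, steps in rows:
        moment = datetime.strptime(text, fmt)
        cells[(moment.weekday(), moment.hour)].append(steps)
    grid = [[None] * 24 for _ in range(7)]
    for (weekday, hour), values in cells.items():
        grid[weekday][hour] = sum(values) / len(values)
    return grid


grid = weekday_hour_grid(rows)
print(grid[1][8], grid[5][22])
```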


6 File: minuteCaloriesNarrow_merged.csv

→ Link to profiling report

6.1 General summary

The file contains minute-by-minute data for 34 users in long format, covering around 45,300 distinct minute-level timestamps. Each row links a user identifier, a timestamp, and a calorie estimate. Values are complete, with no missing data.

Calories range from 0 to 23 kcal/min, with a median of 1.22. Extreme values may indicate intense activity.

The long format facilitates temporal analysis: minute-level time series, hourly or daily aggregations, peak detection, and comparison between users. Aggregation improves reliability, because minute-level measurements are noisy.

Checks such as uniqueness of the Id + timestamp pair, temporal continuity, and examination of extreme values are required.

When combined with minuteIntensitiesNarrow, minuteStepsNarrow, minuteMETsNarrow, or the hourly/daily files, this dataset supports a comprehensive analysis of behaviour and energy expenditure.

6.2 Data structure

The file is in long (narrow) format: one row = one minute of activity for a user.

It contains three columns:

  • Id,
  • Minute-level timestamp,
  • Calories.

The file covers 34 users over about one month, with ~45,300 timestamped minutes.

The Calories column represents the estimated energy expenditure for each minute as computed by Fitbit.

Statistics from the report:

  • Min = 0,
  • Max = 23.01,
  • Mean = 1.57,
  • Median = 1.22.

Interpretation:

  • Completely inactive minute → around 0–1 kcal,
  • Moderate activity → around 2–5 kcal/min,
  • Intense activity → around 6–10 kcal/min,
  • The maximum of 23 kcal/min suggests very intense effort (to be interpreted with caution).

6.3 Analytical value of the file

Minute-by-minute effort analysis

  • Detect activity peaks,
  • Identify periods of intense effort,
  • Analyse day–night cycles.

Aggregation to higher levels

  • Calories per hour (check against hourlyCalories_merged.csv),
  • Calories per day to validate dailyActivity_merged.csv.
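The hourly cross-check can be sketched by summing minute-level calories per hour and comparing the result with the hourly file within a tolerance. The rows, the timestamp format, and the 1 kcal tolerance (chosen to absorb rounding) are assumptions.

```python
import math
from collections import defaultdict
from datetime import datetime

FMT = "%m/%d/%Y %I:%M:%S %p"  # assumed timestamp format

# Hypothetical rows: (Id, minute timestamp, kcal) and (Id, ActivityHour, Calories).
minute_rows = [
    ("1", "4/12/2016 8:00:00 AM", 1.2),
    ("1", "4/12/2016 8:01:00 AM", 3.4),
    ("1", "4/12/2016 9:00:00 AM", 2.0),
]
hourly_rows = [
    ("1", "4/12/2016 8:00:00 AM", 5),
    ("1", "4/12/2016 9:00:00 AM", 9),
]


def hourly_sums(minute_rows, fmt=FMT):
    """Sum minute-level calories per (Id, hour)."""
    sums = defaultdict(float)
    for user_id, text, kcal in minute_rows:
        hour = datetime.strptime(text, fmt).replace(minute=0, second=0)
        sums[(user_id, hour)] += kcal
    return dict(sums)


def disagreements(sums, hourly_rows, fmt=FMT, tolerance=1.0):
    """List hourly rows whose reported value differs from the rebuilt sum."""
    result = []
    for user_id, text, reported in hourly_rows:
        rebuilt = sums.get((user_id, datetime.strptime(text, fmt)))
        if rebuilt is None or not math.isclose(rebuilt, reported, abs_tol=tolerance):
            result.append((user_id, text, rebuilt, reported))
    return result


sums = hourly_sums(minute_rows)
problems = disagreements(sums, hourly_rows)
print(problems)
```

Summing per day instead of per hour gives the equivalent check against dailyActivity_merged.csv.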

Fine-grained behavioural analysis

  • Daily routine (active vs inactive time),
  • Lifestyle comparison between users,
  • Construction of derived indicators (daily active calories, training load, etc.).


7 File: minuteIntensitiesNarrow_merged.csv

→ Link to profiling report

7.1 General summary

This file provides minute-by-minute intensity measurements for 34 users, in long format, with no missing values. Each row links a user identifier, a timestamp, and an intensity score.

The scale has four levels (0 to 3). The distribution is highly unbalanced and dominated by zero intensities, which calls for checks on temporal continuity and the consistency of active minutes.

The dataset is clean and well structured. It enables analysis of activity transitions, circadian patterns, and active minutes, particularly via daily or hourly aggregation. When combined with minuteCaloriesNarrow, minuteMETsNarrow, or dailyActivity_merged.csv, it becomes useful for understanding behaviour and identifying active or sedentary sequences.

7.2 Data structure

The file is in long (narrow) format, i.e. one row = one minute of measurement for a user.

The file contains no missing values, like the other minute-level files.

The Intensity variable follows Fitbit’s minute-level scale:

| Intensity | Meaning |
|---|---|
| 0 | Sedentary / resting |
| 1 | Light activity |
| 2 | Moderate activity |
| 3 | Vigorous activity |

This is a derived score, based on movement.

7.3 Analytical value of the file

The minuteIntensitiesNarrow_merged.csv file is one of the most informative for fine-grained temporal analysis.

Identify minute-level behaviour

  • Transitions from rest to activity,
  • Sporadic vs continuous activity,
  • Night-time agitation (potential correlations with minuteSleep_merged.csv).

Detect activity patterns

  • High-resolution circadian analysis,
  • Minute-by-minute heatmaps,
  • Detection of intensity peaks.

Build derived indicators

  • Total active minutes per day,
  • Ratio of active vs sedentary minutes,
  • Duration of continuous active episodes.
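The three derived indicators can be computed from one user's chronologically ordered intensity scores, as in this sketch. Treating any score above 0 as "active" is an assumption, and the sample sequence is invented.

```python
from itertools import groupby


def daily_indicators(intensities):
    """Compute active minutes, active/sedentary ratio and the longest active run."""
    active = sum(1 for value in intensities if value > 0)
    sedentary = len(intensities) - active
    longest_episode = max(
        (
            sum(1 for _ in run)
            for is_active, run in groupby(intensities, key=lambda value: value > 0)
            if is_active
        ),
        default=0,
    )
    return {
        "active_minutes": active,
        "sedentary_minutes": sedentary,
        "active_to_sedentary": active / sedentary if sedentary else None,
        "longest_active_episode": longest_episode,
    }


# Hypothetical 12-minute excerpt of one user's day.
sample = [0, 0, 1, 1, 2, 0, 3, 3, 3, 3, 0, 0]
indicators = daily_indicators(sample)
print(indicators)
```

The run-length logic assumes there are no missing minutes in the sequence, so the temporal continuity check should be run first.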

Cross-analysis

  • With minuteCaloriesNarrow_merged.csv → intensity ↔︎ energy expenditure,
  • With minuteMETsNarrow_merged.csv → intensity ↔︎ metabolic equivalent,
  • With dailyActivity_merged.csv → daily consolidation,
  • With hourlyIntensities_merged.csv → validation of hourly aggregation.


8 File: minuteMETsNarrow_merged.csv

→ Link to profiling report

8.1 General summary

The file groups minute-by-minute measurements for 34 users, with no missing values and three columns (Id, timestamp, METs). The data cover about 45,300 minutes and contain 141 distinct MET values.

MET (Metabolic Equivalent of Task) is a physiological unit of energy expenditure, commonly read as follows:

  • 1 MET = resting metabolic rate,
  • 3 METs = moderate walking,
  • 6 METs = light running,
  • > 10 METs = vigorous activity.

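The statistics show a highly skewed distribution, with a median of 10 and extreme values up to 189. High values mainly reflect activity peaks, but read at face value these levels are physiologically impossible, which points to a scaling factor: the Fitabase data dictionary states that exported MET values are multiplied by 10, so the median corresponds to about 1.0 MET (rest) and the maximum to 18.9 METs.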

The long format supports detailed temporal analysis: detection of minute-by-minute changes, construction of daily profiles, aggregation into MET-minutes, and identification of active episodes. Limitations include possible overestimation and inconsistency with standard MET definitions.

8.2 Data structure

The file is in long format, with minute-level granularity.

Each row represents the METs (Metabolic Equivalent of Task) value estimated by Fitbit for one minute of activity for a user.

Statistics from the report:

  • Min = 0,
  • Max = 189,
  • Mean = 14.23,
  • Median = 10.

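Taken at face value, a MET value above 20 is physiologically unlikely and a MET of 189 is impossible. The column should therefore be divided by 10 before interpretation, in line with the Fitabase data dictionary, which notes that exported METs are multiplied by 10.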

The file is clean, complete, and covers 34 users over one month (~45k minutes).

8.3 Analytical value of the file

  • Compute effort zones (rest, light, moderate, vigorous) from METs (after filtering or capping extreme values).
  • Compare users or days using a standardised physiological indicator (METs).
  • Validate consistency between METs, intensity, steps, and calories.
  • Build indicators such as daily MET-minutes (sum of METs over a day), useful for physical-activity recommendations.
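Effort zones and daily MET-minutes could be derived as in the sketch below. It assumes the stored values are METs multiplied by 10 (as the Fitabase data dictionary describes; pass scale=1 if that does not apply), and the zone cut points (1.5, 3 and 6 METs) are conventional thresholds chosen for illustration, not values taken from the profiling report.

```python
def met_summary(stored_values, scale=10):
    """Classify one day of minute-level values into zones and sum MET-minutes."""
    zones = {"rest": 0, "light": 0, "moderate": 0, "vigorous": 0}
    met_minutes = 0.0
    for stored in stored_values:
        mets = stored / scale          # assumed rescaling to true METs
        met_minutes += mets            # one value per minute
        if mets <= 1.5:                # assumed cut points
            zones["rest"] += 1
        elif mets < 3:
            zones["light"] += 1
        elif mets < 6:
            zones["moderate"] += 1
        else:
            zones["vigorous"] += 1
    return zones, met_minutes


# Hypothetical stored values for six minutes, including the observed maximum of 189.
zones, met_minutes = met_summary([10, 10, 25, 40, 70, 189])
print(zones, round(met_minutes, 1))
```

Here MET-minutes follow the definition used above (the sum of METs over the minutes of a day); some guidelines count only minutes at or above 3 METs, which would require an extra filter.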


9 File: minuteSleep_merged.csv

→ Link to profiling report

9.1 General summary

This file stands apart within the Bellabeat/Fitbit dataset because it holds minute-level sleep data. It has a long format: one row per minute and per user. It contains four columns with no missing values: Id (23 users only), date (minute-level timestamp to convert), value (sleep state coded as 1 = asleep, 2 = restless, 3 = awake), and logId (sleep session/night identifier).

Data are organised into sleep sessions via logId, which allows reconstructing each night and tracking, minute by minute, transitions between sleep, restlessness, and wakefulness. They are suitable for fine-grained temporal analysis of internal sleep structure, fragmentation, and sleep–wake cycles.

The file supports the computation of key indicators (total sleep duration, awake/restless time, number of awakenings, sleep efficiency, bed/awake times) and the study of night-time behaviour and its variability.

Combining it with activity files (steps, intensities, calories) enables analyses of sleep ↔︎ physical activity.

9.2 Data structure

The file is in long format, like all minute-by-minute files.

  • One row = one minute of sleep measurement for a user.

Only 23 users have sleep data (fewer than in the activity and step files).

Important variables:

  • Id: user identifier,
  • date: minute-level timestamp (to convert to POSIXct),
  • value: sleep state,
    • 1 → Asleep,
    • 2 → Restless,
    • 3 → Awake,
  • logId: sleep session identifier (used to group nights).

This file enables analysis of the internal structure of sleep, minute by minute.

9.3 Analytical value of the file

This file is rich for understanding sleep quality.

Analyse sleep structure

  • Periods of “asleep”, “restless”, and “awake” minute by minute,
  • Sleep fragmentation,
  • Sleep–wake cycles.

Compute key indicators

  • Total sleep duration,
  • Awake / restless duration,
  • Number of nocturnal awakenings,
  • Sleep efficiency (%),
  • Bedtime and wake-up time.
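Several of these indicators can be derived per sleep session (logId), as in the following sketch. The rows are invented, and sleep efficiency is taken here as minutes asleep divided by total recorded minutes in the session, which is one common definition and an assumption about what the report intends.

```python
from collections import defaultdict

ASLEEP, RESTLESS, AWAKE = 1, 2, 3  # coding of the value column


def sleep_indicators(rows):
    """rows: (logId, value) pairs in chronological order within each session."""
    sessions = defaultdict(list)
    for log_id, value in rows:
        sessions[log_id].append(value)
    result = {}
    for log_id, values in sessions.items():
        asleep = values.count(ASLEEP)
        # An awakening is a transition from a non-awake minute to an awake one.
        awakenings = sum(
            1 for prev, cur in zip(values, values[1:]) if prev != AWAKE and cur == AWAKE
        )
        result[log_id] = {
            "minutes_recorded": len(values),
            "minutes_asleep": asleep,
            "minutes_restless": values.count(RESTLESS),
            "minutes_awake": values.count(AWAKE),
            "awakenings": awakenings,
            "efficiency_pct": 100 * asleep / len(values),
        }
    return result


# Hypothetical ten-minute session.
rows = [(100, v) for v in [1, 1, 2, 1, 3, 3, 1, 1, 3, 1]]
indicators = sleep_indicators(rows)
print(indicators[100])
```

Bedtime and wake-up time would follow from the first and last timestamp of each logId once the date column is parsed.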

Study behaviours

  • Bedtime and wake-up habits,
  • Night-to-night variability,
  • Impact of physical activity on sleep quality (possible correlations with steps/intensities/calories).

The minuteSleep_merged.csv file is well suited to exploratory analysis: its minute-level granularity makes nocturnal cycles, awakenings, and restless or awake phases observable, and supports analysis of the quality and regularity of users’ sleep.


10 File: minuteStepsNarrow_merged.csv

→ Link to profiling report

10.1 General summary

The file contains about 1.4 million rows structured minute by minute for 34 users, with no duplicates and no missing values.

Each record links a timestamp and a step count, with a strongly skewed distribution: the median is zero and high values are relatively rare, confirming that most minutes are inactive. Temporal tracking is continuous and coherent, which allows reconstruction of activity episodes.

Minutes with high step counts are plausible but should be cross-checked with intensity, calories, or METs to validate consistency.

The file is particularly well suited to studying circadian rhythms, active episodes, daily routines, and minute-level cadence, as well as to building hourly or daily aggregations.

10.2 Data structure

Each row represents one minute of activity for a user.

The number of users is consistent: 34, matching the hourly and daily files → multi-file consistency.

Long / narrow format

  • Number of observations: 1,445,040,
  • Around 45,300 distinct timestamps (one per minute over about one month),
  • Number of columns: 3 (Id, timestamp, Steps),
  • 0 duplicates,
  • 0 missing values in all columns.

Summary statistics for steps:

| Variable | Min | Max | Mean | Median |
|---|---|---|---|---|
| Steps | 0 | 204 | 4.77 | 0 |

Interpretation:

  • Min = 0 → fully inactive minutes,
  • Max = 204 → ~200 steps/min, a cadence closer to fast running than to brisk walking,
  • Mean = 4.77 steps/min → users were sedentary overall,
  • Median = 0 → at least half of the recorded minutes have no steps at all.

The distribution is therefore highly right-skewed, a typical pattern for minute-level physical-activity data.
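The skew described here can be quantified with a few summary figures. The sketch uses an invented ten-minute sample rather than the real file.

```python
from statistics import mean, median


def skew_summary(steps):
    """Summarise a right-skewed, zero-heavy series of per-minute step counts."""
    return {
        "zero_share": steps.count(0) / len(steps),
        "mean": mean(steps),
        "median": median(steps),
        "max": max(steps),
    }


# Hypothetical sample: mostly inactive minutes with a short burst of activity.
sample = [0, 0, 0, 0, 0, 0, 0, 12, 35, 110]
summary = skew_summary(sample)
print(summary)
```

A mean well above a median of zero, as in this sample and in the reported statistics, is the signature of a right-skewed distribution, which is why medians or zero shares are often more informative than means at this granularity.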

10.3 Analytical value of the file

The minuteStepsNarrow_merged.csv file is one of the richest and most informative in the Fitbit/Bellabeat dataset.

Key strengths

  • Minute-level granularity,
  • Complete dataset (0 NAs, 0 duplicates),
  • Temporal coherence,
  • High relevance for fine-grained behavioural analysis,
  • Ideal support for visualisations (heatmaps, time series).

It is a key file for exploring minute-by-minute user behaviour. It should be combined with intensity, calories, METs, and sleep to generate robust insights.


11 File: weightLogInfo_merged.csv

→ Link to profiling report

11.1 General summary

The file contains spot weight measurements, one per row, with associated values such as BMI and body-fat percentage. The dataset looks clean but is very sparse: only a handful of users have recorded data, and the measurement frequency is too low to extract reliable trends.

Most columns are empty or self-reported, which considerably limits analytical reliability.

The analytical value is marginal: the sample is too small, temporal variability is almost nonexistent, and the measure depends on manual input, which is often imprecise.

11.2 Data structure

The weightLogInfo_merged.csv file contains weight measurements entered by users.

Format: wide. Each row = one weight measurement.

All descriptive variables are stored in columns, which is appropriate for simple descriptive analysis.

This is the least populated and least exploitable file in the dataset:

  • Very few records overall (fewer than 70),
  • Only about 8 users have recorded their weight,
  • Many columns are empty or rarely filled (Fat, BMI, etc.).

Timestamps are available and, in theory, could be used to analyse weight change over time, but only if enough data were present (which is not the case here).

11.3 Analytical value of the file

Practical usefulness is very limited in the context of the Bellabeat (Coursera) case study:

  • Occasional BMI evaluation,
  • Possible segmentation of users by weight category, if the data were more complete,
  • Very rough correlation between weight and physical-activity level.

In practice:

  • Too few data points,
  • Too few users,
  • Values often self-reported,
  • No exploitable time series.
