This report presents a structured overview of the 11
Fitabase/Bellabeat data files collected between
2016-03-12 and 2016-04-11.
Its goal is to document, for each file, its temporal granularity, data type, and key variables:
| File | Temporal granularity | Data type | Key variables |
|---|---|---|---|
| dailyActivity_merged.csv | Daily | Overall activity | Daily steps, calories, intensity, distance |
| heartrate_seconds_merged.csv | Second | Heart rate | bpm, timestamp |
| hourlyCalories_merged.csv | Hourly | Energy expenditure | Calories per hour |
| hourlyIntensities_merged.csv | Hourly | Activity intensity | Total, light, moderate, very active intensity |
| hourlySteps_merged.csv | Hourly | Walking activity | Steps per hour |
| minuteCaloriesNarrow_merged.csv | Minute | Fine-grained energy expenditure | Calories per minute |
| minuteIntensitiesNarrow_merged.csv | Minute | Activity intensity | Intensity per minute |
| minuteMETsNarrow_merged.csv | Minute | Expenditure / effort (METs) | METs |
| minuteSleep_merged.csv | Minute | Sleep (stages) | Sleep level, timestamp |
| minuteStepsNarrow_merged.csv | Minute | Walking activity | Steps per minute |
| weightLogInfo_merged.csv | Event / manual entry | Weight and BMI | Weight, BMI, is_manual |
This document serves as a technical reference to guide exploratory data analysis (EDA), data preparation, visualisation, and more advanced analytical work on the Bellabeat dataset.
The dailyActivity_merged.csv file provides daily activity measurements in wide format, with one row per user per date. The dataset is structurally complete, with no missing values, and is directly usable for EDA and for analysing overall trends.
Some inconsistencies are nonetheless possible: days with sedentary minutes equal to 1,440 (the whole day, which often suggests the tracker was not worn), very high active-minute counts, atypical active distances, and a strong concentration of zero values.
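The checks listed above can be turned into per-row quality flags. The sketch below is a minimal illustration using only the Python standard library; it assumes each row has already been parsed into the hypothetical `DailyRow` container, and that the minute columns are named VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, and SedentaryMinutes as in the usual Fitbit export (verify against the actual header).

```python
from dataclasses import dataclass
from typing import List

MINUTES_PER_DAY = 1440


@dataclass
class DailyRow:
    # Hypothetical container for a subset of dailyActivity_merged.csv columns
    id: str
    activity_date: str
    total_steps: int
    sedentary_minutes: int
    very_active_minutes: int
    fairly_active_minutes: int
    lightly_active_minutes: int


def flag_day(row: DailyRow) -> List[str]:
    """Return the quality flags raised for one user-day."""
    flags = []
    if row.sedentary_minutes >= MINUTES_PER_DAY:
        # A fully sedentary day often indicates that the tracker was not worn
        flags.append("full_day_sedentary")
    if row.total_steps == 0:
        flags.append("zero_steps")
    active = (row.very_active_minutes + row.fairly_active_minutes
              + row.lightly_active_minutes)
    if active + row.sedentary_minutes > MINUTES_PER_DAY:
        # Recorded minutes cannot exceed the length of a day
        flags.append("minutes_exceed_day")
    return flags
```

Flagged days can then be excluded or analysed separately rather than silently mixed with valid wear days.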
The file is in wide format: each user-day occupies a single row, with all daily measures stored side by side in columns.
Each row represents a given user (Id)
for a specific day (ActivityDate), and the
daily measurements are stored in columns such as
TotalSteps, TotalDistance, TrackerDistance, and Calories.
This file is central to answering key business and analytics questions:
- Understand overall physical activity.
- Study activity levels.
- Analyse calories burned.
- Detect temporal patterns: with ActivityDate, activity can be analysed over time.
The heartrate_seconds_merged.csv file has a long format: each row corresponds to a per-second measurement with a user identifier, a timestamp, and a heart rate value. The sample is limited to 14 users, who nevertheless generate more than half a million records.
The data are raw and the temporal coverage varies between users. Values are complete and plausible, but require integrity checks.
This dataset supports fine-grained analysis of heart rate variation, temporal aggregation, and cross-analysis with daily activity.
The file is in long format.
Each row represents a per-second heart rate measurement for a user:
- Id: user identifier (14 distinct users),
- Time: timestamp (date + time + second),
- Value: heart rate (36 to 185 bpm).

This generates a very large volume: 510,597 distinct timestamps.
This long format is suitable for detailed temporal tracking, computing aggregations (per minute / hour / day), and detecting activity or rest patterns.
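As an illustration of the per-minute aggregation mentioned above, the sketch below averages heart rate within each user-minute. It is a minimal standard-library example that assumes rows are (Id, Time, Value) string tuples read from the CSV, and that Time follows the format `%m/%d/%Y %I:%M:%S %p` (an assumption about the export to check against the file).

```python
from collections import defaultdict
from datetime import datetime

# Assumed timestamp format, e.g. "4/1/2016 7:54:10 AM"
TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def per_minute_mean(records):
    """records: iterable of (Id, Time, Value) tuples.

    Returns {(Id, minute start): mean bpm over that minute}."""
    sums = defaultdict(lambda: [0, 0])  # (Id, minute) -> [sum of bpm, count]
    for user_id, time_str, value in records:
        ts = datetime.strptime(time_str, TIME_FORMAT)
        minute = ts.replace(second=0)  # truncate to the start of the minute
        acc = sums[(user_id, minute)]
        acc[0] += int(value)
        acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}
```

The same pattern extends to hourly or daily means by truncating the timestamp further (e.g. `ts.replace(minute=0, second=0)`).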
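Main analytical uses:
- Understand daily physiology: heart rate variations make it possible to observe periods of activity and rest across the day.
- Complement activity data: combining this file with dailyActivity_merged.csv supports cross-analysis of heart rate and daily activity.
- Temporal analysis: the data can be aggregated per minute, hour, or day.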
The hourlyCalories_merged.csv file has a long format, where each row represents hourly calorie expenditure linked to a user identifier and a timestamp. The sample includes 34 users and more than 700 distinct hourly timestamps.
Hourly coverage varies from one user to another. Integrity checks
should focus on the uniqueness of the
Id–ActivityHour pair, value consistency,
temporal continuity, and correct parsing of timestamps.
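The integrity checks listed above can be sketched with the standard library alone. This minimal example assumes rows are (Id, ActivityHour) string pairs and that ActivityHour uses the format `%m/%d/%Y %I:%M:%S %p` (an assumption); it reports duplicated pairs, unparseable timestamps, and the number of missing hours between each user's first and last observation.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

HOUR_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # assumed, e.g. "3/12/2016 1:00:00 PM"


def check_hourly(rows):
    """rows: iterable of (Id, ActivityHour) string pairs.

    Returns (duplicated keys, unparseable timestamps, {Id: missing hours})."""
    keys = Counter()
    bad_timestamps = []
    hours_by_user = defaultdict(set)
    for user_id, hour_str in rows:
        keys[(user_id, hour_str)] += 1
        try:
            hours_by_user[user_id].add(datetime.strptime(hour_str, HOUR_FORMAT))
        except ValueError:
            bad_timestamps.append(hour_str)
    duplicates = [k for k, n in keys.items() if n > 1]
    gaps = {}
    for user_id, hours in hours_by_user.items():
        ordered = sorted(hours)
        # Hours expected between consecutive observations but absent from the data
        gaps[user_id] = sum(
            int((b - a) / timedelta(hours=1)) - 1 for a, b in zip(ordered, ordered[1:])
        )
    return duplicates, bad_timestamps, gaps
```

The same function applies unchanged to the other hourly files, since they share the Id–ActivityHour key.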
This dataset allows analysis of daily cycles, peaks in energy expenditure, and hourly behaviours, especially when combined with steps or heart rate.
The file is in long format.
Each row corresponds to:
- a user (Id),
- an hour (ActivityHour, stored as character, to be converted to datetime),
- the calories burned during that hour (Calories).
This long format is optimal for hourly temporal analysis: daily patterns, activity peaks, hourly behaviours, etc.
This data is particularly useful for:
- Analysing daily rhythms.
- Combining with other files:
  - with hourlySteps_merged.csv: steps (physical effort) vs calories (energy expenditure);
  - with heartrate_seconds_merged.csv: energy expenditure vs heart rate.
- Identifying behaviour patterns.
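Combining hourlySteps_merged.csv with hourlyCalories_merged.csv amounts to an inner join on the (Id, ActivityHour) key. The sketch below assumes both files have already been loaded into dictionaries keyed by that pair (a hypothetical in-memory layout) and computes a plain Pearson correlation between steps and calories.

```python
import math


def join_hourly(steps, calories):
    """steps, calories: dicts mapping (Id, ActivityHour) -> value.

    Inner join: keeps only user-hours present in both files."""
    return [(steps[k], calories[k]) for k in steps.keys() & calories.keys()]


def pearson(pairs):
    """Pearson correlation of (x, y) pairs; needs non-zero variance in x and y."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)
```

An inner join is used so that hours missing from either file do not bias the comparison; the number of dropped hours is itself a useful coverage indicator.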
The hourlyIntensities_merged.csv file has a long format, with each record linking a user identifier, an hourly timestamp, and two aggregated intensity measures.
The dataset is complete, with no missing values, and is based on minute-by-minute scoring aggregated at the hour level. Values are concentrated around low intensity levels, reflecting mainly sedentary behaviour with occasional higher-intensity episodes.
Recommended checks include the uniqueness of
Id–ActivityHour pairs, consistency between
TotalIntensity and AverageIntensity, temporal
continuity, and identification of outliers.
The file allows analysis of daily rhythms, detection of sedentary or active behaviour, and correlation with steps, calories, or heart rate.
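The file is in long format.
Each row represents:
- a user (Id, 34 people),
- an hour (ActivityHour, stored as character, to be converted to datetime),
- the total intensity for that hour (TotalIntensity),
- the average intensity for that hour (AverageIntensity).

Column profiles show:
- TotalIntensity ranges from 0 to 180,
- AverageIntensity ranges from 0 to 3.

The ratio 180 / 3 = 60 indicates that Fitbit likely encodes intensity minute by minute on a 0–3 scale and then aggregates over the 60 minutes of the hour: TotalIntensity is the sum of the minute scores, and AverageIntensity = TotalIntensity / 60.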
This file therefore provides an aggregated measure of hourly physical effort.
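If this interpretation holds, every row should satisfy AverageIntensity = TotalIntensity / 60 (for example, TotalIntensity = 20 gives 20 / 60 ≈ 0.333) and TotalIntensity should never exceed 60 × 3 = 180. The sketch below checks one row; the tolerance `tol` is an assumption to absorb rounding in the stored averages.

```python
MINUTES_PER_HOUR = 60
MAX_MINUTE_INTENSITY = 3  # assumed 0-3 minute-level scale, as inferred above


def intensity_consistent(total_intensity: int, average_intensity: float,
                         tol: float = 1e-3) -> bool:
    """Check AverageIntensity ~= TotalIntensity / 60 and TotalIntensity in 0-180."""
    if not 0 <= total_intensity <= MINUTES_PER_HOUR * MAX_MINUTE_INTENSITY:
        return False
    return abs(total_intensity / MINUTES_PER_HOUR - average_intensity) <= tol
```

Rows failing this check are candidates for the outlier review recommended above.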
This file is central for hourly behavioural analysis:
- Understand daily activity patterns.
- Identify sedentary behaviour: AverageIntensity = 0 means that every minute of the hour scored 0, i.e. strong sedentariness.
- Cross with:
  - hourlySteps_merged.csv → intensity vs number of steps,
  - hourlyCalories_merged.csv → intensity vs energy expenditure,
  - heartrate_seconds_merged.csv → intensity vs heart rate (physiological correlation).
- User segmentation and profile construction.
- Support for circadian-rhythm analysis.
The hourlySteps_merged.csv file has a long format, where each record corresponds to a user identifier, an hourly timestamp, and a step count. The dataset is complete, with no missing values, and its coverage is similar to the other hourly files. The distribution is heavily concentrated on low step volumes, with a few high-activity peaks.
Recommended integrity checks include the uniqueness of
Id–ActivityHour pairs, continuity of time
series, identification of outliers, and consistency with calories and
intensity.
This file is useful for analysing hourly routines, detecting sedentary behaviour, and segmenting users, especially when combined with intensity, calorie, or heart-rate data.
The file is in long format.
Each row corresponds to:
- a user (Id),
- an hour (ActivityHour, stored as text),
- the number of steps taken during that hour (StepTotal).

This structure (one row = one user-hour) is well suited to analysing daily behaviour.
This file is central to understanding hourly behaviour:
- Analyse daily routines.
- User segmentation.
- Cross with:
  - hourlyIntensities_merged.csv → determine whether steps correspond to light or vigorous activity,
  - hourlyCalories_merged.csv → calories burned by hour as a function of steps,
  - dailyActivity_merged.csv → rebuild the daily total from hourly data.
- Preparation of clear visualisations.
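Rebuilding daily totals from hourly data and comparing them with dailyActivity_merged.csv can be sketched as follows. This is a minimal standard-library example; the timestamp formats `%m/%d/%Y %I:%M:%S %p` (ActivityHour) and `%m/%d/%Y` (ActivityDate) are assumptions about the export, and rows are assumed to be already read from the CSV files.

```python
from collections import defaultdict
from datetime import datetime

HOUR_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # assumed ActivityHour format
DATE_FORMAT = "%m/%d/%Y"              # assumed ActivityDate format


def daily_from_hourly(hourly_rows):
    """hourly_rows: iterable of (Id, ActivityHour, StepTotal). Returns {(Id, date): steps}."""
    totals = defaultdict(int)
    for user_id, hour_str, steps in hourly_rows:
        day = datetime.strptime(hour_str, HOUR_FORMAT).date()
        totals[(user_id, day)] += int(steps)
    return dict(totals)


def compare_with_daily(rebuilt, daily_rows, tolerance=0):
    """daily_rows: iterable of (Id, ActivityDate, TotalSteps).

    Returns the user-days whose rebuilt total differs from TotalSteps by more than tolerance."""
    mismatches = []
    for user_id, date_str, total_steps in daily_rows:
        day = datetime.strptime(date_str, DATE_FORMAT).date()
        rebuilt_steps = rebuilt.get((user_id, day), 0)
        if abs(rebuilt_steps - int(total_steps)) > tolerance:
            mismatches.append((user_id, day, rebuilt_steps, int(total_steps)))
    return mismatches
```

A non-zero `tolerance` can be used if small discrepancies between the two files turn out to be systematic.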
The minuteCaloriesNarrow_merged.csv file contains minute-by-minute data for 34 users in long format, covering around 45,300 distinct minute-level timestamps. Each row links a user identifier, a timestamp, and a calorie estimate. Values are complete, with no missing data.
Calories range from 0 to 23 kcal/min, with a median of 1.22. Extreme values may indicate intense activity.
The long format facilitates temporal analysis: minute-level time series, hourly or daily aggregations, peak detection, and comparison between users. Aggregation improves reliability, because minute-level measurements are noisy.
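Aggregating minute-level calories to the hour is a matter of truncating each timestamp to the start of its hour and summing. The sketch below assumes rows are (Id, timestamp, Calories) string tuples and the timestamp format `%m/%d/%Y %I:%M:%S %p` (an assumption); the result can be compared with hourlyCalories_merged.csv.

```python
from collections import defaultdict
from datetime import datetime

MINUTE_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # assumed timestamp format of the minute files


def minute_to_hourly(rows):
    """rows: iterable of (Id, timestamp string, Calories). Returns {(Id, hour start): kcal}."""
    hourly = defaultdict(float)
    for user_id, ts_str, calories in rows:
        ts = datetime.strptime(ts_str, MINUTE_FORMAT)
        hour_start = ts.replace(minute=0, second=0)  # truncate to the hour
        hourly[(user_id, hour_start)] += float(calories)
    return dict(hourly)
```

Replacing `ts.replace(minute=0, second=0)` with `ts.date()` gives daily totals for comparison with dailyActivity_merged.csv.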
Checks such as uniqueness of the Id + timestamp pair,
temporal continuity, and examination of extreme values are required.
When combined with minuteIntensitiesNarrow,
minuteStepsNarrow, minuteMETsNarrow, or the
hourly/daily files, this dataset supports a comprehensive analysis of
behaviour and energy expenditure.
The file is in long (narrow) format: one row = one minute of activity for a user.
It contains three columns: Id, a minute-level timestamp, and Calories. The file covers 34 users over about one month, with ~45,300 timestamped minutes.
The Calories column represents the estimated
energy expenditure for each minute as computed by Fitbit.
The file supports:
- minute-by-minute effort analysis,
- aggregation to higher levels (hourlyCalories_merged.csv, dailyActivity_merged.csv),
- fine-grained behavioural analysis.
The minuteIntensitiesNarrow_merged.csv file provides minute-by-minute intensity measurements for 34 users, in long format, with no missing values. Each row links a user identifier, a timestamp, and an intensity score.
The scale has four levels (0 to 3). The distribution is highly unbalanced and dominated by zero intensities, which calls for checks on temporal continuity and the consistency of active minutes.
The dataset is clean and well structured. It enables analysis of
activity transitions, circadian patterns, and active minutes,
particularly via daily or hourly aggregation. When combined with
minuteCaloriesNarrow, minuteMETsNarrow, or
dailyActivity_merged.csv, it becomes useful for
understanding behaviour and identifying active or sedentary
sequences.
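Active minutes and activity transitions can be derived from a user's time-ordered intensity sequence. The sketch below treats any minute with Intensity ≥ 1 as active (an assumption; a stricter threshold such as ≥ 2 could be used for moderate-to-vigorous activity) and uses `itertools.groupby` to collapse consecutive minutes with the same state.

```python
from itertools import groupby


def active_summary(intensities):
    """intensities: minute-level Intensity values (0-3) for one user, in time order.

    Returns (active minutes, number of active bouts, number of state transitions)."""
    active_flags = [level >= 1 for level in intensities]
    active_minutes = sum(active_flags)
    runs = [key for key, _ in groupby(active_flags)]  # one entry per run of equal state
    active_bouts = sum(runs)  # runs whose state is active
    transitions = max(len(runs) - 1, 0)
    return active_minutes, active_bouts, transitions
```

Applied per user and per day, this yields compact indicators of fragmentation alongside the raw active-minute count.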
The file is in long (narrow) format, i.e. one row = one minute of measurement for a user.
The file contains no missing values, like the other minute-level files.
The Intensity variable follows Fitbit’s minute-level
scale:
| Intensity | Meaning |
|---|---|
| 0 | Sedentary / resting |
| 1 | Light activity |
| 2 | Moderate activity |
| 3 | Vigorous activity |
This is a derived score, based on movement.
The minuteIntensitiesNarrow_merged.csv file is one of
the most informative for fine-grained temporal analysis. It can be used to:
- identify minute-level behaviour (in combination with minuteSleep_merged.csv),
- detect activity patterns,
- build derived indicators,
- support cross-analysis:
  - minuteCaloriesNarrow_merged.csv → intensity ↔ energy expenditure,
  - minuteMETsNarrow_merged.csv → intensity ↔ metabolic equivalent,
  - dailyActivity_merged.csv → daily consolidation,
  - hourlyIntensities_merged.csv → validation of hourly aggregation.

The minuteMETsNarrow_merged.csv file groups minute-by-minute measurements for 34 users, with no
missing values and three columns (Id, timestamp,
METs). The data cover about 45,300 minutes and contain 141
distinct MET values.
MET (Metabolic Equivalent of Task) is usually a physiological unit: 1 MET corresponds to energy expenditure at rest, and activities are expressed as multiples of it (roughly 3 to 6 METs for moderate activity and above 6 METs for vigorous activity).
The statistics show a highly skewed distribution, with a median of 10 and extreme values up to 189. High METs mainly reflect activity peaks, but some levels are physiologically impossible.
The long format supports detailed temporal analysis: detection of minute-by-minute changes, construction of daily profiles, aggregation into MET-minutes, and identification of active episodes. Limitations include possible overestimation and inconsistency with standard MET definitions.
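Aggregation into MET-minutes and flagging of implausible values can be sketched as below. Because each row covers one minute, MET-minutes per user-day are simply the sum of the METs values for that day. The example uses the raw values as stored (so any scaling issue in the file carries over to the totals), assumes the timestamp format `%m/%d/%Y %I:%M:%S %p`, and takes the plausibility threshold of 20 METs from the discussion that follows.

```python
from collections import defaultdict
from datetime import datetime

MINUTE_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # assumed timestamp format
PLAUSIBLE_MAX_MET = 20  # above 20 METs is considered physiologically unlikely


def met_minutes(rows):
    """rows: iterable of (Id, timestamp string, METs), one row per minute.

    Returns ({(Id, date): MET-minutes}, list of implausible rows)."""
    totals = defaultdict(float)
    implausible = []
    for user_id, ts_str, mets in rows:
        value = float(mets)
        day = datetime.strptime(ts_str, MINUTE_FORMAT).date()
        totals[(user_id, day)] += value  # MET x 1 minute
        if value > PLAUSIBLE_MAX_MET:
            implausible.append((user_id, ts_str, value))
    return dict(totals), implausible
```

The list of implausible rows makes it easy to quantify how much of each user's total is driven by suspect values.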
The file is in long format, with minute-level granularity.
Each row represents the METs (Metabolic Equivalent of
Task) value estimated by Fitbit for one minute of activity for a
user.
Regarding the reported statistics (median 10, maximum 189), a MET value above 20 is already physiologically unlikely, and a MET of 189 is impossible.
The file is clean, complete, and covers 34 users over one month (~45k minutes).
The minuteSleep_merged.csv file is distinctive within the Bellabeat/Fitbit dataset
because it contains fine-grained, minute-level sleep data. It has a long
format: one row per minute and per user. It
contains four columns with no missing values: Id (23 users
only), date (minute-level timestamp to convert),
value (sleep state coded as 1 = asleep, 2 = restless, 3 =
awake), and logId (sleep session/night identifier).
Data are organised into sleep sessions via logId, which
allows reconstructing each night and tracking, minute by minute,
transitions between sleep, restlessness, and wakefulness. They are
suitable for fine-grained temporal analysis of internal sleep structure,
fragmentation, and sleep–wake cycles.
The file supports the computation of key indicators (total sleep duration, awake/restless time, number of awakenings, sleep efficiency, bed/awake times) and the study of night-time behaviour and its variability.
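Several of these indicators can be computed per sleep session (logId). The sketch below assumes rows are (logId, value) pairs already sorted by time within each session, and uses these working definitions (assumptions, not Fitbit's official ones): time in bed is the number of recorded minutes in the session, an awakening is any switch into state 3 from another state, and efficiency is minutes asleep divided by time in bed. Bed and wake times would additionally require the `date` column and are omitted here.

```python
from collections import defaultdict

ASLEEP, RESTLESS, AWAKE = 1, 2, 3  # coding of the `value` column


def sleep_indicators(rows):
    """rows: iterable of (logId, value) in time order, one row per minute.

    Returns {logId: dict of indicators} for each sleep session."""
    sessions = defaultdict(list)
    for log_id, value in rows:
        sessions[log_id].append(int(value))
    result = {}
    for log_id, states in sessions.items():
        asleep = states.count(ASLEEP)
        # Count each switch into AWAKE from a non-awake state
        awakenings = sum(
            1 for prev, cur in zip(states, states[1:]) if cur == AWAKE and prev != AWAKE
        )
        result[log_id] = {
            "minutes_in_bed": len(states),
            "minutes_asleep": asleep,
            "minutes_restless": states.count(RESTLESS),
            "minutes_awake": states.count(AWAKE),
            "awakenings": awakenings,
            "efficiency": asleep / len(states),
        }
    return result
```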
Combining it with activity files (steps, intensities, calories) enables analyses of sleep ↔︎ physical activity.
The file is in long format, like all minute-by-minute files.
Only 23 users have sleep data (fewer than in the activity and step files).
Important variables:
- Id: user identifier,
- date: minute-level timestamp (to convert to POSIXct),
- value: sleep state (1 = asleep, 2 = restless, 3 = awake),
- logId: sleep session identifier (used to group nights).
This file enables analysis of the internal structure of sleep, minute by minute.
This file is rich for understanding sleep quality. It can be used to:
- analyse sleep structure,
- compute key indicators,
- study behaviours.
The minuteSleep_merged.csv file is well suited to
exploratory analysis, because its minute-level granularity allows
observation of nocturnal cycles, awakenings, and phases of restless or
awake sleep, and supports analysis of the quality and regularity of
users’ sleep.
The minuteStepsNarrow_merged.csv file contains about 1.4 million rows structured minute by minute for 34 users, with no duplicates and no missing values.
Each record links a timestamp and a step count, with a strongly skewed distribution: the median is zero and high values are relatively rare, confirming that most minutes are inactive. Temporal tracking is continuous and coherent, which allows reconstruction of activity episodes.
Minutes with high step counts are plausible but should be cross-checked with intensity, calories, or METs to validate consistency.
The file is particularly well suited to studying circadian rhythms, active episodes, daily routines, and minute-level cadence, as well as to building hourly or daily aggregations.
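Active episodes can be reconstructed as runs of consecutive minutes with steps. In the sketch below, `min_steps` and `min_length` are hypothetical parameters (defaults of 1 step and 5 minutes are arbitrary choices), rows are assumed to be (timestamp, Steps) pairs for one user sorted by time, and the timestamp format `%m/%d/%Y %I:%M:%S %p` is an assumption. A gap in the time series also ends an episode.

```python
from datetime import datetime, timedelta

MINUTE_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # assumed timestamp format
ONE_MINUTE = timedelta(minutes=1)


def active_episodes(rows, min_steps=1, min_length=5):
    """rows: (timestamp string, Steps) for ONE user, sorted by time.

    Returns (start, end, total steps) for each run of consecutive minutes with
    Steps >= min_steps lasting at least min_length minutes."""
    episodes = []
    current = []  # (timestamp, steps) of the episode being built

    def close():
        if len(current) >= min_length:
            episodes.append((current[0][0], current[-1][0], sum(s for _, s in current)))
        current.clear()

    for ts_str, steps in rows:
        ts = datetime.strptime(ts_str, MINUTE_FORMAT)
        steps = int(steps)
        if steps < min_steps:
            close()  # an inactive minute ends any running episode
            continue
        if current and ts - current[-1][0] != ONE_MINUTE:
            close()  # a gap in the time series also ends the episode
        current.append((ts, steps))
    close()  # flush the final episode
    return episodes
```

Dividing an episode's total steps by its duration gives its average cadence in steps per minute.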
The file is in long (narrow) format: each row represents one minute of activity for a user, with three columns (Id, timestamp, Steps).
The number of users (34) matches the hourly and daily files, which supports multi-file consistency.
Summary statistics for steps:
| Variable | Min | Max | Mean | Median |
|---|---|---|---|---|
| Steps | 0 | 204 | 4.77 | 0 |
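With a median of 0 and a mean of 4.77, most minutes record no steps, while a minority of minutes reach high counts (up to 204). The distribution is therefore highly right-skewed, a typical pattern for minute-level physical-activity data.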
The minuteStepsNarrow_merged.csv file is one of the
richest and most informative in the Fitbit/Bellabeat dataset and a key file for exploring minute-by-minute user behaviour. It should be combined with intensity, calories, METs, and sleep data to generate robust insights.
The weightLogInfo_merged.csv file contains spot weight measurements, one per row, with associated values such as BMI and body-fat percentage. The dataset looks clean but is very sparse: only a handful of users have recorded data, and the measurement frequency is too low to extract reliable trends.
Most columns are empty or self-reported, which considerably limits analytical reliability.
The analytical value is marginal: the sample is too small, temporal variability is almost nonexistent, and the measure depends on manual input, which is often imprecise.
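The sparsity described above can be quantified with two simple figures: the number of entries per user and the share of manually entered measurements. The sketch assumes rows are (Id, IsManualReport) pairs; IsManualReport is the assumed raw column name behind the is_manual variable listed in the overview table, with values stored as "True"/"False".

```python
from collections import Counter


def weight_log_summary(rows):
    """rows: iterable of (Id, IsManualReport) pairs.

    Returns ({Id: number of entries}, share of manual entries)."""
    per_user = Counter()
    manual = 0
    total = 0
    for user_id, is_manual in rows:
        per_user[user_id] += 1
        total += 1
        # Accept both booleans and "True"/"False" strings
        if str(is_manual).strip().lower() == "true":
            manual += 1
    share_manual = manual / total if total else 0.0
    return dict(per_user), share_manual
```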
The weightLogInfo_merged.csv file contains weight
measurements entered by users.
Format: wide. Each row = one weight measurement.
All descriptive variables are stored in columns, which is appropriate for simple descriptive analysis.
This is the least populated and least exploitable file in the dataset, with sparse or self-reported columns (Fat, BMI, etc.).
Timestamps are available and, in theory, could be used to analyse weight change over time, but only if enough data were present (which is not the case here).
Practical usefulness is very limited in the Bellabeat Coursera context.
In practice: