Warning: To view the interactive 3D visualisation, you must run location_entropy_analysis.ipynb locally and execute the Step 6 cells. The 3D Plotly output is not fully visible from the repository files alone.
This repository contains a notebook-based analysis of per-user location entropy on spatio-temporal mobility traces. The project computes entropy from time-weighted location probabilities, exports ranked user results, and generates explanatory visualizations, including an interactive 3D trajectory view.
The analysis addresses a mobility-entropy assignment: estimate each user’s location entropy,
E = -sum(p(i) * log2(p(i)))
where p(i) is the probability of the user being in location i, then interpret the results and suggest product ideas from the behavioral patterns in the data.
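As a minimal sketch of the formula above (not the notebook's own code; the function name `location_entropy` is hypothetical), entropy in bits can be computed from a list of location probabilities like this:

```python
import math


def location_entropy(probabilities):
    """Shannon entropy in bits: E = -sum(p(i) * log2(p(i)))."""
    # Locations with p == 0 are skipped: they contribute nothing to the sum,
    # and log2(0) is undefined.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


# Time split evenly across four locations gives log2(4) bits.
print(location_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0

# Half of the time in one place, a quarter in each of two others.
print(location_entropy([0.5, 0.25, 0.25]))  # 1.5
```

A user who spends all observed time in a single location has entropy 0; the more evenly time is spread over more locations, the higher the value.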
The main workflow lives in location_entropy_analysis.ipynb and is organized into six steps:
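The key modeling choice is to use time-weighted dwell share instead of raw GPS point counts. Each location's probability is the share of a user's observed time spent there, not the share of logged points, so a place where the user stays for a long time outweighs a place that merely produced many closely spaced samples.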
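The difference between dwell share and point counts can be illustrated with a small sketch. It is not taken from the notebook and rests on assumptions: each observation is a `(unix_timestamp, location_id)` pair, the time between two consecutive observations is credited to the earlier location, and intervals longer than a maximum gap are skipped. The function name `dwell_shares` and the 600-second default are hypothetical.

```python
from collections import defaultdict


def dwell_shares(observations, max_gap_seconds=600):
    """Return {location_id: share of observed time} for one user.

    Assumption: the interval between consecutive points belongs to the
    earlier point's location; intervals above max_gap_seconds are skipped.
    """
    ordered = sorted(observations, key=lambda obs: obs[0])
    seconds = defaultdict(float)
    for (t0, loc0), (t1, _next_loc) in zip(ordered, ordered[1:]):
        gap = t1 - t0
        if 0 < gap <= max_gap_seconds:
            seconds[loc0] += gap
    total = sum(seconds.values())
    if total == 0:
        return {}
    return {loc: s / total for loc, s in seconds.items()}


# Four of five points are at "A", but the last interval is too long to count.
obs = [(0, "A"), (60, "A"), (120, "B"), (180, "A"), (5000, "A")]
shares = dwell_shares(obs)
print(shares)
```

In this example raw point counts would assign location A a probability of 4/5, while the time-weighted share is 120 of 180 observed seconds, or 2/3.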
location_entropy_analysis.ipynb: main analysis notebook
outputs/stepwise_location_entropy_results.csv: exported ranked per-user results
question.md: original assignment brief

The notebook expects the mobility traces to be placed in:
cabspottingdata/
inside the project root, with files matching:
new_*.txt
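For orientation, here is a hedged sketch of how files matching this pattern could be read with the standard library. It assumes the layout commonly used for the cabspotting traces, one whitespace-separated record per line in the order latitude, longitude, occupancy flag, Unix timestamp; verify that against your copy of the data. The function name `load_user_traces` is hypothetical and this is not the notebook's loader.

```python
from pathlib import Path


def load_user_traces(data_dir):
    """Return {user_id: [(timestamp, lat, lon), ...]}, each list sorted by time.

    Assumed line format: "<lat> <lon> <occupied> <unix_timestamp>".
    """
    traces = {}
    for path in sorted(Path(data_dir).glob("new_*.txt")):
        rows = []
        for line in path.read_text().splitlines():
            parts = line.split()
            if len(parts) != 4:
                continue  # skip blank or malformed lines
            lat, lon, _occupied, timestamp = parts
            rows.append((int(timestamp), float(lat), float(lon)))
        rows.sort()
        # "new_abc.txt" -> user id "abc"
        traces[path.stem[len("new_"):]] = rows
    return traces
```

Calling `load_user_traces("cabspottingdata")` from the project root would then return one time-ordered list of points per trace file.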
The notebook uses:
DATA_DIR = PROJECT_ROOT / "cabspottingdata"
OUTPUT_CSV = PROJECT_ROOT / "outputs" / "stepwise_location_entropy_results.csv"

Use Python 3 with Jupyter Notebook or JupyterLab.
Install the main dependencies:
pip install pandas matplotlib plotly notebook
Place the trace files in cabspottingdata/.
Open location_entropy_analysis.ipynb in Jupyter or VS Code/Cursor.
Run the cells in order; the ranked results are written to outputs/stepwise_location_entropy_results.csv.

To see the interactive 3D trajectory visualisation, you must run the notebook locally and execute the Step 6 cells. The 3D view is produced inside the notebook with Plotly, so it will not be fully visible from the raw repository files alone.
The notebook produces:
The exported CSV includes:
entropy
normalized_entropy
num_locations
top_location_share
total_observed_seconds
transitions_used
transitions_skipped

The notebook shows that some users are highly routine, with most observed time concentrated in a small number of locations, while others spread their time across many locations and exhibit much higher entropy. The combination of entropy values, dominant-location share, density maps, and 3D trajectories helps explain not just how many places users visit, but how evenly their time is distributed across places and across the day.
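To make the relationship between some of these columns concrete, the sketch below derives them from a per-location total of observed seconds. The definitions are assumptions rather than the notebook's confirmed formulas: `normalized_entropy` is taken to be entropy divided by log2(num_locations), and `top_location_share` the largest single-location probability. The function name `summarize_user` is hypothetical.

```python
import math


def summarize_user(seconds_by_location):
    """Build one result row from {location_id: observed seconds}."""
    total = sum(seconds_by_location.values())
    if total <= 0:
        raise ValueError("no observed time for this user")
    probs = [s / total for s in seconds_by_location.values() if s > 0]
    entropy = sum(-p * math.log2(p) for p in probs)
    # Assumed normalization: divide by the maximum entropy for this many
    # locations, so the value lies between 0 and 1.
    max_entropy = math.log2(len(probs)) if len(probs) > 1 else 0.0
    return {
        "entropy": entropy,
        "normalized_entropy": entropy / max_entropy if max_entropy else 0.0,
        "num_locations": len(probs),
        "top_location_share": max(probs),
        "total_observed_seconds": total,
    }


row = summarize_user({"cell_a": 7200, "cell_b": 3600, "cell_c": 3600})
print(row)
```

Under these assumptions a routine user shows a high top_location_share and a normalized_entropy near 0, while a user who spreads time evenly approaches a normalized_entropy of 1.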
Set LIMIT_USERS = None to run the full dataset.
Adjust GRID_SIZE_DEGREES to make location cells coarser or finer.
Adjust MAX_GAP_SECONDS to control how long inactive intervals are counted.
Adjust TRAJECTORY_POINT_LIMIT if the trajectory plots look too dense or too sparse.
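The effect of GRID_SIZE_DEGREES can be seen in a small sketch that snaps coordinates to grid cells. This is one plausible way to discretize locations, not necessarily the notebook's; the helper `to_cell` and the value 0.01 are hypothetical.

```python
import math

GRID_SIZE_DEGREES = 0.01  # hypothetical value, for illustration only


def to_cell(lat, lon, grid_size=GRID_SIZE_DEGREES):
    """Map a coordinate to the integer (row, col) index of its grid cell."""
    return (math.floor(lat / grid_size), math.floor(lon / grid_size))


point_a = (37.7513, -122.3948)
point_b = (37.7519, -122.3941)

# With a coarse grid both points fall into the same location cell.
print(to_cell(*point_a) == to_cell(*point_b))  # True

# With a much finer grid they are treated as different locations.
print(to_cell(*point_a, 0.0005) == to_cell(*point_b, 0.0005))  # False
```

A finer grid splits the same movement into more distinct locations, which tends to raise num_locations and entropy, so results are comparable only across runs that use the same grid size.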