location_entropy

Warning: To view the interactive 3D visualisation, you must run location_entropy_analysis.ipynb locally and execute the Step 6 cells. The 3D Plotly output is not fully visible from the repository files alone.

View more at Website

Location Entropy Analysis

This repository contains a notebook-based analysis of per-user location entropy on spatio-temporal mobility traces. The project computes entropy from time-weighted location probabilities, exports ranked user results, and generates explanatory visualizations, including an interactive 3D trajectory view.

Project Goal

The analysis addresses a mobility-entropy assignment: estimate each user’s location entropy,

E = -sum(p(i) * log2(p(i)))

where p(i) is the probability of the user being in location i, then interpret the results and suggest product ideas from the behavioral patterns in the data.
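As a minimal sketch of the formula above (not the notebook's actual code), the function below computes E in bits from a list of location probabilities. The name `shannon_entropy` and the sample probability lists are illustrative assumptions.

```python
import math

def shannon_entropy(probabilities):
    """Return -sum(p * log2(p)) in bits; zero-probability locations contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Hypothetical users: one spends 90% of the time in a single location,
# the other splits time evenly across four locations.
routine_user = [0.9, 0.05, 0.05]
roaming_user = [0.25, 0.25, 0.25, 0.25]

print(round(shannon_entropy(routine_user), 3))  # roughly 0.569 bits
print(shannon_entropy(roaming_user))            # 2.0 bits
```

A user who splits time evenly across n locations reaches the maximum of log2(n) bits, while a user who never leaves one location has an entropy of 0.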

What The Notebook Does

The main workflow lives in location_entropy_analysis.ipynb and is organized into six steps:

  1. Explore the raw trace schema.
  2. Load and normalize user traces.
  3. Assign latitude/longitude points to discrete grid cells.
  4. Compute time-weighted location probabilities.
  5. Calculate per-user Shannon entropy and export ranked results.
  6. Generate visualizations for interpretation.

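The key modeling choice is to weight each location by the share of observed time the user spent there (dwell share) rather than by raw GPS point counts. A location then contributes to p(i) in proportion to time spent in it, not to how many points happened to be logged there, so the entropy metric better reflects actual mobility behavior.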
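Steps 3 and 4 could be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's implementation: the 0.01-degree cell size, the rule that each time gap is credited to the cell of the earlier point, and the names `grid_cell` and `dwell_probabilities` are all hypothetical.

```python
import math
from collections import defaultdict

def grid_cell(lat, lon, cell_deg=0.01):
    """Snap a coordinate to an integer grid-cell index (cell_deg is an assumed size)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def dwell_probabilities(points, cell_deg=0.01):
    """points: list of (timestamp_seconds, lat, lon), sorted by time.

    Assumption: the time between two consecutive points is credited
    to the grid cell of the earlier point.
    """
    dwell = defaultdict(float)
    for (t0, lat, lon), (t1, _, _) in zip(points, points[1:]):
        dwell[grid_cell(lat, lon, cell_deg)] += t1 - t0
    total = sum(dwell.values())
    if total == 0:
        return {}
    return {cell: seconds / total for cell, seconds in dwell.items()}

# Hypothetical trace: three points logged close together in one cell,
# a long stay there, then a short stay in a second cell.
points = [
    (0, 37.775, -122.419),
    (60, 37.775, -122.419),
    (120, 37.775, -122.419),
    (3600, 37.805, -122.405),
    (4000, 37.805, -122.405),
]
print(dwell_probabilities(points))
```

In this trace the first cell receives 3600 of the 4000 observed seconds, so its probability is 0.9, whereas counting points would give it only 3 of 5. A real pipeline might also cap very long gaps between points (for example when a device was switched off); that is a design decision this sketch does not make.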

Repository Contents

Data Setup

The notebook expects the mobility traces to be placed in:

cabspottingdata/

inside the project root, with files matching:

new_*.txt
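A loader for this layout might look like the sketch below. It assumes each line holds four whitespace-separated fields in the order latitude, longitude, occupancy flag, Unix timestamp, which is how the public cabspotting traces are commonly distributed. Check this against the schema shown in Step 1 of the notebook before relying on it. The function name `load_traces` is hypothetical.

```python
from pathlib import Path

def load_traces(data_dir="cabspottingdata"):
    """Return {user_id: [(timestamp, lat, lon), ...]} sorted by time.

    Assumed line format: "<lat> <lon> <occupancy> <unix_time>".
    """
    traces = {}
    for path in sorted(Path(data_dir).glob("new_*.txt")):
        user_id = path.stem[len("new_"):]  # "new_abc.txt" -> "abc"
        rows = []
        for line in path.read_text().splitlines():
            parts = line.split()
            if len(parts) != 4:
                continue  # skip blank or malformed lines
            lat, lon, _occupancy, timestamp = parts
            rows.append((int(timestamp), float(lat), float(lon)))
        traces[user_id] = sorted(rows)
    return traces
```

Sorting by timestamp matters because the time-weighted step depends on the gaps between consecutive points.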

The notebook uses:

Environment

Use Python 3 with Jupyter Notebook or JupyterLab.

Install the main dependencies:

pip install pandas matplotlib plotly notebook

How To Run

  1. Place the trace files in cabspottingdata/.
  2. Open location_entropy_analysis.ipynb in Jupyter or VS Code/Cursor.
  3. Run the notebook from top to bottom.
  4. Check the exported file at outputs/stepwise_location_entropy_results.csv.

Important Note On The 3D Visualisation

To see the interactive 3D trajectory visualisation, you must run the notebook locally and execute the Step 6 cells. The 3D view is produced inside the notebook with Plotly, so it will not be fully visible from the raw repository files alone.

Outputs

The notebook produces:

Main Metrics

The exported CSV includes:

Findings Summary

The notebook shows that some users are highly routine, with most observed time concentrated in a small number of locations, while others spread their time across many locations and exhibit much higher entropy. The combination of entropy values, dominant-location share, density maps, and 3D trajectories helps explain not just how many places users visit, but how evenly their time is distributed across places and across the day.
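To illustrate why the number of places visited and the evenness of time spent are different signals, the sketch below compares two hypothetical users who both visit four locations. The `summarize` helper and its field names are assumptions for illustration and are not the columns of the exported CSV.

```python
import math

def summarize(probabilities):
    """Summarize one user's location distribution (illustrative fields only)."""
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return {
        "locations": sum(1 for p in probabilities if p > 0),
        "dominant_share": max(probabilities),
        "entropy_bits": round(entropy, 3),
    }

# Both users visit four locations, but distribute their time very differently.
print(summarize([0.97, 0.01, 0.01, 0.01]))
print(summarize([0.25, 0.25, 0.25, 0.25]))
```

Both users have the same location count, yet the first has a dominant share of 0.97 and an entropy of roughly 0.24 bits, while the second has a dominant share of 0.25 and the maximum 2 bits for four locations.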

Example Product Ideas

Notes