The aim of this project is to evaluate an approach that accurately recognizes a range of user activities and reports the duration of each one. For that purpose, the tri-axial accelerometer and GPS sensors available in all modern smartphones are used to classify four activities: resting, walking, running and driving a car. Time domain features are extracted from the GPS (the user’s average speed) and from the tri-axial accelerometer (means and standard deviations). The raw accelerometer data is cleaned up with a Butterworth low-pass filter, and a Fast Fourier Transform (FFT) is then applied to extract frequency domain features on each axis. Finally, the unsupervised k-means clustering algorithm is used to recognize and classify activities from the time and frequency domain features. Cluster centroids are persisted so that learning carries over between sessions and the recognition accuracy keeps improving.
This study considers four user activities: resting, walking, running and driving a car. These activities are common to all users. They are also not “transient”: each one lasts for a long period of time, which makes the subsequent clustering easier.
The sensor data was gathered on an HTC Windows Phone 8S by an application developed with the Windows Phone SDK 8.0. Sensor data is gathered during a 6s measurement window. The accelerometer sampling period is set to 35ms by default, which gives up to 171 measures per axis per measurement window (6s / 35ms). GPS latitude and longitude are measured every second during the measurement window, but only if the user has moved by more than 10m within that period.
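The text does not say how the average speed is derived from the GPS fixes. As an illustration only, the Python sketch below assumes that the distance between consecutive fixes is computed with the haversine formula and divided by the 6s window length. The function names and the Earth radius constant are hypothetical and do not come from the project.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius (approximation, assumed here)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def average_speed_kmh(fixes, window_s=6.0):
    """Average speed over one measurement window.

    fixes: ordered list of (latitude, longitude) pairs recorded in the window.
    With fewer than two fixes (user did not move enough), the speed is 0.
    """
    distance_m = sum(haversine_m(*a, *b) for a, b in zip(fixes, fixes[1:]))
    return distance_m / window_s * 3.6  # m/s to km/h
```

A window containing a single fix yields 0km/h, which matches the case where the user did not move by more than 10m.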
We used the following namespaces to read the sensor data:
To remove the high frequency noise that affects the accelerometer axis measurements in real conditions, we implemented a Butterworth low-pass filter with a cutoff frequency set to 100Hz. In addition, the walking and running activities generate a periodic pattern on the accelerometer axis data within a frequency range of 2Hz to 6Hz, while resting and driving a car do not produce any periodic pattern on the accelerometer axes. The frequency of this periodic pattern cannot be measured in the time domain, hence the use of a Fast Fourier Transform (FFT) applied to the raw (or filtered) accelerometer data to extract it.
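The filter order and implementation used in the project are not given. The Python sketch below is one possible form, assuming a second-order Butterworth low-pass section derived with the bilinear transform. A digital filter requires a cutoff below half the sampling rate, so the example uses a placeholder cutoff of 8Hz with the 35ms sampling period; that value is an assumption for illustration, not a setting taken from the application.

```python
import math


def butterworth_lowpass(samples, cutoff_hz, sample_rate_hz):
    """Second-order Butterworth low-pass filter (bilinear transform).

    Assumed design for illustration; the project's actual filter may differ.
    """
    if not 0 < cutoff_hz < sample_rate_hz / 2:
        raise ValueError("cutoff must lie between 0 and half the sampling rate")
    k = math.tan(math.pi * cutoff_hz / sample_rate_hz)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    # Feed-forward (b) and feedback (a) coefficients of the biquad section.
    b0 = k * k * norm
    b1 = 2.0 * b0
    b2 = b0
    a1 = 2.0 * (k * k - 1.0) * norm
    a2 = (1.0 - math.sqrt(2.0) * k + k * k) * norm

    x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs
    filtered = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        filtered.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return filtered


# Example: filter one axis sampled every 35 ms with a placeholder 8 Hz cutoff.
sample_rate_hz = 1 / 0.035
smoothed = butterworth_lowpass([0.0, 0.3, 1.1, 0.9, 0.2], 8.0, sample_rate_hz)
```

The filter has unit gain at 0Hz, so the mean of each axis is preserved once the initial transient has settled.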
We describe in this part the time domain and frequency domain features extracted from the raw data and made available to the classification algorithm once the 6s measurement window is over.
The average and standard deviation are computed over the 171 measures gathered during the measurement window. The frequency step of the FFT on the tri-axial accelerometer data is the accelerometer sampling rate divided by the number of samples in the FFT result array (which is a power of 2). For instance, with a 35ms sampling period for the accelerometer sensor, we get 171 measures once the 6s measurement window is over, hence 256 values in the FFT result array. The frequency step is therefore (1/35ms)/256 = 0.112Hz. The FFT result array spans frequencies up to the accelerometer sampling rate, 28.57Hz, but for real-valued input only its first half, up to the Nyquist frequency of 14.29Hz, carries independent information. So, estimating the frequency of the walking/running periodic pattern at 10Hz at most on each axis, the sampling rate has to be at least 20Hz, which means the accelerometer sampling period should not be set above 50ms (Shannon’s sampling theorem). Note also that lengthening the measurement window gives a finer FFT frequency step, but at the cost of a higher memory footprint since more sensor measures are gathered. Besides the fact that such accuracy is unnecessary, a longer measurement window might generate cluster mapping errors, since a user activity transition (from resting to walking, from walking to running) can occur inside a long window. On the other hand, a shorter measurement window can prevent cluster mapping errors but degrades the FFT results and the user speed calculation. The validity of the features therefore depends on a few parameters that have to be tuned carefully according to the activities to be classified. We list below the most important parameters:
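To make these features concrete, the Python sketch below computes, for one accelerometer axis, the mean, the standard deviation and the frequency of the strongest periodic component. It is an illustration under stated assumptions rather than the project's code: the window is zero-padded to 256 points, the mean is removed before the transform so that the 0Hz bin does not dominate, a plain discrete Fourier transform stands in for the FFT, and the population standard deviation is used. All function names are hypothetical.

```python
import cmath
import math
import statistics

SAMPLE_PERIOD_S = 0.035  # 35 ms accelerometer sampling period
FFT_SIZE = 256           # next power of 2 above 171 samples


def dominant_frequency_hz(samples, sample_period_s=SAMPLE_PERIOD_S,
                          fft_size=FFT_SIZE):
    """Frequency (Hz) of the strongest spectral bin below the Nyquist limit."""
    samples = list(samples)[:fft_size]
    mean = statistics.fmean(samples)
    # Remove the mean (assumption) and zero-pad up to the transform size.
    padded = [s - mean for s in samples] + [0.0] * (fft_size - len(samples))
    step_hz = (1.0 / sample_period_s) / fft_size  # about 0.112 Hz
    best_bin, best_mag = 0, 0.0
    for k in range(1, fft_size // 2):  # skip bin 0, stop before Nyquist
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / fft_size)
                  for n, x in enumerate(padded))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * step_hz


def axis_features(samples):
    """Return (mean, standard deviation, dominant frequency) for one axis."""
    return (statistics.fmean(samples),
            statistics.pstdev(samples),
            dominant_frequency_hz(samples))


# Example: a synthetic 2 Hz oscillation around 1.0, 171 samples (one window).
window = [1.0 + math.sin(2 * math.pi * 2.0 * n * SAMPLE_PERIOD_S)
          for n in range(171)]
mean, std, freq = axis_features(window)
```

Three such triples (one per axis) plus the average speed give ten values per measurement window.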
The k-means unsupervised classification algorithm is used to cluster feature vectors into categories that map onto user activities. It is well suited to our purpose since it is fast and we know upfront the number of clusters, which corresponds to the user activities we want to track (resting, walking, running and driving a car). For each measurement window, the algorithm takes the ten-dimensional vector (the ten features defined earlier) and computes the Euclidean distance between this vector and each cluster’s mean value (the cluster’s centroid). The vector is assigned to the nearest cluster (the one with the lowest Euclidean distance). The centroid of that cluster is then updated to take the newly assigned vector into account.
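The assignment and update steps described above can be sketched as follows in Python. This is an illustrative incremental variant in which the centroid is kept as the running mean of all vectors assigned so far; the function names and the use of a per-cluster sample count are assumptions, not the project's code.

```python
import math


def nearest_cluster(vector, centroids):
    """Index of the centroid with the lowest Euclidean distance to vector."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(vector, centroids[i]))


def assign_and_update(vector, centroids, counts):
    """Assign vector to its nearest cluster and update that centroid in place.

    counts[i] is the number of vectors already folded into centroids[i];
    the update keeps each centroid equal to the mean of its vectors.
    """
    i = nearest_cluster(vector, centroids)
    counts[i] += 1
    centroids[i] = [c + (v - c) / counts[i]
                    for c, v in zip(centroids[i], vector)]
    return i


# Example with two clusters in two dimensions (the project uses ten).
centroids = [[0.0, 0.0], [10.0, 10.0]]
counts = [1, 1]
cluster = assign_and_update([2.0, 0.0], centroids, counts)
```

In this example the vector falls into the first cluster, whose centroid moves halfway towards it because it now represents two vectors.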
As it is an iterative approach, the k-means classification algorithm needs a large number of input vectors to define each cluster’s centroid accurately. For that reason, all the defined activities first have to be repeated several times until the centroid values of the four clusters are accurate. Once this is done, we manually annotate each cluster with the name of the corresponding activity. The cluster centroids have to be persisted by the application to enable continuous iterative learning. For that purpose, we record in the phone’s internal file system the four centroid values and the number of samples used to compute each of them. Recording the number of samples is important to ensure that each centroid is correctly weighted when new vectors are added.
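The storage format used on the phone is not described. As a minimal sketch, assuming a JSON file that holds the centroids, the sample counts and the manually assigned labels, persistence could look like this in Python (the file layout and function names are hypothetical):

```python
import json


def save_model(path, centroids, counts, labels):
    """Write centroids, their sample counts and activity labels to a file."""
    model = {"centroids": centroids, "counts": counts, "labels": labels}
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(model, handle)


def load_model(path):
    """Read back (centroids, counts, labels) saved by save_model."""
    with open(path, encoding="utf-8") as handle:
        model = json.load(handle)
    return model["centroids"], model["counts"], model["labels"]
```

Keeping the counts next to the centroids lets a later session continue the running-mean update with the correct weight: a centroid built from 500 vectors barely moves when one new vector arrives, whereas a centroid built from 2 vectors moves a lot.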
You need a Microsoft Windows developer account to deploy this application on a phone (see http://msdn.microsoft.com/en-us/library/windows/apps/hh868184.aspx).
All needed external libraries are already embedded in the project zip file.
Mean and standard deviation features are computed from the raw accelerometer data for each axis. This computation requires the Math.NET Numerics package. If it is not installed yet, open the Solution Explorer, right click on “References” and select “Manage NuGet packages”. Enter “MathNet” in the search field and the matching packages will be listed automatically. Select Math.NET Numerics and click Install.
Some data is displayed through real time graphs and charts. This feature requires the Silverlight Toolkit - Data Visualization (Charting) package. More information on how to install packages from the Package Manager Console can be found at the following link: http://docs.nuget.org/docs/start-here/using-the-package-manager-console. To install this package, run the following command in the Visual Studio 2013 Package Manager Console:
PM> Install-Package SilverlightToolkit-DataViz
This will install:
Once started, you will get the following user interface:
This interface is mainly designed for debugging purposes. It displays the real time data gathered from both the accelerometer and the GPS for each measurement window. Some sliders are also available to modify a few parameters (although we do not recommend changing them). The GPS sensor status is indicated by a LED (RED: GPS sensor not ready, GREEN: GPS sensor ready).
There is nothing else to do: just carry the phone and rest, walk, run or drive a car while the sensor data is gathered.
Finally, the “statistics” button opens a new page displaying the overall results. So, at the end of the day, you can open the statistics window to get a summary of your daily activities:
Results are given to the user in real time, indicating the current activity. More interestingly, results are also aggregated in the form of a histogram representing the number of vectors assigned to each activity. This makes it possible to provide a summary of the user’s activity on a daily, weekly or monthly basis. For that purpose, a naïve algorithm can compute the share of each activity as the ratio of its vector count to the total number of vectors in the histogram.
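The naïve ratio computation can be sketched as follows in Python. The conversion to a duration assumes that each vector stands for exactly one 6s measurement window and that windows follow each other without gaps; this assumption and the function name are illustrative only.

```python
WINDOW_S = 6  # length of one measurement window in seconds


def activity_summary(histogram, window_s=WINDOW_S):
    """Share and estimated duration of each activity.

    histogram maps an activity name to the number of vectors assigned to it.
    """
    total = sum(histogram.values())
    if total == 0:
        return {}
    return {activity: {"ratio": count / total, "seconds": count * window_s}
            for activity, count in histogram.items()}


# Example histogram for one day (made-up counts).
summary = activity_summary(
    {"resting": 50, "walking": 30, "running": 10, "driving": 10})
```

With these made-up counts, resting accounts for half of the vectors, that is an estimated 300 seconds.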
We have developed a methodology based on the mobile phone accelerometer and GPS sensors to measure user activity. The first results are very encouraging, despite some instabilities noticed in the GPS measurements that lead to wrong cluster mapping. This phenomenon is aggravated by the fact that the resting and walking vectors, and the walking and running vectors, are close to each other (from a user speed standpoint) and therefore very sensitive to the accuracy of the speed (hence GPS) measurement. On the other hand, GPS measures are key when dealing with the driving activity. So, we could improve the clustering efficiency by disregarding speed measurements below 15km/h (setting them to 0km/h) and relying only on the periodic pattern of each accelerometer axis to classify the resting, walking and running activities. Also, the number of dimensions used in the k-means vector is quite large, which makes it hard to ensure data globularity, a requirement for the k-means algorithm to work properly [19]. The number of dimensions actually needed might be refined by additional experiments.
A preliminary publication of this study can be found below. It still needs to be updated with real sensor results gathered from the phone; in particular, reporting the successful matching rate for each activity would be valuable. Publication