Automatic recognition of labor activity: a machine learning approach to capture activity physiological patterns using wearable sensors

Hamad Al Jassmi (Emirates Center for Mobility Research, United Arab Emirates University, Alain, United Arab Emirates and Department of Civil and Environmental Engineering, United Arab Emirates University, Alain, United Arab Emirates)
Mahmoud Al Ahmad (Department of Electrical Engineering, United Arab Emirates University, Alain, United Arab Emirates)
Soha Ahmed (Department of Electrical Engineering, United Arab Emirates University, Alain, United Arab Emirates)

Construction Innovation

ISSN: 1471-4175

Article publication date: 29 March 2021

Issue publication date: 21 October 2021


Abstract

Purpose

The first step toward developing an automated construction worker performance monitoring system is to establish a complete and competent activity recognition solution, which is still lacking. This study aims to propose a novel approach that uses labor physiological data collected through wearable sensors as a means of remote and automatic activity recognition.

Design/methodology/approach

A pilot study is conducted with three pre-fabrication stone construction workers throughout three full working shifts to test the ability to automatically recognize the types of activities they perform on-site from their live-measured physiological signals (i.e. blood volume pulse, respiration rate, heart rate, galvanic skin response and skin temperature). The physiological data are broadcast from wearable sensors to a tablet application developed for this particular purpose, and are then used to train and assess the performance of various machine-learning classifiers.

Findings

A promising result of up to 88% accuracy level for activity recognition was achieved by using an artificial neural network classifier. Nonetheless, special care needs to be taken for some activities that evoke similar physiological patterns. It is expected that blending this method with other currently developed camera-based or kinetic-based methods would yield higher activity recognition accuracy levels.

Originality/value

The proposed method complements previously proposed labor tracking methods that focused on monitoring labor trajectories and postures, by drawing on an additional rich source of information, workers' physiology, for real-time and remote activity recognition. Ultimately, this paves the way for an automated and comprehensive solution with which construction managers could remotely monitor, control and collect rich real-time data about worker performance.

Citation

Al Jassmi, H., Al Ahmad, M. and Ahmed, S. (2021), "Automatic recognition of labor activity: a machine learning approach to capture activity physiological patterns using wearable sensors", Construction Innovation, Vol. 21 No. 4, pp. 555-575. https://doi.org/10.1108/CI-02-2020-0018

Publisher

Emerald Publishing Limited

Copyright © 2021, Hamad Al Jassmi, Mahmoud Al Ahmad and Soha Ahmed.

License

Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode


1. Introduction

On construction sites, the simultaneous monitoring and regulation of meaningful data are crucial to detect divergences between preferred and actual performance (Navon, 2005). Nevertheless, manual data collection procedures such as direct observations and in-the-field questionnaires are costly and do not offer an efficient means of obtaining a sufficient amount of reliable data in a timely manner (Akhavian and Behzadan, 2015). The quality of manually collected data is significantly degraded by human error and judgment, and it usually suffers from inaccuracy and inconsistency (Cheng et al., 2013; Weerasinghe et al., 2012). Furthermore, in dynamic construction settings where several operations are simultaneously ongoing, monitoring the entire jobsite becomes a challenging task (Sherafat et al., 2020). This calls for alternative automated solutions capable of collecting and analysing data on a real-time basis about the two most significant resources that govern construction productivity: workers and equipment. Of the two, workers have a particularly significant influence on construction cost and schedules, as labor salaries account for 30%–50% of a project's cost (Jang et al., 2011; Jarkas and Bitar, 2012). Yet, despite the several research contributions made in recent years, the construction industry still lacks a holistic, automated, real-time performance monitoring system that aids management in monitoring worker activities on-site.

A typical data-driven and automated construction monitoring system consists of three sequential levels:

  1. activity recognition;

  2. activity tracking; and

  3. performance monitoring (Sherafat et al., 2020).

Activity recognition involves the development of technology that establishes which type of activity is taking place at any given time. Activity tracking exploits information from the previous level (i.e. recognized activities) to trace workers across time periods so that the system is responsive in real time. Performance monitoring, in turn, aims to determine the progress of activities against planned schedules and pre-established key performance indicators so that preventive/corrective actions can be taken.

It is therefore recognized that the first step toward developing an automated worker performance monitoring system is to establish a successful activity recognition solution. Over the past decade, numerous studies improved audio-based, vision-based and kinematic-based methods, and/or blends of these three, to recognize labor activities. However, the industry still requires a more precise and generalized method to achieve adequate accuracy levels for activity recognition; so far, no complete solution is available that could be converted into a commercialized application (Sherafat et al., 2020). This paper proposes a novel approach that uses labor physiological data collected through wearable sensors as a means of remote and automatic activity recognition, which complements the current body of literature by providing an additional source of information that helps increase the accuracy of labor activity recognition classifiers. A pilot study is conducted to test the ability to automatically recognize the type of activities workers perform on-site from their live-measured physiological signals (i.e. blood volume pulse [BVP], respiration rate [RR], heart rate [HR], galvanic skin response [GSR] and skin temperature [TEMP]) collected through wearable sensors.

The next section reviews previous research aimed at developing labor activity recognition methods in construction.

2. Recent work on workers activity recognition

As mentioned earlier, activity recognition research in construction has generally revolved around three main techniques, as follows:

  1. audio-based;

  2. vision-based; and

  3. kinematic-based methods (Sherafat et al., 2020).

Audio-based methods mainly use microphones to collect audio data and signal processing techniques to clean the collected sound data and extract the necessary features (Cheng et al., 2018; Yang et al., 2015; Cao et al., 2016; Sabillon et al., 2020). Construction activities are then recognized through machine-learning classifiers such as K-nearest neighbors (KNN) (Aha et al., 1991; Altman, 1992), support vector machines (SVM) (Cho et al., 2017), deep neural networks (Gencoglu et al., 2014) and linear logistic regression (Sumner et al., 2005). Nonetheless, audio-based methods are better suited to recognizing machinery activities than labor activities, as they rely heavily on noises generated in the workplace.

On the other hand, advances in computational capacity and computer-vision techniques have enabled practitioners and researchers to exploit camera sensors to provide semi-real-time information about worker activities at a relatively low cost. Development of camera-based methods involves object detection, object tracking and hence activity recognition (Sherafat et al., 2020). Some studies use cameras as the ground truth against which the algorithm and output of another type of sensor can be validated (Akhavian and Behzadan, 2016; Cheng et al., 2013), whereas others use them as the main data-collecting sensor (Escorcia et al., 2012; Mani et al., 2017; Weerasinghe et al., 2012). For instance, both Weerasinghe et al. (2012) and Escorcia et al. (2012) used color and depth data obtained from the Microsoft Kinect (camera-based) sensor to recognize workers' activities and track their performance, where the output of the sensor was processed and fed into an algorithm. Weerasinghe et al.'s (2012) algorithm differentiates between occupational rankings of different site personnel and localizes their 3D positions, whereas Escorcia et al.'s (2012) algorithm tracks worker movements and recognizes their actions based on their pose. The analysis of trunk posture for construction activities was also investigated by Lee et al. (2017b) using the ActiGraph GT9X Link and Zephyr BioHarness™ 3.

Nonetheless, despite their ease of use, camera-based methods have technical shortcomings that limit their reliability and practicality. With the exception of thermal cameras, cameras are sensitive to environmental factors such as illumination, dust, snow, rain and fog, and they cannot perform well in darkness or under direct sunlight (Sherafat et al., 2020). Furthermore, when the recorded scenes are crowded or congested, these methods may fail to handle the high noise levels (Akhavian and Behzadan, 2013; Sherafat et al., 2020). Camera-based methods also require large storage capacity to handle image and video data, making them computationally intensive compared to other options (Akhavian and Behzadan, 2013; Sherafat et al., 2020). They may also raise privacy and ethical issues, as workers may feel uncomfortable being continuously monitored (Sherafat et al., 2020). Additionally, to ensure full coverage of construction sites, a network of cameras is required (Cheng et al., 2017), which may be relatively expensive.

The use of kinematic-based methods such as accelerometers and gyroscopes has also been tested by numerous researchers (Akhavian and Behzadan, 2016; Akhavian and Behzadan, 2018; Cheng et al., 2013; Joshua and Varghese, 2010; Ryu et al., 2019; Valero et al., 2016; Yang et al., 2015), as they may provide a more reliable source of data for labor activity recognition. Early kinematic-based labor activity recognition research may be traced back to Joshua and Varghese's (2010, 2011) work, in which single-wired triaxial accelerometers were used to recognize five masonry worker activities: fetching bricks, twist laying, fetching mortar, spreading mortar and cutting bricks. More recently, Akhavian and Behzadan (2018) used data collected from accelerometers and gyroscopes attached to workers' upper arms to design and test a construction activity recognition system for seven sequential tasks: hammering, turning the wrench, idling, loading wheelbarrows, pushing loaded wheelbarrows, unloading wheelbarrows and pushing unloaded wheelbarrows. Ryu et al. (2019) used sports-watch accelerometers attached to masonry workers' wrists to recognize four of their activities, namely, spreading mortar, carrying and laying bricks, adjusting blocks and removing remaining mortar. Yang et al. (2015) proposed an activity recognition method using smartphone accelerometer and gyroscope sensors attached to rebar workers' wrists and legs to collect movement and posture data for eight activities, namely, standing, walking, squatting, cleaning the template, placing rebar, lashing rebar, welding rebar and cutting rebar.

In general, human activity recognition is a time-series classification problem, which involves a typical sequence of steps: sensor reading, data preparation, feature extraction, feature selection, feature annotation, supervised learning and classifier model assessment (Sherafat et al., 2020). To achieve optimum activity classification accuracy from data collected through kinematic-based sensors, various machine-learning algorithms have been evaluated, such as Naïve Bayes (Joshua and Varghese, 2010), decision trees (Joshua and Varghese, 2010; Akhavian and Behzadan, 2018; Ryu et al., 2019), logistic regression (Joshua and Varghese, 2010; Akhavian and Behzadan, 2018; Ryu et al., 2019), KNN (Akhavian and Behzadan, 2018; Ryu et al., 2019), SVM (Akhavian and Behzadan, 2018; Ryu et al., 2019; Yang et al., 2015), artificial neural networks (Akhavian and Behzadan, 2018) and others.
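The step sequence above can be sketched end to end. The following is a minimal illustration, not any of the cited systems: it uses entirely synthetic accelerometer windows (two hypothetical activity classes with different motion intensity), extracts simple time-domain features and assesses a decision tree with cross-validation.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Simulate 60-second, 1 Hz triaxial accelerometer windows for two
# hypothetical activities (e.g. "hammering" vs "idling").
def make_window(active):
    scale = 1.0 if active else 0.1       # stronger motion when active
    return scale * rng.standard_normal((60, 3))

windows, labels = [], []
for i in range(100):
    active = i % 2 == 0
    windows.append(make_window(active))
    labels.append(int(active))           # feature annotation step

# Feature extraction: per-axis mean and standard deviation (6 features).
X = np.array([np.hstack([w.mean(axis=0), w.std(axis=0)]) for w in windows])
y = np.array(labels)

# Supervised learning + classifier model assessment via cross-validation.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print(round(scores.mean(), 2))
```

Because the two synthetic classes differ sharply in signal variability, the standard-deviation features alone separate them well; real construction data is far noisier.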

As mentioned earlier, despite significant research efforts to develop innovative solutions for automating construction workers' activity recognition, a comprehensive solution has not yet been implemented. The industry still lacks a complete and competent solution that accounts for two major challenges, namely, the heterogeneity of construction activities and the complexity of the factors that distinguish one activity from another, which machine-learning classifiers may exploit to generate accurate results. To the authors' knowledge, no previous research has tested the use of physiological signals to train construction labor activity recognition classifiers. Some studies have used wearable physiological sensors to measure workers' energy expenditure, metabolic equivalents, physical activity levels and sleep quality (Lee et al., 2017a), but there is so far a lack of research examining the use of this rich source of data for construction activity recognition. This study explores the feasibility of using labor physiological patterns (i.e. BVP, RR, HR, GSR and TEMP) as an additional source of information that helps improve the accuracy of machine-learning classifiers in recognizing construction labor activities.

3. Methodology

The system development phase reported in this paper consists of data collection, data preprocessing and machine-learning model development (Figure 1). Recommendations derived from this development phase guide the scaling-up of the proposed method into an internet of things (IoT) system in which data are collected on-site through wearable sensors and streamed to a central artificial intelligence unit that automatically recognizes tasks and records data, allowing supervisors to monitor workers' performance remotely.

The following sections explain the execution of each phase on a pilot project examined for the purpose of prototyping this method. The prototype was tested on a sample of four prefabrication light-stone construction workers, who were engaged in mixed tasks within their daily routine. These tasks are, namely, oiling, moving, cleaning and opening stone molds. The system aimed to automatically recognize which of these four tasks a worker was engaged in on a real-time basis. Recognized tasks are associated with their recorded cycle durations to form a convenient information-gathering method for productivity control and future schedule forecasting purposes.

3.1 Sampling for prototype testing

The sampled light-stone pre-fabrication factory is located in Alain, UAE and occupies an area of 9,260 m². The factory specializes in pre-fabricating industrial stones. Access to the factory was secured through the factory owner and the operations manager. Some of the production lines in the factory were automated, whereas others were performed conventionally and were fully dependent on manual human work. Such manual work (i.e. oiling, moving, cleaning and opening light-stone molds) is the subject of examination in this study. The factory management pointed out that the main reason they decided to invest in replacing manual production lines with automated ones, despite their high cost, was to raise production capacity. This claim supports the fact that productivity is a major concern for the construction and manufacturing industries.

The decision to conduct the pilot study in this factory stemmed from three reasons. First, most of the tasks conducted in the factory mimic, to a great extent, those conducted on actual construction sites. Second, unlike construction sites, the factory offers a controlled work environment; as such, all variables that may influence productivity, except the ones of concern in this study, namely workers' physiological statuses, can largely be considered fixed. For example, because all work was conducted under the same semi-closed factory area and the whole study was conducted within the same period (February 18 to 22, 2018), during which temperatures did not vary significantly between morning and noon (Table 1), weather variability was not a concern when interpreting productivity fluctuations. Third, the factory's repetitive production environment allows a micro-analysis on a minute-by-minute basis to understand the relation between workers' physiological signals and their productivity. This stems from the fact that productivity can be measured easily and more precisely as the number of completed units per minute, or conversely the number of minutes consumed per unit of production, unlike actual construction site tasks, where work is often measured in hours rather than minutes.

For the purpose of this study, activities relevant to manual stone casting were selected. Table 2 shows the sampled activity descriptions and their levels of difficulty as provided by an expert from the site. The workers were engaged in many other manual tasks, but these four were selected because they provide a spectrum of obviously distinct tasks as well as closely similar tasks (e.g. oiling and cleaning), which suffices to challenge the ability of a machine-learning model to differentiate them (Figure 2 and Table 3).

4. Data acquisition

4.1 Selection of wearable sensors

The set-up shown in Figure 3 was used to acquire the workers' physiological signals and track their activity. The workers were instructed to wear the Zephyr BioHarness™ 3 sensor (belt) during their working hours. This belt recorded electrocardiograms using chest sensors, as well as tri-axial accelerations. Using this sensor, we were able to acquire the workers' heart rate (HR) and respiration rate (RR). The workers also wore an E4 unit (Empatica Inc.) on their non-dominant wrists. The E4 sensor collected acceleration data, BVP, GSR and skin temperature. The Zephyr BioHarness was found to provide HR and RR data practically as precise as a standard laboratory metabolic system (Kim et al., 2013). The accuracy and validity of the Empatica E4 unit for physiological signal tracking was tested and verified by Ragot et al. (2018). These accuracy levels were deemed sufficient, as the purpose of this study is not to report clinical emergencies; rather, this work aims to recognize task types based on approximate physiological signal variations.

4.2 Data acquisition mobile application

An Android mobile application was developed for data acquisition and fusion purposes (Figure 4). Data fusion is a technique that combines data from various sources to obtain representative synchronized data for specific events (Jang et al., 2011). The mobile application integrates data from the real-time physiological sensors (Zephyr BioHarness™ and E4 wristband), which are connected to the application via Bluetooth, together with data from the observer's manual entry. The data received from the two sensor kits, along with the observer's manual input (activity type), are logged at a frequency of 1 Hz (one data point per second) and saved to the mobile device's external storage as a CSV file. A sample of the raw data is shown in Table 4.

Figure 4 shows a screenshot from the application, which consists of a single view with three sections. The upper section displays the current real-time physiological readings from the Zephyr BioHarness™ and E4 wristband sensors. The other two sections provide an interface for the observer to manually log the nominal data pertaining to the worker's supervision status and current task/activity. The supervision status is logged through three-choice radio buttons (continuous supervision, partial supervision and no supervision), which refer to the extent to which the worker's direct supervisor is present to monitor his work. The worker's task/activity is logged through a dropdown list that includes all possible productive and non-productive activities a worker might perform during a working day. The productive activities are those listed in Table 2 (e.g. moving stone, opening molds with an automatic screwdriver, moving molds, etc.), whereas the non-productive activities are, namely, resting, talking only, standing with the supervisor and drinking water. Both are listed under the same dropdown menu. When the observer changes the dropdown selection, this means that the former activity has ended and a new activity has started. Because the cycles of the productive activities investigated in this study are short, it is unlikely that workers rest or remain idle in the middle of a production cycle. Therefore, when the activity changes from a productive to a non-productive one, this implies that the former has ended and that the worker is taking a break before beginning another productive cycle. Table 4 shows an example of what the raw data looks like for 20 seconds.
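The fusion of the two sensor streams with the observer's annotations into one 1 Hz log can be sketched as follows. This is a hypothetical reconstruction, not the authors' Android code: the field names and sample values are assumptions chosen to mirror the columns described above.

```python
import csv
import io

# Hypothetical fused record: readings from the two sensor kits plus the
# observer's manual annotations, written once per second.
FIELDS = ["timestamp", "HR", "RR", "BVP", "GSR", "TEMP",
          "supervision", "activity"]

def log_row(writer, t, zephyr, e4, supervision, activity):
    # zephyr supplies HR/RR, e4 supplies BVP/GSR/TEMP; the observer's
    # dropdown/radio-button choices supply supervision and activity.
    writer.writerow({"timestamp": t, **zephyr, **e4,
                     "supervision": supervision, "activity": activity})

buf = io.StringIO()          # stands in for the CSV file on external storage
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
log_row(writer, 0,
        {"HR": 92, "RR": 18},
        {"BVP": 0.4, "GSR": 2.1, "TEMP": 33.8},
        "partial supervision", "Moving")
print(buf.getvalue())
```

A change in the `activity` value from one row to the next is what delimits the activity windows used later for feature extraction.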

4.3 Real-time data collection and storage

Initially, the mobile application only automated the logging of the physiological signal data. In fact, during the first and second days of the experiment, the research assistant logged the worker activity and supervision status manually with pen and paper, which was ineffective and inefficient for micro-analysis and inconvenient for the observer. Therefore, at the end of the second day, the mobile application was modified to automate the tracking process for the remaining three sampled workers monitored throughout the third, fourth and fifth days. The automatic logging of activities enabled precise automatic computation of activity durations, as explained in Section 3.3. While the physiological signal data collected from the first tracked worker were sufficient and complete, the activity type and supervision status of the first tracked worker were missing a significant amount of data and were not as accurate as the productivity tracking data collected from the other three workers using the modified version of the mobile app. Therefore, a decision was made to exclude the first worker from the productivity analysis, reducing the number of sampled workers used in this study from four to three.

5. Data pre-processing and feature extraction

5.1 Missing values

Even after using the improved activity tracking version of the mobile app, we found some missing data points. This missing data was due to the use of two sensor kits from two different manufacturers, each with multiple built-in sensors, which caused synchronization problems. For instance, each sensor employs a different Bluetooth device, and these vary in Bluetooth coverage range; thus, one sensor might disconnect at a given distance while the other is still working and sending data. It was therefore a challenge for the observer holding the Android tablet to stand close enough not to lose the Bluetooth signals from the sensors, while not being so close as to physically disturb the workers' freedom of movement during their work. Missing data points due to this synchronization problem were treated by replacing them with the most recent preceding values. Furthermore, portions of the physiological signal data and the productivity tracking activities were not saved appropriately due to certain technical problems with the mobile application; these missing portions were excluded from the analysis.
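Replacing dropped readings with the most recent preceding values corresponds to a forward fill. A minimal sketch with hypothetical GSR values (the dropout positions and readings are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical 1 Hz GSR stream with a Bluetooth dropout at seconds 3-4.
gsr = pd.Series([2.10, 2.12, 2.11, np.nan, np.nan, 2.15])

# Forward fill carries the last received reading across the gap,
# i.e. missing points are replaced with the preceding values.
filled = gsr.ffill()
print(filled.tolist())
```

This is reasonable only for short gaps at 1 Hz; longer outages are better excluded, as was done for the portions lost to the app's saving problems.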

5.2 Rescaling and normalization

Rescaling, or data normalization, is an important data preprocessing step. Before deciding whether to conduct the experiments with normalized data, we ran several experiments with and without normalization, and all results favored the normalized data. The physiological signal data were scaled to fall within the range [0, 1]. Furthermore, the data were rescaled for each worker's physiological signals separately using equation (1):

(1) x′ = [x − min(x)] / [max(x) − min(x)]

where x′ is the rescaled value and x is the original value.
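Equation (1) applied per worker can be sketched directly; the HR values below are hypothetical:

```python
import numpy as np

def rescale(x):
    # Equation (1): x' = (x - min(x)) / (max(x) - min(x)), mapping to [0, 1].
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

# Applied to each worker's signal separately, e.g. one worker's HR trace:
hr = [88, 95, 120, 101]
print(rescale(hr))   # minimum maps to 0.0, maximum to 1.0
```

Scaling each worker separately removes between-worker baseline differences (e.g. one worker's resting HR being naturally higher), so the classifiers see relative variation rather than absolute levels.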

5.3 Time-domain feature extraction

Table 4 shows the data streamed from the sensors at a frequency of 1 Hz (one data point per second). To convert this data into comparable inputs that can be used to train machine-learning models, time-domain features were extracted. Descriptive statistical information was computed for each physiological signal over the specified activity windows. An activity window is defined in the database (Table 4) as the span from the second at which an activity started until the last second before the data logger changed it to another activity. For example, Frames 6–9 listed in Table 4 represent a time window for a "Moving" activity. The derived descriptive statistics included the mean, mode, median, maximum (max), minimum (min), variance (var) and standard deviation (std) of the physiological signals measured within each time window. This reduced the data collected throughout the days of the experiment from 29,479 rows down to 302 rows.
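The collapse from per-second rows to per-window feature rows can be sketched with a groupby aggregation. The stream below is a toy stand-in for Table 4 (two hypothetical windows, two of the five signals); the mode statistic is omitted here because pandas' string aggregators do not include it directly.

```python
import pandas as pd

# Toy 1 Hz stream: each row is one second; "window" identifies a
# contiguous run of the same logged activity (hypothetical values).
df = pd.DataFrame({
    "window":   [1, 1, 1, 2, 2, 2],
    "activity": ["Moving"] * 3 + ["Oiling"] * 3,
    "HR":       [98, 102, 100, 91, 90, 92],
    "GSR":      [2.4, 2.6, 2.5, 1.1, 1.2, 1.0],
})

# Collapse each activity window into one row of descriptive statistics,
# as done for all five signals in the study.
stats = ["mean", "median", "max", "min", "var", "std"]
features = df.groupby(["window", "activity"])[["HR", "GSR"]].agg(stats)
features.columns = [f"{s.capitalize()}{c}" for c, s in features.columns]
print(features)
```

With all five signals and the full statistic set, this aggregation is what shrinks the 29,479 one-second rows into the 302 feature rows.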

The number of features eventually used in this study are 30. These are, namely, MeanGSR, ModeGSR, MedianGSR, MaxGSR, MinGSR, VarGSR, MeanTemp, ModeTemp, MedianTemp, MaxTemp, MinTemp, VarTemp, MeanBVR, ModeBVR, MedianBVR, MaxBVR, MinBVR, VarBVR, MeanRR, ModeRR, MedianRR, MaxRR, MinRR, VarRR, MeanHR, ModeHR, MedianHR, MaxHR, MinHR and VarHR.

It is noteworthy that there is no definite answer as to what constitutes a sufficient number of data instances (rows) to train a classification model; it depends mainly on the algorithm's complexity, the number of features used (30 in our case) and the number of classes to be predicted (four in our case). Nonetheless, general guidelines and rules of thumb exist suggesting that roughly ten times as many data instances as features is adequate, which is the case in this study [refer to Raudys and Jain (1991) for more details].

6. Machine learning model development

In this study, we tackle two classification problems using the physiological signals of the worker. The first model is a simpler two-class (binary) classifier intended to recognize whether the worker is conducting a productive or a non-productive activity. The second model is a multi-class classifier intended to recognize which productive task the worker is performing. The two models are implemented sequentially: activities labelled "Productive" by the former classifier are passed to the latter multi-class classifier (Figure 1). We could have combined all the data under one five-class model that classifies oiling, opening, cleaning, moving and a fifth non-productive class. However, test trials showed that filtering non-productive activities through an initial binary classifier, leaving the latter classifier dedicated to recognizing productive tasks only, yielded more accurate results.
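The sequential two-stage logic can be sketched as follows. This is a schematic illustration with synthetic, well-separated feature rows (the offsets and the decision-tree classifier are assumptions for the sketch; the study's best model was an artificial neural network).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)

# Hypothetical 30-feature rows: each task gets a distinct offset so both
# stages have something to separate (purely synthetic).
def make_row(task):
    offsets = {"Rest": 0.0, "Oiling": 2.0, "Opening": 4.0,
               "Cleaning": 6.0, "Moving": 8.0}
    return offsets[task] + 0.1 * rng.standard_normal(30)

tasks = ["Rest"] * 40 + ["Oiling", "Opening", "Cleaning", "Moving"] * 40
X = np.array([make_row(t) for t in tasks])
y = np.array(tasks)

# Stage 1: binary Productive vs Nonproductive classifier.
y_bin = np.where(y == "Rest", "Nonproductive", "Productive")
stage1 = DecisionTreeClassifier(random_state=0).fit(X, y_bin)

# Stage 2: trained only on productive rows, to recognize the task type.
mask = y_bin == "Productive"
stage2 = DecisionTreeClassifier(random_state=0).fit(X[mask], y[mask])

def predict(row):
    row = row.reshape(1, -1)
    if stage1.predict(row)[0] == "Nonproductive":
        return "Nonproductive"
    return stage2.predict(row)[0]

print(predict(make_row("Moving")))
```

Filtering non-productive windows first means the stage-2 classifier never has to model the non-productive class, which is the design choice the test trials favored.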

The cleaned and scaled 30 time-domain features extracted from the physiological signals were used to train the machine-learning algorithms, and the training/testing datasets were split using a k-fold (k = 10) cross-validation method. In a k-fold procedure, the entire dataset is shuffled randomly and split into k groups; one group is held out as the testing dataset and the rest are used as the training dataset, alternating k times so that each group is used as the testing set exactly once and contributes to the training set k − 1 times. The model evaluation scores are then averaged across the k iterations. In our case, with a sample size of 302 and k = 10, each testing set contains approximately 30 instances. Section 7.2 reports the model evaluation results.
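The fold arithmetic for this dataset can be checked directly with scikit-learn's `KFold` (the zero matrix is a placeholder for the 302 × 30 feature table):

```python
import numpy as np
from sklearn.model_selection import KFold

# 302 instances, k = 10: each fold serves as the test set exactly once
# and is part of the training set the other nine times.
X = np.zeros((302, 30))                      # placeholder feature matrix
kf = KFold(n_splits=10, shuffle=True, random_state=0)
sizes = [(len(train), len(test)) for train, test in kf.split(X)]
print(sizes)
```

Since 302 is not divisible by 10, the first two folds hold 31 test instances and the remaining eight hold 30, matching the "approximately 30" figure.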

6.1 “Productive” vs “nonproductive” activity recognition

The output variables (also referred to as the classes) are Y1 = "Productive" and Y2 = "Non-productive." Activities such as "Chatting-only," "Drinking," and "Resting" were considered non-productive tasks. The other activities (listed in Table 2), namely, "Oiling," "Opening," "Cleaning," and "Moving," were annotated as "Productive." Table 5 shows the class distribution for this classification problem.

6.2 Multi-class activity recognition

The 197 activities labelled as "Productive" by the first classifier were used for the task-type recognition problem. The second classifier was dedicated to recognizing whether the worker is engaged in "Oiling," "Opening," "Cleaning," or "Moving" the light-stone molds. Table 6 shows the class distribution for this classification problem.

6.3 Feature reduction

In this paper, four feature selection algorithms were used, namely, info gain attribute evaluation (IGAE), gain ratio attribute evaluation (GRAE), ANOVA and the chi-squared statistic. In correlation-based feature subset selection, subsets of features that are highly correlated with the class while having low intercorrelation are preferred (for more information, see Hall and Smith, 1998). Correlation attribute evaluation evaluates the worth of an attribute by measuring the (Pearson's) correlation between it and the class. A summary and explanation of these techniques are outlined in Karegowda et al. (2010).

We use the top five features ranked by these four feature selection algorithms (GRAE, IGAE, ANOVA and chi-squared) (Tables 7 and 8). Four of the five physiological signals (RR, TEMP, GSR and BVP) were found to be significant; HR was not. In this study, we include all 30 features regardless of whether they were deemed to improve classification accuracy: their inclusion does not harm model performance, and care is needed before removing any feature purely for optimization purposes. A future study could investigate the possibility of removing the HR sensor to ease the data collection process; this, however, is beyond the scope of this study.
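As an illustration of one of the four ranking methods, the ANOVA F-test can be run with scikit-learn's `f_classif`. The feature matrix below is synthetic (three informative columns planted among noise, standing in for features like VarGSR or MeanRR), not the study's data.

```python
import numpy as np
from sklearn.feature_selection import f_classif

rng = np.random.default_rng(2)

# Hypothetical feature matrix: 300 windows, 30 features, 4 activity classes.
y = rng.integers(0, 4, size=300)
X = rng.standard_normal((300, 30))
X[:, 0] += y          # strongly class-dependent (e.g. a GSR statistic)
X[:, 1] += 0.5 * y    # moderately class-dependent
X[:, 2] += 0.25 * y   # weakly class-dependent

# ANOVA F-test: high F means between-class variance dominates
# within-class variance for that feature.
F, p = f_classif(X, y)
top5 = np.argsort(F)[::-1][:5]
print(top5)
```

The planted columns dominate the ranking, which mirrors how Tables 7 and 8 surface the RR, TEMP, GSR and BVP statistics ahead of the HR ones.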

7. Machine learning model selection and evaluation

7.1 Data visualization

It is helpful to initially visualize the data to observe whether an obvious pattern can be distinguished. The scatter plots in Figures 5 and 6 depict the correlations of selected time-domain features. In Figure 5, some feature pairs [e.g. stdGSR vs stdRR in Figure 5(m)] clearly show a pattern from which a classifier could be trained to draw a line separating the two classes. Figure 5(m), for example, shows the correlation between the standard deviation of the GSR and the standard deviation of the RR across the timestamp of a particular task. Each point in the figure represents an event of a performed activity, labelled as a productive or non-productive activity. Clearly, Figure 5(m) shows that when the standard deviations of both GSR and RR are high, the worker is more likely performing a productive task, and when both are low, the worker is more likely performing a non-productive task. From such a figure, one may visually inspect whether a classifier could accurately distinguish between classes through these readings. In a scatter plot where the data points of the two classes heavily overlap [e.g. Figure 5(e)], a separating line is hardly found, and the model would likely be confused if it relied solely on that pair of features. Note that such a 2D plot is only an indicator of whether an accurate classification model could be found, as the separating boundary is mathematically formed in a multi-dimensional space (which cannot be visualized) that utilizes all features fed into the model. Nonetheless, patterns found between a few pairs of features [e.g. Figures 5(m) and 5(i)] suffice to expect that a relatively accurate classification model may be drawn from the data. Based on this concept, Figure 6 suggests that the classifier will have difficulty differentiating between some tasks that have similar physiological patterns.
Section 8 will revisit the interpretations of these figures in more detail.
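The visual-separability check described above can also be quantified. The following is a minimal sketch (not the paper's code) that measures, per feature, how far apart the two class means lie in pooled-standard-deviation units; large values correspond to the clearly separated clusters seen in Figure 5(m), while values near zero correspond to the overlapping clouds of Figure 5(e). The data here are synthetic stand-ins for the stdGSR/stdRR windows:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for the paper's (stdGSR, stdRR) feature windows:
# 60 "productive" and 40 "non-productive" events, two features each.
prod = np.column_stack([rng.normal(1.5, 0.3, 60), rng.normal(2.0, 0.4, 60)])
nonprod = np.column_stack([rng.normal(0.5, 0.3, 40), rng.normal(0.8, 0.4, 40)])

def separation(a, b):
    """Distance between class means in pooled-std units, per feature."""
    pooled = np.sqrt((a.var(axis=0) + b.var(axis=0)) / 2)
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / pooled

print(separation(prod, nonprod))  # large values -> visually separable clusters
```

A separation well above one per feature indicates that even a simple linear boundary should distinguish the classes on that pair.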

7.2 Model evaluation and selection

Tables 9 and 10 report the classification performance of four machine learning algorithms after implementing each of them over multiple iterations to optimize its performance. As these are well-known algorithms, the computations behind their implementation are not elaborated in this paper; we only review their results and evaluate their performance. The reader is referred to Akhavian and Behzadan (2016) or other machine learning literature for more details about the theoretical formulations, pros and cons of each algorithm. It is important to note that accuracy alone is not a reliable metric for validating model performance. The problem is that most machine learning algorithms tend to produce unsatisfactory models when built from imbalanced datasets; specifically, the classification algorithm becomes biased toward the majority class to improve the classification accuracy. Consider, for example, a dataset in which class “A” constitutes 95% of the data and class “B” constitutes 5%. The classification accuracy might be 95%, yet this result does not verify the reliability of the model. Therefore, in this study, following a k-fold (k = 10) cross-validation method to examine testing datasets against training datasets, five different performance metrics were reported. These are, namely, area under the curve (AUC), classification accuracy, F1, precision and recall. An explanation of each metric is provided in Japkowicz and Shah (2011). Generally speaking, the most common metrics are classification accuracy ([True Positive + True Negative]/Total), recall (True Positive/[True Positive + False Negative]) and precision (True Positive/[True Positive + False Positive]). The recall metric expresses the proportion of actual positives that the model classified correctly. In other words, a model that produces no false negatives (e.g. no productive activities misclassified as non-productive) has a recall of 1.0.
On the other hand, the precision metric expresses the proportion of instances the model classified as positive that were actually positive. In other words, a model that produces no false positives (e.g. no non-productive activities misclassified as productive) has a precision of 1.0. Typically, there is a trade-off between the recall and precision metrics, and which metric is favored over the other depends on the nature of the classification problem. In medical research, for example, where the interest is to correctly identify COVID-19 positive cases, one may be more sensitive to avoiding false negative predictions, and such a case gives more weight to the recall measure than to the precision measure. In this study, the interest is to accurately classify all activities, and as such all metrics are deemed equally important. Another metric reported in this study is the F1 metric, which is simply the harmonic mean of recall and precision.
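The metric definitions above, and the imbalance pitfall, can be made concrete with a short sketch computed from raw confusion counts. The counts are illustrative, not the paper's:

```python
# Accuracy, recall, precision and F1 from raw confusion counts.
def metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)            # share of actual positives caught
    precision = tp / (tp + fp)         # share of predicted positives correct
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, recall, precision, f1

# Imbalance pitfall: predict the 95% majority class everywhere.
# Accuracy looks excellent, yet the minority class is never detected (tn = 0).
acc, rec, prec, f1 = metrics(tp=95, fp=5, fn=0, tn=0)
print(acc)  # 0.95
```

This is why the study reports AUC, F1, precision and recall alongside classification accuracy.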

For both classification problems of this study (Tables 9 and 10), the neural network was found to perform better than kNN, SVM and logistic regression. We therefore only show the confusion matrices of the neural network classifiers (Tables 11 and 12).
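The k-fold evaluation protocol behind Tables 9 and 10 can be sketched as follows. To keep the example self-contained, a plain one-nearest-neighbour classifier stands in for the four algorithms the study compared, and the data are synthetic; only the k = 10 cross-validation mechanics mirror the paper:

```python
import numpy as np

def one_nn_predict(X_train, y_train, X_test):
    # Label each test point with the label of its nearest training point.
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[d.argmin(axis=1)]

def kfold_accuracy(X, y, k=10, seed=0):
    # Shuffle once, split into k folds, hold each fold out in turn.
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = one_nn_predict(X[train], y[train], X[test])
        accs.append((pred == y[test]).mean())
    return float(np.mean(accs))

# Two synthetic, well-separated classes stand in for the feature windows.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(4, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)
print(kfold_accuracy(X, y))  # close to 1.0 for well-separated classes
```

Averaging accuracy over the ten held-out folds, rather than scoring the training data, is what makes the reported figures an estimate of out-of-sample performance.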

A more visual way to evaluate model performance is through the receiver operating characteristic (ROC) curve. The ROC curve shows the model's ability to yield high true positive rates while maintaining low false positive rates, as the threshold that trades off between those opposing rates is swept across all possible values. In short, the higher the AUC, the more power the model has to predict a certain class. A curve that falls exactly on the diagonal is said to have random performance, while it is considered worse than random if it falls to the bottom-right of the diagonal. In our case, all curves fall to the upper-left of the diagonal, which means that the models are not performing randomly (Figures 7 and 8). A perfect model is one that yields an ROC curve escalating to the very top-left of the graph.
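The threshold sweep behind Figures 7 and 8 can be sketched in a few lines: sort events by classifier score, accumulate true positive and false positive rates as the threshold drops past each score, and integrate the resulting curve. The scores below are illustrative, not the paper's model outputs:

```python
import numpy as np

def roc_auc(scores, labels):
    # Sweep the decision threshold from high to low by sorting on score.
    order = np.argsort(-scores)
    labels = labels[order]
    # Prepend the (0, 0) point, then accumulate TP and FP rates.
    tpr = np.concatenate([[0.0], np.cumsum(labels) / labels.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(1 - labels) / (1 - labels).sum()])
    # Trapezoidal area under the swept curve.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

labels = np.array([1, 1, 1, 0, 1, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1])
print(roc_auc(scores, labels))  # 0.9375
```

An AUC of 0.5 corresponds to the diagonal (random performance), and 1.0 to the perfect top-left curve described above.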

It is clear that our machine learning model performs well at predicting certain classes (e.g. oiling and cleaning) but less accurately at predicting other classes (e.g. moving and opening). The reason behind this finding may be that the human effort required to perform the former tasks is more constant and predictable than for the latter, which may involve higher fluctuations in physiological patterns. The normal distributions depicted in Figure 7 would also support this argument.

8. Discussion

Previous studies have employed either remote-monitoring sensors (Cheng et al., 2013; Escorcia et al., 2012; Mani et al., 2017; Weerasinghe et al., 2012), such as cameras and RFIDs, or labor-attached sensors (Akhavian and Behzadan, 2016; Cheng et al., 2013; Lee et al., 2017b; Valero et al., 2016), such as accelerometers, gyroscopes and magnetometers. Remote-monitoring methods face the challenge of installing a sufficient number of sensors covering all work-site angles to trace workers' trajectories and postures. On the other hand, labor-attached sensors provide more depth about the workers' endogenous variables governing their productivity. Nonetheless, to the best of the authors' knowledge, there has not been a comprehensive study to automatically recognize labor activities through the acquisition of physiological data (i.e. BVP, RR, HR, GSR and TEMP) of construction workers using wearable sensors. This study complements the current body of literature by using labor physiological patterns as an additional source of information that helps recognize labor activities as a basis for automatic productivity estimation.

As far as activity recognition using machine learning classification algorithms is concerned, the accuracy of the classifier depends mainly on capturing features (i.e. model inputs) that distinct an activity from another. For instance, time domain features extracted from the Blood Volume Pulse (BVP) (Mean BVP, Median BVP, Max BVP, Mode BVP, Std BVP and Min BVP) were found to be the most important features to recognize labor activity (refer to Tables 6 and 7). This does not mean that other sensor readings are not important, but they only add marginally to the classifier’s accuracy compared to BVP. By definition, BVP is the change in volume of blood over a period of time. It can be effected by HR and RR, but interestingly can also be effected by the persons’ mood (Mohanavelu et al., 2017). The multi-dimensionality of this feature may explain why this input has the highest predictive power, but yet more research is needed to understand why BVP levels follow different patterns across different activities, which is beyond the scope of this paper.
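The time-domain features named above can be sketched as a window-level extraction step. The binning used to take the mode of a continuous signal is an assumption of this sketch (the paper does not state one), and the sample values merely echo the units of the raw-data table:

```python
import numpy as np

def time_domain_features(window, bins=10):
    # Mode of a continuous signal taken as the centre of the fullest
    # histogram bin -- an illustrative choice, not the paper's.
    hist, edges = np.histogram(window, bins=bins)
    m = hist.argmax()
    return {
        "mean": float(np.mean(window)),
        "median": float(np.median(window)),
        "max": float(np.max(window)),
        "mode": float((edges[m] + edges[m + 1]) / 2),
        "std": float(np.std(window)),
        "min": float(np.min(window)),
    }

# A short window of BVP samples (arbitrary units, illustrative values).
window = np.array([-18.39, -45.77, 70.09, -48.23, 7.87, 62.42])
feats = time_domain_features(window)
print(feats["mean"], feats["min"], feats["max"])
```

One such feature vector per labelled activity window is what the classifiers above consume.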

The bottom-right distributions in the scatter plots depicted in Figure 6 clearly show that different MeanBVP values are associated with different activities, except for the “moving” activity, which has a flat curve. This means that workers' MeanBVP varies significantly when performing that activity, and as such the ML classifier fails to capture distinctive MeanBVP levels associated with “moving.” Furthermore, it is also clear from Figure 6 that, in most cases, “moving” and “opening” have very similar physiological patterns, making it harder for the classifier to distinguish between them; as a result, 52% of “opening” activities were mistakenly classified as “moving” (Table 12). This issue does not manifest with “oiling,” where workers seem to be distinctively relaxed when performing this activity and, as such, exhibit physiological patterns unique among the examined activities.

Despite the classifier's confusion between “moving” and “opening,” a good sign is that none of the “opening” activities were misclassified as “oiling” or “cleaning” (Table 12), which means that different types of activities certainly have different physiological patterns; however, special care needs to be taken for some activities that evoke similar physiological patterns, which the method proposed in this paper fails to address. Such a limitation could be tackled by complementing the physiological features with the labor trajectory and posture data used in previous studies, collected through accelerometers, gyroscopes, magnetometers or image-processed data (Akhavian and Behzadan, 2016; Cheng et al., 2013; Escorcia et al., 2012; Lee et al., 2017b; Mani et al., 2017; Weerasinghe et al., 2012; Valero et al., 2016).

Similarly, labor trajectory and posture data, if used on their own to feed a labor activity recognition classifier, may fail to produce accurate results. For example, Akhavian and Behzadan (2016) relied solely on accelerometer and gyroscope data to train a machine learning classifier to recognize the activities of “hammering” and “turning a wrench.” They found that these two activities produce similar movements of the upper arm, where the sensors were worn for data collection. It is possible that, had the classifier been complemented with physiological pattern data as proposed in this study, a higher level of activity recognition accuracy would have been achieved. Further research is needed to examine the extent to which fusing the data used in this study with that of previous studies would improve labor activity recognition accuracy.
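The fusion suggested here amounts to concatenating the two feature vectors before training, so one classifier sees both sources at once. A minimal sketch, in which all feature names are illustrative assumptions rather than either study's actual columns:

```python
import numpy as np

# Physiological features of the kind extracted in this study (illustrative).
physio = {"meanBVP": 4.67, "stdBVP": 45.1, "meanHR": 121.3, "stdGSR": 1.6}

# Kinematic features of the kind used by accelerometer/gyroscope studies
# such as Akhavian and Behzadan (2016) (illustrative names and values).
kinetic = {"meanAccX": 0.12, "stdAccX": 0.80, "meanGyroZ": 0.05, "stdGyroZ": 0.31}

fused = {**physio, **kinetic}       # one joint feature vector per window
x = np.array(list(fused.values()))  # input row for any of the classifiers
print(len(x))  # 8 features instead of 4
```

Activities with similar movement signatures but different physiological signatures (or vice versa) then remain separable in the joint feature space.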

9. Conclusion and limitations

This study contributes to the body of knowledge in construction automation by employing multiple classic classification algorithms to explore the feasibility of using physiological signals (i.e. BVP, RR, HR, GSR and TEMP) as features for remote and automated recognition of labor activities. A data acquisition methodology was proposed in which wearable sensor signals from workers are streamed to an Android mobile app and then extracted to a central unit where machine learning algorithms are applied. The physiological features with the highest predictive power were identified, and the factors governing the method's accuracy were highlighted for further investigation and experimental work.

The proposed method is not an alternative to previously used labor recognition methods based on labor trajectory and posture data; rather, it is viewed as a complement that improves activity recognition accuracy, especially for cases where two or more activities have similar movement patterns.

A pilot test of this method, conducted with three pre-fabrication stone construction workers throughout three full working shifts, showed a promising result of up to 88% accuracy for activity-type recognition. The pilot study infers that different tasks are associated with different physiological signal patterns, not only because they require different efforts but also because they involve different physical trajectories and thus different breathing and heart rate patterns. For instance, the mean heart rate (Mean HR) is higher for tasks entailing higher effort, while the heart rate standard deviation (Std HR) is higher for tasks entailing fluctuating efforts. Machine learning models should be able to capture such patterns for different tasks using large datasets such as the one analyzed in this study. It is deemed that the development of an accurate labor recognition methodology will foster a complete IoT and machine learning system through which supervisors could remotely monitor, control and collect rich real-time data about workers' productivity.

One limitation of the proposed method is that it relies blindly on the machine learning algorithms to make inferences about the relationships between physiological signals and task types. For instance, the time-domain features extracted from BVP were found to have the highest predictive power in recognizing labor activity types. Such relationships may be further studied with the help of biological domain experts to find the rationale underlying them, and perhaps to suggest more meaningful physiological signals to improve the accuracy of our machine learning models. Such an in-depth study may also examine whether certain physiological signals vary from one person to another, so that more robust generalizations could be made about their intrinsic relationships with activity types. In addition, further research is needed to examine the extent to which the fusion of the physiological data used in this study with the trajectory and posture data used in previous studies would improve labor activity recognition accuracy.

Figures

Figure 1. System development and implementation methodology

Figure 2. Workers performing mold oiling, mold opening and mold cleaning tasks

Figure 3. Worker wearing BioHarness 3.0 belt and E4 wristband

Figure 4. Android mobile application for data collection and fusion

Figure 5. Scatter plot of pattern clusters for idle and non-idle activities based on time-domain physiological features

Figure 6. Scatter plot of pattern clusters for four types of activities based on time-domain physiological features

Figure 7. ROC curves for “Productive” and “Non-Productive” classes

Figure 8. ROC curves for “Oiling,” “Opening,” “Cleaning” and “Moving” classes

Weather variations during test days in February

February 18th 19th 20th 21st 22nd
Highest Temperature 36° 27° 23° 26° 24°
Lowest Temperature 21° 20° 20° 18° 18°

“Productive” activities used for the multi-class classification pilot model

ID Short Description Description Difficulty
P1 Oiling The worker oils the mold to prepare it for casting Easy
P2 Opening The worker opens the mold screws using an automatic screwdriver (de-drilling) Easy
P3 Cleaning The worker cleans the mold to prepare it for reuse Medium
P4 Moving After usage, the worker carries (moves) the empty mold from the fabrication space to a storage space Medium

“Nonproductive” activities

ID Short description Description
N1 Chatting-only When the worker is idle, and chatting with his colleagues or his supervisor
N2 Drinking When the worker is idle, drinking water/juice between productive activity cycles
N3 Resting When the worker is idle, resting between productive activity cycles

Example from the raw data

Frame Time GSR TEMP BVP RR HR Work Task Start Output
1 8:06:28 6.84 33.55 −18.39 15.6 123 ‘Moving’ stone Productive
2 8:06:29 6.59 33.55 −45.77 15.6 122 Productive
3 8:06:30 6.98 33.59 70.09 15.0 122 Productive
4 8:06:31 6.98 33.59 −48.23 15.0 121 Productive
5 8:06:32 7.09 33.57 7.87 14.7 120 Productive
6 8:06:33 9.08 33.57 62.42 14.7 120 ‘Opening' molds with screwdriver Productive
7 8:06:34 9.08 33.71 101.58 14.4 118 Productive
8 8:06:35 10.52 33.71 −39.61 14.4 116 Productive
9 8:06:36 9.77 33.73 154.04 14.2 115 Productive
10 8:06:37 10.81 33.45 −100.31 14.4 123 ‘Moving molds’ Productive
11 8:06:38 10.43 33.57 −460.11 14.5 124 Productive
12 8:06:39 10.43 33.57 −104.23 14.5 129 Productive
13 8:06:40 11.34 33.47 64.21 16.5 137 ‘Opening’ molds by hammer Productive
14 8:06:41 11.93 33.47 −100.31 16.5 137 Productive
15 8:06:42 9.93 33.55 23.96 17.2 137 Productive
16 8:06:43 9.93 33.55 31.48 17.2 135 Productive
17 8:06:44 10.32 34.11 61.01 20.6 153 ‘Chatting-only’ Non-Productive
18 8:06:45 10.32 34.11 −6.31 20.6 154 Non-Productive
19 8:06:46 10.32 34.13 37.73 21.1 155 Non-Productive
20 8:06:47 10.43 34.13 −2.31 21.1 156 Non-Productive
21 8:06:48 10.43 34.15 10.27 19.3 159 Non-Productive

Class distribution for the training data for the “productive” vs “Nonproductive” classification problem

Class Count
Non-productive 99
Productive 197

Class distribution for the training data for the multi-class classification problem

Class Count
Oiling 75
Opening 25
Cleaning 30
Moving 67

Feature selection algorithm results for the productive/non-productive binary classifier

IGAE GRAE ANOVA Chi-Squared
MinBVP 0.11 0.055 32.70 36.13
StdBVP 0.11 0.054 30.40 35.73
ModeBVP 0.09 0.044 28.62 27.23
MaxBVP 0.07 0.036 26.34 22.25
VarRR 0.04 0.02 20.25 12.25

Feature selection algorithm results for the multi-class classifier

IGAE GRAE ANOVA Chi-Squared
MedianBVP 0.66 0.33 100.41 72.30
MeanBVP 0.53 0.27 70.05 69.90
VarTEMP 0.41 0.20 0.19 0.19
StdTEMP 0.41 0.20 0.29 0.29
MinGSR 0.21 0.11 14.82 36.87

Performance of the productive/non-productive binary classifiers

Method AUC CA F1 Precision Recall
kNN 0.607 0.617 0.607 0.601 0.617
SVM 0.747 0.756 0.753 0.751 0.756
Neural Network 0.783 0.772 0.770 0.769 0.772
Logistic Regression 0.793 0.743 0.742 0.741 0.743

Performance of the multi-class classifiers

Method AUC CA F1 Precision Recall
kNN 0.817 0.721 0.678 0.604 0.773
SVM 0.960 0.909 0.882 0.870 0.893
Neural Network 0.915 0.893 0.861 0.855 0.867
Logistic Regression 0.813 0.746 0.706 0.632 0.800

Confusion matrix for the productive/non-productive neural network classifier

Actual \ Predicted Productive Non-Productive Total
Productive 85.0% 15.0% 197
Non-Productive 36.9% 63.1% 99
Total 207 95 302

Confusion matrix for multi-class neural network classifier

Actual \ Predicted Oiling Cleaning Moving Opening Total
Oiling 88.0% 2.7% 8.0% 1.3% 75
Cleaning 6.7% 80.0% 6.7% 6.7% 30
Moving 11.9% 4.5% 76.1% 7.5% 67
Opening 0.0% 0.0% 52.0% 48.0% 25
Total 76 29 72 20 197

References

Aha, D.W., Kibler, D. and Albert, M.K. (1991), “Instance-based learning algorithms”, Machine Learning, Vol. 6 No. 1, pp. 37-66.

Akhavian, R. and Behzadan, A.H. (2013), “Knowledge-based simulation modeling of construction fleet operations using multimodal-process data mining”, Journal of Construction Engineering and Management, Vol. 139 No. 11, p. 04013021.

Akhavian, R. and Behzadan, A.H. (2015), “Construction equipment activity recognition for simulation input modeling using mobile sensors and machine learning classifiers”, Advanced Engineering Informatics, Vol. 29 No. 4, pp. 867-877, doi: 10.1016/j.aei.2015.03.001.

Akhavian, R. and Behzadan, A.H. (2016), “Smartphone-based construction workers' activity recognition and classification”, Automation in Construction, Vol. 71, pp. 198-209.

Akhavian, R. and Behzadan, A.H. (2018), “Coupling human activity recognition and wearable sensors for data-driven construction simulation”, ITcon, Vol. 23, pp. 1-15.

Altman, N.S. (1992), “An introduction to kernel and nearest-neighbor nonparametric regression”, The American Statistician, Vol. 46 No. 3, pp. 175-185.

Cao, J., Wang, W., Wang, J. and Wang, R. (2016), “Excavation equipment recognition based on novel acoustic statistical features”, IEEE Transactions on Cybernetics, Vol. 47 No. 12, pp. 4392-4404.

Cheng, C.F., Rashidi, A., Davenport, M.A. and Anderson, D.V. (2017), “Activity analysis of construction equipment using audio signals and support vector machines”, Automation in Construction, Vol. 81, pp. 240-253.

Cheng, T., Teizer, J., Migliaccio, G.C. and Gatti, U.C. (2013), “Automated task-level activity analysis through fusion of real time location sensors and worker’s thoracic posture data”, Automation in Construction, Vol. 29, pp. 24-39, doi: 10.1016/j.autcon.2012.08.003.

Cheng, C.F., Rashidi, A., Davenport, M.A., Anderson, D.V. and Sabillon, C. (2018), Software and Hardware Requirements for Audio-Based Analysis of Construction Operations, ASCE, Reston, VA.

Cho, C., Lee, Y. and Zhang, T. (2017), “Sound recognition techniques for multi-layered construction activities and events”, in Computing in Civil Engineering, ASCE, Reston, VA, pp. 326-334, doi: 10.1061/9780784480847.041.

Escorcia, V., Dávila, M.A., Golparvar-Fard, M. and Niebles, J.C. (2012), “Automated vision-based recognition of construction worker actions for building interior construction operations using RGBD cameras”, Construction Research Congress 2012, pp. 879-888, doi: 10.1061/9780784412329.089.

Gencoglu, O., Virtanen, T. and Huttunen, H. (2014), “Recognition of acoustic events using deep neural networks”, 2014 22nd European signal processing conference (EUSIPCO), IEEE, New York, NY, pp. 506-510.

Hall, M. and Smith, L. (1998), “Feature subset selection: a correlation based filter approach”, Progress in Connectionist-Based Information Systems, Vols 1/2.

Jang, H., Kim, K., Kim, J. and Kim, J. (2011), “Labour productivity model for reinforced concrete construction projects”, Construction Innovation, Vol. 11 No. 1, pp. 92-113, doi: 10.1108/14714171111104655.

Japkowicz, N. and Shah, M. (2011), Evaluating Learning Algorithms: A Classification Perspective, 1st ed., Cambridge University Press.

Jarkas, A. and Bitar, C. (2012), “Factors affecting construction labour productivity in Kuwait”, Journal of Construction Engineering and Management, Vol. 138 No. 7, pp. 811-820, doi: 10.1061/(ASCE)CO.1943-7862.0000501.

Joshua, L. and Varghese, K. (2010), “Construction activity classification using accelerometers”, Construction Research Congress 2010: Innovation for Reshaping Construction Practice, pp. 61-70.

Joshua, L. and Varghese, K. (2011), “Accelerometer-based activity recognition in construction”, Journal of Computing in Civil Engineering, Vol. 25 No. 5, pp. 370-379.

Karegowda, A., Manjunath, A. and Jayaram, M. (2010), “Comparative study of attribute selection using gain ratio and correlation based feature selection”, International Journal of Information Technology and Knowledge Management, Vol. 2 No. 2, pp. 271-277.

Kim, J., Roberge, R., Powell, J., Shafer, A. and Jon Williams, W. (2013), “Measurement accuracy of heart rate and respiratory rate during graded exercise and sustained exercise in the heat using the zephyr BioHarness”, International Journal of Sports Medicine, Vol. 34 No. 6, pp. 497-501, doi: 10.1055/s-0032-1327661.

Lee, W., Lin, K., Seto, E. and Migliaccio, G. (2017a), “Wearable sensors for monitoring on-duty and off-duty worker physiological status and activities in construction”, Automation in Construction, Vol. 83, pp. 341-353, doi: 10.1016/j.autcon.2017.06.012.

Lee, W., Seto, E., Lin, K. and Migliaccio, G. (2017b), “An evaluation of wearable sensors and their placements for analyzing construction worker’s trunk posture in laboratory conditions”, Applied Ergonomics, Vol. 65, pp. 424-436, doi: 10.1016/j.apergo.2017.03.016.

Mani, N., Kisi, K., Rojas, E. and Foster, E. (2017), “Estimating construction labor productivity frontier: pilot study”, Journal of Construction Engineering and Management, Vol. 143 No. 10, p. 04017077, doi: 10.1061/(ASCE)CO.1943-7862.0001390.

Mohanavelu, K., Lamshe, R., Poonguzhali, S., Adalarasu, K. and Jagannath, M. (2017), “Assessment of human fatigue during physical performance using physiological signals: a review”, Biomedical and Pharmacology Journal, Vol. 10 No. 4, pp. 1887-1896.

Navon, R. (2005), “Automated project performance control of construction projects”, Automation in Construction, Vol. 14 No. 4, pp. 467-476, doi: 10.1016/j.autcon.2004.09.006.

Ragot, M., Martin, N., Em, S., Pallamin, N. and Diverrez, J. (2018), Emotion Recognition Using Physiological Signals: Laboratory vs Wearable Sensors, Springer, Cham, pp. 15-22, doi: 10.1007/978-3-319-60639-2_2.

Raudys, S.J. and Jain, A.K. (1991), “Small sample size effects in statistical pattern recognition: recommendations for practitioners”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13 No. 3, pp. 252-264.

Ryu, J., Seo, J., Jebelli, H. and Lee, S. (2019), “Automated action recognition using an accelerometer-embedded wristband-type activity tracker”, Journal of Construction Engineering and Management, Vol. 145 No. 1, p. 04018114.

Sabillon, C., Rashidi, A., Samanta, B., Davenport, M.A. and Anderson, D.V. (2020), “Audio-based bayesian model for productivity estimation of cyclic construction activities”, Journal of Computing in Civil Engineering, Vol. 34 No. 1, p. 04019048.

Sherafat, B., Ahn, C.R., Akhavian, R., Behzadan, A.H., Golparvar-Fard, M., Kim, H. and Azar, E.R. (2020), “Automated methods for activity recognition of construction workers and equipment: state-of-the-art review”, Journal of Construction Engineering and Management, Vol. 146 No. 6, p. 03120002.

Sumner, M., Frank, E. and Hall, M. (2005), “Speeding up logistic model tree induction”, European conference on Principles of Data Mining and Knowledge Discovery, Springer, Berlin, Heidelberg, pp. 675-683.

Valero, E., Sivanathan, A., Bosché, F. and Abdel-Wahab, M. (2016), “Musculoskeletal disorders in construction: a review and a novel system for activity tracking with body area network”, Applied Ergonomics, Vol. 54, pp. 120-130.

Weerasinghe, I., Ruwanpura, J., Boyd, J. and Habib, A. (2012), “Application of microsoft kinect sensor for tracking construction workers”, Construction Research Congress, pp. 858-867, doi: 10.1061/9780784412329.087.

Yang, S., Cao, J. and Wang, J. (2015), “Acoustics recognition of construction equipments based on LPCC features and SVM”, 2015 34th Chinese Control Conference (CCC), IEEE, New York, NY, pp. 3987-3991.

Further reading

Hwang, S. and Lee, S. (2017), “Wristband-type wearable health devices to measure construction workers’ physical demands”, Automation in Construction, Vol. 83, pp. 330-340, doi: 10.1016/j.autcon.2017.06.003.

Powell, R. and Copping, A. (2010), “Sleep deprivation and its consequences in construction workers”, Journal of Construction Engineering and Management, Vol. 136 No. 10, pp. 1086-1092, doi: 10.1061/(ASCE)CO.1943-7862.0000211.

Acknowledgements

This work was financially supported by the Research Affairs Office at UAE University.

Corresponding author

Hamad Al Jassmi can be contacted at: h.aljasmi@uaeu.ac.ae
