Abstract
In this work, we introduce a novel distance metric that describes the distance between sets of events, where events in the most common form are actions that happen at a given time. More generally, an event can be any object that is in an ordered relation to other objects. In our case, an event is a course that a student takes during a specific semester. The distance is calculated from the differences between the positional relations of all individual events in the sets. For this, we do not use the absolute position of events but instead use the sum of differences of the relations before, concurrent, and after to express distance. We describe our metric algorithmically and evaluate it both formally and by example on an existing data set of student exams. We also show that the results of the metric are intuitive for humans to interpret by comparing them to the results of a user study that we ran. This metric can be applied to a range of problems that rely on the positional relation of events by removing the dependency on timestamps for events and replacing them with a set of ordered identifiers. We show a specific application of the metric by tackling the problem of clustering and predicting the study paths of university students.
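The abstract does not give the metric's formal definition, so the following is only a minimal sketch of one possible reading of "the sum of differences of the relations before, concurrent, and after". It assumes that each set of events is a mapping from an event identifier to an ordered position (for example a semester number), and that the difference between two relation sets is the size of their symmetric difference. The names `Path`, `relations`, and `relation_distance` are hypothetical and are not taken from the paper.

```python
from typing import Dict, Set, Tuple

# Assumption: a set of events is a mapping from event id to an ordered
# position, e.g. course name -> semester number.
Path = Dict[str, int]


def relations(path: Path, event: str) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return the events before, concurrent with, and after `event` in `path`.

    If the event does not occur in the path, all three sets are empty
    (an assumption of this sketch).
    """
    if event not in path:
        return set(), set(), set()
    pos = path[event]
    before = {e for e, p in path.items() if p < pos}
    concurrent = {e for e, p in path.items() if p == pos and e != event}
    after = {e for e, p in path.items() if p > pos}
    return before, concurrent, after


def relation_distance(a: Path, b: Path) -> int:
    """Sum, over all events, the sizes of the symmetric differences of the
    before/concurrent/after sets (an assumed reading of the abstract)."""
    total = 0
    for event in set(a) | set(b):
        for rel_a, rel_b in zip(relations(a, event), relations(b, event)):
            total += len(rel_a ^ rel_b)
    return total


# Only the ordering matters, not the absolute positions.
path_a = {"math": 1, "prog": 1, "stats": 2}
path_b = {"math": 10, "prog": 10, "stats": 20}
path_c = {"math": 1, "prog": 2, "stats": 2}

print(relation_distance(path_a, path_b))
print(relation_distance(path_a, path_c))
```

In this sketch, `path_a` and `path_b` have distance zero because every course stands in the same before/concurrent/after relation to every other course, even though the semester numbers differ. Moving `prog` to a later semester in `path_c` changes several relations and gives a positive distance.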
Original language | English |
---|---|
Title of host publication | 2020 IEEE 7TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA 2020) |
Editors | Geoff Webb, Zhongfei Zhang, Vincent S. Tseng, Graham Williams, Michalis Vlachos, Longbing Cao |
Publisher | IEEE |
Pages | 506-515 |
Number of pages | 10 |
ISBN (Electronic) | 978-1-7281-8206-3 |
ISBN (Print) | 978-1-7281-8207-0 |
DOIs | |
Publication status | Published - Oct 2020 |
Austrian Fields of Science 2012
- 102033 Data mining
Keywords
- clustering
- distance metric
- event
- student data
- user study