Abstract
Time series classification is an important task in time series analysis. Unlike the algorithms and problem formulations commonly used in time series analysis, it takes a whole sequence as input and assigns that sequence a discrete label. It is harder than ordinary classification mainly because the sequences to be classified vary in length, so general-purpose classifiers cannot be applied directly. Even when the sequences have equal length, values at the same position in different sequences are usually not directly comparable, so general-purpose classifiers remain unsuitable. Two kinds of methods are commonly used to address these difficulties. The first defines a suitable distance measure (most commonly the DTW distance) under which sequences that are close receive the same class label; such methods are domain-independent. The second first models each time series by exploiting the dependence between successive values (for example, with a Markov model), then represents each sequence by an equal-length vector of model parameters, and finally trains and classifies with a general-purpose classifier; such methods are domain-dependent. Researchers have long tended to use only one of these approaches, and comparisons between the two have been lacking. This paper analyzes both kinds of methods in depth and compares them on several synthetic and real data sets. The authors observe how the two kinds of algorithms perform under different influencing factors, which clarifies their relative advantages and disadvantages and provides a basis for developing new time series classification algorithms.
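The two families of methods described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`dtw_distance`, `nearest_neighbor_label`, `markov_features`), the absolute-difference local cost, the 1-nearest-neighbor rule, and the first-order Markov chain over already discretized symbols are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance with |x - y| as the local cost (assumed)."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def nearest_neighbor_label(query, train_series, train_labels):
    """Distance-based route: label of the training series closest under DTW."""
    best = min(range(len(train_series)),
               key=lambda k: dtw_distance(query, train_series[k]))
    return train_labels[best]


def markov_features(symbols: Sequence[int], n_states: int) -> List[float]:
    """Model-based route: flatten first-order transition probabilities.

    The result always has n_states**2 entries, whatever the sequence length,
    so any general-purpose classifier can consume it. Symbols are assumed to
    be integers in [0, n_states); real-valued series need discretizing first.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for s, t in zip(symbols, symbols[1:]):
        counts[s][t] += 1
    features: List[float] = []
    for row in counts:
        total = sum(row)
        features.extend(c / total if total else 0.0 for c in row)
    return features
```

For example, `nearest_neighbor_label([1, 2, 3, 3], [[1, 2, 3], [3, 2, 1, 1]], ["up", "down"])` compares sequences of different lengths without padding, while `markov_features` maps sequences of any length to vectors of the same size.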
Source
《计算机学报》
EI
CSCD
Peking University Core Journals (北大核心)
2007, No. 8, pp. 1259-1266 (8 pages)
Chinese Journal of Computers
Keywords
classification
time series
model based clustering
Markov model
statistical learning