Abstract
In the traditional bag-of-words (BoW) model, the visual codebook is usually built with k-means clustering. However, the performance of k-means depends heavily on the choice of initial centers, which makes the resulting codebook less robust. In addition, the distance between every data point and every center must be computed in each iteration, so the computational complexity is high. To address these problems, an improved k-means method for codebook construction is proposed. First, the selection of the initial centers is optimized, which removes the influence of random initialization on clustering performance. Second, the triangle inequality is used to simplify the distance computations. Together, these changes make the generated codebook more stable and the computation less costly. Finally, a weight distribution is introduced to represent images over the codebook, and the BoW model built on the improved codebook is applied to image categorization, which improves categorization performance. Experiments on the Caltech 101 and Caltech 256 databases verify the effectiveness of the proposed method, and the effect of codebook size on categorization accuracy is analyzed. The results show that the proposed method improves categorization accuracy by 5% to 8%.
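The abstract does not spell out how the triangle inequality is applied, so the following is only a minimal sketch of the standard pruning test used in accelerated k-means, not necessarily the authors' exact procedure. It relies on the fact that if d(c_best, c_j) >= 2 d(x, c_best), then d(x, c_j) >= d(x, c_best), so the distance from x to c_j never needs to be computed. The function names `assign_with_pruning` and `assign_brute_force` and the `skipped` counter are illustrative assumptions.

```python
import math

def assign_with_pruning(points, centers):
    """Assign each point to its nearest center, skipping distance
    computations that the triangle inequality proves unnecessary.
    Illustrative sketch only; names are hypothetical."""
    k = len(centers)
    # Center-to-center distances, computed once per k-means iteration.
    cc = [[math.dist(centers[i], centers[j]) for j in range(k)]
          for i in range(k)]
    labels, skipped = [], 0
    for x in points:
        best, best_d = 0, math.dist(x, centers[0])
        for j in range(1, k):
            # If d(c_best, c_j) >= 2 * d(x, c_best), then
            # d(x, c_j) >= d(x, c_best): c_j cannot be strictly closer.
            if cc[best][j] >= 2.0 * best_d:
                skipped += 1
                continue
            d = math.dist(x, centers[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped

def assign_brute_force(points, centers):
    """Reference assignment that computes every point-to-center distance."""
    return [min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))
            for x in points]

if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.2, 9.9), (4.0, 4.0)]
    ctrs = [(0.0, 0.0), (10.0, 10.0), (20.0, 20.0)]
    labels, skipped = assign_with_pruning(pts, ctrs)
    print(labels, "distance computations skipped:", skipped)
```

The pruned assignment returns the same labels as the brute-force one; the saving comes from the k(k-1)/2 center-to-center distances being far cheaper to compute than the point-to-center distances when the number of local descriptors is much larger than the codebook size.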
Source
《仪器仪表学报》 (Chinese Journal of Scientific Instrument)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
2012, No. 10, pp. 2380-2386 (7 pages)
Funding
National Natural Science Foundation of China (61077079)
Doctoral Program Foundation of the Ministry of Education of China (20102304110013)