邵俊明
First, we thank the reviewers for their positive comments and constructive suggestions. The main criticism was that our proposed algorithm (i.e. SyncTree) could have been benchmarked more thoroughly. We therefore provide supplementary materials here in response to the reviewers' feedback. Thank you in advance for the substantial time spent looking over them.
We provide two experiments to better answer the reviewers' questions.
Exp. 1. The ability to handle concept drift.
Experimental setup: A simple synthetic data set is generated in which the first 1000 points form a single Gaussian cluster (Fig. 1a, T1), and the following 200 points produce three newly emerging clusters with different shapes (Fig. 1b, T2). We check whether the different algorithms can handle the emerging concepts. Fig. 1c-1f plot the micro-clusters stored for the last 200 points. We observe that the CF-based algorithms have difficulty handling the evolving clusters, as most instances are wrongly grouped into micro-clusters. In contrast, SyncTree appears to capture the emerging clusters well.
[Fig. 1. Panels: (a) DS (T1) | (b) DS (T2) | (c) CluStream (micro-clusters) | (d) DenStream (micro-clusters) | (e) SyncTree (micro-clusters) | (f) ClusTree (micro-clusters)]
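For readers who want to reproduce a stream like the one in Exp. 1, the following is a minimal sketch of how such data could be generated. Only the sizes (1000 points in one Gaussian cluster, then 200 points in three new clusters of different shapes) come from the setup above. The function name `make_stream`, the cluster centres, the spreads, the three specific shapes (ring, elongated Gaussian, noisy line segment), and the shuffling of the last 200 points are all assumptions made for illustration, not the generator used for Fig. 1.

```python
import math
import random


def make_stream(seed=0):
    """Return (points, labels) for a 1200-point 2-D stream.

    Points 0..999 belong to one Gaussian cluster (T1, label 0).
    Points 1000..1199 belong to three new clusters (T2, labels 1..3).
    All centres, spreads, and shapes are illustrative assumptions.
    """
    rng = random.Random(seed)

    # T1: a single isotropic Gaussian cluster around the origin.
    head = [((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)), 0) for _ in range(1000)]

    tail = []
    # New cluster 1: a thin ring (assumed shape).
    for _ in range(67):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        r = 1.5 + rng.gauss(0.0, 0.1)
        tail.append(((8.0 + r * math.cos(theta), 8.0 + r * math.sin(theta)), 1))
    # New cluster 2: an elongated Gaussian (assumed shape).
    for _ in range(67):
        tail.append(((rng.gauss(-8.0, 2.5), rng.gauss(8.0, 0.3)), 2))
    # New cluster 3: a noisy diagonal line segment (assumed shape).
    for _ in range(66):
        t = rng.uniform(-3.0, 3.0)
        tail.append(((8.0 + t, -8.0 + t + rng.gauss(0.0, 0.2)), 3))

    # Interleave the three new clusters so they emerge at the same time.
    rng.shuffle(tail)

    stream = head + tail
    points = [p for p, _ in stream]
    labels = [c for _, c in stream]
    return points, labels


points, labels = make_stream(seed=1)
```

Feeding `points` in order to a stream clustering algorithm and inspecting the micro-clusters kept after the last 200 points would mirror the comparison shown in panels (c) to (f).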
Exp. 2. The memory costs of SyncTree and the compared algorithms.
Dataset | SyncTree | ClusTree | CluStream | DenStream |
Spam | 70.897m | 70.983m | 398.820m | 185.303m |
Electricity | 34.696m | 38.563m | 33.423m | 65.571m |
NWeather | 89.889m | 33.423m | 34.708m | 28.283m |
Covtype | 57.484m | 537.556m | 455.347m | 136.142m |
Sensor | 265.824m | 511.837m | 214.144m | 67.619m |
The source code of SyncTree and the compared algorithms: Code