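Abstract: The rapid development of positioning technology has produced large volumes of spatio-temporal trajectory data, which often contain outliers that deviate markedly from the trajectory. Detecting these outliers is essential for improving data quality and the accuracy of subsequent trajectory data mining. This paper proposes a trajectory outlier detection algorithm based on a Bidirectional Long Short-Term Memory (Bi-LSTM) model. First, a 6-dimensional motion feature vector is extracted for each trajectory point. A Bi-LSTM model is then built whose input is a fixed-length sequence of these feature vectors and whose output is the class of each trajectory point. To mitigate the effect of class imbalance on detection performance, the algorithm combines undersampling and oversampling. By combining LSTM units with a bidirectional architecture, the Bi-LSTM model automatically learns how the motion features of normal points differ from those of neighboring outliers. Experiments on real, annotated vessel trajectory data show that the algorithm significantly outperforms a constant-speed threshold method, classical machine-learning classifiers that ignore the temporal order of the data, and a convolutional neural network model; in particular, it achieves a recall of 0.902, which demonstrates its effectiveness.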
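The abstract names a constant-speed threshold method as a baseline but does not define it. The sketch below is one common reading, offered only as an assumption: positions are planar coordinates in metres, timestamps are in seconds, the first point is assumed normal, and a point is flagged when the speed needed to reach it from the last accepted point exceeds a fixed `v_max`. The function name and point format are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical point format: (timestamp in s, x in m, y in m), already projected to a plane.
Point = Tuple[float, float, float]


def speed_threshold_outliers(track: List[Point], v_max: float) -> List[bool]:
    """Flag points whose implied speed from the last accepted point exceeds v_max (m/s).

    Assumes the first point is normal. Comparing against the last accepted point,
    rather than the immediate predecessor, keeps one outlier from also flagging
    the normal point that follows it.
    """
    flags = [False] * len(track)
    if not track:
        return flags
    ref_t, ref_x, ref_y = track[0]
    for i in range(1, len(track)):
        t, x, y = track[i]
        dt = t - ref_t
        dist = math.hypot(x - ref_x, y - ref_y)
        if dt > 0:
            speed = dist / dt
        else:
            # Duplicate or out-of-order timestamp: any movement counts as infinite speed.
            speed = math.inf if dist > 0 else 0.0
        if speed > v_max:
            flags[i] = True
        else:
            ref_t, ref_x, ref_y = t, x, y
    return flags


if __name__ == "__main__":
    track = [(0, 0, 0), (10, 50, 0), (20, 5000, 0), (30, 150, 0), (40, 200, 0)]
    print(speed_threshold_outliers(track, v_max=20.0))  # [False, False, True, False, False]
```

A single global threshold ignores the surrounding motion context, which is the kind of information a sequence model such as a Bi-LSTM can exploit.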
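The abstract says undersampling and oversampling are combined to counter class imbalance, without giving the exact scheme. A minimal sketch, assuming plain random resampling: classes larger than a target size are undersampled without replacement, smaller classes keep all originals and are topped up by sampling with replacement. What a "sample" is (a single point or a feature window) is not stated, so the function treats samples as opaque items; `resample_balanced` and `target_per_class` are hypothetical names.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

X = TypeVar("X")


def resample_balanced(samples: Sequence[X], labels: Sequence[int],
                      target_per_class: int, seed: int = 0) -> Tuple[List[X], List[int]]:
    """Randomly undersample large classes and oversample small ones to target_per_class each."""
    rng = random.Random(seed)
    by_class: Dict[int, List[X]] = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    out_x: List[X] = []
    out_y: List[int] = []
    for y in sorted(by_class):
        pool = by_class[y]
        if len(pool) >= target_per_class:
            # Undersampling: draw without replacement from the majority class.
            chosen = rng.sample(pool, target_per_class)
        else:
            # Oversampling: keep every original, then duplicate random members.
            extra = [rng.choice(pool) for _ in range(target_per_class - len(pool))]
            chosen = pool + extra
        out_x.extend(chosen)
        out_y.extend([y] * target_per_class)

    # Shuffle both lists with the same permutation so batches are not ordered by class.
    order = list(range(len(out_x)))
    rng.shuffle(order)
    return [out_x[i] for i in order], [out_y[i] for i in order]
```

For example, with 95 normal and 5 outlier samples and `target_per_class=30`, the result holds 30 of each class, and all 5 original outliers are retained. Meeting in the middle, rather than oversampling alone, avoids duplicating the rare class many times over.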
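Abstract: Background and purpose: The initiation and progression of colorectal cancer involve the activation of multiple oncogenes and the inactivation of tumor suppressor genes. The role of wild-type R-spondin 3 (RSPO3) in colorectal cancer growth remains unclear. This study aimed to investigate the effect of RSPO3 on colorectal cancer growth and to explore the underlying mechanism. Methods: Bioinformatics analysis was used to examine RSPO3 expression in colorectal cancer and pan-cancer tissues, and to analyze the correlation of RSPO3 expression with natural killer (NK) cell infiltration and with the expression of NK cell activation molecules in colorectal cancer. Short hairpin RNA (shRNA) and lentiviral infection were used to establish an RSPO3-knockdown cell line (SW480-RSPO3-KD), an RSPO3-overexpressing cell line (HCT116-RSPO3-OE), and the corresponding control cell lines. In vitro proliferation of each stably transfected cell line was measured with the Cell Counting Kit-8 (CCK-8). Flow cytometry was used to analyze the cell cycle of each stably transfected cell line and the proportion of NK cells in the spleens and xenograft tumors of nude mice. A subcutaneous xenograft model in nude mice was used to observe the in vivo growth of colon cancer cells with RSPO3 knockdown or overexpression. A dual-luciferase reporter assay was used to measure the effect of RSPO3 knockdown or overexpression on Wnt transcriptional activity in colon cancer cells. Results: Bioinformatics analysis showed that RSPO3 expression was significantly lower in the tumor tissues of multiple solid tumors, including colorectal cancer, than in the corresponding adjacent non-tumor tissues. RSPO3 knockdown or overexpression did not affect the proliferation (P>0.05) or cell cycle (P>0.05) of SW480 and HCT116 colon cancer cells in vitro. In nude mice, however, RSPO3 knockdown significantly promoted the growth of SW480 xenografts compared with control cells (control vs knockdown: 260.2±162.4 vs 1311.7±570.1, P<0.05), whereas RSPO3 overexpression significantly inhibited the growth of HCT116 xenografts (control vs overexpression: 1549.0±241.2 vs 512.1±250.0, P<0.05). Flow cytometry showed that in tumor-bearing nude mice, RSPO3 knockdown significantly reduced the proportion of NK cells in the spleen and xenograft tissue (spleen: 6.42±0.94 vs 5.25±0.59, P=0.04; xenograft: 8.27±0.29 vs 6.48±1.48, P=0.04), whereas RSPO3 overexpression significantly increased it (spleen: 5.29±0.16 vs 7.02±0.49, P=0.01; xenograft: 6.39±0.39 vs 8.14±0.34, P<0.05). Correlation analysis of The Cancer Genome Atlas (TCGA) data showed that RSPO3 expression was significantly positively correlated with the expression of the NK cell surface markers CD56 (r=0.58, P<0.05) and CD16 (r=0.64, P<0.05), and with the NK cell activation markers CD69 (r=0.51, P<0.05) and KLRB1 (r=0.37, P<0.05). The dual-luciferase reporter assay showed that Wnt luciferase activity decreased after RSPO3 knockdown (1.0±0.0 vs 0.45±0.09, P<0.05) and increased after RSPO3 overexpression (1.0±0.0 vs 1.75±0.14, P<0.05). Conclusion: RSPO3 significantly inhibits the growth of colorectal cancer xenografts in vivo and increases NK cell infiltration in xenograft tissue; RSPO3 is a potential tumor suppressor gene in colorectal cancer.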
Funding: Supported by the National High Technology Research and Development Program of China (No. 2012AA011005)
Abstract: Topic models such as Latent Dirichlet Allocation (LDA) have been successfully applied to many text mining tasks to extract the topics embedded in corpora. However, existing topic models generally cannot discover bursty topics, that is, topics that experience a sudden increase during a period of time. In this paper, we propose a new topic model named Burst-LDA, which simultaneously discovers topics and reveals their burstiness by explicitly modeling each topic's burst states with a first-order Markov chain and using the chain to generate the topic proportions of documents in a logistic-normal fashion. A Gibbs sampling algorithm is developed for posterior inference in the proposed model. Experimental results on a news data set show that our model can efficiently discover bursty topics and outperforms the state-of-the-art method.
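To make the generative idea concrete, the sketch below samples burst states from a two-state first-order Markov chain per topic and then draws a document's topic proportions from a logistic normal whose mean is shifted up for topics currently in a burst. The parameterization (two states, an additive `boost`, independent Gaussian noise with a shared `sigma`, every chain starting in the normal state) is an assumption for illustration, not the paper's exact model, and the Gibbs sampler for inference is not shown. All function and parameter names are hypothetical.

```python
import math
import random
from typing import List


def sample_burst_states(num_steps: int, p_enter: float, p_stay: float,
                        rng: random.Random) -> List[int]:
    """First-order Markov chain over burst states {0: normal, 1: bursty}.

    p_enter = P(s_t = 1 | s_{t-1} = 0); p_stay = P(s_t = 1 | s_{t-1} = 1).
    Assumption: every topic starts in the normal state.
    """
    if num_steps <= 0:
        return []
    states = [0]
    for _ in range(num_steps - 1):
        p_burst = p_stay if states[-1] == 1 else p_enter
        states.append(1 if rng.random() < p_burst else 0)
    return states


def softmax(v: List[float]) -> List[float]:
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    z = sum(exps)
    return [e / z for e in exps]


def doc_topic_proportions(burst_now: List[int], mu: List[float], boost: float,
                          sigma: float, rng: random.Random) -> List[float]:
    """Logistic-normal draw for one document at the current time step.

    eta_k ~ N(mu_k + boost * s_k, sigma^2) and theta = softmax(eta), so a topic
    in its burst state (s_k = 1) tends to take a larger share of the document.
    """
    eta = [rng.gauss(m + boost * s, sigma) for m, s in zip(mu, burst_now)]
    return softmax(eta)


if __name__ == "__main__":
    rng = random.Random(42)
    K, T = 3, 10  # number of topics, number of time steps
    states = [sample_burst_states(T, p_enter=0.1, p_stay=0.8, rng=rng) for _ in range(K)]
    t = 5
    theta = doc_topic_proportions([states[k][t] for k in range(K)],
                                  mu=[0.0] * K, boost=2.0, sigma=0.5, rng=rng)
    print(states, theta)
```

A full logistic-normal model would typically use a covariance matrix over topics; the diagonal noise here is a simplification that keeps the sketch dependency-free.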
Funding: Supported by the National High Technology Research and Development Program of China (No. 2012AA011005)
Abstract: This paper presents a non-parametric topic model that captures not only the latent topics in text collections but also how the topics change over space. Unlike other recent work that relies on either Gaussian assumptions or discretization of locations, here topics are associated with a distance-dependent Chinese Restaurant Process (ddCRP), and for each document the observed words are influenced by the document's GPS tag. Our model allows both an unbounded number and a flexible distribution of the geographical variations of the topics' content. We develop a Gibbs sampler for the proposed model and compare it with existing models on a real-world data set.
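The abstract does not spell out how the ddCRP is attached to documents, so the sketch below shows only the standard ddCRP prior on its own: each customer (here, a hypothetical geotagged item) links to another customer with probability proportional to a decay function of their distance, or to itself with weight `alpha`, and the resulting tables are the connected components of the link graph. Planar coordinates and an exponential decay are assumptions; the function names are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Location = Tuple[float, float]  # assumption: planar coordinates (e.g. projected GPS tags)


def link_probabilities(i: int, locs: Sequence[Location], alpha: float,
                       decay: Callable[[float], float]) -> List[float]:
    """ddCRP prior for customer i: P(c_i = j) is proportional to decay(d_ij) for j != i
    and to alpha for the self-link j == i."""
    weights = [alpha if j == i else decay(math.dist(locs[i], loc))
               for j, loc in enumerate(locs)]
    total = sum(weights)
    return [w / total for w in weights]


def tables_from_links(links: Sequence[int]) -> List[int]:
    """Tables (clusters) are the connected components of the customer-link graph."""
    parent = list(range(len(links)))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, j in enumerate(links):
        parent[find(i)] = find(j)
    ids: Dict[int, int] = {}
    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    return [ids.setdefault(find(i), len(ids)) for i in range(len(links))]


def sample_ddcrp(locs: Sequence[Location], alpha: float,
                 decay: Callable[[float], float],
                 rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw one customer link per item from the prior, then derive the tables."""
    n = len(locs)
    links = [rng.choices(range(n), weights=link_probabilities(i, locs, alpha, decay))[0]
             for i in range(n)]
    return links, tables_from_links(links)


if __name__ == "__main__":
    locs = [(0.0, 0.0), (1.0, 0.0), (1.5, 0.5), (40.0, 40.0)]
    print(sample_ddcrp(locs, alpha=1.0, decay=lambda d: math.exp(-d / 2.0),
                       rng=random.Random(0)))
```

Because nearby items are more likely to link, clusters tend to be spatially coherent without fixing the number of clusters in advance or discretizing locations.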