Funding: Supported by the projects under the Innovation Team of the Safety Standards and Testing Technology for Agricultural Products of Zhejiang Province, China (Grant No. 2010R50028) and the National Key Technologies R&D Program of China during the 11th Five-Year Plan Period (Grant No. 2006BAK02A18).
Abstract: Near infrared reflectance spectroscopy (NIRS), a non-destructive measurement technique, was combined with partial least squares regression discriminant analysis (PLS-DA) to discriminate transgenic (TCTP and mi166) from wild type (Zhonghua 11) rice. Furthermore, rice lines transformed with a protein gene (OsTCTP) and a regulation gene (Osmi166) were also discriminated from each other by the NIRS method. The performance of PLS-DA in the spectral ranges of 4 000-8 000 cm⁻¹ and 4 000-10 000 cm⁻¹ was compared to find the optimal spectral range. The transgenic and wild type rice were distinguished from each other in the range of 4 000-10 000 cm⁻¹, with a correct classification rate of 100.0% in the validation test. The transgenic rice lines TCTP and mi166 were also distinguished from each other in the range of 4 000-10 000 cm⁻¹, again with a correct classification rate of 100.0%. In conclusion, NIRS combined with PLS-DA can be used for the discrimination of transgenic rice.
Funding: Funded by the National Natural Science Foundation of China (81202283, 81473070, 81373102 and 81202267), the Key Grant of Natural Science Foundation of the Jiangsu Higher Education Institutions of China (10KJA330034 and 11KJA330001), the Research Fund for the Doctoral Program of Higher Education of China (20113234110002), and the Priority Academic Program for the Development of Jiangsu Higher Education Institutions (Public Health and Preventive Medicine).
Abstract: With recent advances in biotechnology, the genome-wide association study (GWAS) has been widely used to identify genetic variants that underlie complex human diseases and traits. In case-control GWAS, the typical statistical strategy is traditional logistic regression (LR) based on single-locus analysis. However, such a single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR) and partial least squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high dimensional genomic data. However, the performance of these methods is still not clear, especially in GWAS. We conducted simulations and a real data application to compare the type I error and power of PC-LR, PLS-LR and LR as applied to GWAS within a defined single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS-LR can reasonably control type I error under the null hypothesis. In contrast, LR corrected by the Bonferroni method was more conservative in all simulation settings. In particular, we found that PC-LR and PLS-LR had comparable power and both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with the genotyped ones and had a small effect size in the simulation. Based on SNP set analysis, we applied all three methods to analyze non-small cell lung cancer GWAS data.
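The Bonferroni correction mentioned in this abstract can be illustrated with a minimal sketch. The function names, the SNP-set size of 20, and the p-values are hypothetical and are not taken from the study. The sketch shows the per-test threshold alpha/m and the family-wise error rate that would result if the m tests were independent. When SNPs are in high linkage disequilibrium the tests are correlated, so the effective number of tests is smaller than m, which is one common explanation for why Bonferroni-corrected single-locus LR tends to be conservative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that bounds the family-wise error at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def familywise_error_if_independent(per_test_alpha, n_tests):
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - per_test_alpha) ** n_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value that passes the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


# Hypothetical SNP set with 20 single-locus tests (illustrative only).
m = 20
uncorrected_fwer = familywise_error_if_independent(0.05, m)
corrected_fwer = familywise_error_if_independent(bonferroni_threshold(0.05, m), m)
flags = bonferroni_significant([0.001, 0.004, 0.03], alpha=0.05)
```

Under independence the uncorrected family-wise error grows quickly with m, while the corrected one stays just below alpha; with correlated SNPs the corrected rate falls further below alpha, at the cost of power.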
Funding: National Natural Science Foundation of China (No. 40301038).
Abstract: In several LUCC studies, statistical methods are used to analyze land use data. A problem with using conventional statistical methods in land use analysis is that these methods assume the data to be statistically independent. In fact, the data tend to be dependent, a phenomenon known as multicollinearity, especially when there are few observations. In this paper, a Partial Least-Squares (PLS) regression approach is developed to study relationships between land use and its influencing factors through a case study of the Suzhou-Wuxi-Changzhou region in China. Multicollinearity exists in the dataset and the number of variables is high compared to the number of observations. Four PLS factors are selected through a preliminary analysis. The correlation analyses between land use and influencing factors demonstrate the land use character of rural industrialization and urbanization in the Suzhou-Wuxi-Changzhou region. They also show that the first PLS factor is best able to describe land use patterns quantitatively, and that most of the statistical relations derived from it are consistent with the facts. As the explanatory capacity of the successive PLS factors decreases, the reliability of the model outcome decreases correspondingly.
Funding: Financial support from the National Natural Science Foundation of China (No. 62205172), the Huaneng Group Science and Technology Research Project (No. HNKJ22-H105), the Tsinghua University Initiative Scientific Research Program, and the International Joint Mission on Climate Change and Carbon Neutrality.
Abstract: Laser-induced breakdown spectroscopy (LIBS) has become a widely used atomic spectroscopic technique for rapid coal analysis. However, the vast amount of spectral information in LIBS contains signal uncertainty, which can affect its quantification performance. In this work, we propose a hybrid variable selection method to improve the performance of LIBS quantification. Important variables are first identified using Pearson's correlation coefficient, mutual information, least absolute shrinkage and selection operator (LASSO) and random forest, and then filtered and combined with empirical variables related to fingerprint elements of coal ash content. Subsequently, these variables are fed into a partial least squares regression (PLSR). Additionally, in some models, certain variables unrelated to ash content are removed manually to study the impact of variable deselection on model performance. The proposed hybrid strategy was tested on three LIBS datasets for quantitative analysis of coal ash content and compared with the corresponding data-driven baseline method. It is significantly better than the variable selection method based only on empirical knowledge, and in most cases it outperforms the baseline method. The results showed that on all three datasets the hybrid strategy for variable selection, combining empirical knowledge and data-driven algorithms, achieved the lowest root mean square error of prediction (RMSEP) values of 1.605, 3.478 and 1.647, respectively, which were significantly lower than those obtained from multiple linear regression using only 12 empirical variables, which were 1.959, 3.718 and 2.181, respectively. The LASSO-PLSR model with empirical support and 20 selected variables exhibited a significantly improved performance after variable deselection, with RMSEP values dropping from 1.635, 3.962 and 1.647 to 1.483, 3.086 and 1.567, respectively. Such results demonstrate that using empirical knowledge as a support for data-driven variable selection can be a viable approach to improve the accuracy and reliability of LIBS quantification.
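For readers unfamiliar with the metric, the following is a minimal sketch of how RMSEP is usually computed on a held-out prediction set, together with the relative reduction implied by the before and after values quoted in the abstract. The function names and the small ash-content example are hypothetical; the abstract does not state the exact formula the authors used, so the standard definition is assumed here.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a held-out sample set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(before, after):
    """Fractional drop in an error metric, e.g. 0.10 for a 10% reduction."""
    return (before - after) / before


# Hypothetical ash-content values (illustrative only, not from the study).
example_error = rmsep([10.0, 20.0, 30.0], [12.0, 18.0, 30.0])

# RMSEP values quoted in the abstract for LASSO-PLSR before and after deselection.
before = [1.635, 3.962, 1.647]
after = [1.483, 3.086, 1.567]
reductions = [relative_reduction(b, a) for b, a in zip(before, after)]
```

On the quoted figures the reductions come to roughly 9%, 22% and 5% for the three datasets, which makes the uneven benefit of variable deselection across datasets easier to see than the raw values do.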
Funding: The National Natural Science Foundation of China (41101395, 41071276, 31071324), the Beijing Municipal Natural Science Foundation, China (4122032), and the National Basic Research Program of China (2011CB311806).
Abstract: Powdery mildew (Blumeria graminis) is one of the most destructive crop diseases infecting winter wheat plants, and has devastated millions of hectares of farmland in China. The objective of this study is to detect the disease damage of powdery mildew at the leaf level by means of hyperspectral measurements, particularly using continuous wavelet analysis. In May 2010, the reflectance spectra and the biochemical properties were measured for 114 leaf samples with various degrees of disease severity. A hyperspectral imaging system was also employed to obtain detailed hyperspectral information on the normal and the pustule areas within one diseased leaf. Based on these spectral data, a continuous wavelet analysis (CWA) was carried out in conjunction with a correlation analysis, which generated a so-called correlation scalogram that summarizes the correlations between disease severity and the wavelet power at different wavelengths and decomposition scales. By using a thresholding approach, seven wavelet features were isolated for developing models to determine disease severity. In addition, 22 conventional spectral features (SFs) were also tested and compared with the wavelet features for their efficiency in estimating disease severity. Multivariate linear regression (MLR) analysis and partial least squares regression (PLSR) analysis were adopted as training methods in model development. The spectral characteristics of powdery mildew at the leaf level were found to be closely related to the spectral characteristics of the pustule area and the content of chlorophyll. The wavelet features performed better than the conventional SFs in capturing this spectral change. Moreover, the regression model composed of seven wavelet features (R² = 0.77, relative root mean square error RRMSE = 0.28) outperformed the model composed of 14 optimal conventional SFs (R² = 0.69, RRMSE = 0.32) in estimating the disease severity. The PLSR method yielded a higher accuracy than the MLR method. A combination of CWA and PLSR was found to be promising in providing relatively accurate estimates of the disease severity of powdery mildew at the leaf level.
Funding: Supported by the 948 Program of the State Forestry Administration (2009-4-43) and the National Natural Science Foundation of China (No. 30870420).
Abstract: Boreal forests play an important role in global environment systems. Understanding boreal forest ecosystem structure and function requires accurate monitoring and estimation of forest canopy and biomass. We used partial least squares regression (PLSR) models to relate forest parameters, i.e. canopy closure density and above ground tree biomass, to Landsat ETM+ data. The established models were optimized according to the variable importance for projection (VIP) criterion and the bootstrap method, and their performance was compared using several statistical indices. All variables selected by the VIP criterion passed the bootstrap test (p < 0.05). The simplified models without insignificant variables (VIP < 1) performed as well as the full model but with less computation time. The relative root mean square error (RMSE%) was 29% for canopy closure density and 58% for above ground tree biomass. We conclude that PLSR can be an effective method for estimating canopy closure density and above ground biomass.
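The VIP criterion used for variable screening can be sketched as follows. This is a minimal pure-Python PLS1 fit by the NIPALS algorithm for a single response, followed by the commonly used VIP formula; it is an illustration under stated assumptions, not the authors' implementation. The function name `pls1_vip`, the choice to centre without scaling, and the toy data are all hypothetical. With unit-norm weight vectors the squared VIP scores sum to the number of predictors, which is why VIP < 1 is a natural cut-off for "below average" importance.

```python
import math


def pls1_vip(X, y, n_components):
    """Fit PLS1 by NIPALS and return one VIP score per predictor column."""
    n, p = len(X), len(X[0])
    # Centre predictors and response (no scaling, to keep the sketch short).
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    E = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]

    weights, explained = [], []
    for _ in range(n_components):
        # Weight vector: direction of maximal covariance between X and y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:
            break  # residual response is uncorrelated with residual predictors
        w = [v / norm for v in w]
        # Scores, X-loadings and the regression coefficient of y on the scores.
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate both blocks before extracting the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        weights.append(w)
        explained.append(q * q * tt)  # response variation explained by this component

    total = sum(explained)
    if total == 0.0:
        raise ValueError("no component explains any variation in y")
    return [
        math.sqrt(p * sum(s * w[j] ** 2 for s, w in zip(explained, weights)) / total)
        for j in range(p)
    ]


# Toy data (illustrative only): y depends mainly on the first column.
X = [[1, 2, 0.5], [2, 1, 0.3], [3, 4, 0.9], [4, 3, 0.1], [5, 6, 0.7], [6, 5, 0.2]]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
vip = pls1_vip(X, y, n_components=2)
keep = [j for j, v in enumerate(vip) if v >= 1.0]  # screening rule VIP >= 1
```

In practice predictors are usually also scaled to unit variance before fitting, and a library implementation would normally be preferred over a hand-written loop.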
Funding: Supported by the Forest Scientific Research in the Public Interest, China (201404720), the earmarked fund for the China Agriculture Research System (CARS-27), and the Beijing Municipal Education Commission, China (CEFF-PXM2017_014207_000043).
Abstract: China has the largest apple planting area and total yield in the world, and the Fuji apple is the major cultivar, accounting for more than 70% of apple planting acreage in China. Apple quality is affected by meteorological conditions, soil types, nutrient content of soil, and management practices. Meteorological factors, such as light, temperature and moisture, are key environmental conditions affecting apple quality that are difficult to regulate and control. This study was performed to determine the effect of meteorological factors on the quality of Fuji apple and to provide evidence for a reasonable regional layout and planting of Fuji apple in China. Fruit samples of Fuji apple and meteorological data were collected from 153 commercial Fuji apple orchards located in 51 counties of 11 regions in China from 2010 to 2011. Partial least-squares regression and linear programming were used to analyze the effect model and impact weight of meteorological factors on fruit quality, to determine the major meteorological factors influencing fruit quality attributes, and to establish a regression equation to optimize meteorological factors for high-quality Fuji apples. Results showed relationships between fruit quality attributes and meteorological factors among the various apple producing counties in China. The mean, minimum, and maximum temperatures from April to October had the highest positive effects on fruit quality in the model effect loadings and weights, followed by the mean annual temperature and the sunshine percentage, the temperature difference between day and night, and the total precipitation for the same period. In contrast, annual total precipitation and relative humidity from April to October had negative effects on fruit quality. The meteorological factors exhibited distinct effects on the different fruit quality attributes. Soluble solid content was affected, in order from the highest to the lowest influence, by annual total precipitation, the minimum temperature from April to October, the mean temperature from April to October, the temperature difference between day and night, and the mean annual temperature. The regression equation showed that the optimum meteorological factors for fruit quality were a mean annual temperature of 5.5-18°C and an annual total precipitation of 602-1121 mm for the whole year, and, during the growing period (from April to October), a mean temperature of 13.3-19.6°C, a minimum temperature of 7.8-18.5°C, a maximum temperature of 19.5°C, a temperature difference of 13.7°C between day and night, a total precipitation of 227 mm, a relative humidity of 57.5-84.0%, and a sunshine percentage of 36.5-70.0%.
Abstract: In this study, two functional logistic regression models, one with a functional principal component basis (FPCA) and one with a functional partial least squares basis (FPLS), have been developed to distinguish precancerous adenomatous polyps from hyperplastic polyps for the purpose of classification and interpretation. The classification performances of the two functional models have been compared with two widely used multivariate methods, principal component discriminant analysis (PCDA) and partial least squares discriminant analysis (PLSDA). The results indicated that the FPCA and FPLS models outperformed the PCDA and PLSDA models in classification ability while using a small number of functional basis components. With a substantial reduction in model complexity and an improvement in classification accuracy, the functional approach is particularly helpful for interpreting the complex spectral features related to precancerous colon polyps.
Abstract: This study evaluates the operational performance of all routes of Sajha Bus Yatayat operating inside Kathmandu valley using Data Envelopment Analysis (DEA) in terms of efficiency and effectiveness scores. This approach allows us to assess the relative performance of a transit system in the absence of historical data and prior research to compare with. To explore the possibility of enhancing performance, scenarios were created for the relatively underperforming routes and for the long route problem by changing the most important input variable and adjusting the output variables accordingly, using a regression model where relevant. Partial Least Squares (PLS) regression was used to determine the input variables with the most influence on the output variables. DEA was then conducted to assess the performance of all routes under these scenarios. Under the first set of scenarios, the underperforming routes, except the longest route, turn out to perform more efficiently without considerable negative deviation in effectiveness. The result of the second set of scenarios, for the long route problem, suggests that the longest route's performance can be enhanced significantly with proper route alignment. Scenario development and evaluation can help transit companies explore strategies for enhancing operational performance.
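As a rough illustration of what a relative DEA efficiency score means, the sketch below handles only the simplest special case: one input, one output, and constant returns to scale, where the CCR efficiency of a unit reduces to its output/input ratio divided by the best ratio observed. The route figures and the function name are hypothetical; the study's actual inputs, outputs and DEA model are not given in the abstract, and a multi-input, multi-output analysis requires solving a linear program per route.

```python
def ccr_efficiency_single(inputs, outputs):
    """Relative efficiency for one input and one output under constant returns.

    Each unit's productivity ratio is scaled by the best ratio in the sample,
    so the best-performing unit scores 1.0 and all others score below it.
    """
    if len(inputs) != len(outputs) or not inputs:
        raise ValueError("inputs and outputs must be non-empty and of equal length")
    if any(x <= 0 for x in inputs):
        raise ValueError("inputs must be strictly positive")
    ratios = [o / x for x, o in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical routes: input = vehicle-km operated, output = passengers carried.
vehicle_km = [100.0, 200.0, 150.0]
passengers = [50.0, 80.0, 90.0]
scores = ccr_efficiency_single(vehicle_km, passengers)
```

Because the scores are relative to the best unit in the sample, they can be computed without historical benchmarks, which matches the motivation given in the abstract.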
Abstract: The objective of this paper is to present a review of different calibration and classification methods for functional data in the context of chemometric applications. In chemometrics, it is usual to measure certain parameters in terms of a set of spectrometric curves that are observed at a finite set of points (functional data). Although the predictor variable is clearly functional, this problem is usually solved by using multivariate calibration techniques that treat it as a finite set of variables associated with the observed points (wavelengths or times). However, these explanatory variables are highly correlated, and it is therefore more informative to first reconstruct the true functional form of the predictor curves. Although several articles have been published on the implementation of functional data analysis techniques in chemometrics, their power to solve real problems is not yet well known. For this reason, this paper reviews the extension of multivariate calibration techniques (linear regression, principal component regression and partial least squares) and classification methods (linear discriminant analysis and logistic regression) to the functional domain, along with some relevant chemometric applications.