Statistical analysis tool

The statistical analysis tool (SAT) has been developed to perform statistical analyses on data in the IDOD database. SAT is also used to derive models for use in the statistical quality control program (SQC).

Note: SAT has been developed to run in the web browser Internet Explorer. Because it relies on features such as JavaScript, CGI and CSS, only version 4 or higher of the browser can run the program.

Concerning the export format of your request, you should know:
Choose the STAT1 function at the end of the request form to use the transposed format.
Choose the STAT2 function at the end of the request form to use the pivot A format.
Choose the STAT3 function at the end of the request form to use the pivot B format.

Main menu

From the main menu, choose the appropriate section with a single click.

The 'Import/Query Menu' prepares SAT to use the dataset. You can retrieve data from the IDOD database and send it to the SAT program, or reload datasets and their subsets in S-PLUS.

The 'Data Handling Menu' performs some basic data manipulations. You can transform the data, filter the data set, and view (or download) the selected data.

The 'Statistical Analysis Menu' performs statistical analyses on the selected data. Supported functions are Summary Statistics, Normality Check, Correlation Analysis, Trend Analysis, Regression Analysis and Spatial Analysis. This section is also used to derive models for the statistical quality control program.

NOTE: In SAT all commands and fill-in fields are CASE SENSITIVE. This is due to the underlying statistical program S-PLUS 2000.

Import / Query menu

In the 'Import/Query Menu', you can import data from the IDOD database. After a query has been run on the MUMM website, an output file in txt format is created and you receive a specific URL that contains the internet address of the SAT program and the name of the dataset. For instance, the following URL contains the internet address and the name of the dataset (WPB1A):

http://www.datacenter.mumm.ac.be/ucs/satwww/?name=WPB1A&

From the 'Main Menu' page, choose the Import/Query Menu. 'Send Data to SAT' makes those data available for further use in the SAT program.

Send Data to SAT

The option 'Send Data to SAT' opens a new screen. To activate the data for SAT, push the button 'Send Data'. The name of the data set is provided automatically. A message indicates that the data set has been imported and activated in SAT.
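The URL format described above carries the dataset name in its `name` query parameter. As a minimal sketch, assuming nothing about the server beyond that format, the following Python snippet shows how such a URL could be assembled and how the dataset name could be read back from it. The helper names `build_sat_url` and `dataset_name_from_url` are hypothetical and are not part of SAT; the snippet does not contact the server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base address of the SAT program, as quoted in the text.
SAT_BASE_URL = "http://www.datacenter.mumm.ac.be/ucs/satwww/"


def build_sat_url(dataset_name):
    """Hypothetical helper: append the dataset name as the 'name' parameter.

    The URL quoted in the text ends with a trailing '&'; this sketch omits it.
    """
    return SAT_BASE_URL + "?" + urlencode({"name": dataset_name})


def dataset_name_from_url(url):
    """Hypothetical helper: return the dataset name, or None if it is absent."""
    query = parse_qs(urlsplit(url).query)
    values = query.get("name")
    return values[0] if values else None


# URL with a dataset name, as in the example from the text.
print(dataset_name_from_url(
    "http://www.datacenter.mumm.ac.be/ucs/satwww/?name=WPB1A&"))

# Basic URL without a dataset name: the name must then be typed in by hand.
print(dataset_name_from_url(SAT_BASE_URL))

print(build_sat_url("WPB1A"))
```

Because SAT is case sensitive, the dataset name should be passed exactly as it was issued; the sketch does not change its case.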
If you started the SAT session via the basic URL (http://www.datacenter.mumm.ac.be/ucs/satwww/), which does not contain the name of the dataset, you have to provide the name of the dataset yourself.

Reload Data in SAT

Another way to start a SAT session is via 'Reload Data', which can be used when a dataset is analysed for a second time. It has the advantage of quickly checking whether the dataset is available in S-PLUS; the text file is not needed. Moreover, transformations made in a previous SAT session are still available.

Available Subsets

The last option is to select a subset. This works the same way as 'Reload Data', but for datasets that are only a part of the original dataset because they were filtered in a previous SAT session. See section 4.2 'Filter the dataset'.

Data handling menu

The 'Data Handling Menu' performs basic operations on the current data set. After selecting 'Data Handling Menu' in the 'Main Menu', you can choose between 3 functions.

Variable transformation

The function 'Variable Transformation' inserts a transformed variable into the selected data set. It creates a new variable from a transformation of one or more variables in the dataset. You have to give the name of the dataset, the name of the new variable (which will be added to the dataset) and a function declaring the transformation. For the transformation function, unary operators (on one variable), binary operators (on two variables) and combinations of these can be used. The following operators are allowed:

Unary:
Binary:
Combination of the unary and binary operators

All this can be done automatically by using the table, where you choose the variable, a function and a constant for the function. Press 'Add' to continue. The square root can be obtained with the power function and the constant 0.5. After clicking 'Transformation', the new variable is added as the last column of the data set.

Filter the Data Set

'Filter the Data Set' focuses on a subset of the data. Quantitative variables are filtered by comparing them with a constant. Categorical variables can only use the 'is equal to' comparison operator, and the constant should be placed between "parentheses". The 'event' variable, which indicates the time, can also be used as a filter variable, with any of the comparison operators; its constant must take the specific form "dd/mm/yyyy". The table below gives an overview:
View Data Set

The 'View Data Set' function presents the data set in tabular form. Click the 'View Data Set' button. A table with the data is shown, together with a link to download the data set as a Microsoft Excel sheet and the possibility to open the data in a new window for quick reference during the statistical analysis.

Statistical analysis menu

The 'Statistical Analysis Menu' performs basic statistical procedures and model building on the active data sets. Different types of statistical analysis are available: Summary Statistics, Normality Check, Trend Analysis, Correlation Analysis, Regression Analysis and Spatial Analysis.
Summary Statistics

'Summary Statistics' groups basic exploratory data analysis tools. These functions provide some insight into the data you wish to analyze. The section is split into 2 parts.

'Numerical Summary Statistics' derives basic statistics for the variables in the data set. The following statistics are given for a chosen data set:
You specify the name of the data set that should be analyzed. Clicking 'Give Summary Statistics' returns a table with the statistics for each variable in the chosen data set.

'Graphical Summary Statistics' summarizes the data for a given variable. The following graphs are created:
You should provide the name of the data set and the variable. Clicking 'Make Summary Plots' returns a visual representation of the data.

Normality Check

'Normality Check' lets you verify, visually and numerically, whether the data of a given variable can be assumed to be normally distributed. The normality of a variable (Gaussian distribution) is tested by means of the 'Normality Test' function. You should indicate the name of the data set and the name of the variable of interest. The Kolmogorov-Smirnov test and the Chi-square test are performed. In addition to these numerical results, graphical representations of the variable are derived: a histogram with density line, the fitted percentiles, a normal QQ-plot and the cumulative density plot.

Trend Analysis

'Trend fitting' derives the relationship between a variable of interest (the response variable) and another variable such as distance (the regressor variable). You should provide the name of the data set, the name of the regressor variable and the name of the response variable. You can specify the following options:
Correlation Analysis

The correlation analysis section of the statistical analysis functions consists of 2 functions. The 'Correlation Matrix' function gives the numerical results of a correlation analysis, while the 'Scatter plot matrix' function produces a visual representation of the correlation between the variables in a data set.

The correlation matrix gives the correlation between the variables in a data set. Along with the correlation matrix, the variance of each variable and the number of observations for each pair of variables are calculated. You should provide the name of the data set to which this option should be applied.

The scatter plot matrix visually illustrates the correlation between the different variables in a data set. For each pair of variables a scatter plot is made, and the plots are shown in matrix form. As input, the name of the data set should be given.

Regression Analysis

The regression analysis section comprises 2 analyses. The multiple regression function derives the parameters of a user-defined regression model. The subset regression function finds, out of a list of possible regressors, a subset of regressor variables such that all regressor variables are significant at a given level.

'Multiple Regression' performs regression with more than 1 regressor variable. You should give the data set, the name of the response variable and the regression function. The regression function specifies the regression model and the variables to be considered as regressors. The output is divided into a numerical part and a graphical part. The numerical part shows the estimates of the coefficients with their standard error, t-value and p-value. The R-square value and the residual standard error are also shown, together with the p-value of the model. The graphical output consists of a plot of the residuals versus the fitted values, a plot of the response versus the fitted values and a QQ-normal plot.
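To make the numerical regression output described above more concrete, here is a minimal sketch in plain Python of an ordinary least-squares fit with an intercept. It computes the coefficient estimates, their standard errors and t-values, the R-square value and the residual standard error. It is an illustration only, not the S-PLUS code that SAT runs: the function names and the small data set are made up, and p-values are left out because they need a t-distribution function that the Python standard library does not provide.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("regressors are collinear: X'X is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def multiple_regression(regressors, response):
    """Least-squares fit of response on regressors (one tuple per observation)."""
    X = [[1.0] + [float(v) for v in row] for row in regressors]  # intercept column
    n, k = len(X), len(X[0])
    if n <= k:
        raise ValueError("need more observations than coefficients")
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    xty = [sum(X[i][a] * response[i] for i in range(n)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    rss = sum((y - f) ** 2 for y, f in zip(response, fitted))
    mean_y = sum(response) / n
    tss = sum((y - mean_y) ** 2 for y in response)
    sigma2 = rss / (n - k)  # residual variance
    std_errors = [math.sqrt(sigma2 * inv[a][a]) for a in range(k)]
    t_values = [b / s if s > 0 else math.inf for b, s in zip(beta, std_errors)]
    return {
        "coefficients": beta,  # intercept first, then one per regressor
        "std_errors": std_errors,
        "t_values": t_values,
        "r_squared": 1.0 - rss / tss,
        "residual_std_error": math.sqrt(sigma2),
    }


# Made-up example: two regressors, four observations.
result = multiple_regression([(-1, -1), (1, -1), (-1, 1), (1, 1)], [1, 3, 2, 6])
for key, value in result.items():
    print(key, value)
```

The residuals needed for the graphical output (residuals versus fitted values) are the differences between the response and the fitted values computed inside the function.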
'Subset Regression' automatically identifies the subset of regressor variables that significantly explain the response variable. You should specify the data set, the response variable and the list of potential regressor variables. You can also choose the maximum size of the subset, the maximum number of subset results that should be derived, the maximum p-value for a regressor to be taken into account in a model, and the minimum number of data points that should be available to derive results for a subset (because of missing data, the number of data points can differ substantially depending on which combination of regressor variables is considered). As a result, the requested number of subsets is given in order of decreasing adjusted R² value.

Spatial Analysis

'Variogram Calculation' calculates a variogram for a variable. You should specify the data set, the variable of interest and the variable that should be used as location variable. As a result, a figure with the variogram and two lists of results are presented. The first list gives combined results. The second list gives the details for each difference in distance.

The 'Variogram Fit' function is used to fit a power model.
You should specify the data set, the variable of interest and the variable that should be used as location variable. The distance up to which the model should be fitted should also be given. As a result, the variogram graph is given together with the values of the parameters a, b and c.
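As an illustration of what a variogram calculation involves, the sketch below computes an empirical semivariogram for a variable measured along a single location variable. The text does not say which estimator SAT uses, so the classical estimator (half the mean squared difference of all pairs in a distance class) is an assumption, as are the function name `empirical_variogram`, the `lag_width` binning and the sample values. The power-model fit is not reproduced here because its formula is not given in the text.

```python
from collections import defaultdict


def empirical_variogram(locations, values, lag_width, max_distance=None):
    """Classical semivariogram estimate for a one-dimensional location variable.

    Returns one tuple per distance class:
    (mean distance, semivariance, number of pairs).
    """
    # Per class: sum of distances, sum of squared differences, pair count.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            h = abs(locations[i] - locations[j])
            if max_distance is not None and h > max_distance:
                continue  # ignore pairs beyond the requested distance
            acc = sums[int(h // lag_width)]
            acc[0] += h
            acc[1] += (values[i] - values[j]) ** 2
            acc[2] += 1
    result = []
    for bin_index in sorted(sums):
        dist_sum, sq_sum, count = sums[bin_index]
        result.append((dist_sum / count, sq_sum / (2 * count), count))
    return result


# Made-up example: four stations along a transect.
for distance, gamma, pairs in empirical_variogram(
        [0, 1, 2, 3], [1, 2, 4, 7], lag_width=1):
    print(distance, gamma, pairs)
```

The `max_distance` argument plays a role similar to the distance up to which the model should be fitted: pairs further apart than this value are ignored.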
© MUMM | BMM | UGMM 2002-2012 webmaster@mumm.ac.be MUMM is a department of the Royal Belgian Institute of Natural Sciences