Statistical analysis tool

Introduction

The statistical analysis tool (SAT) program was developed to perform statistical analyses on data in the IDOD database. SAT is also used to derive models for the statistical quality control (SQC) program.

Note: SAT was developed to run in the Internet Explorer web browser. Because it relies on features such as JavaScript, CGI and CSS, only version 4 or higher of this browser can run the program.

Concerning the export format of your request, you should know:

  • the record output format is not suitable for SAT
  • the transposed output format can be used by SAT (e.g. for univariate analyses such as summary statistics and normality checks), but is not suitable for correlation analysis or regression analysis
  • the pivot A and pivot B output formats are suitable for SAT; choose between them depending on what you want:
    • to find a relation between a variable and the different analysis methods, PIVOT A is suitable
    • to find a relation between different variables where the analysis method is not very important, PIVOT B is especially suitable (e.g. for long time series analysis).

Choose the STAT1 function at the end of the request form to use the transposed format.

Choose the STAT2 function at the end of the request form to use the pivot A format.

Choose the STAT3 function at the end of the request form to use the pivot B format.

Main menu

From the main menu, choose the appropriate section with a single click.

The 'Import/Query Menu' prepares SAT to use the dataset. You can retrieve data from the IDOD database and send it to the SAT program, or reload datasets and their subsets in S-plus.

The 'Data Handling Menu' performs some basic data manipulations. You can make transformations of the data, filter the data set, and view (or download) the selected data.

The 'Statistical Analysis Menu' performs statistical analyses on the selected data. Supported functions are Summary Statistics, Normality Check, Correlation Analysis, Trend Analysis, Regression Analysis and Spatial Analysis. This section is also used to derive models for the statistical quality control program.

NOTE: In SAT all the commands and fill-in fields are CASE SENSITIVE. This is due to the underlying statistical program, S-plus 2000.

Import / Query menu

In the 'Import/Query Menu', you can import data from the IDOD database. After a query has been run on the MUMM website, an output file in txt format is created and you receive a specific URL that contains the internet address of the SAT program and the name of the dataset. For instance, the following URL contains the internet address and the name of the dataset (WPB1A).

http://www.datacenter.mumm.ac.be/ucs/satwww/?name=WPB1A&
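The structure of this URL can be sketched in a few lines of Python. This is only an illustration of how the dataset name travels in the 'name' query parameter; the helper names build_sat_url and dataset_name_from_url are hypothetical and not part of SAT, and the sketch leaves out the trailing '&' shown in the example above.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Base address of the SAT program, as given in this documentation.
SAT_BASE_URL = "http://www.datacenter.mumm.ac.be/ucs/satwww/"


def build_sat_url(dataset_name: str) -> str:
    """Return the SAT address with the dataset name as the 'name' parameter."""
    return SAT_BASE_URL + "?" + urlencode({"name": dataset_name})


def dataset_name_from_url(url: str) -> Optional[str]:
    """Extract the dataset name from a SAT URL, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get("name")
    return values[0] if values else None
```

A URL without the 'name' parameter corresponds to the basic URL, in which case the dataset name has to be entered by hand.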

From the 'Main Menu' page, choose the Import/Query Menu. 'Send Data to SAT' makes those data available for further use in the SAT program.

Send Data to SAT

The option 'Send Data to SAT' opens a new screen. To activate the data for SAT, push the button 'Send Data'. The name of the data set is automatically provided. A message is given to indicate that the data set is imported and activated in SAT.

If you started the SAT session via the basic URL (http://www.datacenter.mumm.ac.be/ucs/satwww/), which does not contain the name of the dataset, you have to provide the name of the dataset yourself.

Reload Data in SAT

Another option to start a SAT session is 'Reload Data', which can be used if the dataset is analysed for a second time. It has the advantage of quickly checking whether the dataset is available in S-plus, so the text file is not needed. Moreover, transformations made in a previous SAT session will still be available.

Available Subsets

The last option is to select a subset. This works the same way as 'Reload Data', but for datasets that are only a part of the original dataset because they were filtered in a previous SAT session. See 'Filter the Data Set'.

Data handling menu

The "Data Handling Menu" performs basic operations on the current data set. After selecting 'Data Handling Menu' in the 'Main Menu', you have the choice between 3 functions.

Variable transformation

The function 'Variable Transformation' inserts a transformed variable into the selected data set. It creates a new variable from a transformation of one or more variables in the dataset. You have to provide the name of the dataset, the name of the new variable (which will be added to the dataset) and a function declaring the transformation.

For the transformation function, unary operators (on one variable), binary operators (on two variables) and combinations of these can be used. The following operators are allowed:

— Unary:

^ (power) e.g. AMON^3
log (natural logarithm) e.g. log(AMON)
exp (exponential) e.g. exp(AMON)
/ (division) e.g. AMON/2
* (multiplication) e.g. AMON*2

— Binary:

+ (sum) e.g. AMON+NTRI
- (minus) e.g. AMON-NTRI
/ (division) e.g. AMON/NTRI
* (multiplication) e.g. AMON*NTRA

— Combination of the unary and binary operators

All this can be done automatically by using the table, where you should choose the variable, a function and a constant for the function. Press Add to continue.

The square root can be obtained by using the power function with the constant 0.5.

After clicking 'Transformation', the new variable is added as the last column in the data set.
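The effect of these transformations can be sketched in plain Python. This is not how SAT itself is implemented (SAT passes the function to S-plus); the data values, the add_variable helper and the new variable names are made up for illustration.

```python
import math

# Hypothetical data set: each column is a list of values (illustrative numbers only).
dataset = {
    "AMON": [1.0, 4.0, 9.0],
    "NTRI": [2.0, 2.0, 3.0],
}


def add_variable(data, new_name, values):
    """Append a transformed variable as the last column of the data set."""
    data[new_name] = list(values)
    return data


# Unary transformations (one variable, optionally with a constant).
add_variable(dataset, "AMON_SQRT", (x ** 0.5 for x in dataset["AMON"]))    # AMON^0.5
add_variable(dataset, "AMON_LOG", (math.log(x) for x in dataset["AMON"]))  # log(AMON)

# Binary transformation (two variables).
add_variable(
    dataset,
    "AMON_PER_NTRI",
    (a / n for a, n in zip(dataset["AMON"], dataset["NTRI"])),             # AMON/NTRI
)
```

As in SAT, each new variable ends up as the last column, and the square root is written as a power with constant 0.5.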

Filter the Data Set

'Filter the Data Set' focuses on a subset of the data.

Quantitative variables can be filtered with any comparison operator and a numeric constant.

Categorical variables can only use the 'is equal to' comparison operator, and the constant should be placed between quotation marks.

Even the 'event' variable, which indicates the time, can be used as a filter variable. All comparison operators can be used. The constant should take the specific form "dd/mm/yyyy".

The table below gives an overview:

Variable                  Comparison operators    Constant
Normal quantitative       all                     number
Categorical               only 'is equal to'      'string'
'event', time variable    all                     '28/9/2000'
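The filter rules in the table can be sketched as follows. This is an illustrative Python version, not SAT's own code: the operator labels other than 'is equal to', the filter_rows helper, the STATION column and the sample rows are all assumptions.

```python
import operator
from datetime import datetime

# Comparison operators; apart from 'is equal to', the labels are illustrative.
OPERATORS = {
    "is equal to": operator.eq,
    "is not equal to": operator.ne,
    "is less than": operator.lt,
    "is greater than": operator.gt,
}


def parse_event(text):
    """Parse a date written as dd/mm/yyyy, e.g. '28/9/2000'."""
    return datetime.strptime(text, "%d/%m/%Y").date()


def filter_rows(rows, variable, comparison, constant, kind="quantitative"):
    """Keep the rows for which 'variable <comparison> constant' holds.

    kind is 'quantitative', 'categorical' or 'event'.
    """
    if kind == "categorical" and comparison != "is equal to":
        raise ValueError("categorical variables only support 'is equal to'")
    compare = OPERATORS[comparison]
    if kind == "event":
        constant = parse_event(constant)
        return [r for r in rows if compare(parse_event(r[variable]), constant)]
    return [r for r in rows if compare(r[variable], constant)]


# Illustrative rows; the values and the STATION column are made up.
rows = [
    {"event": "27/9/2000", "STATION": "A", "AMON": 1.2},
    {"event": "28/9/2000", "STATION": "B", "AMON": 3.4},
    {"event": "29/9/2000", "STATION": "A", "AMON": 5.6},
]
after = filter_rows(rows, "event", "is greater than", "28/9/2000", kind="event")
```

Dates are converted before they are compared, because comparing the "dd/mm/yyyy" strings directly would order them by day of the month rather than by time.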

View Data Set

The 'View Data Set' function presents the data set in tabular form. Click the 'View Data Set' button. A table with the data is shown, together with a link to download the data set as a Microsoft Excel sheet and the option to open the data in a new window for quick reference during the statistical analysis.

Statistical analysis menu

'Statistical Analysis Menu' performs basic statistical procedures and model building on the active data sets.

Different types of statistical analysis are available:

  • Summary Statistics
  • Normality Check
  • Trend Analysis
  • Correlation Analysis
  • Regression Analysis
  • Spatial Analysis
  • SQC Modeling

Summary Statistics

'Summary Statistics' groups basic exploratory data analysis tools. These functions provide some insight into the data you wish to analyze. The section is split into 2 parts.

Numerical Summary Statistics

'Numerical Summary Statistics' derives basic statistics for the variables in the data set. The following statistics are given for a chosen data set:

  • Number of measurements
  • Number of missing values
  • Mean
  • Standard Deviation
  • Minimum Value
  • 1st Quartile
  • Median
  • 3rd Quartile
  • Maximum Value

You specify the name of the data set that should be analyzed. Clicking 'Give Summary Statistics' results in a table with the statistics for each variable in the chosen data set.
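For reference, the statistics in this list can be reproduced for a single variable with Python's standard library. This is a minimal sketch, not SAT's implementation: it assumes that missing values are represented by None, that the number of measurements counts only non-missing values, and that quartiles are interpolated with the 'inclusive' method, which may differ from the definition S-plus uses.

```python
import statistics


def summary_statistics(values):
    """Summarise one variable; None marks a missing value."""
    present = sorted(v for v in values if v is not None)
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "n": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "sd": statistics.stdev(present),  # sample standard deviation (n - 1)
        "min": present[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": present[-1],
    }


stats = summary_statistics([1, 2, None, 3, 4, 5])
```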

Graphical Summary Statistics

The 'Graphical Summary Statistics' summarize the data for a given variable. The following graphs are created:

  • Boxplot
  • Histogram
  • QQ Normal Plot
  • Density Plot

You should provide the name of the data set and the variable. Clicking 'Make Summary Plots' results in a visual representation of the data.

Normality Check

'Normality Check' lets you verify, visually and numerically, whether the data of a given variable can be assumed to be normally distributed.

Normality Test

The normality of a variable (Gaussian distribution) is tested by means of the 'Normality Test' function.

You should indicate the name of the data set and the name of the variable of interest.

The Kolmogorov-Smirnov and the Chi-square tests are performed. In addition to these numerical results, graphical representations of the variable are produced: a histogram with density line, the fitted percentiles, a normal QQ-plot and the cumulative density plot.
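To make the Kolmogorov-Smirnov idea concrete, the sketch below computes the test statistic: the largest gap between the empirical distribution of the sample and a normal distribution. It assumes the normal distribution is fitted with the sample mean and standard deviation; how SAT sets these parameters and derives the p-value is not stated here, so no p-value is computed.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    data = sorted(values)
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    d = 0.0
    for i, x in enumerate(data):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, cdf - i / n, (i + 1) / n - cdf)
    return d
```

A value close to 0 means the sample follows the fitted normal curve closely; strongly skewed data give a clearly larger value.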

Trend Analysis

Trend Fitting

'Trend fitting' derives the relationship between a variable of interest (the response variable) and another variable such as distance (regressor variable).

You should provide the name of the data set, the name of the regressor variable and the response variable.

You can specify the following options:

  • Significance level of the confidence interval: this option indicates the probability level of the plotted confidence interval
  • Linear: a linear regression model is used. In this case you can indicate whether you want a numerical summary and/or a plot, and which type of confidence intervals you wish to see on the plot.
  • Quadratic: a quadratic regression model is used. The same options as in the linear case are available.
  • Non-parametric: a non-parametric loess smoothing is performed. The span width of the model is to be chosen by you.
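The linear option corresponds to an ordinary least squares fit of the response on the regressor. A minimal sketch, assuming complete data without missing values (the function name fit_linear_trend is illustrative and confidence intervals are left out):

```python
def fit_linear_trend(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom.
    rse = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    return intercept, slope, rse
```

The quadratic option fits the same kind of model with an additional squared term of the regressor.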

Correlation Analysis

The correlation matrix section of the statistical analysis functions consists of 2 functions. The Correlation Matrix function gives the numerical results of a correlation analysis, while the Scatter plot matrix function derives a visual representation that illustrates the correlation between the variables in a data set.

Correlation Matrix

The correlation matrix derives the correlation between the variables in a data set. The correlation matrix is calculated as well as the variance of each variable and the number of observations for each pair of variables. You should provide the name of the data set to which this option should be applied.
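Because the number of observations is reported per pair of variables, it is natural to read this as a pairwise calculation in which each pair uses only the rows where both values are present. The sketch below illustrates that reading with the Pearson correlation coefficient; both the pairwise treatment of missing values (None) and the choice of Pearson are assumptions, not confirmed SAT behaviour.

```python
from itertools import combinations


def pairwise_correlation(columns):
    """Pearson correlation and pair count for every pair of variables.

    columns maps a variable name to a list of values; None marks a missing value.
    """
    results = {}
    for a, b in combinations(columns, 2):
        pairs = [(x, y) for x, y in zip(columns[a], columns[b])
                 if x is not None and y is not None]
        n = len(pairs)
        mean_x = sum(x for x, _ in pairs) / n
        mean_y = sum(y for _, y in pairs) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
        syy = sum((y - mean_y) ** 2 for _, y in pairs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
        results[(a, b)] = (sxy / (sxx * syy) ** 0.5, n)
    return results
```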

Scatter plot Matrix

The scatter plot matrix is used to visually illustrate the correlation between the different variables in a data set. For each pair of variables a scatter plot is made and shown in a matrix form.

As input, provide the name of the data set.

Regression Analysis

The regression analysis section comprises 2 analyses:

  • the multiple regression function is used to derive the parameters of a user-defined regression model;
  • the subset regression function is used to find a subset of regressor variables out of a list of possible regressors such that all the regressor variables are significant at a given level.

Multiple Regression

The 'Multiple Regression' function performs regression with more than one regressor variable.

You should provide the data set, the name of the response variable and the regression function.

The regression function specifies the regression model and the variables to be considered as regressors.

The output is divided into a numerical part and a graphical part.

In the numerical part the estimates for the coefficients are shown, with their standard error, the t-value and the p-value. The R-square value and the residual standard error are also shown together with the p-value of the model.

In the graphical output, a plot of the residuals versus the fitted values, a plot of the response versus the fitted values and a QQ-normal plot are given.
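The coefficient estimates and the R-square value in the numerical part can be illustrated with a small least squares solver. This is a sketch under the assumption of a linear model with an intercept and complete data; it solves the normal equations directly, which is fine for a small example but is not necessarily how S-plus computes the fit. Standard errors, t-values and p-values are left out.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def multiple_regression(regressors, response):
    """Least squares fit of response = b0 + b1*x1 + ...; returns (coefficients, R^2)."""
    rows = [[1.0] + list(values) for values in zip(*regressors)]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(response) / len(response)
    ss_res = sum((y - f) ** 2 for y, f in zip(response, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in response)
    return beta, 1.0 - ss_res / ss_tot
```

The fitted values computed inside multiple_regression are the quantities plotted against the residuals and the response in the graphical output.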

Subset Regression

The 'Subset Regression' automatically identifies the subset of regressor variables that significantly explain the response variable.

You should specify the data set, the response variable and the list of potential regressor variables.

You can also choose the maximum size of the subset, the maximum number of subset results that should be derived, the maximum p-value for a regressor to be taken into account in a model, and the minimum number of data points that should be available to derive results for a subset. Because of missing data, the number of data points can differ substantially depending on which combination of regressor variables is considered.

As a result, the requested number of subsets is given in order of decreasing adjusted R² value.
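Two parts of this procedure can be sketched without a full regression: checking how many complete data points each combination of regressors leaves, and the adjusted R² used for ranking. The function names and the use of None for missing values are assumptions; the adjusted R² formula is the standard one for a model with an intercept, and the p-value screening is not shown.

```python
from itertools import combinations


def candidate_subsets(columns, response, regressors, max_size, min_points):
    """List regressor subsets that leave at least min_points complete rows.

    columns maps a variable name to a list of values; None marks a missing value.
    """
    n_rows = len(columns[response])
    kept = []
    for size in range(1, max_size + 1):
        for subset in combinations(regressors, size):
            names = (response,) + subset
            complete = sum(
                all(columns[name][i] is not None for name in names)
                for i in range(n_rows)
            )
            if complete >= min_points:
                kept.append((subset, complete))
    return kept


def adjusted_r_squared(r_squared, n_points, n_regressors):
    """Adjusted R^2 for a model with an intercept and n_regressors regressors."""
    return 1.0 - (1.0 - r_squared) * (n_points - 1) / (n_points - n_regressors - 1)
```

Each subset that passes the data point check would then be fitted, and the fitted models sorted by their adjusted R² value.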

Spatial Analysis

Variogram Calculation

The 'Variogram Calculation' calculates a variogram for a variable. You should specify the data set, the variable of interest and the variable that should be used as location variable.

As a result, a figure with the variogram and two lists of results are presented. The first list gives combined results. The second list gives the details for each difference in distance.
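The calculation behind an empirical variogram can be sketched as follows: for every pair of observations, take the distance between their locations and half the squared difference of their values, then average per distance class. The sketch assumes a one-dimensional location variable and equal-width distance classes; the class width parameter lag_width is hypothetical and the exact estimator used by SAT is not specified here.

```python
from collections import defaultdict
from itertools import combinations


def empirical_variogram(locations, values, lag_width):
    """Semivariance per distance class for a one-dimensional location variable.

    Returns a list of (mean distance, semivariance, number of pairs) per class.
    """
    classes = defaultdict(list)
    for (s1, z1), (s2, z2) in combinations(zip(locations, values), 2):
        distance = abs(s1 - s2)
        classes[int(distance // lag_width)].append((distance, (z1 - z2) ** 2))
    result = []
    for key in sorted(classes):
        pairs = classes[key]
        mean_distance = sum(d for d, _ in pairs) / len(pairs)
        semivariance = sum(sq for _, sq in pairs) / (2 * len(pairs))
        result.append((mean_distance, semivariance, len(pairs)))
    return result
```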

Variogram Fit

The 'Variogram Fit' function is used to fit a power model to the calculated variogram.

You should specify the data set, the variable of interest and the variable that should be used as location variable. You should also give the distance up to which the model should be fitted.

As a result, the variogram graph is given together with the values of the parameters a, b and c.
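The model formula itself is not reproduced in this text. Purely as an illustration, the sketch below assumes a common three-parameter power form, gamma(h) = a + b * h^c, and fits it by trying a grid of exponents c and solving for a and b with a straight-line fit each time. Both the model form and the fitting method are assumptions and may differ from what SAT does.

```python
def fit_power_model(distances, semivariances, max_distance):
    """Fit gamma(h) = a + b * h**c by least squares (assumed model form).

    For each trial exponent c, a and b follow from a straight-line fit of
    gamma against h**c; the exponent with the smallest squared error wins.
    """
    points = [(h, g) for h, g in zip(distances, semivariances)
              if 0 < h <= max_distance]
    n = len(points)
    best = None
    for step in range(1, 21):  # trial exponents 0.1, 0.2, ..., 2.0
        c = step / 10
        xs = [h ** c for h, _ in points]
        mean_x = sum(xs) / n
        mean_g = sum(g for _, g in points) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxg = sum((x - mean_x) * (g - mean_g) for x, (_, g) in zip(xs, points))
        b = sxg / sxx
        a = mean_g - b * mean_x
        sse = sum((g - (a + b * x)) ** 2 for x, (_, g) in zip(xs, points))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1:]
```

Only the points up to max_distance enter the fit, mirroring the distance limit that has to be specified for this function.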




 © MUMM | BMM | UGMM 2002–2012 webmaster@mumm.ac.be
 MUMM is a department of the Royal Belgian Institute of Natural Sciences