Techniques of Observational Astronomy
AST3722C


Basic Statistics

We use statistics to analyze a set of observations in order to evaluate just what we can conclude from those data.

Imagine we have a collection of data: 4.45, 4.50, 4.50, 4.55, 4.55, 4.55, 4.60, 4.65, 4.65

Important characteristics:

Median: The individual value from the collection such that ½ the observations are less and ½ are greater:
4.55
Note that the median is found by sorting the observations and picking out the middle one, rather than by evaluating a formula.

Why is the median sometimes useful?

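Imagine a different data set: 4.45, 4.50, 4.50, 4.55, 4.55, 4.55, 4.60, 4.65, 8.7

The median is insensitive to the single outlier (8.7): it is still 4.55, while the mean is pulled up from about 4.56 to about 5.01.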
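A quick numerical check of the two data sets listed above may help. This is a minimal sketch using Python's standard `statistics` module; the variable names and the `summarize` helper are illustrative and not part of the original notes.

```python
import statistics

# The two collections from the notes: identical except for the last value.
clean = [4.45, 4.50, 4.50, 4.55, 4.55, 4.55, 4.60, 4.65, 4.65]
with_outlier = [4.45, 4.50, 4.50, 4.55, 4.55, 4.55, 4.60, 4.65, 8.7]


def summarize(data):
    """Return (mean, median) of a list of numbers."""
    return statistics.mean(data), statistics.median(data)


for label, data in (("clean", clean), ("with outlier", with_outlier)):
    mean, median = summarize(data)
    print(f"{label:>12}: mean = {mean:.3f}, median = {median:.2f}")
```

The mean shifts noticeably when one value is replaced by an outlier, while the median does not move.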


How do we get an estimate of how good our data are?

How repeatable/reliable are our values?

The standard deviation (sigma) tells us how far a single observation is expected to scatter about the mean.

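If the data are normally distributed, about 68% of the observations lie within 1 sigma of the mean, about 95% within 2 sigma, and about 99.7% within 3 sigma.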

Usually we accept a variation as statistically significant only if it is more than 3 sigma from the mean.
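To see why 3 sigma is a demanding threshold, the fraction of a normal distribution lying within k sigma of the mean can be computed from the error function as erf(k/√2). The sketch below assumes ideal Gaussian data; the function name `fraction_within` is illustrative.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution lying within k sigma of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    inside = fraction_within(k)
    print(f"within {k} sigma: {inside:.4f}   outside: {1 - inside:.4f}")
```

For ideal Gaussian data, a deviation beyond 3 sigma arises by chance in well under 1% of observations.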

Standard deviation of the mean:

How reliable is our estimate of the mean?

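The “standard deviation in the mean” is given by

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{N}}$ or $\sigma_{\bar{x}} = \sqrt{\dfrac{\sum_i (x_i - \bar{x})^2}{N(N-1)}}$,

where $\sigma$ is the standard deviation of the individual observations and $N$ is the number of observations. This is an estimator of the quality of the mean value, and it reflects the improvement gained by averaging several data points. Note that improving the quality of the mean by a factor of ten requires one hundred times as many samples.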

The “standard deviation in the mean” (sometimes called “standard error”) is the appropriate value to use to draw “error bars” on a plot of mean values.


 

                 1           2           3           4
          102.7051    96.99768    106.1652    106.7639
          93.74577    87.22317    84.87374     92.7521
          92.91529    102.4426    107.6497    98.48607
          102.2656    112.7647    108.2898    93.23228
          110.5028    111.9835    111.2649    110.5721
          92.93493    117.3313    117.9858    100.0187
          104.8264    78.16412    88.74805    121.3331
          102.9943    97.65819    102.8124    96.12005
          93.52754    110.9502    107.0592     86.2086
          108.0685    89.13299    98.85192    84.94068

mean      100.4486    100.4649    103.3701    99.04276
std dev   6.660202    12.92402    10.09143    11.23548
std err   2.106141    4.086935    3.191189    3.552972
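The summary rows of the table can be recomputed from the ten values in each column. The sketch below assumes that "std dev" is the sample standard deviation (N - 1 in the denominator) and that "std err" is that value divided by √N, which appears to match the tabulated numbers; the `columns` dictionary and the `summarize` helper are illustrative names.

```python
import math
import statistics

# The four columns of the table, ten values each.
columns = {
    1: [102.7051, 93.74577, 92.91529, 102.2656, 110.5028,
        92.93493, 104.8264, 102.9943, 93.52754, 108.0685],
    2: [96.99768, 87.22317, 102.4426, 112.7647, 111.9835,
        117.3313, 78.16412, 97.65819, 110.9502, 89.13299],
    3: [106.1652, 84.87374, 107.6497, 108.2898, 111.2649,
        117.9858, 88.74805, 102.8124, 107.0592, 98.85192],
    4: [106.7639, 92.7521, 98.48607, 93.23228, 110.5721,
        100.0187, 121.3331, 96.12005, 86.2086, 84.94068],
}


def summarize(values):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    n = len(values)
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)  # sample std dev: divides by N - 1
    std_err = std_dev / math.sqrt(n)    # standard deviation in the mean
    return mean, std_dev, std_err


for name, values in columns.items():
    mean, std_dev, std_err = summarize(values)
    print(f"column {name}: mean = {mean:.4f}, "
          f"std dev = {std_dev:.4f}, std err = {std_err:.4f}")
```

With ten values per column, the standard error is smaller than the standard deviation by a factor of √10, about 3.16.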

                          

Kinds of data collections

“Normal” or “Gaussian” distributions: Most experimental results should follow this distribution.

“Poisson” or “counting rate” distributions

The “counts” accumulated in a CCD pixel will have a Poisson distribution.

The “standard deviation” of a Poisson distribution is given by $\sigma = \sqrt{c}$, where c is the number of counts, so if you have 10,000 counts in a pixel, the error will be ±100.


Signal-to-Noise ratio

For Poisson statistics (c = total received counts), the signal is c and the noise is $\sqrt{c}$, so $S/N = c/\sqrt{c} = \sqrt{c}$.
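A short sketch of how the counting error and the signal-to-noise ratio scale with the number of counts, assuming pure Poisson statistics (standard deviation equal to the square root of the counts) and no other noise sources such as read noise or sky background. The function names are illustrative.

```python
import math


def poisson_sigma(counts):
    """Standard deviation of a Poisson-distributed count."""
    return math.sqrt(counts)


def signal_to_noise(counts):
    """Signal-to-noise ratio for pure Poisson noise: c / sqrt(c) = sqrt(c)."""
    return counts / poisson_sigma(counts)


for c in (100, 10_000, 1_000_000):
    sigma = poisson_sigma(c)
    print(f"c = {c:>9,}: error = +/-{sigma:,.0f}, "
          f"S/N = {signal_to_noise(c):,.0f}, "
          f"fractional error = {sigma / c:.2%}")
```

Because S/N grows only as the square root of the counts, a tenfold improvement in S/N requires one hundred times as many counts.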

Linear Least Squares or Regression analysis

Assume a straight-line fit, y = a + bx, to some data. Let y = focus value and x = temperature.

 

The errors can be estimated from:
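For reference, a standard form of the unweighted least-squares solution for a line $y = a + bx$ through $N$ points, together with its error estimates, is sketched below. The notation ($a$, $b$, $S_{xx}$, $S_{xy}$, $s$) is assumed here and may differ from the notation used in class.

```latex
\begin{align*}
\bar{x} &= \frac{1}{N}\sum_i x_i, \qquad
\bar{y} = \frac{1}{N}\sum_i y_i \\
S_{xx} &= \sum_i (x_i - \bar{x})^2, \qquad
S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y}) \\
b &= \frac{S_{xy}}{S_{xx}}, \qquad
a = \bar{y} - b\,\bar{x} \\
s^2 &= \frac{1}{N-2}\sum_i (y_i - a - b\,x_i)^2 \\
\sigma_b &= \frac{s}{\sqrt{S_{xx}}}, \qquad
\sigma_a = s\,\sqrt{\frac{1}{N} + \frac{\bar{x}^2}{S_{xx}}}
\end{align*}
```

Here $s$ is the scatter of the data about the fitted line; the $N - 2$ reflects the two parameters estimated from the data.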

Interpretation: At 0 F the focus will be (about) 31000. The change in focus with temperature is about –40 counts per degree.
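A minimal sketch of such a fit, using the standard unweighted least-squares expressions for a line y = a + bx. The temperature and focus values below are hypothetical, invented only so that the fit comes out near the intercept and slope quoted above; they are not the data used in class. The function name `linear_fit` is likewise illustrative.

```python
import math


def linear_fit(x, y):
    """Unweighted least-squares fit of y = a + b*x.

    Returns (a, b, sigma_a, sigma_b): intercept, slope, and their
    estimated standard errors.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx                # slope
    a = y_mean - b * x_mean      # intercept
    # Scatter of the data about the fitted line (N - 2 degrees of freedom).
    resid_ss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(resid_ss / (n - 2))
    sigma_b = s / math.sqrt(sxx)
    sigma_a = s * math.sqrt(1 / n + x_mean ** 2 / sxx)
    return a, b, sigma_a, sigma_b


# HYPOTHETICAL data: temperature (deg F) and focus position (counts).
temperature = [30, 40, 50, 60, 70, 80]
focus = [29810, 29390, 29000, 28600, 28190, 27810]

a, b, sigma_a, sigma_b = linear_fit(temperature, focus)
print(f"intercept a = {a:.0f} +/- {sigma_a:.0f} counts")
print(f"slope     b = {b:.1f} +/- {sigma_b:.1f} counts per degree")
```

The intercept is the predicted focus at x = 0, and the slope is the change in focus per degree. The intercept error is usually much larger than the scatter of the data would suggest when x = 0 lies far outside the measured temperature range.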


Correlation Coefficient

It is useful to look at the correlation coefficient, rho, between x and y. A correlation coefficient of 0 means that x and y are not correlated; a value of +1 or -1 means the quantities are perfectly positively or negatively correlated.
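A sketch of the usual (Pearson) sample correlation coefficient, assuming that this is the definition of rho intended here. The function name and the small test arrays are illustrative.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(correlation(x, [2, 4, 6, 8, 10]))   # y rises linearly with x
print(correlation(x, [10, 8, 6, 4, 2]))   # y falls linearly with x
print(correlation(x, [1, -1, 0, -1, 1]))  # no linear trend in y
```

Values between these extremes indicate a partial linear relationship. Note that rho measures only linear correlation, so a value near 0 does not rule out a nonlinear dependence of y on x.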


This page is maintained by John P. Oliver; write me at oliver@astro.ufl.edu
This material is being made available to you subject to a variety of caveats.

This page was last edited 10/25/2004