JMP tutorial: Statistical Inference Using JMP

The first two questions below demonstrate how to use JMP to analyse statistical data. They are followed by lab exercises that put the material to work.

1. Confidence Interval

This question demonstrates the concept of a confidence interval. Open the script "confidence.jsl" from the sample scripts folder and run it. The script draws 100 samples, each of size 20, from a Normal distribution. For each sample, it computes the mean and a 95% confidence interval. Each interval is graphed in gray if it captures the theoretical mean and in red if it doesn't.

Press Ctrl+D to generate another series of 100 samples. Each time, note the number of times the interval captures the theoretical mean. The ones that don’t capture the mean are due only to chance, since we are randomly drawing the samples. For a 95% confidence interval, we expect that around five will not capture the mean, so seeing a few is not remarkable.
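The same experiment can be sketched outside JMP. The Python below is only an illustration of the idea, not a translation of confidence.jsl: it assumes a standard Normal population (mean 0, standard deviation 1), a fixed random seed, and a hard-coded t critical value of 2.093 for 19 degrees of freedom. The function names are made up for this example.

```python
import random
import statistics

# Assumptions (not taken from confidence.jsl): a standard Normal population
# and a fixed seed so the runs can be repeated.
TRUE_MEAN = 0.0
TRUE_SD = 1.0
N_SAMPLES = 100
SAMPLE_SIZE = 20
T_CRIT_95_DF19 = 2.093  # two-sided 95% t critical value, 19 degrees of freedom


def interval_for_sample(sample):
    """Return the 95% t-based confidence interval (low, high) for the mean."""
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / len(sample) ** 0.5
    half_width = T_CRIT_95_DF19 * std_err
    return mean - half_width, mean + half_width


def count_misses(rng):
    """Draw N_SAMPLES samples and count the intervals that miss TRUE_MEAN."""
    misses = 0
    for _ in range(N_SAMPLES):
        sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
        low, high = interval_for_sample(sample)
        if not (low <= TRUE_MEAN <= high):
            misses += 1
    return misses


rng = random.Random(2012)
for run in range(1, 4):
    print(f"run {run}: {count_misses(rng)} of {N_SAMPLES} intervals miss the mean")
```

Each run plays the role of one press of Ctrl+D: the number of misses varies from run to run but should stay close to five.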

Below are a few samples produced at random:

Change the confidence level by clicking on confidence level below the graph and typing 5. The following is produced as a result:

2. Case Study: The Earth’s Ecliptic

In 1738, the Paris observatory determined with high accuracy that the tilt
of the earth’s axis (its obliquity) was 23.472 degrees. However, someone suggested that the angle
changes over time. Examining historical documents found five measurements dating
from 1460 to 1570. These measurements were somewhat different than the Paris
measurement, and they were done using much less precise methods.
The question is whether the differences in the measurements can be attributed to
the errors in measurement of the earlier observations, or whether the angle of the
earth’s rotation actually changed. We need to test the hypothesis that the earth’s
angle has actually changed.
H0: The earth’s angle has not changed; the mean of the earlier values is not different
from the one calculated in Paris. Hypothesized mean = 23.472.
HA: The earth’s angle has changed; the mean of the earlier values differs from 23.472.

• Open Cassub.jmp
• Analyze > Distribution, set obliquity as Y variable

Test if the mean of these values is different than the value from the Paris
observatory. Our null hypothesis is that the mean is not different.

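It can be seen that the sample mean is 23.499, which differs from the Paris value (23.472). A difference between the two numbers is not enough by itself to reject H0, because the five early measurements are imprecise. To decide, click the red triangle next to obliquity, choose Test Mean, and enter 23.472 as the hypothesized mean. If the p-value of the t-test is below the chosen significance level (for example 0.05), H0 is rejected and we conclude that the angle has changed, which is the conclusion reached here.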
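The calculation behind a one-sample t-test can be sketched in a few lines of Python. The five measurements below are HYPOTHETICAL placeholders chosen only to show the arithmetic; they are not the values in Cassub.jmp, so the printed t statistic is not the one JMP reports. The critical value 2.776 is the standard two-sided 95% value for 4 degrees of freedom, and the function name is an assumption of this sketch.

```python
import statistics

T_CRIT_95_DF4 = 2.776  # two-sided 95% t critical value, 4 degrees of freedom


def one_sample_t(values, hypothesized_mean):
    """Return the t statistic for H0: population mean == hypothesized_mean."""
    n = len(values)
    std_err = statistics.stdev(values) / n ** 0.5
    return (statistics.fmean(values) - hypothesized_mean) / std_err


# HYPOTHETICAL measurements for illustration only; NOT the Cassub.jmp data.
placeholder_obliquity = [23.50, 23.48, 23.51, 23.49, 23.52]
PARIS_VALUE = 23.472  # hypothesized mean from the text

t_stat = one_sample_t(placeholder_obliquity, PARIS_VALUE)
reject_h0 = abs(t_stat) > T_CRIT_95_DF4
print(f"t = {t_stat:.2f}, reject H0 at the 5% level: {reject_h0}")
```

The point of the sketch is that the decision depends on the difference in means relative to the standard error, not on the difference alone.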

LAB EXERCISES

1. The file movies.jmp contains a list of the top grossing movies of all time (as of June 2003). It contains data showing the name of a movie, the amount of money it made in the United States (Domestic) and in foreign markets (in millions of dollars), its year of release, and the type of movie.

Below is a histogram of all of the types of movies which was created by selecting Analyze --> Distribution then dragging types into the "Y, Columns" box:

There are five levels in this variable: Action, Comedy, Drama, Family, Mystery-Suspense. Of those listed above, there are 56 Action movies, 69 Comedy, 77 Drama, 45 Family and 29 Mystery-Suspense.

Below is the histogram of the domestic gross for each movie which is produced in the same way as above:

The values of this variable range from 100 to 600.8 million dollars, and the average domestic gross of these movies is 157.48 million dollars, as shown in the Quantiles and Moments sections above.

There are outliers (shown on the right side of the box). Look for outliers in the datasheet; they should be extreme values that are not within the expected range. To check if your guess of outliers was right, place the pointer on one of the points in the outlier box and it should show which movie it was.
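JMP's outlier box plot marks points that lie more than 1.5 times the interquartile range beyond the quartiles. A minimal Python sketch of that rule follows. The gross values are HYPOTHETICAL placeholders, not rows from movies.jmp, and Python's quartile method may differ slightly from the one JMP uses, so borderline points could be classified differently.

```python
import statistics


def find_outliers(values):
    """Return the values outside the 1.5 * IQR fences, in sorted order."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return sorted(v for v in values if v < low_fence or v > high_fence)


# HYPOTHETICAL domestic grosses in millions of dollars; NOT the movies.jmp data.
placeholder_gross = [100, 105, 110, 120, 130, 150, 160, 180, 600]
print(find_outliers(placeholder_gross))
```

With a right-skewed variable such as domestic gross, the flagged points all fall on the high side, which matches the outliers on the right of the box in the JMP plot.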

Next, create a subset of the data consisting of only drama movies and produce a histogram for it. To create the subset, double-click the Drama bar in the histogram of movie types. This opens a new data table containing only the drama movies, which can be used to create the histogram shown below:

The average domestic and foreign grosses for the subset are $166.10 million and $322.81 million, respectively.
The plots have a few outliers as seen in the pictures above.

2. Open the Analgesics file from the sample data. This file contains the results of a study of the effect of three different pain relievers on the amount of pain experienced by the patients. The only classification of patients in the study is by gender. Below are histograms of the variables gender, drug and pain, created by dragging all three variables into the "Y, Columns" box:

Click on the histogram bars; it can be seen that females use much more of drug A than of B and C and their levels of pain are between 0-10, with most females having a pain level of around 7.5.

The distribution for females can be seen below:

It can be seen that males consumed almost equal amounts of each drug but experienced much higher levels of pain, with pain levels between 5-17.

The distribution for males can be seen below:

Males consumed almost the same amounts of drugs B & C as females, but much less of drug A. Males also had much higher levels of pain than females, according to the study.

Next, analyze the distribution of pain for each of the drugs A, B and C. To do this, place pain in the "Y, Columns" box and drug in the "By" box.

Drug A

Drug B

Drug C

3. Open the file Scores.jmp from the sample data. This file contains data from a study in the United States that recorded the results of 5000 students on tests of Calculus and Physics. The results are separated into the four regions of the US. Some students took the Calculus test, some the Physics test, and some both.

A histogram of the results is shown below:

The mean score on the Calculus test was 452.06227 and on the Physics test was 417.11735.

Clicking on a histogram bar shows that most students who received high scores on the Calculus test also received high scores on the Physics test.

An example showing that students who received high scores on one test also received high scores on the other:

Another example showing that students who received low scores on one test also received low scores on the other:

Moreover, a plot of Physics scores versus Calculus scores shows a roughly linear relationship:

Below are the mean values for the Calculus scores for each of the four regions:

Region 1:

Region 2:

Region 3:

Region 4:

The mean scores on the Calculus tests for all four regions are almost equal.

Do the same for the Physics tests.

Region 1:

Region 2:

Region 3:

Region 4:

The mean Physics scores in all four regions of the US also seemed to be almost equal.

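On an equivalent earlier test, the mean score of United States Calculus students was 450. The current sample mean of 452.06 is slightly higher, but the sample mean alone does not show that scores have increased; the 95% confidence interval constructed next indicates whether 450 is still a plausible value for the mean.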

Construct a 95% confidence interval for the mean calculus score by clicking on the red triangle to the left of "Calculus Score" --> Confidence Interval --> 0.95 to get the following:

Physics teachers say that the overall United States score on the Physics test should be higher than 420. The data do not support that claim, since the sample mean Physics score (417.12) is below 420.

Construct a 95% confidence interval for the mean Physics score to get the following:

4. Open hotdogs.jmp from the sample data. The results came from the investigation of taste and nutritional content of hot dogs. Below is the histogram of the results:

The number of hot dogs of each type is roughly equal.

The $/oz variable in this file represents the cost in dollars per ounce of hot dog. An outlier plot is created which is shown below:

The two outlier points (top right) represent "General Kosher Beef" and "Wall's Kosher Beef Low Fat".

The caloric content of the three types of hot dogs is shown below, each with a 95% confidence interval.

Type = Beef:

Type = Meat:

Type = Poultry:

On average, hot dogs made with poultry have the lowest caloric content compared with beef and meat.

To test the conjecture that the mean sodium content of all hot dogs is 410 milligrams, a histogram of sodium content by type of hot dog was produced. For the Meat type, the mean sodium content was found to be around 418.5 milligrams.

5. Both the z-test and the t-test compare a sample mean with a hypothesized value, or compare the means of two groups. The z-test applies when the population standard deviation is known, or when the sample is large enough that the sample standard deviation is a reliable estimate of it. The t-test uses the sample standard deviation and is the appropriate choice for small samples. The degrees of freedom in a Student's t-test is the parameter that determines the shape of the t distribution; for a one-sample test it equals the sample size minus one (n - 1). The t distribution approaches the standard Normal distribution as the degrees of freedom increase, so the z-test and t-test give essentially the same result for large samples.
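The relationship between the two tests can be seen by comparing critical values. In the sketch below, the t values are copied from a standard two-sided 95% t table (they are hard-coded, not computed), and the z value comes from Python's standard library. As the degrees of freedom grow, the t critical value shrinks toward the z critical value of about 1.96.

```python
from statistics import NormalDist

# Two-sided 95% critical values of Student's t, taken from a standard t table.
T_CRIT_95 = {1: 12.706, 5: 2.571, 10: 2.228, 30: 2.042, 120: 1.980}

z_crit = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value for z

for df, t_crit in T_CRIT_95.items():
    print(f"df = {df:>3}: t = {t_crit:.3f}, excess over z = {t_crit - z_crit:.3f}")
print(f"z critical value: {z_crit:.3f}")
```

For a small sample, such as the five measurements in the ecliptic case study (4 degrees of freedom), the t critical value is noticeably larger than 1.96, which is why the t-test is used there.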


Saturday, 31 March 2012
