CMPSC-132: Programming and Computation II

Homework 2 (100 points)

Goal: The goal of this assignment is to reinforce the fundamental concepts of object-oriented programming in Python. Through this homework, you should gain a better idea of how to implement a system of classes that build on each other.

General instructions:
• The work in this assignment must be your own original work and be completed alone.
• The instructor and course assistants are available on Piazza and during office hours to answer any questions you may have. You may also share testing code on Piazza.
• A doctest is provided to ensure basic functionality and may not be representative of the full range of test cases we will be checking. Further testing is your responsibility.
• Debugging code is also your responsibility.

Assignment-specific instructions:
• Download the starter code file from Canvas. Do not change the function names or given starter code in your script.
• You can assume objects will be initialized with the proper data types; you don’t need to check data types unless it is explicitly mentioned in the class’s method description.
• If you are unable to complete a method, leave the pass statement

Boston trader Sarah Knight on her travels in Connecticut, 1704

Jonathan Edwards revives Enfield, Connecticut, 1741

Samson Occom describes his conversion and ministry, 1768

8. Blueprint and photograph of Christ Church

In approximately 400-600 words, discuss one of the main themes of this chapter, using one of the primary sources as well as the chapter text to illustrate your observations.

Paper details: Please answer ONLY the following five tax issue questions, with analysis based on PRIMARY AUTHORITY. See the attached SAMPLE and cite authority the same way as in the attached sample (example: Rev. Rul. 84-101, 1984-28 I.R.B. 5).

1. What is the character of any taxable gain generated by the sale of Eli Wolford's interest to Kevin Dole?
2. What is the amount of any taxable gain generated by the sale of Eli Wolford's interest to Kevin Dole?
3. Is there a tax consequence to the existing partners due to the transfer of interest in the partnership by Eli Wolford?
4. Will the company partnership structure be affected by the sale of interest by Eli Wolford?
5. Add a good tax issue of your own.

The aim of this workshop is to give you some experience using standard statistical treatments of data using statistical software. The package we will use for this workshop is Minitab. It is the package used as the standard statistical software by the Mathematics and Statistics courses at RMIT and Chemistry also has a site licence for it. Most of the analyses in this workshop can be done using Excel but Excel is not very user-friendly for statistical analyses. The analyses in Minitab can be done from simple pull-down menus and there is a good on-line help facility. You can copy-and-paste data from Excel into Minitab. For the assessment you should enter your results into the attached pro forma.

Before you attempt the case studies in this assignment you should try the worked examples in modules 1-4.

Case Study 1

Analyst A    Analyst B
6.42         6.40
6.41         6.54
6.43         6.52
6.38         6.58

The above results were obtained by two analysts using a new method for the determination of nickel in a standard reference alloy with a certified value of 6.49% Ni.

For this data we want to determine the standard statistics:- (i) mean (ii) variance (iii) standard deviation and (iv) confidence intervals for the mean. This data enables us to answer the following questions:-

(a) Which analysis is the most accurate? (i.e. closest to the certified value)

(b) Which analysis is the most precise? (i.e. which has the smallest spread, or variability, of values)

As well we can use the t-test to answer the following:-

(c) Is there any evidence, with either analyst, of a systematic error? I.e. does either average differ significantly from the certified value?

(d) Do the results of the two analysts differ significantly from each other?

Analysis: Basic Statistics

Open up Minitab by clicking on the Minitab icon on your desktop

When you open the program you will notice it is divided into two areas – the data area (lower screen) and the output area. Enter data from the above table in columns C1 and C2.

Warning: make sure you start entering data in row 1 NOT in the cell immediately below the column heading (C1 etc). This cell is reserved for column labels (you may put a label here like ‘Analyst A’). Also make sure you don’t enter a column label in row 1. The whole column will then be formatted as text (C1-T) and cannot be used for analysis. If this happens delete the whole column and start again (clicking on ‘C1’ will highlight the whole column).

To get descriptive statistics click on Stat => Basic Statistics => Display Descriptive Statistics to get the basic statistics dialog box. Highlight C1 and C2 on the left and then click ‘Select’. Alternatively you can click in the Variable box and type C1 C2 . Then click OK and the output will appear in the output window. From the output data enter the values in the pro forma. Note that the output does not give the variance but you should be able to calculate it from the standard deviation.
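For readers who want to check the Minitab output by hand, the following is a minimal Python sketch of the same descriptive statistics. It assumes the results table is read row by row, so that Analyst A has the values 6.42, 6.41, 6.43, 6.38 and Analyst B the values 6.40, 6.54, 6.52, 6.58. The function name `describe` is illustrative only and is not part of Minitab.

```python
import statistics

# Assumption: the flattened results table reads row by row (A, B, A, B, ...).
analyst_a = [6.42, 6.41, 6.43, 6.38]
analyst_b = [6.40, 6.54, 6.52, 6.58]


def describe(values):
    """Return n, mean, sample standard deviation and variance."""
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": sd,
        "variance": sd ** 2,  # the variance is the square of the SD
    }


summary_a = describe(analyst_a)
summary_b = describe(analyst_b)
for name, s in (("A", summary_a), ("B", summary_b)):
    print(f"Analyst {name}: mean={s['mean']:.3f} "
          f"sd={s['sd']:.4f} var={s['variance']:.6f}")
```

The last field shows the point made in the text: the variance is not printed by Minitab but is simply the standard deviation squared.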

Confidence Intervals

The confidence intervals for the mean can be obtained as follows: Stat => Basic Statistics => 1-sample t. Click on ‘confidence interval’ and leave at the default 95%

The confidence interval is of the form (low value, high value). To express the interval in the form of ‘mean +/- deviation’ calculate the deviation as 0.5*(high – low).
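The conversion between the two forms of the interval can be sketched in Python as follows. This is an illustration only: it assumes Analyst A's values are 6.42, 6.41, 6.43, 6.38 (table read row by row) and takes the two-sided 95% critical t value for 3 degrees of freedom, 3.182, from t tables rather than from Minitab.

```python
import math
import statistics

# Assumption: Analyst A's values, reading the flattened table row by row.
analyst_a = [6.42, 6.41, 6.43, 6.38]

# Two-sided 95% critical t value for n - 1 = 3 degrees of freedom (t tables).
T_CRIT_95_DF3 = 3.182


def confidence_interval(values, t_crit):
    """Return (low, high) for the mean: mean -/+ t * s / sqrt(n)."""
    half_width = t_crit * statistics.stdev(values) / math.sqrt(len(values))
    centre = statistics.mean(values)
    return centre - half_width, centre + half_width


low, high = confidence_interval(analyst_a, T_CRIT_95_DF3)
mean_value = 0.5 * (high + low)  # the midpoint recovers the mean
deviation = 0.5 * (high - low)   # half-width, as described in the text
print(f"({low:.3f}, {high:.3f})  ->  {mean_value:.3f} +/- {deviation:.3f}")
```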

Hypothesis Testing

We now want to test whether either sample deviates significantly from the expected (certified) value of 6.49. We need to formulate the null hypothesis (H_{o}). In statistical testing we then calculate the probability (p) of obtaining a result at least as extreme as the one observed if the null hypothesis were true. If this probability is low (usually p < 0.05, i.e. less than 5%) we reject H_{o} and accept the alternative (H_{1}). The null hypothesis generally treats any deviations as being due to chance or experimental error. In question (c) the null hypothesis is that the analytical result is not significantly different from the certified value, i.e. the true mean is 6.49. In question (d) our null hypothesis is that the two means are equal.

For question (c) we apply a t test: Stat => Basic Statistics => 1-sample t as above, but this time check the Test mean box and enter ‘6.49’ in this box. Remember in your conclusions that a result is significant (i.e. reject H_{o}) if p < 0.05. H_{o} here is that there is no significant difference between the mean value and the certified value.
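The calculation behind the 1-sample t test can be sketched by hand. The Python below is an illustration under stated assumptions, not Minitab's own routine: the data are read row by row from the table, and instead of computing a p value it compares |t| with the tabulated two-sided 5% critical value for 3 degrees of freedom (3.182), which leads to the same decision at the 5% level.

```python
import math
import statistics

CERTIFIED = 6.49
T_CRIT_95_DF3 = 3.182  # two-sided 5% critical value, 3 df (t tables)

# Assumption: the flattened results table reads row by row (A, B, A, B, ...).
results = {
    "A": [6.42, 6.41, 6.43, 6.38],
    "B": [6.40, 6.54, 6.52, 6.58],
}


def one_sample_t(values, mu0):
    """t = (mean - mu0) / (s / sqrt(n))."""
    se = statistics.stdev(values) / math.sqrt(len(values))
    return (statistics.mean(values) - mu0) / se


t_values = {name: one_sample_t(v, CERTIFIED) for name, v in results.items()}
for name, t in t_values.items():
    verdict = "reject H0" if abs(t) > T_CRIT_95_DF3 else "do not reject H0"
    print(f"Analyst {name}: t = {t:.2f} -> {verdict} at the 5% level")
```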

For question (d) we also apply a t test, this time to compare two means: Stat => Basic Statistics => 2-sample t. Click on ‘Samples in different columns’. Click in the ‘First’ box and then double-click on C1 in the variables column, and similarly for C2 as ‘Second’. Accept ‘Not equal’ and ‘95’ as default values. The 95% confidence interval given in the output is for the difference between the two means. The p value for the hypothesis that this difference is actually zero is given at the end of the output.
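A two-sample comparison can be sketched in Python as below. This sketch assumes the unpooled (Welch) form of the test, i.e. that the option to assume equal variances is left unchecked; that is an assumption about the Minitab settings, not something stated in this handout. The data are again read row by row from the table.

```python
import math
import statistics

# Assumption: the flattened results table reads row by row (A, B, A, B, ...).
analyst_a = [6.42, 6.41, 6.43, 6.38]
analyst_b = [6.40, 6.54, 6.52, 6.58]


def welch_t(x, y):
    """Return (t, df) for a two-sample t test without pooling variances."""
    vx = statistics.variance(x) / len(x)
    vy = statistics.variance(y) / len(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


t_stat, df = welch_t(analyst_a, analyst_b)
print(f"t = {t_stat:.3f}, approximate df = {df:.2f}")
```

The resulting |t| would then be compared with the critical t value from tables for the approximate degrees of freedom, or converted to a p value by the software.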

Case Study 2

A (mL)    B (mL)    C (mL)    D (mL)
14.03     13.98     14.13     14.16
14.09     13.90     14.23     14.23
14.07     13.79     14.08     14.10

Four students were each asked to perform a titration in triplicate using the same titrimetric procedure. Test to see if the students’ results differ significantly.

One-Way Analysis of Variance

Enter the above data in columns C3-C6 (with appropriate headings for the columns). We can then test for differences in the means of each column using 1-way ANOVA as follows:- Stat => ANOVA => 1-way (unstacked). Highlight all the columns C3-C6 in the left box and click ‘select’ . They should now appear in the response box. Click OK.

The output should be a typical ANOVA table (see the notes ‘measurement and assessment of variability.pdf’ for a full explanation of the ANOVA table). The key value is again the p value. We are testing here whether all four students are the same, i.e. their results do not differ significantly. We have to be careful about the alternative (H_{1}) hypothesis if we reject H_{o}. H_{1} is not that the students are all different (why?). Minitab gives a diagram which can help in interpreting the results, showing each mean and confidence interval. Two results differ significantly if their CIs don’t overlap. Note, however, that Minitab uses a pooled CI so they are all the same size. The diagram is thus just an indication but is still quite useful.
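The arithmetic behind the one-way ANOVA table can be sketched in Python as follows. The sketch assumes the titration table is read row by row (three results per student) and uses the tabulated 5% critical value F(3, 8) = 4.07 in place of a p value. Variable names are illustrative.

```python
from statistics import mean

# Assumption: the flattened table reads row by row, three results per student.
titrations = {
    "A": [14.03, 14.09, 14.07],
    "B": [13.98, 13.90, 13.79],
    "C": [14.13, 14.23, 14.08],
    "D": [14.16, 14.23, 14.10],
}

all_values = [v for group in titrations.values() for v in group]
grand_mean = mean(all_values)

# Between-students and within-students sums of squares
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2
                 for g in titrations.values())
ss_within = sum((v - mean(g)) ** 2 for g in titrations.values() for v in g)

df_between = len(titrations) - 1
df_within = len(all_values) - len(titrations)

f_ratio = (ss_between / df_between) / (ss_within / df_within)

F_CRIT_05 = 4.07  # 5% critical value of F with (3, 8) df, from F tables
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}; "
      f"significant at 5%: {f_ratio > F_CRIT_05}")
```

A significant F only says that at least one student differs from the others; it does not say which one, which is why the diagram of means and confidence intervals is useful.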

Case Study 3

                       Phosphorus (mg/kg)
Temperature (^{o}C)    Soil 1    Soil 1    Soil 2    Soil 2
230                    18.2      18.4      18.2      18.5
260                    18.6      18.9      18.4      18.1
290                    17.7      18.0      18.1      17.8
320                    17.1      17.4      17.8      17.5

An experiment was carried out on the determination of phosphorus in soils to examine the effect of temperature on the analysis. As a result of time and cost considerations it was only possible to carry out 16 experiments. There was insufficient soil for all 16, so two batches of soil were used. A randomised block design was used, giving these results. Test to see if the temperature affects the analysis, and if there is any difference between the soils. Is there any evidence of interaction between soils and temperature?

Two Way Analysis of Variance

This differs from the previous study in that there are two variables – temperature and soil. The data needs to be set out differently, as follows:-

In one column enter all 16 phosphorus analytical values (18.2, 18.4 ….17.5)

You also need two coding columns. Make one column the code for temperature and give a code (1 – 4) for each temperature.

Enter in a third column the code (1-2) for the soil type. Thus the first value (18.2) will have 18.2, 1, 1 in the three columns while the last value (17.5) will have 17.5, 4, 2 (i.e. 320^{o}C and soil 2).

Carry out the two way ANOVA:- Stat => ANOVA => 2-Way. In the response field enter the column for phosphorus and enter the other two variables in the row and column boxes. Check the ‘display means’ boxes.

Because there are two variables there are now null hypotheses for each variable (e.g. no significant difference between soils, i.e. mean [P] for soil 1 = mean [P] for soil 2). As with all our previous testing we reject H_{o} if the p value is low (< 0.05) and hence conclude there is a significant difference.

In 2 way ANOVA the possibility of variable interaction is also tested. An interaction means, for example, that temperature differences depend on soil type. If we see temperature differences with soil 1 but not soil 2 this would be an interaction effect. Again the diagram of means and CIs can be an indication of where differences occur.
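The stacked data layout and the two-way ANOVA sums of squares described above can be sketched in Python. This is an illustration under assumptions: the table is read row by row with the column order Soil 1, Soil 1, Soil 2, Soil 2, the design is treated as balanced with two replicates per cell, and the helper names (`grouped`, `between_ss`) are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Assumption: columns of the table are Soil 1, Soil 1, Soil 2, Soil 2.
table = {
    230: [18.2, 18.4, 18.2, 18.5],
    260: [18.6, 18.9, 18.4, 18.1],
    290: [17.7, 18.0, 18.1, 17.8],
    320: [17.1, 17.4, 17.8, 17.5],
}
soil_of_column = [1, 1, 2, 2]

# Stack into (phosphorus, temperature code, soil code) rows.
rows = [
    (value, t_code, soil)
    for t_code, temp in enumerate(sorted(table), start=1)
    for value, soil in zip(table[temp], soil_of_column)
]


def grouped(key):
    """Group the phosphorus values by the given key function."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[0])
    return groups


grand = mean(row[0] for row in rows)


def between_ss(groups):
    return sum(len(v) * (mean(v) - grand) ** 2 for v in groups.values())


temps = grouped(lambda r: r[1])
soils = grouped(lambda r: r[2])
cells = grouped(lambda r: (r[1], r[2]))

ss_temp = between_ss(temps)
ss_soil = between_ss(soils)
ss_inter = between_ss(cells) - ss_temp - ss_soil
ss_error = sum((y - mean(v)) ** 2 for v in cells.values() for y in v)

df_temp, df_soil = len(temps) - 1, len(soils) - 1
df_inter = df_temp * df_soil
df_error = len(rows) - len(cells)

ms_error = ss_error / df_error
f_temp = (ss_temp / df_temp) / ms_error
f_soil = (ss_soil / df_soil) / ms_error
f_inter = (ss_inter / df_inter) / ms_error
print(f"F temperature = {f_temp:.2f}, F soil = {f_soil:.3f}, "
      f"F interaction = {f_inter:.2f}")
```

Each F ratio would then be compared with the tabulated critical value for its degrees of freedom, or converted to a p value as Minitab does.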

Case Study 4

X (^{o}C)    20    21    22    23    24    25    26    27    28    29    30
Y (%)        10    15    14    17    18    19    20    23    24    23    28

A study was made of the effect of temperature on the yield of a chemical process. The results are shown in the above table. Carry out a regression analysis of the data. Predict the % yield when the temperature is 22.5^{o}C.

Linear Regression

The analysis can be carried out as follows:-

Stat => Regression => Regression. Enter the Y column in the response box and the X column in the predictors box. Click on options and in the ‘prediction intervals for new responses’ enter ‘22.5’ (note if you have more than one X for prediction you can enter them in a new column and put the column in this box).

The output gives you the model (the regression equation) and the values of the intercept (constant) and gradient (predictor), with statistical information on these parameters. A full ANOVA table is also shown. For full interpretation of this output you should consult the ‘Chemometrics2: Regression’ notes.

The t tests determine whether the gradient or the intercept are significantly non-zero. (Again, check the p values.)

The confidence intervals for the gradient and intercept can be determined as +/- s_{b}*t_{n-2,.05} and +/- s_{a}*t_{n-2,.05} respectively. s_{b} and s_{a} are the standard deviations of the gradient and intercept respectively. t is the critical t value for n-2 degrees of freedom (n = number of pairs of data) at the 0.05 significance level. This value can be obtained from t tables.
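The standard deviations of the gradient and intercept, and the resulting confidence intervals, can be checked with a short Python sketch. It uses the Case Study 4 data and takes the two-sided 5% critical t value for 9 degrees of freedom (2.262) from t tables. The variable names `se_slope` and `se_intercept` are illustrative labels for the standard deviations of the gradient and intercept.

```python
import math

# Case Study 4 data: temperature (X) and % yield (Y)
x = list(range(20, 31))
y = [10, 15, 14, 17, 18, 19, 20, 23, 24, 23, 28]
n = len(x)

T_CRIT = 2.262  # two-sided 5% critical t for n - 2 = 9 df (t tables)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residual standard deviation with n - 2 degrees of freedom
residual_ss = sum((yi - (intercept + slope * xi)) ** 2
                  for xi, yi in zip(x, y))
s_yx = math.sqrt(residual_ss / (n - 2))

se_slope = s_yx / math.sqrt(s_xx)
se_intercept = s_yx * math.sqrt(sum(xi ** 2 for xi in x) / (n * s_xx))

print(f"gradient  = {slope:.3f} +/- {T_CRIT * se_slope:.3f}")
print(f"intercept = {intercept:.2f} +/- {T_CRIT * se_intercept:.2f}")
```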

At the end of the output is the predicted Y when X = 22.5, along with the confidence (CI) and prediction (PI) intervals. The full meaning of these terms is explained in the regression notes.
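The difference between the confidence and prediction intervals can be seen in a hand calculation. The Python below is a sketch using the usual simple linear regression formulas, with the critical t value (2.262 for 9 degrees of freedom) taken from t tables; it is not a reproduction of Minitab's output format.

```python
import math

# Case Study 4 data: temperature (X) and % yield (Y)
x = list(range(20, 31))
y = [10, 15, 14, 17, 18, 19, 20, 23, 24, 23, 28]
n = len(x)
T_CRIT = 2.262  # two-sided 5% critical t for n - 2 = 9 df (t tables)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
intercept = y_bar - slope * x_bar
s_yx = math.sqrt(
    sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
)

x0 = 22.5
y_hat = intercept + slope * x0

# CI: uncertainty in the fitted line at x0.
# PI: uncertainty in a single new observation at x0 (note the extra 1).
leverage = 1 / n + (x0 - x_bar) ** 2 / s_xx
ci_half = T_CRIT * s_yx * math.sqrt(leverage)
pi_half = T_CRIT * s_yx * math.sqrt(1 + leverage)

print(f"predicted yield at {x0}: {y_hat:.2f}")
print(f"95% CI: +/- {ci_half:.2f}   95% PI: +/- {pi_half:.2f}")
```

The prediction interval is always wider than the confidence interval because it also includes the scatter of an individual measurement about the line.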

Case Study 5

X concentration (mg/L)    40    50     60     70     80     90     40    60     50     80
Y colorimeter reading     69    175    272    335    409    415    72    265    180    412

The data above show the relationship between the amount of β-erythroidine in an aqueous solution and the colorimeter reading of turbidity. Carry out a regression analysis as above. Assess whether a linear model is appropriate for this data.

Carry out linear regression: Stat => Regression => Regression. Enter the columns for the X and Y data in the predictor and response boxes.

To investigate whether a linear model is appropriate (i.e. a straight line fits the data better than a polynomial, exponential or logarithmic curve, or some other model) we carry out a ‘lack-of-fit’ test. Minitab has two of these. The first is the ‘pure error’ test, which is standard for testing ‘lack-of-fit’ but requires that at least some of the X values be replicated. The second test is non-standard but does give an indication even when there are no replicates. To use these tests click on ‘Options’ in the regression screen and click ‘pure error’ and ‘data subsetting’. The null hypothesis is that there is no curvature (i.e. a linear model is adequate), so a low p value (< 0.05) indicates significant lack of fit.

We can also examine the graph of the data and graphs of residuals. To get a regression plot of the fitted line select: Stat => Regression => Fitted Line Plot. Enter the columns for X and Y in the predictors and response boxes as above. Click on ‘options’ and check ‘display confidence bands’ and ‘display prediction bands’.

To display plots of the residuals, in the regression screen click on ‘graphs’ and click on the ‘normal plot of residuals’ and ‘residuals vs fits’ boxes.

Examining graphs and plots of residuals can help determine whether there are any outliers. While a curve may be the best fit to all the data, one outlier can greatly affect this. Residuals should be randomly scattered around the X axis of the residuals-X plot, and the normal plot of the residuals should be linear (see Regression notes for further discussion).

Exercise: ‘The Inverse Calibration Problem’

An unknown β-erythroidine solution gives a colorimeter reading of 402. What is the predicted concentration? What are the confidence limits for this prediction if (i) this was a single measurement, or (ii) it was an average of several measurements?

This question is typical of the sort of problem frequently encountered in analytical calibrations. We cannot proceed as in case study 4 because we now wish to determine X from a known Y (the ‘inverse’ problem). Least squares analysis assumes the error is in the Y determinations. However the error in X determined from Y can be estimated from the standard deviation of the interpolated X_{0}:

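s_{x_0} = \frac{s_{y/x}}{b} \left\{ \frac{1}{m} + \frac{1}{n} + \frac{(y_0 - \bar{y})^2}{b^2 \sum_i (x_i - \bar{x})^2} \right\}^{0.5}

Here s_{y/x} is the residual standard deviation of the calibration line, b is the gradient, n is the number of calibration points and m is the number of replicate readings averaged to give y_{0}.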

A spreadsheet has been set up to carry out this calculation. It can be found in s:\hons\data treatment\invcalib.xls, or on the data treatment site on the DLS.

Carry out the determination of the prediction and confidence intervals as follows:-

(i) Copy the (X,Y) calibration values for Q.5 from Minitab. Paste them in the invcalib spreadsheet (inverse calibration sheet) in the X and Y columns at the left.

(ii) Enter ‘402’ in the y_{o} cell (highlighted in green) and ‘1’ in the highlighted cell for ‘m’. This then gives the predicted value for x and the CI if 402 is a single measurement.

(iii) Change the ‘m’ value to 5 and see what effect it has on the CI.
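For readers without access to the spreadsheet, the inverse calibration steps above can be sketched in Python. This is an illustration under assumptions, not the contents of invcalib.xls: it uses the commonly quoted expression s_x0 = (s_y/x / b) * sqrt(1/m + 1/n + (y0 - ybar)^2 / (b^2 * Sxx)), where m is the number of replicate readings of the unknown and n the number of calibration points, and it takes the two-sided 5% critical t value for 8 degrees of freedom (2.306) from t tables. The function name `inverse_prediction` is hypothetical.

```python
import math

# Case Study 5 calibration data
x = [40, 50, 60, 70, 80, 90, 40, 60, 50, 80]            # concentration (mg/L)
y = [69, 175, 272, 335, 409, 415, 72, 265, 180, 412]    # colorimeter reading
n = len(x)
T_CRIT = 2.306  # two-sided 5% critical t for n - 2 = 8 df (t tables)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
intercept = y_bar - slope * x_bar
s_yx = math.sqrt(
    sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
)


def inverse_prediction(y0, m):
    """Return (x0, s_x0) for a reading y0 averaged over m measurements."""
    x0 = (y0 - intercept) / slope
    s_x0 = (s_yx / slope) * math.sqrt(
        1 / m + 1 / n + (y0 - y_bar) ** 2 / (slope ** 2 * s_xx)
    )
    return x0, s_x0


x0_single, s_single = inverse_prediction(402, m=1)
x0_multi, s_multi = inverse_prediction(402, m=5)
print(f"x0 = {x0_single:.1f} mg/L")
print(f"m = 1: +/- {T_CRIT * s_single:.1f}   m = 5: +/- {T_CRIT * s_multi:.1f}")
```

Increasing m reduces only the 1/m term, so the interval narrows but never falls below the contribution from the calibration line itself.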

Part B

You are to carry out an evaluation of your project, in terms of the data collection and treatment aspects of the project. If you do not have a project yet, think of a ‘hypothetical’ project in your discipline area which might be carried out. This is to be presented as a brief summary, set out as follows.

Project Overview

Give the project title (including supervisor). State the aims of the project – what do you want to achieve? Why is the study being carried out?

Define the response(s)

What is being measured? List your types of responses. Are these responses qualitative or quantitative? If qualitative can they be turned into quantitative responses (e.g by giving a score or rating). Are they discrete or continuous?

Define the Factors

What factors (variables) affect your results (responses)?

Rank the factors – known to influence, suspect to influence, unknown effect

Divide the factors into controllable and uncontrollable

Identify sources of error

What are the sources of error in your study? How can they be minimised? You need to consider the effect of sampling – usually you cannot test the whole population so you want to take a sample of the population. How do you select the sample? How big should the sample be?

Name………………………… Student Number ………………………

Case Study 1

Basic Statistics

                             Analyst A        Analyst B
Mean                         ……………            ……………
Standard Deviation           ……………            ……………
Variance                     ……………            ……………
Confidence Interval (95%)    ……………            ……………

Which analyst is the most precise? ……………….. Reason?…………………..

Which Analyst is the most accurate?………………. Reason? …………………

Test                               Null Hypothesis    p        Significant?
A differs from standard?           ……………              ……       ……………
B differs from standard?           ……………              ……       ……………
A and B differ from each other?    ……………              ……       ……………

Case Study 2

H_{0} …………………………………… H_{1} ………………………………………..

p ……………… significant? ……………………………………

Does the diagram indicate anything further about the students? …………………………….

Case Study 3:

                F        p        Significant?
Temperatures    ……       ……       ……………
Soils           ……       ……       ……………
Interactions    ……       ……       ……………

Case Study 4

Predicted equation (model)

P for hypothesis (b= 0)

Significant? i.e. is the gradient non-zero?

Standard deviation of slope (s_{b})

t (from tables)

Confidence intervals for b_{1}(+/- ts_{b})

Predicted yield for 22.5^{o}C

Confidence interval

Prediction interval

Case Study 5

Model (equation) ……………………………………………….

Lack-of-fit significant? …………………………………………

Check the plot of the data and the residuals. Is there any evidence of an outlier? What further treatment of the data would you suggest?

A psychologist conducted a study to examine the nature of the relation, if any, between an employee’s emotional stability (X) and the employee’s ability to perform in a task group (Y). Emotional stability was measured by a written test for which the higher the score, the greater the emotional stability. Ability to perform in a task group (Y=1 if able, Y=0 if unable) was evaluated by the supervisor. The psychologist believes a logistic regression model is appropriate for studying this suspected relation. The dataset for this exercise is “emoc.accdb” (Access format) and can be downloaded from eLearning.

Conduct an appropriate analysis, a complete write-up is required, supplement with tables and figures as necessary; if possible include a scatter plot with the fitted logistic response function.

Use the dataset from exercise 10 and conduct a power analysis for a one-way ANOVA and a 2-B factorial ANOVA. Write up your findings.

Logistic

Use the dataset from exercise 9 and conduct a power analysis for the logistic regression. Write up your findings.

Regression

Provided below are test data for three groups of students taught by Mr. Smith, Ms. Jones, and Ms. Green along with student pretest data. Use test as the dependent variable and the teacher and pretest as the independents to fit a multiple regression model testing your residual analysis. Then conduct a power analysis. Write up your findings.

Test Teacher Pretest

38 Smith 21

39 Smith 26

36 Smith 22

45 Smith 28

33 Smith 19

43 Green 34

38 Green 26

38 Green 29

27 Green 18

34 Green 25

24 Jones 23

32 Jones 29

31 Jones 30

21 Jones 16

28 Jones 29

Include in your write-up analysis a discussion of outliers and influential data points, residual analysis (are they normal), relevant tables and plots and a complete interpretation of the parameter estimates and overall model.

Power Analysis Extra Credit (1 pt)

The EC is due with Exercise 6; please use a separate attachment for it in the eLearning drop box.

Write a paragraph to a page review (APA format with a complete APA citation) of a primary source using power analysis. Be sure to answer the following questions in your summary: (a) what is the general problem under study, (b) specifically what research question/hypothesis are the researchers testing within the specified analysis (i.e., what groups are being compared), (c) what are the covariate variable(s), (d) what are the criterion variables, (e) did the authors conduct a residual analysis, (f) is the analysis related to the general problem the researchers are investigating, (g) describe the power analysis, and (h) what were the findings of the specific analysis?

Use the data below and conduct a 2-B factorial ANOVA, complete with an interaction analysis if necessary. The data describe performance on a statistics project in a sample of 28 graduate students. A complete write-up is required; supplement with tables and figures as necessary. Subjects were classified as master or doctoral, and students self-selected either a self-study curriculum, a laboratory curriculum or a traditional lecture curriculum.

SAS input

/* Read id, group, study and score; @@ allows several records per line */
data Ex9;
  input id group study score @@;
  datalines;
01 1 1 34 02 1 1 33 03 1 1 28 04 1 1 29 05 1 1 33
06 1 2 34 07 1 2 31 08 1 2 28 09 1 2 31 10 1 2 .
11 1 3 45 12 1 3 29 13 1 3 38 14 1 3 34 15 1 3 33
16 2 1 30 17 2 1 31 18 2 1 39 19 2 1 30 20 2 1 34
21 2 2 35 22 2 2 36 23 2 2 37 24 2 2 41 25 2 2 39
26 2 3 . 27 2 3 28 28 2 3 41 29 2 3 47 30 2 3 45
;
run;

/* Labels for the group (gfmt) and study (sfmt) codes */
proc format;
  value gfmt 1='Master'
             2='Doctoral';
  value sfmt 1='SelfStudy'
             2='LAB'
             3='Lecture';
run;

SPSS input

* Read id, group, study and score in freefield (list) format.
data list list
 /id group study score.
begin data.
01 1 1 34
02 1 1 33
03 1 1 28
04 1 1 29
05 1 1 33
06 1 2 34
07 1 2 31
08 1 2 28
09 1 2 31
10 1 2 .
11 1 3 45
12 1 3 29
13 1 3 38
14 1 3 34
15 1 3 33
16 2 1 30
17 2 1 31
18 2 1 39
19 2 1 30
20 2 1 34
21 2 2 35
22 2 2 36
23 2 2 37
24 2 2 41
25 2 2 39
26 2 3 .
27 2 3 28
28 2 3 41
29 2 3 47
30 2 3 45
end data.
value labels group 1 "Master" 2 "Doctoral".
value labels study 1 "SelfStudy" 2 "LAB" 3 "Lecture".

Less than Full Rank Design Extra Credit (1 pt)

The EC is due with Exercise 9; please use a separate attachment for it in the eLearning drop box.

Write a paragraph to a page review (APA format with a complete APA citation) of a primary source of an unbalanced study. Be sure to answer the following questions in your summary: (a) What is the general problem under study, (b) specifically what research question/hypothesis are the researchers testing within the specified analysis (i.e., what groups are being compared), (c) what was the statistical approach taken to accommodate the nonorthogonality, (d) what is the outcome variable, (e) did the authors present a test of any of the linear model assumptions, (f) is the analysis related to the general problem the researchers are investigating and (g) what were the findings of the specific analysis?


Math 203 Course Project

You may only use quantitative data for this project. The final project should be written in paragraph form, but should include all the information listed below. Obviously, it will contain graphs and charts, but do not present the information as individual numbered problems. Think of it as if you were writing an article to go into a newspaper or magazine.

1. First, decide on a random variable that you are interested in knowing more about. You will be gathering data for this variable. You may do your own survey to get the data (use sampling techniques described in chapter 1) or you may use data found online. Your sample size should be no smaller than 10 and no larger than 50. Define your population and describe how you got your sample data set. Include the raw data in your report.

2. Is your data discrete or continuous? Is your data nominal, ordinal, interval or ratio?

3. Construct a frequency distribution for your data set using 5 to 8 classes (you choose how many). What is your class width? Your frequency distribution should include class limits, class boundaries, class midpoints, frequency, cumulative frequency, and relative frequency.

4. Construct a histogram, frequency polygon and ogive for your data set, labeling each graph appropriately. Describe the shape of your data (i.e. is it symmetrical, skewed right or skewed left?).

5. Calculate the measures of central tendency for your sample (mean, median, mode and midrange).

6. Calculate the measures of variation for your sample (range, variance and standard deviation).

7. Use Chebyshev’s Theorem to find the range in which at least 75% of the data fall.

8. Identify the five-number summary and find the interquartile range. Construct a boxplot of your data set. Does your data contain any outliers (identify the specific criteria used to determine this)?

9. Construct two probability questions about your data and solve each. Here, you are creating the problem and then solving it. Examples below:
a. For discrete data: P(exactly X); P(at least X); P(at most X); P(less than X); P(more than X)
b. For continuous data: P(X between 2 values); P(more than X); P(less than X)

10. Construct a 90% confidence interval for the population parameter from your data (include the critical value that would be used).

11. Construct a hypothesis test using your data. Here, you will be making a guess about the population parameter. Show both the traditional method (give the critical value) and the p-value method. Use a level of significance of 0.05.

NOTE: The final project must be neat, organized and easy to read. I will NOT accept nor grade any rough drafts or scratch work!!! If you are familiar with computer software like Excel, you may use that for the graphs; otherwise graphs may be done by hand.
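Two of the calculations in the list above, the Chebyshev range and the outlier check from the five-number summary, can be sketched in Python. The sample below is hypothetical and only for illustration; it is not part of the project brief. Quartile conventions differ between textbooks, so the quartiles from `statistics.quantiles` (default method) may not match the method taught in the course, and the 1.5 × IQR fence is one common outlier criterion rather than the only one.

```python
import statistics

# Hypothetical sample, for illustration only (not from the project brief).
sample = sorted([12, 15, 15, 18, 20, 21, 23, 24, 27, 30, 55])

mean_value = statistics.mean(sample)
sd = statistics.stdev(sample)

# Chebyshev: at least 1 - 1/k**2 of the data lie within k SDs of the mean,
# so k = 2 gives the "at least 75%" range.
k = 2
cheb_low, cheb_high = mean_value - k * sd, mean_value + k * sd
fraction_inside = sum(cheb_low <= v <= cheb_high for v in sample) / len(sample)

# Five-number summary and interquartile range
q1, median, q3 = statistics.quantiles(sample, n=4)
five_number = (sample[0], q1, median, q3, sample[-1])
iqr = q3 - q1

# Common outlier criterion: beyond 1.5 * IQR from the quartiles
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in sample if v < lower_fence or v > upper_fence]

print(f"Chebyshev 75% range: ({cheb_low:.1f}, {cheb_high:.1f}); "
      f"fraction inside = {fraction_inside:.2f}")
print(f"five-number summary: {five_number}, IQR = {iqr}, outliers = {outliers}")
```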