Solutions Manual for Statistical Inference, Second Edition. George Casella, University of Florida; Roger L. Berger, North Carolina State University; Damaris …
Of the exercises in Statistical Inference, Second Edition, this manual …
(a) f(x) is a pdf since it is positive and $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
Statistical Inference, Second Edition. George Casella, Roger L. Berger. Duxbury Advanced Series.
Statistical inference / George Casella, Roger L. Berger. 2nd ed.
… the basics of probability, we develop the theory of statistical inference using … Inverted gamma pdf. … Histogram of exponential pdf.
There is an obtuse pattern as to which solutions were included in this manual. We assembled all of the solutions that we had from the first edition, and filled in so that all odd-numbered problems were done. In the passage from the first to the second edition, problems were shuffled with no attention paid to numbering (hence no attention paid to minimizing the new effort), but rather we tried to put the problems in logical order. A major change from the first edition is the use of the computer, both symbolically through Mathematica™ and numerically using R. Some solutions are given as code in either of these languages.
When this endeavor was started, we were not sure how well it would work. The final judgment of our success is, of course, left to the reader. The book is intended for first-year graduate students majoring in statistics or in a field where a statistics concentration is desirable. The prerequisite is one year of calculus. Some familiarity with matrix manipulations would be useful, but is not essential.
The book can be used for a two-semester, or three-quarter, introductory course in statistics. Chapters 5 and 6 are the first statistical chapters. Chapter 5 is transitional between probability and statistics and can be the starting point for a course in statistical theory for students with some probability background. In particular, the likelihood and invariance principles are treated in detail.
Along with the sufficiency principle, these principles, and the thinking behind them, are fundamental to total statistical understanding. Chapters … represent the central core of statistical inference: estimation (point and interval) and hypothesis testing.
A major feature of these chapters is the division into methods of finding appropriate statistical techniques and methods of evaluating these techniques. Different concerns are important, and different rules are invoked. Of further interest may be the sections of these chapters titled "Other Considerations." Here, we indicate how the rules of statistical inference may be relaxed (as is done every day) and still produce meaningful inferences. Many of the techniques covered in these sections are ones that are used in consulting and are helpful in analyzing and inferring from actual problems.
The final three chapters can be thought of as special topics, although we feel that some familiarity with the material is important in anyone's statistical education.
Chapter 11 deals with the analysis of variance (oneway and randomized block), building the theory of the complete analysis from the simpler theory of treatment contrasts. Our experience has been that experimenters are most interested in inferences from contrasts, and using principles developed earlier, most tests and intervals can be derived from contrasts.
Finally, Chapter 12 treats the theory of regression, dealing first with simple linear regression and then covering regression with "errors in variables."
As more concrete guidelines for basing a one-year course on this book, we offer the following suggestions.
There can be two distinct types of courses taught from this book. For such students we recommend covering Chapters … in their entirety (which should take approximately 22 weeks) and spending the remaining time customizing the course with selected topics from Chapters … Once the first nine chapters are covered, the material in each of the last three chapters is self-contained and can be covered in any order.
Another type of course is "more practical." It stresses the more practical uses of statistical theory, being more concerned with understanding basic statistical concepts and deriving reasonable statistical procedures for a variety of situations, and less concerned with formal optimality investigations.
Such a course will necessarily omit a certain amount of material, but the following list of sections can be covered in a one-year course:
Chapter: 1, 2, 3, 4, 5, 6, 7, 8
Sections: All, 2. …
The material in Sections …
The exercises have been gathered from many sources and are quite plentiful.
We feel that, perhaps, the only way to master this material is through practice, and thus we have included much opportunity to do so. The exercises are as varied as we could make them, and many of them illustrate points that are either new or complementary to the material in the text. Some exercises are even taken from research papers. It makes you feel old when you can include exercises based on papers that were new research during your own student days! Although the exercises are not subdivided like the chapters, their ordering roughly follows that of the chapter.
Subdivisions often give too many hints. As this is an introductory book with a relatively broad scope, the topics are not covered in great depth. However, we felt some obligation to guide the reader one step further in the topics that may be of interest.
Thus, we have included many references, pointing to the path to deeper understanding of any particular topic. The Encyclopedia of Statistical Sciences, edited by Kotz, Johnson, and Read, provides a fine introduction to many topics. To write this book, we have drawn on both our past teachings and current work.
The fourth moment is not easy to get; one way to do it is to get the mgf of X.
This is a lengthy calculation.
The Metropolis Algorithm is used to generate variables. Among other options, one can choose the variables in positions … to … or the ones in positions …. Now, follow the algorithm on page …
Chapter 6 Principles of Data Reduction
This is a difficult problem. The order statistics are a minimal sufficient statistic. (e) Fix sample points x and y. The second term is constant on such an interval. This will be true for all such intervals if and only if the order statistics for x are the same as the order statistics for y.
Therefore, the order statistics are a minimal sufficient statistic. From Example 6.…, $(X_{(1)}, X_{(n)})$ is not complete; that provides the opportunity to construct an unbiased, nonzero estimator of zero. These are all location families. The last vector depends only on $Z_1, \ldots$ For (c), (d), and (e) the order statistics are sufficient, so $Y_1, \ldots$ For (b), $X_{(1)}$ is sufficient.
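To see concretely how the lack of completeness yields an unbiased, nonzero estimator of zero, the sketch below assumes a uniform$(\theta, \theta+1)$ location family; the surviving text does not say which family is meant. There the range $R = X_{(n)} - X_{(1)}$ is ancillary with $E R = (n-1)/(n+1)$, so $g = R - (n-1)/(n+1)$ has mean zero for every $\theta$ without being identically zero. The function name is hypothetical and the check is Monte Carlo, not part of the original solution.

```python
import random

def zero_estimator_mean(theta, n, reps=100_000, seed=0):
    """Monte Carlo mean of g = R - (n-1)/(n+1) for a uniform(theta, theta+1) sample.

    R = X_(n) - X_(1) is the sample range. In this location family R is
    ancillary and E[R] = (n-1)/(n+1), so E[g] = 0 for every theta even though
    g is a nonzero function of (X_(1), X_(n)).
    """
    rng = random.Random(seed)  # same seed for every theta: draws differ only by the shift
    target = (n - 1) / (n + 1)
    total = 0.0
    for _ in range(reps):
        xs = [theta + rng.random() for _ in range(n)]
        total += (max(xs) - min(xs)) - target
    return total / reps

for theta in (-3.0, 0.0, 10.0):
    print(theta, zero_estimator_mean(theta, n=5))
```

Each printed mean should be close to zero, up to Monte Carlo error, whatever the value of theta.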
Then the joint pdf of $Y_1, \ldots$ … Use Theorem 6. A minimal sufficient statistic may contain an ancillary statistic. Then $Y_1$ and $Y_2$ are iid and, by Theorem 2.… Thus, by Theorem 3.… Let $M(\mathbf{X})$ denote the median calculated from $X_1, \ldots$ … This quadratic graph is a line and does not contain a two-dimensional open set. Use the same factorization as in Example 6. Thus, $\sum_i X_i$ is a complete, sufficient statistic by Theorems 6.… If the expectation exists, this is an analytic function, which cannot be identically zero.
Hence the family is not complete. This is a polynomial of degree 2 in $p$; to make it zero for all $p$, each coefficient must be zero. For (d), the order statistics are minimal sufficient. This is a location family. Thus, by Example 6.… So this sufficient statistic is not complete.
X is sufficient because it is the data. So the family is not complete. X is sufficient by Theorem 6. By Example 6.… Because $\prod_i X_i$ is a one-to-one function of $\log \prod_i X_i$, $\prod_i X_i$ is also a complete sufficient statistic.
From Exercise 6.… So if $Z_1, \ldots$ … Thus, as in Exercise 6.…
Do part (b) first, showing that $\sum_i X_i^2$ is a minimal sufficient statistic. Thus, $(\sum_i X_i, \sum_i X_i^2)$ is a minimal sufficient statistic. By Theorem 6.… Thus, X, statistic. This can be accomplished using the methods from Section 4. This is a two-to-one transformation.
From Theorem 6. By Exercise 6. Because they are independent, by Theorem 4. Thus evidence is equal whenever the likelihood functions are equal, and this follows from Formal Sufficiency and Conditionality. Equations 6. To prove the Conditionality Principle. Now consider the Formal Sufficiency Principle.
From the likelihood principle, inference about $p$ is only through $L(p \mid \mathbf{x})$. The values of the likelihood are $1$, $p$, $p^2$, and $p^3$, and the sample size does not directly influence the inference. In each pair, $X_i$ and $Y_i$ are independent, so $W$ and $V$ are independent. Hence, $(W, V)$ defined as in part (ii) is sufficient. Thus, we do not have equivariance.
Because $X_1, \ldots$ … The formal structures for the problem involving X and the problem involving Y are the same.
They both concern a random sample of size n from a normal population and estimation of the mean of the population.
The distribution of $X_1, \ldots$ … An estimator of the form $kS^2$ is invariant because … $G_2$ and $G_3$ are both subgroups of $G_1$. So invariance with respect to $G_1$ implies invariance with respect to $G_2$ and $G_3$. The transformations in $G_2$ leave the scale parameter unchanged. An estimator of the given form is invariant if, for all $a$ and $x_1, \ldots$ … These values are in the following table. But it is usually best to do as much as possible analytically first, and perhaps reduce the complexity of the numerical problem.
Substitute this into $L$. Many computer programs can be used to maximize this function. Then from Example 7.… Thus $k$ is the MLE. The roots are … Because it is the only place where the first derivative is zero, it is also a global maximum.
But if $n$ is small, the bias is quite large. In Example 7.… Usually the midpoint of this interval is taken as the MLE. This is the same as Exercise 6.… This involved algebra can be found in Schwarz and Samanta. This is a special case of the computation in Exercise 7.… The posterior distributions are just the normalized likelihood times prior, so of course they are different.
Two answers are provided. First, use the Miscellanea: for … So the MLEs are the same as the method of moments estimators in part (a). Therefore, it is a maximum. By Corollary 4.… Let $n(a, b)$ denote the pdf of a normal distribution with mean $a$ and variance $b$. This also completes part (c).
We will use the results and notation from part b to do this special case. The joint density is the product of the individual densities.
Note that $(X_{(1)}, X_{(n)})$ is not a minimal sufficient statistic (recall Exercise 5.…). Then $\max_i X_i$ is minimal sufficient. Both (a) and (b) are exponential families, and this condition is satisfied for all exponential families. Hence the estimator is unbiased. Therefore, we need to minimize $\sum_i a_i^2$, … (b) …
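The minimization can be completed with a Lagrange multiplier and confirmed with Cauchy-Schwarz. This derivation assumes, since the surviving text does not say so, that the estimator is $\sum_i a_i X_i$ with $X_1, \ldots, X_n$ uncorrelated, common mean, and common variance $\sigma^2$, so that unbiasedness is the constraint $\sum_i a_i = 1$.

```latex
\operatorname{Var}\Big(\sum_{i=1}^n a_i X_i\Big) = \sigma^2 \sum_{i=1}^n a_i^2,
\qquad \text{minimize subject to } \sum_{i=1}^n a_i = 1.

\frac{\partial}{\partial a_j}\Big[\sum_i a_i^2 - \lambda\Big(\sum_i a_i - 1\Big)\Big]
  = 2a_j - \lambda = 0 \;\Rightarrow\; a_j = \lambda/2 \text{ for all } j,
\qquad \sum_j a_j = 1 \;\Rightarrow\; a_j = \frac{1}{n}.

% Minimum, not just a stationary point (Cauchy-Schwarz):
1 = \Big(\sum_i a_i \cdot 1\Big)^2 \le n \sum_i a_i^2
\;\Rightarrow\; \sum_i a_i^2 \ge \frac{1}{n},
\text{ with equality iff } a_1 = \cdots = a_n = \frac{1}{n}.
```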
This one is really hard; it was taken from an American Statistician article, but the proof is not there. A cryptic version of the proof is in Tukey, "Approximate Weights," Ann. … Therefore, by Theorem 7.… Then here are formulas for $E(Y^4)$ and $\operatorname{Var}(Y^2)$ … Example 3.… Now, using Lemma 3.… Because $\sum_i X_i$ is a complete sufficient statistic, Theorems 7.…
Evaluating this yields … Maybe the exponential model is not a good assumption. Use the factorization in Example 6.… Straightforward calculation gives … It does not fit the definition of either one. We formulate a general theorem. Arguing as in Example 6.… By Theorem 5.… By Theorem 7.…
T is a Bernoulli random variable. Hence, … The loss function in Example 7.… (Figure omitted.) The second derivative is positive, so this is the minimum. For X … Bayes estimator. The Bayes risk is infinite for both estimators. Thus, we can write the estimator as … For this binomial model, S is a complete sufficient statistic.
Chapter 8 Hypothesis Testing
A normal approximation is also very good for this calculation. This is a fairly large value, not overwhelming evidence that the accident rate has dropped. A normal approximation with continuity correction gives a value of … The log-likelihood is … We will not use the hint, although the problem can be solved that way. Instead, make the following three transformations: … So, by an extension of Exercise 4.… Numeric maximization could be used to compute the statistic for observed data x.
Verification that this is a maximum is lengthy. We omit it. Now, the argument proceeds as in part (a).
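Returning to the accident-rate p-value discussed above, the exact Poisson calculation and its continuity-corrected normal approximation can be compared in a few lines. This is a sketch under assumptions: the count is modeled as Poisson, the alternative is a decrease in the rate, and the numbers $\lambda_0 = 15$ (historical mean) and $x = 10$ (observed count) are hypothetical placeholders, not values recovered from this text. The helper names are also hypothetical.

```python
import math

def poisson_cdf(x, lam):
    """Exact P(X <= x) for X ~ Poisson(lam), summing the pmf term by term."""
    term = math.exp(-lam)   # P(X = 0)
    total = term
    for k in range(1, x + 1):
        term *= lam / k      # P(X = k) = P(X = k - 1) * lam / k
        total += term
    return total

def normal_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def poisson_lower_pvalue(x, lam0):
    """Exact and continuity-corrected normal p-values, P(X <= x | lam0)."""
    exact = poisson_cdf(x, lam0)
    approx = normal_cdf((x + 0.5 - lam0) / math.sqrt(lam0))
    return exact, approx

# Hypothetical numbers: historical mean 15, observed count 10.
exact, approx = poisson_lower_pvalue(10, 15.0)
print(f"exact = {exact:.4f}, normal approx = {approx:.4f}")
```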
From Exercise 7.… From Example 7.… We can pick the prior parameters so that the acceptance regions match in this way. For $H_0$: … The likelihood function is … Under $H_0$, the scale parameters of $W$ and $V$ are equal. Then, a simple generalization of Exercise 4.… The size is … From Corollary 8.… By Corollary 8.…
By Exercise 8.… By the Neyman-Pearson Lemma, the most powerful test of $H_0$: … So this family has MLR. Thus, the ratio is increasing in $x$, and the family has MLR. We will prove the result for continuous distributions, but it is also true for discrete MLR families. From Exercise 3.… The family does not have MLR. This family has MLR. From part (a), this ratio is increasing in $x$.
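A quick numerical sanity check of the monotone likelihood ratio property: for $\theta_2 > \theta_1$, the ratio $f(x \mid \theta_2)/f(x \mid \theta_1)$ should be nondecreasing in $x$. The sketch below checks this on a grid for the Poisson family, chosen here only as an example (its ratio is $(\theta_2/\theta_1)^x e^{-(\theta_2 - \theta_1)}$, which is increasing in $x$). The helper names are hypothetical, and a grid check can refute MLR but never prove it.

```python
import math

def poisson_pmf(x, lam):
    # Computed on the log scale to avoid overflow for larger x.
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))

def ratio_is_nondecreasing(pmf, theta1, theta2, xs):
    """True if pmf(x, theta2) / pmf(x, theta1) is nondecreasing over the grid xs."""
    ratios = [pmf(x, theta2) / pmf(x, theta1) for x in xs]
    return all(r1 <= r2 for r1, r2 in zip(ratios, ratios[1:]))

xs = range(0, 60)
print(ratio_is_nondecreasing(poisson_pmf, 2.0, 5.0, xs))  # True
print(ratio_is_nondecreasing(poisson_pmf, 5.0, 2.0, xs))  # False: parameters reversed
```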
Thus this … Thus, the family does not have MLR. Thus, the given test is UMP of its size. The test is not UMP for testing $H_0$: … Hence the ratio is not monotone. Hence the family has MLR. This is Example 8.… From Theorems 5.…, $Y_1, Y_n$ are sufficient statistics.
So we can attempt to find a UMP test using Corollary 8. Thus the given test is UMP by Corollary 8. So these conditions are satisfied for any n. This is Exercise 3.
This is Exercise 8.… We will use the equality in Exercise 3.… The argument that the noncentral t has an MLR is fairly involved. It may be found in Lehmann (…, p. …). The proof that the one-sided t test is UMP unbiased is rather involved, using the bounded completeness of the normal distribution and other facts. See Chapter 5 of Lehmann for a complete treatment. Again, see Chapter 5 of Lehmann. From Exercise 4.… The $W_i$s are independent because the pairs $(X_i, Y_i)$ are.
The hypotheses are equivalent to $H_0$: … Hence, the LRT is the two-sample t test. The two-sample t test is UMP unbiased, but the proof is rather involved. See Chapter 5 of Lehmann. So there is no evidence that the mean age differs between the core and periphery.
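Because the LRT reduces to the two-sample t test, the statistic itself is straightforward to compute; the companion F ratio for comparing variances is included too, with its degrees of freedom listed in (numerator, denominator) order. This sketch assumes normal populations with a common variance for the pooled t, and it uses made-up numbers, not the core/periphery data, which do not survive in this text. It returns statistics and degrees of freedom only; p-values would still need a t or F table or a library such as SciPy.

```python
import math
from statistics import mean, variance

def pooled_t(x, y):
    """Two-sample t statistic with pooled variance; returns (t, df).

    statistics.variance uses the n - 1 divisor.
    """
    n, m = len(x), len(y)
    sp2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n + 1 / m))
    return t, n + m - 2

def variance_ratio(x, y):
    """F statistic S_x^2 / S_y^2 with (numerator df, denominator df) = (n - 1, m - 1)."""
    return variance(x) / variance(y), (len(x) - 1, len(y) - 1)

# Hypothetical data for illustration only.
core = [12.1, 10.4, 11.8, 9.9, 12.5]
periphery = [10.2, 11.0, 9.5, 10.8]
t, df = pooled_t(core, periphery)
f, f_df = variance_ratio(core, periphery)
print(f"t = {t:.3f} on {df} df;  F = {f:.3f} on {f_df} df")
```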
Using the values in Exercise 8.…, the p-value is … There is no evidence that the mean age differs between the core and periphery. So there is some slight evidence that the variance differs between the core and periphery. (Note: early printings had a typo, with the numerator and denominator degrees of freedom switched.) That is, Test 3 is unbiased. Example 8.… Hence, the tests are all unbiased. This is very similar to the argument for Exercise 8.…
Use Theorem 8. By Theorem 8. First calculate the posterior density. The following table illustrates this.
So the indicated equality is true.
Chapter 9 Interval Estimation
From 7.… We now must establish that this set is indeed an interval. To do this, we establish that the function on the left-hand side of the inequality has only an interior maximum.
That is, it looks like an upside-down bowl. We make some further simplifications. Since this is the sign change of the derivative, the function must increase and then decrease. Hence, the function is an upside-down bowl, and the set is an interval. Analogous to Example 9.…, since $k(p)$ is nondecreasing, this gives an upper bound on $p$.
This is clearly a highest density region. The interval of … Typically, the second-degree and $n$th-degree polynomials will not have the same roots. Therefore, the two intervals are different. Then the two intervals are the same. Using the result of Exercise 8.… Using the results of Exercise 8.… The interval in part (a) is a special case of the one in part (b). Thus, the interval in part (a) is nonoptimal. A shorter interval with confidence coefficient … Recall the Bonferroni Inequality (1.…).
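For reference, the two-event form of the Bonferroni Inequality and its standard consequence for combining confidence sets are stated below. The symbols $C_1$, $C_2$, $\theta_1$, $\theta_2$, $\alpha_1$, $\alpha_2$ are introduced here for illustration; choosing $\alpha_1 = \alpha_2 = \alpha/2$ guarantees joint coverage of at least $1 - \alpha$.

```latex
P(A_1 \cap A_2) \ge P(A_1) + P(A_2) - 1

% With A_j = \{\theta_j \in C_j(\mathbf{X})\} and P(A_j) \ge 1 - \alpha_j:
P\big(\theta_1 \in C_1(\mathbf{X}),\ \theta_2 \in C_2(\mathbf{X})\big)
  \ge (1 - \alpha_1) + (1 - \alpha_2) - 1 = 1 - (\alpha_1 + \alpha_2)
```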
Use the interval 9.… Use the interval after 9.… This will happen if the test of $H_0$: … The LRT (see Example 8.…) … There are only two discrepancies.
This is Exercise 2.… The LRT statistic for $H_0$: … The values $a(y)$ and $b(y)$ are not expressible in closed form. This is an example of the effect of the imposition of one type of inference (frequentist) on another theory (likelihood). For the confidence interval in Example 9.… This confidence interval is … The two confidence intervals are virtually the same.
The LRT method derives its interval from the test of $H_0$: … To compare the intervals, we compare their lengths. We know from Example 7.… So no values of $a$ and $b$ will make the intervals match. To evaluate this probability we have two cases: … So the conditions of Theorem 9.… So moving the interval toward zero increases the probability, and it is therefore maximized by moving $a$ all the way to zero.
Using Theorem 8.… For Exercise 9.… The inequality follows directly from Definition 8.… The solution for the lower confidence interval is similar. Start with the hypothesis test $H_0$: … Arguing as in Example 8.… The LRT of $H_0$: … Thus, the values of $a$ and $b$ that give the minimum length interval must satisfy this along with the probability constraint. The confidence interval, say $I(s^2)$, will be unbiased if (Definition 9.…) …
Some algebra will establish … For those values of $K$, $C'$ dominates $C$.
Chapter 10 Asymptotic Evaluations
So by Theorem … Applying the formulas of Example 5.… The integral of $E T_n^2$ is unbounded near zero. Then we apply Theorem 5.… It is easiest to use the Mathematica code in Example A.… The MLE comes from differentiating the log likelihood …
The approximate variance is quite a pain to calculate. Now using Example 5.
There are some entries that are less than one; this is due to using an approximation for the MOM variance. For part (e), verifying the bootstrap identities can involve much painful algebra, but it can be made easier if we understand what the bootstrap sample space (the space of all $n^n$ bootstrap samples) looks like. Given a sample $x_1, x_2, \ldots$
The first column is 9 $x_1$s followed by 9 $x_2$s followed by 9 $x_3$s; the second column is 3 $x_1$s followed by 3 $x_2$s followed by 3 $x_3$s, then repeated; and so on.
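The layout of the $n^n$ bootstrap samples described above is exactly the lexicographic order produced by Python's itertools.product, which makes the bootstrap identities easy to check by brute force for small $n$. The sketch uses a hypothetical sample of size 3 and exact fractions, and checks two identities: the average of the bootstrap means equals $\bar{x}$, and their variance over all $n^n$ samples equals $\hat\sigma^2/n$ with $\hat\sigma^2 = n^{-1}\sum_i (x_i - \bar{x})^2$. Whether these are precisely the identities asked for in part (e) is an assumption.

```python
from fractions import Fraction
from itertools import product

def bootstrap_space(x):
    """All n**n bootstrap samples (draws with replacement) from x, in column layout.

    itertools.product varies the last position fastest, so column 1 holds
    n**(n-1) copies of x[0], then of x[1], ...; column 2 cycles in blocks of
    n**(n-2); and so on.
    """
    return list(product(x, repeat=len(x)))

x = [Fraction(1), Fraction(4), Fraction(7)]   # hypothetical sample, exact arithmetic
n = len(x)
samples = bootstrap_space(x)
means = [sum(s) / n for s in samples]

xbar = sum(x) / n
sigma2_hat = sum((xi - xbar) ** 2 for xi in x) / n   # divisor n, not n - 1

boot_mean = sum(means) / len(means)
boot_var = sum((m - boot_mean) ** 2 for m in means) / len(means)

print(len(samples))                                  # 27
print([int(s[0]) for s in samples[:10]])             # [1, 1, 1, 1, 1, 1, 1, 1, 1, 4]
print(boot_mean == xbar, boot_var == sigma2_hat / n)  # True True
```

Exact fractions are used so that the identities hold with equality rather than up to rounding.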
The general result should now be clear.
The correlation is … Here is R code (R is available free at http:…): … The output is a matrix with rows and columns labeled V1 and V2, … The bootstrap standard deviation is 0.… The histogram looks similar to the nonparametric bootstrap histogram, displaying a skewness to the left. Also, the approximate pdf of $r$ will be normal, hence symmetric.
The variance of X … Write … Again, the heavier tails favor the median. From the discussion preceding Example …, … The other limit can be calculated in a similar manner.
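The bootstrap standard deviation of the sample correlation was computed in R in the original solution. A rough Python analogue of the nonparametric version resamples the $(x_i, y_i)$ pairs with replacement and recomputes $r$ each time; a histogram of the replicates can then be inspected for skewness. The data below are hypothetical, B = 1000 is an arbitrary choice, and the function names are illustrative.

```python
import math
import random
from statistics import stdev

def pearson_r(pairs):
    """Sample correlation coefficient r of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    syy = sum((p[1] - my) ** 2 for p in pairs)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_sd_r(pairs, B=1000, seed=1):
    """Nonparametric bootstrap SD of r: resample pairs with replacement B times."""
    rng = random.Random(seed)
    reps = []
    while len(reps) < B:
        boot = rng.choices(pairs, k=len(pairs))
        # Skip degenerate resamples where x or y is constant (r undefined).
        if len({p[0] for p in boot}) > 1 and len({p[1] for p in boot}) > 1:
            reps.append(pearson_r(boot))
    return stdev(reps), reps

# Hypothetical paired data for illustration only.
data = [(1.0, 2.1), (2.0, 2.9), (3.0, 3.2), (4.0, 4.8), (5.0, 5.1),
        (6.0, 5.9), (7.0, 7.4), (8.0, 7.7), (9.0, 9.6), (10.0, 9.8)]
sd, reps = bootstrap_sd_r(data)
print(f"r = {pearson_r(data):.3f}, bootstrap SD = {sd:.3f}")
```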
One might argue that in hypothesis testing the first one should be used since, under $H_0$, it provides a better estimator of variance. If interest is in finding the confidence interval, however, we are making inference under both $H_0$ and $H_1$, and the second one is preferred.
Now the hypothesis is about conditional probabilities and is given by $H_0$: … The information number is $\sum_i x_i$ …
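As a concrete case of the two variance estimates contrasted above, the sketch assumes a binomial proportion (the surviving text does not name the model) and takes "the first one" to be the null-based $p_0(1-p_0)/n$ and "the second one" to be the MLE-based $\hat p(1-\hat p)/n$. The first gives the score statistic; the second gives the Wald statistic, whose inversion yields the interval $\hat p \pm z\sqrt{\hat p(1-\hat p)/n}$. Function names are hypothetical.

```python
import math

def score_and_wald(successes, n, p0):
    """Score and Wald z statistics for H0: p = p0 with X ~ binomial(n, p).

    Score: standard error evaluated under H0, sqrt(p0 (1 - p0) / n).
    Wald:  standard error evaluated at the MLE p_hat = successes / n
           (undefined when p_hat is 0 or 1).
    """
    p_hat = successes / n
    score = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    wald = (p_hat - p0) / math.sqrt(p_hat * (1 - p_hat) / n)
    return score, wald

def wald_interval(successes, n, z=1.96):
    """Approximate interval from inverting the Wald test (z = 1.96 for about 95%)."""
    p_hat = successes / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

print(score_and_wald(60, 100, 0.5))
print(wald_interval(60, 100))
```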
We test the equivalent hypothesis $H_0$: … The likelihood is the same as in Exercise … We assume that the underlying distribution is normal, and use that for all score calculations. The actual data are generated from normal, logistic, and double exponential distributions.
The sample size is 15; we use simulations and draw 20 bootstrap samples. Boot: 0.… Median: 0.…
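The flavor of this simulation can be reproduced with a short Monte Carlo sketch comparing the sampling variances of the mean and the median at n = 15 under the three distributions named above. This is not the original study, which also involved bootstrap samples and score calculations; the replication count, the unit-scale parameterizations, and the helper names are assumptions. Under the heavier-tailed double exponential, the median is expected to come out ahead, in line with the earlier remark that heavier tails favor the median.

```python
import math
import random
from statistics import median, pvariance

def draw(dist, rng):
    """One draw centered at 0 from 'normal', 'logistic', or 'dexp' (unit scale)."""
    if dist == "normal":
        return rng.gauss(0.0, 1.0)
    u = rng.random()
    while u == 0.0:  # avoid log(0) in the inverse-cdf formulas
        u = rng.random()
    if dist == "logistic":
        return math.log(u / (1.0 - u))  # inverse logistic cdf
    if dist == "dexp":
        # inverse cdf of the double exponential (Laplace)
        return math.log(2 * u) if u < 0.5 else -math.log(2 * (1 - u))
    raise ValueError(dist)

def mean_vs_median(dist, n=15, reps=10_000, seed=2):
    """Monte Carlo variances of the sample mean and the sample median."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        xs = [draw(dist, rng) for _ in range(n)]
        means.append(sum(xs) / n)
        medians.append(median(xs))
    return pvariance(means), pvariance(medians)

for dist in ("normal", "logistic", "dexp"):
    v_mean, v_med = mean_vs_median(dist)
    print(f"{dist:8s} var(mean) = {v_mean:.4f}  var(median) = {v_med:.4f}")
```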