Aggregated data and High Dimension Anomalies


1.1 Purpose
Homework 7 is meant to give you some practice with WLS and to let you explore high-dimensional estimation problems.
Any line starting with "Side note" is something for you to think about; you do not need to answer it.
1.2 Weighted Least Squares
Let's create data at the individual level according to the linear regression model $Y = X\beta + \epsilon$ with the following specifications:
• Let $\epsilon_i \overset{ind}{\sim} \text{Normal}(0, 20)$.
• Let the sample size, $n$, be 1000.
• Let $\beta = (0, 1.2)^T$ (the first parameter is for the constant feature).
• Let's assign each sample to one of 20 mutually exclusive groups according to the following multinomial distribution: $P(i \in \text{Group } k) \propto \sqrt{1/k}$, where $k \in \{1, \dots, 20\}$.
• Define the non-constant feature as $X_i \overset{ind}{\sim} \text{Binomial}(n = 100, p = 0.5)$ for all $i$.
Create the aggregated data by simply averaging the data points within each group for the X and Y values. Let's call these aggregated data values $\bar{X}_k$ and $\bar{Y}_k$. Let the vectorized version of the data be denoted as $\bar{Y}$ and let

$$\bar{X} = \begin{pmatrix} 1 & \bar{X}_1 \\ \vdots & \vdots \\ 1 & \bar{X}_{20} \end{pmatrix}.$$
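A minimal simulation sketch of the setup above may help when working through the questions. It assumes that "Normal(0, 20)" means variance 20, so the standard deviation is $\sqrt{20}$. The function names `simulate_individual` and `aggregate` and the fixed seed are illustrative choices, not part of the assignment.

```python
import numpy as np

def simulate_individual(n=1000, n_groups=20, beta=(0.0, 1.2), noise_var=20.0, seed=0):
    """Individual-level data for section 1.2 (sketch; Normal(0, 20) read as variance 20)."""
    rng = np.random.default_rng(seed)
    k = np.arange(1, n_groups + 1)
    probs = np.sqrt(1.0 / k)
    probs /= probs.sum()                                   # P(i in group k) proportional to sqrt(1/k)
    groups = rng.choice(n_groups, size=n, p=probs)         # 0-based labels: label g is group k = g + 1
    x = rng.binomial(100, 0.5, size=n).astype(float)       # X_i ~ Binomial(100, 0.5)
    eps = rng.normal(0.0, np.sqrt(noise_var), size=n)      # normal() takes a standard deviation
    y = beta[0] + beta[1] * x + eps
    return x, y, groups

def aggregate(x, y, groups, n_groups=20):
    """Group means Xbar_k, Ybar_k, the 20 x 2 design matrix Xbar, and group sizes n_k."""
    n_k = np.bincount(groups, minlength=n_groups)          # assumes every group is non-empty
    x_bar = np.bincount(groups, weights=x, minlength=n_groups) / n_k
    y_bar = np.bincount(groups, weights=y, minlength=n_groups) / n_k
    X_bar = np.column_stack([np.ones(n_groups), x_bar])    # constant column + group means
    return X_bar, y_bar, n_k

x, y, groups = simulate_individual()
X_bar, y_bar, n_k = aggregate(x, y, groups)
print(X_bar.shape, n_k.sum())
```

Keeping `n_k` alongside the group means matters later, because WLS on the aggregate data needs the group sizes.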
1.2.1 Q0
TRUE/FALSE: the aggregated data will always have a higher correlation between X and Y than the correlation at the individual level. Note: ignore the constant feature.
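To experiment with Q0, the sketch below computes both correlations from individual data and group labels. The helper name `individual_vs_aggregated_corr` and the toy numbers are hypothetical. A single example cannot settle an "always" statement either way, so try it on your simulated data as well.

```python
import numpy as np

def individual_vs_aggregated_corr(x, y, groups):
    """Return (corr(X, Y) over individuals, corr(Xbar_k, Ybar_k) over group means).

    The constant feature plays no role in either correlation.
    """
    _, inv = np.unique(groups, return_inverse=True)   # map arbitrary labels to 0..G-1
    n_k = np.bincount(inv)
    x_bar = np.bincount(inv, weights=x) / n_k
    y_bar = np.bincount(inv, weights=y) / n_k
    r_ind = np.corrcoef(x, y)[0, 1]
    r_agg = np.corrcoef(x_bar, y_bar)[0, 1]
    return r_ind, r_agg

# Toy illustration with hypothetical numbers (not from the assignment)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 2.0, 4.0])
groups = np.array([0, 0, 1, 1])
r_ind, r_agg = individual_vs_aggregated_corr(x, y, groups)
print(r_ind, r_agg)
```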
1.2.2 Q1
TRUE/FALSE: $\bar{Y} = \bar{X}\beta + \gamma$, where $\gamma$ only depends on
1.2.3 Q2
Let $\gamma = \bar{Y} - \bar{X}\beta$. TRUE/FALSE: $E(\gamma \mid \bar{X}) = 0$.

1.2.4 Q3
What are the analytical expressions for $\text{Var}(\gamma_k \mid X)$ and $\text{Cov}(\gamma_k, \gamma_m \mid X)$, where $k \neq m$? Please express the solution in terms of $\text{Var}(\epsilon) = \sigma^2$ and the sample sizes of the different groups. You should assume the group assignments are given.
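You can check your analytical answer with a Monte Carlo sketch. It holds X and the group assignment fixed and redraws only the noise. Because $\bar{Y}_k - \bar{X}_k\beta$ is the group-$k$ average of $Y_i - X_i\beta = \epsilon_i$, only $\epsilon$ needs simulating. The function name `simulate_gamma`, the 5000 replications, and reading $\sigma^2 = 20$ as a variance are all assumptions.

```python
import numpy as np

def simulate_gamma(groups, n_groups, sigma2, n_reps=5000, seed=1):
    """Monte Carlo draws of gamma with X and the group assignment held fixed.

    gamma_k = Ybar_k - Xbar_k beta equals the group-k average of eps, so only eps is redrawn.
    """
    rng = np.random.default_rng(seed)
    n_k = np.bincount(groups, minlength=n_groups)
    gammas = np.empty((n_reps, n_groups))
    for r in range(n_reps):
        eps = rng.normal(0.0, np.sqrt(sigma2), size=groups.size)
        gammas[r] = np.bincount(groups, weights=eps, minlength=n_groups) / n_k
    return gammas, n_k

# Fixed group assignment drawn once, with P(group k) proportional to sqrt(1/k)
k = np.arange(1, 21)
probs = np.sqrt(1.0 / k) / np.sqrt(1.0 / k).sum()
groups = np.random.default_rng(0).choice(20, size=1000, p=probs)
gammas, n_k = simulate_gamma(groups, n_groups=20, sigma2=20.0)
emp_cov = np.cov(gammas, rowvar=False)   # 20 x 20: diagonal ~ Var(gamma_k), off-diagonal ~ Cov(gamma_k, gamma_m)
print(np.round(np.diag(emp_cov)[:5], 3), n_k[:5])
```

Compare the diagonal and off-diagonal entries of `emp_cov` with your expressions, evaluated at the printed group sizes.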
1.2.5 Q4
TRUE/FALSE, using OLS on the aggregated data will produce unbiased estimates for β.
1.2.6 Q5
TRUE/FALSE, using OLS on the aggregated data vs using OLS on the individual level data will produce the same exact estimates for β.
1.2.7 Q6
If we only had access to the aggregate data, please produce the point-wise
95% confidence interval for β if we used OLS (i.e. pretend the variances are
constant) and compare that to the interval created using WLS (i.e. the correct
calculation).
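A sketch of the two interval calculations for Q6 on aggregate data follows. The WLS weights are taken to be the group sizes $w_k = n_k$, which is what a $\text{Var}(\gamma_k \mid X) \propto 1/n_k$ answer to Q3 would imply; treat that as an assumption to check against your own result. WLS is implemented as OLS on rows rescaled by $\sqrt{w_k}$, a standard equivalence. The helper names `ols_ci` and `wls_ci` are hypothetical, and the data use variance-20 noise.

```python
import numpy as np
from scipy import stats

def ols_ci(X, y, level=0.95):
    """OLS estimate with point-wise t intervals, assuming constant error variance."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)
    se = np.sqrt(s2 * np.diag(XtX_inv))
    t_crit = stats.t.ppf(0.5 + level / 2, df=n - p)
    return beta, np.column_stack([beta - t_crit * se, beta + t_crit * se])

def wls_ci(X, y, w, level=0.95):
    """WLS with Var(error_k) proportional to 1 / w_k: OLS on rows rescaled by sqrt(w_k)."""
    sw = np.sqrt(w)
    return ols_ci(X * sw[:, None], y * sw, level)

# Aggregate data from the section 1.2 setup
rng = np.random.default_rng(0)
k = np.arange(1, 21)
probs = np.sqrt(1.0 / k) / np.sqrt(1.0 / k).sum()
g = rng.choice(20, size=1000, p=probs)
x = rng.binomial(100, 0.5, size=1000).astype(float)
y = 1.2 * x + rng.normal(0.0, np.sqrt(20.0), size=1000)     # beta = (0, 1.2)
n_k = np.bincount(g, minlength=20)
X_bar = np.column_stack([np.ones(20), np.bincount(g, weights=x, minlength=20) / n_k])
y_bar = np.bincount(g, weights=y, minlength=20) / n_k

beta_ols, ci_ols_agg = ols_ci(X_bar, y_bar)
beta_wls, ci_wls_agg = wls_ci(X_bar, y_bar, n_k.astype(float))   # weights w_k = n_k (assumption)
print(ci_ols_agg, ci_wls_agg, sep="\n")
```

The rescaling trick keeps one code path. The residual variance from the rescaled fit is the weighted residual sum of squares divided by $20 - 2$.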
1.2.8 Q7
Continuing Q6, which one would you recommend using?
1.2.9 Q8
Compute the point-wise 95% confidence interval for β using the individual level
data using OLS.
Side note: you should ask yourself whether using the individual data is always preferable, despite the calculation from Q3.
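For Q8, a sketch of the individual-level OLS interval is below. It uses the textbook formula $\hat\beta \pm t_{0.975, n-2}\,\hat{s}\sqrt{[(X^TX)^{-1}]_{jj}}$ on simulated data with variance-20 noise (an assumption). Group labels are not needed for this fit.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 1000
x = rng.binomial(100, 0.5, size=n).astype(float)
y = 1.2 * x + rng.normal(0.0, np.sqrt(20.0), size=n)   # beta = (0, 1.2)

X = np.column_stack([np.ones(n), x])                   # constant feature + X
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
s2 = resid @ resid / (n - 2)                           # unbiased estimate of sigma^2
se = np.sqrt(s2 * np.diag(XtX_inv))
t_crit = stats.t.ppf(0.975, df=n - 2)
ci_individual = np.column_stack([beta_hat - t_crit * se, beta_hat + t_crit * se])
print(ci_individual)
```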
For the following problems, let's change the data generation process slightly: for sample $i$ in group $k$, let $X_i \overset{ind}{\sim} \text{Binomial}\left(n = 100, p = \frac{k - 10}{200} + 0.5\right)$, i.e. group 1 is distributed according to $p = \frac{-9}{200} + 0.5$, group 2 has $p = \frac{-8}{200} + 0.5$, etc. There are still 20 groups.
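A sketch of the modified process follows. It assumes everything except the distribution of $X_i$ is unchanged from before: group probabilities, $\beta$, and variance-20 noise. The function name `simulate_neighborhoods` is hypothetical.

```python
import numpy as np

def simulate_neighborhoods(n=1000, n_groups=20, beta=(0.0, 1.2), noise_var=20.0, seed=0):
    """Modified process: X_i ~ Binomial(100, (k - 10)/200 + 0.5) for sample i in group k.

    Everything else (group probabilities, beta, noise) is assumed unchanged from section 1.2.
    """
    rng = np.random.default_rng(seed)
    k = np.arange(1, n_groups + 1)
    probs = np.sqrt(1.0 / k) / np.sqrt(1.0 / k).sum()
    groups = rng.choice(n_groups, size=n, p=probs)     # 0-based: label g is group k = g + 1
    p_x = (groups + 1 - 10) / 200 + 0.5                # group 1 -> -9/200 + 0.5, group 2 -> -8/200 + 0.5, ...
    x = rng.binomial(100, p_x).astype(float)           # p broadcasts elementwise
    y = beta[0] + beta[1] * x + rng.normal(0.0, np.sqrt(noise_var), size=n)
    return x, y, groups

x, y, groups = simulate_neighborhoods()
print(np.round([x[groups == g].mean() for g in range(20)], 1))
```

The printed group means should drift upward with $k$. That drift is what distinguishes this setup from the earlier one, where every group shared $p = 0.5$.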
Side note: you can imagine the groups are different neighborhoods. X is your parents' income when you were born and Y is the base salary of your first job (all in weird units).
1.2.10 Q9
Compare the point-wise 95% confidence interval for $\beta_1$ using OLS on the individual-level data vs the method chosen in Q7 on the aggregate data. Which one would you recommend?
1.2.11 Q10
Using the individual-level data and OLS, please write the code that produces the point-wise 95% prediction interval for new Y values at each hypothetical X value $0, 1, \dots, 100$. Please center the interval on the regression line. No need to report numbers; the code alone is sufficient.
Again, the prediction interval is the interval that will capture 95% of the cases when predicting new data points.
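A sketch for Q10 using the standard OLS prediction interval follows: $x_0^T\hat\beta \pm t_{0.975, n-2}\,\hat{s}\sqrt{1 + x_0^T(X^TX)^{-1}x_0}$. The helper name `ols_prediction_intervals` is hypothetical. The data follow the modified (neighborhood) process with variance-20 noise, which is an assumption.

```python
import numpy as np
from scipy import stats

def ols_prediction_intervals(x, y, x_new, level=0.95):
    """Point-wise OLS prediction intervals for new individual Y at each x_new.

    Centered on the fitted line: x0'b +/- t * s * sqrt(1 + x0'(X'X)^{-1} x0).
    """
    X = np.column_stack([np.ones_like(x), x])
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - p)
    X0 = np.column_stack([np.ones_like(x_new), x_new])
    center = X0 @ beta_hat
    leverage = np.einsum("ij,jk,ik->i", X0, XtX_inv, X0)   # row-wise x0'(X'X)^{-1} x0
    half = stats.t.ppf(0.5 + level / 2, df=n - p) * np.sqrt(s2 * (1.0 + leverage))
    return center - half, center, center + half

# Individual-level data under the modified (neighborhood) process
rng = np.random.default_rng(0)
k = np.arange(1, 21)
probs = np.sqrt(1.0 / k) / np.sqrt(1.0 / k).sum()
g = rng.choice(20, size=1000, p=probs)
x = rng.binomial(100, (g + 1 - 10) / 200 + 0.5).astype(float)
y = 1.2 * x + rng.normal(0.0, np.sqrt(20.0), size=1000)

x_grid = np.arange(0, 101, dtype=float)                      # hypothetical X = 0, 1, ..., 100
lower, center, upper = ols_prediction_intervals(x, y, x_grid)
```

The `1.0 +` inside the square root is what turns a confidence interval for the mean into a prediction interval for a new observation.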
1.2.12 Q11
For this problem, assume you only have access to the aggregate data.
Side note: if you were to create a prediction interval based on the aggregate data, you would need $\bar{X}_{new}$ AND its corresponding group size (notice how WLS assumes the weights are known). Applying such intervals to individuals is how ecological correlation mistakes are made.
Instead of creating an interval for $\bar{Y}_{new}$, let's create an interval for $Y_{new} \mid \{X_{new}, \bar{X}\}$. Compute an interval that uses $\text{Var}(Y_{new} - X_{new}\hat\beta_{wls} \mid \bar{X}, X_{new})$, estimates $\hat\sigma^2$ under our WLS setting, and is centered at $X_{new}\hat\beta_{wls}$. Please create a plot that compares this interval to the interval implied by your code from Q10.
Side note: think about what is specific about this setup that allows us to do this. Is this calculation valid for all WLS settings?
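One way to compute the Q11 interval is sketched below, under stated assumptions. It takes $\text{Var}(\gamma_k \mid X) = \sigma^2/n_k$, so the weights are $n_k$, the weighted residual variance estimates the individual-level $\sigma^2$, and $\text{Var}(\hat\beta_{wls}) = \sigma^2(\bar{X}^TW\bar{X})^{-1}$. It also uses a t quantile with $20 - 2$ degrees of freedom. The function name `wls_individual_intervals` is hypothetical. Plotting is left to whichever library you use; overlay `lower`/`upper` from this sketch on the Q10 intervals.

```python
import numpy as np
from scipy import stats

def wls_individual_intervals(X_bar, y_bar, n_k, x_new, level=0.95):
    """Interval for a new individual's Y built only from aggregate data (sketch).

    Assumes Var(gamma_k | X) = sigma^2 / n_k, so weights are w_k = n_k and the
    weighted residual variance estimates the individual-level sigma^2.
    """
    w = n_k.astype(float)
    XtWX_inv = np.linalg.inv(X_bar.T @ (w[:, None] * X_bar))
    beta_wls = XtWX_inv @ X_bar.T @ (w * y_bar)
    resid = y_bar - X_bar @ beta_wls
    G, p = X_bar.shape
    sigma2_hat = np.sum(w * resid ** 2) / (G - p)
    X0 = np.column_stack([np.ones_like(x_new), x_new])
    center = X0 @ beta_wls
    # Var(Y_new - x0' beta_wls) = sigma^2 (1 + x0'(X_bar' W X_bar)^{-1} x0) for a new, independent individual
    leverage = np.einsum("ij,jk,ik->i", X0, XtWX_inv, X0)
    half = stats.t.ppf(0.5 + level / 2, df=G - p) * np.sqrt(sigma2_hat * (1.0 + leverage))
    return center - half, center, center + half

# Aggregate data under the modified (neighborhood) process; noise variance 20 assumed
rng = np.random.default_rng(0)
k = np.arange(1, 21)
probs = np.sqrt(1.0 / k) / np.sqrt(1.0 / k).sum()
g = rng.choice(20, size=1000, p=probs)
x = rng.binomial(100, (g + 1 - 10) / 200 + 0.5).astype(float)
y = 1.2 * x + rng.normal(0.0, np.sqrt(20.0), size=1000)
n_k = np.bincount(g, minlength=20)
X_bar = np.column_stack([np.ones(20), np.bincount(g, weights=x, minlength=20) / n_k])
y_bar = np.bincount(g, weights=y, minlength=20) / n_k

x_grid = np.arange(0, 101, dtype=float)
lower, center, upper = wls_individual_intervals(X_bar, y_bar, n_k, x_grid)
```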
1.3 NOT-James-Stein’s estimator
Let's define the MSE in estimating a high-dimensional vector $\beta$ using an estimate $\hat\beta$ as $E(\|\beta - \hat\beta\|^2)$.
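When the expectation cannot be computed in closed form, it can be approximated by averaging squared errors over simulated estimates. The helper `monte_carlo_mse` below is a hypothetical sketch of that definition, with hypothetical toy numbers.

```python
import numpy as np

def monte_carlo_mse(beta, estimates):
    """Approximate E(||beta - beta_hat||^2) by averaging over rows of `estimates`.

    Each row of `estimates` is one simulated beta_hat; `beta` is the true vector.
    """
    estimates = np.atleast_2d(estimates)
    return np.mean(np.sum((estimates - beta) ** 2, axis=1))

beta = np.array([1.0, 2.0])
estimates = np.array([[1.0, 2.0], [0.0, 0.0]])   # hypothetical simulated estimates
print(monte_carlo_mse(beta, estimates))           # (0 + 5) / 2 = 2.5
```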
1.3.1 Q12
What is the theoretical MSE if we estimated arbitrary β with the vector of 0’s?
Side note: do not overthink. This is just to show anything CAN be an estimate for anything.
1.3.2 Q13
Under the usual regression settings, create the biased estimate $\hat\beta_\gamma = \gamma \hat\beta_{OLS}$, where $\hat\beta_{OLS}$ is the coefficient estimate from the regression. Calculate the theoretical mean squared error for $\gamma\hat\beta_{OLS}$. Express the result in terms of $\gamma$, $\beta$, $X$, and $\sigma^2$ and simplify as much as possible.
Side note: you should know why this isn't very useful in practice: $\beta$ and $\sigma^2$ are unknown.
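One way to organize the calculation is the bias-variance decomposition $E\|Z\|^2 = \|E(Z)\|^2 + \operatorname{tr}\operatorname{Var}(Z)$ applied to $Z = \beta - \gamma\hat\beta_{OLS}$. The sketch below assumes fixed $X$, $E(\epsilon) = 0$, and $\text{Var}(\epsilon) = \sigma^2 I$, so that $E(\hat\beta_{OLS}) = \beta$ and $\text{Var}(\hat\beta_{OLS}) = \sigma^2(X^TX)^{-1}$:

$$
\begin{aligned}
E\|\beta - \gamma\hat\beta_{OLS}\|^2
  &= \|\beta - \gamma E(\hat\beta_{OLS})\|^2 + \operatorname{tr}\operatorname{Var}(\gamma\hat\beta_{OLS}) \\
  &= (1-\gamma)^2\|\beta\|^2 + \gamma^2\sigma^2\operatorname{tr}\big((X^TX)^{-1}\big).
\end{aligned}
$$

The first term is the squared bias introduced by shrinking. The second is the total variance, which shrinking reduces by the factor $\gamma^2$.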
1.3.3 Q14
Let $Y = X\beta + \epsilon$, where $\beta$ is the 0 vector. Let $\epsilon \sim N(0, 10)$ and $n = 1000$, and build $X$ from 99 random features, each drawn from a uniform random variable between 0 and 1, plus 1 constant feature. Let $\hat\beta_{OLS}$ be the usual regression estimate. Using the result above with your simulated X values, write the code AND report the smallest value for $\gamma$ before the MSE starts to increase again.
Side note: this is intentionally similar to the simultaneous inference case.
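A sketch for Q14 follows. It evaluates the bias-variance form $(1-\gamma)^2\|\beta\|^2 + \gamma^2\sigma^2\operatorname{tr}((X^TX)^{-1})$ on a grid of $\gamma$ values using one simulated $X$. Three choices here are assumptions: the grid $[0, 1]$ with step 0.001, reading $N(0, 10)$ as variance 10, and the helper name `theoretical_mse`.

```python
import numpy as np

def theoretical_mse(gamma, beta, X, sigma2):
    """Bias-variance form: (1 - gamma)^2 ||beta||^2 + gamma^2 sigma^2 tr((X'X)^{-1})."""
    trace_term = np.trace(np.linalg.inv(X.T @ X))
    return (1.0 - gamma) ** 2 * (beta @ beta) + gamma ** 2 * sigma2 * trace_term

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, size=(n, 99))])  # 1 constant + 99 uniform features
beta = np.zeros(X.shape[1])                                             # beta is the 0 vector
sigma2 = 10.0                                                           # N(0, 10) read as variance 10 (assumption)

gammas = np.linspace(0.0, 1.0, 1001)                                    # grid over gamma
mse = theoretical_mse(gammas, beta, X, sigma2)
gamma_best = gammas[np.argmin(mse)]
print(gamma_best, mse[np.argmin(mse)])
```

The trace term is computed once from the simulated $X$, so the whole curve costs a single matrix inversion.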
1.3.4 Q15
To shrink a vector $Z$ to the origin (i.e. the vector of all 0s), we can multiply $Z$ by $\gamma \in [0, 1)$. However, we can also shrink $Z$ toward an arbitrary vector $\mu$ by calculating $\gamma(Z - \mu) + \mu$.
Same as Q14, let $Y = X\beta + \epsilon$, where $\beta$ is the 0 vector. Let $\epsilon \sim N(0, 10)$ and $n = 1000$, and build $X$ from 99 random features, each drawn from a uniform random variable between 0 and 1, plus 1 constant feature. Let $\hat\beta_{OLS}$ be the usual regression estimate. Shrink $\hat\beta_{OLS}$ towards $\mu = 2$, i.e. a vector containing 2's, with $\gamma = 0.99$. Numerically approximate the MSE over 100 simulations for the shrinkage estimator and the OLS estimator. Then report which estimator you would prefer if you're optimizing MSE for estimating $\beta$.
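A simulation sketch for Q15 follows. It keeps one design matrix fixed across the 100 simulations and redraws only the noise; redrawing $X$ each time is an equally reasonable reading. It also treats $N(0, 10)$ as variance 10. The helper name `shrink` is hypothetical and implements $\gamma(Z - \mu) + \mu$ as defined above.

```python
import numpy as np

def shrink(z, gamma, mu):
    """Shrink z toward mu: gamma * (z - mu) + mu."""
    return gamma * (z - mu) + mu

rng = np.random.default_rng(0)
n, p = 1000, 100
X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, size=(n, p - 1))])  # fixed design (assumption)
beta = np.zeros(p)                    # true beta is the 0 vector
mu = np.full(p, 2.0)                  # shrinkage target: vector of 2's
gamma = 0.99

sq_err_ols, sq_err_shrink = [], []
for _ in range(100):
    y = X @ beta + rng.normal(0.0, np.sqrt(10.0), size=n)
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    sq_err_ols.append(np.sum((beta - beta_ols) ** 2))
    sq_err_shrink.append(np.sum((beta - shrink(beta_ols, gamma, mu)) ** 2))

print("OLS MSE:", np.mean(sq_err_ols))
print("Shrink MSE:", np.mean(sq_err_shrink))
```

Both estimators are evaluated on the same noise draws, which makes the comparison less noisy than two independent sets of simulations.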
