Aggregated data and High Dimension Anomalies.
1.1 Purpose
Homework 7 is meant to give you some practice with weighted least squares (WLS)
and to let you play around with high-dimensional estimation problems.
Any line starting with "Side note" is something for you to think about; you do
not need to answer it.
1.2 Weighted Least Squares
Let's create data at the individual level according to the linear regression model
$Y = X\beta + \epsilon$ with the following specifications:
• Let $\epsilon_i \overset{ind}{\sim} \mathrm{Normal}(0, 20)$.
• Let the sample size, n, be 1000.
• Let $\beta = (0, 1.2)^T$ (the first parameter is for the constant feature).
⢠letâs assign each sample to 20 exclusive groups according to the following
multinomial distribution: P(i â Group k) â â
1
k
where k â {1, . . . , 20}
⢠Define the non-constant feature, Xi
indâ¼ Binomial(n = 100, p = 0.5), âi
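The specification above can be simulated roughly as follows. This is only a sketch under stated assumptions: the function name `simulate_individual` and its parameters are hypothetical, Normal(0, 20) is read as a variance of 20 (not a standard deviation), and the group weights default to being proportional to sqrt(1/k). Pass `group_weights` explicitly if a different weighting, such as 1/k, is intended.

```python
import numpy as np

def simulate_individual(n=1000, n_groups=20, beta=(0.0, 1.2),
                        noise_var=20.0, group_weights=None, seed=0):
    """Simulate individual-level data; returns (group, x, y) arrays of length n."""
    rng = np.random.default_rng(seed)
    k = np.arange(1, n_groups + 1)
    # ASSUMPTION: unnormalised group weights are sqrt(1/k) unless supplied.
    w = np.sqrt(1.0 / k) if group_weights is None else np.asarray(group_weights, float)
    group = rng.choice(k, size=n, p=w / w.sum())
    # Non-constant feature: Binomial(100, 0.5), independent of the group.
    x = rng.binomial(100, 0.5, size=n)
    # ASSUMPTION: the second argument of Normal(0, 20) is a variance.
    eps = rng.normal(0.0, np.sqrt(noise_var), size=n)
    y = beta[0] + beta[1] * x + eps
    return group, x, y
```

Fixing `seed` keeps the simulated data reproducible across the later questions.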
Create the aggregated data by simply averaging the data points within each
group for the X and Y values. Let's call these aggregated data values $\bar{X}_k$
and $\bar{Y}_k$. Let the vectorized version of the data be denoted as $\bar{Y}$ and let
$\bar{X} = \begin{bmatrix} 1 & \bar{X}_1 \\ \vdots & \vdots \\ 1 & \bar{X}_{20} \end{bmatrix}$
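The averaging step can be sketched as below. The helper name `aggregate_by_group` is hypothetical; it also returns the group sizes, which are not part of the aggregated data as defined above but are needed later whenever weights come into play.

```python
import numpy as np

def aggregate_by_group(group, x, y):
    """Average x and y within each group.

    Returns (labels, counts, X_bar, y_bar), where X_bar has a leading
    column of ones for the constant feature.
    """
    group = np.asarray(group)
    labels, inverse, counts = np.unique(group, return_inverse=True,
                                        return_counts=True)
    # bincount with weights sums x (or y) within each group.
    x_bar = np.bincount(inverse, weights=np.asarray(x, float)) / counts
    y_bar = np.bincount(inverse, weights=np.asarray(y, float)) / counts
    X_bar = np.column_stack([np.ones(len(labels)), x_bar])
    return labels, counts, X_bar, y_bar
```

A group that receives no samples simply does not appear in `labels`, so it is worth checking that all 20 groups are present before continuing.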
1.2.1 Q0
TRUE/FALSE, the aggregated data will always have a higher correlation between X and Y than the correlation at the individual level. Note, ignore the
constant feature.
1.2.2 Q1
TRUE/FALSE, $\bar{Y} = \bar{X}\beta + \gamma$ where $\gamma$ only depends on $\epsilon$
1.2.3 Q2
Let $\gamma = \bar{Y} - \bar{X}\beta$. TRUE/FALSE, $E(\gamma \mid \bar{X}) = 0$
1.2.4 Q3
What is the analytical expression for $\mathrm{Var}(\gamma_k \mid X)$ and $\mathrm{Cov}(\gamma_k, \gamma_m \mid X)$ where $k \neq m$?
Please express the solution in terms of $\mathrm{Var}(\epsilon) = \sigma^2$ and the sample sizes of
the different groups. You should assume the group assignments are given.
1.2.5 Q4
TRUE/FALSE, using OLS on the aggregated data will produce unbiased estimates for β.
1.2.6 Q5
TRUE/FALSE, using OLS on the aggregated data vs using OLS on the individual level data will produce the same exact estimates for β.
1.2.7 Q6
If we only had access to the aggregate data, please produce the point-wise
95% confidence interval for β if we used OLS (i.e. pretend the variances are
constant) and compare that to the interval created using WLS (i.e. the correct
calculation).
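One way to set up this comparison is sketched below, under assumptions. The function name `ls_conf_int` is hypothetical. Passing `weights=None` gives OLS; passing the group sizes as weights gives WLS, which relies on the fact that a mean of n_k independent errors has variance σ²/n_k. The default critical value 2.101 is the 0.975 quantile of a t distribution with 18 degrees of freedom (20 groups minus 2 coefficients) and should be changed if the number of groups differs.

```python
import numpy as np

def ls_conf_int(X, y, weights=None, t_crit=2.101):
    """Point-wise confidence intervals for (weighted) least squares.

    Returns (beta_hat, ci), where ci[j] = (lower, upper) for coefficient j.
    """
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    # weights=None means equal weights, i.e. ordinary least squares.
    w = np.ones(len(y)) if weights is None else np.asarray(weights, float)
    XtWX_inv = np.linalg.inv(X.T @ (w[:, None] * X))
    beta_hat = XtWX_inv @ X.T @ (w * y)
    resid = y - X @ beta_hat
    # Weighted residual variance estimate with n - p degrees of freedom.
    sigma2_hat = np.sum(w * resid ** 2) / (len(y) - X.shape[1])
    se = np.sqrt(sigma2_hat * np.diag(XtWX_inv))
    ci = np.column_stack([beta_hat - t_crit * se, beta_hat + t_crit * se])
    return beta_hat, ci
```

Calling it once with `weights=None` and once with the group sizes on the same aggregated data gives the two intervals to compare.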
1.2.8 Q7
Continuing Q6, which one would you recommend using?
1.2.9 Q8
Compute the point-wise 95% confidence interval for β using the individual level
data using OLS.
Side note, you should wonder if using the individual data is always preferable
despite the calculation from Q3.
For the following problems, let's change the data generation process slightly:
let $X_i \overset{ind}{\sim} \mathrm{Binomial}(n = 100, p = \frac{k-10}{200} + 0.5)$, i.e. group 1 is distributed
according to $p = \frac{-9}{200} + 0.5$, group 2 has $p = \frac{-8}{200} + 0.5$, etc. There are still 20
groups.
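The modified feature distribution can be sketched as follows. The helper names `group_success_prob` and `simulate_x_by_group` are hypothetical, and `group` is assumed to be an array of group labels in 1..20, one per individual.

```python
import numpy as np

def group_success_prob(k):
    """Success probability for group k: (k - 10) / 200 + 0.5."""
    return (np.asarray(k, float) - 10.0) / 200.0 + 0.5

def simulate_x_by_group(group, seed=0):
    """Draw X_i ~ Binomial(100, p_k), where k is the group of individual i."""
    rng = np.random.default_rng(seed)
    # NumPy broadcasts the per-individual probabilities over the draws.
    return rng.binomial(100, group_success_prob(group))
```

Because the success probability now varies with the group, the group means of X are spread out more than under the common p = 0.5.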
Side note, you can imagine the groups are different neighborhoods. X is your
parents' income when you were born and Y is the base salary of your first job
(all in weird units).
1.2.10 Q9
Compare the point-wise 95% confidence interval for $\beta_1$ using OLS on the individual level data vs the method chosen in Q7 with the aggregate data. Which
one would you recommend?
1.2.11 Q10
Using the individual level data and OLS, please write the code that produces the
point-wise 95% prediction interval for new Y values for each hypothetical
X value, 0, 1, . . . , 100. Please center the interval on the regression line. No
need to report numbers, just the code is sufficient.
Again, the prediction interval is the interval that will capture 95% of the
cases when predicting new data points.
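A minimal sketch of such an interval, under assumptions: the function name `ols_prediction_interval` is hypothetical, and the critical value is taken from the standard normal distribution as a large-sample stand-in for the t quantile with n - 2 degrees of freedom, which is very close when n = 1000. The half-width uses the usual prediction variance, σ̂²(1 + x₀ᵀ(XᵀX)⁻¹x₀).

```python
from statistics import NormalDist
import numpy as np

def ols_prediction_interval(x, y, x_new, level=0.95):
    """Prediction interval for new Y at each x_new, centered on the OLS line.

    Returns (lower, center, upper) arrays with one entry per x_new value.
    """
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    x_new = np.asarray(x_new, float)
    X = np.column_stack([np.ones(len(x)), x])
    X_new = np.column_stack([np.ones(len(x_new)), x_new])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (len(y) - X.shape[1])
    # Row-wise quadratic form x0' (X'X)^{-1} x0 for every new point.
    leverage = np.einsum("ij,jk,ik->i", X_new, XtX_inv, X_new)
    # ASSUMPTION: normal quantile approximates the t quantile for large n.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * np.sqrt(sigma2_hat * (1.0 + leverage))
    center = X_new @ beta_hat
    return center - half_width, center, center + half_width
```

For the grid in the question, `x_new` would be `np.arange(101)`.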
1.2.12 Q11
For this problem, assume you only have access to the aggregate data.
Side note: if you were to create a prediction interval based on the aggregate
data, you would need $\bar{X}_{new}$ AND its corresponding group size (notice how WLS
assumes the weights are known). When you apply these intervals to individuals,
this is how ecological correlation mistakes are made.
Instead of creating an interval for $\bar{Y}_{new}$, let's create an interval for $Y_{new} \mid \{X_{new}, \bar{X}\}$
by computing an interval that uses $\mathrm{Var}(Y_{new} - X_{new}\hat{\beta}_{wls} \mid \bar{X}, X_{new})$, estimates
$\hat{\sigma}^2$ under our WLS setting, and is centered on $X_{new}\hat{\beta}_{wls}$. Please create a plot that
compares this interval to the interval implied by your code from Q10.
Side note: you should think about what is specific to this setup that allows us to do this. Is this calculation valid for all WLS settings?
1.3 NOT-James-Stein's estimator
Let's define the MSE in estimating a high-dimensional vector, β, using an estimate $\hat{\beta}$,
as $E(\|\beta - \hat{\beta}\|^2)$.
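As a reminder, and not as part of the assignment itself, this vector MSE splits into a variance part and a squared-bias part. The cross term vanishes because $\hat{\beta} - E\hat{\beta}$ has mean zero. The shorthand $\mathrm{Bias}(\hat{\beta}) = E\hat{\beta} - \beta$ is introduced here only for this identity.

```latex
\begin{aligned}
E\|\hat{\beta} - \beta\|^2
  &= E\|(\hat{\beta} - E\hat{\beta}) + (E\hat{\beta} - \beta)\|^2 \\
  &= E\|\hat{\beta} - E\hat{\beta}\|^2 + \|E\hat{\beta} - \beta\|^2 \\
  &= \operatorname{tr}\big(\mathrm{Var}(\hat{\beta})\big) + \|\mathrm{Bias}(\hat{\beta})\|^2
\end{aligned}
```

Any estimator in this section can be analysed by working out these two pieces separately.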
1.3.1 Q12
What is the theoretical MSE if we estimated arbitrary β with the vector of 0's?
Side note: do not overthink. This is just to show anything CAN be an estimate for anything.
1.3.2 Q13
Under the usual regression settings, create the biased estimate $\hat{\beta}_{\gamma} = \gamma \cdot \hat{\beta}_{OLS}$
where $\hat{\beta}_{OLS}$ is the coefficient estimate from the regression. Calculate the theoretical mean squared error for $\gamma\hat{\beta}_{OLS}$. Express the result in terms of $\gamma$, $\beta$, $X$,
and $\sigma^2$ and simplify as much as possible.
Side note: you should know why this isn't very useful in practice: $\beta$
and $\sigma^2$ are unknown.
1.3.3 Q14
Let $Y = X\beta + \epsilon$ where β is the 0 vector. Let $\epsilon \sim N(0, 10)$, n = 1000 and create
99 random features all from a uniform random variable (between 0 and 1) and
1 constant feature for X. Let $\hat{\beta}_{OLS}$ be the usual regression estimate. Using the result above, with your simulated X values, write the code AND report the smallest value for γ before the MSE starts to increase again.
Side note: this is intentionally similar to the simultaneous inference case.
1.3.4 Q15
To shrink a vector Z to the origin (i.e. the vector of all 0s), we can multiply Z
by $\gamma \in [0, 1)$. However, we can also shrink Z to an arbitrary vector µ by calculating $\gamma(Z - \mu) + \mu$.
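The shrinkage operation just described is a one-liner. The sketch below uses a hypothetical helper name, `shrink_toward`. With `mu=0` it reduces to plain multiplication by γ.

```python
import numpy as np

def shrink_toward(z, gamma, mu=0.0):
    """Shrink vector z toward mu: gamma * (z - mu) + mu."""
    z = np.asarray(z, float)
    # mu may be a scalar (same target in every coordinate) or a vector.
    return gamma * (z - mu) + mu
```

For example, `shrink_toward([4.0, 0.0], 0.5, mu=2.0)` moves each coordinate halfway to 2, and `gamma=0` returns the target itself.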
Same as Q14, let $Y = X\beta + \epsilon$ where β is the 0 vector. Let $\epsilon \sim N(0, 10)$,
n = 1000 and create 99 random features all from a uniform random variable
(between 0 and 1) and 1 constant feature for X. Let $\hat{\beta}_{OLS}$ be the usual regression estimate. Shrink $\hat{\beta}_{OLS}$ towards µ = 2, i.e. a vector containing 2's, with
γ = 0.99. Numerically approximate the MSE over 100 simulations for the shrink estimator and the OLS estimator. Then report which estimator you would prefer if you're optimizing for MSE for estimating β.
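A Monte Carlo comparison along these lines might look like the sketch below. It rests on several assumptions: the function name `compare_mse` is hypothetical, N(0, 10) is read as a variance of 10, and both X and the noise are redrawn in every simulation (holding X fixed across simulations is an equally reasonable reading).

```python
import numpy as np

def compare_mse(n_sims=100, n=1000, n_features=99, noise_var=10.0,
                gamma=0.99, mu=2.0, seed=0):
    """Approximate the MSE of OLS and of OLS shrunk toward mu.

    Returns (mse_ols, mse_shrunk), each averaged over n_sims simulations.
    """
    rng = np.random.default_rng(seed)
    beta = np.zeros(n_features + 1)  # true coefficients are all zero
    sq_err_ols, sq_err_shrunk = [], []
    for _ in range(n_sims):
        # ASSUMPTION: a fresh design matrix is drawn in each simulation.
        X = np.column_stack([np.ones(n),
                             rng.uniform(0.0, 1.0, size=(n, n_features))])
        y = X @ beta + rng.normal(0.0, np.sqrt(noise_var), size=n)
        beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
        beta_shrunk = gamma * (beta_ols - mu) + mu
        sq_err_ols.append(np.sum((beta_ols - beta) ** 2))
        sq_err_shrunk.append(np.sum((beta_shrunk - beta) ** 2))
    return float(np.mean(sq_err_ols)), float(np.mean(sq_err_shrunk))
```

The two returned averages are the numerical MSE approximations to compare.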
Sample Solution
The Australian Institute of Family Studies (AIFS) Child Family and Community Australia (CFCA) (2014) recognized that the present way to deal with youngster insurance in Australia has recognized the jobs the state and region governments must play in shielding all kids from misuse and disregard. It likewise clarified that administrations had expected their commitment in meeting the basic formative needs all things considered, especially those youngsters whose guardians can’t or don’t give a sheltered, defensive condition or whose guardians are liable for the maltreatment or disregard these kids experienced. The Community and Disability Services Ministerial Advisory Committee (CDSMAC) built up the National Framework for Protecting Australia’s Children 2009-2020 (COAG 2009) which incorporated a wide scope of youngster insurance methodologies and mediations that the state and region governments will actualize to avoid kid misuse and disregard. The focal point of these blended methodologies and intercessions will be on the arrangement of essential assistance and early counteractive action programs which will be done and facilitated by non-government offices as these have gotten government financing, including the arrangement of out of home consideration administrations. All social arrangement including kid insurance approaches are the consequence of government enactment, which is the aftereffect of a political procedure, and all strategies are the aftereffect of a natural political, ideological process. As basic social laborers, it is fundamental that we are mindful, attempt to comprehend and scrutinize the changing social and political setting wherein arrangements and open intercessions are created and executed. It is imperative to be basically mindful of the impact of a prevailing ideological talk in arrangement and effectively take part in a more extensive discussion.
Healey (2012) has communicated that approach may shape the limitations and the extent of the work social laborers do and sway on the potential advantages offered to youngsters and families. Having the option to comprehend what frameworks of thoughts work at an approach and authoritative level help us in seeing how and why government and administration react diversely in connection to youngster assurance at a specific time. This article will right off the bat give a concise chronicled diagram of the assorted ideological variables impacting kid security strategy and practice in Australia and represent its advancement. Beginning with the youngster “salvage development” and magnanimous activities in the nineteen century to progressively current conventional measures through government arrangement and enactment. Furthermore, it will concentrate on later government youngster assurance arrangement reactions and changes, the Special Commission of Inquiry into Child Protection Services in NSW which explicitly recognized difficulties and holes in the NSW kid security framework. What’s more, made explicit change suggestions to be made and applied crosswise over NSW kid security framework through the turned out of the NSW Keep Them Safe Action Plan. Kid assurance strategy belief system We have seen various Government strategy and practice reactions to kid welfare and insurance issues bringing about noteworthy changes to kid assurance practice after some time. Ferguson (2004) states that the philosophies that support the setting of kid assurance strategy and practice have changed. In the mid-1800s, we found in Australia the foundation of intentional non-government youngster welfare segment with Christian holy places running halfway houses and care of kids in institutional settings.
As indicated by Liddell (1993) this was the primary reaction particularly in NSW which experience an expansion in quantities of relinquished and ignored youngsters because of the gold rush period and the developing populace. In the late nineteenth century, we saw the foundation of a Children’s Court, the improvement of youngster assurance enactment and the ascent of what is alluded to as the “kid salvage” development (CFCA, 2015). This development supporting philosophy comprised of the conviction that guardians had an ethical obligation to think about their youngsters that guardians were relied upon to expect. The belief system behind this development tragically was later liable for the advancement of unfavorable intercession approaches that have gotten known as the “Taken Generations” expelling Indigenous youngsters from their families. This was an early case of arrangement going in a misguided course and a case of imperialism at work affecting on families, for this situation, Indigenous youngsters and families through enactment and strategy. By the 1950s, we started to see an alternate reaction from government accepting greater accountability and expanding its utilization of administrative capacity to authorize sufficient guidelines of care. We saw the end of numerous enormous foundations and the foundation of littler private offices for kids needing care and security (Tomison, 2001, p.48). The belief system behind this kid security changes Harris (2003) clarified that was driven by an additionally overall business talk to expand the adequacy and productivity in the arrangement of administrations. In the 1960’s we saw the ascent of what is known as the second influx of the youngster “salvage development” driven by investigate experts, for example, Dr Henry Kempe who presented the idea of the “battered-kid disorder” giving restorative proof of physical wounds of maltreatment by the family and other guardians.
(CFCA, 2015). Laws likewise started to change at around this time, making it a legitimate commitment for wellbeing experts to report evident youngster misuse (CFCA, 2015). We started to see the advancement of various hypothetical models that educate the improvement regarding youngster assurance frameworks Nett and Spratt (2012). Not exclusively did generous networks felt a commitment to act and shield youngsters from misuse and disregard, yet the administration started to accept accountability to inspect evident kid misuse cases and give kid assurance administrations (Lamont and Bromfield, 2010). As indicated by Harris (2003) in the 1980’s and 1990’s kid insurance administrations embraced a more professional way to deal with kid security, utilizing caseworkers, creating field-tested strategies, following an administrative methodology, estimating administration yields and entering in focused offering forms. Spratt (2001) additionally recognized two other noteworthy belief systems that have affected youngster insurance change, and these are bureaucratic and technocratic philosophy. This adjustment in philosophy gave diverse youngster security work rehearses, arrangements, case the board frameworks that were more legalistic and bureaucratic and it included more layers of responsibility Howe (1992) One of the reactions of technocratic belief system is that it will in general bar different methods for improving the abilities of the workforce, for instance, through staff advancement activities. History embodies that kid assurance arrangement and practice change transcendently have been molded and driven by philosophy and less so by look into based proof Gray, Plath, and Webb, 2009; Sholnsky and Stern (2007).
As indicated by Gillingham, (2014) the clarification for this event is that picking examination base proof over philosophy in approach and administration practice change isn’t constantly a basic and simple undertaking to accomplish and can result as history has appeared in wrong reasons for move made and lacking assistance arrangement, in any case, in spite of the fact that it’s valuable to know about the difficult idea of utilizing exploration base proof and needs, be viewed as it ought not be an obstruction for strategy creators not to utilize research base proof when making change. The Wood Report The NSW Government gave a commission for a significant investigation into the state’s youngster security framework, drove by resigned Supreme Court Judge Justice James Wood following the passing of two kids in 2007 because of misuse and disregard. The examination concentrated on the activities of the Department of Community Services ‘ now Family and Community Services (FACS) and a non-government family bolster administration. This audit inspected the accompanying: frameworks for revealing youngster misuse and disregard, the board of reports including the proficiency of frameworks and process, organizing and basic leadership, the executives of cases, recording of fundamental data, the expert limit of case managers, the ampleness of current statuary structures and obligation of compulsory correspondents, sufficient courses of action for interagency participation, the sufficiency of game plans for kids in out of home consideration, the ampleness of assets and kid insurance frameworks and different issues concurred by the Commissioner and the Minister. The request prompts suggestions, procedures for administrative, auxiliary and social change in the NSW youngster security framework. On 2008, the discoveries of this Special Commission of Inquiry into Child Protection Services in New South Wales were discharged as a three-volume report containing 111 proposals. 
The investigation into youngster security administrations found that interest for kid assurance administrations was being met for just a small amount of the kids announced, and that families were rejected from intercession or administration arrangement in light of the prioritization of high-hazard cases requiring critical mediation. The request noticed that those reports evaluated by FACS ‘numerous appraisals came up short on an all encompassing methodology, need meticulousness and didn’t exploit skill or data of others’. Interviews and entries from a wide scope of administrations, for example, the Association of Children’s Welfare Agencies, Department of Community Services, NSW Commission for Children and Young People, NSW Ombudsman, The Benevolent Society, The Children’s Guardian and numerous others as a feature of the request consultative procedure. In light of Justice Wood’s report, the NSW Government built up a five-year thorough activity plan Keep Them Safe to change and improve the youngster security framework in NSW making kid assurance the