#### Anticipated Date of Graduation

Summer 2019

#### Document Type

Thesis

#### Degree Name

Master of Science in Mathematical Sciences

#### Department

Mathematics

#### First Advisor

Douglas Darbro

#### Abstract

The goal of this paper is to broaden general knowledge of nested data analysis and its problems (dependent data, the unit-of-analysis problem, and non-random, one-time-only sampling), and to look, as a novice, at how Hierarchical Linear Modelling (HLM) deals with these problems and what its advantages and shortfalls might be. The paper records the author's own learning curve, with observations noted along the way. That learning experience is shared so that other researchers will better understand the errors that arise from off-the-cuff interpretations of nested data, take a broader view of the structural implications of nested data, and make more informed use of the statistical software that models it.

Nested data is ubiquitous. Its analysis presents serious challenges to traditional statistical methods and has been inadequately dealt with in many historical studies. HLM specifically addresses nested data: it maps nested effects with what are called Deviances, allows inference to wider populations when sample sizes are adequate, and offers an easily adaptable methodology for sub-modelling effects with additional explanatory variables.

Any data can, in some sense, be nested, that is, organized as sets of individual data points collected under groups. Data points inside a nest (group) are often not independent: the values within the set are related, and so violate a critical assumption of statistical tests, that of independence. Ascribing relations to the group level or to the individual level becomes fuzzy, since a bias in a set of individuals can be mistaken for a group effect, and a biasing group effect can confound the individual effect. Like it or not, researchers will often be presented with nested data sets. Most of the time, nested data are not collections of samples randomly assigned to groups; groups arrive with a pre-selected membership. A class of students is not ordinarily an independent sample of students.

Education researchers will benefit from familiarity with the statistical issues inherent in nested data. A basic understanding of sound modelling of nested data sets will, at a minimum, steer the researcher clear of pitfalls such as the ecological fallacy and the atomistic fallacy (explained herein) and, with a modest learning curve, provide the researcher with a significantly better toolbox, HLM, for modelling nested data. Conventional statistical techniques (Ordinary Least Squares regression and ANOVA) apply rigorously only to independent data sets. When independence is violated and the researcher disregards or is oblivious to the dependence, inflated Type I error rates and other errors result, and regression coefficients are misrepresented.
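The gap between group-level and individual-level relations can be illustrated with a small sketch. The toy numbers below are made up for illustration and are not the thesis data set; `ols_slope` and `nests` are hypothetical names. Within each nest the slope is negative, yet a regression on the nest means gives a positive slope, so reading the group-level slope as an individual-level relation (the ecological fallacy) would be misleading here.

```python
def ols_slope(xs, ys):
    """Slope of the ordinary least squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Made-up toy data (an assumption for illustration only).
# Each nest maps to (x values, y values) for its individuals.
nests = {
    "A": ([1, 2, 3], [3, 2, 1]),
    "B": ([4, 5, 6], [6, 5, 4]),
}

# Individual-level relation, estimated separately inside each nest.
within_slopes = {name: ols_slope(xs, ys) for name, (xs, ys) in nests.items()}

# Group-level relation, estimated from the nest means only.
mean_xs = [sum(xs) / len(xs) for xs, _ in nests.values()]
mean_ys = [sum(ys) / len(ys) for _, ys in nests.values()]
between_slope = ols_slope(mean_xs, mean_ys)

print("within-nest slopes:", within_slopes)
print("slope across nest means:", between_slope)
```

With these numbers the within-nest slopes and the slope across nest means have opposite signs, which is the kind of conflict that a model aware of the nesting is meant to expose.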

Statistics in general is of such scope that many researchers have not been introduced to, or gained an appreciation of, the difficulties of nested structure analysis, or of the readily available HLM-style methods, which are flexible, highly useful, and minimize errors in nested data analysis.

An actual data set is examined with HLM and compared with traditional ANOVA and ANCOVA regressions. HLM showed no substantial improvement in the analysis of this study's data set compared to the ANOVA and ANCOVA results. HLM did, however, serve as a quality check on the traditional analysis: regression coefficients generated both ways were very nearly the same, adding confidence to the results obtained. No improvement in variance reduction was demonstrated by HLM on this data. For the largest nest effect, Minority Status, HLM was compared with ANCOVA for the effect on Math Gain by Prescore. HLM Deviances were calculated arithmetically for School, Minority Status, and Sex. HLM is nevertheless recommended for the value of its explicit and extendable equations. This author ran R, but would recommend HLM-specific software.

HLM is explained in a basic way, and the value of HLM Deviances is highlighted and related to the Linear Transform demonstrated in this paper, which maps the Consensus (Total level 1) Line into a Group Line. Understanding the relations between the regression line types affords some reverse engineering. If it is desired to run HLM (generally a good idea), a simple quality check is to subtract the Consensus slope and intercept from those of a Group Line; one should then obtain an estimate of the HLM Deviances for that Group Line. By understanding how the Deviances relate simple first-order regression lines, one does not need to run the software to estimate the Group Deviances. The Deviances, slope and intercept, taken together quantify a Group Effect, which is a change on a Main Effect. A Group Effect is a general line shift. By specifying two X coordinates on the line, pre- and post-shift, the line change is an area.
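The subtraction check described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the data are made up, `fit_line` and `group_deviances` are hypothetical helper names, and the result is only the arithmetic estimate the abstract describes. Values reported by HLM software may differ, since mixed-model estimates of group effects are typically shrunk toward the overall line.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def group_deviances(xs, ys, groups):
    """Estimate each group's Deviances as (Group Line) minus (Consensus Line).

    Returns {group: (intercept_deviance, slope_deviance)}.
    """
    cons_b0, cons_b1 = fit_line(xs, ys)  # Consensus (Total) Line, all points
    deviances = {}
    for g in sorted(set(groups)):
        gx = [x for x, k in zip(xs, groups) if k == g]
        gy = [y for y, k in zip(ys, groups) if k == g]
        b0, b1 = fit_line(gx, gy)  # Group Line for this nest
        deviances[g] = (b0 - cons_b0, b1 - cons_b1)
    return deviances


# Made-up toy data (an assumption, not the thesis data set):
# group "A" follows y = 1 + 2x and group "B" follows y = 3 + 2x.
xs = [0, 1, 2, 0, 1, 2]
ys = [1, 3, 5, 3, 5, 7]
groups = ["A", "A", "A", "B", "B", "B"]

deviances = group_deviances(xs, ys, groups)
print(deviances)
```

In this toy case the Consensus Line is y = 2 + 2x, so the two groups differ from it only in intercept, by -1 and +1, with no slope deviance.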

A new group effect metric is proposed: the Nested Shift Line Effect (NSLE). NSLE is a measure of the line shift area. Where sample sizes are sufficient, given variances relative to effect size, this new scale-free metric can rank nesting effects inside a study and can be employed in meta-studies. NSLE also has value as a single-measure diagnostic: by comparing NSLEs of similar magnitude (of the same and also of opposite polarity), one can evaluate set and subset influences for common-cause variables. A favorable attribute of NSLE is that HLM is not required for its calculation. One needs only to regress a given Group Line, regress the Total (Consensus) Line, fix particular X coordinates, and measure the area difference between them. This can all be calculated using only particular Y values. The HLM software is important, however, for sub-modelling a group effect. Calculation of the NSLE Standard Error is initiated here and requires better definition. Were NSLE to be adopted by other researchers, a fixed, agreed-upon definition of its standard error would be essential; selecting that standard definition is perhaps a worthwhile task for future researchers. NSLE did identify a school effect and a minority effect, both also picked up by ANCOVA, and noted them as of similar size. On investigation, minority ratios in schools were responsible for the high and low NSLE values on the school effect, and NSLE correctly identified the schools involved, demonstrating its usefulness as a diagnostic or exploratory indicator.
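The area step can be sketched as follows. This is not the thesis's NSLE definition: the abstract does not state the normalization that makes NSLE scale-free, so the hypothetical function `line_shift_area` below returns only the raw signed area between a Group Line and the Consensus Line over two chosen X coordinates. It uses the trapezoid formula, which needs only the Y values of the two lines at those coordinates; the example coefficients are made up.

```python
def line_shift_area(consensus, group, x_lo, x_hi):
    """Signed area between a Group Line and the Consensus Line on [x_lo, x_hi].

    `consensus` and `group` are (intercept, slope) pairs. The sign is positive
    where the Group Line lies above the Consensus Line. If the lines cross
    inside the interval, positive and negative parts cancel in this value.
    """
    c0, c1 = consensus
    g0, g1 = group
    # Vertical gap (group minus consensus) at each chosen X coordinate.
    gap_lo = (g0 + g1 * x_lo) - (c0 + c1 * x_lo)
    gap_hi = (g0 + g1 * x_hi) - (c0 + c1 * x_hi)
    # The gap between two straight lines is linear in x, so the trapezoid
    # rule gives the exact signed area.
    return 0.5 * (gap_lo + gap_hi) * (x_hi - x_lo)


# Made-up coefficients for illustration (not results from the thesis).
consensus_line = (2.0, 2.0)   # y = 2 + 2x
shifted_group = (3.0, 2.0)    # parallel shift: intercept deviance of +1
tilted_group = (2.0, 3.0)     # same intercept, slope deviance of +1

area_shifted = line_shift_area(consensus_line, shifted_group, 0.0, 2.0)
area_tilted = line_shift_area(consensus_line, tilted_group, 0.0, 2.0)
print(area_shifted, area_tilted)
```

Both toy groups give the same raw area over this interval even though one differs in intercept and the other in slope, which shows how a single area value summarizes the two Deviances together. The choice of X coordinates and any scaling needed to compare areas across studies are left open here.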

#### Recommended Citation

Shelton, Ralph, "An Exposition of the Hierarchical Linear Model on a Nested Data-Set with a perspective for Nested Effect discovery and modeling; A new metric for group (Nested) effect size is proposed." (2019). *Master of Science in Mathematics*. 13.

https://digitalcommons.shawnee.edu/math_etd/13