Uncovering the fundamental data structure of experimental analysis
Experimental analysis often involves groups containing varying numbers of elements; for example, a different number of units for each treatment assignment within each stratum. We therefore encounter objects that are like matrices, except that they are not perfect rectangular blocks; that is, they are not always “full”.
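To make the idea of a "not full" matrix concrete, here is a minimal sketch of such an object as a nested Python list. The variable names and the numbers are hypothetical, chosen only for illustration; this is not the structure defined later in the article.

```python
# Hypothetical outcomes, indexed as outcomes[treatment][stratum] -> unit values.
# The numbers are made up for illustration only.
outcomes = [
    [[2.0, 3.0, 4.0], [5.0]],            # treatment 0: strata with 3 and 1 units
    [[1.0, 2.0], [6.0, 7.0, 8.0, 9.0]],  # treatment 1: strata with 2 and 4 units
]

# Cell sizes: the number of units in each (treatment, stratum) cell.
# They differ from cell to cell, so the data cannot be stored as a full 2x2xK block.
cell_sizes = [[len(cell) for cell in group] for group in outcomes]

# Cell means: each mean divides by its own cell size.
cell_means = [[sum(cell) / len(cell) for cell in group] for group in outcomes]

print(cell_sizes)
print(cell_means)
```

Because the innermost lists have different lengths, any operation over units (a sum, a mean) has to carry the cell sizes along with it, which is the bookkeeping a dedicated structure would aim to formalize.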
In this note, we define a new structure, called a tableau, which can be considered a partially filled matrix and which seeks to formalize the array operations used in experimental analysis. We then show how tableau notation can be used to express key equations in a variety of statistical contexts, including stratification, clustering, and sum-of-squares decomposition. Furthermore, we express these equations in both invariant form and index form:
- invariant notation (coordinate-free form): defined in terms of objects and operators, much like the matrix-vector product A⋅x, and
- index notation (coordinate form): defined explicitly in terms of indexed arrays and summation over multiple indices, much like expressing the matrix-vector product as ∑ⱼ Aᵢⱼ xⱼ.
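The contrast between the two forms can be sketched with the matrix-vector product itself. The names `matvec` and `Matrix` below are hypothetical helpers introduced for this illustration, not part of the article's implementation.

```python
def matvec(A, x):
    """Index form: y_i = sum_j A_ij * x_j, with the indices written out."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]


class Matrix:
    """Tiny wrapper so the product can be written without indices, as A @ x."""

    def __init__(self, rows):
        self.rows = rows

    def __matmul__(self, x):
        # The operator hides the index bookkeeping behind a single symbol.
        return matvec(self.rows, x)


rows = [[1, 2], [3, 4]]
x = [10, 1]

y_index = matvec(rows, x)        # coordinate form: explicit sum over j
y_invariant = Matrix(rows) @ x   # coordinate-free form: object and operator

print(y_index, y_invariant)
```

Both expressions compute the same vector; the invariant form is shorter to read, while the index form makes every summation explicit.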
Outline
This article consists of four main sections:
- Review of classical notation, with its advantages and disadvantages;
- Theoretical development of the Tableau Calculus;
- Application to experiments (completely randomized, block randomized, adjustment formula, cluster randomized, blocked cluster randomized, and ANOVA sum-of-squares decomposition);
- Python implementation.
In experimental analysis, there are three main styles of notation commonly used:
- classical notation: the treatment assignment is explicitly indexed, so that unit (i, j, k) denotes the kth unit in the jth stratum of the ith treatment group (see (1), (2) and (5));
- assignment notation: the assignment mechanism is treated as an independent variable, and we consider sums over quantities like Zᵢ Yᵢ or Zᵢⱼ Yᵢⱼ (see (2), (3) and (4)); and