Psychology (Functional Area: Quantitative Methods)
Recent Submissions
Item (Open Access): Evaluating the Performance of Existing and Novel Equivalence Tests for Structural Equation Modeling (2023-12-08). Beribisky, Nataly; Cribbie, Robert A.
It has been suggested that equivalence testing (otherwise known as negligible effect testing) be used to evaluate model fit within structural equation modeling (SEM). This dissertation is composed of two studies that propose novel equivalence tests based on the popular RMSEA, CFI, and SRMR fit indices. Using Monte Carlo simulations, each study compares the performance of these novel tests to other existing equivalence-testing-based fit indices in SEM, as well as to other methods commonly used to evaluate model fit. In each study, results indicate that equivalence tests in SEM have good Type I error control and display considerable power for detecting well-fitting models at medium to large sample sizes. At small sample sizes, relative to traditional fit indices, equivalence tests limit the chance of supporting a poorly fitting model. Both studies also present illustrative examples to demonstrate how equivalence tests can be incorporated into model fit reporting. We recommend that equivalence tests be used in conjunction with descriptive fit indices to provide more evidence when evaluating model fit.

Item (Open Access): Regularization in Mediation Models: A Monte Carlo Simulation Comparing Different Regularization Penalties in Multiple Mediation Models (2022-12-14). Singh, Arjunvir; Choi, Ji Yeh
The two fundamental goals in statistical learning are establishing prediction accuracy and discovering the correct set of predictors to ensure model specificity. Although the field of variable selection has made significant strides over the past decades, these methods have yet to be fully adapted to mediation models. Regularization methods that use the L1 penalty, such as the Lasso and adaptive Lasso, incorporate a small amount of controlled bias into the ordinary least squares estimates to improve the generalizability of the estimates by substantially reducing their variance across samples. Additionally, the Lasso can perform variable selection and help achieve model selection consistency, or sparsistency. Recent literature has introduced regularization to mediation models, including regularized structural equation modelling (RegSEM). The current research compares the performance of several regularization penalties (the Lasso, adaptive Lasso, MCP, and SCAD) in the context of mediation models. No single regularization penalty performed optimally across all simulation conditions. Additionally, we observed disproportionate selection rates for the Lasso and SCAD penalties with alternating mediators, which was indicative of disproportionate shrinkage of the a and b pathways. However, the absolute bias induced in the a and b pathways was equivalent across all samples for each penalty term. This highlights the perils of shrinking individual regression pathways instead of indirect effects as a whole. Overall, the choice of regularization penalty depends on the particularities of the research question.
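The penalties compared in the regularization abstract above differ only in how they shrink a coefficient. Below is a minimal R sketch of the Lasso, MCP, and SCAD penalty functions; the tuning values (lambda = 0.5, gamma = 3, a = 3.7) are illustrative choices, not values taken from the study.

# Sketch of the penalty functions compared above (illustrative tuning values only).
# Each function returns the penalty added to the loss for a coefficient beta.
lasso_pen <- function(beta, lambda) lambda * abs(beta)

mcp_pen <- function(beta, lambda, gamma = 3) {
  b <- abs(beta)
  ifelse(b <= gamma * lambda,
         lambda * b - b^2 / (2 * gamma),   # shrinkage relaxes as |beta| grows
         gamma * lambda^2 / 2)             # constant penalty for large effects
}

scad_pen <- function(beta, lambda, a = 3.7) {
  b <- abs(beta)
  ifelse(b <= lambda,
         lambda * b,                                         # Lasso-like near zero
         ifelse(b <= a * lambda,
                -(b^2 - 2 * a * lambda * b + lambda^2) / (2 * (a - 1)),
                (a + 1) * lambda^2 / 2))                     # flat for large effects
}

# Compare how much each penalty charges small vs. large coefficients
beta <- seq(-2, 2, by = 0.01)
plot(beta, lasso_pen(beta, 0.5), type = "l", ylab = "penalty")
lines(beta, mcp_pen(beta, 0.5), lty = 2)
lines(beta, scad_pen(beta, 0.5), lty = 3)

The plot makes the key design difference visible: the Lasso keeps shrinking large coefficients, whereas MCP and SCAD flatten out, which is one reason the penalties can shrink the a and b mediation pathways to different degrees.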
Item (Open Access): When what is wrong seems right: A Monte Carlo simulation investigating the robustness of coefficient omega to model misspecification (2021-11-15). Bell, Stephanie Marie; Flora, David B.
Coefficient omega is a model-based reliability estimate that is not restricted by the assumptions of a unidimensional, essentially tau-equivalent model. Rather, omega can be adapted to suit the underlying factor structure of a given population. A Monte Carlo simulation was used to investigate the performance of unidimensional omega and omega-hierarchical under circumstances of model misspecification for high- and low-reliability measures and different scale lengths. In general, bias increased with the amount of unmodeled complexity (i.e., unspecified multidimensionality or error correlations). When models were misspecified, observed bias was higher when true population reliability was lower and increased with scale length. Less variable estimates were observed when true reliability and sample size were higher.

Item (Open Access): Effect Sizes for Equivalence Testing: Incorporating the Equivalence Interval (2021-11-15). Martinez Gutierrez, Naomi; Cribbie, Robert A.
Equivalence testing (ET) is a framework for determining whether an effect is small enough to be considered meaningless, where "meaningless" is expressed as an equivalence interval (EI). Although traditional effect sizes (ESs) are important accompaniments to ET, these measures exclude information about the EI. Incorporating the EI is valuable for quantifying how far the effect is from the EI bounds. The ES measure we propose is the proportional distance (PD) from an observed effect to the smallest effect that would render it meaningful. We conducted two Monte Carlo simulations to evaluate the PD when applied to (1) mean differences and (2) correlations. The coverage rate and bias of the PD were excellent within the investigated conditions. We also applied the PD to two recent psychological studies. These applied examples revealed the beneficial properties of the PD, namely its ability to supply information above and beyond other statistical tests and ESs.

Item (Open Access): Contextualizing Statistical Suppression Within Pretest-Posttest Designs (2019-11-22). Farmus, Linda Sawa Dorota; Cribbie, Robert A.
Statistical suppression occurs when adjusting for a variable enhances or substantially modifies the association between a predictor and an outcome. Although many methodologists have discussed this phenomenon, very little work has examined suppression in longitudinal regression models such as the pretest-posttest design. This research addressed this gap with two separate studies. Study One was a literature review of 80 articles (i.e., those meeting the inclusion criteria) from a variety of fields within psychology. Study Two was an analysis of a large longitudinal clinical dataset via 925 statistical models. Both studies revealed consistent results: in approximately 20% of instances, suppression effects were observed and were attributable to the inclusion of a pretest measure. The results underscore that controlling for pretest measures when assessing change may be of value, as doing so may help clarify associations between predictors and posttest outcomes.
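The kind of suppression examined in the pretest-posttest abstract above can be illustrated with a short simulation: compare a predictor's coefficient with and without the pretest covariate. The sketch below is hypothetical (variable names and generating values are invented for demonstration and are not from the clinical dataset analysed in the thesis).

# Illustrative simulation: does adjusting for the pretest change the
# predictor-posttest association? (All generating values are made up.)
set.seed(1)
n         <- 500
pretest   <- rnorm(n)
predictor <- 0.6 * pretest + rnorm(n)                 # predictor correlated with pretest
posttest  <- 0.5 * pretest - 0.1 * predictor + rnorm(n)

unadjusted <- lm(posttest ~ predictor)
adjusted   <- lm(posttest ~ predictor + pretest)

# Suppression is suggested when the adjusted coefficient is notably larger in
# magnitude, or reversed in sign, relative to the unadjusted coefficient.
coef(unadjusted)["predictor"]
coef(adjusted)["predictor"]

With these generating values the unadjusted slope is positive while the pretest-adjusted slope is negative, the sign-reversal pattern that the review above classifies as suppression attributable to the pretest.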
Item (Open Access): A Multi-Faceted Mess: A Review of Statistical Power Analysis in Psychology Journal Articles (2019-11-22). Beribisky, Nataly; Cribbie, Robert A.
The over-reliance on null hypothesis significance testing and its accompanying tools has recently been challenged. An example of such a tool is statistical power analysis, which is used to determine how many participants are required to detect a minimally meaningful effect in the population at given levels of power and Type I error rate. To investigate how power analysis is currently used, we review the reporting of 443 power analyses in high-impact psychology journals in 2016 and 2017. We found that many pieces of information required for power analyses are not reported, and selected effect sizes are often chosen on the basis of an inappropriate rationale. Accordingly, we argue that power analysis forces researchers to compromise in selecting the different pieces of information it requires. We suggest that researchers consider tools beyond traditional power analysis when planning sample sizes, such as precision-based power analysis or collecting the largest sample size possible.

Item (Open Access): Best Practices for Constructing Confidence Intervals for the General Linear Model Under Non-Normality (2018-05-28). Adkins, Mark Christopher; Flora, David B.
Given the current climate surrounding the replication crisis in scientific research, a call for methodological reform has been issued that explicates the need for a shift from null hypothesis significance testing to the reporting of effect sizes and their confidence intervals (CIs). However, little is known about the relative performance of CIs constructed after applying techniques that accommodate non-normality under the general linear model (GLM). We review these techniques (normalizing data transformations, percentile bootstrapping, and bias-corrected and accelerated bootstrapping) and present results from a Monte Carlo simulation designed to evaluate CI performance under each. The effects of sample size, degree of association among predictors, number of predictors, and different non-normal error distributions were examined. Based on the performance of the CIs in terms of coverage, accuracy, and efficiency, general recommendations are made regarding best practices for constructing CIs for the GLM under non-normality.
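One of the techniques reviewed in the confidence-interval abstract above, the percentile bootstrap, can be sketched in a few lines of base R: resample cases, re-fit the linear model, and take quantiles of the resampled slopes. The data, the skewed error distribution, and the number of resamples below are illustrative, not those used in the thesis.

# Percentile bootstrap CI for a regression slope (illustrative data and settings).
set.seed(2)
n <- 200
x <- rnorm(n)
y <- 0.3 * x + rexp(n) - 1            # skewed (non-normal) errors

boot_slopes <- replicate(2000, {
  idx <- sample.int(n, replace = TRUE)        # resample cases with replacement
  coef(lm(y[idx] ~ x[idx]))[2]                # refit and keep the slope
})

# 95% percentile bootstrap interval for the slope
quantile(boot_slopes, c(0.025, 0.975))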
Item (Open Access): Everything on the Table: Tabular, Graphic, and Interactive Approaches for Interpreting and Presenting Monte Carlo Simulation Data (2018-05-28). Sigal, Matthew Joseph; Friendly, Michael L.
Monte Carlo simulation studies (MCSS) form a cornerstone of quantitative methods research. They are frequently used to evaluate and compare the properties of statistical methods and to inform both future research and current best practices. However, the presentation of results from MCSS often leaves much to be desired, with findings typically conveyed via a series of elaborate tables from which readers are expected to derive meaning. The goal of this dissertation is to explore, summarize, and describe a framework for the presentation of MCSS and to show how modern computing and visualization techniques improve their interpretability. Chapter One describes the problem by introducing the logic of MCSS, how they are conducted, what findings typically look like, and current practices for their presentation. Chapter Two demonstrates methods for improving the display of static tabular data, specifically via formatting, effects ordering, and rotation. Chapter Three delves into semi-graphic and graphical approaches for aiding the presentation of tabular data via shaded tables and extensions to the tableplot and hypothesis-error plot frameworks. Chapter Four describes the use of interactive computing applets to aid the exploration of complex tabular data and why this is an ideal approach. Throughout this work, emphasis is placed on how such techniques improve our understanding of a particular dataset or model, and claims are supported with applied demonstrations. Implementations of the ideas from each chapter have been coded in the R language for statistical computing and are available for adoption by other researchers in a dedicated package (SimDisplay). It is hoped that these ideas might enhance our understanding of how best to present MCSS findings and be drawn upon in both applied and academic environments.

Item (Open Access): Evaluating Equivalence Testing Methods for Measurement Invariance (2018-03-01). Counsell, Alyssa Leigh; Cribbie, Robert A.
Establishing measurement invariance (MI) is important for making valid group comparisons on psychological constructs of interest. MI involves a multi-stage process of determining whether the factor structure and model parameters are similar across multiple groups. The statistical method used by most researchers for testing MI is multiple-group confirmatory factor analysis, whereby a statistically nonsignificant chi-square difference test or a small change in goodness-of-fit indices (GOFs) such as the CFI or RMSEA is used to conclude invariance. Yuan and Chan (2016) proposed replacing these approaches with an equivalence-test analogue of the chi-square difference test (EQ). While they outline the EQ approach for MI, they recommend using an adjusted RMSEA version (EQ-A) for increased power. The current study evaluated the Type I error and power rates of the EQ and EQ-A and compared their performance to that of traditional chi-square difference tests and GOFs. Results demonstrate that the EQ for nested models was the only procedure that maintained empirical error rates below the nominal level. Results also highlight that the EQ requires larger sample sizes, or equivalence bounds based on larger-than-conventional RMSEA values (e.g., .05), to ensure adequate power at later MI stages. Because the EQ-A test did not maintain accurate error rates, I do not recommend Yuan and Chan's proposed adjustment.

Item (Open Access): A More Powerful Familywise Error Control Procedure for Evaluating Mean Equivalence (2018-03-01). Davidson, Heather Patricia; Cribbie, Robert A.
When one wishes to show that there are no meaningful differences between two or more groups, equivalence tests should be used, as a nonsignificant test of mean differences does not provide evidence regarding the equivalence of groups. When conducting all possible post hoc pairwise comparisons (C), Caffo, Lauzon, and Rohmel (2013) suggested dividing the alpha level by a correction of k²/4, where k is the number of groups to be compared; however, this procedure can be conservative in some situations. This research proposes two modified stepwise procedures, based on the k²/4 correction, for controlling the familywise Type I error rate. Using a Monte Carlo simulation method, we show that, across a variety of conditions, adopting a stepwise procedure increases power, particularly when a configuration of means has more than C - k²/4 powered comparisons, while maintaining the familywise error rate at or below alpha. Implications for psychological research and directions for future study are discussed.
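To make the Caffo, Lauzon, and Rohmel (2013) correction described above concrete, here is a hedged sketch of pairwise two one-sided tests (TOST) of mean equivalence run at alpha divided by k²/4, as the abstract describes. The equivalence bound delta, the Welch-type standard errors, and the example data are illustrative choices; this is the single-step correction only, not the thesis's proposed stepwise procedures.

# Pairwise TOST for mean equivalence with the k^2/4 alpha adjustment
# (bound 'delta', Welch standard errors, and data are illustrative).
pairwise_equiv <- function(y, g, delta, alpha = 0.05) {
  g <- factor(g)
  k <- nlevels(g)
  alpha_adj <- alpha / (k^2 / 4)              # Caffo, Lauzon, & Rohmel correction
  pairs <- combn(levels(g), 2)
  p <- apply(pairs, 2, function(pr) {
    y1 <- y[g == pr[1]]; y2 <- y[g == pr[2]]
    v1 <- var(y1) / length(y1); v2 <- var(y2) / length(y2)
    se <- sqrt(v1 + v2)
    df <- (v1 + v2)^2 / (v1^2 / (length(y1) - 1) + v2^2 / (length(y2) - 1))
    d  <- mean(y1) - mean(y2)
    p_lower <- pt((d + delta) / se, df, lower.tail = FALSE)  # H0: diff <= -delta
    p_upper <- pt((d - delta) / se, df)                      # H0: diff >= +delta
    max(p_lower, p_upper)                     # TOST p-value for this pair
  })
  data.frame(pair = paste(pairs[1, ], pairs[2, ], sep = "-"),
             p = p, equivalent = p < alpha_adj)
}

# Illustrative use with three groups
set.seed(3)
dat <- data.frame(y = rnorm(90), g = rep(c("A", "B", "C"), each = 30))
pairwise_equiv(dat$y, dat$g, delta = 0.5)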
Item (Open Access): A Differential Response Functioning Framework for Understanding Item, Bundle, and Test Bias (2017-07-27). Chalmers, Robert Philip Sidney; Flora, David B.
This dissertation extends the parametric sampling method and area-based statistics for differential test functioning (DTF) proposed by Chalmers, Counsell, and Flora (2016). Measures for differential item and bundle functioning are first introduced as special cases of the DTF statistics. Next, these extensions are presented in concert with the original DTF measures as a unified framework for quantifying differential response functioning (DRF) of items, bundles, and tests. To evaluate the utility of the new family of measures, the DRF framework is compared to the previously established simultaneous item bias test (SIBTEST) and differential functioning of items and tests (DFIT) frameworks. A series of Monte Carlo simulation conditions were designed to estimate the power to detect differential effects when compensatory and non-compensatory differential effects are present, as well as to evaluate Type I error control. Benefits inherent to the DRF framework are discussed, extensions are suggested, and alternative methods for generating composite-level sampling variability are presented. Finally, it is argued that the area-based measures in the DRF framework provide an intuitive and meaningful quantification of marginal and conditional response bias over and above what has been offered by the previously established statistical frameworks.

Item (Open Access): Evidence-Based Recommendations of Reporting Results from Mediation Analysis: A Focus on Ease of Interpretation and Maximum Accuracy (2016-11-25). Kim, Yoosun; Pek, Jolynn
Theoretical models of mediation are common in psychological research, but there is much variability in how the results of mediation analyses are reported, which can lead to interpretational errors, misconceptions, and differences in reader perception and experience. The goal of this study is to develop evidence-based recommendations for reporting results of mediation analyses that reduce objective interpretational errors and maximize readers' subjective experience. These recommendations are based on results from an experiment examining the effect of four different forms of result reporting (text only; text and table; text and path diagram; and text, table, and path diagram) on interpretational errors and reader experience, where reader experience is composed of four constructs: perceived time, ease of understanding, satisfaction in understanding reported results, and confidence in understanding reported results. Results show that including a path diagram may help reduce comprehensive interpretational errors and increase positive reader experiences.
Item (Open Access): Effect Sizes for Single Case Experimental Designs and Their Utility for a Meta-Analysis: A Simulation Study (2016-09-20). Lee, Joo Ann; Flora, David B.
There has been a lack of consensus as to the optimal effect size for use in meta-analyses involving Single Case Experimental Designs (SCEDs). SCEDs are a set of experimental designs that produce data akin to short interrupted time series, where observations may not be independent due to autocorrelation. This thesis evaluated the statistical properties of various effect sizes for a reversal ABA'B' SCED via a simulation study. Hedges, Pustejovsky, and Shadish's (2012) standardized mean difference effect size (δHPS) performed best when small to moderate degrees of autocorrelation were present. Partial regression coefficients also performed relatively well in most situations. The results support the use of δHPS: besides its favourable performance, δHPS was designed to be comparable to group-based effect sizes (Cohen's d), thus enabling the amalgamation of both SCEDs and group designs in a meta-analysis. Partial regression coefficients may also be used effectively in a meta-analysis of results from SCEDs.

Item (Open Access): Equivalence Tests for Repeated Measures (2015-12-16). Ng, Victoria Ka Yin; Cribbie, Robert A.
Equivalence tests from the null hypothesis significance testing framework are appropriate alternatives to difference tests for demonstrating a lack of difference. For determining equivalence among more than two repeated measurements, recently developed equivalence tests include the omnibus Hotelling T², the pairwise standardized test, the pairwise unstandardized test, and the two one-sided tests for negligible trend. With Monte Carlo simulations, the current research evaluated Type I error and power rates for these equivalence tests to inform an applied data-analytic strategy. Because the results suggest that no one statistical test is optimal across all situations, I compare the tests' statistical behaviours to provide guidance in test selection. Specifically, test selection should be informed by the measurement level of the repeated outcome, the correlation structure, and precision.

Item (Open Access): The Effects of Differential Between-Groups Skewness on Heteroscedastic, Trimmed Means, and Rank-Based Between-Groups Procedures (2015-08-28). Mills, Laura Jane; Cribbie, Robert A.
The effect of differential between-group skewness was investigated for the traditional t and ANOVA F tests, the Welch procedure without trimming (Welch, 1938) and with trimming and Winsorized variances (Yuen, 1974), the Welch-James procedure (James, 1951) with trimming and transformation, the Yuen procedure with bootstrapping, trimming, and transformation (Keselman, Wilcox, Othman, & Fradette, 2002), and the Welch procedure applied to ranked data (Zimmerman & Zumbo, 1992). Empirical Type I error and power rates for these procedures were compared under varied conditions of non-normality, heterogeneity, group-size imbalance, and positive and negative pairing of variance and group size. In particular, these conditions were combined with between-group skewness that was equal, dissimilar, or dissimilar and directionally opposite. Monte Carlo simulations revealed that when skewness across groups was unequal, the traditional t-test and ANOVA F had unacceptable Type I error and power rates for models with two, four, and seven groups relative to the other procedures. Further, procedures that accommodate only heteroscedasticity fell short compared to those that can simultaneously accommodate heterogeneity and skewness. Finally, empirical power was highest for the Welch procedure on ranked data in most conditions. It is recommended that investigators routinely examine their data for violations of assumptions and adopt robust procedures, such as the Welch test on ranks, when testing differences in central tendency.
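The recommended Welch-on-ranks procedure from the abstract above can be sketched with base R: pool the observations, rank them, and apply the Welch heteroscedastic ANOVA to the ranks (Zimmerman & Zumbo, 1992). The simulated skewed, heteroscedastic data below are illustrative only.

# Welch heteroscedastic ANOVA applied to ranked data (illustrative data).
set.seed(4)
dat <- data.frame(
  y = c(rexp(30, rate = 1), rexp(40, rate = 0.5), rnorm(50, mean = 1, sd = 3)),
  g = factor(rep(c("G1", "G2", "G3"), times = c(30, 40, 50)))
)

dat$y_rank <- rank(dat$y)                                 # rank the pooled observations
oneway.test(y_rank ~ g, data = dat, var.equal = FALSE)    # Welch test on the ranks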