Overview

This repository contains reproducible code accompanying the book chapter “Multilevel analysis for quantitative communication scientists”. The scripts demonstrate a complete multilevel modeling workflow using simulated data to analyze cyberbullying risk among high school students (inspired by Festl et al., 2015).

What You Will Find Here

  • Variance decomposition across three hierarchical levels (students, grades, schools)
  • Hybrid multilevel modeling approach (two-level analysis with level-3 fixed effects)
  • Cluster-mean centering to avoid smushed fixed and random effects
  • Progressive model building from empty models to complex specifications
  • Random slopes and their interpretation
  • Cross-level interactions and quadratic effects
  • Pseudo R² calculations for variance explained at each level

Research Context

The example examines how individual characteristics (e.g., school lunch benefit status, permissiveness toward cyberbullying) and grade-level attributes (e.g., grade level, grade composition) relate to cyberbullying risk among high school students (grades 9-12) across multiple schools.


Requirements

Software

  • R version 4.1.0 or higher (for native pipe |> support)
  • RStudio (recommended but not required)
  • STATA 16 or higher (for STATA version)

R Packages

The script example2_simulation_models_R.R uses pacman for automatic package installation and loading. Required packages:

- lmerTest      # Multilevel modeling with Satterthwaite degrees of freedom (DDF)
- performance   # ICC calculations and model diagnostics
- ordinal       # Ordinal regression for categorical predictors

All packages will be automatically installed on first run if not already present.


Files Included

supplemental_materials/
│
├── README.Rmd                              # This file
├── example2_simulation_models_R.R          # R analysis script (generates data + fits models)
├── example2_models_STATA.do                # STATA analysis script (fits models only)
└── Example2_simulationData.csv             # Simulated dataset (generated by R script)

How to Use

Quick Start - R

Option 1: Generate data and fit models

  1. Download the R script to your working directory
  2. Open R or RStudio
  3. Run the entire script:
source("example2_simulation_models_R.R")

The script will: - Install required packages (if needed) - Generate simulated data - Fit all models progressively - Display results in the console

Option 2: Use pre-generated data

If you have downloaded the Example2_simulationData.csv file from our repository, then you just need to:

  1. Load the data into the R environment
  2. Go to Section 3 in the R script to start fitting models
  3. Skip the data generation steps
# Load pre-generated data
sim_data <- read.csv("Example2_simulationData.csv")

Quick Start - STATA

Prerequisites: You must first generate the simulated data using the R script (or downloading the Example2_simulationData.csv file).

  1. Run the R script to generate Example2_simulationData.csv (or download it directly)
  2. Download the STATA script to your working directory
  3. Open STATA
  4. Update the working directory in the script
  5. Run the script:
do "example2_models_STATA.do"

The STATA script will: - Import the CSV data - Create all necessary variables - Fit all models progressively - Display results with proper F-tests and pseudo-R² calculations

Step-by-Step Execution

For pedagogical purposes, we recommend running the scripts section by section:

In R:

# Open the script `example2_simulation_models_R.R` in RStudio
# Run each section using Ctrl+Enter (Windows) or Cmd+Enter (Mac)
# Or use the "Run" menu to execute code chunks

In STATA:

* Open the .do file
* Highlight sections and run using Ctrl+D (Windows) or Cmd+Shift+D (Mac)

Recommended workflow:

  1. Section 1: Data generation/preparation (understand the structure)
  2. Section 2: Preliminary analyses (examine variance decomposition)
  3. Section 3: Progressive model building (see how effects change)
  4. Section 4: Final model interpretation
  5. Section 5: Session info (for reproducibility)

Expected Runtime

  • Total runtime: ~1-2 minutes on standard hardware
  • Data generation (R only): ~5 seconds
  • Model fitting: ~30-60 seconds (5 models total)

Understanding the Output

Console Output Structure

The scripts produce organized output with clear headers:

========== DATA GENERATION COMPLETE ==========
Sample size: XXX students
Number of schools: 10
Number of grades: 40
Average students per grade: XX

========== STEP 1: UNCONDITIONAL MODELS ==========
[Variance decomposition results]

========== STEP 2: BUILDING MULTILEVEL MODELS ==========
--- MODEL 1: Grade Level Effects ---
[Model results]

--- MODEL 2: Add Level 1 Within-Grade Effects ---
[Model results]
...

Key Statistics to Note

For each model, pay attention to:

  1. Fixed Effects: Regression coefficients and their significance
  2. Variance Components: Random intercept, random slope, residual variance
  3. Pseudo R² measures: Proportion of variance explained at each level
  4. Likelihood Ratio Tests: Significance of random effects

Model Progression Explained

Model 1: Grade Level Only

  • Purpose: Establish baseline with level-2 predictor only
  • Predictors: Grade level (9th, 10th, 11th, 12th)
  • Random effects: Random intercept at grade level

Model 2: Add Within-Grade Effects

  • Purpose: Examine individual-level relationships
  • New predictors: School lunch benefit, individual permissiveness (grade-mean centered)
  • Key concept: Within-grade effects isolated from between-grade effects

Model 3: Add Between-Grade Effects

  • Purpose: Examine compositional/contextual effects
  • New predictors: Grade proportions of lunch benefit, grade-average permissiveness
  • Key concept: Complete disaggregation of level-1 predictors

Model 4: Add Random Slope

  • Purpose: Allow permissiveness effect to vary across grades
  • New random effect: Random slope for permissiveness
  • Key concept: Heterogeneity in within-grade relationships

Model 5: Full Model (FINAL)

  • Purpose: Explain random slope variance and test non-linear effects
  • New predictors: Cross-level interaction, quadratic term
  • Key concept: Moderating effects and curvilinear relationships

Key Methodological Concepts

Cluster-Mean Centering (CMC)

The scripts use CMC to avoid smushed effects (conflated within- and between-cluster effects):

# Level 1: Center on grade mean
L1_predictor_CMC = L1_predictor - L2_grade_mean

# Level 2: Center on school mean  
L2_grade_mean_CMC = L2_grade_mean - L3_school_mean

Why this matters: Ensures level-1 effects represent pure within-grade relationships and level-2 effects represent pure between-grade relationships.

Hybrid Two-Level Approach

Although data has three levels (students > grades > schools), we use a hybrid approach:

  • Include schools (their IDs) as fixed effects at L2 (absorbs level-3 variance in the intercept)
  • Fit two-level models with fixed and random effects (students within grades)

Rationale: School differences not of substantive interest (L3 was considered incidental, as discussed in McNeish & Wentzel, 2016), but must be controlled.

Pseudo R² Calculation

Variance explained is calculated by comparing variance components:

Pseudo_R² = (Variance_baseline - Variance_current) / Variance_baseline

Important: For random slope models, compare to the model before adding the random slope to avoid overestimating explained variance—Our understanding of pseudo-R² measures in multilevel analysis differs from that presented by other authors, such as Rights & Sterba (2023).


Data Structure

Variables in the Dataset

Student-Level (Level 1): - StudentID: Student identifier - y_CMC_Model: Cyberbullying risk score (15-45, higher = more risk), labeled as CB - L1_x1_Comp_Cat: School lunch benefit (1=Paid, 2=Reduced-price, 3=Free) - L1_x2_Comp_Round: Permissiveness toward cyberbullying (10-40)

Grade-Level (Level 2): - GradeID: Grade identifier - L2_x3_GradeLevel: Grade level (9, 10, 11, 12)

School-Level (Level 3): - SchoolID: School identifier (1-10)

Hierarchical Structure

  • 10 schools
  • 4 grades per school (9th, 10th, 11th, 12th) = 40 grades total
  • 25-30 students per grade ≈ 1,080 students total

Software-Specific Notes

R Version

  • Uses lmer() from lmerTest package
  • Variance components extracted via VarCorr()
  • Satterthwaite degrees of freedom by default
  • Pseudo R² calculated via custom function

STATA Version

  • Uses mixed command with || gradeid: notation
  • Variances stored as log(SD) in parameter estimates
  • Use small option for F-tests with Satterthwaite DF (not chi-squared)
  • Use estat icc for intraclass correlations
  • Use estat recovariance for random effects correlations

Key STATA Commands

mixed y x || group:, reml dfmethod(satterthwaite)    # Fit model
estat icc                                             # Compute ICC
testparm x1 x2 x3, small                             # F-test for multiple predictors
test x1 x2, small                                    # F-test for specific predictors
lrtest model1 model2                                 # Likelihood ratio test

Important: Always use the small option with test and testparm to get F-tests instead of chi-squared tests. This is consistent with Satterthwaite degrees of freedom.


Customization Options

Changing Sample Size

Modify the data generation parameters in the R script:

sim_data <- data_gen(
  nSchools = 10,          # Number of schools
  min_StuPerGrade = 25,   # Minimum students per grade
  max_StuPerGrade = 30    # Maximum students per grade
)

Changing Random Seed

For different random realizations:

set.seed(123)  # Change from default 777

Saving Output

Capture all output to a file:

# In R
sink("example2_output.txt")
source("example2_simulation_models_R.R")
sink()

# In STATA
log using "example2_output.log", replace
do "example2_models_STATA.do"
log close

Troubleshooting

Common Issues

Problem: Convergence warnings in R

boundary (singular) fit: see help('isSingular')

Solution: This may occur with small samples. For the simulation, this is acceptable. In real data, consider:

  • Checking predictor scaling
  • Simplifying random effects structure
  • Increasing sample size

Problem: Package installation fails in R

Error in install.packages...

Solution: Install packages manually:

install.packages(c("lmerTest", "performance", "ordinal"))

If you are working in RStudio, you can look in the toolbar for the ToolsInstall Packages. In the window that appears, type the package you need (for example, lmerTest) → Install.

Problem: STATA convergence warnings

convergence not achieved

Solution: Try simplifying the random effects structure or checking predictor scaling.

Problem: Script runs very slowly

Solution: Check system resources, reduce nSchools parameter

Problem: File path errors

file not found

Solution: - R: Use setwd("path/to/folder") or RStudio Projects - STATA: Use cd "path\to\folder" with backslashes on Windows

Getting Help

If you encounter issues:

  1. Check R version: R.version.string (must be ≥4.1.0)
  2. Check STATA version: Must be ≥16.0
  3. Update packages: update.packages(ask = FALSE) in R
  4. Check working directory: getwd() in R, pwd in STATA
  5. Restart session: Session → Restart R (in RStudio)

Interpreting Results

Within-Grade vs. Between-Grade Effects

Within-grade effect (Level 1, CMC predictor):

  • “For students in the same grade, a one-unit increase in permissiveness is associated with…”

Between-grade effect (Level 2, grade mean):

  • “Comparing grades, a one-unit increase in average permissiveness is associated with…”

Contextual effect:

  • Difference in between-grade and within-grade fixed effects. e.g., Incremental effect of grade after controlling for within-grade values of the variable of interest.
  • Can be tested using linear contrasts (contextual = between - within)

Cross-Level Interaction Interpretation

The interaction term shows moderation:

L1_x2_CMC:L2_x2_Mean_CMC

Interpretation: “The within-grade effect of individual permissiveness is stronger (or weaker) in grades with higher average permissiveness”

Random Slope Interpretation

Random slope variance (τ²U₁):

  • Quantifies how much the within-grade fixed effect varies across grades
  • Non-zero = the within-grade effect is not constant across all grades (i.e., some grades have stronger or weaker relationships between permissiveness and cyberbullying risk).
  • Can be explained (i.e., reduced) by cross-level interactions

Extensions and Modifications

Add More Predictors

Insert new predictors in the data generation section:

# Add new level-1 predictor
sim_data2$new_predictor <- rnorm(nrow(sim_data2), mean = 0, sd = 1)

# Include in model
L1_predictor + new_predictor

Test Alternative Centering

Compare CMC vs. grand-mean centering (GMC):

# Grand-mean center at constant
sim_data$L1_x2_GMC <- sim_data$L1_x2_Comp_Round - 25

Export Results

Save model results for later use:

# In R: Save final model
saveRDS(model5_full, "final_model.rds")

# Load later
model5_full <- readRDS("final_model.rds")

# In STATA: Save estimates
estimates save "Model5_Final", replace

# Load later
estimates use "Model5_Final"

Reproducibility

Session Information

The scripts automatically print session info at the end:

In R:

sessionInfo()

In STATA:

display "STATA version: " c(stata_version)
display "Date: " c(current_date)

This includes:

  • Software version
  • Platform details
  • Loaded packages/modules and versions
  • Locale information

Random Seed

Fixed at set.seed(777) in R for reproducibility. Same seed = same results.


Citation

If you use this code in your research, please cite:

Padilla, B., Dorman, E., & Hoffman, L. (2026). Multilevel analysis for quantitative
communication scientists. In Data Analysis. De Gruyter.

Additional Resources

Interested in Learning More About Multilevel Analysis?

Books:

  • Hoffman, L. (2015). Longitudinal Analysis: Modeling Within-Person Fluctuation and Change
  • Snijders, T. A. B., & Bosker, R. J. (2011). Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling
  • Hox, J. J., Moerbeek, M., & Van de Schoot, R. (2017). Multilevel Analysis: Techniques and Applications

Online Resources:


Contact

For questions about this code or the methodology:

Primary Contact: Bladimir Padilla
Email:

Bug Reports: Please contact me directly with the details of the issue


License

This code is provided for educational and research purposes. You are free to use, modify, and distribute with appropriate citation.


Changelog

Version 1.2 (March 13, 2026)

  • Updated variable naming scheme (removed special characters)
  • Enhanced ICC calculations for Table 5
  • Improved documentation and comments
  • Fixed filename conventions for consistency

Version 1.1 (March 2, 2026)

  • Initial release accompanying book chapter
  • Complete workflow from data generation to interpretation
  • Five progressive models demonstrating key concepts
  • Both R and STATA versions provided

Acknowledgments

Personally, I am grateful to Professor Lesa Hoffman for her tireless work and helpful comments on this and many other papers/projects. I believe her website (https://www.lesahoffman.com/index.html) is an invaluable resource for those interested in learning about multilevel analysis and other statistical techniques.


Last Updated: 2026-03-13
Script Version: 1.2
Compatible R Versions: ≥4.1.0
Compatible STATA Versions: ≥16.0