Problem
vcd::assocstats() computes association statistics (chi-squared tests, phi, contingency
coefficient, Cramer's V) for two-way tables. For tables with 3+ dimensions, it
applies itself recursively to each 2-way slice via apply(x, 3:l, FUN = assocstats),
returning a named list of assocstats objects.
The problem is that this returned list has no class, so it prints using the
default list method, producing ugly output like the following. This is even works with 4+ table dimensions.
$`Sex:Male`
X^2 df P(> X^2)
Likelihood Ratio 166.20 9 0.00000
Pearson 106.66 9 0.00000
Phi-Coefficient : NA
Contingency Coeff.: 0.486
Cramer's V : 0.32
$`Sex:Female`
X^2 df P(> X^2)
Likelihood Ratio 116.52 9 0.00000
Pearson 82.60 9 0.00000
Phi-Coefficient : NA
Contingency Coeff.: 0.454
Cramer's V : 0.292
Issues
Issues with this output:
- Uses raw
$name list notation rather than informative headings
- No overall header describing the table or the stratifying variable(s)
- No way to compare statistics across strata at a glance
- For 4+-way tables, names like
"Var3:level1|Var4:level2" are hard to parse
- No
summary() method that works across strata
Relevant source code
In vcd/R/assocstats.R, the recursive case (lines 2-14) returns a plain
named list/vector with no class:
assocstats <- function(x) {
if(!is.matrix(x)) {
l <- length(dim(x))
str <- apply(x, 3 : l, FUN = assocstats)
if (l == 3) {
names(str) <- paste(names(dimnames(x))[3], names(str), sep = ":")
} else {
dn <- dimnames(str)
dim(str) <- NULL
names(str) <-
apply(expand.grid(dn), 1,
function(x) paste(names(dn), x, sep = ":", collapse = "|"))
}
return(str)
}
# ... 2-way case returns classed "assocstats" object
}
The print.assocstats() and summary.assocstats() methods only handle the
single 2-way case. There are no methods for the list returned by the recursive case.
Proposed solution for vcdExtra
Since we don't have maintainer access to vcd, we can provide enhanced methods
in vcdExtra. There are two main components:
1. A wrapper or class for the list result
Create a small wrapper function or S3 class (e.g., "assocstats_list") with
print() and summary() methods:
# Option A: A wrapper that re-classes the vcd result
assocstats_list <- function(x) {
result <- vcd::assocstats(x)
if (is.list(result) && !inherits(result, "assocstats")) {
class(result) <- "assocstats_list"
attr(result, "dimnames_orig") <- dimnames(x)
}
result
}
Alternatively, register print and summary methods that check for an
unnamed list of assocstats objects.
2. print.assocstats_list() -- formatted stratum-by-stratum output
Replace the raw $name list notation with clear headings:
## Association Statistics for HairEyeColor
## Stratified by: Sex
--- Sex: Male ---
X^2 df P(> X^2)
Likelihood Ratio 166.20 9 0.00000
Pearson 106.66 9 0.00000
Phi-Coefficient : NA
Contingency Coeff.: 0.486
Cramer's V : 0.32
--- Sex: Female ---
X^2 df P(> X^2)
...
3. summary.assocstats_list() -- comparison table across strata
This is the most valuable addition. Collect key statistics into a data frame
for easy comparison:
## Summary of association statistics across strata
Pearson X^2 df p-value Cramer's V Contingency
Sex:Male 106.66 9 < 0.001 0.320 0.486
Sex:Female 82.60 9 < 0.001 0.292 0.454
This could return a data frame (or tibble) of class "summary.assocstats_list"
with a custom print() method for clean formatting. The data frame structure
also makes it easy for users to further process the results (e.g., plotting
Cramer's V across strata).
4. Optional: as.data.frame() method
A convenience method to extract the summary statistics as a plain data frame
for downstream use:
as.data.frame.assocstats_list <- function(x, ...) {
data.frame(
stratum = names(x),
X2_LR = sapply(x, function(a) a$chisq_tests[1, 1]),
X2_Pearson = sapply(x, function(a) a$chisq_tests[2, 1]),
df = sapply(x, function(a) a$chisq_tests[2, 2]),
p_value = sapply(x, function(a) a$chisq_tests[2, 3]),
phi = sapply(x, function(a) a$phi),
contingency = sapply(x, function(a) a$contingency),
cramer_v = sapply(x, function(a) a$cramer),
row.names = NULL
)
}
Implementation notes
- The wrapper approach avoids any conflict with
vcd since we define a new class
- All methods would be exported from
vcdExtra and documented together
- The
summary() data frame output pairs well with color_table() for visual display
- For 4+-way tables, the stratum names from
vcd (e.g., "Var3:a|Var4:b") should
be parsed into separate columns in the data frame for clarity
Problem
vcd::assocstats()computes association statistics (chi-squared tests, phi, contingencycoefficient, Cramer's V) for two-way tables. For tables with 3+ dimensions, it
applies itself recursively to each 2-way slice via
apply(x, 3:l, FUN = assocstats),returning a named list of
assocstatsobjects.The problem is that this returned list has no class, so it prints using the
default list method, producing ugly output like the following. This is even works with 4+ table dimensions.
Issues
Issues with this output:
$namelist notation rather than informative headings"Var3:level1|Var4:level2"are hard to parsesummary()method that works across strataRelevant source code
In
vcd/R/assocstats.R, the recursive case (lines 2-14) returns a plainnamed list/vector with no class:
The
print.assocstats()andsummary.assocstats()methods only handle thesingle 2-way case. There are no methods for the list returned by the recursive case.
Proposed solution for vcdExtra
Since we don't have maintainer access to
vcd, we can provide enhanced methodsin
vcdExtra. There are two main components:1. A wrapper or class for the list result
Create a small wrapper function or S3 class (e.g.,
"assocstats_list") withprint()andsummary()methods:Alternatively, register
printandsummarymethods that check for anunnamed list of
assocstatsobjects.2.
print.assocstats_list()-- formatted stratum-by-stratum outputReplace the raw
$namelist notation with clear headings:3.
summary.assocstats_list()-- comparison table across strataThis is the most valuable addition. Collect key statistics into a data frame
for easy comparison:
This could return a data frame (or tibble) of class
"summary.assocstats_list"with a custom
print()method for clean formatting. The data frame structurealso makes it easy for users to further process the results (e.g., plotting
Cramer's V across strata).
4. Optional:
as.data.frame()methodA convenience method to extract the summary statistics as a plain data frame
for downstream use:
Implementation notes
vcdsince we define a new classvcdExtraand documented togethersummary()data frame output pairs well withcolor_table()for visual displayvcd(e.g.,"Var3:a|Var4:b") shouldbe parsed into separate columns in the data frame for clarity