Good software requires even better testing. Particularly, unit testing has been widely used by many R packages as a tool for reducing the number of bugs and improving code structure. A unit test is often written when a single unit of functionality is created in the program. Is there a good way to test a large program or system after it is created? The answer is yes, and one of the approaches people developed is fuzz testing.
As the name indicates, fuzz testing focuses on revealing hidden exceptions through the automated generation of a large number of randomized inputs and feeding them to the program (law of large numbers helps). This is especially useful for validating large programs’ robustness where the computational components have complex interactions, and the edge cases are tricky to realize. I agree that this description fits the characteristics of some statistical estimation or inference procedures due to their numerical or probabilistic nature.
Example: fuzz testing oneclust
In R, there is a nice R package fuzzr written by Matthew Lincoln. The package offers an example framework for fuzz testing R code. The off-the-shelf functions primarily emphasize the unexpected or non-standard input types, while custom tests can be easily created and evaluated.
Let’s use it to test my R package oneclust released in September 2020. The package is built for maximum homogeneity clustering of one-dimensional data. Although the core is implemented in C++ (statically typed), we will see that the R interface still allows some flexibility on input types.
The core function oneclust::oneclust()
has four arguments:
oneclust(x, k, w = NULL, sort = TRUE)
where x
is a numeric vector representing the samples to be clustered, k
is the number of clusters, w
is the optional sample weights vector, and sort
indicates if x
(and w
) should be sorted. Example:
library("oneclust")
set.seed(42)
x <- sample(c(
rnorm(50, sd = 0.2),
rnorm(50, mean = 1, sd = 0.3),
rnorm(100, mean = -1, sd = 0.25)
))
oneclust(x, 3)
#> $cluster
#> [1] 3 1 3 2 1 1 1 3 2 3 2 2 3 1 1 1 1 1 2 1 1 1 1 1 2 3 2 2 1 1 1 2
#> [33] 1 1 1 3 1 1 3 1 3 2 1 1 3 2 3 2 1 1 3 3 1 2 3 3 1 1 1 1 3 3 1 1
#> [65] 1 1 1 3 2 2 2 2 2 1 1 2 3 2 1 2 1 3 2 3 1 2 3 1 3 1 1 2 1 1 2 3
#> [97] 3 1 2 3 2 3 1 1 2 1 3 1 1 1 1 3 1 1 1 1 1 3 1 2 2 1 1 2 1 1 2 2
#> [129] 2 1 2 1 2 1 3 2 2 1 3 3 2 2 2 1 1 3 1 1 3 1 2 3 2 3 1 3 1 2 1 1
#> [161] 2 3 1 2 2 3 2 1 1 3 3 1 1 1 1 3 1 3 1 3 1 2 3 2 1 3 1 1 1 1 1 1
#> [193] 1 1 1 2 3 3 1 1
#>
#> $cut
#> [1] 1 101 152
Use fuzzr
to test how argument x
handles all sorts of input types:
library("fuzzr")
library("kableExtra")
f <- fuzz_function(oneclust, "x", k = 3, tests = test_all())
f |>
as.data.frame() |>
kbl() |>
kable_classic(
lightable_options = c("striped", "hover"),
html_font = "inherit",
font_size = 12
)
x | k | output | messages | warnings | errors | result_classes | results_index |
---|---|---|---|---|---|---|---|
char_empty | 3 | NA | NA | NA | Not compatible with requested type: [type=character; target=double]. | NA | 1 |
char_single | 3 | NA | NA | NA | Not compatible with requested type: [type=character; target=double]. | NA | 2 |
char_single_blank | 3 | NA | NA | NA | Not compatible with requested type: [type=character; target=double]. | NA | 3 |
char_multiple | 3 | NA | NA | NA | Not compatible with requested type: [type=character; target=double]. | NA | 4 |
char_multiple_blank | 3 | NA | NA | NA | Not compatible with requested type: [type=character; target=double]. | NA | 5 |
char_with_na | 3 | NA | NA | NA | Not compatible with requested type: [type=character; target=double]. | NA | 6 |
char_single_na | 3 | NA | NA | NA | Not compatible with requested type: [type=character; target=double]. | NA | 7 |
char_all_na | 3 | NA | NA | NA | Not compatible with requested type: [type=character; target=double]. | NA | 8 |
int_empty | 3 | NA | NA | NA | index error | NA | 9 |
int_single | 3 | NA | NA | NA | index error | NA | 10 |
int_multiple | 3 | NA | NA | NA | NA | list | 11 |
int_with_na | 3 | NA | NA | NA | index error | NA | 12 |
int_single_na | 3 | NA | NA | NA | index error | NA | 13 |
int_all_na | 3 | NA | NA | NA | index error | NA | 14 |
dbl_empty | 3 | NA | NA | NA | index error | NA | 15 |
dbl_single | 3 | NA | NA | NA | index error | NA | 16 |
dbl_mutliple | 3 | NA | NA | NA | NA | list | 17 |
dbl_with_na | 3 | NA | NA | NA | index error | NA | 18 |
dbl_single_na | 3 | NA | NA | NA | index error | NA | 19 |
dbl_all_na | 3 | NA | NA | NA | index error | NA | 20 |
fctr_empty | 3 | NA | NA | NA | index error | NA | 21 |
fctr_single | 3 | NA | NA | NA | index error | NA | 22 |
fctr_multiple | 3 | NA | NA | NA | NA | list | 23 |
fctr_with_na | 3 | NA | NA | NA | index error | NA | 24 |
fctr_missing_levels | 3 | NA | NA | NA | NA | list | 25 |
fctr_single_na | 3 | NA | NA | NA | index error | NA | 26 |
fctr_all_na | 3 | NA | NA | NA | index error | NA | 27 |
lgl_empty | 3 | NA | NA | NA | index error | NA | 28 |
lgl_single | 3 | NA | NA | NA | index error | NA | 29 |
lgl_mutliple | 3 | NA | NA | NA | NA | list | 30 |
lgl_with_na | 3 | NA | NA | NA | index error | NA | 31 |
lgl_single_na | 3 | NA | NA | NA | index error | NA | 32 |
lgl_all_na | 3 | NA | NA | NA | index error | NA | 33 |
date_single | 3 | NA | NA | NA | index error | NA | 34 |
date_multiple | 3 | NA | NA | NA | NA | list | 35 |
date_with_na | 3 | NA | NA | NA | index error | NA | 36 |
date_single_na | 3 | NA | NA | NA | index error | NA | 37 |
date_all_na | 3 | NA | NA | NA | index error | NA | 38 |
raw_empty | 3 | NA | NA | NA | index error | NA | 39 |
raw_char | 3 | NA | NA | NA | index error | NA | 40 |
raw_na | 3 | NA | NA | NA | NA | list | 41 |
df_complete | 3 | NA | NA | NA | Not compatible with requested type: [type=list; target=double]. | NA | 42 |
df_empty | 3 | NA | NA | NA | Not compatible with requested type: [type=list; target=double]. | NA | 43 |
df_one_row | 3 | NA | NA | NA | Not compatible with requested type: [type=list; target=double]. | NA | 44 |
df_one_col | 3 | NA | NA | NA | NA | list | 45 |
df_with_na | 3 | NA | NA | NA | Not compatible with requested type: [type=list; target=double]. | NA | 46 |
null_value | 3 | NA | NA | NA | Not compatible with requested type: [type=NULL; target=double]. | NA | 47 |
As expected, we see that the character and data frame inputs did not go through (Not compatible with requested type: [type=…; target=double]). Inputs with empty or NA values also returned index error. The tests that returned meaningful results include int_multiple
, dbl_mutliple
, fctr_multiple
, fctr_missing_levels
, lgl_mutliple
, date_multiple
, raw_na
, df_one_col
. It was a bit surprising for me that a factor with missing levels can run here:
fuzz_call(f, x = "fctr_missing_levels")
#> $fun
#> [1] "oneclust"
#>
#> $args
#> $args$x
#> [1] a b c
#> Levels: a b c d
#>
#> $args$k
#> [1] 3
It means that a “character factor” like the one below can actually be clustered:
oneclust(factor(letters[1:3], levels = letters[1:4]), k = 3)
#> $cluster
#> [1] 1 2 3
#>
#> $cut
#> [1] 1 2 3
After checking the textbooks, we know that factors in R are built on top of integer vectors, so they were probably treated like one. A deeper understanding of R’s vector types helps interpret the other results, too.
Note that we only focused on the input types here. For statistical computing, the exceptions can be caused by subtle numerical issues and strange artifacts in the data, such as the distribution shapes and outliers. In those domain-specific cases, creating your own tests or even frameworks would be helpful.