Fuzz Testing Your R Code

Nan Xiao 2020-09-28

Good software requires even better testing. In particular, unit testing is widely used in R packages as a tool for reducing the number of bugs and improving code structure. A unit test is usually written when a single unit of functionality is created in the program. Is there a good way to test a large program or system after it is created? The answer is yes, and one of the approaches people have developed is fuzz testing.

Log book with computer bug. Source: National Museum of American History, accession number 1994.0191.

As the name suggests, fuzz testing aims to reveal hidden exceptions by automatically generating a large number of randomized inputs and feeding them to the program (the more inputs are tried, the more likely a rare failure is to surface). This is especially useful for validating the robustness of large programs whose computational components interact in complex ways and whose edge cases are hard to anticipate. I think this description fits some statistical estimation and inference procedures well, given their numerical or probabilistic nature.
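To make the idea concrete, here is a minimal sketch of a fuzz loop in base R. The function under test, `log_mean()`, and the input generator, `random_input()`, are hypothetical stand-ins invented for illustration; they are not part of any package discussed here.

```r
# Hypothetical function under test: the mean of log-transformed values.
# It implicitly assumes that x is numeric, non-empty, and strictly positive.
log_mean <- function(x) {
  stopifnot(is.numeric(x), length(x) > 0)
  mean(log(x))
}

# Draw one random input: random length (possibly zero), with a few
# NA, zero, and negative values mixed in.
random_input <- function() {
  n <- sample(0:5, 1)
  x <- rexp(n)
  x[runif(n) < 0.2] <- NA
  x[runif(n) < 0.2] <- 0
  x[runif(n) < 0.2] <- -1
  x
}

# Run the function on one input and classify the outcome,
# so that an error or warning does not stop the loop.
run_one <- function(x) {
  tryCatch(
    {
      res <- log_mean(x)
      if (is.finite(res)) "ok" else "non-finite result"
    },
    warning = function(w) "warning",
    error = function(e) "error"
  )
}

set.seed(1)
outcomes <- vapply(
  seq_len(1000),
  function(i) run_one(random_input()),
  character(1)
)

# Tabulate how often each kind of outcome occurred.
table(outcomes)
```

Each category other than "ok" points to a class of inputs that deserves a closer look: an explicit check, a clearer error message, or a documented limitation.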

Example: fuzz testing oneclust

For R, there is a nice package, fuzzr, written by Matthew Lincoln. The package offers a framework for fuzz testing R code. Its off-the-shelf tests primarily target unexpected or non-standard input types, and custom tests can be easily created and evaluated.

Let’s use it to test my R package oneclust released in September 2020. The package is built for maximum homogeneity clustering of one-dimensional data. Although the core is implemented in C++ (statically typed), we will see that the R interface still allows some flexibility on input types.

The core function oneclust::oneclust() has four arguments:

oneclust(x, k, w = NULL, sort = TRUE)

where x is a numeric vector representing the samples to be clustered, k is the number of clusters, w is the optional sample weights vector, and sort indicates if x (and w) should be sorted. Example:

library("oneclust")

set.seed(42)
x <- sample(c(
  rnorm(50, sd = 0.2),
  rnorm(50, mean = 1, sd = 0.3),
  rnorm(100, mean = -1, sd = 0.25)
))

oneclust(x, 3)
#> $cluster
#>   [1] 3 1 3 2 1 1 1 3 2 3 2 2 3 1 1 1 1 1 2 1 1 1 1 1 2 3 2 2 1 1 1 2
#>  [33] 1 1 1 3 1 1 3 1 3 2 1 1 3 2 3 2 1 1 3 3 1 2 3 3 1 1 1 1 3 3 1 1
#>  [65] 1 1 1 3 2 2 2 2 2 1 1 2 3 2 1 2 1 3 2 3 1 2 3 1 3 1 1 2 1 1 2 3
#>  [97] 3 1 2 3 2 3 1 1 2 1 3 1 1 1 1 3 1 1 1 1 1 3 1 2 2 1 1 2 1 1 2 2
#> [129] 2 1 2 1 2 1 3 2 2 1 3 3 2 2 2 1 1 3 1 1 3 1 2 3 2 3 1 3 1 2 1 1
#> [161] 2 3 1 2 2 3 2 1 1 3 3 1 1 1 1 3 1 3 1 3 1 2 3 2 1 3 1 1 1 1 1 1
#> [193] 1 1 1 2 3 3 1 1
#> 
#> $cut
#> [1]   1 101 152

Use fuzzr to test how argument x handles all sorts of input types:

library("fuzzr")
library("kableExtra")

f <- fuzz_function(oneclust, "x", k = 3, tests = test_all())
f |>
  as.data.frame() |>
  kbl() |>
  kable_classic(
    lightable_options = c("striped", "hover"),
    html_font = "inherit",
    font_size = 12
  )
| x | k | output | messages | warnings | errors | result_classes | results_index |
|---|---|---|---|---|---|---|---|
| char_empty | 3 | NA | NA | NA | Not compatible with requested type: [type=character; target=double]. | NA | 1 |
| char_single | 3 | NA | NA | NA | Not compatible with requested type: [type=character; target=double]. | NA | 2 |
| char_single_blank | 3 | NA | NA | NA | Not compatible with requested type: [type=character; target=double]. | NA | 3 |
| char_multiple | 3 | NA | NA | NA | Not compatible with requested type: [type=character; target=double]. | NA | 4 |
| char_multiple_blank | 3 | NA | NA | NA | Not compatible with requested type: [type=character; target=double]. | NA | 5 |
| char_with_na | 3 | NA | NA | NA | Not compatible with requested type: [type=character; target=double]. | NA | 6 |
| char_single_na | 3 | NA | NA | NA | Not compatible with requested type: [type=character; target=double]. | NA | 7 |
| char_all_na | 3 | NA | NA | NA | Not compatible with requested type: [type=character; target=double]. | NA | 8 |
| int_empty | 3 | NA | NA | NA | index error | NA | 9 |
| int_single | 3 | NA | NA | NA | index error | NA | 10 |
| int_multiple | 3 | NA | NA | NA | NA | list | 11 |
| int_with_na | 3 | NA | NA | NA | index error | NA | 12 |
| int_single_na | 3 | NA | NA | NA | index error | NA | 13 |
| int_all_na | 3 | NA | NA | NA | index error | NA | 14 |
| dbl_empty | 3 | NA | NA | NA | index error | NA | 15 |
| dbl_single | 3 | NA | NA | NA | index error | NA | 16 |
| dbl_mutliple | 3 | NA | NA | NA | NA | list | 17 |
| dbl_with_na | 3 | NA | NA | NA | index error | NA | 18 |
| dbl_single_na | 3 | NA | NA | NA | index error | NA | 19 |
| dbl_all_na | 3 | NA | NA | NA | index error | NA | 20 |
| fctr_empty | 3 | NA | NA | NA | index error | NA | 21 |
| fctr_single | 3 | NA | NA | NA | index error | NA | 22 |
| fctr_multiple | 3 | NA | NA | NA | NA | list | 23 |
| fctr_with_na | 3 | NA | NA | NA | index error | NA | 24 |
| fctr_missing_levels | 3 | NA | NA | NA | NA | list | 25 |
| fctr_single_na | 3 | NA | NA | NA | index error | NA | 26 |
| fctr_all_na | 3 | NA | NA | NA | index error | NA | 27 |
| lgl_empty | 3 | NA | NA | NA | index error | NA | 28 |
| lgl_single | 3 | NA | NA | NA | index error | NA | 29 |
| lgl_mutliple | 3 | NA | NA | NA | NA | list | 30 |
| lgl_with_na | 3 | NA | NA | NA | index error | NA | 31 |
| lgl_single_na | 3 | NA | NA | NA | index error | NA | 32 |
| lgl_all_na | 3 | NA | NA | NA | index error | NA | 33 |
| date_single | 3 | NA | NA | NA | index error | NA | 34 |
| date_multiple | 3 | NA | NA | NA | NA | list | 35 |
| date_with_na | 3 | NA | NA | NA | index error | NA | 36 |
| date_single_na | 3 | NA | NA | NA | index error | NA | 37 |
| date_all_na | 3 | NA | NA | NA | index error | NA | 38 |
| raw_empty | 3 | NA | NA | NA | index error | NA | 39 |
| raw_char | 3 | NA | NA | NA | index error | NA | 40 |
| raw_na | 3 | NA | NA | NA | NA | list | 41 |
| df_complete | 3 | NA | NA | NA | Not compatible with requested type: [type=list; target=double]. | NA | 42 |
| df_empty | 3 | NA | NA | NA | Not compatible with requested type: [type=list; target=double]. | NA | 43 |
| df_one_row | 3 | NA | NA | NA | Not compatible with requested type: [type=list; target=double]. | NA | 44 |
| df_one_col | 3 | NA | NA | NA | NA | list | 45 |
| df_with_na | 3 | NA | NA | NA | Not compatible with requested type: [type=list; target=double]. | NA | 46 |
| null_value | 3 | NA | NA | NA | Not compatible with requested type: [type=NULL; target=double]. | NA | 47 |

As expected, the character, NULL, and most data frame inputs did not go through (Not compatible with requested type: [type=…; target=double]). Among the other types, inputs that are empty, contain a single value, or include NA values returned index error. The tests that returned meaningful results are int_multiple, dbl_mutliple, fctr_multiple, fctr_missing_levels, lgl_mutliple, date_multiple, raw_na, and df_one_col. It was a bit surprising to me that a factor with missing levels can run here:

fuzz_call(f, x = "fctr_missing_levels")
#> $fun
#> [1] "oneclust"
#> 
#> $args
#> $args$x
#> [1] a b c
#> Levels: a b c d
#> 
#> $args$k
#> [1] 3

It means that a “character factor” like the one below can actually be clustered:

oneclust(factor(letters[1:3], levels = letters[1:4]), k = 3)
#> $cluster
#> [1] 1 2 3
#> 
#> $cut
#> [1] 1 2 3

After checking the textbooks, we know that factors in R are built on top of integer vectors, so the factor was probably treated as one. A deeper understanding of R’s vector types helps interpret the other results, too.
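The storage types can be inspected directly in base R. The short sketch below only shows how R itself stores and converts these vectors; it does not demonstrate what oneclust does internally, which presumably relies on a similar conversion to double.

```r
f <- factor(letters[1:3], levels = letters[1:4])

typeof(f)      # "integer": the storage type behind the factor
as.integer(f)  # 1 2 3: the integer codes, one per element
levels(f)      # "a" "b" "c" "d": the labels that the codes index into

# Dates and logicals also convert to numbers.
d <- as.Date("2020-09-28")
typeof(d)      # "double": stored as days since 1970-01-01
as.numeric(d)  # 18533

as.numeric(c(TRUE, FALSE, NA))  # 1 0 NA
```

Seen this way, the factor with levels a to d is simply the integer vector 1, 2, 3 with some labels attached, which is consistent with the clustering result above.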

Note that we only focused on input types here. In statistical computing, exceptions can also be caused by subtle numerical issues and strange artifacts in the data, such as unusual distribution shapes and outliers. In those domain-specific cases, creating your own tests or even frameworks would be helpful.
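As a rough sketch of what such a custom test could look like, the base R code below feeds data with different shapes and artifacts to a function and checks a few properties that any valid clustering result should satisfy. The function under test, `bin_clusters()`, is a hypothetical stand-in (simple equal-width binning), and the generator names are made up for illustration; swap in the function and data shapes that matter for your own package.

```r
# Hypothetical function under test: equal-width binning of x into k groups.
# A stand-in for any one-dimensional clustering routine.
bin_clusters <- function(x, k) {
  as.integer(cut(x, breaks = k))
}

# Named generators, each producing data with a different shape or artifact.
generators <- list(
  normal       = function(n) rnorm(n),
  heavy_tailed = function(n) rcauchy(n),
  skewed       = function(n) rlnorm(n, sdlog = 2),
  outlier      = function(n) c(rnorm(n - 1), 1e300),
  tied         = function(n) sample(c(0, 1), n, replace = TRUE),
  constant     = function(n) rep(0, n),
  with_inf     = function(n) c(rnorm(n - 1), Inf)
)

# Properties a valid result should satisfy: one label per sample,
# no missing labels, and every label between 1 and k.
check_result <- function(res, n, k) {
  length(res) == n && !anyNA(res) && all(res >= 1 & res <= k)
}

# Generate one data set, run the function, and report what happened.
fuzz_one <- function(gen, n = 50, k = 3) {
  tryCatch(
    if (check_result(bin_clusters(gen(n), k), n, k)) {
      "pass"
    } else {
      "property violated"
    },
    error = function(e) paste("error:", conditionMessage(e))
  )
}

set.seed(42)
report <- vapply(generators, fuzz_one, character(1))
data.frame(generator = names(report), outcome = unname(report))
```

Checking properties of the output, rather than exact values, is what makes this workable for randomized inputs: the expected answer is unknown, but the shape and range of a valid answer are not.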