Fuzz Testing Your R Code

Nan Xiao 2020-09-28 7 min read

Good software requires even better testing. Particularly, unit testing has been widely used by many R packages as a tool for reducing the number of bugs and improving code structure. A unit test is often written when a single unit of functionality is created in the program. Is there a good way to test a large program or system after it is created? The answer is yes, and one of the approaches people developed is fuzz testing.

Log book with computer bug. Source: National Museum of American History, accession number 1994.0191.

As the name indicates, fuzz testing focuses on revealing hidden exceptions through the automated generation of a large number of randomized inputs and feeding them to the program (law of large numbers helps). This is especially useful for validating large programs’ robustness where the computational components have complex interactions, and the edge cases are tricky to realize. I agree that this description fits the characteristics of some statistical estimation or inference procedures due to their numerical or probabilistic nature.

Example: fuzz testing oneclust

In R, there is a nice R package fuzzr written by Matthew Lincoln. The package offers an example framework for fuzz testing R code. The off-the-shelf functions primarily emphasize the unexpected or non-standard input types, while custom tests can be easily created and evaluated.

Let’s use it to test my R package oneclust released in September 2020. The package is built for maximum homogeneity clustering of one-dimensional data. Although the core is implemented in C++ (statically typed), we will see that the R interface still allows some flexibility on input types.

The core function oneclust::oneclust() has four arguments:

oneclust(x, k, w = NULL, sort = TRUE)

where x is a numeric vector representing the samples to be clustered, k is the number of clusters, w is the optional sample weights vector, and sort indicates if x (and w) should be sorted. Example:

library("oneclust")

set.seed(42)
x <- sample(c(
  rnorm(50, sd = 0.2),
  rnorm(50, mean = 1, sd = 0.3),
  rnorm(100, mean = -1, sd = 0.25)
))

oneclust(x, 3)
#> $cluster
#>   [1] 3 1 3 2 1 1 1 3 2 3 2 2 3 1 1 1 1 1 2 1 1 1 1 1 2 3 2 2 1 1 1 2
#>  [33] 1 1 1 3 1 1 3 1 3 2 1 1 3 2 3 2 1 1 3 3 1 2 3 3 1 1 1 1 3 3 1 1
#>  [65] 1 1 1 3 2 2 2 2 2 1 1 2 3 2 1 2 1 3 2 3 1 2 3 1 3 1 1 2 1 1 2 3
#>  [97] 3 1 2 3 2 3 1 1 2 1 3 1 1 1 1 3 1 1 1 1 1 3 1 2 2 1 1 2 1 1 2 2
#> [129] 2 1 2 1 2 1 3 2 2 1 3 3 2 2 2 1 1 3 1 1 3 1 2 3 2 3 1 3 1 2 1 1
#> [161] 2 3 1 2 2 3 2 1 1 3 3 1 1 1 1 3 1 3 1 3 1 2 3 2 1 3 1 1 1 1 1 1
#> [193] 1 1 1 2 3 3 1 1
#> 
#> $cut
#> [1]   1 101 152

Use fuzzr to test how argument x handles all sorts of input types:

library("fuzzr")
library("kableExtra")

f <- fuzz_function(oneclust, "x", k = 3, tests = test_all())
f |>
  as.data.frame() |>
  kbl() |>
  kable_classic(
    lightable_options = c("striped", "hover"),
    html_font = "inherit",
    font_size = 12
  )

x	k	output	messages	warnings	errors	result_classes	results_index
char_empty	3	NA	NA	NA	Not compatible with requested type: [type=character; target=double].	NA	1
char_single	3	NA	NA	NA	Not compatible with requested type: [type=character; target=double].	NA	2
char_single_blank	3	NA	NA	NA	Not compatible with requested type: [type=character; target=double].	NA	3
char_multiple	3	NA	NA	NA	Not compatible with requested type: [type=character; target=double].	NA	4
char_multiple_blank	3	NA	NA	NA	Not compatible with requested type: [type=character; target=double].	NA	5
char_with_na	3	NA	NA	NA	Not compatible with requested type: [type=character; target=double].	NA	6
char_single_na	3	NA	NA	NA	Not compatible with requested type: [type=character; target=double].	NA	7
char_all_na	3	NA	NA	NA	Not compatible with requested type: [type=character; target=double].	NA	8
int_empty	3	NA	NA	NA	index error	NA	9
int_single	3	NA	NA	NA	index error	NA	10
int_multiple	3	NA	NA	NA	NA	list	11
int_with_na	3	NA	NA	NA	index error	NA	12
int_single_na	3	NA	NA	NA	index error	NA	13
int_all_na	3	NA	NA	NA	index error	NA	14
dbl_empty	3	NA	NA	NA	index error	NA	15
dbl_single	3	NA	NA	NA	index error	NA	16
dbl_mutliple	3	NA	NA	NA	NA	list	17
dbl_with_na	3	NA	NA	NA	index error	NA	18
dbl_single_na	3	NA	NA	NA	index error	NA	19
dbl_all_na	3	NA	NA	NA	index error	NA	20
fctr_empty	3	NA	NA	NA	index error	NA	21
fctr_single	3	NA	NA	NA	index error	NA	22
fctr_multiple	3	NA	NA	NA	NA	list	23
fctr_with_na	3	NA	NA	NA	index error	NA	24
fctr_missing_levels	3	NA	NA	NA	NA	list	25
fctr_single_na	3	NA	NA	NA	index error	NA	26
fctr_all_na	3	NA	NA	NA	index error	NA	27
lgl_empty	3	NA	NA	NA	index error	NA	28
lgl_single	3	NA	NA	NA	index error	NA	29
lgl_mutliple	3	NA	NA	NA	NA	list	30
lgl_with_na	3	NA	NA	NA	index error	NA	31
lgl_single_na	3	NA	NA	NA	index error	NA	32
lgl_all_na	3	NA	NA	NA	index error	NA	33
date_single	3	NA	NA	NA	index error	NA	34
date_multiple	3	NA	NA	NA	NA	list	35
date_with_na	3	NA	NA	NA	index error	NA	36
date_single_na	3	NA	NA	NA	index error	NA	37
date_all_na	3	NA	NA	NA	index error	NA	38
raw_empty	3	NA	NA	NA	index error	NA	39
raw_char	3	NA	NA	NA	index error	NA	40
raw_na	3	NA	NA	NA	NA	list	41
df_complete	3	NA	NA	NA	Not compatible with requested type: [type=list; target=double].	NA	42
df_empty	3	NA	NA	NA	Not compatible with requested type: [type=list; target=double].	NA	43
df_one_row	3	NA	NA	NA	Not compatible with requested type: [type=list; target=double].	NA	44
df_one_col	3	NA	NA	NA	NA	list	45
df_with_na	3	NA	NA	NA	Not compatible with requested type: [type=list; target=double].	NA	46
null_value	3	NA	NA	NA	Not compatible with requested type: [type=NULL; target=double].	NA	47

As expected, we see that the character and data frame inputs did not go through (Not compatible with requested type: [type=…; target=double]). Inputs with empty or NA values also returned index error. The tests that returned meaningful results include int_multiple, dbl_mutliple, fctr_multiple, fctr_missing_levels, lgl_mutliple, date_multiple, raw_na, df_one_col. It was a bit surprising for me that a factor with missing levels can run here:

fuzz_call(f, x = "fctr_missing_levels")
#> $fun
#> [1] "oneclust"
#> 
#> $args
#> $args$x
#> [1] a b c
#> Levels: a b c d
#> 
#> $args$k
#> [1] 3

It means that a “character factor” like the one below can actually be clustered:

oneclust(factor(letters[1:3], levels = letters[1:4]), k = 3)
#> $cluster
#> [1] 1 2 3
#> 
#> $cut
#> [1] 1 2 3

After checking the textbooks, we know that factors in R are built on top of integer vectors, so they were probably treated like one. A deeper understanding of R’s vector types helps interpret the other results, too.

Note that we only focused on the input types here. For statistical computing, the exceptions can be caused by subtle numerical issues and strange artifacts in the data, such as the distribution shapes and outliers. In those domain-specific cases, creating your own tests or even frameworks would be helpful.