Stress testing
Stress testing is a technique used to verify the correctness of a solution by generating random inputs and checking if the solution behaves as expected.
Although the technique is often employed to build confidence in (or disprove) the correctness of a solution, it's also often used to find tests that break a known incorrect solution.
Thus, it can be used both as a problem verification tool and as a testset construction tool. In this section, we'll go through the process of writing and running a stress test, and show how to use this tool to improve our testset or our confidence in our solutions.
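To make the idea concrete, here is a minimal, tool-independent sketch of a stress loop in Python. It is not how rbx implements stress testing; the toy problem (maximum sum of two elements) and the names `brute_force`, `candidate`, `buggy`, `generate` and `stress` are hypothetical and only illustrate the generate-run-compare cycle.

```python
import random

def brute_force(values):
    """Reference solution: try every pair and keep the largest sum."""
    return max(values[i] + values[j]
               for i in range(len(values))
               for j in range(i + 1, len(values)))

def candidate(values):
    """Solution under test: add the two largest values."""
    ordered = sorted(values, reverse=True)
    return ordered[0] + ordered[1]

def buggy(values):
    """Known incorrect solution: wrongly uses the maximum twice."""
    return 2 * max(values)

def generate(rng):
    """Random small input: 2 to 6 integers in [-10, 10]."""
    return [rng.randint(-10, 10) for _ in range(rng.randint(2, 6))]

def stress(solution, reference, iterations=1000, seed=0):
    """Return the first input where the two disagree, or None if none is found."""
    rng = random.Random(seed)
    for _ in range(iterations):
        values = generate(rng)
        if solution(values) != reference(values):
            return values
    return None

print("candidate:", stress(candidate, brute_force))
print("buggy:", stress(buggy, brute_force))
```

A `None` result only means no counterexample was found within the budget, which raises confidence but proves nothing. A returned input, on the other hand, is a concrete test that could be added to the testset.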
Defining a stress test
If you haven't read the Generators section yet, please do so before continuing, as generators are an essential part of the stress testing framework.
To run a stress test, we must define two expressions:
- A generator expression: a string in a special format that describes how to generate an infinite random sequence of generator calls;
- A finder expression: a string in a special format that describes a condition for a testcase to be considered a match for the stress test.
Generator expression
The generator expression is a special variation of a generator call. In fact, a generator call is a valid generator expression. Below, there are a few examples of valid generator expressions for a hypothetical generator named `gen` that generates a random integer between 1 and `N`, where `N` is a value passed on the generator call `gen N`.
# A valid generator expression, but not super useful for a stress test.
# Since generators are idempotent (the same call always produces the same output),
# the testcase will always contain the same number.
gen 100
# The `@` operator is replaced by a random 8-character string when evaluated.
# This will produce a different testcase each time, containing a random integer between 1 and 100.
gen 100 @
# Generates a number N between 1 and 100, and then generates a number between 1 and N.
gen [1..100] @
# Generates a number between 1 and MAX_N, where MAX_N is a variable defined for the problem.
gen <MAX_N> @
Thus, a generator expression supports a set of operators and, when evaluated, produces a generator call. This generator call is used to produce a testcase for the stress test.
In the table below, you can see the supported operators and their semantics.
Operator | Description | Example |
---|---|---|
`@` | Random 8-char string | `gen 100 @` |
`<VAR>` | Variable defined for the problem | `gen <MAX_N> @` |
`[a..b]` | Random integer between `a` and `b` | `gen [1..100] @` |
( | Random element | |
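As a rough mental model of how an expression becomes a concrete generator call, the Python sketch below expands the three operators shown in the examples above (`@`, `<VAR>` and `[a..b]`). It is an illustration only, not rbx's implementation: the function name `expand`, the regular expressions, and the choice of lowercase letters for the random string are all assumptions.

```python
import random
import re
import string

def expand(expression, variables, rng):
    """Turn a generator expression into a concrete generator call (illustrative only)."""

    def random_int(match):
        # `[a..b]` becomes a random integer in the inclusive range [a, b].
        low, high = int(match.group(1)), int(match.group(2))
        return str(rng.randint(low, high))

    def variable(match):
        # `<VAR>` becomes the value of a variable defined for the problem.
        return str(variables[match.group(1)])

    def random_string(_match):
        # `@` becomes a random 8-character string (alphabet is an assumption).
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(8))

    result = re.sub(r"\[(-?\d+)\.\.(-?\d+)\]", random_int, expression)
    result = re.sub(r"<(\w+)>", variable, result)
    result = re.sub(r"@", random_string, result)
    return result

rng = random.Random(42)
for _ in range(3):
    # Each evaluation yields a different generator call.
    print(expand("gen [1..100] @", {}, rng))
print(expand("gen <MAX_N> @", {"MAX_N": 1000}, rng))
```

Because the random string changes on every evaluation, each resulting call is distinct, which is what makes an otherwise repeatable generator produce a fresh testcase each time.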
Finder expressions
Finder expressions are domain-specific expressions evaluated by rbx that return a boolean value.
Instead of formally defining the grammar for finder expressions, we list a few examples, with an explanation of what they do. They should give a rough understanding of how this feature works.
# Find a test for which `sols/wa.cpp` returns any verdict considered incorrect.
[sols/wa.cpp] ~ INCORRECT
# Find a test that fails one incorrect solution and a TLE solution at the same time.
[sols/wa.cpp] ~ INCORRECT && [sols/tle.cpp] ~ TLE
# Find a test that fails one incorrect solution but does not make the other solution TLE
# at the same time (two equivalent forms).
[sols/wa.cpp] ~ INCORRECT && [sols/tle.cpp] !~ TLE
[sols/wa.cpp] ~ INCORRECT && !([sols/tle.cpp] ~ TLE)
# Find a test that fails one or the other.
[sols/wa.cpp] ~ INCORRECT || [sols/wa2.cpp] ~ INCORRECT
# Find a test where solutions give different verdicts.
[sols/sol1.cpp] != [sols/sol2.cpp]
# Use the ON syntax to specify a custom checker to be used (instead of the main one).
[sols/wa.cpp ON custom-checker.cpp] ~ INCORRECT
# Use no checker whatsoever. Useful when you don't have a checker yet.
[sols/tle.cpp ON :nil] ~ TLE
# Use a 2-way checker. This checker will only require the input and the
# output generated by the stressed program. In place of the output of the
# main solution, an empty file will be passed.
#
# Useful if you don't have a main solution yet.
[sols/wa.cpp ON 2:my_checker.cpp] ~ INCORRECT
# Special operators:
# Find a test that breaks the main solution (here, specified by a $).
[$] ~ INCORRECT
# Find a test that breaks the main solution, using the main checker in
# a 2-way fashion.
[$ ON 2:$] ~ INCORRECT
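One way to read these expressions is as boolean predicates over the verdict each solution receives on a candidate test. The Python sketch below models two of the examples above under loudly stated assumptions: the verdict names `AC` and `RTE`, the contents of the `INCORRECT` group, and every function name are hypothetical, and rbx's real evaluation rules may differ.

```python
# Hypothetical verdict groups; rbx's actual definition of INCORRECT may differ.
VERDICT_GROUPS = {
    "INCORRECT": {"WA", "RTE", "TLE"},
    "TLE": {"TLE"},
    "WA": {"WA"},
}

def matches(verdicts, solution, pattern):
    """Model of `[solution] ~ pattern`."""
    return verdicts[solution] in VERDICT_GROUPS[pattern]

def finder_wa_but_not_tle(verdicts):
    """Model of `[sols/wa.cpp] ~ INCORRECT && [sols/tle.cpp] !~ TLE`."""
    return (matches(verdicts, "sols/wa.cpp", "INCORRECT")
            and not matches(verdicts, "sols/tle.cpp", "TLE"))

def finder_different_verdicts(verdicts):
    """Model of `[sols/sol1.cpp] != [sols/sol2.cpp]`."""
    return verdicts["sols/sol1.cpp"] != verdicts["sols/sol2.cpp"]

# Verdicts observed on one generated test (made-up data for illustration).
observed = {"sols/wa.cpp": "WA", "sols/tle.cpp": "AC"}
print(finder_wa_but_not_tle(observed))
```

In this reading, a test is a match exactly when the predicate evaluates to true for the verdicts observed on that test.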
Running a stress test
rbx exposes an `rbx stress` command that can be used to run a stress test. The syntax is pretty straightforward.
By default, the stress test will be run for 10 seconds and will stop as soon as a match is found. You can tune these values with the `--findings` / `-n` and the `--timeout` / `-t` flags.
# Runs for 2 minutes or stops after finding 3 matches.
rbx stress -g "<generator-expression>" -f "<finder-expression>" -n 3 -t 120
rbx stress -g "gen 100 @" -f "[sols/main.cpp] ~ INCORRECT" -n 3 -t 120
The command will show a summary of the tests that were found and, if there's at least one match, it will prompt you for a testplan to add them to. If you skip this step, you can always copy the generator calls that were found and add them later.
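If you launch stress tests from a script, it can help to assemble the command as an argument list so the expressions do not need manual shell quoting. The sketch below is a hypothetical helper (`build_stress_command` is not part of rbx) that uses only the flags shown above. It builds and prints the command without running it.

```python
import shlex

def build_stress_command(generator_expr, finder_expr, findings=None, timeout=None):
    """Build the argv list for `rbx stress` from the flags described above."""
    command = ["rbx", "stress", "-g", generator_expr, "-f", finder_expr]
    if findings is not None:
        command += ["-n", str(findings)]  # stop after this many matches
    if timeout is not None:
        command += ["-t", str(timeout)]   # time budget in seconds
    return command

argv = build_stress_command(
    "gen 100 @", "[sols/main.cpp] ~ INCORRECT", findings=3, timeout=120
)
# shlex.join renders the list as a single, safely quoted shell command line.
print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv)` would execute it, assuming `rbx` is installed and the script runs from the problem directory.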
Saving a stress test
You can save a stress test in your `problem.rbx.yml` file.
stresses:
  - name: 'my-stress-test'
    generator:
      name: 'gen'
      args: '100 @'
    finder: '[sols/main.cpp] ~ INCORRECT'
You can then run the stress test with:
Other applications of stress tests
Besides using stress tests to check solution outcomes, you can be creative and use them to test other components of your problem.
For example, you can use it to test your checkers.
# Find a test where the checker returns something different than WA,
# even though the given solution always produces a wrong answer.
[sols/always-wa.cpp ON custom-checker.cpp] != WA
# Compare two checkers to see whether they disagree (a sign that one of them is misbehaving).
[sols/sol.cpp ON custom-checker.cpp] != [sols/sol.cpp ON brute-force-checker.cpp]
Or you can even use it to test your validator and your interactor by simply stressing them to the limit.