
Stress testing

Stress testing is a technique for verifying the correctness of a solution by running it on many randomly generated inputs and checking whether it behaves as expected.

Although the technique is often used to gain confidence in (or disprove) the correctness of a solution, it is also frequently used to find tests that break a solution already known to be incorrect.

Thus, it can be used both as a problem verification tool and as a testset construction tool. In this section, we'll go through the process of writing and running a stress test, and show how to use it to improve our testset or our confidence in our solutions.
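Outside of rbx, the core idea fits in a few lines. The Python sketch below compares a deliberately buggy maximum-subarray solution against a brute force on small random inputs and reports the first input on which they disagree. The problem, both solutions and the input distribution are made up for illustration; rbx automates this same loop using your generators, solutions and checkers.

```python
import random
from typing import Optional


def brute_force(a: list[int]) -> int:
    # Trusted reference: try every non-empty subarray (slow but obviously correct).
    return max(sum(a[i:j]) for i in range(len(a)) for j in range(i + 1, len(a) + 1))


def buggy_kadane(a: list[int]) -> int:
    # Candidate with a deliberate bug: starting from 0 means it never reports
    # a negative answer, so it is wrong whenever every element is negative.
    best = cur = 0
    for x in a:
        cur = max(0, cur + x)
        best = max(best, cur)
    return best


def random_input(rng: random.Random) -> list[int]:
    # Small inputs keep any counterexample short and easy to debug.
    return [rng.randint(-5, 5) for _ in range(rng.randint(1, 5))]


def stress(iterations: int = 1000, seed: int = 0) -> Optional[list[int]]:
    rng = random.Random(seed)
    for _ in range(iterations):
        a = random_input(rng)
        if brute_force(a) != buggy_kadane(a):
            return a  # first input on which the two solutions disagree
    return None  # no mismatch found: this raises confidence but proves nothing


if __name__ == "__main__":
    print(stress())
```

Keeping inputs small is a deliberate choice: when a mismatch is found, a short counterexample is much easier to debug than a large one.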

Defining a stress test

If you haven't read the Generators section yet, please do so before continuing, as generators are an essential part of the stress testing framework.

To run a stress test, we must define two expressions:

  • A generator expression: a string in a special format that describes how to generate an unbounded sequence of random generator calls;
  • A finder expression: a string in a special format that describes a condition for a testcase to be considered a match for the stress test.

Generator expression

The generator expression is a special variation of a generator call. In fact, a generator call is itself a valid generator expression. Below are a few examples of valid generator expressions for a hypothetical generator named gen that outputs a random integer between 1 and N, where N is the argument passed to it (as in gen N).

# A valid generator expression, but not super useful for a stress test.
# Since generators are deterministic (the same call always produces the same output), every testcase will contain the same number.
gen 100

# The `@` operator is replaced by a random 8-character string when evaluated.
# This will produce a different testcase each time, containing a random integer between 1 and 100.
gen 100 @

# Picks a random N between 1 and 100, then generates a number between 1 and N.
gen [1..100] @

# Generates a number between 1 and N.max, where N.max is a variable defined for the problem.
gen <N.max> @

Thus, a generator expression supports a set of operators and, when evaluated, produces a generator call. This generator call is used to produce a testcase for the stress test.
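The reason gen 100 alone is not useful for stress testing comes from how generators are usually written: testlib-style generators seed their random number generator from all of their command-line arguments. The sketch below mimics that behavior with a hypothetical Python stand-in for gen; it is not how rbx or testlib implement seeding, only an illustration of the consequence.

```python
import random
import sys


def gen(args: list[str]) -> int:
    # Hypothetical stand-in for the `gen` generator: the random stream is seeded
    # from the full argument list, so the same call always yields the same test.
    rng = random.Random(" ".join(args))
    n = int(args[0])
    return rng.randint(1, n)


if __name__ == "__main__":
    print(gen(sys.argv[1:]))
```

Running it twice as gen 100 prints the same number both times, while appending a different random string (which is what @ does) changes the seed and therefore, very likely, the number.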

In the table below, you can see the supported operators and their semantics.

Operator   Description                         Example
@          Random 8-character string           gen 100 @
<var>      Variable defined for the problem    gen <N.max> @
[a..b]     Random integer between a and b      gen [1..100] @
                                               gen [1..<N.max>] @
(a | b)    Random choice between a and b       gen (a | b) @
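To make the table concrete, here is a hypothetical Python function that expands a generator expression into a concrete generator call. It is not rbx's parser: the alphabet used for @, the inclusive bounds of [a..b] and the order in which operators are expanded are all assumptions made for this sketch. Variables are substituted first so that nested forms such as [1..<N.max>] work.

```python
import random
import re
import string

ALPHABET = string.ascii_letters + string.digits  # assumed alphabet for `@`


def evaluate(expr: str, variables: dict[str, int], rng: random.Random) -> str:
    """Expand a generator expression into a concrete generator call (illustrative only)."""
    # <var>: substitute problem variables first, so that [1..<N.max>] works.
    expr = re.sub(r"<([\w.]+)>", lambda m: str(variables[m.group(1)]), expr)
    # [a..b]: random integer in the inclusive range [a, b].
    expr = re.sub(
        r"\[(-?\d+)\.\.(-?\d+)\]",
        lambda m: str(rng.randint(int(m.group(1)), int(m.group(2)))),
        expr,
    )
    # (a | b | ...): pick one of the alternatives uniformly at random.
    expr = re.sub(
        r"\(([^()]*)\)",
        lambda m: rng.choice([alt.strip() for alt in m.group(1).split("|")]),
        expr,
    )
    # @: a fresh random 8-character string for every occurrence.
    expr = re.sub(r"@", lambda m: "".join(rng.choices(ALPHABET, k=8)), expr)
    return expr


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(3):
        print(evaluate("gen [1..<N.max>] (small | large) @", {"N.max": 1000}, rng))
```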

Finder expressions

Finder expressions are domain-specific expressions evaluated by rbx that return a boolean value.

Instead of formally defining the grammar for finder expressions, we list a few examples, with an explanation of what they do. They should give a rough understanding of how this feature works.

# Find a test for which `sols/wa.cpp` returns any verdict considered incorrect.
# Both versions are equivalent, the first one being a shorthand for the second.
sols/wa.cpp
[sols/wa.cpp] ~ INCORRECT

# Find a test that fails one incorrect solution and a TLE solution at the same time.
[sols/wa.cpp] ~ INCORRECT && [sols/tle.cpp] ~ TLE

# Find a test that fails one incorrect solution without making the other solution TLE.
# Both versions are equivalent.
[sols/wa.cpp] ~ INCORRECT && [sols/tle.cpp] !~ TLE
[sols/wa.cpp] ~ INCORRECT && !([sols/tle.cpp] ~ TLE)

# Find a test that fails at least one of the two solutions.
[sols/wa.cpp] ~ INCORRECT || [sols/wa2.cpp] ~ INCORRECT

# Find a test where solutions give different verdicts.
[sols/sol1.cpp] != [sols/sol2.cpp]

# Use the ON syntax to specify a custom checker (instead of the main one).
[sols/wa.cpp ON custom-checker.cpp] ~ INCORRECT

# Use no checker whatsoever. Useful when you don't have a checker yet.
[sols/tle.cpp ON :nil] ~ TLE

# Use a 2-way checker. This checker will only require the input and the
# output generated by the stressed program. In place of the output of the
# main solution, an empty file will be passed.
#
# Useful if you don't have a main solution yet.
[sols/wa.cpp ON 2:my_checker.cpp] ~ INCORRECT

# Special operators:
# Find a test that breaks the main solution (denoted by $).
[$] ~ INCORRECT

# Find a test that breaks the main solution, using the main checker in
# a 2-way fashion.
[$ ON 2:$] ~ INCORRECT
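One way to picture the semantics of ~, !~, && and != is to translate a finder by hand into a boolean function over the verdict of each solution. In the Python sketch below, the Verdict enum and the membership of each category (in particular, which verdicts count as INCORRECT) are assumptions for illustration, not rbx's actual definitions.

```python
from enum import Enum


class Verdict(Enum):
    ACCEPTED = "AC"
    WRONG_ANSWER = "WA"
    RUNTIME_ERROR = "RE"
    TIME_LIMIT_EXCEEDED = "TLE"
    MEMORY_LIMIT_EXCEEDED = "MLE"


# Hypothetical verdict categories used by `~`; rbx's real groupings may differ.
CATEGORIES: dict[str, set[Verdict]] = {
    "INCORRECT": {
        Verdict.WRONG_ANSWER,
        Verdict.RUNTIME_ERROR,
        Verdict.TIME_LIMIT_EXCEEDED,
        Verdict.MEMORY_LIMIT_EXCEEDED,
    },
    "TLE": {Verdict.TIME_LIMIT_EXCEEDED},
    "WA": {Verdict.WRONG_ANSWER},
}


def matches(verdict: Verdict, category: str) -> bool:
    # `[sol] ~ CATEGORY` holds when the solution's verdict belongs to the category.
    return verdict in CATEGORIES[category]


def finder(verdicts: dict[str, Verdict]) -> bool:
    # Hand translation of: [sols/wa.cpp] ~ INCORRECT && [sols/tle.cpp] !~ TLE
    return matches(verdicts["sols/wa.cpp"], "INCORRECT") and not matches(
        verdicts["sols/tle.cpp"], "TLE"
    )


def verdicts_differ(verdicts: dict[str, Verdict]) -> bool:
    # Hand translation of: [sols/sol1.cpp] != [sols/sol2.cpp]
    return verdicts["sols/sol1.cpp"] != verdicts["sols/sol2.cpp"]


if __name__ == "__main__":
    print(finder({"sols/wa.cpp": Verdict.WRONG_ANSWER, "sols/tle.cpp": Verdict.ACCEPTED}))  # True
```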

Running a stress test

rbx provides the rbx stress command for running stress tests. Its syntax is straightforward:

rbx stress -g "<generator-expression>" -f "<finder-expression>"

By default, a stress test runs for at most 10 seconds and stops as soon as the first match is found. You can tune these limits with the --findings / -n and --timeout / -t flags.

# Runs for at most 2 minutes, stopping early once 3 matches are found.
rbx stress -g "<generator-expression>" -f "<finder-expression>" -n 3 -t 120
rbx stress -g "gen 100 @" -f "[sols/main.cpp] ~ INCORRECT" -n 3 -t 120

# Uses the default limits; the finder is shorthand for "[sols/main.cpp] ~ INCORRECT".
rbx stress -g "gen 100 @" -f sols/main.cpp

The command shows a summary of the tests it found and, if there is at least one match, prompts you for a testplan to add the matches to. If you skip this step, you can always copy the generator calls that were found and add them later.
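The interplay of -n and -t can be summarized as a loop that stops at whichever limit is hit first. The Python sketch below models that behavior with arbitrary generate and is_match callables; the function name and signature are hypothetical and do not reflect how rbx is implemented.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_stress(
    generate: Callable[[], T],
    is_match: Callable[[T], bool],
    findings: int = 1,
    timeout: float = 10.0,
) -> list[T]:
    # Defaults mirror the documented CLI defaults: 10 seconds, stop at the first match.
    deadline = time.monotonic() + timeout
    found: list[T] = []
    # Stop at whichever limit is reached first: enough findings, or the time budget.
    while len(found) < findings and time.monotonic() < deadline:
        test = generate()
        if is_match(test):
            found.append(test)
    return found


if __name__ == "__main__":
    rng = random.Random()
    # Loosely analogous to `-n 3 -t 120`: collect up to 3 multiples of 7 within 120 s.
    print(run_stress(lambda: rng.randint(1, 100), lambda x: x % 7 == 0, findings=3, timeout=120))
```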

Saving a stress test

You can save a stress test in your problem.rbx.yml file.

stresses:
  - name: 'my-stress-test'
    generator:
      name: 'gen'
      args: '100 @'
    finder: '[sols/main.cpp] ~ INCORRECT'

You can then run the stress test with:

rbx stress my-stress-test

Fuzzing inputs

You can also use the --fuzz flag to stress test using variations of the generator calls defined in your testset. This is useful to find corner cases around your existing tests.

# Fuzz all testgroups against the main solution
rbx stress --fuzz

# Fuzz specific testgroups (e.g. only 'random' and 'max')
rbx stress --fuzz-on random --fuzz-on max

# Fuzz all testgroups, looking for tests that break a specific solution
rbx stress --fuzz -f sols/some-solution.cpp

When fuzzing, rbx takes the generator calls from the selected groups and appends a random suffix to them. If no finder (-f) is specified, it defaults to checking if the main solution crashes or returns an incorrect verdict.
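Appending a suffix produces new tests because testlib-style generators seed their random stream from all of their arguments, so an extra argument changes the randomness while keeping the original parameters (sizes, modes, and so on). A minimal Python sketch of the idea follows; the suffix length, the alphabet and the choice of adding it as a separate argument are assumptions, not necessarily what rbx does.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits  # assumed suffix alphabet


def fuzz_calls(calls: list[str], rng: random.Random, suffix_len: int = 8) -> list[str]:
    # Hypothetical sketch: append one random alphanumeric argument to each call.
    fuzzed = []
    for call in calls:
        suffix = "".join(rng.choices(ALPHABET, k=suffix_len))
        fuzzed.append(f"{call} {suffix}")
    return fuzzed


if __name__ == "__main__":
    existing = ["gen 100", "gen 1000000 max"]  # made-up calls from a testgroup
    print(fuzz_calls(existing, random.Random()))
```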

Finding slowest tests

You can use the --slowest flag to look for time limit violations, or simply to find the testcases on which your solution runs the slowest. When this flag is enabled, the time limit for the solution is removed, and rbx keeps track of the slowest testcases found so far.

# Find the slowest testcases for the main solution using the given generator
rbx stress -g "gen 100 @" -f sols/main.cpp --slowest

# Find the 5 slowest testcases
rbx stress -g "gen 100 @" -f sols/main.cpp --slowest -n 5
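Keeping track of the k slowest testcases seen so far (as with -n 5 above) is a classic bounded-heap problem: keep a min-heap of size k and replace its smallest entry whenever a slower test arrives. The Python class below is a hypothetical sketch of that bookkeeping, not rbx's implementation; the timings in the usage are made up.

```python
import heapq


class SlowestTracker:
    """Keeps the k slowest (elapsed_time, test) pairs seen so far."""

    def __init__(self, k: int):
        self.k = k
        # Min-heap: the fastest of the currently kept tests sits at index 0.
        self._heap: list[tuple[float, int, str]] = []
        self._counter = 0  # unique tie-breaker, so heap entries never compare test strings

    def add(self, elapsed: float, test: str) -> None:
        self._counter += 1
        item = (elapsed, self._counter, test)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, item)
        elif elapsed > self._heap[0][0]:
            # Evict the fastest kept test in O(log k).
            heapq.heapreplace(self._heap, item)

    def slowest(self) -> list[tuple[float, str]]:
        # Slowest first.
        return [(t, test) for t, _, test in sorted(self._heap, reverse=True)]


if __name__ == "__main__":
    tracker = SlowestTracker(k=2)
    for elapsed, call in [(0.10, "gen 10 a"), (0.50, "gen 10 b"), (0.30, "gen 10 c")]:
        tracker.add(elapsed, call)
    print(tracker.slowest())  # [(0.5, 'gen 10 b'), (0.3, 'gen 10 c')]
```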

Other applications of stress tests

Besides checking solution outcomes, you can be creative and use stress tests to exercise other components of your problem.

For example, you can use them to test your checkers.

# Find a test where the checker returns something other than WA,
# even though the given solution always produces a wrong answer.
[sols/always-wa.cpp ON custom-checker.cpp] != WA

# Compare two checkers: find a test on which they disagree about the same solution's output.
[sols/sol.cpp ON custom-checker.cpp] != [sols/sol.cpp ON brute-force-checker.cpp]

You can even use them to test your validator and your interactor by simply stressing them with many random inputs.