Concepts
Before using OpenSearch Benchmark, familiarize yourself with the following concepts.
Core concepts and definitions
- Workload: The description of one or more benchmarking scenarios that use a specific document corpus to perform a benchmark against your cluster. The workload includes any indexes, data files, and operations invoked when it runs. You can list the available workloads by using `opensearch-benchmark list workloads` or view any included workloads in the OpenSearch Benchmark Workloads repository. For more information about the elements of a workload, see Anatomy of a workload. For information about building a custom workload, see Creating custom workloads.
- Pipeline: A series of steps occurring before and after a workload is run that determines benchmark results. OpenSearch Benchmark supports three pipelines:
  - `from-sources`: Builds and provisions OpenSearch, runs a benchmark, and then publishes the results.
  - `from-distribution`: Downloads an OpenSearch distribution, provisions it, runs a benchmark, and then publishes the results.
  - `benchmark-only`: The default pipeline. Assumes an already running OpenSearch instance, runs a benchmark on that instance, and then publishes the results.
- Test: A single invocation of the OpenSearch Benchmark binary.
A workload is a specification of one or more benchmarking scenarios. A workload typically includes the following:
- One or more data streams that are ingested into indexes.
- A set of queries and operations that are invoked as part of the benchmark.
Anatomy of a workload
The following example workload shows all of the essential elements needed to create a `workload.json` file. You can run this workload in your own benchmark configuration to understand how all of the elements work together:
```json
{
  "description": "Tutorial benchmark for OpenSearch Benchmark",
  "indices": [
    {
      "name": "movies",
      "body": "index.json"
    }
  ],
  "corpora": [
    {
      "name": "movies",
      "documents": [
        {
          "source-file": "movies-documents.json",
          "document-count": 11658903,
          "uncompressed-bytes": 1544799789
        }
      ]
    }
  ],
  "schedule": [
    {
      "operation": {
        "operation-type": "create-index"
      }
    },
    {
      "operation": {
        "operation-type": "cluster-health",
        "request-params": {
          "wait_for_status": "green"
        },
        "retry-until-success": true
      }
    },
    {
      "operation": {
        "operation-type": "bulk",
        "bulk-size": 5000
      },
      "warmup-time-period": 120,
      "clients": 8
    },
    {
      "operation": {
        "name": "query-match-all",
        "operation-type": "search",
        "body": {
          "query": {
            "match_all": {}
          }
        }
      },
      "iterations": 1000,
      "target-throughput": 100
    }
  ]
}
```

Replace the `document-count` and `uncompressed-bytes` values with the numbers for your own source file, which you can fetch from the command line.
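The `document-count` and `uncompressed-bytes` values in the example must describe the actual source file. As a minimal Python sketch, assuming the source file holds one JSON document per line, the hypothetical `corpus_metadata` helper below (not part of OpenSearch Benchmark) computes both numbers. For such a file they roughly match what `wc -l` and `ls -l` report.

```python
import json
import os
import tempfile


def corpus_metadata(path: str) -> dict:
    """Compute document-count and uncompressed-bytes for a corpus file.

    Hypothetical helper. Assumes one JSON document per line; blank lines
    are not counted as documents.
    """
    count = 0
    with open(path, "rb") as f:
        for line in f:
            if line.strip():
                count += 1
    return {
        "source-file": os.path.basename(path),
        "document-count": count,
        "uncompressed-bytes": os.path.getsize(path),
    }


if __name__ == "__main__":
    # Build a tiny stand-in for movies-documents.json so the sketch runs on its own.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "movies-documents.json")
        with open(path, "w", encoding="utf-8") as f:
            for doc in ({"title": "Movie A", "year": 1999}, {"title": "Movie B", "year": 2004}):
                f.write(json.dumps(doc) + "\n")
        print(json.dumps(corpus_metadata(path), indent=2))
```

The printed dictionary can be copied into the corresponding entry of the `documents` list.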
A workload usually includes the following elements:
- `indices`: Defines the relevant indexes and index templates used for the workload.
- `corpora`: Defines all document corpora used for the workload.
- `schedule`: Defines, inline, the operations and the order in which they run. Alternatively, you can use `operations` to group operations and the `test_procedures` parameter to specify the order of operations.
- `operations`: Optional. Describes which operations are available for the workload and how they are parameterized.
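To illustrate the difference between an inline `schedule` and the `operations` plus `test_procedures` layout, the following Python sketch rewrites one into the other. It is a hypothetical helper, not an OpenSearch Benchmark API. It assumes that a test procedure's schedule entries refer to named operations by their `name`, that task-level settings such as `clients` stay on the schedule entry, and that a `"default": true` flag marks the procedure to run; the procedure name `default-procedure` is made up. Check the workload reference for your version before relying on this shape.

```python
import json
from typing import Any


def split_schedule(workload: dict[str, Any], procedure: str = "default-procedure") -> dict[str, Any]:
    """Rewrite an inline "schedule" as named "operations" plus "test_procedures".

    Hypothetical helper. Operations without a "name" are named after their
    operation-type, so this assumes no two unnamed operations share a type.
    """
    operations, schedule = [], []
    for task in workload["schedule"]:
        op = dict(task["operation"])  # copy so the input is left untouched
        op.setdefault("name", op["operation-type"])
        operations.append(op)
        # Task-level settings (clients, iterations, ...) stay in the schedule;
        # the schedule entry refers to the operation by name.
        entry = {"operation": op["name"]}
        entry.update({k: v for k, v in task.items() if k != "operation"})
        schedule.append(entry)
    result = {k: v for k, v in workload.items() if k != "schedule"}
    result["operations"] = operations
    result["test_procedures"] = [{"name": procedure, "default": True, "schedule": schedule}]
    return result


if __name__ == "__main__":
    inline = {
        "schedule": [
            {"operation": {"operation-type": "create-index"}},
            {"operation": {"operation-type": "bulk", "bulk-size": 5000},
             "warmup-time-period": 120, "clients": 8},
        ]
    }
    print(json.dumps(split_schedule(inline), indent=2))
```

Keeping operation definitions separate from their scheduling lets several test procedures reuse the same operations in different orders.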
Indices
To create an index, specify its `name`. To add definitions to your index, use the `body` option and point it to the JSON file containing the index definitions. For more information, see indices.
Corpora
The `corpora` element requires the name of the document corpus, for example, `movies`, and a list of parameters that define the document corpus. This list includes the following parameters:
- `source-file`: The file name that contains the workload's corresponding documents. When using OpenSearch Benchmark locally, documents are contained in a JSON file. When providing a `base_url`, use a compressed file format: `.zip`, `.bz2`, `.gz`, `.tar`, `.tar.gz`, `.tgz`, or `.tar.bz2`. The compressed file must have one JSON file containing the name.
- `document-count`: The number of documents in the `source-file`. OpenSearch Benchmark uses this number to determine which part of the document corpus each client indexes: with N clients, each client receives one Nth of the corpus. When using a source that contains documents with a parent-child relationship, specify the number of parent documents.
- `uncompressed-bytes`: The size, in bytes, of the source file after decompression, indicating how much disk space the decompressed source file needs.
- `compressed-bytes`: The size, in bytes, of the source file before decompression. This helps you assess how much data must be downloaded before the cluster can ingest the documents.
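The way `document-count` divides a corpus among clients can be illustrated with a short Python sketch. The hypothetical `client_ranges` helper below splits document offsets into N contiguous ranges of nearly equal size; it shows the "one Nth per client" idea only and is not OpenSearch Benchmark's internal partitioning code.

```python
def client_ranges(document_count: int, num_clients: int) -> list[tuple[int, int]]:
    """Split document offsets [0, document_count) into num_clients contiguous ranges.

    Illustration only: each half-open range (start, end) covers roughly one
    Nth of the corpus, and range sizes differ by at most one document.
    """
    if num_clients < 1:
        raise ValueError("num_clients must be at least 1")
    return [
        (i * document_count // num_clients, (i + 1) * document_count // num_clients)
        for i in range(num_clients)
    ]


if __name__ == "__main__":
    # The movies corpus from the example workload, indexed by 8 bulk clients.
    for client, (start, end) in enumerate(client_ranges(11658903, 8)):
        print(f"client {client}: documents {start}..{end - 1} ({end - start} docs)")
```

Integer division keeps every document assigned to exactly one client, even when the count is not a multiple of the number of clients; this is also why an accurate `document-count` matters.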
Operations
The `operations` element lists the OpenSearch API operations performed by the workload. For example, you can set an operation to `create-index`, which creates an index in the test cluster to which OpenSearch Benchmark can write documents. Operations are usually listed inside the `schedule`.
Schedule
The `schedule` element contains a list of actions and operations that are run by the workload. Operations run according to the order in which they appear in the `schedule`. The following example illustrates a `schedule` with multiple operations, each defined by its `operation-type`:
```json
{
  "schedule": [
    {
      "operation": {
        "operation-type": "create-index"
      }
    },
    {
      "operation": {
        "operation-type": "cluster-health",
        "request-params": {
          "wait_for_status": "green"
        },
        "retry-until-success": true
      }
    },
    {
      "operation": {
        "operation-type": "bulk",
        "bulk-size": 5000
      },
      "warmup-time-period": 120,
      "clients": 8
    },
    {
      "operation": {
        "name": "query-match-all",
        "operation-type": "search",
        "body": {
          "query": {
            "match_all": {}
          }
        }
      },
      "iterations": 1000,
      "target-throughput": 100
    }
  ]
}
```
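Because tasks run in list order and task-level fields such as `clients` apply only to the task that sets them, the order and concurrency of this schedule can be read off mechanically. The following Python sketch does that with a hypothetical `describe_schedule` helper. It assumes that a task without a `clients` field runs with a single client.

```python
from typing import Any

# The schedule from the example above, as Python data.
SCHEDULE: list[dict[str, Any]] = [
    {"operation": {"operation-type": "create-index"}},
    {"operation": {"operation-type": "cluster-health",
                   "request-params": {"wait_for_status": "green"},
                   "retry-until-success": True}},
    {"operation": {"operation-type": "bulk", "bulk-size": 5000},
     "warmup-time-period": 120, "clients": 8},
    {"operation": {"name": "query-match-all", "operation-type": "search",
                   "body": {"query": {"match_all": {}}}},
     "iterations": 1000, "target-throughput": 100},
]


def describe_schedule(schedule: list[dict[str, Any]]) -> list[str]:
    """List tasks in execution order with the number of clients each one uses."""
    lines = []
    for step, task in enumerate(schedule, start=1):
        op = task["operation"]
        label = op["operation-type"]
        if "name" in op:
            label += f" ({op['name']})"
        # Assumption: a task without "clients" runs with a single client.
        clients = task.get("clients", 1)
        lines.append(f"{step}. {label}, clients={clients}")
    return lines


if __name__ == "__main__":
    for line in describe_schedule(SCHEDULE):
        print(line)
```

Running it prints one line per task, showing that only the `bulk` task uses 8 clients.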
According to this schedule, the actions will run in the following order:
- The `create-index` operation creates an index. The index remains empty until the `bulk` operation adds documents with benchmarked data.
- The `cluster-health` operation assesses the health of the cluster before running the workload. In this example, the workload waits until the status of the cluster's health is `green`; because `retry-until-success` is `true`, the request is repeated until it succeeds.
- The `bulk` operation runs the `bulk` API, sending requests that each contain `5000` documents (the `bulk-size`).
- The `warmup-time-period` of the `bulk` task is `120` seconds. Response data captured during this warmup period is not included in the measured results.
- The `clients` field defines the number of clients that run a task concurrently. In this example, 8 clients run the `bulk` task. The `search` task does not set `clients`, so it runs with the default of a single client.
- The `search` operation runs a `match_all` query to match all documents after they have been indexed by the `bulk` API.
- The `iterations` field indicates the number of times each client runs the `search` operation. The report generated by the benchmark automatically adjusts the percentile numbers based on this number. To generate a precise percentile, the benchmark needs to run at least 1,000 iterations.
- Lastly, the `target-throughput` field defines the number of requests per second that all clients of the task perform together, which, when set, can help reduce the latency of the benchmark. The target is divided evenly among the clients: for example, a `target-throughput` of 100 requests per second divided among 8 clients means that each client issues 12.5 requests per second.
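The pacing implied by `target-throughput` and `iterations` can be worked out directly. The following Python sketch uses a hypothetical `task_timing` helper that treats `target-throughput` as the total rate across a task's clients and `iterations` as a per-client count. It estimates each client's request rate, the gap between its requests, and the approximate task duration, assuming the cluster keeps up with the target and ignoring warmup.

```python
def task_timing(target_throughput: float, clients: int, iterations: int) -> dict:
    """Rough pacing numbers for one task, assuming the cluster keeps up.

    target_throughput is the total requests per second across all clients;
    iterations is the number of requests each client sends.
    """
    if clients < 1 or target_throughput <= 0:
        raise ValueError("clients must be >= 1 and target_throughput > 0")
    per_client_rps = target_throughput / clients
    return {
        "per_client_rps": per_client_rps,
        "interval_s": 1 / per_client_rps,  # time between one client's requests
        "total_requests": iterations * clients,
        "approx_duration_s": iterations / per_client_rps,
    }


if __name__ == "__main__":
    # The search task with a single client (assumed default).
    print(task_timing(100, 1, 1000))
    # The 8-client case from the target-throughput example.
    print(task_timing(100, 8, 1000))
```

With one client, 1,000 iterations at 100 requests per second take roughly 10 seconds. With 8 clients, each client issues 12.5 requests per second (one every 0.08 seconds), so 1,000 iterations per client take roughly 80 seconds and produce 8,000 requests in total.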