CSV

The csv processor parses comma-separated values (CSV) data contained in a field and stores each value in its own field in the document. The processor ignores empty fields. The following is the syntax for the csv processor:

{
  "csv": {
    "field": "field_name",
    "target_fields": ["field1", "field2", ...]
  }
}

Configuration parameters

The following table lists the required and optional parameters for the csv processor.

Parameter	Required	Description
`field`	Required	The name of the field that contains the data to be converted. Supports template snippets.
`target_fields`	Required	The names of the fields in which to store the parsed values, listed in the order in which the values appear in the CSV data.
`description`	Optional	A brief description of the processor.
`empty_value`	Optional	The value to store in a target field when the corresponding CSV field is empty. If not specified, empty fields are skipped.
`if`	Optional	A condition for running this processor.
`ignore_failure`	Optional	If set to `true`, failures are ignored. Default is `false`.
`ignore_missing`	Optional	If set to `true`, the processor will not fail if the field does not exist. Default is `false`.
`on_failure`	Optional	A list of processors to run if the processor fails.
`quote`	Optional	The character used to quote fields in the CSV data. Default is `"`.
`separator`	Optional	The delimiter used to separate the fields in the CSV data. Default is `,`.
`tag`	Optional	An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type.
`trim`	Optional	If set to `true`, the processor trims white space from the beginning and end of the text. Default is `false`.
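
As an illustration of how the optional parameters can be combined, the following sketch configures the processor for pipe-delimited input in which values may be wrapped in single quotation marks, padded with white space, or left empty. The field name `log_line`, the target field names, the tag, and the `N/A` placeholder are hypothetical, and `empty_value` is assumed here to supply the value written for empty fields:

```json
{
  "csv": {
    "description": "Illustrative only: field names and values are hypothetical",
    "field": "log_line",
    "target_fields": ["host", "status", "bytes"],
    "separator": "|",
    "quote": "'",
    "trim": true,
    "empty_value": "N/A",
    "ignore_missing": true,
    "tag": "parse-log-line"
  }
}
```

Setting `ignore_missing` explicitly, rather than relying on the default, keeps the behavior for documents without the source field visible in the pipeline definition.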

Using the processor

Follow these steps to use the processor in a pipeline.

Step 1: Create a pipeline.

The following query creates a pipeline, named csv-processor, that splits resource_usage into three new fields named cpu_usage, memory_usage, and disk_usage:

PUT _ingest/pipeline/csv-processor
{
  "description": "Split resource usage into individual fields",
  "processors": [
    {
      "csv": {
        "field": "resource_usage",
        "target_fields": ["cpu_usage", "memory_usage", "disk_usage"],
        "separator": ","
      }
    }
  ]
}
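
To check that the pipeline was stored as intended, one option is to read it back by name using the get pipeline API. The response should echo the description and the csv processor definition submitted in the preceding request:

```json
GET _ingest/pipeline/csv-processor
```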

Step 2 (Optional): Test the pipeline.

It is recommended that you test your pipeline before you ingest documents.

To test the pipeline, run the following query:

POST _ingest/pipeline/csv-processor/_simulate
{
  "docs": [
    {
      "_index": "testindex1",
      "_id": "1",
      "_source": {
        "resource_usage": "25,4096,10"
      }
    }
  ]
}

Response

The following example response confirms that the pipeline is working as expected:

{
  "docs": [
    {
      "doc": {
        "_index": "testindex1",
        "_id": "1",
        "_source": {
          "memory_usage": "4096",
          "disk_usage": "10",
          "resource_usage": "25,4096,10",
          "cpu_usage": "25"
        },
        "_ingest": {
          "timestamp": "2023-08-22T16:40:45.024796379Z"
        }
      }
    }
  ]
}
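
Because the processor ignores empty fields, it can also be useful to simulate a document in which one value is missing. The following sketch runs the same pipeline against a hypothetical input of `25,,10`. Based on the behavior described earlier, `memory_usage` would be expected to be absent from the result unless `empty_value` is configured:

```json
POST _ingest/pipeline/csv-processor/_simulate
{
  "docs": [
    {
      "_index": "testindex1",
      "_id": "2",
      "_source": {
        "resource_usage": "25,,10"
      }
    }
  ]
}
```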

Step 3: Ingest a document.

The following query ingests a document into an index named testindex1:

PUT testindex1/_doc/1?pipeline=csv-processor
{
  "resource_usage": "25,4096,10"
}

Step 4 (Optional): Retrieve the document.

To retrieve the document, run the following query:

GET testindex1/_doc/1
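
If the pipeline ran during ingestion, the returned document should contain the original field together with the three parsed fields. The following is an abridged sketch of what the response might look like, with metadata such as version and sequence numbers omitted and the order of fields in `_source` not guaranteed:

```json
{
  "_index": "testindex1",
  "_id": "1",
  "found": true,
  "_source": {
    "resource_usage": "25,4096,10",
    "cpu_usage": "25",
    "memory_usage": "4096",
    "disk_usage": "10"
  }
}
```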
