DataExtractor
A node that extracts structured data from unstructured sources such as text or files.
Node Input
- file (file or file[]): File input for data extraction, which can be a single file or a list of files. This field is relevant when input_type is set to “file”.
- text (string or string[]): Text input for data extraction, which can be a single text string or a list of text strings. This field is relevant when input_type is set to “text”.
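The relationship between input_type and the two input fields can be sketched roughly as follows. This is only an illustration, not the node's actual implementation: the select_inputs function and the dictionary-shaped payload are hypothetical, and files are represented by plain path strings for simplicity.

```python
from typing import Any, Dict, List


def select_inputs(input_type: str, payload: Dict[str, Any]) -> List[Any]:
    """Pick the field named by input_type and always return a list.

    Hypothetical helper: the payload layout is an assumption made
    for this sketch, not the node's documented wire format.
    """
    if input_type not in ("file", "text"):
        raise ValueError(f"unsupported input_type: {input_type!r}")
    value = payload.get(input_type)
    if value is None:
        raise ValueError(f"missing {input_type!r} input")
    # A single value and a list of values are both accepted;
    # wrap the single case so later steps can treat them alike.
    return list(value) if isinstance(value, (list, tuple)) else [value]
```

With this sketch, select_inputs("text", {"text": "hello"}) yields a one-element list, while a list of strings is passed through unchanged, which mirrors the single-or-batch behavior described above.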
Node Output
- The output structure is dynamically created based on the schemas defined in the node’s properties. Each schema item becomes an output key, named accordingly. The data types in the output match those specified in the schemas.
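To make the schema-to-output mapping concrete, here is a minimal sketch. The schema item layout (a "name" and a "type" field), the example field names, and the output_shape helper are all assumptions for illustration; the node's real schema format may differ.

```python
from typing import Dict, List

# Hypothetical schema items; the "name" and "type" keys are assumed.
schemas: List[Dict[str, str]] = [
    {"name": "invoice_number", "type": "string"},
    {"name": "total_items", "type": "integer"},
    {"name": "is_paid", "type": "boolean"},
]

# Map the declared type names to Python types for this sketch.
_PY_TYPES = {"string": str, "integer": int, "boolean": bool}


def output_shape(items: List[Dict[str, str]]) -> Dict[str, type]:
    """Return one output key per schema item, with its expected type."""
    return {item["name"]: _PY_TYPES[item["type"]] for item in items}
```

Under these assumptions, three schema items produce an output with exactly three keys, in the same order, each typed as declared.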
Function
The DataExtractor node extracts specific data fields from either text or file inputs according to a defined schema. The schema determines which data elements are extracted and the data type of each. The node can handle both single and multiple inputs, and supports output data types including string, integer, and boolean.
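As a rough illustration of what typed extraction involves, the sketch below converts a raw extracted string into one of the three data types named above. The coerce function and the accepted boolean spellings are assumptions made for this example; how the node itself performs conversion is not documented here.

```python
from typing import Union


def coerce(raw: str, type_name: str) -> Union[str, int, bool]:
    """Convert a raw extracted string to the declared data type.

    Hypothetical helper; the accepted boolean spellings are an
    assumption of this sketch.
    """
    text = raw.strip()
    if type_name == "string":
        return text
    if type_name == "integer":
        return int(text)  # raises ValueError on non-numeric text
    if type_name == "boolean":
        lowered = text.lower()
        if lowered in ("true", "yes", "1"):
            return True
        if lowered in ("false", "no", "0"):
            return False
        raise ValueError(f"cannot read {raw!r} as boolean")
    raise ValueError(f"unsupported type: {type_name!r}")
```

Failing loudly on values that do not fit the declared type is a deliberate choice in this sketch, since a silent fallback would hide extraction errors from downstream workflow steps.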
When to Use It?
The DataExtractor node is particularly useful in scenarios such as:
- Extracting structured data from unstructured text or file inputs
- Pulling keywords, summaries, or specific data fields from documents
- Processing data for further use in automated workflows
- Handling single or batch data extraction tasks
This node works best when the schemas are well defined, as each schema item specifies an output key and data type, improving the accuracy and relevance of the extracted data.