String functions in GoDataFeed rules allow you to manipulate and extract information from text data within your product catalog. They enable you to modify, combine, or split text strings to create the desired output for your feed. By utilizing these functions, you can effectively transform and refine your product data to meet specific platform requirements.
Upper case function
UPPER_CASE
The UPPER_CASE function converts all lowercase letters within a string to their uppercase equivalents.
Return value
A new string containing the uppercase version of the input string.
Example
Input: this is a sample text string
Output: THIS IS A SAMPLE TEXT STRING
Use cases
- Converting text to a standardized uppercase format.
- Creating consistent data for comparison or processing.
- Formatting text for display purposes (e.g., headings, titles).
Limitations
- The function typically only affects lowercase letters and does not modify other characters.
- The specific behavior might vary based on the programming language and its character encoding.
- For extremely long strings, the conversion process could impact performance.
Lower case function
LOWER_CASE
Functionality
The LOWER_CASE function converts all uppercase characters in a given string to lowercase characters. It leaves other characters, such as numbers, punctuation, and special characters, unchanged.
Return value
The function returns a new string containing the lowercase version of the input string. The original string remains unmodified.
Example
Input: "THIS IS A MIXED CASE STRING."
Output: "this is a mixed case string."
Use cases
- Normalizing text: Converting text to lowercase can help standardize data for comparison and analysis.
- Case-insensitive searches: When performing searches that should ignore case differences, converting both the search term and the data to lowercase can improve efficiency.
- Database queries: Many database systems offer case-insensitive search options, but converting text to lowercase before querying can sometimes improve performance.
- Text processing: Lowercasing text can be a preliminary step in various text processing tasks, such as stemming or tokenization.
Limitations
- The LOWER_CASE function only affects uppercase characters. Other characters remain unchanged.
- The function may not handle all character encodings consistently, potentially leading to unexpected results for certain characters.
- Converting text to lowercase can impact readability in some cases, such as proper nouns or acronyms.
Capital case function
CAPITAL_CASE
Functionality
The CAPITAL_CASE function capitalizes the first letter of each word in a given string. It leaves other characters, such as numbers, punctuation, and special characters, unchanged.
Return value
The function returns a new string with the first letter of each word capitalized. The original string remains unmodified.
Example
Input: "this is a mixed case string"
Output: "This Is A Mixed Case String"
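The word-by-word behavior described above can be sketched in Python. This is a minimal illustration, not GoDataFeed's implementation: `capital_case` is a hypothetical helper, and it assumes that only spaces separate words.

```python
def capital_case(text: str) -> str:
    """Uppercase the first character of each space-separated word."""
    # Splitting on a literal space keeps runs of spaces intact when re-joined.
    return " ".join(word[:1].upper() + word[1:] for word in text.split(" "))


print(capital_case("this is a mixed case string"))  # This Is A Mixed Case String
print(capital_case("well-known brand's t-shirt"))   # Well-known Brand's T-shirt

# Python's built-in str.title() uses a different definition of "word":
# it also treats hyphens and apostrophes as boundaries.
print("well-known brand's t-shirt".title())         # Well-Known Brand'S T-Shirt
```

The last line shows why the definition of a word matters: two reasonable implementations give different results for hyphenated or possessive product names.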
Use cases
- Formatting text: Capitalizing the first letter of each word is commonly used for titles, headings, and proper nouns.
- Standardizing text: Applying CAPITAL_CASE to text can help create a consistent format for data.
- Improving readability: Capitalizing the first letter of each word can make titles and headings easier to scan.
Limitations
- The CAPITAL_CASE function only affects the first letter of each word. Other characters remain unchanged.
- The function's definition of a "word" may vary depending on the implementation. For example, some implementations might consider hyphens or apostrophes as word boundaries.
- The function may not handle all character encodings consistently, potentially leading to unexpected results for certain characters.
- Converting text to CAPITAL_CASE might not be suitable for all text formats, such as code or technical documentation.
Sentence case function
SENTENCE_CASE
Functionality
The SENTENCE_CASE function capitalizes the first letter of the first word in a string and any word that follows a period (.). It leaves other characters, such as numbers, punctuation, and special characters, unchanged.
Return value
The function returns a new string with the first letter of each sentence capitalized. The original string remains unmodified.
Example
Input: "this is a mixed case string. this is another sentence."
Output: "This is a mixed case string. This is another sentence."
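As a rough Python sketch of the rule described above (capitalize the first letter of the string and the first letter after each period), under the assumption that a period must be followed by whitespace to count as a sentence break. `sentence_case` is a hypothetical name, not a GoDataFeed API.

```python
import re


def sentence_case(text: str) -> str:
    """Uppercase the first letter of the string and of each word after a period."""
    # Group 1: start of string (plus any leading spaces) or a period plus whitespace.
    # Group 2: the lowercase letter that should be capitalized.
    pattern = r"(^\s*|\.\s+)([a-z])"
    return re.sub(pattern, lambda m: m.group(1) + m.group(2).upper(), text)


print(sentence_case("this is a mixed case string. this is another sentence."))
# Only periods are treated as sentence endings, so "yes" stays lowercase here.
print(sentence_case("is it new? yes. very new!"))
```

The second call illustrates the period-only rule: text after a question mark or exclamation point is left as it was.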
Use cases
- Formatting text: Sentence case is commonly used for general text, such as paragraphs and body copy.
- Standardizing text: Applying SENTENCE_CASE to text can help create a consistent format for documents.
- Improving readability: Capitalizing the first letter of each sentence can enhance text comprehension.
Limitations
- The SENTENCE_CASE function only affects the first letter of the first word and words following periods. Other characters remain unchanged.
- The function's definition of a "sentence" is based solely on the presence of periods. Other sentence-ending punctuation (e.g., question marks, exclamation points) might not be considered.
- The function may not handle all character encodings consistently, potentially leading to unexpected results for certain characters.
- Converting text to SENTENCE_CASE might not be suitable for all text formats, such as code or technical documentation.
Split function
SPLIT
Functionality
The SPLIT function divides a string into an array of substrings based on a specified delimiter. It searches for occurrences of the delimiter within the string and breaks the string at those points, creating substrings.
Return value
The function returns an array of substrings generated by splitting the input string.
Example
Input: "This,is,a,string,with,commas"
Delimiter: ","
Output: ["This", "is", "a", "string", "with", "commas"]
Use cases
- Parsing data: Breaking down strings into individual components, such as CSV data or delimited text files.
- Tokenization: Separating text into words or tokens for analysis or processing.
- Extracting information: Isolating specific parts of a string based on delimiters.
- Data manipulation: Transforming string data into a structured format for further operations.
Limitations
- The SPLIT function might produce unexpected results if the delimiter is not present in the string or if it appears at the beginning or end of the string.
- The behavior of the SPLIT function can vary depending on the programming language or library used, including how it handles empty substrings or multiple consecutive delimiters.
- The efficiency of the SPLIT function can be impacted by the length of the string and the frequency of the delimiter.
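The edge cases listed above are easiest to see with concrete inputs. The sketch below uses Python's `str.split` to show one possible behavior; the helpers `split` and `split_non_empty` are hypothetical, and GoDataFeed may treat empty pieces differently.

```python
def split(text: str, delimiter: str) -> list:
    """Split on every occurrence of the delimiter, keeping empty pieces."""
    return text.split(delimiter)


def split_non_empty(text: str, delimiter: str) -> list:
    """Same split, but drop empty pieces and surrounding spaces."""
    return [piece.strip() for piece in text.split(delimiter) if piece.strip()]


print(split("This,is,a,string,with,commas", ","))
print(split("no delimiter here", ","))   # ['no delimiter here']
print(split(",red,,blue,", ","))         # ['', 'red', '', 'blue', '']
print(split_non_empty(",red,, blue,", ","))  # ['red', 'blue']
```

Leading, trailing, and consecutive delimiters all produce empty strings in this model, so filtering them out is often a sensible follow-up step.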
Substring function
SUBSTRING
Functionality
The SUBSTRING function extracts a portion of a string based on specified starting and ending positions. It returns a new string containing the characters within the specified range.
Return value
The function returns a new string representing the extracted substring.
Example
Input: "This is a sample string"
Start position: 5
End position: 10
Output: "is a s"
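The example above is consistent with zero-based positions where both the start and the end are included. The Python sketch below encodes that reading as an assumption; `substring` is a hypothetical helper, and the indexing should be confirmed against real feed data.

```python
def substring(text: str, start: int, end: int) -> str:
    """Return the characters from start to end, zero-based and inclusive."""
    # Python slices exclude the end index, so add 1 to include it.
    return text[start:end + 1]


print(repr(substring("This is a sample string", 5, 10)))  # 'is a s'
# An end position past the end of the string is clamped rather than raising.
print(repr(substring("abc", 1, 10)))                      # 'bc'
```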
Use cases
- Extracting information: Retrieving specific parts of a string, such as a username from an email address or a product code from a description.
- Text manipulation: Modifying strings by combining substrings with other text.
- Data validation: Checking if a string contains a specific substring within a certain range.
- Text formatting: Creating formatted output by extracting and combining different parts of a string.
Limitations
- The SUBSTRING function might produce unexpected results if the start or end positions are out of bounds of the string length.
- The behavior of the SUBSTRING function can vary depending on the programming language or library used, including whether the starting position is zero-based or one-based.
- The efficiency of the SUBSTRING function can be impacted by the length of the string and the size of the extracted substring.
Substring left function
SUBSTRING_LEFT
Functionality
The SUBSTRING_LEFT function extracts a specified number of characters from the beginning of a string.
Return value
The function returns a new string containing the extracted leftmost characters.
Example
Input: "This is a sample string"
Length: 5
Output: "This "
Use cases
- Truncating text: Shortening long strings by removing characters from the end.
- Extracting prefixes: Isolating the beginning part of a string, such as a brand abbreviation or a prefix code.
- Data formatting: Creating fixed-length fields by extracting a specific number of characters from the left.
- Text manipulation: Combining substrings with other text to form new strings.
Limitations
- If the specified length is greater than the length of the string, the entire string is returned.
- The SUBSTRING_LEFT function might produce unexpected results if the length is negative.
- The efficiency of the SUBSTRING_LEFT function can be impacted by the length of the string and the number of characters to extract.
Substring right function
SUBSTRING_RIGHT
Functionality
The SUBSTRING_RIGHT function extracts a specified number of characters from the end of a string.
Return value
The function returns a new string containing the extracted rightmost characters.
Example
Input: "This is a sample string"
Length: 6
Output: "string"
Use cases
- Truncating text: Shortening long strings by removing characters from the beginning.
- Extracting suffixes: Isolating the ending part of a string, such as a file extension or a suffix code.
- Data formatting: Creating fixed-length fields by extracting a specific number of characters from the right.
- Text manipulation: Combining substrings with other text to form new strings.
Limitations
- If the specified length is greater than the length of the string, the entire string is returned.
- The SUBSTRING_RIGHT function might produce unexpected results if the length is negative.
- The efficiency of the SUBSTRING_RIGHT function can be impacted by the length of the string and the number of characters to extract.
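SUBSTRING_LEFT and SUBSTRING_RIGHT can be modeled with Python slices, as in the sketch below. The helper names are hypothetical, and returning an empty string for a negative length is an assumption made here for illustration, since the documented behavior for that case is undefined.

```python
def substring_left(text: str, length: int) -> str:
    """Return the first `length` characters (the whole string if it is shorter)."""
    return text[:max(length, 0)]


def substring_right(text: str, length: int) -> str:
    """Return the last `length` characters (the whole string if it is shorter)."""
    # text[-0:] would return the whole string, so handle non-positive lengths first.
    return text[-length:] if length > 0 else ""


sample = "This is a sample string"
print(repr(substring_left(sample, 5)))    # 'This '
print(repr(substring_right(sample, 6)))   # 'string'
print(repr(substring_left("abc", 10)))    # 'abc'
print(repr(substring_right("abc", -1)))   # ''
```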
Count length function
COUNT_LENGTH
Functionality
The COUNT_LENGTH function determines the number of characters in a given string.
Return value
The function returns an integer representing the length of the input string.
Example
Input: "This is a sample string"
Output: 23
Use cases
- Data validation: Ensuring that input strings meet specific length requirements.
- Array or list creation: Allocating memory for character arrays based on string length.
- Text formatting: Adjusting text layout based on string length.
- Performance optimization: Determining the size of data to process.
Limitations
- The COUNT_LENGTH function might produce unexpected results for strings containing special characters or control characters, depending on the character encoding used.
- The efficiency of the COUNT_LENGTH function can be impacted by the length of the string, especially for very long strings.
Count string function
COUNT_STRING
Functionality
The COUNT_STRING function counts the number of occurrences of a specific substring within a given string.
Return value
The function returns an integer representing the count of the substring occurrences.
Example
Input: "This is a test string with the word test repeated."
Substring: "test"
Output: 2
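A short Python sketch of substring counting, useful for reasoning about case sensitivity and overlapping matches. `count_string` and its `ignore_case` parameter are hypothetical; Python's `str.count` is case-sensitive and counts non-overlapping matches, which may or may not match GoDataFeed's behavior.

```python
def count_string(text: str, substring: str, ignore_case: bool = False) -> int:
    """Count non-overlapping occurrences of substring in text."""
    if ignore_case:
        text, substring = text.lower(), substring.lower()
    return text.count(substring)


sample = "This is a test string with the word test repeated."
print(count_string(sample, "test"))                               # 2
print(count_string("Test test TEST", "test"))                     # 1
print(count_string("Test test TEST", "test", ignore_case=True))   # 3
# Overlapping matches are not counted: "aaaa" contains "aa" twice, not three times.
print(count_string("aaaa", "aa"))                                 # 2
```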
Use cases
- Text analysis: Counting the frequency of words or phrases in a text.
- Data validation: Checking if a substring exists within a string a certain number of times.
- Search functionality: Implementing search features that count matches.
- Text processing: Analyzing text patterns and occurrences.
Limitations
- The COUNT_STRING function might be case-sensitive, meaning it differentiates between uppercase and lowercase characters.
- The function might not consider overlapping occurrences of the substring.
- The efficiency of the COUNT_STRING function can be impacted by the length of the input string and the substring.
Format date function
FORMAT_DATE
Functionality
The FORMAT_DATE function converts a date value into a string representation based on a specified format code.
Return value
The function returns a string representing the formatted date.
Format codes
The following format codes can be used within the FORMAT_DATE function:
- FORMAT_DATE(:d): Formats the date in a short format, typically displaying the day, month, and year separated by hyphens.
- FORMAT_DATE(:D): Formats the date in a long format, displaying the day, full month name, and year.
- FORMAT_DATE(:t): Formats the time in a short format, showing hours, minutes, and seconds.
- FORMAT_DATE(:T): Formats the time in a long format, similar to the short format but used in different contexts for clarity.
- FORMAT_DATE(:f): Combines the long date format with the short time format, showing both date and time.
- FORMAT_DATE(:g): Combines the short date format with the short time format, showing both date and time in a concise manner.
- FORMAT_DATE(:M): Formats the date to display only the month and day.
- FORMAT_DATE(:r): Formats the date and time according to the RFC1123 standard, often used in internet protocols.
- FORMAT_DATE(:s): Formats the date and time in a sortable format, following the ISO 8601 standard.
- FORMAT_DATE(:u): Formats the date and time in a universal sortable format, using UTC time.
- FORMAT_DATE(:U): Combines the long date format with the long time format, showing both date and time in Universal Time (UTC).
- FORMAT_DATE(:Y): Formats the date to display only the year and full month name.
Example
| FORMAT_DATE Code | Description | Example Output | Explanation |
|---|---|---|---|
| FORMAT_DATE(:d) | Short date format | 19-03-2021 | Date in a short format, with day, month, and year. |
| FORMAT_DATE(:D) | Long date format | 19 March 2021 | Date with full month name, day, and year. |
| FORMAT_DATE(:t) | Short time format | 06:49:20 | Time in hours, minutes, and seconds (24-hour clock). |
| FORMAT_DATE(:T) | Long time format | 06:49:20 | Similar to :t, typically used in contexts needing separation from the date. |
| FORMAT_DATE(:f) | Full date and short time format | 19 March 2021 06:49:00 | Combines long date with short time format. |
| FORMAT_DATE(:g) | Short date and time format | 19-03-2021 06:49:44 | Combines short date and short time formats. |
| FORMAT_DATE(:M) | Month and day format | March 19 | Displays only the month and the day. |
| FORMAT_DATE(:r) | RFC1123 date format | Fri, 19 Mar 2021 06:49:22 GMT | Date and time in the RFC1123 format, used in HTTP headers. |
| FORMAT_DATE(:s) | Sortable date/time format | 2021-03-19T06:49:11 | Date and time in a sortable format (ISO 8601). |
| FORMAT_DATE(:u) | Universal sortable date/time format | 2021-03-19 06:49:49Z | Similar to :s but includes "Z" to denote UTC time. |
| FORMAT_DATE(:U) | Full date and long time format (UTC) | 19 March 2021 00:18:55 | Long date and long time format in Universal Time (UTC). |
| FORMAT_DATE(:Y) | Year and month format | March, 2021 | Displays only the year and month name. |
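To check what a given code should produce before adding it to a rule, it can help to reproduce the output locally. The Python sketch below maps a few of the codes to `strftime` patterns chosen to match the example outputs in the table. The mapping and the `format_date` helper are assumptions for illustration only, and month names depend on the Python locale (English by default).

```python
from datetime import datetime, timezone

# Hypothetical mapping from some FORMAT_DATE codes to strftime patterns,
# picked to reproduce the table's example outputs.
STRFTIME_PATTERNS = {
    "d": "%d-%m-%Y",            # 19-03-2021
    "D": "%d %B %Y",            # 19 March 2021
    "M": "%B %d",               # March 19
    "s": "%Y-%m-%dT%H:%M:%S",   # 2021-03-19T06:49:11
    "u": "%Y-%m-%d %H:%M:%SZ",  # 2021-03-19 06:49:11Z
    "Y": "%B, %Y",              # March, 2021
}


def format_date(value: datetime, code: str) -> str:
    """Format a datetime with the pattern assumed for the given code."""
    return value.strftime(STRFTIME_PATTERNS[code])


sample = datetime(2021, 3, 19, 6, 49, 11, tzinfo=timezone.utc)
for code in STRFTIME_PATTERNS:
    print(code, "->", format_date(sample, code))
```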
Use cases
- Displaying dates: Formatting dates for presentation to users in various formats (e.g., short date, long date, time).
- Data exchange: Converting dates into specific formats for data transfer or storage.
- Calculations: Performing date-based calculations after formatting dates into a standardized format.
- Data validation: Checking the validity of date formats.
Limitations
- Important: When using this function, note that it rounds decimal places, which might not be suitable for scenarios where precision is critical.
- The available format specifiers might vary depending on the programming language or library used.
- The function might have limitations in handling different date formats and time zones, so check the requirements of the feed destination. For example, Google requires its own date format, and the output may need to be adjusted to produce an accepted value.
- Incorrect format specifiers can lead to unexpected results or errors.
- The performance of the FORMAT_DATE function can be impacted by the complexity of the format string.
Format number function
FORMAT_NUMBER
Functionality
The FORMAT_NUMBER function converts a numeric value into a string representation based on a specified format code.
Return value
The function returns a string representing the formatted number.
Format codes
The following format codes can be used within the FORMAT_NUMBER function:
- :f — Fixed-point notation with a default number of decimal places (usually 6).
- :e — Scientific notation.
- :g — General format, using either fixed-point or scientific notation based on the magnitude of the number.
- :n — Number format with grouping separators (e.g., commas for thousands).
- :00.00 — Fixed-point notation with exactly two decimal places, padded with zeros if necessary.
- :0.000 — Fixed-point notation with exactly three decimal places, padded with zeros if necessary.
- :0,0 — Number format with grouping separators and no decimal places.
- :0.0 — Fixed-point notation with exactly one decimal place, padded with zeros if necessary.
- :0% — Percentage format, multiplying the number by 100 and appending a percent sign.
Example
| Input | FORMAT_NUMBER function | Output |
|---|---|---|
| 12345.6789 | FORMAT_NUMBER(:f) | 12345.678900 |
| 12345.6789 | FORMAT_NUMBER(:e) | 1.234568e+04 |
| 12345.6789 | FORMAT_NUMBER(:g) | 12345.68 |
| 12345.6789 | FORMAT_NUMBER(:n) | 12,345.68 |
| 12345.6789 | FORMAT_NUMBER(:00.00) | 12345.68 |
| 12345.6789 | FORMAT_NUMBER(:0.000) | 12345.679 |
| 12345.6789 | FORMAT_NUMBER(:0,0) | 12,346 |
| 12345.6789 | FORMAT_NUMBER(:0.0) | 12345.7 |
| 12345.6789 | FORMAT_NUMBER(:0%) | 1234567.89% |
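Several of these outputs can be reproduced with Python's format specifiers, which is a convenient way to preview rounding before building a rule. The mapping below is an assumption for illustration (GoDataFeed's codes are not Python format specs), and `:g` and `:0%` are left out because Python's nearest equivalents give different results from the table.

```python
# Hypothetical mapping from FORMAT_NUMBER codes to Python format specs.
PYTHON_SPECS = {
    ":f": "f",         # fixed-point, six decimal places
    ":e": "e",         # scientific notation
    ":n": ",.2f",      # thousands separators, two decimals
    ":00.00": ".2f",   # exactly two decimals
    ":0.000": ".3f",   # exactly three decimals
    ":0,0": ",.0f",    # thousands separators, no decimals
    ":0.0": ".1f",     # exactly one decimal
}


def format_number(value: float, code: str) -> str:
    """Format a number with the Python spec assumed for the given code."""
    return format(value, PYTHON_SPECS[code])


for code in PYTHON_SPECS:
    print(code, "->", format_number(12345.6789, code))
```

Every output is a rounded string, so any arithmetic should happen on the original number before formatting.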
Use cases
- Formatting numbers for display: Presenting numbers in a user-friendly format (e.g., currency, percentages).
- Data visualization: Creating charts and graphs with formatted numbers.
- Data exchange: Converting numbers into specific formats for data transfer or storage.
- Calculations: Performing calculations on formatted numbers (with caution due to potential rounding errors).
Limitations
- Important: When using this function, note that it rounds decimal places, which might not be suitable for scenarios where precision is critical.
- The available format specifiers might vary depending on the programming language or library used.
- The function might have limitations in handling very large or very small numbers.
- Incorrect format specifiers can lead to unexpected results or errors.
- The performance of the FORMAT_NUMBER function can be impacted by the complexity of the format string.
Trim function
TRIM
Functionality
The TRIM function removes all leading and trailing whitespace characters from a string. Whitespace characters typically include spaces, tabs, newlines, and carriage returns.
Return value
The function returns a new string with leading and trailing whitespace characters removed. The original string remains unchanged.
Example
Input: " This is a string with leading and trailing spaces "
Output: "This is a string with leading and trailing spaces"
Use cases
- Data cleaning: Removing unwanted whitespace characters from user input or imported data.
- Text formatting: Ensuring consistent formatting by eliminating extra spaces.
- Data comparison: Comparing strings accurately by removing whitespace differences.
- String manipulation: Preparing strings for further processing, such as concatenation or splitting.
Limitations
- The TRIM function only removes leading and trailing whitespace characters; it does not affect whitespace within the string.
- The specific whitespace characters considered for removal might vary depending on the programming language or library implementation.
- The TRIM function might not be sufficient for complex whitespace handling scenarios, such as removing specific whitespace characters or handling different types of whitespace.
Trim left function
TRIM_LEFT
Functionality
The TRIM_LEFT function removes leading whitespace characters from a string. Whitespace characters typically include spaces, tabs, newlines, and carriage returns.
Return value
The function returns a new string with leading whitespace characters removed. The original string remains unchanged.
Example
Input: " This is a string with leading spaces"
Output: "This is a string with leading spaces"
Use cases
- Data cleaning: Removing unwanted leading whitespace characters from user input or imported data.
- Text formatting: Ensuring consistent formatting by eliminating leading spaces.
- Data comparison: Comparing strings accurately by removing leading whitespace differences.
- String manipulation: Preparing strings for further processing, such as concatenation or splitting.
Limitations
- The TRIM_LEFT function only removes leading whitespace characters; it does not affect whitespace within the string or trailing whitespace.
- The specific whitespace characters considered for removal might vary depending on the programming language or library implementation.
- The TRIM_LEFT function might not be sufficient for complex whitespace handling scenarios, such as removing specific whitespace characters or handling different types of whitespace.
Trim right function
TRIM_RIGHT
Functionality
The TRIM_RIGHT function removes trailing whitespace characters from a string. Whitespace characters typically include spaces, tabs, newlines, and carriage returns.
Return value
The function returns a new string with trailing whitespace characters removed. The original string remains unchanged.
Example
Input: "This is a string with trailing spaces "
Output: "This is a string with trailing spaces"
Use cases
- Data cleaning: Removing unwanted trailing whitespace characters from user input or imported data.
- Text formatting: Ensuring consistent formatting by eliminating trailing spaces.
- Data comparison: Comparing strings accurately by removing trailing whitespace differences.
- String manipulation: Preparing strings for further processing, such as concatenation or splitting.
Limitations
- The TRIM_RIGHT function only removes trailing whitespace characters; it does not affect whitespace within the string or leading whitespace.
- The specific whitespace characters considered for removal might vary depending on the programming language or library implementation.
- The TRIM_RIGHT function might not be sufficient for complex whitespace handling scenarios, such as removing specific whitespace characters or handling different types of whitespace.
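The three trim functions differ only in which side of the string they clean. The Python sketch below models them with `strip`, `lstrip`, and `rstrip`; the helper names are hypothetical, and `collapse_spaces` is an extra illustration of handling internal whitespace, which none of the trim functions touch.

```python
def trim(text: str) -> str:
    """Remove leading and trailing whitespace."""
    return text.strip()


def trim_left(text: str) -> str:
    """Remove leading whitespace only."""
    return text.lstrip()


def trim_right(text: str) -> str:
    """Remove trailing whitespace only."""
    return text.rstrip()


def collapse_spaces(text: str) -> str:
    """Trim both ends and reduce internal whitespace runs to single spaces."""
    return " ".join(text.split())


raw = "  \tBlue   Widget \n"
print(repr(trim(raw)))             # 'Blue   Widget'
print(repr(trim_left(raw)))        # 'Blue   Widget \n'
print(repr(trim_right(raw)))       # '  \tBlue   Widget'
print(repr(collapse_spaces(raw)))  # 'Blue Widget'
```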
Encode SHA256 function
ENCODE_SHA256
Functionality
The ENCODE_SHA256 function applies the SHA-256 cryptographic hash algorithm to an input string and returns a fixed-length hexadecimal string representing the hash value.
Return value
The function returns a hexadecimal string with a length of 64 characters.
Example
Input string: "" (an empty string)
Output: "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
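A digest can be verified independently with Python's standard `hashlib` module. In this sketch, `encode_sha256` is a hypothetical helper, and it assumes the input is encoded as UTF-8 and the result is written as lowercase hexadecimal.

```python
import hashlib


def encode_sha256(text: str) -> str:
    """Return the SHA-256 digest of text as 64 lowercase hex characters."""
    # Assumption: the string is hashed as UTF-8 bytes.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


digest = encode_sha256("abc")
print(digest)       # ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
print(len(digest))  # 64

# Any difference in the input, even trailing whitespace, changes the digest.
print(encode_sha256("abc") == encode_sha256("abc "))  # False
```

Because small input differences produce unrelated digests, values are usually trimmed and normalized before hashing.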
Use cases
- Data integrity verification: Ensuring data has not been tampered with by comparing the calculated hash with a stored hash.
- Password storage: Storing hashed passwords instead of plain text to protect user credentials.
- Digital signatures: Creating digital signatures to verify the authenticity and integrity of data.
- Blockchain technology: Generating unique identifiers for transactions and blocks.
Limitations
- The ENCODE_SHA256 function is a one-way function, meaning it is computationally infeasible to reverse the hash and obtain the original string.
- While SHA-256 is considered a strong cryptographic hash function, it is susceptible to brute-force attacks for short input strings.
- The function might have performance implications for large input strings due to the computational complexity of the SHA-256 algorithm.
Encode base64 function
ENCODE_BASE64
Functionality
The ENCODE_BASE64 function converts binary data into a text string using the Base64 encoding scheme. This encoding process represents binary data in an ASCII string format.
Return value
The function returns a text string representing the Base64-encoded version of the input data.
Example
Input string: "This is a sample string for Base64 encoding"
Output: "VGhpcyBpcyBhIHNhbXBsZSBzdHJpbmcgZm9yIEJhc2U2NCBlbmNvZGluZw=="
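Base64 output can be checked with Python's standard `base64` module. The helpers below are hypothetical and assume the text is treated as UTF-8; they also show the round trip through decoding.

```python
import base64


def encode_base64(text: str) -> str:
    """Encode text (as UTF-8 bytes) to a Base64 string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_base64(encoded: str) -> str:
    """Decode a Base64 string back to text; invalid input raises an error."""
    return base64.b64decode(encoded, validate=True).decode("utf-8")


print(encode_base64("hello"))     # aGVsbG8=
print(decode_base64("aGVsbG8="))  # hello

sample = "This is a sample string for Base64 encoding"
print(decode_base64(encode_base64(sample)) == sample)  # True
```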
Use cases
- Data transfer: Transmitting binary data over channels that only support text (e.g., email).
- Data storage: Storing binary data in text-based formats (e.g., JSON, XML).
- Image encoding: Encoding image data for embedding in HTML or other text-based formats.
- Authentication: Encoding credentials for secure transmission.
Limitations
- Base64 encoding increases the data size by approximately 33% compared to the original binary data.
- The resulting Base64 string can contain +, /, and = characters, which have special meaning in URLs and may need further encoding.
- Base64 encoding is not a form of encryption; it only provides a way to represent binary data as text.
- For very large data sets, Base64 encoding can be inefficient in terms of storage and transmission.
Encode URL function
ENCODE_URL
Functionality
The ENCODE_URL function converts a string into a format suitable for use in a URL. It replaces certain characters with their encoded equivalents to ensure compatibility with URL syntax.
Return value
The function returns a string representing the URL-encoded version of the input string.
Example
Input string: "This is a string with spaces and special characters: &*()"
Output: "This%20is%20a%20string%20with%20spaces%20and%20special%20characters%3A%20%26%2A%28%29"
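Percent-encoding can be previewed with Python's `urllib.parse`. This sketch assumes every reserved character is encoded, including `/`; the helper names are hypothetical, and GoDataFeed may leave some characters untouched.

```python
from urllib.parse import quote, unquote


def encode_url(text: str) -> str:
    """Percent-encode everything except letters, digits, and _ . - ~"""
    # safe="" also encodes "/", which quote() leaves alone by default.
    return quote(text, safe="")


def decode_url(text: str) -> str:
    """Reverse percent-encoding."""
    return unquote(text)


print(encode_url("red & blue/green"))          # red%20%26%20blue%2Fgreen
print(decode_url("red%20%26%20blue%2Fgreen"))  # red & blue/green
```

Encoding a full URL this way would also escape its `:` and `/` separators, so the function is normally applied to individual parameter values rather than whole links.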
Use cases
- Building URLs: Constructing URLs with dynamic parameters or user-generated content.
- Form submission: Encoding form data before sending it to a server.
- REST API calls: Creating properly formatted query parameters for API requests.
- Data transmission: Ensuring data integrity when passing information through URLs.
Limitations
- The ENCODE_URL function might not encode all special characters, depending on the implementation.
- The encoding scheme might vary between different programming languages or libraries.
- Overly long URLs might cause issues with some web servers or browsers.
- The encoded string can be less readable than the original string.
Encode HTML function
ENCODE_HTML
Functionality
The ENCODE_HTML function converts special characters in a string into their HTML entities, preventing potential cross-site scripting (XSS) attacks and ensuring correct rendering of the string within an HTML document.
Return value
The function returns a string with special characters replaced by their corresponding HTML entities.
Example
Input string: "This string contains & < > and " characters."
Output: "This string contains &amp; &lt; &gt; and &quot; characters."
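Entity encoding and decoding can be reproduced with Python's standard `html` module. The helper names are hypothetical, and the exact set of characters GoDataFeed encodes may differ from Python's.

```python
import html


def encode_html(text: str) -> str:
    """Replace &, <, >, and quotes with HTML entities."""
    return html.escape(text)


def decode_html(text: str) -> str:
    """Convert HTML entities back to the characters they represent."""
    return html.unescape(text)


encoded = encode_html('Tom & Jerry <b>"hi"</b>')
print(encoded)               # Tom &amp; Jerry &lt;b&gt;&quot;hi&quot;&lt;/b&gt;
print(decode_html(encoded))  # Tom & Jerry <b>"hi"</b>
```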
Use cases
- Protecting against XSS attacks: Preventing malicious scripts from being executed by encoding user-generated content before displaying it on a web page.
- Displaying user-generated content: Safely rendering user-provided text within HTML elements.
- Data sanitization: Ensuring data integrity by preventing unexpected HTML behavior.
Limitations
- The ENCODE_HTML function might not encode all special characters, depending on the implementation.
- Overly aggressive encoding can impact readability and user experience if not used judiciously.
- The encoded string might be longer than the original string due to the added entity characters.
- While encoding helps prevent XSS, it's essential to combine it with other security measures for a comprehensive approach.
Decode base64 function
DECODE_BASE64
Functionality
The DECODE_BASE64 function converts a Base64-encoded text string back into its original binary data format. It is the inverse operation of the ENCODE_BASE64 function.
Return value
The function returns the decoded binary data as a string.
Example
Input string: "VGhpcyBpcyBhIHNhbXBsZSBzdHJpbmcgZm9yIEJhc2U2NCBlbmNvZGluZw=="
Output: "This is a sample string for Base64 encoding"
Use cases
- Receiving data: Decoding Base64-encoded data received from a network or storage.
- Image decoding: Converting Base64-encoded image data into a binary format for display.
- Data processing: Processing binary data after decoding it from Base64.
- Authentication: Decoding Base64-encoded credentials for verification.
Limitations
- The input string must be a valid Base64-encoded string. Invalid characters or padding issues will result in decoding errors.
- The decoded data might not be directly human-readable, as it represents binary data.
- For very large Base64-encoded strings, decoding can be computationally intensive.
- The DECODE_BASE64 function does not perform any cryptographic decryption; it simply reverses the Base64 encoding process.
Decode URL function
DECODE_URL
Functionality
The DECODE_URL function converts a URL-encoded string back to its original format by replacing encoded characters with their original counterparts. This is the inverse operation of the ENCODE_URL function.
Return value
The function returns a string representing the decoded version of the input string.
Example
Input string: "This%20is%20a%20string%20with%20spaces%20and%20special%20characters%3A%20%26%2A%28%29"
Output: "This is a string with spaces and special characters: &*()"
Use cases
- Processing URL parameters: Extracting original values from URL-encoded query parameters.
- Handling user input: Decoding user-submitted data that might have been URL-encoded.
- Data reconstruction: Recovering original data from URL-encoded representations.
Limitations
- The DECODE_URL function might not be able to decode all encoded characters, depending on the implementation.
- Incorrectly encoded strings can lead to decoding errors.
- The decoded string might contain characters that are not safe for display or further processing.
- While decoding is often necessary, it's essential to validate and sanitize decoded data before using it to prevent security vulnerabilities.
Decode HTML function
DECODE_HTML
Functionality
The DECODE_HTML function converts HTML entities within a string back to their original characters. It is the inverse operation of the ENCODE_HTML function.
Return value
The function returns a string with HTML entities replaced by their corresponding characters.
Example
Input string: "This string contains &amp; &lt; &gt; and &quot; characters."
Output: "This string contains & < > and " characters."
Use cases
- Processing HTML content: Converting HTML-encoded text back to its original format for further processing.
- Displaying decoded data: Rendering decoded HTML content on a web page without security risks.
- Data manipulation: Extracting original data from HTML-encoded strings.
Limitations
- The DECODE_HTML function might not be able to decode all HTML entities, depending on the implementation.
- Incorrectly encoded strings can lead to decoding errors.
- The decoded string might contain characters that are not safe for display or further processing.
- While decoding is often necessary, it's essential to validate and sanitize decoded data before using it to prevent security vulnerabilities.
Data select function
DATA_SELECT
Functionality
The DATA_SELECT function allows you to extract specific data from a product attribute that contains structured content formatted as JSON, XML, or HTML. It takes two arguments:
- Data type: Specifies the format of the source data: "JSON", "XML", or "HTML".
- Selector: Defines the path to the desired value using dot notation.
In GoDataFeed rules, use DATA_SELECT when a catalog attribute stores nested or structured data and you need to surface a specific value as a feed field output.
Return value
The function returns a string containing the extracted data from the specified path within the structured content.
JSON Example
Syntax:
DATA_SELECT(JSON)[data.array~2.name]
Source JSON:
{
"data": {
"array": [
{ "name": "Item 1" },
{ "name": "Item 2" },
{ "name": "Item 3" }
]
}
}
The ~ symbol indicates an array position: array~2 selects the element at index 2. Selectors use dot notation to navigate nested levels.
Result: "Item 3"
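To make the selector rules concrete, here is a small Python sketch that resolves a JSON selector the way the example describes: dots step into nested keys, and `~N` picks the zero-based element N of an array. `data_select_json` is a hypothetical helper written for illustration, not GoDataFeed's implementation.

```python
import json


def data_select_json(source: str, selector: str):
    """Resolve a selector such as 'data.array~2.name' against a JSON string."""
    node = json.loads(source)
    for part in selector.split("."):
        # "array~2" -> key "array", index "2"; plain keys have an empty index.
        key, _, index = part.partition("~")
        node = node[key]
        if index:
            node = node[int(index)]  # zero-based position
    return node


source = """
{
  "data": {
    "array": [
      { "name": "Item 1" },
      { "name": "Item 2" },
      { "name": "Item 3" }
    ]
  }
}
"""
print(data_select_json(source, "data.array~2.name"))  # Item 3
print(data_select_json(source, "data.array~0.name"))  # Item 1
```

A missing key or an out-of-range index raises an error in this sketch, which is a reminder to test selectors against real catalog data.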
HTML Example
Syntax:
DATA_SELECT(HTML)[html.body.table.tr~2.td~1.text.innerText]
Source HTML:
<html>
<body>
<table>
<tr><th>Name</th><th>Description</th></tr>
<tr><td>Item 1</td><td>A great product</td></tr>
<tr><td>Item 2</td><td>This is the content we want to extract</td></tr>
<tr><td>Item 3</td><td>Another product</td></tr>
</table>
</body>
</html>
Result: "This is the content we want to extract"
XML Example
Syntax:
DATA_SELECT(XML)[data.array.item[2].name]
This selector uses bracket-based element indexing to navigate XML nodes and retrieve the value of a named element.
Use cases
- Data extraction: Isolating specific fields from a structured attribute in your product catalog, such as pulling a weight value from a nested JSON specifications field.
- Data filtering: Selecting values based on specific criteria within structured data.
- Data transformation: Mapping a deeply nested attribute value to a flat feed field required by a channel.
- Data analysis: Preparing structured data for further rule-based processing or reporting.
Limitations
- The DATA_SELECT function is limited to basic data extraction. Complex transformations or joins may require additional rule logic.
- Performance can be impacted by the size of the input string and the complexity of the selector path.
- The function may have limitations handling different character encodings or malformed structured data.
- Array indexing uses 0-based positioning with the ~ notation, so always verify your selector against actual data to confirm the expected output.
Regex replace function
REGEX_REPLACE
Functionality
The REGEX_REPLACE function replaces occurrences of a pattern within a string with a specified replacement string. The pattern is defined using regular expressions, which provide a powerful way to match and manipulate text. In GoDataFeed rules, use this function when you need to clean, reformat, or standardize attribute values across your catalog — for example, stripping HTML tags from a description field, removing special characters from a product ID, or reformatting phone numbers to a consistent structure.
Return value
The function returns a new string with the matched pattern replaced by the replacement string. The original string remains unchanged.
Example
Input string: "The quick brown fox jumps over the lazy dog."
Pattern: "fox"
Replacement: "cat"
Output: "The quick brown cat jumps over the lazy dog."
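The same kind of replacement can be tried out in Python before building a rule. This is a rough sketch using Python's re module, which may differ in syntax details from the regex engine GoDataFeed uses; the helper name regex_replace and the sample attribute values are made up for illustration.

```python
import re


def regex_replace(text: str, pattern: str, replacement: str) -> str:
    """Return a copy of text with every match of pattern replaced.

    Hypothetical stand-in for the rule function; the input string
    itself is not modified.
    """
    return re.sub(pattern, replacement, text)


# The example from this article.
sentence = regex_replace(
    "The quick brown fox jumps over the lazy dog.", "fox", "cat"
)
print(sentence)  # The quick brown cat jumps over the lazy dog.

# Strip HTML tags from a description (sample value is invented).
description = regex_replace("<p>Soft <b>cotton</b> tee</p>", r"<[^>]+>", "")
print(description)  # Soft cotton tee

# Keep only digits and the decimal point in a price (sample value is invented).
price = regex_replace("$1,299.00", r"[^0-9.]", "")
print(price)  # 1299.00
```

Patterns such as <[^>]+> are deliberately simple. They are fine for lightly formatted descriptions but should still be checked against real catalog values.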
Use cases
- Text manipulation: Replacing specific text patterns within a product attribute string.
- Data cleaning: Removing or modifying unwanted characters or patterns — for example, stripping currency symbols from price fields or removing HTML markup from description attributes.
- Data formatting: Converting text to a specific format required by a channel (e.g., standardizing phone numbers, reformatting SKUs).
- Text extraction: Isolating specific information from an attribute string based on a defined pattern.
Limitations
- The complexity of regular expressions can make them difficult to understand and write — test your pattern carefully before applying it across your full catalog.
- Inefficient regular expressions can impact performance, especially for large input strings or large catalogs.
- Regular expression syntax varies between engines, so a pattern copied from another tool or programming language may behave differently in a rule.
- Overly complex regular expressions can lead to unexpected results or errors.
Extract remove function
EXTRACT_REMOVE
Functionality
The EXTRACT_REMOVE function extracts a specific substring from a larger string and simultaneously removes the extracted portion from the original string. The function operates on a specified delimiter or pattern to identify the substring to be extracted. In GoDataFeed rules, this is useful when an attribute contains concatenated data — for example, a field that combines a product code and a description separated by a delimiter — and you need to isolate one part for a feed field while discarding the rest.
Return value
The function typically returns two values:
- The extracted substring.
- The modified original string with the extracted portion removed.
Example
Input string: "This is a sample string for extraction"
Delimiter: "is"
Output:
- Extracted substring: "is"
- Modified string: "Th a sample string for extraction"
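The example above can be reproduced with a short Python sketch. It is not GoDataFeed's implementation: the helper name extract_remove is hypothetical, and the sketch assumes that every occurrence of the delimiter is removed and that leftover runs of spaces are collapsed, which is what the documented output suggests.

```python
import re
from typing import Tuple


def extract_remove(text: str, delimiter: str) -> Tuple[str, str]:
    """Return (extracted, remainder) for a literal delimiter.

    Hypothetical helper for illustration only.
    Assumption: all occurrences of the delimiter are removed from
    the remainder, and repeated spaces left behind are collapsed.
    """
    extracted = delimiter if delimiter in text else ""
    remainder = text.replace(delimiter, "")
    remainder = re.sub(r" {2,}", " ", remainder).strip()
    return extracted, remainder


extracted, remainder = extract_remove(
    "This is a sample string for extraction", "is"
)
print(extracted)  # is
print(remainder)  # Th a sample string for extraction
```

Note that the delimiter also matches inside "This", which is why the remainder starts with "Th". Choosing a more specific delimiter avoids this kind of partial-word match.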
Use cases
- Data parsing: Extracting specific information from a larger text block within a catalog attribute.
- Text manipulation: Removing parts of a string while preserving the extracted content for use in a separate feed field.
- Tokenization: Breaking a string into tokens by extracting and removing delimiters.
- Data processing: Preparing data for further analysis or processing by pulling out only the relevant portion of a compound attribute value.
Limitations
- The EXTRACT_REMOVE function might have limitations in handling overlapping matches or complex extraction patterns.
- Performance can be impacted by the length of the input string and the complexity of the extraction criteria.
- The function might not handle all character encodings consistently.
- The specific behavior of the EXTRACT_REMOVE function can vary depending on the programming language or library implementation.