Matching Patterns with dcbor CLI
The dcbor CLI tool includes powerful pattern matching capabilities that allow you to search for, extract, and validate specific structures within dCBOR data. This chapter introduces the dcbor match subcommand, which leverages the comprehensive pattern expression (AKA "patex") syntax of the dcbor-pattern crate to enable sophisticated data analysis and extraction workflows.
This chapter builds on the foundation established in The dcbor Command Line Tool chapter. If you haven't read that chapter yet, we recommend doing so first to familiarize yourself with the basic dcbor CLI operations.
What is Pattern Matching?
Pattern matching in the context of dCBOR allows you to:
- Find specific data structures within complex CBOR documents
- Extract values that match certain criteria
- Validate data conformance to expected patterns
- Find the paths that lead to matching values within nested structures
- Transform data by capturing and reformatting matches
The dcbor match Command
The basic syntax of the dcbor match command is:
dcbor match <PATTERN> [INPUT] [OPTIONS]
Where:
- <PATTERN> is a pattern expression (AKA "patex") written in the dcbor-pattern expression syntax, which we'll explore in detail
- [INPUT] is the dCBOR data to match against (if omitted, it is read from stdin)
- [OPTIONS] control input/output formats and matching behavior
Pattern Syntax Reference
You can find a complete reference for the patex syntax in the dCBOR Expression Syntax Appendix. This appendix provides a quick reference for the patex syntax, including value patterns, structure patterns, and meta patterns we'll cover later.
Value Patterns
Value patterns are the foundation of dCBOR pattern matching. They allow you to match specific data types and exact values. Let's start with the most basic patterns and build up your understanding progressively.
Numbers
Recall that if you simply type:
dcbor 42
You get back the hex representation of the CBOR number 42:
│ 182a
If you want the CBOR diagnostic notation, you can use the -o diag option:
dcbor -o diag 42
│ 42
In the examples in this chapter, the actual patex used is shown in its own block and is referred to in the command lines that follow it as $PATTERN. So when you see a block like this:
PATTERN=
number
What we're hiding is that we really wrote this:
PATTERN=$(cat <<'EOF'
number
EOF
)
This little bit of heredoc awkwardness is the most reliable way to make sure everything in a pattern is assigned to a shell variable verbatim. For many patterns you won't need to use it yourself.
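But if you do, now you know.
When you use such a variable in a command, wrap it in double quotes ("$PATTERN") so the shell passes it as a single argument; otherwise a shell such as bash splits a pattern like [number, text] at its spaces and may expand * into file names.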
What if you have two pieces of CBOR data, and you want to check whether one of them is a number?
CBOR1=182a
CBOR2=6548656c6c6f
You can use the dcbor match command to check whether either of these is a number:
NUMBER=
number
dcbor match "$NUMBER" -i hex $CBOR1
│ 42
dcbor match "$NUMBER" -i hex $CBOR2
│ Error: No Match
We can see that CBOR1 is the number 42, and CBOR2 is not a numeric value. So let's see whether it is a text string by using the text pattern:
TEXT=
text
dcbor match "$TEXT" -i hex $CBOR2
│ "Hello"
The pattern matches, and we can see it is the string "Hello".
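Why does 182a come out as 42 and 6548656c6c6f as "Hello"? In CBOR, the first byte of each item carries the major type in its top three bits and, in its low five bits, either a small value or the size of an argument that follows. The Python sketch below is a standalone illustration, not part of dcbor; the function name decode_head is hypothetical, and it decodes only integers and byte/text strings.

```python
def decode_head(data: bytes):
    """Decode one CBOR unsigned int, negative int, byte string, or text string.

    Returns (value, number_of_bytes_consumed). Other major types,
    indefinite lengths, and reserved values are out of scope for this sketch.
    """
    initial = data[0]
    major = initial >> 5   # top 3 bits: major type
    info = initial & 0x1F  # low 5 bits: additional information
    if info < 24:
        arg, offset = info, 1  # small arguments live in the initial byte itself
    elif info <= 27:
        size = 1 << (info - 24)  # 24..27 -> 1, 2, 4, or 8 following bytes
        arg = int.from_bytes(data[1:1 + size], "big")
        offset = 1 + size
    else:
        raise ValueError("reserved or indefinite-length encoding")
    if major == 0:
        return arg, offset
    if major == 1:
        return -1 - arg, offset  # negative integers encode -1 - n
    if major in (2, 3):
        payload = data[offset:offset + arg]
        value = payload.decode("utf-8") if major == 3 else payload
        return value, offset + arg
    raise ValueError(f"major type {major} is not handled here")


print(decode_head(bytes.fromhex("182a")))          # (42, 2)
print(decode_head(bytes.fromhex("6548656c6c6f")))  # ('Hello', 6)
```

For 182a, the initial byte 0x18 has major type 0 (unsigned integer) and additional information 24, so the value is in the next byte, 0x2a (42). For 6548656c6c6f, 0x65 has major type 3 (text) and length 5, followed by the five UTF-8 bytes of "Hello".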
The number pattern matches any numeric value, whether it's an integer or floating-point number:
NUMBER=
number
dcbor match "$NUMBER" 42
│ 42
dcbor match "$NUMBER" 3.14
│ 3.14
To avoid confusion with command-line flags, you can use -- to separate the pattern from the input. The -- signals that no further command-line options follow, so values that would otherwise look like flags are passed through as input. This is especially useful for negative numbers or special values like -Infinity.
NUMBER=
number
dcbor match "$NUMBER" -- -1
│ -1
Text Strings
As we demonstrated above, the text pattern matches any text string:
TEXT=
text
dcbor match "$TEXT" '"hello"'
│ "hello"
dcbor match "$TEXT" '"🌎"'
│ "🌎"
Notice that when providing text strings as input to the CLI, you need to include the double quotes as part of the dCBOR diagnostic notation. This is the same quoting consideration we discussed in the basic dcbor CLI chapter.
Byte Strings
The bstr pattern matches any byte string. Byte strings in CBOR are sequences of raw bytes, distinct from text strings which have UTF-8 character encoding semantics:
BSTR=
bstr
dcbor match "$BSTR" "h'68656c6c6f'"
│ h'68656c6c6f'
The empty byte string is perfectly legal:
dcbor match "$BSTR" "h''"
│ h''
Booleans and Null
The bool pattern matches both boolean values:
BOOL=
bool
dcbor match "$BOOL" true
│ true
dcbor match "$BOOL" false
│ false
Don't take the response false here to mean that the pattern didn't match; it means that the input value was false, which is a valid match for the bool pattern.
The null pattern matches CBOR's null value:
NULL=
null
dcbor match "$NULL" null
│ null
The Universal Pattern
The * ("any") pattern matches any CBOR value whatsoever.
ANY=
*
dcbor match "$ANY" 42
│ 42
dcbor match "$ANY" '"hello"'
│ "hello"
dcbor match "$ANY" "h'1234'"
│ h'1234'
* is useful when you want to match any value in a particular position within a larger structure.
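To summarize the type patterns so far, here is a toy Python model in which Python values stand in for CBOR values: int and float for numbers, str for text, bytes for byte strings, bool for booleans, and None for null. That mapping and the names match_type and TYPE_PATTERNS are assumptions made for illustration; this is not how dcbor-pattern is implemented.

```python
# Toy model of the basic type patterns. Python values stand in for CBOR
# values (an assumption of this sketch): int/float for numbers, str for
# text, bytes for byte strings, bool for booleans, None for null.


def is_number(value):
    # bool is a subclass of int in Python, so rule it out explicitly;
    # otherwise True would count as a number.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


TYPE_PATTERNS = {
    "number": is_number,
    "text": lambda value: isinstance(value, str),
    "bstr": lambda value: isinstance(value, bytes),
    "bool": lambda value: isinstance(value, bool),
    "null": lambda value: value is None,
    "*": lambda value: True,  # the universal pattern accepts anything
}


def match_type(pattern, value):
    """Return True if value satisfies the named type pattern."""
    return TYPE_PATTERNS[pattern](value)


print(match_type("bool", False))      # True: false is a boolean, so it matches
print(match_type("number", "Hello"))  # False: text is not a number
```

The first call mirrors the note about the bool pattern: the value false is a successful match, not a failure.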
Specific Value Matching
Beyond matching types, you can match exact values by providing the specific value as your pattern.
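Conceptually, a literal pattern matches only an equal CBOR value, and values of different CBOR types are never equal: the text "42" is not the number 42, and true is not 1. The Python sketch below models that rule; match_exact is a hypothetical name, and the type check is needed because Python itself considers True == 1.

```python
def is_number(value):
    # Exclude bool, which Python treats as a subclass of int.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def match_exact(expected, value):
    """Toy model of a literal pattern such as 42, "hello", h'1234', or true."""
    if is_number(expected) and is_number(value):
        return expected == value  # numbers compare by numeric value
    # Otherwise the Python types must agree, so "42" never matches 42 and
    # True never matches 1 (plain == would say True == 1).
    return type(expected) is type(value) and expected == value


print(match_exact(42, 42))            # True
print(match_exact(42, 43))            # False
print(match_exact("hello", "world"))  # False
print(match_exact(True, 1))           # False
```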
Specific Numbers
FORTY_TWO=
42
dcbor match "$FORTY_TWO" 42
│ 42
This won't match because 43 ≠ 42:
dcbor match "$FORTY_TWO" 43
│ Error: No match
Specific Text Strings
HELLO=
"hello"
dcbor match "$HELLO" '"hello"'
│ "hello"
This won't match because the strings are different:
dcbor match "$HELLO" '"world"'
│ Error: No match
Specific Byte Strings
TWO_BYTES=
h'1234'
dcbor match "$TWO_BYTES" "h'1234'"
│ h'1234'
Specific Boolean Values
BOOL_TRUE=
true
dcbor match "$BOOL_TRUE" true
│ true
This won't match because false ≠ true:
dcbor match "$BOOL_TRUE" false
│ Error: No match
Advanced Value Patterns
Beyond basic type and exact value matching, dCBOR patterns support sophisticated matching criteria including ranges for numbers and regular expressions for text and byte strings.
Number Ranges
Numbers can be matched using ranges and inequality operators, which is useful for validating data within acceptable bounds.
Range Matching
You can match numbers within a specific range using the ... syntax:
ONE_TO_TEN=
1...10
dcbor match "$ONE_TO_TEN" 5
│ 5
dcbor match "$ONE_TO_TEN" 15
│ Error: No match
The ... syntax is shorthand for an inclusive (closed) range, meaning it includes both the start and end values.
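One way to picture a range pattern is as a predicate over numbers. The Python sketch below models 1...10 and its spelled-out equivalent >=1 & <=10. The helper names are hypothetical, and the sketch assumes the input is already known to be a number; a fuller model would check the type first, as the number pattern does.

```python
# Toy model of numeric range patterns as Python predicates.


def closed_range(low, high):
    """low...high: both endpoints are included."""
    return lambda n: low <= n <= high


def at_least(low):  # >=low
    return lambda n: n >= low


def at_most(high):  # <=high
    return lambda n: n <= high


def both(p, q):  # p & q: both patterns must match
    return lambda n: p(n) and q(n)


ONE_TO_TEN = closed_range(1, 10)
ONE_TO_TEN_ALT = both(at_least(1), at_most(10))

for n in [0, 1, 5, 10, 15]:
    print(n, ONE_TO_TEN(n), ONE_TO_TEN_ALT(n))
```

Both predicates give the same answer for every input, which is the sense in which the two spellings describe the same range.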
The same range of numbers can also be specified with a more complex syntax using the & operator, which we'll cover later.
ONE_TO_TEN=
>=1 & <=10
dcbor match "$ONE_TO_TEN" 5
│ 5
Inequality Operators
Numbers support various inequality operators. Quote these patterns so that the shell doesn't treat > and < as redirections (or & as its background operator):
Greater than:
dcbor match ">5" 10
│ 10
Greater than or equal to:
dcbor match ">=5" 5
│ 5
Less than:
dcbor match "<10" 8
│ 8
Less than or equal to:
dcbor match "<=10" 10
│ 10
Half-Open Ranges
Using the & operator allows you to construct patterns that match half-open ranges (where one end is inclusive and the other is exclusive):
dcbor match ">1 & <=10" 10
│ 10
dcbor match ">1 & <=10" 1
│ Error: No match
Special Number Values
You can also match three special floating-point values: NaN ("not a number"), Infinity, and -Infinity.
dcbor match "NaN" NaN
│ NaN
dcbor match "Infinity" Infinity
│ Infinity
dcbor match -- "-Infinity" -Infinity
│ -Infinity
Note the use of -- to signal the end of command-line options, allowing you to pass values that might otherwise be interpreted as flags.
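The inequality operators, the & combination, and the special values can be modeled the same way. In this Python sketch the helper names are hypothetical; note that NaN needs an explicit test, because an IEEE 754 NaN is not equal to anything, including itself.

```python
import math

# Toy model of comparison patterns as Python predicates; the helper
# names are hypothetical, not the dcbor-pattern API.


def gt(x):
    return lambda n: n > x


def ge(x):
    return lambda n: n >= x


def lt(x):
    return lambda n: n < x


def le(x):
    return lambda n: n <= x


def both(p, q):
    """The & operator: both patterns must match."""
    return lambda n: p(n) and q(n)


HALF_OPEN = both(gt(1), le(10))  # ">1 & <=10": excludes 1, includes 10


def is_nan(n):
    # NaN is unequal to everything, itself included, so it cannot be
    # matched with ==; an explicit isnan test is needed.
    return isinstance(n, float) and math.isnan(n)


def is_infinity(n):
    return n == math.inf  # infinities, unlike NaN, compare equal to themselves


def is_negative_infinity(n):
    return n == -math.inf


print(HALF_OPEN(10), HALF_OPEN(1))    # True False
print(float("nan") == float("nan"))  # False
print(is_nan(float("nan")))          # True
```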
Text Regular Expressions
Regular expressions (or regexes) are powerful pattern-matching tools for text, allowing you to search for specific patterns rather than exact text. They use special characters and syntax to define search patterns. For instance, \d+ matches one or more digits, [A-Z]+ matches one or more uppercase letters, and ^ and $ anchor patterns to the beginning and end of a string, respectively. With regular expressions, you can validate formats, extract information, and perform sophisticated text processing.
The dCBOR patexes described in this chapter share some concepts with regexes, but they are not the same. The dCBOR pattern expression syntax is designed specifically for matching CBOR data structures and values, while regular expressions are specifically for processing text. Nonetheless, some of the types you can match with dCBOR patterns, such as text strings and byte strings, can be matched using regular expressions.
Text strings can be matched using regular expressions by enclosing the regex in forward slashes: /regex/.
Match strings starting with "temp"
STARTS_WITH_TEMP=
/^temp/
dcbor match "$STARTS_WITH_TEMP" '"temporary"'
│ "temporary"
This won't match because it doesn't start with "temp":
dcbor match "$STARTS_WITH_TEMP" '"permanent"'
│ Error: No match
Match any email-like string
EMAIL_ADDRESS=
/^[^@]+@[^@]+\.[^@]+$/
dcbor match "$EMAIL_ADDRESS" '"user@example.com"'
│ "user@example.com"
Regular expressions use standard Rust regex syntax (as implemented by the regex crate), which resembles Perl-style syntax but deliberately omits features such as look-around and backreferences. It supports complex pattern matching, including:
- Literal characters: /abc/, /123/
- Any character: /./
- Character classes: /[a-z]/, /[0-9]/, /\d/ (digit), /\w/ (word character)
- Quantifiers: /<pattern>*/ (zero or more), /<pattern>+/ (one or more), /<pattern>?/ (zero or one), /<pattern>{n,m}/ (between n and m times)
- Anchors: /^<pattern>/ (start), /<pattern>$/ (end)
- Groups: /(<pattern>)/
- Alternation: /<pattern1>|<pattern2>/
Explaining the full syntax of regular expressions is beyond the scope of this book, but you can find more information on the specific Rust implementation in the Rust regex documentation.
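Here is a Python sketch of a text regex pattern. It assumes, without confirming it against dcbor-pattern, that a regex succeeds if it finds a match anywhere in the string (as Rust's Regex::is_match does), which is why anchoring with ^ and $ matters. Python's re module stands in for the Rust engine; the two agree on the simple syntax used here but differ in other details. The name text_regex is hypothetical.

```python
import re


def text_regex(pattern):
    """Toy model of a /regex/ text pattern.

    Assumes unanchored matching: a match anywhere in the string counts,
    so ^ and $ must be written explicitly to anchor.
    """
    compiled = re.compile(pattern)
    return lambda value: isinstance(value, str) and compiled.search(value) is not None


STARTS_WITH_TEMP = text_regex(r"^temp")
CONTAINS_TEMP = text_regex(r"temp")

for word in ["temporary", "permanent", "contemporary"]:
    print(word, STARTS_WITH_TEMP(word), CONTAINS_TEMP(word))
```

Under this model, "contemporary" contains "temp" but does not start with it, so only the unanchored pattern accepts it.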
Byte String Regular Expressions
Byte strings also support regular expression matching, which is useful for matching binary patterns or encoded data. Binary regexes operate on the raw byte content, not on the hex representation you see in diagnostic notation. The syntax resembles the h'<hex>' notation above; for regexes it is h'/<regex>/'.
Binary regexes must start with the (?s-u) flags to work correctly:
- (?s) enables "dot matches newline" mode, allowing . to match across newlines (like the byte 0x0a)
- (?-u) disables Unicode mode, allowing . to match any byte value instead of just valid UTF-8 sequences
- Use \x notation for specific byte values (e.g., \xFF for byte 255); with Unicode mode disabled, \xFF matches the single byte 0xFF rather than the UTF-8 encoding of the character U+00FF
Without these flags, patterns may fail on byte strings containing newlines or invalid UTF-8 sequences.
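The same idea can be sketched for byte strings. In this Python model, bytes patterns take the place of Rust's byte-oriented regexes: Python's bytes patterns always operate on raw bytes, so there is no Unicode mode to disable, and re.DOTALL plays the role of (?s). Unanchored search is assumed, and the name bytes_regex is hypothetical.

```python
import re


def bytes_regex(pattern: bytes):
    """Toy model of an h'/regex/' byte-string pattern.

    Python's bytes patterns always work on raw bytes (there is no Unicode
    mode to turn off), so re.DOTALL plays the role of the (?s) flag here.
    """
    compiled = re.compile(pattern, re.DOTALL)
    return lambda value: isinstance(value, bytes) and compiled.search(value) is not None


CONTAINS_FF = bytes_regex(rb".*\xff.*")
STARTS_WITH_0102 = bytes_regex(rb"^\x01\x02")
# Python's $ would also accept a trailing newline byte; \Z anchors at the very end.
ENDS_WITH_0304 = bytes_regex(rb"\x03\x04\Z")
ANY_FOUR_BYTES = bytes_regex(rb"^.{4}\Z")

print(CONTAINS_FF(bytes.fromhex("ff01020304")))  # True
print(ANY_FOUR_BYTES(b"\x01\n\x03\x04"))          # True: DOTALL lets . match 0x0a
print(re.search(rb"^.{4}\Z", b"\x01\n\x03\x04") is not None)  # False without DOTALL
```

The last line shows the failure the text warns about: without the dot-matches-newline flag, a byte string that happens to contain 0x0a no longer matches.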
Match byte strings containing the byte 0xFF anywhere
CONTAINS_FF=
h'/(?s-u).*\xFF.*/'
dcbor match "$CONTAINS_FF" "h'ff01020304'"
│ h'ff01020304'
Match byte strings starting with the specific bytes 0x01 0x02
STARTS_WITH_0102=
h'/(?s-u)^\x01\x02/'
dcbor match "$STARTS_WITH_0102" "h'01020304'"
│ h'01020304'
Match byte strings ending with specific bytes
ENDS_WITH_0304=
h'/(?s-u)\x03\x04$/'
dcbor match "$ENDS_WITH_0304" "h'01020304'"
│ h'01020304'
Match any byte string exactly 4 bytes long
ANY_FOUR_BYTES=
h'/(?s-u)^.{4}$/'
dcbor match "$ANY_FOUR_BYTES" "h'12345678'"
│ h'12345678'
Practical Examples
These advanced patterns are particularly useful for data validation and extraction:
Validate that ages are reasonable (0-120)
dcbor match "0...120" 25
│ 25
Validate simple email addresses
EMAIL_ADDRESS=
/^\w+@\w+\.\w+$/
dcbor match "$EMAIL_ADDRESS" '"john@example.com"'
│ "john@example.com"
Find numeric IDs above a threshold
dcbor match ">1000" 1001
│ 1001
Match ISO-8601 date-like strings
ISO_DATE=
/^\d{4}-\d{2}-\d{2}$/
dcbor match "$ISO_DATE" '"2023-12-25"'
│ "2023-12-25"
These advanced value patterns form the building blocks for more complex structure matching, which we'll explore later in this chapter.
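It's worth trying patterns like these against a few tricky inputs before relying on them. This Python sketch, with Python's re standing in for the Rust regex engine, compares the two email patterns used in this chapter and checks the date pattern:

```python
import re

# The two email patterns from this chapter and the date pattern, checked
# with Python's re module (a stand-in for the Rust regex engine).
LOOSE_EMAIL = re.compile(r"^[^@]+@[^@]+\.[^@]+$")
STRICT_EMAIL = re.compile(r"^\w+@\w+\.\w+$")
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

for address in ["john@example.com", "first.last@example.com", "x@mail.example.com"]:
    print(address, bool(LOOSE_EMAIL.search(address)), bool(STRICT_EMAIL.search(address)))

print(bool(ISO_DATE.search("2023-12-25")))  # True
print(bool(ISO_DATE.search("2023-13-45")))  # True: date-like, but not a real date
```

Because \w does not match a dot, the stricter email pattern rejects dotted local parts and subdomains; and the date pattern only checks the shape of the string, not whether the date exists.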
Understanding Match Output
When a pattern matches, the default output shows the matched value. This seems simple now, but it becomes more meaningful when we start working with complex structures where patterns might match multiple values or nested elements.
dcbor match number 42
│ 42
The output 42 tells us that the pattern number matched the input value 42. When we move to structure patterns, you'll see how this output format shows the path through complex data structures.
Pattern Validation and Error Messages
When a pattern doesn't match, the CLI returns an error:
dcbor match text 42
│ Error: No match
This happens because the input 42 is a number, but the pattern text expects a text string. Understanding these error messages helps you debug patterns that aren't working as expected.
Finally, here are a couple of examples of patterns that fail to parse:
dcbor match tex '"Hello"'
│ Error: Failed to parse pattern at position 0..1: unrecognized token 't'
│ Pattern: tex
│ ^
dcbor match '"Hello' '"Hello"'
│ Error: Failed to parse pattern: Unterminated string literal at 0..1
Structure Patterns
Beyond matching individual values, dCBOR patterns support matching complex structures like arrays, maps, and tagged values. These patterns allow you to validate data schemas and extract elements from nested structures.
Array Patterns
Basic Array Matching
The array pattern matches any array structure:
ANY_ARRAY=
array
dcbor match "$ANY_ARRAY" '[1, 2, 3]'
│ [1, 2, 3]
dcbor match "$ANY_ARRAY" '["hello", "world"]'
│ ["hello", "world"]
dcbor match "$ANY_ARRAY" '[]'
│ []
If you want to match only the empty array, the pattern is just the empty array: [].
Array Sequence Patterns
An array pattern can also contain a comma-separated list of patterns that are matched against the array's elements in sequence. Each ordinary pattern matches exactly one element; a repeating pattern can match zero or more elements.
[ <patex>, <patex>, ... ]
Match an array with a number followed by text
NUMBER_THEN_TEXT=
[number, text]
dcbor match "$NUMBER_THEN_TEXT" '[42, "hello"]'
│ [42, "hello"]
[number, text] means the first element must be a number, followed by a text string, and that's it: these must be the only elements, and they must appear in that order, so adding another element would not match:
dcbor match "$NUMBER_THEN_TEXT" '[42, "hello", 0]'
│ Error: No match
In this case the first element must be the exact number 42, but the second element can be any text string:
FORTY_TWO_THEN_TEXT=
[42, text]
dcbor match "$FORTY_TWO_THEN_TEXT" '[42, "hello"]'
│ [42, "hello"]
This won't match because the elements are in the wrong order:
dcbor match "$FORTY_TWO_THEN_TEXT" '["hello", 42]'
│ Error: No match
Match an array of a number, then text, then any single value
NUMBER_THEN_TEXT_THEN_ANY=
[number, text, *]
dcbor match "$NUMBER_THEN_TEXT_THEN_ANY" '[42, "hello", true]'
│ [42, "hello", true]
In the example above, the * pattern by itself matches exactly one element. If you want to match zero or more elements of any kind from this point on, you can use the repeating pattern (*)*:
NUMBER_THEN_TEXT_THEN_REST=
[number, text, (*)*]
dcbor match "$NUMBER_THEN_TEXT_THEN_REST" '[42, "hello"]'
dcbor match "$NUMBER_THEN_TEXT_THEN_REST" '[42, "hello", true]'
dcbor match "$NUMBER_THEN_TEXT_THEN_REST" '[42, "hello", true, false]'
│ [42, "hello"]
│ [42, "hello", true]
│ [42, "hello", true, false]
We'll cover repeating patterns more thoroughly later.
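The sequence rule can be captured in a few lines. In this hypothetical Python model, each element pattern is a predicate that consumes exactly one element, and a REST marker stands in for (*)*, absorbing zero or more elements through simple backtracking. It sketches the semantics described above; it is not the dcbor-pattern implementation.

```python
# Toy model of array sequence patterns. Element patterns are Python
# predicates; REST is a hypothetical marker standing in for (*)*.
REST = object()


def number(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def text(v):
    return isinstance(v, str)


def anything(v):
    """The * pattern: exactly one element of any kind."""
    return True


def match_array(patterns, items):
    """True if the patterns consume the whole list, in order."""
    if not patterns:
        return not items  # no patterns left, so no elements may remain
    first, remaining = patterns[0], patterns[1:]
    if first is REST:
        # Let REST absorb 0, 1, 2, ... elements, then try the remaining
        # patterns on whatever is left (simple backtracking).
        return any(match_array(remaining, items[k:]) for k in range(len(items) + 1))
    return bool(items) and first(items[0]) and match_array(remaining, items[1:])


print(match_array([number, text], [42, "hello"]))        # True
print(match_array([number, text], [42, "hello", 0]))     # False: extra element
print(match_array([number, text, REST], [42, "hello"]))  # True: REST absorbs nothing
```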
Map Patterns
Basic Map Matching
The map pattern matches any map structure:
ANY_MAP=
map
dcbor match "$ANY_MAP" '{1: 2, 3: 4}'
│ {1: 2, 3: 4}
dcbor match "$ANY_MAP" '{"hello": "world"}'
│ {"hello": "world"}
dcbor match "$ANY_MAP" '{}'
│ {}
Key-Value Constraints
Maps can be matched by specifying key-value constraints using <key>: <value> notation. For each constraint, the target map must have at least one key-value pair that satisfies it.
Match a map with a specific key and a text value
HAS_KEY_NAME=
{"name": text}
dcbor match "$HAS_KEY_NAME" '{"name": "Alice", "age": 30}'
│ {"age": 30, "name": "Alice"}
Notice that it is not necessary to match every key-value pair in the map; you can match just the ones you care about. The output shows the entire map. It also lists "age" before "name", because dCBOR orders map keys by their deterministic encoding.
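The "at least one entry per constraint" rule can be sketched in Python as follows. Constraints are (key predicate, value predicate) pairs, and the optional size argument imitates combining with an entry-count pattern such as { {1} } via &. All names here are hypothetical.

```python
# Toy model of map patterns. A constraint is a (key_predicate,
# value_predicate) pair; the names here are hypothetical.


def equals(x):
    return lambda v: type(v) is type(x) and v == x


def text(v):
    return isinstance(v, str)


def number(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def match_map(constraints, mapping, size=None):
    """Every constraint must be met by at least one entry; other entries
    are ignored. size, if given, fixes the number of entries."""
    if not isinstance(mapping, dict):
        return False
    if size is not None and len(mapping) != size:
        return False
    return all(
        any(key_ok(k) and value_ok(v) for k, v in mapping.items())
        for key_ok, value_ok in constraints
    )


print(match_map([(equals("name"), text)], {"name": "Alice", "age": 30}))  # True
print(match_map([(equals(1), text)], {1: "first", 2: "second"}, size=1))  # False
```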
Match a map with a numeric key
HAS_KEY_1=
{1: text}
dcbor match "$HAS_KEY_1" '{1: "first", 2: "second"}'
│ {1: "first", 2: "second"}
If you want to match a map that contains only a specific key-value pair, you can specify the exact number of entries using the & operator and a map pattern containing a quantifier:
Match a map with exactly one key-value pair, where the key is 1 and the value is any text
HAS_SINGLE_ENTRY_WITH_KEY_1=
{ {1} } & {1: text}
This matches because the map has exactly one entry:
dcbor match "$HAS_SINGLE_ENTRY_WITH_KEY_1" '{1: "first"}'
│ {1: "first"}
There are two entries, so no match:
dcbor match "$HAS_SINGLE_ENTRY_WITH_KEY_1" '{1: "first", 2: "second"}'
│ Error: No match
Match a map with multiple required entries
HAS_ID_AND_NAME=
{"id": number, "name": text}
Both key-value pairs must exist, but other entries are allowed:
dcbor match "$HAS_ID_AND_NAME" '{"id": 1, "name": "Alice", "age": 30}'
│ {"id": 1, "age": 30, "name": "Alice"}
Tagged Value Patterns
CBOR tagged values apply semantic meaning to data. Patterns can match both the tag and the content.
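A tagged pattern is essentially two checks joined together: the tag number must be equal, and the content must satisfy an inner pattern. This Python sketch uses a hypothetical Tagged class to stand in for CBOR tagged values such as 1234(42):

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Tagged:
    """Hypothetical stand-in for a CBOR tagged value such as 1234(42)."""
    tag: int
    content: Any


def tagged(tag, content_pattern):
    """Toy model of tagged(tag, pattern): the tag number must be equal
    and the content must satisfy the inner pattern."""
    return lambda v: (
        isinstance(v, Tagged) and v.tag == tag and content_pattern(v.content)
    )


def number(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def bstr(v):
    return isinstance(v, bytes)


def anything(v):
    return True


NUMBER_TAGGED_1234 = tagged(1234, number)
print(NUMBER_TAGGED_1234(Tagged(1234, 42)))    # True
print(NUMBER_TAGGED_1234(Tagged(1234, "42")))  # False: content is text
print(NUMBER_TAGGED_1234(Tagged(99, 42)))      # False: wrong tag
```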
Tag Number Matching
Match any value with tag 1234 containing a number
NUMBER_TAGGED_1234=
tagged(1234, number)
dcbor match "$NUMBER_TAGGED_1234" "1234(42)"
│ 1234(42)
Match tag 12345 with any content
ANY_TAGGED_12345=
tagged(12345, *)
dcbor match "$ANY_TAGGED_12345" '12345("tagged string")'
│ 12345("tagged string")
Content Pattern Matching
Tagged patterns specify both the tag value and the required content pattern:
Match tag 2 (bignum) with byte string content
BIGNUM=
tagged(2, bstr)
dcbor match "$BIGNUM" "2(h'0102')"
│ 2(h'0102')
Match a tag whose content is an array with a specific structure
NUMBER_TEXT_ARRAY_TAGGED_42=
tagged(42, [number, text])
dcbor match "$NUMBER_TEXT_ARRAY_TAGGED_42" '42([1, "data"])'
│ 42([1, "data"])
Introducing Paths
Single Path, Single Element Output
When a pattern matches, the default output shows the matching value. For structures, this represents the entire matching structure:
dcbor match 'array' '[1, 2, 3]'
│ [1, 2, 3]
dcbor match '{"key": *}' '{"key": "value", "other": 42}'
│ {"key": "value", "other": 42}
The examples above include only one match, and only one way to reach it. But dCBOR items are actually trees, with arrays and maps as branches. Paths become more meaningful when you work with search patterns or captures that can match multiple items or nested elements. Later we'll discuss the search pattern, which visits every element in a dCBOR item. As a quick example, if you use a pattern that finds all the numbers in an array, the output shows each number along with its context: the path from the root of the structure to that number:
dcbor match 'search(number)' '[1, [2, 3]]'
The output shows three paths from the root item to numbers within it:
│ [1, [2, 3]]
│ 1
│ [1, [2, 3]]
│ [2, 3]
│ 2
│ [1, [2, 3]]
│ [2, 3]
│ 3
You can choose to output only the last item of each path using the --last-only option, which shows just the final matched items:
dcbor match --last-only "search(number)" '[1, [2, 3]]'
│ 1
│ 2
│ 3
Output Options Overview
The dcbor match command provides several options for controlling output format:
- --captures: Show named capture information (covered in the chapter on advanced matching)
- --last-only: Show only the final matched item of each path
- --in FORMAT / --out FORMAT: Control input/output formats (hex, diag, etc.)
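The path output of the search example earlier can be reproduced with a short recursive generator. This Python sketch is a toy model with hypothetical names: it walks arrays depth-first in document order, yields the root-to-node path for each match, and derives the --last-only view by keeping the last element of each path. For brevity it does not descend into maps or tagged values.

```python
def number(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def search(pattern, item, path=()):
    """Toy model of search(pattern): yield the path (root first) to every
    node that matches, visiting nodes depth-first in document order.
    This sketch only descends into arrays (Python lists)."""
    path = path + (item,)
    if pattern(item):
        yield path
    if isinstance(item, list):
        for child in item:
            yield from search(pattern, child, path)


paths = list(search(number, [1, [2, 3]]))
for p in paths:
    print(p)
# The --last-only view keeps just the final element of each path.
print([p[-1] for p in paths])  # [1, 2, 3]
```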
The next chapter will cover advanced matching techniques.
The appendices include a dCBOR Patex Reference.