Matching Patterns with dcbor CLI

The dcbor CLI tool includes powerful pattern matching capabilities that allow you to search for, extract, and validate specific structures within dCBOR data. This chapter introduces the dcbor match subcommand, which leverages the comprehensive pattern expression (AKA "patex") syntax of the dcbor-pattern crate to enable sophisticated data analysis and extraction workflows.

Tip

This chapter builds on the foundation established in The dcbor Command Line Tool chapter. If you haven't read that chapter yet, we recommend doing so first to familiarize yourself with the basic dcbor CLI operations.

What is Pattern Matching?

Pattern matching in the context of dCBOR allows you to:

  • Find specific data structures within complex CBOR documents
  • Extract values that match certain criteria
  • Validate data conformance to expected patterns
  • Find the paths that lead to matching values within nested structures
  • Transform data by capturing and reformatting matches

The dcbor match Command

The basic syntax of the dcbor match command is:

dcbor match <PATTERN> [INPUT] [OPTIONS]

Where:

  • <PATTERN> is a pattern expression (AKA "patex") written in the dcbor-pattern expression syntax, which we'll explore in detail
  • [INPUT] is the dCBOR data to match against (or read from stdin)
  • [OPTIONS] control input/output formats and matching behavior

Pattern Syntax Reference

You can find a complete reference for the patex syntax in the dCBOR Expression Syntax Appendix. This appendix provides a quick reference for the patex syntax, including value patterns, structure patterns, and meta patterns we'll cover later.

Value Patterns

Value patterns are the foundation of dCBOR pattern matching. They allow you to match specific data types and exact values. Let's start with the most basic patterns and build up your understanding progressively.

Numbers

Recall that if you simply type:

dcbor 42

You get back the hex representation of the CBOR number 42:

182a

If you want the CBOR diagnostic notation, you can use the -o diag option:

dcbor -o diag 42
42

Note

In the examples in this chapter, the actual patex used is shown in its own block, and referred to in the command lines that follow it as $PATTERN. So when you see a block like this:

PATTERN=
number

What we're hiding is that we really wrote this:

PATTERN=$(cat <<'EOF'
number
EOF
)

This little bit of heredoc awkwardness is the most reliable way to make sure everything in a pattern is assigned to a shell variable verbatim. For many patterns you won't need to use it yourself.

But if you do, now you know.

What if you have two pieces of CBOR data, and you want to check whether one of them is a number?

CBOR1=182a
CBOR2=6548656c6c6f

You can use the dcbor match command to check whether either of these is a number:

NUMBER=
number
dcbor match $NUMBER -i hex $CBOR1
42
dcbor match $NUMBER -i hex $CBOR2
Error: No match

We can see that CBOR1 is the number 42, and CBOR2 is not a numeric value. So let's see whether it is a text string by using the text pattern:

TEXT=
text
dcbor match $TEXT -i hex $CBOR2
"Hello"

The pattern matches, and we can see it is the string "Hello".
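
To see why the first input counts as a number and the second as text, it can help to look at the first byte of each encoding. The following is a minimal Python sketch, not part of the dcbor tool (the helper name major_type is made up for illustration). It reads the CBOR major type from the high three bits of the initial byte, as laid out in RFC 8949.

```python
# Hypothetical helper for illustration; not part of dcbor.
MAJOR_TYPES = {
    0: "unsigned integer",
    1: "negative integer",
    2: "byte string",
    3: "text string",
    4: "array",
    5: "map",
    6: "tag",
    7: "simple value or float",
}

def major_type(hex_string: str) -> str:
    """Return the CBOR major type named by the first byte of the encoding."""
    initial_byte = bytes.fromhex(hex_string)[0]
    return MAJOR_TYPES[initial_byte >> 5]

print(major_type("182a"))          # unsigned integer
print(major_type("6548656c6c6f"))  # text string
```

Floating-point values live in major type 7, so this check alone does not reproduce everything the number pattern accepts.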

The number pattern matches any numeric value, whether it's an integer or floating-point number:

NUMBER=
number
dcbor match $NUMBER 42
42
dcbor match $NUMBER 3.14
3.14

Note

Numbers in CBOR can be positive or negative integers, or floating-point values.

Tip

To avoid confusion with command-line flags, you can use -- to separate the pattern from the input. -- signals that there are no command-line flags following it, allowing you to pass values that might otherwise be interpreted as flags. This is especially useful for negative numbers or special values like -Infinity.

NUMBER=
number
dcbor match $NUMBER -- -1
-1

Text Strings

As we demonstrated above, the text pattern matches any text string:

TEXT=
text
dcbor match $TEXT '"hello"'
"hello"
dcbor match $TEXT '"🌎"'
"🌎"

Notice that when providing text strings as input to the CLI, you need to include the double-quotes as part of the dCBOR diagnostic notation. This is the same quoting consideration we discussed in the basic dcbor CLI chapter.

Byte Strings

The bstr pattern matches any byte string. Byte strings in CBOR are sequences of raw bytes, distinct from text strings which have UTF-8 character encoding semantics:

BSTR=
bstr
dcbor match $BSTR "h'68656c6c6f'"
h'68656c6c6f'

The empty byte string is perfectly legal:

dcbor match $BSTR "h''"
h''

Booleans and Null

The bool pattern matches both boolean values:

BOOL=
bool
dcbor match $BOOL true
true
dcbor match $BOOL false
false

Note

Don't read the response false here as meaning that the pattern didn't match; it means that the input value was false, which is a valid match for the bool pattern.

The null pattern matches CBOR's null value:

NULL=
null
dcbor match $NULL null
null

The Universal Pattern

The * ("any") pattern matches any CBOR value whatsoever.

ANY=
*
dcbor match $ANY 42
42
dcbor match $ANY '"hello"'
"hello"
dcbor match $ANY "h'1234'"
h'1234'

* is useful when you want to match any value in a particular position within a larger structure.

Specific Value Matching

Beyond matching types, you can match exact values by providing the specific value as your pattern.

Specific Numbers

FORTY_TWO=
42
dcbor match $FORTY_TWO 42
42

This won't match because 43 ≠ 42:

dcbor match $FORTY_TWO 43
Error: No match

Specific Text Strings

HELLO=
"hello"
dcbor match $HELLO '"hello"'
"hello"

This won't match because the strings are different:

dcbor match $HELLO '"world"'
Error: No match

Specific Byte Strings

TWO_BYTES=
h'1234'
dcbor match $TWO_BYTES "h'1234'"
h'1234'

Specific Boolean Values

BOOL_TRUE=
true
dcbor match $BOOL_TRUE true
true

This won't match because false ≠ true:

dcbor match $BOOL_TRUE false
Error: No match

Advanced Value Patterns

Beyond basic type and exact value matching, dCBOR patterns support sophisticated matching criteria including ranges for numbers and regular expressions for text and byte strings.

Number Ranges

Numbers can be matched using ranges and inequality operators, which is useful for validating data within acceptable bounds.

Range Matching

You can match numbers within a specific range using the ... syntax:

ONE_TO_TEN=
1...10
dcbor match $ONE_TO_TEN 5
5
dcbor match $ONE_TO_TEN 15
Error: No match

Note

The ... syntax is shorthand for an inclusive (closed) range, meaning both the start and end values are part of the range.

The same range of numbers can also be specified with a more complex syntax using the & operator, which we'll cover later.

ONE_TO_TEN=
>=1 & <=10
dcbor match $ONE_TO_TEN 5
5

Inequality Operators

Numbers support various inequality operators. Quoting is important here, because otherwise the shell would interpret > and < as redirection operators:

Greater than:

dcbor match ">5" 10
10

Greater than or equal to:

dcbor match ">=5" 5
5

Less than:

dcbor match "<10" 8
8

Less than or equal to:

dcbor match "<=10" 10
10

Half-Open Ranges

Using the & operator allows you to construct patterns that match half-open ranges (where one end is inclusive and the other is exclusive):

dcbor match ">1 & <=10" 10
10
dcbor match ">1 & <=10" 1
Error: No match

Special Number Values

You can also match three special floating-point values: NaN ("not a number"), Infinity, and -Infinity.

dcbor match "NaN" NaN
NaN
dcbor match "Infinity" Infinity
Infinity
dcbor match -- "-Infinity" -Infinity
-Infinity

Note

Note the use of -- to signal the end of command-line options, allowing you to pass values that might otherwise be interpreted as flags.

Text Regular Expressions

Regular expressions (or regexes) are powerful pattern matching tools for text, allowing you to search for specific patterns rather than exact text. They use special characters and syntax to define search patterns. For instance, \d+ matches one or more digits, [A-Z]+ matches one or more uppercase letters, and ^ and $ anchor patterns to the beginning and end of a string respectively. With regular expressions, you can validate formats, extract information, and perform sophisticated text processing operations.

The dCBOR patexes this chapter describes share some concepts with regexes, but the two are not the same. The patex syntax is designed specifically for matching CBOR data structures and values, while regular expressions are designed for processing text. Nonetheless, some of the types you can match with dCBOR patterns, such as text strings and byte strings, can be matched using regular expressions.

Text strings can be matched using regular expressions by enclosing a regex in forward slashes: /regex/.

Match strings starting with "temp"

STARTS_WITH_TEMP=
/^temp/
dcbor match $STARTS_WITH_TEMP '"temporary"'
"temporary"

This won't match because it doesn't start with "temp":

dcbor match $STARTS_WITH_TEMP '"permanent"'
Error: No match

Match any email-like pattern

EMAIL_ADDRESS=
/^[^@]+@[^@]+\.[^@]+$/
dcbor match $EMAIL_ADDRESS '"user@example.com"'
"user@example.com"
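
To experiment with these two regexes outside of dcbor, a minimal Python sketch follows. It assumes Python's built-in re module as a stand-in: dcbor uses the Rust regex crate, whose syntax differs from Python's in places, but these two patterns are simple enough to behave the same way in both. The surrounding slashes belong to the patex syntax and are not part of the regex itself. The helper name is_match is made up for illustration.

```python
import re

STARTS_WITH_TEMP = re.compile(r"^temp")
EMAIL_ADDRESS = re.compile(r"^[^@]+@[^@]+\.[^@]+$")

def is_match(regex: re.Pattern, text: str) -> bool:
    """True if the regex matches somewhere in the text."""
    return regex.search(text) is not None

print(is_match(STARTS_WITH_TEMP, "temporary"))      # True
print(is_match(STARTS_WITH_TEMP, "permanent"))      # False
print(is_match(EMAIL_ADDRESS, "user@example.com"))  # True
print(is_match(EMAIL_ADDRESS, "not-an-email"))      # False
```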

About Regular Expressions

Regular expressions use the syntax of the Rust regex crate, which resembles Perl-style regular expressions but does not support look-around or backreferences. This allows for complex pattern matching including:

  • Literal characters: /abc/, /123/
  • Any character: /./
  • Character classes: /[a-z]/, /[0-9]/, /\d/ (digit), /\w/ (word character)
  • Quantifiers: /<pattern>*/ (zero or more), /<pattern>+/ (one or more), /<pattern>?/ (zero or one), /<pattern>{n,m}/ (between n and m times)
  • Anchors: /^<pattern>/ (start), /<pattern>$/ (end)
  • Groups: /(<pattern>)/
  • Alternation: /<pattern1>|<pattern2>/

Explaining the full syntax of regular expressions is beyond the scope of this book, but you can find more information on the specific Rust implementation in the Rust regex documentation.

Byte String Regular Expressions

Byte strings also support regular expression matching, which is useful for matching binary patterns or encoded data. Binary regexes operate on raw byte content, not on the hex string representation you see in diagnostic notation. The syntax resembles the h'<hex>' form above, but wraps a regex instead: h'/<regex>/'.

Flags for Binary Regexes

Binary regexes must start with the (?s-u) flags to work correctly:

  • (?s) enables "dot matches newline" mode, allowing . to match across newlines (like byte 0x0a)
  • (?-u) disables Unicode mode, allowing . to match any byte value instead of just valid UTF-8 sequences
  • Use \x notation for specific byte values (e.g., \xFF for byte 255)

Without these flags, patterns may fail on byte strings containing newlines or invalid UTF-8 sequences.
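
The effect of the dot-matches-newline flag can be reproduced in a small Python sketch. This assumes Python's re module with bytes patterns as a stand-in for the Rust engine that dcbor uses. Python has no (?-u) flag, because a bytes pattern already matches byte by byte, so only (?s) appears here. The example shows . refusing to match byte 0x0a until (?s) is added.

```python
import re

data = bytes.fromhex("010a0304")  # second byte is 0x0a, a newline

# Without (?s), "." does not match the newline byte, so ".{4}" cannot cover all four bytes.
print(re.fullmatch(rb".{4}", data) is not None)      # False

# With (?s), "." matches every byte value, including 0x0a.
print(re.fullmatch(rb"(?s).{4}", data) is not None)  # True

# \xFF names a single byte by value, not the two characters "F" and "F".
print(re.search(rb"(?s).*\xFF.*", bytes.fromhex("ff01020304")) is not None)  # True
```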

Match byte strings containing the byte 0xFF anywhere

CONTAINS_FF=
h'/(?s-u).*\xFF.*/'
dcbor match $CONTAINS_FF "h'ff01020304'"
h'ff01020304'

Match byte strings starting with specific bytes 0102

STARTS_WITH_0102=
h'/(?s-u)^\x01\x02/'
dcbor match $STARTS_WITH_0102 "h'01020304'"
h'01020304'

Match byte strings ending with specific bytes

ENDS_WITH_0304=
h'/(?s-u)\x03\x04$/'
dcbor match $ENDS_WITH_0304 "h'01020304'"
h'01020304'

Match any 4-byte sequence

ANY_FOUR_BYTES=
h'/(?s-u)^.{4}$/'
dcbor match $ANY_FOUR_BYTES "h'12345678'"
h'12345678'

Practical Examples

These advanced patterns are particularly useful for data validation and extraction:

Validate that ages are reasonable (0-120)

dcbor match "0...120" 25
25

Extract valid email addresses from text

EMAIL_ADDRESS=
/^\w+@\w+\.\w+$/
dcbor match $EMAIL_ADDRESS '"john@example.com"'
"john@example.com"

Find numeric IDs above a threshold

dcbor match ">1000" 1001
1001

Match ISO-8601 date-like strings

ISO_DATE=
/^\d{4}-\d{2}-\d{2}$/
dcbor match $ISO_DATE '"2023-12-25"'
"2023-12-25"
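
For comparison, here is how the same four checks might look in ordinary code. This Python sketch only models the semantics described above (an inclusive range, a strict inequality, and two regexes). The function names are made up for illustration, and Python's re module stands in for the Rust regex engine.

```python
import re

# Python's re module stands in for the Rust regex engine here.
EMAIL = re.compile(r"^\w+@\w+\.\w+$")
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def is_number(value) -> bool:
    # Python treats bool as a kind of int; CBOR does not.
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def in_age_range(value) -> bool:
    """Models 0...120: both endpoints are included."""
    return is_number(value) and 0 <= value <= 120

def above_threshold(value) -> bool:
    """Models >1000: the bound itself is excluded."""
    return is_number(value) and value > 1000

def looks_like_email(value) -> bool:
    return isinstance(value, str) and EMAIL.search(value) is not None

def looks_like_iso_date(value) -> bool:
    return isinstance(value, str) and ISO_DATE.search(value) is not None

print(in_age_range(25), in_age_range(120), in_age_range(121))  # True True False
print(above_threshold(1001), above_threshold(1000))            # True False
print(looks_like_email("john@example.com"))                    # True
print(looks_like_iso_date("2023-12-25"))                       # True
print(looks_like_iso_date("2023-99-99"))                       # True: only the shape is checked
```

The last line is a reminder that the date regex is "date-like" only: it checks digit counts and hyphens, not whether the month and day are valid.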

These advanced value patterns form the building blocks for more complex structure matching, which we'll explore in the next section.

Understanding Match Output

When a pattern matches, the default output shows the matched value. This seems simple now, but it becomes more meaningful when we start working with complex structures where patterns might match multiple values or nested elements.

dcbor match number 42
42

The output 42 tells us that the pattern number matched the input value 42. When we move to structure patterns, you'll see how this output format shows the path through complex data structures.

Pattern Validation and Error Messages

When a pattern doesn't match, the CLI returns an error:

dcbor match text 42
Error: No match

This happens because the input 42 is a number, but the pattern text expects a string. Understanding these error messages helps you debug your patterns and understand why they might not be working as expected.

Finally, here are a couple of examples of patterns that fail to parse:

dcbor match tex '"Hello"'
Error: Failed to parse pattern at position 0..1: unrecognized token 't'
Pattern: tex
│          ^
dcbor match '"Hello' '"Hello"'
Error: Failed to parse pattern: Unterminated string literal at 0..1

Structure Patterns

Beyond matching individual values, dCBOR patterns support matching complex structures like arrays, maps, and tagged values. These patterns allow you to validate data schemas and extract elements from nested structures.

Array Patterns

Basic Array Matching

The array pattern matches any array structure:

ANY_ARRAY=
array
dcbor match $ANY_ARRAY '[1, 2, 3]'
│ [1, 2, 3]
dcbor match $ANY_ARRAY '["hello", "world"]'
│ ["hello", "world"]
dcbor match $ANY_ARRAY '[]'
│ []

Note

If you want to match the empty array specifically, then the pattern is just the empty array: [].

Array Sequence Patterns

The array pattern can contain a comma-separated list of patterns, where each pattern matches zero or more elements in the array in sequence.

[ <patex>, <patex>, ... ]

Match an array with a number followed by text

NUMBER_THEN_TEXT=
[number, text]
dcbor match $NUMBER_THEN_TEXT '[42, "hello"]'
│ [42, "hello"]

Note

[number, text] means the first element must be a number, followed by a text string, and that's it: these must be the only elements and they must appear in that order, so adding another element would not match:

dcbor match $NUMBER_THEN_TEXT '[42, "hello", 0]'
Error: No match

In this case the first element must be the exact number 42, but the second element can be any text string:

FORTY_TWO_THEN_TEXT=
[42, text]
dcbor match $FORTY_TWO_THEN_TEXT '[42, "hello"]'
│ [42, "hello"]

This won't match because the elements are in the wrong order:

dcbor match $FORTY_TWO_THEN_TEXT '["hello", 42]'
Error: No match

Match array starting with number, then text, then anything else

NUMBER_THEN_TEXT_THEN_ANY=
[number, text, *]
dcbor match $NUMBER_THEN_TEXT_THEN_ANY '[42, "hello", true]'
│ [42, "hello", true]

Note

In the example above, the * pattern by itself matches exactly one element. If you want to match zero or more of any elements from this point on, you can use the repeating pattern (*)*:

NUMBER_THEN_TEXT_THEN_REST=
[number, text, (*)*]
dcbor match $NUMBER_THEN_TEXT_THEN_REST '[42, "hello"]'
dcbor match $NUMBER_THEN_TEXT_THEN_REST '[42, "hello", true]'
dcbor match $NUMBER_THEN_TEXT_THEN_REST '[42, "hello", true, false]'
│ [42, "hello"]
│ [42, "hello", true]
│ [42, "hello", true, false]

We'll cover repeating patterns more thoroughly later.
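
The difference between a fixed sequence, a trailing *, and a trailing (*)* can be modeled in a few lines of Python. This is only a sketch of the behavior shown above, not how dcbor-pattern is implemented. The predicate names and the REST marker are invented for illustration, and the sketch handles a repeating pattern only in the final position.

```python
def number(value) -> bool:
    # Python treats bool as a kind of int; CBOR does not.
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def text(value) -> bool:
    return isinstance(value, str)

def any_value(value) -> bool:
    return True

REST = object()  # hypothetical marker standing in for a trailing (*)*

def matches_sequence(patterns, array) -> bool:
    """Match elements against patterns in order; extra elements are allowed
    only when the pattern list ends with REST."""
    if patterns and patterns[-1] is REST:
        fixed = patterns[:-1]
        if len(array) < len(fixed):
            return False
        array = array[:len(fixed)]
    else:
        fixed = patterns
        if len(array) != len(fixed):
            return False
    return all(pattern(value) for pattern, value in zip(fixed, array))

print(matches_sequence([number, text], [42, "hello"]))                     # True
print(matches_sequence([number, text], [42, "hello", 0]))                  # False
print(matches_sequence([number, text, any_value], [42, "hello"]))          # False
print(matches_sequence([number, text, REST], [42, "hello"]))               # True
print(matches_sequence([number, text, REST], [42, "hello", True, False]))  # True
```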

Map Patterns

Basic Map Matching

The map pattern matches any map structure:

ANY_MAP=
map
dcbor match $ANY_MAP '{1: 2, 3: 4}'
│ {1: 2, 3: 4}
dcbor match $ANY_MAP '{"hello": "world"}'
│ {"hello": "world"}
dcbor match $ANY_MAP '{}'
│ {}

Note

If you want to match the empty map specifically, then the pattern is just the empty map: {}.

Key-Value Constraints

Maps can be matched by specifying key-value constraints using <key>: <value> notation. For each constraint, the target map must have at least one key-value pair that satisfies the constraint.

Match map with a specific key, and a text value

HAS_KEY_NAME=
{"name": text}
dcbor match $HAS_KEY_NAME '{"name": "Alice", "age": 30}'
│ {"age": 30, "name": "Alice"}

Notice that it is not necessary to match every key-value pair in the map; you can match just the ones you care about. The output will show the entire map.
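
The "at least one pair per constraint" rule can be written down as a short Python sketch. It is a model of the behavior described above rather than dcbor-pattern's implementation, and the helper names (exactly, matches_map) are invented for illustration.

```python
def text(value) -> bool:
    return isinstance(value, str)

def exactly(expected):
    """Build a predicate that accepts only one specific value."""
    return lambda value: value == expected

def matches_map(constraints, mapping) -> bool:
    """Every (key, value) constraint must be met by at least one entry;
    entries that no constraint mentions are ignored."""
    return all(
        any(key_ok(k) and value_ok(v) for k, v in mapping.items())
        for key_ok, value_ok in constraints
    )

has_key_name = [(exactly("name"), text)]  # models {"name": text}

print(matches_map(has_key_name, {"name": "Alice", "age": 30}))  # True
print(matches_map(has_key_name, {"name": 7}))                   # False
print(matches_map(has_key_name, {"age": 30}))                   # False
```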

Match map with number-valued key

HAS_KEY_1=
{1: text}
dcbor match $HAS_KEY_1 '{1: "first", 2: "second"}'
│ {1: "first", 2: "second"}

If you want to match a map that only contains a specific key-value pair, you can specify the exact number of entries using the & operator and a map pattern containing a quantifier:

Match map with exactly one key-value pair, where key is 1 and value is any text

HAS_SINGLE_ENTRY_WITH_KEY_1=
{ {1} } & {1: text}

This matches because the map has exactly one entry, as the patex requires:

dcbor match $HAS_SINGLE_ENTRY_WITH_KEY_1 '{1: "first"}'
│ {1: "first"}

There are two entries, so no match:

dcbor match $HAS_SINGLE_ENTRY_WITH_KEY_1 '{1: "first", 2: "second"}'
Error: No match

Match map with multiple required entries

HAS_ID_AND_NAME=
{"id": number, "name": text}

Both key-value pairs must exist, but other entries are allowed:

dcbor match $HAS_ID_AND_NAME '{"id": 1, "name": "Alice", "age": 30}'
│ {"id": 1, "age": 30, "name": "Alice"}
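
The keys in the output above appear in the order id, age, name, which is not the order they were typed in. dCBOR serializes map entries sorted by the bytes of their encoded keys, and a text string's encoding begins with a byte that carries its length, so shorter keys sort first. A minimal Python sketch of that ordering, assuming text keys shorter than 24 bytes (the helper name is made up for illustration):

```python
def encode_short_text(key: str) -> bytes:
    """CBOR-encode a text string under 24 bytes: a 0x60+length head byte, then UTF-8."""
    utf8 = key.encode("utf-8")
    assert len(utf8) < 24, "this sketch only handles short strings"
    return bytes([0x60 + len(utf8)]) + utf8

keys = ["id", "name", "age"]
print(sorted(keys, key=encode_short_text))  # ['id', 'age', 'name']
```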

Tagged Value Patterns

CBOR tagged values apply semantic meaning to data. Patterns can match both the tag and the content.

Tag Number Matching

Match any value with tag 1234 containing a number

NUMBER_TAGGED_1234=
tagged(1234, number)
dcbor match $NUMBER_TAGGED_1234 "1234(42)"
1234(42)

Match tag 12345 with any content

ANY_TAGGED_12345=
tagged(12345, *)
dcbor match $ANY_TAGGED_12345 '12345("tagged string")'
12345("tagged string")

Content Pattern Matching

Tagged patterns specify both the tag value and required content patterns:

Match tag 2 (bignum) with byte string content

BIGNUM=
tagged(2, bstr)
dcbor match $BIGNUM "2(h'0102')"
2(h'0102')

Match tag with array content having specific structure

NUMBER_TEXT_ARRAY_TAGGED_42=
tagged(42, [number, text])
dcbor match $NUMBER_TEXT_ARRAY_TAGGED_42 '42([1, "data"])'
42([1, "data"])

Introducing Paths

Single Path, Single Element Output

When a pattern matches, the default output shows the matching value. For structures, this represents the entire matching structure:

dcbor match 'array' '[1, 2, 3]'
│ [1, 2, 3]
dcbor match '{"key": *}' '{"key": "value", "other": 42}'
│ {"key": "value", "other": 42}

The examples above each produce one match, with one way to get there. But dCBOR items are actually trees, with arrays and maps as the possible branches. Paths become more meaningful when working with search patterns or captures that can match multiple items or nested elements. For example, later we'll discuss the search pattern, which visits all the elements in a dCBOR item. As a quick preview, if you match a pattern that finds all numbers in an array, the output shows each number along with its context, that is, its path from the root of the structure:

dcbor match 'search(number)' '[1, [2, 3]]'

The output shows three paths from the root item to numbers within it:

│ [1, [2, 3]]
1
│ [1, [2, 3]]
│     [2, 3]
2
│ [1, [2, 3]]
│     [2, 3]
3
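
The idea of a path can be modeled with a short recursive traversal. This Python sketch is not how dcbor implements search: it walks nested lists only (maps and tagged values are left out), and the function names are made up for illustration. It yields the chain of items from the root down to each number it finds.

```python
def is_number(value) -> bool:
    # Python treats bool as a kind of int; CBOR does not.
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def search_numbers(item, path=()):
    """Yield one path (root first, match last) for every number inside item."""
    here = path + (item,)
    if is_number(item):
        yield list(here)
    elif isinstance(item, list):
        for element in item:
            yield from search_numbers(element, here)

for found in search_numbers([1, [2, 3]]):
    print(found)
# [[1, [2, 3]], 1]
# [[1, [2, 3]], [2, 3], 2]
# [[1, [2, 3]], [2, 3], 3]

# Keeping only the last element of each path gives the bare matches.
print([path[-1] for path in search_numbers([1, [2, 3]])])  # [1, 2, 3]
```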

You can choose to output the last item of each path using the --last-only option, which will only show the final matched items:

dcbor match --last-only "search(number)" '[1, [2, 3]]'
1
2
3

Output Options Overview

The dcbor match command provides several options for controlling output format:

  • --captures: Show named capture information (covered in the advanced chapter)
  • --last-only: Show only the final matched items
  • --in FORMAT / --out FORMAT: Control input/output formats (hex, diag, etc.)

Work in Progress

The next chapter will cover advanced matching techniques.

The appendices include a dCBOR Patex Reference.