TLV Encoding in the Invoice QR Code: The Byte Structure Explained

Every simplified invoice in Saudi Arabia carries a Quick Response (QR) code printed in its corner. Behind that code sits a precise data structure known as TLV encoding, the format mandated by the Zakat, Tax and Customs Authority (ZATCA) to wrap the invoice fields before turning them into text that any phone app can read. This technical guide explains the byte structure of TLV: how each field is built from a Tag, a Length, and a Value, how these fields are concatenated, and then encoded in Base64 to be injected inside the QR code. The goal is for the developer or technical accountant to come away with a full understanding of how the content behind every code on the invoice is built.

We focus here on the TLV layer alone: the byte structure, how the length is calculated, and a step-by-step practical example. The details of Base64 conversion and the full list of QR fields for each phase have their own dedicated references, which we point to where relevant rather than repeating them here.

What is TLV encoding?

The abbreviation TLV stands for Tag-Length-Value. It is a binary encoding format used for decades in payment-card and telecommunications standards, and the Authority chose it to wrap the invoice fields inside the QR code. The idea is simple: instead of writing the fields as free text that is hard to parse, each field is wrapped in three consecutive parts:

  • Tag: a number that identifies the field type (seller name, VAT registration number, and so on).
  • Length: the number of bytes occupied by the value that immediately follows the length.
  • Value: the actual content of the field, encoded in UTF-8.

This triplet repeats for every field, so the fields concatenate into a single continuous byte string. The reader parses the string by reading a tag, then a length, then jumping forward by the length to capture the value, then moving on to the next tag. It needs no separators and no end markers, because the length itself tells it where each value ends.

This is what makes TLV suitable for a QR code: a compact, unambiguous format that can be parsed programmatically on any device without needing an internet connection. To understand the QR code itself and its full role on the invoice, see the guide The Quick Response (QR) code on the electronic invoice.

[Figure: The TLV triplet. Each field is built from three consecutive parts in bytes: (1) Tag, the field number; (2) Length, the number of value bytes; (3) Value, the data. The length defines the width of the value, so it is known where the field ends and the next one begins.]

The byte structure in TLV: anatomy of a single field

Every field in the TLV string the Authority adopted for Phase One is built from a fixed, clear byte structure. Let us break down each part separately.

Tag: one byte

The tag occupies just one byte, and its value is an integer from 1 to 5 for the core Phase One fields. This number identifies the field, so the reader learns the type of data it is receiving from the tag itself rather than from the position of the field in the string. The five core fields and their tag numbers:

  • Tag 1: Seller name.
  • Tag 2: VAT registration number.
  • Tag 3: Invoice timestamp in ISO 8601 format.
  • Tag 4: Invoice total with VAT.
  • Tag 5: VAT total.

Since the tag is one byte, its value ranges theoretically between 0 and 255, a range that is more than enough for the number of fields required. Phase Two adds higher tags (6, 7, and beyond) for the cryptographic stamp and public key, but the one-byte-per-tag principle stays fixed.

Length: one byte

The length immediately follows the tag, and it too occupies one byte. Its value is an integer equal to exactly the number of bytes the following value occupies. For example, if the seller name value occupies 12 bytes after UTF-8 encoding, then the length byte carries the value 12 (or 0x0C in hexadecimal).

The subtle point here is that the length is measured in bytes, not characters. A single Arabic character in UTF-8 usually occupies two bytes, and an emoji may occupy four. So an Arabic seller name of six characters may equal 12 bytes in the length field, not 6. Confusing the number of characters with the number of bytes is the most common mistake made by those building the encoding manually for the first time.
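
The gap between character count and byte count is easy to see in a minimal Python sketch. The sample strings are illustrative only, and `utf8_length` is a hypothetical helper name, not part of any official library:

```python
def utf8_length(text: str) -> int:
    """Return the number of bytes the text occupies once encoded in UTF-8."""
    return len(text.encode("utf-8"))

# A Latin word, a four-letter Arabic word, and an emoji.
samples = ["Qoyod", "قيود", "😀"]
for text in samples:
    # len(text) counts characters; utf8_length counts the bytes TLV needs.
    print(f"{text!r}: {len(text)} characters, {utf8_length(text)} bytes")
```

The value that belongs in the length byte is always the second number, never the first.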

And since the length is one byte, the maximum length of a single value is 255 bytes. This is enough for every simplified-invoice field, because even the longest of them (the seller name and the timestamp) stay well below this limit.

Value: variable length

The value is the actual content of the field, and its length varies with the data. It is always encoded in UTF-8, the encoding that supports Arabic, English, and numbers in a unified representation. The number of bytes the value occupies is exactly the number stored in the length byte that preceded it.

The values in Phase One fields are entirely textual: the seller name is text, the VAT registration number is a 15-digit numeric text, the timestamp is a date-and-time text, and the amounts are numeric texts. There is no conversion of numbers into a compact binary format at this phase; every value is written as a human reads it and then encoded in UTF-8.
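
The anatomy described in this section (one tag byte, one length byte, then the UTF-8 value) can be sketched as a single function. This is a minimal illustration under those stated rules; `encode_field` is a hypothetical name, not an API of any certified system:

```python
def encode_field(tag: int, value: str) -> bytes:
    """Wrap one text value as tag byte + length byte + UTF-8 value bytes."""
    payload = value.encode("utf-8")
    if not 0 <= tag <= 255:
        raise ValueError("tag must fit in one byte")
    if len(payload) > 255:
        raise ValueError("value exceeds the one-byte length limit (255 bytes)")
    # The length is taken from the encoded bytes, not from the character count.
    return bytes([tag, len(payload)]) + payload

field = encode_field(1, "Qoyod")
print(field.hex(" ").upper())  # 01 05 51 6F 79 6F 64
```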

The five TLV fields in Phase One
The core tags the QR code carries in Phase One.

Tag  Field          Example
1    Seller name    Qoyod
2    VAT number     300000000000003
3    Date and time  2026-06-23T12:00:00
4    Total          115.00
5    Tax            15.00

Each field is encoded as a Tag-Length-Value triplet, then the fields are concatenated.

How the fields concatenate into a single string

After building each field as a “Tag, Length, Value” triplet, the fields are packed one after another into a continuous byte string with no separator between them. The order adopted in Phase One is Tag 1, then 2, then 3, then 4, then 5. The resulting string looks like this conceptually:

[Tag1][Len1][Value1][Tag2][Len2][Value2][Tag3][Len3][Value3][Tag4][Len4][Value4][Tag5][Len5][Value5]

The reader starts from the first byte and reads it as a tag, reads the next byte as a length, then reads exactly that many bytes as the value. It then reads the following byte as a new tag, and so on until the end of the string. No separator markers are needed because each length precisely determines where its value ends and where the next tag begins.
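
As a rough sketch of this packing step, the function below appends each triplet to one growing byte string with nothing in between. The name `build_tlv` and the two sample fields are assumptions made for illustration:

```python
def build_tlv(fields: list[tuple[int, str]]) -> bytes:
    """Concatenate tag-length-value triplets with no separators."""
    out = bytearray()
    for tag, value in fields:
        payload = value.encode("utf-8")
        if len(payload) > 255:
            raise ValueError(f"tag {tag}: value longer than 255 bytes")
        out += bytes([tag, len(payload)]) + payload
    return bytes(out)

tlv = build_tlv([(1, "Qoyod"), (2, "300000000000003")])
# Tag 2 starts right after the last byte of the tag 1 value.
print(tlv.hex(" ").upper())
```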

This binary string is the raw “payload.” It is not yet ready to be printed inside a QR code, because it is binary bytes that may contain values that cannot be displayed as text. This is where the next step comes in: Base64 encoding.

From TLV to Base64: the last step before the QR code

The binary TLV string is passed through Base64 encoding, which converts the raw bytes into printable ASCII text (letters, digits, and the symbols +, /, and =) that is safe to print and transmit. The output is a single Base64 string, which is what actually goes inside the QR code, making it scannable by any phone app that supports QR.

In short, the full path of the code content passes through three stations:

  1. The invoice fields are built as TLV triplets and packed into a single byte string.
  2. The string is encoded in Base64 to become text that is safe to print.
  3. The Base64 text is injected inside the QR code and printed on the invoice.

The details of how Base64 itself works (how it converts every three bytes into four characters, and the role of padding) have a dedicated guide within this technical series. What matters here is that TLV is the wrapping layer, and Base64 is the text-conversion layer that immediately follows it.
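
The hand-off from the TLV layer to the Base64 layer can be sketched with Python's standard `base64` module. The single-field input and the helper name `to_qr_text` are illustrative assumptions:

```python
import base64

def to_qr_text(tlv: bytes) -> str:
    """Encode the raw TLV bytes as standard Base64 text for the QR code."""
    return base64.b64encode(tlv).decode("ascii")

# Raw TLV bytes for a single field: tag 1, length 5, value "Qoyod".
tlv = bytes([0x01, 0x05]) + "Qoyod".encode("utf-8")
qr_text = to_qr_text(tlv)
print(qr_text)

# Decoding must give back exactly the same bytes.
assert base64.b64decode(qr_text) == tlv
```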

A practical encoding example, step by step

Let us build a TLV string for a hypothetical simplified invoice. We will take the five core Phase One fields and encode each one manually to see how the bytes are formed. We will use a short English seller name in the example to simplify the byte calculation, then comment on the Arabic case.

Hypothetical invoice data:

  • Seller name: Qoyod
  • VAT registration number: 301234567800003
  • Timestamp: 2026-06-24T10:30:00Z
  • Total with VAT: 115.00
  • Tax amount: 15.00

Field one: seller name

Tag 1, and the value Qoyod occupies 5 bytes in UTF-8 (each Latin character is one byte). So the length is 5. Field structure:

Tag    = 01            (this field is the seller name)
Length = 05            (the value occupies 5 bytes)
Value  = 51 6F 79 6F 64  (hex encoding of Q o y o d)

Resulting bytes: 01 05 51 6F 79 6F 64

Field two: VAT registration number

Tag 2, and the value 301234567800003 is a 15-digit number, each digit one byte in UTF-8, so the length is 15 (or 0F in hex):

Tag    = 02
Length = 0F            (15 bytes)
Value  = UTF-8 encoding of the text "301234567800003"

Resulting bytes: 02 0F 33 30 31 32 33 34 35 36 37 38 30 30 30 30 33

Field three: timestamp

Tag 3, and the value 2026-06-24T10:30:00Z is an ISO 8601 text occupying 20 bytes, so the length is 20 (or 14 in hex):

Tag    = 03
Length = 14            (20 bytes)
Value  = UTF-8 encoding of the text "2026-06-24T10:30:00Z"

Resulting bytes: 03 14 32 30 32 36 2D 30 36 2D 32 34 54 31 30 3A 33 30 3A 30 30 5A

Fields four and five: total and tax

Tag 4 for the total 115.00 (6 bytes, length 6), and Tag 5 for the tax amount 15.00 (5 bytes, length 5):

Tag    = 04
Length = 06
Value  = UTF-8 encoding of the text "115.00"
Bytes: 04 06 31 31 35 2E 30 30

Tag    = 05
Length = 05
Value  = UTF-8 encoding of the text "15.00"
Bytes: 05 05 31 35 2E 30 30

Assembling the final string

We pack the resulting bytes from the five fields in order into a single string. Then we pass this string through Base64 to get the final text that goes into the QR code:

TLV string (conceptually):
01 05 [Qoyod] 02 0F [VAT] 03 14 [Timestamp] 04 06 [Total] 05 05 [Tax]

       │
       ▼  Base64 encoding

Base64 text ready for injection into the QR code
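
The assembly above can be reproduced in a few lines of Python. This is a sketch of the worked example only, using the hypothetical invoice data from this section; `build_tlv` is an illustrative name:

```python
import base64

# Hypothetical invoice data from the worked example: (tag, text value).
FIELDS = [
    (1, "Qoyod"),
    (2, "301234567800003"),
    (3, "2026-06-24T10:30:00Z"),
    (4, "115.00"),
    (5, "15.00"),
]

def build_tlv(fields):
    """Pack each field as tag byte + length byte + UTF-8 value, in order."""
    out = bytearray()
    for tag, value in fields:
        payload = value.encode("utf-8")
        out += bytes([tag, len(payload)]) + payload
    return bytes(out)

tlv = build_tlv(FIELDS)
qr_text = base64.b64encode(tlv).decode("ascii")

print(tlv.hex(" ").upper())
print(len(tlv), "TLV bytes")              # 61
print(len(qr_text), "Base64 characters")  # 84
print(qr_text)
```

The 61 bytes are the 51 value bytes plus two wrapping bytes for each of the five fields.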

Note that an Arabic seller name such as قيود (Qoyod written in Arabic) behaves differently in the length calculation. A single Arabic character occupies two bytes in UTF-8, so the four-character word قيود occupies 8 bytes, not 4, and the length byte becomes 8 (or 08 in hex). This exact point breaks any application that calculates the length by character count instead of bytes.

Calculating bytes precisely: hexadecimal and UTF-8

Building a correct TLV encoding requires mastering byte calculation, and here two concepts meet: hexadecimal representation and UTF-8 encoding. The values in our examples are written in hexadecimal, where each byte is represented by two digits from 00 to FF. The number 0x0C equals 12 in decimal, 0x0F equals 15, and 0x14 equals 20. This representation is standard in encoding protocols because it maps precisely to byte boundaries.

As for calculating the value length, it depends entirely on UTF-8. The practical rule: Latin characters, numbers, and basic symbols occupy one byte per character. Arabic characters usually occupy two bytes per character. Some extended symbols may occupy three or four bytes. Therefore the length must be calculated after actually encoding the text, not by counting its characters on the screen.

Take the timestamp as an example. The full ISO 8601 format 2026-06-24T10:30:00Z is made of Latin characters, numbers, and separator symbols, all one byte per character, so their total is 20 bytes. If a different format with a different length were used, the length byte would change accordingly. This confirms that the length is not a fixed value but is calculated for each invoice according to its actual content.

The practical takeaway: do not assume a length; always calculate it from the bytes produced by the encoding. A simple tool that converts text to UTF-8 and then counts its bytes is enough for manual verification. Certified systems handle this calculation automatically for every field.
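
One possible form of such a verification tool is sketched below. It reports the byte count, the length byte in hexadecimal, and a hex dump of the value; `describe` is a hypothetical helper name:

```python
def describe(text: str) -> tuple[int, str, str]:
    """Return (byte count, length byte in hex, hex dump) for a text value."""
    data = text.encode("utf-8")
    return len(data), f"{len(data):02X}", data.hex(" ").upper()

for value in ["Qoyod", "301234567800003", "2026-06-24T10:30:00Z"]:
    count, length_hex, dump = describe(value)
    print(f"{value}: {count} bytes, length byte {length_hex}, value {dump}")
```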

How an application reads and decodes a TLV string

Understanding the decoding method is no less important than building it, because it reveals why this format was chosen over others. The reader (the verification app on the phone or the Authority’s system) processes the byte string in deterministic, unambiguous steps:

  1. It reads the first byte and interprets it as a tag, so it knows the type of the upcoming field.
  2. It reads the next byte and interprets it as a length, so it knows how many bytes the value occupies.
  3. It reads a number of bytes exactly equal to the length, and interprets them as a value after UTF-8 decoding.
  4. It returns to the next byte directly and treats it as a new tag, repeating the cycle until the end of the string.

This determinism is the essence of TLV’s strength. The reader needs no prior knowledge of each field’s length, because every field carries its own length. And it needs no separators or end markers, because the length precisely determines where the value ends. This design also makes the encoding forward-compatible: even if a new field with an unknown tag is added in the future, an old reader can skip it by jumping forward by its length without breaking.

Before decoding TLV itself, the reader first decodes the Base64 encoding to restore the raw byte string, then begins the reading cycle above. Any flaw in the Base64 layer (such as double encoding or missing padding) corrupts the string before it even reaches the TLV logic.
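
The reading cycle can be sketched as a short loop. This is a minimal illustration that assumes every value is UTF-8 text, as in Phase One; `parse_tlv` and the sample string are hypothetical:

```python
import base64

def parse_tlv(qr_text: str) -> list[tuple[int, str]]:
    """Decode Base64, then walk the bytes as tag, length, value triplets."""
    data = base64.b64decode(qr_text)
    fields, pos = [], 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("string ends before a length byte")
        tag, length = data[pos], data[pos + 1]
        value = data[pos + 2 : pos + 2 + length]
        if len(value) != length:
            raise ValueError(f"tag {tag}: value is truncated")
        fields.append((tag, value.decode("utf-8")))
        pos += 2 + length  # jump to the next tag
    return fields

sample = base64.b64encode(b"\x01\x05Qoyod\x05\x0515.00").decode("ascii")
print(parse_tlv(sample))  # [(1, 'Qoyod'), (5, '15.00')]
```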

[Figure: How an application decodes a TLV string. The reading cycle the reader follows to extract the fields: (1) read the tag; (2) read the length; (3) jump by the length and read the value; (4) return to the next tag. The cycle repeats until the end of the string to extract all fields.]

Why did the Authority choose the TLV format specifically?

The Authority could have chosen other formats to wrap the QR code data, such as comma-separated text or JSON format. But TLV outperformed them for practical reasons:

  • Compactness: a QR code has limited space. Every extra byte enlarges the code and makes it harder to scan. TLV adds only two wrapping bytes per field (tag and length), while JSON adds brackets, quotation marks, and long key names.
  • Unambiguity: there is no textual content that could conflict with a separator, because the length defines the boundaries, not a separator. A seller name containing a comma will not confuse the reader as happens in separated formats.
  • Ease of programmatic parsing: the reading cycle is simple and deterministic, working on any processor and any programming language without complex parsing libraries.
  • Extensibility: adding a new field does not break old readers, because they safely skip unknown tags.

These advantages combined make TLV the standard choice for wrapping data in tight spaces, which is what led the Authority to adopt it as a unified standard that every accounting system in Saudi Arabia complies with.
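
The compactness argument can be checked with a rough size comparison. The JSON key names below are invented for this comparison only and do not come from any specification:

```python
import json

fields = {1: "Qoyod", 2: "301234567800003", 3: "2026-06-24T10:30:00Z",
          4: "115.00", 5: "15.00"}

# TLV: two wrapping bytes per field plus the UTF-8 value bytes.
tlv = b"".join(
    bytes([tag, len(v.encode("utf-8"))]) + v.encode("utf-8")
    for tag, v in fields.items()
)

# Hypothetical JSON key names, chosen only for this comparison.
names = {1: "seller", 2: "vat_number", 3: "timestamp", 4: "total", 5: "vat"}
as_json = json.dumps(
    {names[tag]: v for tag, v in fields.items()}, separators=(",", ":")
).encode("utf-8")

print(len(tlv), "bytes as TLV")
print(len(as_json), "bytes as compact JSON")
```

Shorter key names would narrow the gap, but the quotes, colons, commas, and braces remain as overhead that TLV does not carry.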

The difference between TLV and ordinary free text

To illustrate the value of TLV, compare wrapping two fields as comma-separated free text with wrapping them in TLV format:

Comma-separated free text:
Qoyod,301234567800003

The problem: if the seller name contains a comma (Qoyod, LLC)
parsing breaks because the reader does not know where the field ends.

TLV encoding:
01 05 [Qoyod] 02 0F [301234567800003]

The advantage: the length defines the boundaries precisely. Any character inside the value
(comma, space, symbol) does not confuse the reader at all.

The fundamental difference is that TLV separates the data by “length” rather than by a “separator character.” This eliminates an entire class of errors that arise when the value itself contains the character used for separation. And it is a real problem in invoices, where business names contain commas, spaces, and various symbols.
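
The comparison can be reproduced in a minimal Python sketch. The seller name with a comma is the hypothetical one from the example, and `encode_field` is an illustrative helper:

```python
name, vat = "Qoyod, LLC", "301234567800003"

# Comma-separated text: the comma inside the name creates a third "field".
parts = f"{name},{vat}".split(",")
print(parts)  # ['Qoyod', ' LLC', '301234567800003']

# TLV: boundaries come from the length bytes, so the comma is just data.
def encode_field(tag: int, value: str) -> bytes:
    payload = value.encode("utf-8")
    return bytes([tag, len(payload)]) + payload

tlv = encode_field(1, name) + encode_field(2, vat)
length = tlv[1]
print(tlv[2 : 2 + length].decode("utf-8"))  # Qoyod, LLC
```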

Verifying the integrity of TLV encoding

When building or auditing a QR code, the integrity of the TLV layer can be verified with practical steps:

  1. Decode the Base64 encoding to restore the raw byte string.
  2. Read the first byte as a tag, and confirm it is an expected value (1 to 5 for Phase One).
  3. Read the next byte as a length, then confirm the string actually contains that number of bytes after it.
  4. Jump forward by the length, and repeat the reading until exactly the end of the string with no extra or missing bytes.
  5. Confirm that the sum of all field lengths plus the wrapping bytes equals the total length of the string.

If the string ends in the middle of a value, or bytes remain after the last expected field, the encoding is corrupt. The most common cause is an error in calculating the length by characters instead of bytes, or double Base64 encoding. For more context on the full structure of the invoice file from which these fields are derived, see the guide The XML invoice structure in electronic invoicing.
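
The checklist above could be automated along the following lines. This is a sketch under the Phase One assumption that valid tags are 1 to 5; `check_tlv` is a hypothetical name and the function is not an official validator:

```python
import base64
import binascii

def check_tlv(qr_text: str, allowed_tags=range(1, 6)) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        data = base64.b64decode(qr_text, validate=True)
    except (binascii.Error, ValueError) as exc:
        return [f"invalid Base64: {exc}"]
    problems, pos = [], 0
    while pos < len(data):
        if len(data) - pos < 2:
            problems.append(f"offset {pos}: dangling byte with no length")
            break
        tag, length = data[pos], data[pos + 1]
        if tag not in allowed_tags:
            problems.append(f"offset {pos}: unexpected tag {tag}")
        if pos + 2 + length > len(data):
            problems.append(f"tag {tag}: needs {length} bytes, string ends early")
            break
        pos += 2 + length
    return problems

good = base64.b64encode(b"\x01\x05Qoyod").decode("ascii")
bad = base64.b64encode(b"\x01\x09Qoyod").decode("ascii")  # length too large
print(check_tlv(good))  # []
print(check_tlv(bad))
```

Because the loop only finishes cleanly when the position lands exactly on the end of the data, it also covers the final check that lengths plus wrapping bytes add up to the total.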

Phase Two: additional fields with the same logic

In Phase Two of electronic invoicing (the integration phase), the Authority adds new fields to the TLV string with exactly the same structure: a one-byte tag, a one-byte length, then a value. The most notable of these additional fields:

  • Tag 6: Invoice hash based on the SHA-256 algorithm.
  • Tag 7: Cryptographic stamp / digital signature.
  • Higher tags: the public key and the Authority’s stamp on the public key in establishment invoices.

The fundamental difference is that the values of these fields are cryptographic output (a hash, a signature, a key) rather than human-readable text, but they remain wrapped with the same logic: tag, length, value. The SHA-256 mechanism that produces the hash field has its own detailed treatment in the guide The SHA-256 hashing algorithm in the electronic invoice. The full structure of the invoice file from which these values are derived is explained in the guide The XML invoice structure in electronic invoicing.

The full list of QR code fields for each phase, and which is mandatory and when, has its own dedicated reference within this series. What concerns us here is that the TLV layer does not change its structure between the two phases; only the number of fields it wraps increases.
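
To show that the wrapping logic is unchanged for non-text values, the sketch below wraps a SHA-256 digest under tag 6. It assumes, purely for illustration, that the value is the raw 32-byte digest of a placeholder input; the exact representation and the data to be hashed must be taken from the Authority's specification. `encode_binary_field` is a hypothetical name:

```python
import hashlib

def encode_binary_field(tag: int, value: bytes) -> bytes:
    """Wrap raw bytes (no UTF-8 step) as tag byte + length byte + value."""
    if len(value) > 255:
        raise ValueError("value exceeds the one-byte length limit")
    return bytes([tag, len(value)]) + value

# Placeholder input, not real invoice content.
digest = hashlib.sha256(b"placeholder invoice content").digest()
field = encode_binary_field(6, digest)

print(len(digest))                 # 32
print(field[:2].hex(" ").upper())  # 06 20
```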

Common mistakes in building TLV encoding

Those who build the encoding manually or verify it often fall into the following mistakes:

  • Calculating the length by characters, not bytes: the most common mistake. Arabic text typically takes two bytes per character, so always calculate the length after UTF-8 encoding.
  • Overlooking the tag order: the fields are packed in the order Tag 1, then 2, then 3, and so on. Mixing the order confuses some strict readers.
  • Adding separators between fields: TLV needs no separator. Any extra byte corrupts the reading of the next length.
  • Double Base64 encoding: passing the string through Base64 twice produces text the reader cannot decode.
  • Exceeding the one-byte limit for the length: a value longer than 255 bytes needs special handling. Simplified-invoice fields stay under the limit, but watch out when scaling.
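
The double Base64 mistake has a recognizable symptom, shown here in a minimal sketch with a single illustrative field:

```python
import base64

tlv = b"\x01\x05Qoyod"
once = base64.b64encode(tlv)    # correct: what belongs in the QR code
twice = base64.b64encode(once)  # the mistake: Base64 applied a second time

decoded = base64.b64decode(twice)
# A reader decodes once and gets Base64 text back instead of TLV bytes,
# so the first "tag" is the letter 'A' (65), not a value from 1 to 5.
print(decoded[0])       # 65
print(decoded == once)  # True
```

A first byte that is a printable letter rather than a small tag number is therefore a quick hint that the payload was encoded twice.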

TLV encoding in the practical reality of businesses

This technical detail may seem far from the concern of a small-business owner, but understanding it helps in two practical cases. The first is when choosing an accounting system: the certified system must build correct TLV encoding automatically, so if an unreadable QR code appears on your invoices, the problem is most likely in the encoding layer. The second is when verifying a supplier’s invoice: the Authority’s verification app decodes the TLV string and displays the fields, so if the display fails, that indicates corrupt encoding in the supplier’s invoice.

In both cases, you do not need to build the encoding yourself nor read its bytes manually. You only need to know that behind the code is a precise structure, and that any flaw in it means a non-compliant invoice. This understanding helps you ask the right question of your accounting-system provider: does it generate a QR code with TLV encoding compliant with the Authority’s requirements in both phases?

The most important point is that the correctness of the encoding is not a cosmetic option. A corrupt QR code makes the invoice unacceptable at verification, and may expose the business to a violation. That is why building the encoding is left to a certified system that guarantees its integrity in every invoice without exception.

A practical consideration regarding QR code capacity emerges here. Every byte in the TLV string adds to the size of the printed code. The simplified invoice with its five fields stays within a comfortable capacity, but adding the Phase Two fields (the hash and the cryptographic stamp) greatly enlarges the string. Here the value of TLV’s compactness shows: had a more wasteful wrapping format been used, the code would have swelled until it became hard to scan on small invoices. The Authority’s choice of a compact format was a practical decision that serves scannability on narrow thermal invoice paper.

So when your accounting system generates a QR code, it automatically balances including all mandatory fields with keeping the code at a scannable size. This balance is part of what it means for a system to be “certified”: building a structurally correct encoding is not enough, and the system must also produce a practical code that an ordinary phone camera can read from a printed invoice.

How Qoyod helps you with TLV encoding and compliance

In practical reality, the business owner does not build the TLV string manually. A certified accounting system handles generating it automatically in every invoice. Qoyod’s electronic invoicing software is officially certified by the Zakat, Tax and Customs Authority, and it builds TLV triplets for every field, packs them in the correct order, encodes them in Base64, then injects the output into the QR code on the invoice without any manual intervention.

Qoyod also manages the Phase Two fields: it calculates the SHA-256 hash, applies the cryptographic stamp, manages the Cryptographic Stamp Identifier certificate (CSID) for compliance automatically, and stores the hash chain for later verification. The result is a correct, compliant QR code on every invoice, without the accountant worrying about byte structure or length calculation.


Frequently asked questions

What does TLV mean in the invoice code?

TLV is an abbreviation of Tag-Length-Value. It is a format for wrapping each invoice field in three consecutive parts: a tag that identifies the field type, a length that specifies the number of value bytes, then the value itself encoded in UTF-8.

How many bytes do the tag and length occupy in TLV?

The tag occupies one byte, and the length occupies one byte as well. The value, however, is variable in length, and equals exactly the number stored in the length byte.

Is the length calculated by character count or by bytes?

By byte count after UTF-8 encoding, not by character count. An Arabic character usually occupies two bytes, so an Arabic seller name of four characters equals 8 bytes in the length field, not 4.

Why is the TLV string encoded in Base64?

Because the TLV string is raw binary bytes that may contain non-printable values. Base64 converts it into printable ASCII text (letters, digits, and the symbols +, /, and =) that is safe to place inside a QR code and scan with any phone app.

Does the TLV structure change between Phase One and Phase Two?

The “Tag, Length, Value” triplet structure does not change. Phase Two adds new fields (the hash, the cryptographic stamp, and the public key) with the same logic, but their values are cryptographic output rather than human-readable text.

Do I need to build TLV encoding manually?

No. Any certified accounting system builds the TLV string, encodes it in Base64, and injects it into the QR code automatically in every invoice. Qoyod handles this entirely without manual intervention.
