MC description specification

Overview

A machine code description file (MC description file) is written in YAML. Here is an overview of an MC description.

machine: # Describes a machine
  byteorder: little # Byte order of a machine
  extras: # [optional] User-defined data for a machine
    arch_type: arm

instructions: # Describes instructions
  - name: add_1 # Name of an instruction

    # Encoding format of an instruction
    format: xxxx:cond|00|1|0100|x:S|xxxx:Rn|xxxx:Rd|xxxx xxxx xxxx:imm12

    # [optional] Condition when an instruction applies
    match_condition: (cond in_range 0-14 and S == 1) or cond == 15

    # [optional] Condition when an instruction does not apply (mutually exclusive with match_condition)
    unmatch_condition: cond == 15

    extras: # [optional] User-defined data for an instruction
      clocks: 10
    field_extras: # [optional] User-defined data for each field
      Rn: {type: register}

decoder: # [optional] Decoder information of the global scope
  namespace: ns # [optional] Namespace for the symbols of a generated decoder
  # [optional] Name of the hook function to process user-specific information for an instruction
  process_instruction_hook: process_instruction

extras: # [optional] User-defined data for the global scope
  compiler: gcc

machine

machine describes the specification of a machine.

machine.byteorder

byteorder is the byte order of a machine. It can be big (big endian) or little (little endian).
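
As a rough illustration of what this setting means, the sketch below reads the same four bytes as one 32-bit word under each byte order, using Python's built-in int.from_bytes. The helper name read_word is hypothetical and is not part of mcdecoder; how the generated decoder actually reads bytes is not covered here.

```python
def read_word(data: bytes, byteorder: str) -> int:
    """Interpret raw bytes as one unsigned integer.

    byteorder takes the same values as machine.byteorder: 'big' or 'little'.
    """
    if byteorder not in ("big", "little"):
        raise ValueError(f"unsupported byteorder: {byteorder!r}")
    return int.from_bytes(data, byteorder)


raw = bytes([0x01, 0x20, 0x91, 0xE2])  # bytes in the order they sit in memory
print(hex(read_word(raw, "little")))   # 0xe2912001
print(hex(read_word(raw, "big")))      # 0x12091e2
```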

machine.extras

extras defines user-defined data for a machine. Any structure can be defined as user-defined data.

Here is an example of defining a mapping as user-defined data.

extras:
  arch_type: arm

Here is an example of defining a sequence as user-defined data.

extras:
  - 10
  - 20

instructions

instructions describes the specification of instructions. Each list element in instructions represents an instruction.

instructions.name

name defines the name of an instruction.

instructions.format

format defines the encoding format of an instruction.

An instruction can be split into multiple parts, which are called instruction fields. A field can be split into several bit ranges. Each bit range is called a subfield of the field.

An instruction can consist of multiple N-byte words. Each word is called an encoding element of the instruction.

Here is an example of instruction fields (the fields ‘cond’, ‘S’, ‘Rn’, ‘Rd’ and ‘imm12’).

format: xxxx:cond|00|1|0100|x:S|xxxx:Rn|xxxx:Rd|xxxx xxxx xxxx:imm12

Here is an example of instruction subfields (the field ‘imm’ is split into several bit ranges).

format: 000:funct3|x:imm[5]|xxxx x:dest|xxx xx:imm[4:0]|01:op

Here is another example of instruction subfields (the field ‘offset’ is split into adjacent bit ranges).

format: 111:funct3|x xxxx x:offset[5:3,8:6]|xxx xx:src|10:op

Here is an example of encoding elements in an instruction (two 16-bit words).

format: xxxx:cond|00|1|0100|x:S|xxxx:Rn // xxxx:Rd|xxxx xxxx xxxx:imm12
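
One way to read the format strings above is as a bit mask plus an expected value: the 0 and 1 characters fix bits, and x leaves a bit free. The following is a minimal sketch of that reading, not the implementation used by mcdecoder. The function name fixed_bits is hypothetical, and the sketch assumes that fields are separated by |, encoding elements by //, and that encoding elements are concatenated in the order they are written.

```python
def fixed_bits(fmt: str) -> tuple[int, int, int]:
    """Return (mask, value, bit_length) for a format string.

    mask has a 1 wherever the format fixes a bit (0 or 1);
    value holds the fixed bits themselves.
    """
    mask = value = length = 0
    for element in fmt.split("//"):        # encoding elements
        for field in element.split("|"):   # fields within an element
            # Keep only <field_bits>: drop the optional ":name[...]" part
            # and any whitespace used for grouping.
            bits = "".join(field.split(":", 1)[0].split())
            for ch in bits:
                if ch not in "01x":
                    raise ValueError(f"unexpected character in field bits: {ch!r}")
                mask = (mask << 1) | int(ch != "x")
                value = (value << 1) | int(ch == "1")
                length += 1
    return mask, value, length


mask, value, length = fixed_bits(
    "xxxx:cond|00|1|0100|x:S|xxxx:Rn|xxxx:Rd|xxxx xxxx xxxx:imm12"
)
print(hex(mask), hex(value), length)  # 0xfe00000 0x2800000 32

# A candidate word has the right fixed bits when (word & mask) == value.
print((0xE2912001 & mask) == value)   # True
```

Under these assumptions, the two-element format above yields the same mask and value as the single-element one, because only the grouping into words differs.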

Expression: <field_bits>:<field_name>[<field_bit_ranges>]|... // ...

Fields are separated by the bar symbol (|). If an instruction consists of multiple encoding elements, the elements are separated by the double-slash symbol (//).

where

<field_bits>

Encoding format of an instruction field

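It is a sequence of the characters 0, 1 and x. 0 and 1 represent fixed bits of an instruction field; x represents a bit that can take either value. As the examples above show, spaces can be inserted to group bits for readability.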

<field_name>

[optional] Name of an instruction field

One field name can be used multiple times in one instruction.

<field_bit_ranges>

[optional] Field bit ranges of an instruction field

Expression: <subfield_start>:<subfield_end>,...

It takes one or more adjacent bit ranges, separated by commas (,).

<subfield_start>

MSB of a subfield in an instruction field

NOTE: This is an MSB in a field, not in an instruction.

<subfield_end>

[optional] LSB of a subfield in an instruction field

NOTE: This is an LSB in a field, not in an instruction.
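
To make the bit-range notation concrete, here is a minimal sketch of how encoded bits could be placed into a field value according to <field_bit_ranges>. It is an illustration only: the function name assemble_field is hypothetical, and the sketch assumes that the ranges are listed in the same order as the bits appear in the instruction (MSB first) and that a range without an LSB, such as [5], denotes a single bit.

```python
def assemble_field(ranges: str, encoded: int) -> int:
    """Place encoded bits at the field positions named by ranges.

    ranges is a <field_bit_ranges> string such as "5:3,8:6".
    encoded holds the bits in instruction order, MSB first.
    """
    parsed = []
    for part in ranges.split(","):
        start, _, end = part.partition(":")
        msb = int(start)
        lsb = int(end) if end else msb  # "5" alone means the single bit 5
        parsed.append((msb, lsb))

    # Number of encoded bits not yet consumed.
    remaining = sum(msb - lsb + 1 for msb, lsb in parsed)
    value = 0
    for msb, lsb in parsed:
        width = msb - lsb + 1
        remaining -= width
        chunk = (encoded >> remaining) & ((1 << width) - 1)
        value |= chunk << lsb           # move the chunk to its place in the field
    return value


# offset[5:3,8:6]: the first three encoded bits become offset bits 5..3,
# the next three become offset bits 8..6.
print(assemble_field("5:3,8:6", 0b101011))  # 232

# imm[5] and imm[4:0] appear as two separate occurrences of the field 'imm';
# each occurrence is placed on its own and the results are combined.
print(assemble_field("5", 0b1) | assemble_field("4:0", 0b00011))  # 35
```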

instructions.match_condition

match_condition defines the condition under which an instruction applies. The following condition types are supported.

  • Equality: The equality between a field value and a given value.

  • In a set: A field value is in a given value set.

  • In a range: A field value is in a given value range (inclusive).

These conditions can be combined with the logical operators and and or. You can also use ( and ) to group conditions or simply to improve readability.

Equality condition

It defines a condition to test the equality between a field value and a given value.

When the field ‘cond’ equals 15,

match_condition: cond == 15

When the field ‘Rn’ equals the field ‘Rd’,

match_condition: Rn == Rd

When bit 15 of the field ‘register_list’ equals 1,

match_condition: register_list[15] == 1

When the number of set bits in the field ‘register_list’ is less than 2,

match_condition: setbit_count(register_list) < 2

Expression: <subject> <operator> <object>

where

<operator>

Comparison operator to test with

It can be ==, !=, <, <=, > or >=.

See also the common expressions below.
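
The four example conditions above can be mirrored in plain Python to check how they behave for given field values. This is a sketch for experimentation, not the evaluator used by mcdecoder. The helper bit is hypothetical and assumes that index 0 refers to the LSB of the field; setbit_count is a stand-in with the same name as the supported function.

```python
def bit(value: int, index: int) -> int:
    """Return the bit at index, assuming index 0 is the LSB of the field."""
    return (value >> index) & 1


def setbit_count(value: int) -> int:
    """Count the bits set to 1 in a non-negative field value."""
    return bin(value).count("1")


# Example field values of one decoded instruction (made up for illustration).
fields = {"cond": 15, "Rn": 3, "Rd": 3, "register_list": 0x8000}

print(fields["cond"] == 15)                       # cond == 15                      -> True
print(fields["Rn"] == fields["Rd"])               # Rn == Rd                        -> True
print(bit(fields["register_list"], 15) == 1)      # register_list[15] == 1          -> True
print(setbit_count(fields["register_list"]) < 2)  # setbit_count(register_list) < 2 -> True
```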

In-a-set condition

It defines a condition to test that a field value is in a given value set.

When the field ‘cond’ is in a set [13, 15],

match_condition: cond in [13, 15]

Expression: <subject> in <values>

where

<values>

Value set to test with

Expression: [<value>,...]

It takes multiple values separated by commas (,).

See also the common expressions below.

In-a-range condition

It defines a condition to test that a field value is in a given value range (inclusive).

When the field ‘cond’ is in a range from 10 to 15,

match_condition: cond in_range 10-15

Expression: <subject> in_range <value_start>-<value_end>

where

<value_start>

Start of a value range (inclusive)

Base 2, 10 or 16 integer values like 15, 0b1111, 0xf, etc.

<value_end>

End of a value range (inclusive)

Base 2, 10 or 16 integer values like 15, 0b1111, 0xf, etc.

See also the common expressions below.
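
The set and range conditions, together with the base 2, 10 and 16 literals, can be sketched in Python as follows. The helper names parse_value, in_set and in_range are hypothetical. Note that int(text, 0) also accepts forms the specification does not list (for example octal 0o17), so it is slightly more permissive than the description language.

```python
def parse_value(text: str) -> int:
    """Parse a literal such as '15', '0b1111' or '0xf'."""
    return int(text, 0)  # base 0: infer the base from the prefix


def in_set(field_value: int, values: list[str]) -> bool:
    """Mirror of '<subject> in [<value>,...]'."""
    return field_value in {parse_value(v) for v in values}


def in_range(field_value: int, start: str, end: str) -> bool:
    """Mirror of '<subject> in_range <value_start>-<value_end>' (both ends inclusive)."""
    return parse_value(start) <= field_value <= parse_value(end)


print(parse_value("15"), parse_value("0b1111"), parse_value("0xf"))  # 15 15 15
print(in_set(13, ["13", "15"]))   # cond in [13, 15] with cond = 13       -> True
print(in_range(15, "10", "15"))   # cond in_range 10-15 with cond = 15    -> True
print(in_range(9, "10", "15"))    # cond in_range 10-15 with cond = 9     -> False
```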

Complex condition

The conditions described above can be combined.

Here’s an example of a combined condition.

match_condition: |
  Rn != 15
  and (cond in_range 10-11 or cond in [13, 15])
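
For comparison, the same combined condition can be written as a Python boolean expression. This is only a sketch for checking the grouping by hand; the function name matches is hypothetical.

```python
def matches(Rn: int, cond: int) -> bool:
    """Python rendering of:
    Rn != 15 and (cond in_range 10-11 or cond in [13, 15])
    """
    return Rn != 15 and (10 <= cond <= 11 or cond in (13, 15))


print(matches(Rn=0, cond=10))   # True:  Rn is not 15 and cond is in 10-11
print(matches(Rn=0, cond=12))   # False: cond is in neither the range nor the set
print(matches(Rn=15, cond=13))  # False: Rn equals 15
```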

Common expressions

<subject>

Subject to be tested

Expression: <field_object> or <function_object>

<object>

Object to test with

Expression: <value> or <field_object> or <function_object>

<function_object>

The result of a function call to be tested

Expression: <function>(<field_object>)

<function>

Name of the function whose result is to be tested

Supported functions are:

  • setbit_count: Counts the set bits of a given argument.

<field_object>

Field object to be tested

Expression: <field>[<field_element_index>]

<field>

Field name to be tested

<field_element_index>

[optional] Bit element index of a field

Base 10 integer value like 1, 2, etc.

<value>

Value to test the equality with

Base 2, 10 or 16 integer values like 15, 0b1111, 0xf, etc.

instructions.unmatch_condition

unmatch_condition defines the condition when an instruction does not apply. The expression is the same as that of match_condition. unmatch_condition is mutually exclusive with match_condition.
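
The relationship between the two attributes can be sketched as follows. This is an interpretation, not the decoder's actual logic: it reads "mutually exclusive" as "at most one of the two may be given for an instruction" and treats unmatch_condition as the negation of a match. The function name applies and the callable-based conditions are hypothetical.

```python
from typing import Callable, Optional

Condition = Callable[[dict], bool]


def applies(
    fields: dict,
    match: Optional[Condition] = None,
    unmatch: Optional[Condition] = None,
) -> bool:
    """Decide whether an instruction applies to the given field values."""
    if match is not None and unmatch is not None:
        raise ValueError("match_condition and unmatch_condition are mutually exclusive")
    if match is not None:
        return match(fields)
    if unmatch is not None:
        return not unmatch(fields)
    return True  # no condition given: nothing further to check


# unmatch_condition: cond == 15
not_fifteen = lambda f: f["cond"] == 15
print(applies({"cond": 14}, unmatch=not_fifteen))  # True
print(applies({"cond": 15}, unmatch=not_fifteen))  # False
```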

instructions.extras

extras defines user-defined data for an instruction. Any structure can be defined as user-defined data.

Here is an example of defining a mapping as user-defined data.

extras:
  clocks: 10

Here is an example of defining a sequence as user-defined data.

extras:
  - 10
  - 20

instructions.field_extras

field_extras defines user-defined data for each field. Each mapping key in field_extras is the name of a field, and the corresponding value holds the user-defined data for that field. Any structure can be defined as user-defined data.

Here is an example of defining a mapping as user-defined data.

field_extras:
  Rn: {type: register} # User-defined data for the field 'Rn'
  imm12: {type: immediate} # User-defined data for the field 'imm12'

Here is an example of defining a sequence as user-defined data.

field_extras:
  Rn: [10, 20] # User-defined data for the field 'Rn'
  imm12: [30, 40] # User-defined data for the field 'imm12'
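
Because the keys of field_extras are field names, a quick consistency check against the format string can catch misspelled keys. The sketch below is an optional helper written for illustration; the names field_names and unknown_field_extras are hypothetical, and whether mcdecoder itself rejects unknown keys is not stated here.

```python
import re


def field_names(fmt: str) -> set[str]:
    """Collect the field names that appear in a format string."""
    names = set()
    for element in fmt.split("//"):
        for field in element.split("|"):
            _, sep, name = field.partition(":")  # split at the first ':' only
            if sep:
                # Drop a trailing bit-range part such as "[4:0]".
                names.add(re.sub(r"\[.*\]$", "", name.strip()))
    return names


def unknown_field_extras(fmt: str, field_extras: dict) -> set[str]:
    """Return field_extras keys that do not name a field of the format."""
    return set(field_extras) - field_names(fmt)


fmt = "xxxx:cond|00|1|0100|x:S|xxxx:Rn|xxxx:Rd|xxxx xxxx xxxx:imm12"
print(unknown_field_extras(fmt, {"Rn": [10, 20], "imm12": [30, 40]}))  # set()
print(unknown_field_extras(fmt, {"Rm": {"type": "register"}}))         # {'Rm'}
```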

decoder

decoder holds decoder information for the global scope, which isn’t related to a machine, an instruction or a field.

decoder.namespace

namespace defines the namespace for the symbols of a generated decoder.

decoder.process_instruction_hook

Warning

This is an experimental feature. The name and form of this attribute might change in a future release.

process_instruction_hook defines the name of the hook function that processes user-specific information for an instruction into a different form. The hook function must be defined in the Python config file named config.py. You must put the config file in the same directory as the MC description file.

The signature of the hook function is

def <name_of_hook_function>(instruction: InstructionDecoder) -> None:

Here’s an example of defining a hook function to process user-specific information.

MC description file:

decoder:
  process_instruction_hook: process_instruction

config.py:

from mcdecoder.core import InstructionDecoder

def process_instruction(instruction: InstructionDecoder) -> None:
    # Read the user-defined value and store its processed form in place
    extra_value = instruction.extras['extra_attribute']
    instruction.extras['extra_attribute'] = 'processed ' + extra_value
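
To try the body of such a hook without running the generator, it can be called on a stand-in object. The sketch below uses types.SimpleNamespace as a hypothetical substitute that mimics only the extras attribute used by the hook; the real InstructionDecoder is not imported and may carry more attributes.

```python
from types import SimpleNamespace


def process_instruction(instruction) -> None:
    # Same logic as the hook above, without the mcdecoder import.
    extra_value = instruction.extras['extra_attribute']
    instruction.extras['extra_attribute'] = 'processed ' + extra_value


# Stand-in for an instruction whose extras were read from the description.
stand_in = SimpleNamespace(name='add_1', extras={'extra_attribute': 'value'})
process_instruction(stand_in)
print(stand_in.extras)  # {'extra_attribute': 'processed value'}
```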

extras

extras defines user-defined data for the global scope, which isn’t related to a machine, an instruction or a field. Any structure can be defined as user-defined data.

Here is an example of defining a mapping as user-defined data.

extras:
  compiler: gcc
  version: 1.0

Here is an example of defining a sequence as user-defined data.

extras:
  - 10
  - 20

Additional specifications

!include tag

You can split a description into multiple files by using the !include tag. You can use it anywhere in a description, and the contents of the specified files are inserted at the place where the tag is. If multiple files are specified, their contents are combined. The behavior depends on the types of the contents of the included files:

  • Sequence: Produce one sequence consisting of the elements of all sequences.

  • Mapping: Produce one mapping consisting of the elements of all mappings.

  • Scalar: Produce one sequence consisting of scalars.

  • Mixture: Report an error.
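
The combination rules above can be sketched in Python as follows. This is an illustration of the listed behavior, not the loader used by mcdecoder. The function name combine is hypothetical, the inputs are the already parsed contents of each included file, and letting later files win on duplicate mapping keys is an assumption the text does not settle.

```python
def combine(contents: list) -> object:
    """Combine the parsed contents of several included files."""
    if all(isinstance(c, list) for c in contents):
        # Sequences: one sequence with the elements of all sequences.
        return [item for c in contents for item in c]
    if all(isinstance(c, dict) for c in contents):
        # Mappings: one mapping with the elements of all mappings.
        merged = {}
        for c in contents:
            merged.update(c)  # assumption: later files win on duplicate keys
        return merged
    if not any(isinstance(c, (list, dict)) for c in contents):
        # Scalars: one sequence consisting of the scalars.
        return list(contents)
    raise TypeError("included files mix sequences, mappings and scalars")


print(combine([[{"name": "add_1"}], [{"name": "push_1"}]]))
# [{'name': 'add_1'}, {'name': 'push_1'}]
print(combine([{"compiler": "gcc"}, {"version": 1.0}]))
# {'compiler': 'gcc', 'version': 1.0}
```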

Here’s an example of including instructions from other files.

arm.yaml:

machine:
  byteorder: little
instructions: !include instructions/*_instructions.yaml

instructions/add_instructions.yaml:

- name: add_1
  format: xxxx:cond|00|1|0100|x:S|xxxx:Rn|xxxx:Rd|xxxx xxxx xxxx:imm12
  unmatch_condition: cond == 15

instructions/push_instructions.yaml:

- name: push_1
  format: xxxx:cond|1001 00|1|0|1101|xxxx xxxx xxxx xxxx:register_list
  match_condition: cond in_range 0-14

You can see this example on GitHub.

Expression: !include <path-included>

where

<path-included>

Path to the files to include. You can use the wildcard character * (asterisk symbol) to specify multiple files.

Schema specification

The schema is defined with JSON Schema and is described below using JSON Schema terms.

type: object

properties:

  • machine
      type: object
      properties:
        • byteorder
            type: string
            enum: big, little
        • extras
      additionalProperties: False

  • instructions
      type: array
      items:
        type: object
        properties:
          • name
              type: string
              pattern: ^[A-Za-z][A-Za-z0-9_]*$
          • format
              type: string
              pattern: ^[A-Za-z0-9_:\[\],|/\u0020\u0009\u000a\u000d]+$
          • match_condition
              type: string
              pattern: ^[A-Za-z0-9_!=><\-\[\],()\u0020\u0009\u000a\u000d]+$
          • unmatch_condition
              type: string
              pattern: ^[A-Za-z0-9_!=><\-\[\],()\u0020\u0009\u000a\u000d]+$
          • extras
          • field_extras
              type: object
              patternProperties:
                • ^[A-Za-z][A-Za-z0-9_]*$
              additionalProperties: False
        additionalProperties: False

  • decoder
      type: object
      properties:
        • namespace
            type: string
            pattern: ^[A-Za-z][A-Za-z0-9_]*$
        • process_instruction_hook
            type: string
            pattern: ^[A-Za-z][A-Za-z0-9_]*$
      additionalProperties: False

  • extras

additionalProperties: False
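
The string patterns listed in the schema can be tried out directly with Python's re module, which is handy for checking a name or condition before running the tool. This is a minimal sketch that covers only the patterns of an instruction, not the full schema; the function name check_instruction is hypothetical. fullmatch is used so that a trailing newline is not silently accepted by $.

```python
import re

# Patterns copied from the schema.
NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")
FORMAT = re.compile(r"^[A-Za-z0-9_:\[\],|/\u0020\u0009\u000a\u000d]+$")
CONDITION = re.compile(r"^[A-Za-z0-9_!=><\-\[\],()\u0020\u0009\u000a\u000d]+$")

CHECKS = (
    ("name", NAME),
    ("format", FORMAT),
    ("match_condition", CONDITION),
    ("unmatch_condition", CONDITION),
)


def check_instruction(instruction: dict) -> list[str]:
    """Return the keys whose string values violate their schema pattern."""
    problems = []
    for key, pattern in CHECKS:
        value = instruction.get(key)
        if value is not None and not pattern.fullmatch(value):
            problems.append(key)
    return problems


print(check_instruction({
    "name": "add_1",
    "format": "xxxx:cond|00|1|0100|x:S|xxxx:Rn|xxxx:Rd|xxxx xxxx xxxx:imm12",
    "unmatch_condition": "cond == 15",
}))                                                        # []
print(check_instruction({"name": "1add", "format": "xx:op"}))  # ['name']
```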