MC description specification
Overview
A machine code description file, or MC description file, is written in YAML. Here is an overview of an MC description.
machine: # Describes a machine
  byteorder: little # Byte order of a machine
  extras: # [optional] User-defined data for a machine
    arch_type: arm
instructions: # Describes instructions
  - name: add_1 # Name of an instruction
    # Encoding format of an instruction
    format: xxxx:cond|00|1|0100|x:S|xxxx:Rn|xxxx:Rd|xxxx xxxx xxxx:imm12
    # [optional] Condition when an instruction applies
    match_condition: (cond in_range 0-14 and S == 1) or cond == 15
    # [optional] Condition when an instruction does not apply (mutually exclusive with match_condition)
    unmatch_condition: cond == 15
    extras: # [optional] User-defined data for an instruction
      clocks: 10
    field_extras: # [optional] User-defined data for each field
      Rn: {type: register}
decoder: # [optional] Decoder information of the global scope
  namespace: ns # [optional] Namespace for the symbols of a generated decoder
  # [optional] Name of the hook function to process user-specific information for an instruction
  process_instruction_hook: process_instruction
extras: # [optional] User-defined data for the global scope
  compiler: gcc
machine
machine describes the specification of a machine.
machine.byteorder
byteorder is the byte order of a machine.
It can be big (big endian) or little (little endian).
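As a rough illustration of what the two values mean, the sketch below reads the same four bytes as one 32-bit word under each byte order. It uses only Python's built-in int.from_bytes and is not part of mcdecoder; the byte values are arbitrary sample data.

```python
# Four bytes as they might appear in memory (arbitrary sample values).
raw = bytes([0x01, 0x00, 0x80, 0xE2])

# little: the first byte is the least significant byte of the word.
word_little = int.from_bytes(raw, byteorder="little")

# big: the first byte is the most significant byte of the word.
word_big = int.from_bytes(raw, byteorder="big")

print(hex(word_little))  # 0xe2800001
print(hex(word_big))     # 0x10080e2
```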
machine.extras
extras defines user-defined data for a machine.
Any structure can be defined as user-defined data.
Here is an example of defining a mapping as user-defined data.
extras:
  arch_type: arm
Here is an example of defining a sequence as user-defined data.
extras:
  - 10
  - 20
instructions
instructions describes the specification of instructions.
Each list element in instructions represents an instruction.
instructions.name
name defines the name of an instruction.
instructions.format
format defines the encoding format of an instruction.
An instruction can be split into multiple parts, called instruction fields. A field can in turn be split into several bit ranges, each of which is called a subfield of that field.
An instruction can consist of multiple N-byte words. Each word is called an encoding element of the instruction.
Here is an example of instruction fields (the fields ‘cond’, ‘S’, ‘Rn’, ‘Rd’ and ‘imm12’).
format: xxxx:cond|00|1|0100|x:S|xxxx:Rn|xxxx:Rd|xxxx xxxx xxxx:imm12
Here is an example of instruction subfields (the field ‘imm’ is split into several bit ranges).
format: 000:funct3|x:imm[5]|xxxx x:dest|xxx xx:imm[4:0]|01:op
Here is another example of instruction subfields (the field ‘offset’ is split into adjacent bit ranges).
format: 111:funct3|x xxxx x:offset[5:3,8:6]|xxx xx:src|10:op
Here is an example of encoding elements in an instruction (two 16-bit words).
format: xxxx:cond|00|1|0100|x:S|xxxx:Rn // xxxx:Rd|xxxx xxxx xxxx:imm12
Expression: <field_bits>:<field_name>[<field_bit_ranges>]|... // ...
Fields are separated by the bar symbol (|).
If an instruction consists of multiple encoding elements, they are separated by the double-slash symbol (//).
where
- <field_bits>
Encoding format of an instruction field
It takes a sequence of 0, 1 or x. 0 or 1 represents a fixed bit of an instruction field. x represents any bit.
- <field_name>
[optional] Name of an instruction field
One field name can be used multiple times in one instruction.
- <field_bit_ranges>
[optional] Field bit ranges of an instruction field
Expression: <subfield_start>:<subfield_end>,...
It takes multiple adjacent bit ranges. Bit ranges are separated by the comma symbol (,).
- <subfield_start>
MSB of a subfield in an instruction field
NOTE: This is an MSB in a field, not in an instruction.
- <subfield_end>
[optional] LSB of a subfield in an instruction field
NOTE: This is an LSB in a field, not in an instruction.
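To make the grammar above concrete, here is a minimal Python sketch that splits a format string into encoding elements and fields. It is not mcdecoder's parser: the names parse_format and Field are hypothetical and exist only for this illustration, and the bit characters are not validated.

```python
from typing import List, NamedTuple, Optional


class Field(NamedTuple):
    bits: str               # e.g. "xxxx" or "0100", whitespace removed
    name: Optional[str]     # e.g. "cond"; None for unnamed fixed bits
    ranges: Optional[str]   # raw text inside [...], e.g. "4:0"; None if absent


def parse_format(fmt: str) -> List[List[Field]]:
    """Split a format string into encoding elements, each a list of fields."""
    elements = []
    for element in fmt.split("//"):          # encoding elements
        fields = []
        for part in element.split("|"):      # fields within one element
            bits, _, rest = part.partition(":")
            bits = "".join(bits.split())     # drop readability spaces
            rest = rest.strip()
            name, ranges = None, None
            if rest:
                name, bracket, tail = rest.partition("[")
                if bracket:
                    ranges = tail.rstrip("]")
            fields.append(Field(bits, name or None, ranges))
        elements.append(fields)
    return elements


elements = parse_format(
    "xxxx:cond|00|1|0100|x:S|xxxx:Rn // xxxx:Rd|xxxx xxxx xxxx:imm12"
)
for fields in elements:
    print(sum(len(f.bits) for f in fields), [f.name for f in fields if f.name])
# 16 ['cond', 'S', 'Rn']
# 16 ['Rd', 'imm12']
```

Each encoding element in this example adds up to 16 bits, which matches the description of the instruction as two 16-bit words.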
instructions.match_condition
match_condition defines the condition under which an instruction applies.
The following condition types are supported.
Equality: The equality between a field value and a given value.
In a set: A field value is in a given value set.
In a range: A field value is in a given value range (inclusive).
These conditions can be combined with the logical operators "and" and "or".
You can also use ( and ) to group conditions or just for readability.
Equality condition
It defines a condition to test the equality between a field value and a given value.
When the field ‘cond’ equals 15,
match_condition: cond == 15
When the field ‘Rn’ equals the field ‘Rd’,
match_condition: Rn == Rd
When the 15th bit of the field ‘register_list’ equals 1,
match_condition: register_list[15] == 1
When the set bit count of the field ‘register_list’ is less than 2,
match_condition: setbit_count(register_list) < 2
Expression: <subject> <operator> <object>
where
- <operator>
Equality operator to test
It can be ==, !=, <, <=, > or >=.
See also the common expressions below.
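The bit-index and setbit_count examples in this section can be read as follows in Python. The sketch assumes that bit index 0 denotes the least significant bit of the field, which the text does not state; bit_at is a hypothetical helper, and this setbit_count is a plain Python stand-in rather than mcdecoder code.

```python
def bit_at(value: int, index: int) -> int:
    """Return the bit of value at index (assumption: index 0 is the LSB)."""
    return (value >> index) & 1


def setbit_count(value: int) -> int:
    """Count the bits that are set to 1 in value."""
    return bin(value).count("1")


# Sample 16-bit field value with bits 15 and 0 set.
register_list = 0b1000_0000_0000_0001

print(bit_at(register_list, 15) == 1)   # True
print(setbit_count(register_list) < 2)  # False, two bits are set
```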
In-a-set condition
It defines a condition to test that a field value is in a given value set.
When the field ‘cond’ is in a set [13, 15],
match_condition: cond in [13, 15]
Expression: <subject> in <values>
where
- <values>
Value set to test with
Expression: [<value>,...]
It takes multiple values separated by the comma symbol (,).
See also the common expressions below.
In-a-range condition
It defines a condition to test that a field value is in a given value range (inclusive).
When the field ‘cond’ is in a range from 10 to 15,
match_condition: cond in_range 10-15
Expression: <subject> in_range <value_start>-<value_end>
where
- <value_start>
Start of a value range (inclusive)
Base 2, 10 or 16 integer values like 15, 0b1111, 0xf, etc.
- <value_end>
End of a value range (inclusive)
Base 2, 10 or 16 integer values like 15, 0b1111, 0xf, etc.
See also the common expressions below.
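Because both ends of the range are inclusive, the test differs from Python's built-in range, which excludes its end value. A small sketch of the inclusive check, using a hypothetical helper named in_range:

```python
def in_range(value: int, start: int, end: int) -> bool:
    """Inclusive range test: both start and end belong to the range."""
    return start <= value <= end


# Equivalent of "cond in_range 10-15" for a few sample values.
print(in_range(15, 10, 15))  # True, the end value is included
print(in_range(10, 10, 15))  # True, the start value is included
print(in_range(9, 10, 15))   # False
```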
Complex condition
The conditions explained before can be combined together.
Here’s an example of a combination of the conditions.
match_condition: |
  Rn != 15
  and (cond in_range 10-11 or cond in [13, 15])
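For readers who prefer code, the combined condition above corresponds to the following Python predicate. This is only a reading aid under the semantics described in this section; the function name matches is hypothetical.

```python
def matches(Rn: int, cond: int) -> bool:
    """Python reading of: Rn != 15 and (cond in_range 10-11 or cond in [13, 15])."""
    return Rn != 15 and (10 <= cond <= 11 or cond in (13, 15))


print(matches(Rn=0, cond=10))   # True
print(matches(Rn=0, cond=12))   # False, 12 is in neither the range nor the set
print(matches(Rn=15, cond=10))  # False, Rn must differ from 15
```

The parentheses matter here: without them, "and" would typically bind tighter than "or" in most expression languages, including Python.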
Common expressions
- <subject>
Subject to be tested
Expression: <field_object> or <function_object>
- <object>
Object to test with
Expression: <value> or <field_object> or <function_object>
- <function_object>
The result of a function call to be tested
Expression: <function>(<field_object>)
- <function>
Name of the function whose result is to be tested
Supported functions are:
setbit_count: Count the set bits of a given argument.
- <field_object>
Field object to be tested
Expression: <field>[<field_element_index>]
- <field>
Field name to be tested
- <field_element_index>
[optional] Bit element index of a field
Base 10 integer value like 1, 2, etc.
- <value>
Value to test the equality with
Base 2, 10 or 16 integer values like 15, 0b1111, 0xf, etc.
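The three literal forms denote the same number. In Python, int with base 0 accepts all three spellings, which makes it easy to check a value before writing it into a condition. How mcdecoder itself parses literals is not described here, so treat this purely as a convenience.

```python
# Base 0 lets int() infer the base from the prefix (none, 0b or 0x).
for literal in ("15", "0b1111", "0xf"):
    print(literal, "->", int(literal, 0))
# 15 -> 15
# 0b1111 -> 15
# 0xf -> 15
```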
instructions.unmatch_condition
unmatch_condition defines the condition under which an instruction does not apply.
The expression is the same as that of match_condition.
unmatch_condition is mutually exclusive with match_condition.
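A minimal sketch of both points follows. It assumes that "mutually exclusive" means an instruction may carry only one of the two keys, and that an instruction with an unmatch_condition applies exactly when that condition is false. The function names are hypothetical, and the way mcdecoder reports a violation is not specified here.

```python
from typing import Any, Dict


def check_condition_keys(instruction: Dict[str, Any]) -> None:
    """Reject an instruction that sets both condition keys at once."""
    if "match_condition" in instruction and "unmatch_condition" in instruction:
        raise ValueError(
            f"{instruction.get('name')}: match_condition and "
            "unmatch_condition cannot be used together"
        )


def applies(cond: int) -> bool:
    """Python reading of 'unmatch_condition: cond == 15'."""
    return not (cond == 15)


check_condition_keys({"name": "add_1", "unmatch_condition": "cond == 15"})  # accepted
print(applies(14))  # True
print(applies(15))  # False
```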
instructions.extras
extras defines user-defined data for an instruction.
Any structure can be defined as user-defined data.
Here is an example of defining a mapping as user-defined data.
extras:
  clocks: 10
Here is an example of defining a sequence as user-defined data.
extras:
  - 10
  - 20
instructions.field_extras
field_extras defines user-defined data for each field.
Each mapping key in field_extras names a field, and its value holds the user-defined data for that field.
Any structure can be defined as user-defined data.
Here is an example of defining a mapping as user-defined data.
field_extras:
  Rn: {type: register} # User-defined data for the field 'Rn'
  imm12: {type: immediate} # User-defined data for the field 'imm12'
Here is an example of defining a sequence as user-defined data.
field_extras:
  Rn: [10, 20] # User-defined data for the field 'Rn'
  imm12: [30, 40] # User-defined data for the field 'imm12'
decoder
decoder holds decoder information for the global scope, which is not tied to a machine, an instruction or a field.
decoder.namespace
namespace defines the namespace for the symbols of a generated decoder.
decoder.process_instruction_hook
Warning
This is an experimental feature. The name and form of this attribute might change in a future release.
process_instruction_hook defines the name of the hook function that processes user-specific information for an instruction into a different form.
The hook function must be defined in the Python config file named config.py. You must put the config file in the same directory as the MC description file.
The signature of the hook function is
def <name_of_hook_function>(instruction: InstructionDecoder) -> None:
Here’s an example of defining a hook function to process user-specific information.
In the MC description file:
decoder:
  process_instruction_hook: process_instruction

In config.py:
from mcdecoder.core import InstructionDecoder

def process_instruction(instruction: InstructionDecoder) -> None:
    # Read the user-defined value and store it back in processed form
    extra_value = instruction.extras['extra_attribute']
    instruction.extras['extra_attribute'] = 'processed ' + extra_value
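To try a hook without running mcdecoder, one option is to call it with a stand-in object. In the sketch below, StandInInstruction is a hypothetical class that models only the extras attribute used by the hook; the real InstructionDecoder has more to it, and how mcdecoder invokes the hook is not shown here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class StandInInstruction:
    """Hypothetical stand-in for InstructionDecoder; only extras is modelled."""
    extras: Dict[str, Any] = field(default_factory=dict)


def process_instruction(instruction: StandInInstruction) -> None:
    # Same body as the hook in the example: rewrite one user-defined value.
    extra_value = instruction.extras['extra_attribute']
    instruction.extras['extra_attribute'] = 'processed ' + extra_value


instruction = StandInInstruction(extras={'extra_attribute': 'value'})
process_instruction(instruction)
print(instruction.extras['extra_attribute'])  # processed value
```

Note that the hook returns None and changes the instruction in place, as the signature requires.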
extras
extras defines user-defined data for the global scope, which is not tied to a machine, an instruction or a field.
Any structure can be defined as user-defined data.
Here is an example of defining a mapping as user-defined data.
extras:
  compiler: gcc
  version: 1.0
Here is an example of defining a sequence as user-defined data.
extras:
  - 10
  - 20
Additional specifications
!include tag
You can split a description into multiple files by using the !include tag.
You can use it anywhere in a description, and the contents of the specified files are inserted at the place where the tag is.
If multiple files are specified, the contents are combined.
The behavior depends on the types of the contents of the included files:
Sequence: Produce one sequence consisting of the elements of all sequences.
Mapping: Produce one mapping consisting of the elements of all mappings.
Scalar: Produce one sequence consisting of the scalars.
Mixture: Raise an error.
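The four combination rules can be sketched in Python as below. This is an illustration of the rules as listed, not mcdecoder's implementation: combine_included is a hypothetical name, the inputs stand for the already-parsed contents of each included file, and letting later mappings override earlier ones on duplicate keys is an assumption.

```python
from typing import Any, List


def combine_included(contents: List[Any]) -> Any:
    """Combine the parsed contents of several included files."""
    if all(isinstance(c, list) for c in contents):
        # Sequence: concatenate the elements of all sequences.
        return [item for c in contents for item in c]
    if all(isinstance(c, dict) for c in contents):
        # Mapping: merge all mappings (assumption: later keys win).
        merged = {}
        for c in contents:
            merged.update(c)
        return merged
    if not any(isinstance(c, (list, dict)) for c in contents):
        # Scalar: collect the scalars into one sequence.
        return list(contents)
    # Mixture: the types cannot be combined.
    raise ValueError("included files mix sequences, mappings and scalars")


print(combine_included([[{"name": "add_1"}], [{"name": "push_1"}]]))
# [{'name': 'add_1'}, {'name': 'push_1'}]
print(combine_included([{"a": 1}, {"b": 2}]))  # {'a': 1, 'b': 2}
print(combine_included([10, "gcc"]))           # [10, 'gcc']
```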
Here’s an example of including instructions from other files.
machine:
  byteorder: little
instructions: !include instructions/*_instructions.yaml

The included files contain the list elements, for example:
- name: add_1
  format: xxxx:cond|00|1|0100|x:S|xxxx:Rn|xxxx:Rd|xxxx xxxx xxxx:imm12
  unmatch_condition: cond == 15
- name: push_1
  format: xxxx:cond|1001 00|1|0|1101|xxxx xxxx xxxx xxxx:register_list
  match_condition: cond in_range 0-14
You can see this example on GitHub.
Expression: !include <path-included>
where
- <path-included>
A path to the files to include. You can use the wildcard character * (asterisk symbol) to specify multiple files.
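To preview which files a wildcard path would select, a shell-style match from Python's standard library gives a reasonable approximation. The file names below are made up for the illustration, and whether mcdecoder's matching agrees with fnmatch in every detail is an assumption.

```python
from fnmatch import fnmatchcase

pattern = "instructions/*_instructions.yaml"

# Hypothetical file names, used only to show which ones match.
candidates = [
    "instructions/arm_instructions.yaml",
    "instructions/riscv_instructions.yaml",
    "instructions/README.md",
]

matched = [path for path in candidates if fnmatchcase(path, pattern)]
print(matched)
# ['instructions/arm_instructions.yaml', 'instructions/riscv_instructions.yaml']
```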
Schema specification
The schema is defined with JSON Schema and is explained here in JSON Schema terms.
type: object
properties:
  machine:
    type: object
    properties:
      byteorder:
        type: string
        enum: big, little
      extras: (no constraints)
    additionalProperties: False
  instructions:
    type: array
    items:
      type: object
      properties:
        name:
          type: string
          pattern: ^[A-Za-z][A-Za-z0-9_]*$
        format:
          type: string
          pattern: ^[A-Za-z0-9_:\[\],|/\u0020\u0009\u000a\u000d]+$
        match_condition:
          type: string
          pattern: ^[A-Za-z0-9_!=><\-\[\],()\u0020\u0009\u000a\u000d]+$
        unmatch_condition:
          type: string
          pattern: ^[A-Za-z0-9_!=><\-\[\],()\u0020\u0009\u000a\u000d]+$
        extras: (no constraints)
        field_extras:
          type: object
          patternProperties: (keys are field names; no constraints on the values)
          additionalProperties: False
      additionalProperties: False
  decoder:
    type: object
    properties:
      namespace:
        type: string
        pattern: ^[A-Za-z][A-Za-z0-9_]*$
      process_instruction_hook:
        type: string
        pattern: ^[A-Za-z][A-Za-z0-9_]*$
    additionalProperties: False
  extras: (no constraints)
additionalProperties: False
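The identifier pattern ^[A-Za-z][A-Za-z0-9_]*$ that appears in the schema can be checked with Python's re module before a description is handed to the tool. This is a small convenience sketch; the candidate names are examples chosen for the illustration.

```python
import re

# Identifier pattern copied from the schema: a letter first, then
# letters, digits or underscores.
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")

for candidate in ("add_1", "1add", "add-1"):
    print(candidate, bool(NAME_PATTERN.match(candidate)))
# add_1 True
# 1add False
# add-1 False
```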