Quickstart tutorial

You can generate a machine code decoder with mcdecoder by defining machine code specifications. In this tutorial, you’ll see how to define the specifications and generate a decoder from them.

The tutorial consists of the following steps:

  1. Introduce an example instruction encoding to be decoded

  2. Write an MC description to express the encoding

  3. Check if the MC description is working

  4. Generate a decoder from the MC description

  5. Run the decoder from a C client

1. Introduce an example instruction encoding to be decoded

In this tutorial, we use the ARM instructions below as an example.

  • ADD (immediate, ARM) Encoding A1

  • PUSH Encoding A1

For simplicity, we ignore instruction matching conditions that depend on field values (e.g. Rn == 0b1111 and S == 0).

The instruction encoding of ADD (immediate, ARM) Encoding A1

        MSB                                          LSB
Bits    31-28  27-26  25  24-21    20  19-16  15-12  11-00
Value   cond   0 0    1   0 1 0 0  S   Rn     Rd     imm12

The instruction encoding of PUSH Encoding A1

        MSB                                  LSB
Bits    31-28  27-22        21  20  19-16    15-00
Value   cond   1 0 0 1 0 0  1   0   1 1 0 1  register_list

2. Write an MC description to express the encoding

Write the machine code specifications in a file. In mcdecoder, this file is called a machine code description file, or MC description file for short. It is written in YAML.

Add one element to the instructions sequence for each instruction. Give each instruction a name and define its encoding in format, following the encodings introduced above. See MC description specification for the grammar of MC descriptions.

Create arm.yaml with the following content.

arm.yaml
machine:
  byteorder: little
instructions:
  - name: add_immediate_a1
    format: xxxx:cond|00|1|0100|x:S|xxxx:Rn|xxxx:Rd|xxxx xxxx xxxx:imm12
  - name: push_a1
    format: xxxx:cond|1001 00|1|0|1101|xxxx xxxx xxxx xxxx:register_list

3. Check if the MC description is working

Now you have a minimal MC description. Let’s check that it works. To do so, use the mcdecoder emulate command, which emulates the behavior of a decoder.

As an example, input the machine code e28db004. It represents add FP, SP, #4 in ARM assembly language, so the fields in its encoding should be:

  • Rn = 13 (which is R13 or SP)

  • Rd = 11 (which is R11 or FP)

  • imm12 = 4

Run the command:

mcdecoder emulate arm.yaml --input e28db004

Its output will be:

instruction: add_immediate_a1

cond: 14, 0xe, 0b1110
S: 0, 0x0, 0b0
Rn: 13, 0xd, 0b1101
Rd: 11, 0xb, 0b1011
imm12: 4, 0x4, 0b100

The output matches the expected field values, so the MC description works.

See Command line option specification for more information about the emulate sub-command.

4. Generate a decoder from the MC description

Run the mcdecoder generate command to generate a decoder from the MC description.

mcdecoder generate --output out arm.yaml

You’ll get the generated files below:

out
├── mcdecoder.c
└── mcdecoder.h

See Command line option specification for more details about the generate sub-command.

5. Run the decoder from a C client

Create a C client to test the generated decoder. The client calls the decoder API DecodeInstruction:

bool DecodeInstruction(const DecodeRequest *request, DecodeResult *result);

In the client code, input the machine code e28db004 as you did with mcdecoder emulate and check that the result is the same.

Create the following C client code.

client.c
#include <stdio.h>
#include "out/mcdecoder.h"

int main(void) {
  /* Machine codes to be decoded */
  const uint8_t kMachineCodes[] = {
      0x04, 0xB0, 0x8D, 0xE2, /* add FP, SP, #4 */
  };

  /* Decode an instruction */
  DecodeRequest request;
  DecodeResult result;
  bool succeeded;

  request.codes = &kMachineCodes[0];
  succeeded = DecodeInstruction(&request, &result);

  /* Decoding succeeded? */
  if (!succeeded) {
    printf("Decoding failed.\n");

  } else {
    printf("Decoding succeeded.\n");

    /* Which instruction is decoded? */
    switch (result.instruction_id) {
      case InstructionId_k_add_immediate_a1:
        /* Get the decoded result of add_immediate_a1 */
        printf("Instruction: add_immediate_a1\n");
        printf("Rn: %d\nRd: %d\nimm12: %d\n",
               result.instruction.add_immediate_a1.Rn,
               result.instruction.add_immediate_a1.Rd,
               result.instruction.add_immediate_a1.imm12);
        break;
      case InstructionId_k_push_a1:
        /* Handle push_a1 */
        break;
      case InstructionId_kUnknown:
        /* Handle an unknown instruction */
        break;
      default:
        break;
    }
  }

  return 0;
}

Now compile and execute the client code to get the decoding result.

gcc client.c out/mcdecoder.c
./a.out

The result will be:

Decoding succeeded.
Instruction: add_immediate_a1
Rn: 13
Rd: 11
imm12: 4

Good! The result matches the output of mcdecoder emulate. This concludes the tutorial.

See MC decoder API specification for more details about the decoder API. The example files used in this tutorial are available on GitHub.

What’s next?