reference, declarationdefinition
definition → references, declarations, derived classes, virtual overrides
reference to multiple definitions → definitions
unreferenced
    1
    2
    3
    4
    5
    6
    7
    8
    9
   10
   11
   12
   13
   14
   15
   16
   17
   18
   19
   20
   21
   22
   23
   24
   25
   26
   27
   28
   29
   30
   31
   32
   33
   34
   35
   36
   37
   38
   39
   40
   41
   42
   43
   44
   45
   46
   47
   48
   49
   50
   51
   52
   53
   54
   55
   56
   57
   58
   59
   60
   61
   62
   63
   64
   65
   66
   67
   68
   69
   70
   71
   72
   73
   74
   75
   76
   77
   78
   79
   80
   81
   82
   83
   84
   85
   86
   87
   88
   89
   90
   91
   92
   93
   94
   95
   96
   97
   98
   99
  100
  101
  102
  103
  104
  105
  106
  107
  108
  109
  110
  111
  112
  113
  114
  115
  116
  117
  118
  119
  120
  121
  122
  123
  124
  125
  126
  127
  128
  129
  130
  131
  132
  133
  134
  135
  136
  137
  138
  139
  140
  141
  142
  143
  144
  145
  146
  147
  148
  149
  150
  151
  152
  153
  154
  155
  156
  157
  158
  159
  160
  161
  162
  163
  164
  165
  166
  167
  168
  169
  170
  171
  172
  173
  174
  175
  176
  177
  178
  179
  180
  181
  182
  183
  184
  185
  186
  187
  188
  189
  190
  191
  192
  193
  194
  195
  196
  197
  198
  199
  200
  201
  202
  203
  204
  205
  206
  207
  208
  209
  210
  211
  212
  213
  214
  215
  216
  217
  218
  219
  220
  221
  222
  223
  224
  225
  226
  227
  228
  229
  230
  231
  232
  233
  234
  235
  236
  237
  238
  239
  240
  241
  242
  243
  244
  245
  246
  247
  248
  249
  250
  251
  252
  253
  254
  255
  256
  257
  258
  259
  260
  261
  262
  263
  264
  265
  266
  267
  268
  269
  270
  271
  272
  273
  274
  275
  276
  277
  278
  279
  280
  281
  282
  283
  284
  285
  286
  287
  288
  289
  290
  291
  292
  293
  294
  295
  296
  297
  298
  299
  300
  301
  302
  303
  304
========
TableGen
========

.. contents::
   :local:

.. toctree::
   :hidden:

   BackEnds
   LangRef
   LangIntro
   Deficiencies

Introduction
============

TableGen's purpose is to help a human develop and maintain records of
domain-specific information.  Because there may be a large number of these
records, it is specifically designed to allow writing flexible descriptions and
for common features of these records to be factored out.  This reduces the
amount of duplication in the description, reduces the chance of error, and makes
it easier to structure domain specific information.

The core part of TableGen parses a file, instantiates the declarations, and
hands the result off to a domain-specific `backend`_ for processing.

The current major users of TableGen are :doc:`../CodeGenerator`
and the
`Clang diagnostics and attributes <http://clang.llvm.org/docs/UsersManual.html#controlling-errors-and-warnings>`_.

Note that if you work on TableGen much, and use emacs or vim, that you can find
an emacs "TableGen mode" and a vim language file in the ``llvm/utils/emacs`` and
``llvm/utils/vim`` directories of your LLVM distribution, respectively.

.. _intro:


The TableGen program
====================

TableGen files are interpreted by the TableGen program: `llvm-tblgen` available
on your build directory under `bin`. It is not installed in the system (or where
your sysroot is set to), since it has no use beyond LLVM's build process.

Running TableGen
----------------

TableGen runs just like any other LLVM tool.  The first (optional) argument
specifies the file to read.  If a filename is not specified, ``llvm-tblgen``
reads from standard input.

To be useful, one of the `backends`_ must be used.  These backends are
selectable on the command line (type '``llvm-tblgen -help``' for a list).  For
example, to get a list of all of the definitions that subclass a particular type
(which can be useful for building up an enum list of these records), use the
``-print-enums`` option:

.. code-block:: bash

  $ llvm-tblgen X86.td -print-enums -class=Register
  AH, AL, AX, BH, BL, BP, BPL, BX, CH, CL, CX, DH, DI, DIL, DL, DX, EAX, EBP, EBX,
  ECX, EDI, EDX, EFLAGS, EIP, ESI, ESP, FP0, FP1, FP2, FP3, FP4, FP5, FP6, IP,
  MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, R10, R10B, R10D, R10W, R11, R11B, R11D,
  R11W, R12, R12B, R12D, R12W, R13, R13B, R13D, R13W, R14, R14B, R14D, R14W, R15,
  R15B, R15D, R15W, R8, R8B, R8D, R8W, R9, R9B, R9D, R9W, RAX, RBP, RBX, RCX, RDI,
  RDX, RIP, RSI, RSP, SI, SIL, SP, SPL, ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7,
  XMM0, XMM1, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, XMM2, XMM3, XMM4, XMM5,
  XMM6, XMM7, XMM8, XMM9,

  $ llvm-tblgen X86.td -print-enums -class=Instruction 
  ABS_F, ABS_Fp32, ABS_Fp64, ABS_Fp80, ADC32mi, ADC32mi8, ADC32mr, ADC32ri,
  ADC32ri8, ADC32rm, ADC32rr, ADC64mi32, ADC64mi8, ADC64mr, ADC64ri32, ADC64ri8,
  ADC64rm, ADC64rr, ADD16mi, ADD16mi8, ADD16mr, ADD16ri, ADD16ri8, ADD16rm,
  ADD16rr, ADD32mi, ADD32mi8, ADD32mr, ADD32ri, ADD32ri8, ADD32rm, ADD32rr,
  ADD64mi32, ADD64mi8, ADD64mr, ADD64ri32, ...

The default backend prints out all of the records. There is also a general
backend which outputs all the records as a JSON data structure, enabled using
the `-dump-json` option.

If you plan to use TableGen, you will most likely have to write a `backend`_
that extracts the information specific to what you need and formats it in the
appropriate way. You can do this by extending TableGen itself in C++, or by
writing a script in any language that can consume the JSON output.

Example
-------

With no other arguments, `llvm-tblgen` parses the specified file and prints out all
of the classes, then all of the definitions.  This is a good way to see what the
various definitions expand to fully.  Running this on the ``X86.td`` file prints
this (at the time of this writing):

.. code-block:: text

  ...
  def ADD32rr {   // Instruction X86Inst I
    string Namespace = "X86";
    dag OutOperandList = (outs GR32:$dst);
    dag InOperandList = (ins GR32:$src1, GR32:$src2);
    string AsmString = "add{l}\t{$src2, $dst|$dst, $src2}";
    list<dag> Pattern = [(set GR32:$dst, (add GR32:$src1, GR32:$src2))];
    list<Register> Uses = [];
    list<Register> Defs = [EFLAGS];
    list<Predicate> Predicates = [];
    int CodeSize = 3;
    int AddedComplexity = 0;
    bit isReturn = 0;
    bit isBranch = 0;
    bit isIndirectBranch = 0;
    bit isBarrier = 0;
    bit isCall = 0;
    bit canFoldAsLoad = 0;
    bit mayLoad = 0;
    bit mayStore = 0;
    bit isImplicitDef = 0;
    bit isConvertibleToThreeAddress = 1;
    bit isCommutable = 1;
    bit isTerminator = 0;
    bit isReMaterializable = 0;
    bit isPredicable = 0;
    bit hasDelaySlot = 0;
    bit usesCustomInserter = 0;
    bit hasCtrlDep = 0;
    bit isNotDuplicable = 0;
    bit hasSideEffects = 0;
    InstrItinClass Itinerary = NoItinerary;
    string Constraints = "";
    string DisableEncoding = "";
    bits<8> Opcode = { 0, 0, 0, 0, 0, 0, 0, 1 };
    Format Form = MRMDestReg;
    bits<6> FormBits = { 0, 0, 0, 0, 1, 1 };
    ImmType ImmT = NoImm;
    bits<3> ImmTypeBits = { 0, 0, 0 };
    bit hasOpSizePrefix = 0;
    bit hasAdSizePrefix = 0;
    bits<4> Prefix = { 0, 0, 0, 0 };
    bit hasREX_WPrefix = 0;
    FPFormat FPForm = ?;
    bits<3> FPFormBits = { 0, 0, 0 };
  }
  ...

This definition corresponds to the 32-bit register-register ``add`` instruction
of the x86 architecture.  ``def ADD32rr`` defines a record named
``ADD32rr``, and the comment at the end of the line indicates the superclasses
of the definition.  The body of the record contains all of the data that
TableGen assembled for the record, indicating that the instruction is part of
the "X86" namespace, the pattern indicating how the instruction is selected by
the code generator, that it is a two-address instruction, has a particular
encoding, etc.  The contents and semantics of the information in the record are
specific to the needs of the X86 backend, and are only shown as an example.

As you can see, a lot of information is needed for every instruction supported
by the code generator, and specifying it all manually would be unmaintainable,
prone to bugs, and tiring to do in the first place.  Because we are using
TableGen, all of the information was derived from the following definition:

.. code-block:: text

  let Defs = [EFLAGS],
      isCommutable = 1,                  // X = ADD Y,Z --> X = ADD Z,Y
      isConvertibleToThreeAddress = 1 in // Can transform into LEA.
  def ADD32rr  : I<0x01, MRMDestReg, (outs GR32:$dst),
                                     (ins GR32:$src1, GR32:$src2),
                   "add{l}\t{$src2, $dst|$dst, $src2}",
                   [(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;

This definition makes use of the custom class ``I`` (extended from the custom
class ``X86Inst``), which is defined in the X86-specific TableGen file, to
factor out the common features that instructions of its class share.  A key
feature of TableGen is that it allows the end-user to define the abstractions
they prefer to use when describing their information.

Syntax
======

TableGen has a syntax that is loosely based on C++ templates, with built-in
types and specification. In addition, TableGen's syntax introduces some
automation concepts like multiclass, foreach, let, etc.

Basic concepts
--------------

TableGen files consist of two key parts: 'classes' and 'definitions', both of
which are considered 'records'.

**TableGen records** have a unique name, a list of values, and a list of
superclasses.  The list of values is the main data that TableGen builds for each
record; it is this that holds the domain specific information for the
application.  The interpretation of this data is left to a specific `backend`_,
but the structure and format rules are taken care of and are fixed by
TableGen.

**TableGen definitions** are the concrete form of 'records'.  These generally do
not have any undefined values, and are marked with the '``def``' keyword.

.. code-block:: text

  def FeatureFPARMv8 : SubtargetFeature<"fp-armv8", "HasFPARMv8", "true",
                                        "Enable ARMv8 FP">;

In this example, FeatureFPARMv8 is ``SubtargetFeature`` record initialised
with some values. The names of the classes are defined via the
keyword `class` either on the same file or some other included. Most target
TableGen files include the generic ones in ``include/llvm/Target``.

**TableGen classes** are abstract records that are used to build and describe
other records.  These classes allow the end-user to build abstractions for
either the domain they are targeting (such as "Register", "RegisterClass", and
"Instruction" in the LLVM code generator) or for the implementor to help factor
out common properties of records (such as "FPInst", which is used to represent
floating point instructions in the X86 backend).  TableGen keeps track of all of
the classes that are used to build up a definition, so the backend can find all
definitions of a particular class, such as "Instruction".

.. code-block:: text

 class ProcNoItin<string Name, list<SubtargetFeature> Features>
       : Processor<Name, NoItineraries, Features>;

Here, the class ProcNoItin, receiving parameters `Name` of type `string` and
a list of target features is specializing the class Processor by passing the
arguments down as well as hard-coding NoItineraries.

**TableGen multiclasses** are groups of abstract records that are instantiated
all at once.  Each instantiation can result in multiple TableGen definitions.
If a multiclass inherits from another multiclass, the definitions in the
sub-multiclass become part of the current multiclass, as if they were declared
in the current multiclass.

.. code-block:: text

  multiclass ro_signed_pats<string T, string Rm, dag Base, dag Offset, dag Extend,
                          dag address, ValueType sty> {
  def : Pat<(i32 (!cast<SDNode>("sextload" # sty) address)),
            (!cast<Instruction>("LDRS" # T # "w_" # Rm # "_RegOffset")
              Base, Offset, Extend)>;

  def : Pat<(i64 (!cast<SDNode>("sextload" # sty) address)),
            (!cast<Instruction>("LDRS" # T # "x_" # Rm # "_RegOffset")
              Base, Offset, Extend)>;
  }

  defm : ro_signed_pats<"B", Rm, Base, Offset, Extend,
                        !foreach(decls.pattern, address,
                                 !subst(SHIFT, imm_eq0, decls.pattern)),
                        i8>;



See the :doc:`TableGen Language Introduction <LangIntro>` for more generic
information on the usage of the language, and the
:doc:`TableGen Language Reference <LangRef>` for more in-depth description
of the formal language specification.

.. _backend:
.. _backends:

TableGen backends
=================

TableGen files have no real meaning without a back-end. The default operation
of running ``llvm-tblgen`` is to print the information in a textual format, but
that's only useful for debugging of the TableGen files themselves. The power
in TableGen is, however, to interpret the source files into an internal 
representation that can be generated into anything you want.

Current usage of TableGen is to create huge include files with tables that you
can either include directly (if the output is in the language you're coding),
or be used in pre-processing via macros surrounding the include of the file.

Direct output can be used if the back-end already prints a table in C format
or if the output is just a list of strings (for error and warning messages).
Pre-processed output should be used if the same information needs to be used
in different contexts (like Instruction names), so your back-end should print
a meta-information list that can be shaped into different compile-time formats.

See the `TableGen BackEnds <BackEnds.html>`_ for more information.

TableGen Deficiencies
=====================

Despite being very generic, TableGen has some deficiencies that have been
pointed out numerous times. The common theme is that, while TableGen allows
you to build Domain-Specific-Languages, the final languages that you create
lack the power of other DSLs, which in turn increase considerably the size
and complexity of TableGen files.

At the same time, TableGen allows you to create virtually any meaning of
the basic concepts via custom-made back-ends, which can pervert the original
design and make it very hard for newcomers to understand the evil TableGen
file.

There are some in favour of extending the semantics even more, but making sure
back-ends adhere to strict rules. Others are suggesting we should move to less,
more powerful DSLs designed with specific purposes, or even re-using existing
DSLs.

Either way, this is a discussion that will likely span across several years,
if not decades. You can read more in the `TableGen Deficiencies <Deficiencies.html>`_
document.