3.9. Object Files and Sections

The object file target is very useful for large projects because it allows multiple files to be assembled independently and then linked into the final binary at a later time. Only the small portion of the project that was modified needs to be re-assembled, rather than requiring the entire set of source code to be available to the assembler in a single assembly process. This can be particularly important if there are a large number of macros, symbol definitions, or other metadata that use resources at assembly time. By far the largest benefit, however, is keeping the source files small enough for a mere mortal to find things in them.

With multi-file projects, there needs to be a means of resolving references to symbols in other source files. These are known as external references. The addresses of these symbols cannot be known until the linker joins all the object files into a single binary. This means that the assembler must be able to output the object code without knowing the value of the symbol, which places some restrictions on the code generated by the assembler. For example, the assembler cannot generate direct page addressing for instructions that reference external symbols because the address of the symbol may not be in the direct page. Similarly, relative branches and PC-relative addressing cannot be used in their eight-bit forms. Everything that must be resolved by the linker must be assembled to use the largest address size possible to allow the linker to fill in the correct value at link time. Note that the same problem applies to absolute address references as well, even those in the same source file, because the address is not known until link time.
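As a minimal sketch of these restrictions, the fragment below references two external symbols. The names counter and printmsg are hypothetical and are assumed to be defined and exported by some other source file.

```asm
counter  extern              ; hypothetical variable defined in another file
printmsg extern              ; hypothetical routine defined in another file

         section code
start    lda   counter       ; address unknown at assembly time, so a 16-bit
                             ; extended address is used, not direct page
         lbsr  printmsg      ; long branch: the linker fills in the 16-bit
                             ; offset; the eight-bit bsr form cannot be used
         rts
         endsection
```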

In multi-file projects, it is often desirable to have code of various types grouped together in the final binary generated by the linker. The same applies to data. For the linker to do that, the pieces that are to be grouped must be tagged in some manner. This is where the concept of sections comes in. Each chunk of code or data is part of a section in the object file. Then, when the linker reads all the object files, it coalesces all sections of the same name into a single section and then considers it as a unit.
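The grouping can be pictured with two small source files, shown together here for brevity. This is only a sketch: the section names code and data and all of the symbols are hypothetical. After linking, both code fragments belong to one code section and both data fragments to one data section.

```asm
; ---- first source file (hypothetical) ----
         section code
init     clr   ready         ; ready lives in the data section below
         rts
         endsection

         section data
ready    fcb   0
         endsection

; ---- second source file (hypothetical) ----
         section code
shutdown rts
         endsection

         section data
banner   fcc   "BYE"
         endsection
```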

The existence of sections, however, raises a problem for symbols even within the same source file. Because the linker decides where each section is placed, the address of a symbol in one section is not known relative to code in a different section at assembly time. Thus, the assembler must treat symbols from different sections within the same source file in the same manner as external symbols. That is, it must leave them for the linker to resolve at link time, with all the limitations that entails.
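A short sketch of a cross-section reference within one file follows. The section and symbol names are hypothetical.

```asm
         section data
message  fcc   "HELLO"
         fcb   0
         endsection

         section code
show     ldx   #message      ; message is in a different section, so its
                             ; address is left for the linker to fill in
         rts
         endsection
```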

In the object file target mode, LWASM requires all source lines that cause bytes to be output to be inside a section. Any directives that do not cause any bytes to be output can appear outside of a section. This includes such things as EQU or RMB. Even ORG can appear outside a section. ORG, however, makes no sense within a section because it is the linker that determines the starting address of the section's code, not the assembler.
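The following sketch shows the split between what may appear outside a section and what must be inside one. The addresses and names are hypothetical.

```asm
SCREEN   equ   $0400         ; EQU outputs no bytes: allowed outside a section

         org   $0010         ; ORG is accepted outside a section
temp     rmb   2             ; RMB reserves space without outputting bytes

         section code
clear    ldx   #SCREEN       ; instructions output bytes, so they must be
         clr   ,x            ; inside a section
         rts
         endsection
```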

All symbols defined globally in the assembly process (that is, outside of any section) are local to the source file and cannot be exported. All symbols defined within a section are considered local to the source file unless explicitly exported. Symbols referenced from external source files must be declared external, either explicitly or by asking the assembler to assume that all undefined symbols are external.

It is often handy to define a number of memory addresses that will be used for data at run-time but which need not be included in the binary file. These memory addresses are not initialized until run-time, either by the program itself or by the program loader, depending on the operating environment. Such sections are often known as BSS sections. LWASM supports sections with a BSS attribute set. For such a section, the object file contains the section definition, including the symbols exported from that section and the symbols required to resolve references from the local file, but no actual code. It is illegal for any source lines within a BSS flagged section to cause any bytes to be output.
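A BSS section might look like the sketch below. The section name vars and the symbols are hypothetical; the bss flag is what marks the section as BSS.

```asm
         section vars,bss    ; "vars" is a hypothetical name, "bss" is the flag
buffer   rmb   256           ; reserves 256 bytes, outputs nothing
count    rmb   2
buffer   export              ; make buffer visible to other object files
         ; a line such as "fcb 0" would be illegal here because it
         ; causes a byte to be output
         endsection
```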

The following directives apply to section handling.

SECTION name[,flags], SECT name[,flags], .AREA name[,flags]

Instructs the assembler that the code following this directive is to be considered part of the section called name. A section name may appear multiple times, in which case it is as though all the code from all the instances of that section appeared adjacent within the source file. However, flags may only be specified on the first instance of the section.
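As an illustration of a section name appearing more than once, consider the following sketch with hypothetical names. The two code fragments are treated as though first and second had been written adjacent to each other.

```asm
         section code
first    lda   #1
         rts
         endsection

         section data
value    fcb   42
         endsection

         section code        ; same name again: continues the earlier section
second   lda   #2
         rts
         endsection
```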

flags is a comma separated list of flags. If a flag is "bss", the section will be treated as a BSS section and no statements that generate output are permitted.

If the flag is "constant", the same restrictions apply as for BSS sections. Additionally, all symbols defined in a constant section define absolute values and will not be adjusted by the linker at link time. Constant sections cannot define complex expressions for symbols; the value must be fully defined at assembly time. Additionally, multiple instances of a constant section do not coalesce into a single addressing unit; each instance starts again at offset 0.

If the section name is "bss" or ".bss" in any combination of upper and lower case, the section is assumed to be a BSS section. In that case, the flag "!bss" can be used to override this assumption.

If the section name is "_constants" or "_constant", in any combination of upper and lower case, the section is assumed to be a constant section. This assumption can be overridden with the "!constant" flag.
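The flag rules above can be sketched as follows. The names scratch, offsets, xpos, ypos and seed are hypothetical, and the values noted for xpos and ypos assume the constant section starts at offset 0 as described.

```asm
         section .bss        ; the name alone makes this a BSS section
scratch  rmb   16
         endsection

         section offsets,constant
xpos     rmb   1             ; xpos = 0, an absolute value
ypos     rmb   1             ; ypos = 1, not adjusted by the linker
         endsection

         section bss,!bss    ; named "bss", but "!bss" overrides the
seed     fcb   $5a           ; assumption so a data byte may be output
         endsection
```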

If assembly is already happening within a section, the section is implicitly ended and the new section started. This is not considered an error although it is recommended that all sections be explicitly closed.

ENDSECTION, ENDSECT

This directive ends the current section. This puts assembly outside of any sections until the next SECTION directive. ENDSECTION is the preferred form. Prior to version 3.0 of LWASM, ENDS could also be used to end a section, but as of version 3.0 it is an alias for ENDSTRUCT instead.

sym EXTERN, sym EXTERNAL, sym IMPORT

This directive defines sym as an external symbol. This directive may occur at any point in the source code. EXTERN definitions are resolved on the first pass so an EXTERN definition anywhere in the source file is valid for the entire file. The use of this directive is optional when the assembler is instructed to assume that all undefined symbols are external. In fact, in that mode, if the symbol is referenced before the EXTERN directive, an error will occur.
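A brief sketch of explicit external declarations follows; getchar and putchar are hypothetical routines assumed to be exported elsewhere. The declarations are placed at the end to show that they apply to the whole file. When the assembler is instead assuming that undefined symbols are external, this ordering would produce an error, as noted above.

```asm
         section code
main     lbsr  getchar       ; used before the declarations at the end
         lbsr  putchar
         rts
         endsection

getchar  extern              ; EXTERN and IMPORT are alternate spellings
putchar  import              ; of the same directive
```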

sym EXPORT, sym .GLOBL, EXPORT sym, .GLOBL sym

This directive defines sym as an exported symbol. This directive may occur at any point in the source code, even before the definition of the exported symbol.

Note that sym may appear as the operand or as the statement's symbol. If there is a symbol on the statement, that will take precedence over any operand that is present.
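Both placements of sym can be sketched as below, with hypothetical symbol names. The exports appear before the symbols are defined, which is permitted.

```asm
         export start        ; operand form
stop     export              ; symbol form

         section code
start    lda   #1
stop     rts
         endsection
```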

sym EXTDEP

This directive forces an external dependency on sym, even if it is never referenced anywhere else in this file.
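A minimal sketch, assuming a hypothetical symbol vectors that is exported by some other object file:

```asm
vectors  extdep              ; record a dependency on vectors even though
                             ; nothing in this file refers to it

         section code
main     rts
         endsection
```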