CODEGEN: Code Generation Utility
Written by Patrick Doyle.
codegen-v1.tgz (22,357 bytes) - source code
http://www.ecf.toronto.edu/~doylep/ (Patrick Doyle's home page)
All files in this package are copyright (c) Patrick Doyle 1999, and are covered under the Eiffel Forum Freeware License. See forum.txt for details.
Codegen was developed under Red Hat Linux 6.0 using SmallEiffel -0.78 and Gobo 1.5. Other platforms and tools may or may not work.
Codegen is a utility which (surprise!) generates code. Specifically, it reads a "model", which is a description of the types of objects to be manipulated by the generated code; and a set of "scripts", which describe how to traverse the model to generate code.
The model files use a language somewhat like a stripped-down Eiffel, to describe the types of items and their relationships. A typical model for a procedural language might contain something like this:

    ROUTINE
        formal_arg$: VARIABLE   -- '$' means one-to-many relation
        body: BLOCK
    end

    FUNCTION < ROUTINE
        result_variable: VARIABLE
    end

    FUNCTION_CALL < EXPRESSION
        target: FUNCTION
        actual_arg$: EXPRESSION
    end
The UPPERCASE names are type names; each describes a type of entity found in programs written in this fictitious procedural language. The lowercase names are "relations" (similar to Eiffel features). A dollar-sign ("$") at the end of a relation name makes it a "plural" (i.e. one-to-many) relation. (It's supposed to be reminiscent of the 's' at the end of a plural noun.)
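To make the notation concrete, here is a minimal Python sketch of reading such a model into a dictionary. It is not Codegen's parser (which is written in Eiffel with gelex and geyacc); the function name parse_model, the dictionary layout, and the one-declaration-per-line assumption are all invented for illustration, and attributes are not handled.

```python
import re

def parse_model(text):
    """Parse a model into {type_name: {"supertype": ..., "relations": [...]}}."""
    types = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("--", 1)[0].strip()  # drop Eiffel-style comments
        if not line:
            continue
        if line == "end":
            current = None
        elif current is None:
            # Type header: NAME or NAME < SUPERTYPE
            m = re.fullmatch(r"([A-Z_]+)(?:\s*<\s*([A-Z_]+))?", line)
            if m is None:
                raise ValueError(f"bad type header: {line!r}")
            current = {"supertype": m.group(2), "relations": []}
            types[m.group(1)] = current
        else:
            # Relation: name: TYPE, with a trailing '$' marking one-to-many
            m = re.fullmatch(r"([a-z_]+)(\$?)\s*:\s*([A-Z_]+)", line)
            if m is None:
                raise ValueError(f"bad relation: {line!r}")
            current["relations"].append(
                {"name": m.group(1), "plural": m.group(2) == "$", "type": m.group(3)}
            )
    return types

MODEL = """
ROUTINE
    formal_arg$: VARIABLE   -- '$' means one-to-many relation
    body: BLOCK
end

FUNCTION < ROUTINE
    result_variable: VARIABLE
end
"""

model = parse_model(MODEL)
```

Here model["FUNCTION"]["supertype"] holds the name "ROUTINE", and the formal_arg relation is flagged as plural while body is not.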
Once Codegen has read a model, it runs one or more scripts. Each script describes how to traverse the model, and what code should be generated for each part of it. As a matter of fact, Codegen is used to generate some of its own code; the classes involved in the model and script structures are all generated by Codegen itself.
Currently, Codegen is only targeted to SmallEiffel -0.78 under Linux, and has not been tested on any other platform.
Codegen is based on two instances of the Visitor pattern: a model visitor and a script visitor. Codegen parses a model and builds a model structure representing it. It can then use a model visitor to perform whatever action is desired for that model. For instance, one of the visitors is a "name resolver", which takes named references to each entity (such as the supertype of a type) and replaces them with actual references to the object which represents that entity.
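The name-resolver idea can be sketched in a few lines of Python. This is only an illustration of the Visitor pattern applied to the supertype example above; the class and method names (ModelType, Model, NameResolver, visit_type and so on) are hypothetical and do not come from Codegen's Eiffel source.

```python
class ModelType:
    def __init__(self, name, supertype_name=None):
        self.name = name
        self.supertype_name = supertype_name  # named reference, as parsed
        self.supertype = None                 # actual reference, filled in later

    def accept(self, visitor):
        visitor.visit_type(self)


class Model:
    def __init__(self, types):
        self.types = {t.name: t for t in types}

    def accept(self, visitor):
        visitor.visit_model(self)


class NameResolver:
    """Model visitor that turns supertype names into object references."""

    def visit_model(self, model):
        self.model = model
        for t in model.types.values():
            t.accept(self)

    def visit_type(self, t):
        if t.supertype_name is not None:
            # A KeyError here means the model names a type it never defines.
            t.supertype = self.model.types[t.supertype_name]


model = Model([ModelType("ROUTINE"), ModelType("FUNCTION", "ROUTINE")])
model.accept(NameResolver())
```

After the visit, the FUNCTION entry points directly at the ROUTINE object rather than at its name.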
Codegen then does a similar thing with scripts, parsing them and building a structure representing them.
The key class for code generation is the SCRIPT_RUNNER class. It is both a model visitor *and* a script visitor. It traverses the script and the model together in order to generate the appropriate code.
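One way to picture a class that is both kinds of visitor at once is the Python sketch below. Everything in it is an assumption made for illustration: the script node kinds (Literal, AttributeRef, ForEachRelation), the model classes, and the method names are invented, and the real SCRIPT_RUNNER may be organised quite differently.

```python
class Literal:
    def __init__(self, text):
        self.text = text

    def accept(self, visitor):
        visitor.visit_literal(self)


class AttributeRef:
    def __init__(self, name):
        self.name = name

    def accept(self, visitor):
        visitor.visit_attribute_ref(self)


class ForEachRelation:
    def __init__(self, body):
        self.body = body

    def accept(self, visitor):
        visitor.visit_for_each_relation(self)


class Relation:
    def __init__(self, **attrs):
        self.attrs = attrs

    def accept(self, visitor):
        visitor.visit_relation(self)


class Type:
    def __init__(self, name, relations):
        self.attrs = {"name": name}
        self.relations = relations

    def accept(self, visitor):
        visitor.visit_type(self)


class ScriptRunner:
    """Visits script nodes and model nodes in step, collecting output text."""

    def __init__(self, script):
        self.script = script
        self.out = []
        self.entity = None   # model entity currently in scope
        self.body = None     # script nodes to run for each relation

    def run(self, nodes):
        for node in nodes:
            node.accept(self)

    # Model visitor side
    def visit_type(self, t):
        self.entity = t
        self.run(self.script)

    def visit_relation(self, rel):
        outer, self.entity = self.entity, rel
        self.run(self.body)
        self.entity = outer

    # Script visitor side
    def visit_literal(self, node):
        self.out.append(node.text)

    def visit_attribute_ref(self, node):
        self.out.append(self.entity.attrs[node.name])

    def visit_for_each_relation(self, node):
        outer_body, self.body = self.body, node.body
        for rel in self.entity.relations:
            rel.accept(self)  # hand control to the model visitor side
        self.body = outer_body


script = [
    Literal("class "), AttributeRef("name"), Literal("\n"),
    ForEachRelation([
        Literal("    "), AttributeRef("name"), Literal(": "),
        AttributeRef("type_name"), Literal("\n"),
    ]),
    Literal("end\n"),
]
routine = Type("ROUTINE", [Relation(name="body", type_name="BLOCK")])
runner = ScriptRunner(script)
routine.accept(runner)
generated = "".join(runner.out)
```

The point of the sketch is the hand-off: a script node that iterates calls accept on model objects, and the model-side visit method in turn runs script nodes, so one object walks both structures together.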
As for where to find things:
The root class for Codegen is ./programs/codegen.e. Make "gobo" a symbolic link to the root directory of your Gobo 1.5 installation.
Code related to the model files is kept in the ./model directory, and that for the script files is kept in ./script. All generated code--recall that codegen generates some of its own code--ends up in the ./gen directory; thus, there is no particular need to inspect this code; simply look at the corresponding model and script files (which have .cgm and .cgs extensions). However, the generated code is provided in this distribution to allow Codegen to be bootstrapped.
Classes which serve as root classes of a program are kept in ./programs in order to make Makefiles simpler. The script directory has a "conditions" subdirectory which contains all the classes used to model if-statement conditions in scripts.
Note that the test programs, scripts, and models are not guaranteed to be up-to-date. Time constraints prevent me from maintaining these files properly. Simply using Codegen to generate its own code provides a more rigorous test anyway; however, that is no excuse for having no regression-testing capabilities in place.
As explained above, the model files describe the types of things manipulated by the generated code, and their relations. Also, attributes (i.e. name-value pairs) can be specified for a particular relation or type, or for the model as a whole. Attribute definitions are preceded by the "@" symbol (which can be understood as "at", short for "attribute"), as in "@name=Patrick". These attributes can be used by scripts to customize code generation.
There are some predefined attributes in every model. For instance, relations have built-in "name" and "type_name" attributes. They also have a "plural" attribute, which is the pluralized version of the relation name, generated by replacing the dollar-sign symbol with an "s". Any built-in attribute can be overridden by an explicit attribute definition. For instance:

    STUDENT
        class$: CLASS @plural=classes
    end
This definition overrides the default plural "classs" with the more appropriate "classes".
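The default-then-override rule can be sketched in Python as follows. The function name relation_attributes is hypothetical, and two details are assumptions rather than documented behaviour: that the built-in "name" attribute omits the dollar sign, and that a relation without a dollar sign gets its unchanged name as its plural.

```python
def relation_attributes(declared_name, type_name, overrides=None):
    """Built-in attributes for a relation, with explicit definitions winning."""
    attrs = {
        # Assumption: the "name" attribute does not include the '$' marker.
        "name": declared_name.rstrip("$"),
        "type_name": type_name,
        # Default plural: the dollar sign is replaced with an "s".
        "plural": declared_name.replace("$", "s"),
    }
    attrs.update(overrides or {})  # explicit @attribute definitions take priority
    return attrs


default = relation_attributes("class$", "CLASS")
overridden = relation_attributes("class$", "CLASS", {"plural": "classes"})
```

With no override the plural comes out as "classs"; passing the explicit definition yields "classes" instead.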
A type can inherit from another type, in which case it (conceptually) gets all the supertype's relations and attributes. However, when executing a script, those relations and attributes are not included in the model traversal; this provides a means for scripts to deal with only the *new* relations and attributes added by the subtype. Scripts can make use of recursion to access the relations and attributes of supertypes.
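The difference between what a traversal sees and what a recursive script can reach might be sketched like this in Python. The dictionary layout and both function names are assumptions for illustration only.

```python
def own_relations(model, type_name):
    """Relations declared directly on the type: what a traversal would see."""
    return list(model[type_name]["relations"])


def all_relations(model, type_name):
    """Own relations plus inherited ones, gathered by recursing upward."""
    entry = model[type_name]
    inherited = []
    if entry["supertype"] is not None:
        inherited = all_relations(model, entry["supertype"])
    return inherited + own_relations(model, type_name)


MODEL = {
    "ROUTINE": {"supertype": None, "relations": ["formal_arg", "body"]},
    "FUNCTION": {"supertype": "ROUTINE", "relations": ["result_variable"]},
}
```

For FUNCTION, own_relations returns only result_variable, while all_relations walks up to ROUTINE first and returns all three.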
As I write this, Codegen scripts use a rather ugly, but simple, syntax. There are two kinds of content in a script: literals, and "meta"-content. Meta-content is the term I use for the commands embedded in scripts to direct the traversal of, and access to, the model.
Scripts begin in literal content by default. There are two ways to introduce meta-content:
- Start with a backquote (`) and end with an apostrophe (')
- Start with a dollar-sign ($) and end with a newline
There is no practical difference between these; the parser treats them identically. Use whichever is more convenient.
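The two delimiter styles can be illustrated with a small Python splitter that separates a script into literal and meta segments. This is a sketch, not Codegen's lexer: the function name split_script is invented, no escaping of literal backquotes or dollar signs is modelled, and it assumes that the closing apostrophe or newline is consumed along with the meta-content. The meta text in the example is a placeholder, not real Codegen script syntax.

```python
def split_script(text):
    """Split a script into ("literal", s) and ("meta", s) segments."""
    segments = []
    literal = []
    i = 0

    def flush():
        # Emit any literal text collected so far as one segment.
        if literal:
            segments.append(("literal", "".join(literal)))
            literal.clear()

    while i < len(text):
        ch = text[i]
        if ch == "`" or ch == "$":
            closer = "'" if ch == "`" else "\n"
            end = text.find(closer, i + 1)
            if end == -1:
                if ch == "`":
                    raise ValueError("unterminated backquote")
                end = len(text)  # assumption: '$' meta may run to end of file
            flush()
            segments.append(("meta", text[i + 1:end]))
            i = end + 1  # assumption: the closing character is consumed
        else:
            literal.append(ch)
            i += 1
    flush()
    return segments


segments = split_script("class `name'\n$KEYWORD arg\nend\n")
```

Both delimiter styles produce the same kind of "meta" segment, which matches the statement that the parser treats them identically.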
Literal content is always copied as-is to the output file, whitespace and all. Meta-content looks more like a normal programming language, and until I write some better documentation, the best advice is to look at the .cgs files included with this project to get an idea of what they can do.
Note that script files, as well as model files, *are* case-sensitive, in the sense that the case used for different constructs is mandated. For instance, keywords in scripts are all-caps, and attribute names are lowercase.
Codegen evolved from an earlier tool called "Semgen" (for Semantic Generator), which was used to generate some of the code for compilers based on a semantic description of the language being compiled. Semgen was developed in October 1998 during my undergraduate thesis project which involved writing a compiler for a language that we were making up as we went along. Thus, it was important to make the language's semantics as flexible as possible; and by distilling the semantics into a model description file, and generating the actual Eiffel code from that file, it became much easier to change the language as we needed.
As the project progressed, it became clear that the Semgen program itself could benefit from the same code generation. After all, Semgen was, in a sense, a compiler which translated model descriptions into Eiffel. Thus, I rewrote Semgen using itself to generate much of its own code. This simplified Semgen's maintenance dramatically.
The major drawback of Semgen was that the actual Eiffel code being generated was hard-coded into Semgen, in the form of hundreds of manifest strings. Since Semgen relied on itself (or, more precisely, a previous version of itself) to generate its own source code, it became cumbersome to modify the generated code. The idea occurred to me to store the strings in a file that I could edit.
To make a long story short, I soon realized that a more generalized traversal/code-generation scripting language would be much better than simple strings; and, using Eric Bezault's Gobo tools, it would not be too hard to implement either. However, time was pressing on my thesis project, so Semgen was a low priority.
A few months after I graduated, I became interested in writing some tools to process Eiffel source code, and I knew that Semgen would be useful. At that point, I had the time to write it properly, so I changed the name to Codegen and started again from scratch.
In the first phase of development, I hacked together hand-written versions of the classes in Codegen. Once that first phase was finished, I wrote metamodels and scripts describing the Codegen model and script files themselves, and used Codegen to generate its own source code. As I write this, I have just completed this phase.
The structure of Codegen's source code is now set, and the major features are in place.
There are several aspects of Codegen which are less than perfect, and I hope to address them soon:
- The code should make use of the Gobo data structure libraries.
- The syntax of the script files seemed good at first, but the actual script files are hard to read. I want to rework the syntax and make the scripts cleaner and easier to maintain.
- Inheritance is not yet implemented for models or scripts.
- A "separator" feature is needed. When iterating over items in a collection and generating code from them, it is difficult to specify text that should go between the items. (Before is easy; after is easy; between is hard.) For instance, currently I have to resort to putting a comma *after* each argument, and then adding a "dummy: NONE" argument so the result is valid Eiffel. This is unacceptable.
- Script parser error handling is extremely weak. Basically, I have made no effort to handle script parse errors gracefully. I will modify the grammar and other parser code to improve this as much as possible.
- Currently, script macros and attributes inhabit two different namespaces, and it's possible to define a macro and an attribute with the same name, in which case they would be difficult to distinguish. I should rectify this somehow.
- Some of the terminology is lame. For instance, a relation has a "type" and an "outer type". What the heck are these supposed to mean? Instead, I should borrow from mathematics, and talk about the "domain" and "range" types of a relation.
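The "separator" item in the list above is easiest to see side by side. The Python sketch below contrasts the trailing-comma workaround with what a separator feature would produce; the function names and the exact shape of the generated text are assumptions for illustration.

```python
def with_trailing_commas(names):
    """Current workaround: a comma after every item, then a dummy argument."""
    return "".join(f"{name}, " for name in names) + "dummy: NONE"


def with_separator(names, separator=", "):
    """What a separator feature would do: emit text only between items."""
    return separator.join(names)


names = ["first", "second"]
workaround = with_trailing_commas(names)
wanted = with_separator(names)
```

The workaround yields "first, second, dummy: NONE", while the separator version yields "first, second" with nothing extra to clean up.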
Eventually, too, I hope to remove the hard-coded metamodel and allow that to be specified. That is, model files currently consist of types, relations, and attributes, and this fact is hard-coded into Codegen; however, I'd like to make this metamodel part of the input to Codegen. The trouble is that the parsing of the model files is based on the metamodel, so changing the metamodel means changing the parser, which is difficult. I'm currently exploring different solutions to this problem.
Thanks to Eric Bezault once again for his excellent Gobo library and tools. I would not have bothered with this project without gelex and geyacc.
Thanks also to Geoff Eldridge for encouraging me to finish Codegen and submit it to the Eiffel Struggle.
If you have questions or comments, please email me at firstname.lastname@example.org.
(c) Patrick Doyle Oct 4 1999