
Pipework


Written by Simon Parker.

pipe_1.zip (112363 bytes)


Motivation

Unix promotes the reuse of other people’s work in many ways. One of the most powerful features is command line redirection.

Using the characters ‘<’ and ‘>’ you can arrange for a process to read from a file rather than the keyboard, or write to a file rather than the screen. The mechanism is external to the software; the application developer merely deals with ‘standard input’ and ‘standard output’.

The ‘|’ character links together the standard output of one process with the standard input of another to form a ‘pipe’. The processes run at the same time, and whatever the first process writes is immediately available to the second process.

This environment encourages a new kind of program, known as a ‘filter’ or ‘pipe fitting’. A filter reads from standard input and writes to standard output, changing the data on the way. It needs no file management or user interface, and expects to work with other filters.

This approach to software development is attractive to reuse-conscious developers. Filters have tight cohesion but loose coupling. The most mature filters are well-documented and widely known. You can build a filter without following any complex rules.

On the other hand, the approach has some disadvantages. It needs the support and co-operation of the operating system (Unix in practice). The components are coarse-grained and must operate as separate processes. The interface between components is a single typeless byte stream, and control is limited to command-line arguments read at startup.

Pipework tries to bring the benefits of this technology to application software design, using the power of Eiffel to overcome the limitations of the Unix model.

Objectives

Pipework is intended to serve a wide variety of applications in many ways. It is a framework for whole applications, but individual components may be used alone without fuss.

The fittings are mostly lightweight, and using them imposes no severe burden on runtime performance or developer learning time. The library incorporates a few simple protocols, which can be learnt completely and remembered easily.

The components are Eiffel classes, mostly generic. They are used at compile time, assembled into applications using Eiffel statements rather than ‘*Script’ (where * is Shell-, VB-, Java- or this month’s favourite). Although there is no explicit support for concurrency, the general architecture seems well suited to parallel execution.

The compiler can ensure that applications are type-safe. As you connect fittings together, the type of objects emitted by one fitting must be acceptable to its neighbour. Type is preserved through a pipe without resorting to casts, string parsing or unnecessary conversions. Eiffel’s carefully integrated support for genericity ensures the result is readable and comfortable to use.

The library is extensible in two ways: by composition and by inheritance. If you find a particular sequence of operations crops up frequently you can encapsulate the sequence in a new class. If you need a transformation or monitoring operation which is not provided you can add it to the hierarchy. The place where the new class belongs will be clear, and you will find inherited features which make it easy to write.

Although the program text has not been extensively used in live projects, most of the library has been around since 1993 and the concepts have been applied in different circumstances. The text has been ported between development environments. This version is built upon the standard library (ELKS 95), and is intended to be acceptable to any Eiffel compiler.

Example

Here is a routine which returns the number of lines in a file.

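lines_in_file (filename: STRING): INTEGER is
   -- The number of lines in file ‘filename’
   local
      file: FILE
      freader: FILE_STRING_SOURCE
      counter: SOURCE_COUNTER [STRING]
      pump: PIPE_PUMP [STRING]
   do
      -- Open the file (ELKS 95 creation procedure), then build the inlet pipe on it.
      !!file.make_open_read (filename)
      !!freader.connect (file)
      !!counter.connect (freader)
      -- The pump pulls every line through the counter.
      !!pump.connect_inlet (counter)
      pump.run
      Result := counter.count
      -- Pipework does not close files; the caller must.
      file.close
   end -- lines_in_file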

The entity ‘file’ is not part of pipework. The FILE_STRING_SOURCE is a typical boundary component, an adapter used to add the ‘reader’ protocol to an external class.

‘Counter’ is a simple pipe fitting which counts the objects passing through it.

The file, reader and counter are passive. 'Pump' provides power to the arrangement. The statement ‘pump.run’ moves lines through the pipe until there are no more, then stops. At that point ‘counter.count’ contains the number of lines which have been seen.

Architecture

Pipes are passive. The components which make up the pipe don't act on their own; they wait until something calls them. The active component which causes movement is a pump.

A typical pipework system consists of a source, some fittings connected to make an inlet pipe, a pump, more fittings making an outlet pipe, and perhaps a drain at the end.

The pump reads objects from the inlet pipe and writes them to the outlet pipe. Each cycle of the pump consists of one read and one write operation. Each read pulls an object from the original source, through any number of intervening pipe fittings, and into the pump. Each write pushes an object through any number of pipe fittings out to the ultimate destination.
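
The cycle described above can be sketched as a short class. This is only an illustration, not the text of PIPE_PUMP: the class name PUMP_SKETCH and its features 'make', 'inlet' and 'outlet' are hypothetical, and the sketch assumes that READER and WRITER are generic classes offering the features 'read', 'last_read', 'is_at_end', 'write' and 'flush' described later on this page.

```eiffel
class PUMP_SKETCH [G]
      -- Hypothetical stand-in for a pump, for illustration only.

creation
   make

feature

   inlet: READER [G]
         -- Last fitting of the inlet pipe (READER assumed generic)

   outlet: WRITER [G]
         -- First fitting of the outlet pipe; may be Void

   make (an_inlet: READER [G]; an_outlet: WRITER [G]) is
         -- Place the pump between `an_inlet' and `an_outlet'.
      require
         inlet_exists: an_inlet /= Void
      do
         inlet := an_inlet
         outlet := an_outlet
      end

   run is
         -- Move every object from `inlet' to `outlet', then flush.
      do
         from
            inlet.read
         until
            inlet.is_at_end
         loop
            -- One cycle: the object just read is written, then the next is read.
            if outlet /= Void then
               outlet.write (inlet.last_read)
            end
            inlet.read
         end
         if outlet /= Void then
            outlet.flush
         end
      end

end -- class PUMP_SKETCH
```

A void outlet is tolerated here so that the pump can drive an inlet pipe on its own, as in the line-counting example.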

Conventional batch programs tend to centre around a main loop during which some input and some output processing gets done. The pipework architecture is simpler, since the application's work is done in the pipe, one cycle at a time, where you don't have to worry about loop control, starting and stopping.

It is tempting to try to build a pipe fitting in such a way that it can work upstream and downstream of the pump. It turns out to be very difficult to do this; the behaviour of upstream fittings (sources) and downstream fittings (drains) is very different.

Some fittings naturally belong upstream, and others belong downstream. There are plenty of fittings which are useful anywhere. These need to be implemented twice, as a pair of complementary classes. It is often worth building a common ancestor to avoid duplicating code and to ensure consistency, and this pattern crops up occasionally in the pipework library. For example, the abstract class PIPE_COUNTER is the parent of SOURCE_COUNTER and DRAIN_COUNTER.
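
One way to arrange such a pair of classes around a common ancestor is sketched below. The three classes are hypothetical and much simpler than PIPE_COUNTER and its descendants: they do not inherit from the framework classes, and they assume generic READER and WRITER classes with the features described on this page. In a real system each class would live in its own file.

```eiffel
class COUNTER_SKETCH
      -- Hypothetical common ancestor: the counting logic, written once.

feature

   count: INTEGER
         -- Number of objects seen so far

feature {NONE}

   note_object is
         -- Record that one more object has passed.
      do
         count := count + 1
      ensure
         one_more: count = old count + 1
      end

end -- class COUNTER_SKETCH


class SOURCE_COUNTER_SKETCH [G]
      -- Hypothetical upstream counter: counts objects as they are read.

inherit
   COUNTER_SKETCH

creation
   connect

feature

   source: READER [G]
         -- Fitting which supplies the objects

   connect (a_source: READER [G]) is
      require
         source_exists: a_source /= Void
      do
         source := a_source
      end

   read is
         -- Pull one object from `source' and count it if there was one.
      do
         source.read
         if not source.is_at_end then
            note_object
         end
      end

   last_read: G is
      do
         Result := source.last_read
      end

   is_at_end: BOOLEAN is
      do
         Result := source.is_at_end
      end

end -- class SOURCE_COUNTER_SKETCH


class DRAIN_COUNTER_SKETCH [G]
      -- Hypothetical downstream counter: counts objects as they are written.

inherit
   COUNTER_SKETCH

feature

   drain: WRITER [G]
         -- Fitting which receives the objects; may be Void

   connect (a_drain: WRITER [G]) is
      do
         drain := a_drain
      end

   write (x: G) is
         -- Count `x', then pass it downstream if there is anywhere to go.
      do
         note_object
         if drain /= Void then
            drain.write (x)
         end
      end

   flush is
         -- Nothing is held back here, so just flush downstream.
      do
         if drain /= Void then
            drain.flush
         end
      end

end -- class DRAIN_COUNTER_SKETCH
```

Only 'count' and the counting step are shared; the reading and writing behaviour stays in the two descendants, which is why the pair cannot be merged into one class.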

Names

The plumbing analogy is useful for an overall perspective, but computing terminology is more appropriate for identifying components precisely. These classes are not intended for use by naïve end users, and terms like ARRAY and FILE are more helpful than TANK or CISTERN.

The dominant features of these components are concerned with reading, writing and connecting. In striving to achieve untainted object models in a world of functional decomposition we are naturally wary of class names like PAYROLL_CALCULATOR and INVOICE_PRINTER. The names of the principal abstractions were chosen with some hesitation, but have turned out to be appropriate.

Upstream classes are called READERS and downstream classes are called WRITERS. Readers connect to other readers, and writers connect to other writers. The name SOURCE applies to a pipe made of readers, or specifically to the first reader. The name DRAIN applies to an outlet pipe made of writers, or specifically to the last writer.

Most classes use one of the words source, drain, reader or writer as a prefix or suffix. The type of object passing through the pipe might also be included, where the pipe is not generic. The kind of source or drain might also appear, as in ARRAY or FILE. In general, natural sounding names are preferred over formal classification. Finally, some abstractions with dangerously common names have the prefix PIPE_ for modesty.

Readers

At the heart of the 'Reader' protocol is a pair of features implementing a pattern familiar to all Eiffel developers. Command 'read' tries to obtain an object, and query 'last_read' makes it available.

If you call 'read' repeatedly objects are made available one by one until the source is exhausted. If you call the features alternately you get each available object in sequence. If you don't call 'read' then 'last_read' just keeps returning the same object.

In general, readers handle objects using simple assignment. Expanded objects are returned by value, and for all others only references are passed on. There is no copy or clone unless the specification for a fitting explicitly mentions it.

Three queries complete the specification. They make the state of the reader explicit, and determine when it is safe to call the other features.

'is_connected' confirms that the reader has been properly configured to do its job. For some fittings this is always true, but other fittings cannot operate without some preliminary setup. This is like checking a file is open before reading it.

'is_primed' indicates that 'read' has been called at least once. You may not call 'last_read' until the reader has been primed by a call to 'read'.

'is_at_end' means the source is exhausted. This condition becomes true if you call 'read' again after reading the last object. Once it is true, you may no longer call 'read' or 'last_read'.

These conditions are perhaps a little more strict than similar specifications, but they allow a broad variety of sources to be adapted as readers. Here is a table showing the four states.

                      State
                      1   2   3   4
is_connected?         N   Y   Y   -
is_primed?            -   N   Y   -
is_at_end?            -   -   N   Y
may call read?        N   Y   Y   N
may call last_read?   N   N   Y   N

Here is a typical sequence of operations on a source providing two objects. The columns show the values of the queries 'is_connected', 'is_primed' and 'is_at_end' after each operation.

            is_connected  is_primed  is_at_end
make             N            N          N
connect          Y            N          N
read             Y            Y          N
last_read        Y            Y          N
read             Y            Y          N
last_read        Y            Y          N
read             Y            Y          Y

Here is a loop you might write to use a reader (if you aren't using a pump):

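from
   r.read                        -- prime the reader
until
   r.is_at_end
loop
   io.put_string (r.last_read)   -- assumes the reader supplies strings
   io.put_new_line
   r.read                        -- advance, or reach the end
end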

The abstract class READER encapsulates this protocol, and is the ancestor of all the source classes.
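
The rules in the state table can be written down as Eiffel assertions. The deferred class below is a hypothetical reconstruction for illustration, named READER_SKETCH to make clear that it is not the library's READER; the real class may well state its contracts differently.

```eiffel
deferred class READER_SKETCH [G]
      -- Hypothetical rendering of the reader protocol as contracts.

feature -- Status report

   is_connected: BOOLEAN is
         -- Has the reader been configured to do its job?
      deferred
      end

   is_primed: BOOLEAN is
         -- Has `read' been called at least once?
      deferred
      end

   is_at_end: BOOLEAN is
         -- Is the source exhausted?
      deferred
      end

feature -- Access

   last_read: G is
         -- Object obtained by the most recent call to `read'
      require
         connected: is_connected
         primed: is_primed
         not_at_end: not is_at_end
      deferred
      end

feature -- Basic operations

   read is
         -- Try to obtain the next object.
      require
         connected: is_connected
         not_at_end: not is_at_end
      deferred
      ensure
         primed_or_at_end: is_primed or is_at_end
      end

end -- class READER_SKETCH
```

The preconditions correspond to the 'may call' rows of the table: 'read' is allowed in states 2 and 3, and 'last_read' only in state 3.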

Writers

The writer protocol is less demanding than its upstream counterpart. A writer's only obligation is to accept whatever it is given. There is no way to retrieve what was written, or to know whether it got to any ultimate destination. The main feature is 'write', which disposes of one object in some way.

Writers work by assignment, like readers, and do not copy or clone objects unless it is specified.

Some writers might buffer items and not pass them on immediately. Such fittings need to know when the last object has been written, so they can clean up properly without losing objects. This is achieved by calling 'flush'. A writer must flush its own stored items if necessary, then call the 'flush' feature of its neighbour downstream.

The abstract class WRITER encapsulates this protocol.
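
To show why 'flush' matters, here is a sketch of a writer that holds objects back. The class PAIR_JOINER_SKETCH is hypothetical and is not part of the library: it joins incoming strings in pairs, so an odd string at the end would be lost without 'flush'. It assumes a generic WRITER class with 'write' and 'flush', and it does not inherit from the framework classes as a real fitting would.

```eiffel
class PAIR_JOINER_SKETCH
      -- Hypothetical writer which joins strings in pairs.

creation
   connect

feature

   drain: WRITER [STRING]
         -- Next fitting downstream

   pending: STRING
         -- First half of a pair, held back until its partner arrives

   connect (a_drain: WRITER [STRING]) is
      require
         drain_exists: a_drain /= Void
      do
         drain := a_drain
      end

   write (s: STRING) is
         -- Hold `s' back, or pass it on joined to the held string.
      require
         s_exists: s /= Void
      local
         joined: STRING
      do
         if pending = Void then
            pending := s
         else
            -- Work on a copy so the caller's first string is not altered.
            joined := clone (pending)
            joined.append_string (s)
            drain.write (joined)
            pending := Void
         end
      end

   flush is
         -- Pass on any held string, then flush downstream.
      do
         if pending /= Void then
            drain.write (pending)
            pending := Void
         end
         drain.flush
      end

end -- class PAIR_JOINER_SKETCH
```

Because this fitting builds a new string, its specification would have to mention the copy, in line with the rule that writers do not clone unless they say so.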

Connect

Readers are connected to form an inlet pipe, and writers to form an outlet pipe. The rules are similar. Three commands and two queries concern connections.

'connect' establishes a new source or drain, and 'disconnect' breaks the link. 'reconnect' disconnects then connects in one call.

'is_connected' gives the connection state, and feature 'source' or 'drain' gives a reference to the linked fitting.

When you connect or disconnect you must respect the state of the fitting. You may not call 'connect' if it is already connected, or 'disconnect' if it is not connected.

Note that the connection is in one direction only: upstream for readers, downstream for writers. A reader knows who is supplying its objects, but not who is consuming them. A writer knows where to dispose of objects, but not where they come from.

For many writers, an onward connection is optional. A fitting such as a counter can do useful work at the end of the line, or may pass on objects for further processing. On the other hand it may make sense for a writer to handle multiple connections, and DRAIN_SPLITTER is an example of this.

When you connect, the source or drain should be ready for use. If the source is a reader it need not be primed ('read' is likely to be called first). Input and output files should certainly be open. The fittings make no attempt to recover from exceptions.

Two abstract classes capture this protocol. SOURCE_READER is a descendant of READER, and DRAIN_WRITER is a descendant of WRITER.
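
The connection rules for the upstream side might be expressed as follows. LINK_SKETCH is a hypothetical class written for this page, not an extract from SOURCE_READER; it assumes a generic READER class, and the precondition on 'reconnect' is a guess based on the statement that it disconnects and then connects.

```eiffel
class LINK_SKETCH [G]
      -- Hypothetical holder of a single upstream connection.

feature

   source: READER [G]
         -- The linked fitting upstream, if any

   is_connected: BOOLEAN is
         -- Is there a linked fitting?
      do
         Result := source /= Void
      end

   connect (a_source: READER [G]) is
         -- Link to `a_source'.
      require
         not_connected: not is_connected
         source_exists: a_source /= Void
      do
         source := a_source
      ensure
         connected: is_connected
         linked: source = a_source
      end

   disconnect is
         -- Break the link.
      require
         connected: is_connected
      do
         source := Void
      ensure
         not_connected: not is_connected
      end

   reconnect (a_source: READER [G]) is
         -- Replace the current link by a link to `a_source'.
      require
         connected: is_connected
         source_exists: a_source /= Void
      do
         disconnect
         connect (a_source)
      ensure
         linked: source = a_source
      end

end -- class LINK_SKETCH
```

The link points one way only: nothing in the linked fitting refers back to this object.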

AT_READER

Reader is a simple protocol which can only provide serial access to a source of objects. On the other hand, data structure libraries provide a rich variety of containers offering various storage, search and retrieval strategies.

Pipework offers a small extension to the Reader protocol which provides direct access to objects by integer index. Like an array, you can request the object 'at position 5'. Unlike an array, you can still ask for 'the next object' too, without keeping track of the state of the iteration.

AT_READER is an abstract mixin class encapsulating these principles. It introduces only two new features. Command 'read_at' tries to find the object at the specified index. If it exists, 'last_read' returns it. If not, 'is_at_end' becomes true (even if 'is_at_start' would be more accurate).

After any successful call to 'read' or 'read_at', query 'last_read_at' gives the index value corresponding to 'last_read'. Whenever 'is_at_end' is true, 'last_read_at' is one greater than the highest index value in the source.

You can freely mix calls to 'read_at' and 'read', to jump to a point and read forward from there.
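
Jumping to a position and reading forward might look like the routine below. The class TAIL_PRINTER_SKETCH and its routine 'print_from' are hypothetical. The sketch assumes that AT_READER is generic, that it offers the ordinary reader features as well as its own, and that 'read_at' takes a single INTEGER argument; none of these details is confirmed on this page.

```eiffel
class TAIL_PRINTER_SKETCH
      -- Hypothetical client of an indexed reader.

feature

   print_from (r: AT_READER [STRING]; start: INTEGER) is
         -- Print each string of `r' from index `start' onwards,
         -- preceded by its index.
      require
         reader_exists: r /= Void
      do
         from
            r.read_at (start)    -- jump to the starting point
         until
            r.is_at_end          -- true at once if `start' does not exist
         loop
            io.put_integer (r.last_read_at)
            io.put_string (": ")
            io.put_string (r.last_read)
            io.put_new_line
            r.read               -- carry on serially from there
         end
      end

end -- class TAIL_PRINTER_SKETCH
```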

The meaning of 'index value', and how it relates to the values returned, is up to the fitting to specify. The index maintained by an ARRAY_READER is the same as the array index. For other sources, the index is usually based at 1.

Class SOURCE_BUFFER is an adapter which adds such indexed access to any reader. Be aware, though, that it can ultimately cache the whole source in its memory.

FILES

Access to files is simple and versatile. Each file reader or writer has a set of creation procedures for standard input or output as well as named files. As the names suggest, they are generally used at the beginning or end of a pipe.

These are the creation procedures.

For input: 'make', 'open', 'connect_standard_input' or 'connect'.

For output: 'make', 'open_append', 'open_replace', 'connect_standard_output', 'connect_standard_error', 'connect'.
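
As a small illustration of the standard stream creation procedures, the sketch below copies lines from standard input to standard output. ECHO_SKETCH is a hypothetical root class. It assumes that 'connect_standard_input' and 'connect_standard_output' take no arguments, that FILE_LINE_SOURCE supplies strings and that FILE_LINE_DRAIN accepts them; check the class texts before relying on any of this.

```eiffel
class ECHO_SKETCH
      -- Hypothetical program: copy standard input to standard output.

creation
   make

feature

   make is
         -- Copy lines until the input is exhausted.
      local
         reader: FILE_LINE_SOURCE
         writer: FILE_LINE_DRAIN
      do
         !!reader.connect_standard_input
         !!writer.connect_standard_output
         from
            reader.read
         until
            reader.is_at_end
         loop
            writer.write (reader.last_read)
            reader.read
         end
         -- Let the writer pass on anything it may have held back.
         writer.flush
      end

end -- class ECHO_SKETCH
```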

The file classes do not close the files they use. The feature 'source' or 'drain' allows the calling application to do so at the appropriate time.

The underlying abstraction FILE in the kernel library offers different features for reading and writing objects of different types. These are reflected in different classes in the Pipework library.

CLASSES

This is a classified list of all the classes which make up the Pipework base cluster.

Basic framework
   READER*
   AT_READER*
   SOURCE_READER*
   WRITER*
   DRAIN_WRITER*

Utilities
   PIPE_PUMP
   PIPE_CACHE
   PIPE_REGISTER -- porting

Files
   FILE_SOURCE*
   FILE_CHARACTER_SOURCE
   FILE_WORD_SOURCE
   FILE_LINE_SOURCE
   FILE_PAGE_SOURCE
   FILE_INTEGER_SOURCE
   FILE_REAL_SOURCE
   FILE_DOUBLE_SOURCE
   FILE_DRAIN*
   FILE_CHARACTER_DRAIN
   FILE_LINE_DRAIN

Strings
   STRING_CHARACTER_SOURCE
   STRING_CHARACTER_READER
   STRING_CHARACTER_WRITER
   LINE_PARAGRAPH_WRITER -- porting (eplumb wljoiner)

Numeric
   PIPE_COUNTER*
   SOURCE_COUNTER
   SOURCE_TOTALLER -- proposed
   DRAIN_COUNTER
   DRAIN_TOTALLER -- proposed

Filters, adapters, converters
   SOURCE_BUFFER
   BIT_READER -- porting
   HEAD_READER -- proposed
   TAIL_READER -- proposed
   EVERY_READER -- proposed
   HEAD_WRITER -- proposed
   TAIL_WRITER -- proposed
   EVERY_WRITER -- proposed

Demonstrations
   AS_PARAGRAPH -- porting (eplumb aspara)
   IO_COPIER -- porting (eplumb ccopier)
   MORSE_ENCODER -- porting (encoder morse_en, app morse to_morse)
   CHAR_HISTOGRAM -- porting (app charhist charhist)

Tests
   Buffer
   Cache
   File_reader
   Pump
   Register -- porting
   Bit_reader -- porting
