

Eiffel Metrics System


Written by Friedrich Dominicus.

ems.tar.gz (900,246 bytes) - source code and binaries


DON'T USE THIS SOFTWARE WITHOUT READING THE Copyright.notice AT THE END. Short declaration: USE THIS SOFTWARE FOR WHATEVER YOU WANT, BUT DON'T PURSUE ME FOR ANYTHING.

INSTALLATION

Prerequisite: ISE Eiffel >= 3.3.7. You may have to adapt some parts of the Ace.ace file. The archive is a gzipped tar file, so you just have to type tar -xzvf ems.tgz. A new directory ems is created.

Please set up an environment variable called EMS_HOME pointing to the directory in which the files eiffel_regular and/or eiffel_lex can be found. eiffel_regular is the grammar for Eiffel, and eiffel_lex is an analyzer stored internally by the lex classes. A word of CAUTION: eiffel_lex is stored in a completely unportable manner. If you remove your EIFGEN directory and rebuild the system, the new system won't be able to reread this file. In that case just remove the file; it is then regenerated from the Eiffel grammar.

An example: if you've unpacked ems.tgz in /home/whoever/, you have a directory ems there, so set EMS_HOME to /home/whoever/ems. That's all. If you'd like to have ems available without giving the complete path, put the following in your shell startup file:

export EMS_HOME=/path_to_eiffel_regular_and_eiffel_lex
export PATH=$PATH:$EMS_HOME/bin

After this you can change to that directory and run ebench, or you can run the programs directly from the command line.

SYSTEMS

ems should run on any system for which an ISE Eiffel version exists. Nothing special is used, but I have only tested and run it under Linux. My system is now Debian 2.0 (glibc-based) with ISE Eiffel V 4.2 for Linux. The first version was implemented and run in parallel on Linux and Digital Unix V 4.x, and it was no problem to port from one machine to the other.

IMPORTANT

Please send me a short note if you use ems. Tell me what you would like to see included, etc. I can't find all faults by myself, so please help me to get a good system. And if you find the time, please send suggestions on what could be done better.

VERSION

1.1 04-10-98

1.2 13-10-98

BUGS

Surely there are a lot around, but at the moment I don't know of any. I don't know how BIT N classes will be analyzed.

WEAKNESSES

The sources which you feed to ems should compile without problems. The results for source code that does not compile are unpredictable and certainly not correct.

TODO

USE

Not sophisticated; it's probably at a beta stage now. It normally works without any problems and shouldn't do anything to your files (I open the classes read-only). If you want to be sure that the program doesn't harm any of your data, work with copies.

USAGE: ems [-b] [-h] [-i] [-v <INTEGER> 1..3] 
       -b: breaks during output if the user wants to read something; default is
           no break
       -h: help: prints out a usage message and exits
       -i: interactive: goes into interactive mode after the files on the 
           command-line have been analyzed. If you call ems just with 'ems'
           then interactive usage is the default, otherwise the program
           terminates after analyzing the classes on the command-line
       -v: verbosity level: 1 stands for very verbose output, 3 for compact
           output; default value is 3.
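
If ems is driven from a script, the options above can be assembled programmatically. The following is only a minimal Python sketch: the helper name build_ems_command is hypothetical and not part of ems, and placing the class files after the options is an assumption, since the usage message does not say where they go.

```python
def build_ems_command(files, verbosity=3, pause=False, interactive=False):
    """Return an argv list for ems, using only flags from the usage message.

    Assumption: class files are given after the options.
    """
    if verbosity not in (1, 2, 3):
        raise ValueError("verbosity must be 1 (very verbose) to 3 (compact)")
    argv = ["ems"]
    if pause:
        argv.append("-b")        # break during output
    if interactive:
        argv.append("-i")        # stay interactive after the files are analyzed
    argv += ["-v", str(verbosity)]
    argv += list(files)
    return argv


if __name__ == "__main__":
    # Show the command line for a compact analysis of one class
    print(" ".join(build_ems_command(["dummy.e"])))
```

The resulting list can be handed to subprocess.run, which avoids shell quoting problems with unusual file names.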

What do I have to do to run the program?

With version 1.2 I included two finalized binaries: one with assertions turned on, one without assertions. You can use them as they are if you have a glibc-based Linux.

just do a:

ems dummy.e 

and class dummy.e will be analysed. You should be able to redirect input/output to collect values.

If the program crashes with a segmentation violation, run the last command again but use "ems_with_assertions" instead. Then please send me an e-mail with the class which led to this exception and the output from Eiffel. I will try to fix the problem ASAP ;-)

RECOMMENDED USE

Here's how you should analyse your classes. Think about what you find complex. Then run ems on your classes with the most compact output. If you find some things which look suspect, run the program again with more verbose output. If that isn't enough, try it with the most verbose output. Use your favourite scripting tool to get the data you want (I've used awk for this), take it into your spreadsheet and let that generate graphics for you. Compare your values with other programs to get a feeling for where problems might be. And have fun ;-) and share your results with others. The more we know about what the usual values are, the better we can estimate whether something holds further trouble or needs some redoing.
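
As an illustration of the scripting step, here is a minimal Python sketch that pulls the number of semantic units per method out of a compact report and prints it tab-separated for a spreadsheet. It assumes the report layout shown in the OUTPUT section (a "statistics of the method NAME" line followed later by "semantic units: N"); the function name and the sample text are made up for the example and are not part of ems.

```python
import re

# Patterns taken from the layout of the sample report; an assumption about
# the general format, not a specification of ems output.
METHOD_RE = re.compile(r"statistics of the method\s+(\w+)")
UNITS_RE = re.compile(r"semantic units:\s*(\d+)")


def semantic_units_per_method(report):
    """Map each method name found in the report to its semantic-unit count."""
    counts = {}
    current = None
    for line in report.splitlines():
        method = METHOD_RE.search(line)
        if method:
            current = method.group(1)
            continue
        units = UNITS_RE.search(line)
        if units and current is not None:
            counts[current] = int(units.group(1))
            current = None
    return counts


SAMPLE = """\
                     statistics of the method do_a_token
                     -----------------------------------
    Feature is a command
         comments: 1 parameters: 1  local variables : 2  debug-blocks: 0
         semantic units: 6
"""

if __name__ == "__main__":
    for name, units in semantic_units_per_method(SAMPLE).items():
        print(f"{name}\t{units}")
```

The same approach works for the other counters (comments, parameters, local variables) by adding one pattern per value.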

OUTPUT

Here's a small example of the output of this program. Comments from me start with -- FD. This output is the most compact output. It isn't valid anymore, because of some work on the above classes.

-- FD the program starts with this. The input expected for this example
-- FD is "eiffel_scan.e"


-----------------------------------------------
Scanning class 'whatever'

--------------- LEXICAL ANALYSIS: ----

                     statistics of the method do_a_token                       
                     -----------------------------------                       


    Feature is a command
    --FD just two possibilities here (command or query)
	                                                       
    statistics for the method                                                  
    -------------------------                                                  
         comments: 1 parameters: 1  local variables : 2  debug-blocks: 0
      
      -- FD number of comments, parameters, local variables and
      -- FD debug-blocks in this method 
         
         semantic units: 6                                                     
              in assertoins    : 0                                             
              in preconditions   : 0        in postconditions    : 0          
              in variant-blocks : 0         in invariant-blocks : 0           
              in check-blocks   : 0         in debug-blocks     : 0 

	-- FD 6 instructions, called semantic units; you can use
	-- FD logical LOC instead for this feature.
	-- FD What is a semantic unit? A semantic unit is one
	-- FD instruction which I believe forms a unit,
	-- FD e.g. i := i + 1 is a very simple semantic unit,
	-- FD but Result := a and b xor not c not d and e is a 
	-- FD semantic unit too, just a lot more complicated.
	-- FD You only see the following information on verbosity level 1 or 2,
	-- FD for every semantic unit; the possibilities are
	-- FD the logical operations (and, or ...)
	-- FD the arithmetic operations (+, -, // ...)
	-- FD the comparison operations (<, >, <= ...)
	-- FD the assignments (:=, ?=)
	-- FD the identifiers on each level
	-- FD If you have a.put(i,j) you have one identifier on level
	-- FD (0,0), one on level (0,1) and two on level (1,1). Here (x,y)
	-- FD stands for the round-brace level ("(") and for how 
	-- FD often the instruction is qualified,
	-- FD e.g. a.b.c.
	-- FD I think that as the levels get deeper and deeper the whole
	-- FD thing becomes more difficult to understand,
	-- FD but I don't have any suggestions as to how much is easy 
	-- FD or difficult to understand.
	

          
    statistics for the control flow
    -------------------------------
          if  elseif    else    when inspect    from
           1                                        
           1               1                        


	-- FD here you can see how complicated and how nested the
	-- FD control flow is. In this case
	-- FD we have a nesting level of two; on the second level
	-- FD there is an else branch.
	-- FD IMHO the deeper the level, the harder the expression is
	-- FD to understand

That was the output on feature level. On class level I've got the 
following information.  
	             Statistics for the class EIFFEL_SCAN                      
                     ------------------------------------                      
                                                                                
    The parent classes are : SCANNING

	-- FD what are the parents?

    Adaption of features of the parents                                        
    -----------------------------------                                        
    renaming            :  1  redefinition        :  1                         
    made abstract       :  0  explicit exported   :  0                         
    explicit selected   :  0                                                   
                                   
	-- FD how are the features adapted in this class
                                             
    statistics                                                                 
    ----------                                                                 
    feature   : 10 queries : 8 with 1 attributes                             
                   commands 2                                                  
    methods   : 9                                                              
    constants : 0

	-- FD statistics of the number of features, queries ... 
	-- FD of the class
    From the features are 7 privat und 0 selective reachable.
                 
    This methods default_action, retrieve_command_number, get_kw_number, ems_str_n_pair, 
    ems_act_bd, is_simple_type, command_number_valid are private.
	

	--FD export status of the features

    
    Type and number of methods  (normal, abstract, etc.)                       
    ----------------------------------------------------                       
              normal  : 6                                                      
              once   : 3


	-- FD what type are the features (normal, abstract, external ...)
                                                                                
    Number of comments on class-level : 5

	

    occurences of the types
    -----------------------
    Typ Name: SCANNING                   occurences: 1
    Typ Name: STRING                     occurences: 4
    Typ Name: EMS_BOUND_ACTION           occurences: 1
    Typ Name: INTEGER                    occurences: 4
    Typ Name: EIFFEL_SCAN                occurences: 1
    Typ Name: PLAIN_TEXT_FILE            occurences: 1
    Typ Name: CREATION                   occurences: 1
    Typ Name: EMS_STR_NUMBER_BOUND       occurences: 1
    Typ Name: EMS_ACTION                 occurences: 2
    Typ Name: BOOLEAN                    occurences: 2
    Typ Name: MAKE                       occurences: 1
    Typ Name: NONE                       occurences: 1
    Typ Name: TOKEN                      occurences: 3

 -- FD how often and how many classes are needed for the 
 -- FD analyzed class to fulfill its duty?
 -- FD I think that if you have a lot of different classes here,
 -- FD the module isn't well encapsulated, and it's likely
 -- FD that changes in other classes have an effect on 
 -- FD this class
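
To compare classes by how many different types they depend on, the type table can be read by a script. A minimal Python sketch, assuming the "Typ Name: ... occurences: ..." layout shown above (the spelling follows the program output); the function name is hypothetical, and the sample uses three lines copied from the table:

```python
import re

# Matches one row of the type table; "occurences" is spelled as ems prints it.
TYPE_RE = re.compile(r"Typ Name:\s*(\w+)\s+occurences:\s*(\d+)")


def type_occurrences(report):
    """Return {type name: occurrence count} for all table rows in the report."""
    return {m.group(1): int(m.group(2)) for m in TYPE_RE.finditer(report)}


SAMPLE = """\
    Typ Name: SCANNING                   occurences: 1
    Typ Name: STRING                     occurences: 4
    Typ Name: INTEGER                    occurences: 4
"""

if __name__ == "__main__":
    table = type_occurrences(SAMPLE)
    # The number of distinct types is a rough indicator of coupling.
    print("distinct types:", len(table))
    print("total occurrences:", sum(table.values()))
```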
	


Here's a more verbose output:
                     statistics of the method do_a_token                       
                     -----------------------------------                       


    Feature is a command                                                       
    feature-statistics                                                         
    ------------------                                                         
         comments: 1 parameters: 1  local variables : 2  debug-blocks : 0      
                                                                                
         semantic units: 6                                                     
              in assertoins    : 0                                             
              in preconditions   : 0        in postconditions    : 0          
              in variant-blocks : 0         in invariant-blocks : 0           
              in check-blocks   : 0         in debug-blocks     : 0           
    statistics for the control flow
    -------------------------------
          if  elseif    else    when inspect    from
           1                                        
           1               1                        
    Typen                                                                      
    ------                                                                     
    EMS_ACTION               occurences:   1                                  
    INTEGER                  occurences:   1                                  
    TOKEN                    occurences:   1                                  
Semantische Einheit Nr: 1
 id_ct num_ct := ?= and or xor not imp + - * / // \\ ^ = /= > >= < <=
     3         1                                                     

 -- FD this means there are 3 identifiers and one assignment in the first
 -- FD semantic unit
 -- FD the semantic unit looks like this:
 -- FD command_number :=  retrieve_command_number(read_token);
 -- 
 --
 -- FD cut the rest.

The most verbose output looks like this: 

                     statistics of the method do_a_token                       
                     -----------------------------------                       


    Feature is a command                                                       
    feature-statistics                                                         
    ------------------                                                         
         comments: 1 parameters: 1  local variables : 2  debug-blocks : 0      
                                                                                
         semantic units: 6                                                     
              in assertoins    : 0                                             
              in preconditions   : 0        in postconditions    : 0          
              in variant-blocks : 0         in invariant-blocks : 0           
              in check-blocks   : 0         in debug-blocks     : 0           
    statistics for the control flow
    -------------------------------
          if  elseif    else    when inspect    from
           1                                        
           1               1                        
    Typen                                                                      
    ------                                                                     
    EMS_ACTION      ausfuehren
    INTEGER         command_number
    TOKEN           read_token
Semantische Einheit Nr: 1
        id_ct num_ct := ?= and or xor not imp + - * / // \\ ^ = /= > >= < <=
(0, 0)      2         1                                                     
(1, 0)      1                                                               

-- FD (0,0): the first 0 stands for the round-brace level on which you can find
-- FD the identifier etc.
-- FD the second 0 shows on which qualifier level the things can be found
-- FD the SU is the same as above:
-- FD command_number :=  retrieve_command_number(read_token);
-- FD you can see that 2 identifiers are on the zero level of qualification and
-- FD round-brace level,
-- FD and read_token is the identifier you can find on the first round-brace 
-- FD level
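
The level bookkeeping can be sketched in a few lines of Python. This is not the algorithm ems uses; it is one reading that reproduces both examples in this text (the a.put(i,j) case and the semantic unit above). The assumptions: arguments inherit the qualifier level of the call they belong to, parentheses are balanced, and keywords such as and or not are not filtered out but counted as identifiers.

```python
import re
from collections import Counter

# Identifiers plus the four punctuation marks that influence the levels.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|[().,]")


def identifier_levels(expression):
    """Count identifiers per (round-brace level, qualifier level)."""
    counts = Counter()
    brace = 0          # current round-brace nesting depth
    qual = 0           # qualifier level of the most recent identifier
    base = [0]         # qualifier level inherited at each open parenthesis
    after_dot = False
    for tok in TOKEN_RE.findall(expression):
        if tok == "(":
            base.append(qual)      # arguments start at the call's level
            brace += 1
        elif tok == ")":
            qual = base.pop()      # back to the level of the call itself
            brace -= 1
        elif tok == ",":
            qual = base[-1]        # next argument starts fresh
        elif tok == ".":
            after_dot = True
        else:
            qual = qual + 1 if after_dot else base[-1]
            after_dot = False
            counts[(brace, qual)] += 1
    return counts


if __name__ == "__main__":
    for text in ("a.put(i,j)",
                 "command_number := retrieve_command_number(read_token)"):
        print(text, dict(identifier_levels(text)))
```

Numeric literals with a decimal point would confuse this sketch, since it treats every dot as a qualification.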

INTENTION

The deeper the levels are nested, the more difficult they are to understand. Compare e.g. the use of a.b.c(d).item and x.a.c and not ... and I think you see what I mean. My idea was to give each of the elements of a semantic unit a difficulty number, so that you might be able to compare different semantic units just by a number. Of course this is extremely difficult, and the number may be very subjective. But as long as we don't have anything better, we can at least try something like this.
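
To make the idea concrete, here is a minimal Python sketch of such a difficulty number. Everything in it is an assumption for illustration: ems does not compute this value, the function name is made up, and the weights are arbitrary (an identifier costs 1 plus its round-brace level plus its qualifier level; xor and not cost 2, every other operator 1).

```python
# Hypothetical weights; operators not listed here count as 1.
OPERATOR_WEIGHTS = {"xor": 2, "not": 2}


def difficulty(level_counts, operator_counts):
    """Combine identifier levels and operator counts into one number.

    level_counts:    {(brace_level, qualifier_level): number of identifiers}
    operator_counts: {operator: number of occurrences}
    """
    id_score = sum(n * (1 + brace + qual)
                   for (brace, qual), n in level_counts.items())
    op_score = sum(n * OPERATOR_WEIGHTS.get(op, 1)
                   for op, n in operator_counts.items())
    return id_score + op_score


if __name__ == "__main__":
    # command_number := retrieve_command_number(read_token)
    print(difficulty({(0, 0): 2, (1, 0): 1}, {":=": 1}))
    # a.put(i,j)
    print(difficulty({(0, 0): 1, (0, 1): 1, (1, 1): 2}, {}))
```

With these weights the deeper nested a.put(i,j) scores higher than the plain assignment, which matches the intuition described above; other weights would of course give other rankings.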

INTERPRETATION

Here's what I feel about semantic units (SU) and other values:

  1. the deeper nested, the more difficult an SU is
  2. different operators in an SU make it more difficult to understand what is done. So a.item(i+a //j) is more difficult than a.item(i)
  3. xor and not add extra difficulty
  4. with := and ?= you can easily find commands which really change an object; if you find them in a query, you're violating CQS (Command-Query Separation), which in Eiffel is usually regarded as bad style.
  5. the deeper nested the control flow is, the more difficult it is to understand what's going on
  6. if you don't find a lot of assertions, maybe something is wrong
  7. if you have a lot of local variables, your methods probably do more than one job
  8. if you find externals all over the place, you have a problem with modularization
  9. if your methods work with a lot of different other classes, you might have a problem with modularization
  10. if you need a lot of parameters, you should think about separating options from parameters. In Eiffel just a few parameters are normally used. If you don't believe that, run ems on the vision classes of ISE. Exceptions are interface classes, either to C, Fortran or whatever, and interfaces to a database
  11. the more SUs you have in one method, the more trouble you might have. I ran ems on the base classes from ISE and found just around 2-3 SUs per feature; in the metrics system itself there are around 5 SUs per feature. This drops if you don't include the class PRINTER_CL. Whether this is ok or not everybody must decide for themselves. I/O is a very sequential matter, so I think you'll find a lot more SUs in I/O-intensive classes. But if you find that in a lot of classes, you have a problem.

RESULTS

Have a look into the $EMS_HOME/data directory; there you can find the compact output of ems on all classes of the system. (Because of space consumption, only the most compact output has been generated.) This output was generated this way:

ls **/*.e > eiffel_files
ems $(<eiffel_files) > data/metrics_from_ems

I'm a user of zsh, and it may not work this way if you use another shell. But I've tried ems $(ls **/*.e) > data/test too, so it should work ;-)
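
A shell-independent way to do the same is sketched below in Python. It assumes that ems is on the PATH and accepts several class files on one command line, as the usage description suggests; the function names and the output path are made up for the example.

```python
import subprocess
from pathlib import Path


def eiffel_files(root):
    """Return all *.e files in and below root, sorted, like zsh's **/*.e."""
    return sorted(str(p) for p in Path(root).rglob("*.e"))


def run_ems(files, output_path):
    """Run ems on the given files and write its output to output_path."""
    with open(output_path, "w") as out:
        subprocess.run(["ems", *files], stdout=out, check=True)


if __name__ == "__main__":
    files = eiffel_files(".")
    print(f"found {len(files)} Eiffel files")
    # Uncomment once ems is installed and the data directory exists:
    # run_ems(files, "data/metrics_from_ems")
```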

COMMENTS, BUGS

Please send comments and bug reports to Friedrich.Dominicus@inka.de. I cannot promise to fix bugs soon. It would be nice if you sent me mail if you use this software. If you think the software should be developed further, please send me suggestions.

Copyright notice

In short: You can do whatever you want with this software, but don't pursue me for anything, and please keep a short notice who has written this software.

I've stolen this from the Python Source, if this isn't ok please send me a short note.

The Eiffel Metrics System is copyrighted but you can freely use and copy it as long as you don't change or remove the copyright notice:

----------------------------------------------------------------------
Copyright 1998 by Friedrich Dominicus, Karlsruhe, Germany.

                        All Rights Reserved

Permission to use, copy, modify, and distribute this software and its
documentation for any purpose and without fee is hereby granted,
provided that the above copyright notice appear in all copies and that
both that copyright notice and this permission notice appear in
supporting documentation, and that the name of Friedrich Dominicus
not be used in advertising or publicity pertaining to
distribution of the software without specific, written prior
permission.


Friedrich Dominicus DISCLAIM ALL WARRANTIES WITH
REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL Friedrich Dominicus
BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL
DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR
PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
PERFORMANCE OF THIS SOFTWARE.
----------------------------------------------------------------------

--Friedrich Dominicus e-mail: Friedrich.Dominicus@inka.de

CONTACT

e-mail: Friedrich.Dominicus@inka.de
