Implementation Notes
This appendix contains information mainly of interest to implementors and
maintainers of gawk. Everything in it applies specifically to
gawk, and not to other implementations.
Downward Compatibility and Debugging
See section Extensions in gawk Not in POSIX awk,
for a summary of the GNU extensions to the awk language and program.
All of these features can be turned off by invoking gawk with the
`--traditional' option, or with the `--posix' option.
If gawk is compiled for debugging with `-DDEBUG', then there
is one more option available on the command line:
-W parsedebug
--parsedebug
Print out the parse stack information as the program is being parsed.
This option is intended only for serious gawk developers,
and not for the casual user. It probably has not even been compiled into
your version of gawk, since it slows down execution.
Making Additions to gawk
If you should find that you wish to enhance gawk in a significant
fashion, you are perfectly free to do so. That is the point of having
free software; the source code is available, and you are free to change
it as you wish (see section GNU GENERAL PUBLIC LICENSE).
This section discusses the ways you might wish to change gawk,
and any considerations you should bear in mind.
Adding New Features
You are free to add any new features you like to gawk.
However, if you want your changes to be incorporated into the gawk
distribution, there are several steps that you need to take in order to
make it possible for me to include your changes.
- Get the latest version. It is much easier for me to integrate changes if they are relative to the most recent distributed version of gawk. If your version of gawk is very old, I may not be able to integrate them at all. See section Getting the gawk Distribution, for information on getting the latest version of gawk.
- Follow the GNU Coding Standards. This document describes how GNU software should be written. If you haven't read it, please do so, preferably before starting to modify gawk. (The GNU Coding Standards are available as part of the Autoconf distribution, from the FSF.)
- Use the gawk coding style. The C code for gawk follows the instructions in the GNU Coding Standards, with minor exceptions. The code is formatted using the traditional "K&R" style, particularly as regards the placement of braces and the use of tabs. In brief, the coding rules for gawk are:
  - Use old style (non-prototype) function headers when defining functions.
  - Put the name of the function at the beginning of its own line.
  - Put the return type of the function, even if it is int, on the line above the line with the name and arguments of the function.
  - The declarations for the function arguments should not be indented.
  - Put spaces around parentheses used in control structures (if, while, for, do, switch and return).
  - Do not put spaces in front of parentheses used in function calls.
  - Put spaces around all C operators, and after commas in function calls.
  - Do not use the comma operator to produce multiple side-effects, except in for loop initialization and increment parts, and in macro bodies.
  - Use real tabs for indenting, not spaces.
  - Use the "K&R" brace layout style.
  - Use comparisons against NULL and '\0' in the conditions of if, while and for statements, and in the cases of switch statements, instead of just the plain pointer or character value.
  - Use the TRUE, FALSE, and NULL symbolic constants, and the character constant '\0' where appropriate, instead of 1 and 0.
  - Provide one-line descriptive comments for each function.
  - Do not use `#elif'. Many older Unix C compilers cannot handle it.
  If I have to reformat your code to follow the coding style used in gawk, I may not bother.
- Be prepared to sign the appropriate paperwork. In order for the FSF to distribute your changes, you must either place those changes in the public domain, and submit a signed statement to that effect, or assign the copyright in your changes to the FSF. Both of these actions are easy to do, and many people have done so already. If you have questions, please contact me (see section Reporting Problems and Bugs), or gnu@prep.ai.mit.edu.
- Update the documentation. Along with your new code, please supply new sections and/or chapters for this book. If at all possible, please use real Texinfo, instead of just supplying unformatted ASCII text (although even that is better than no documentation at all). Conventions to be followed in AWK Language Programming are provided after the `@bye' at the end of the Texinfo source file. If possible, please update the man page as well. You will also have to sign paperwork for your documentation changes.
- Submit changes as context diffs or unified diffs. Use `diff -c -r -N' or `diff -u -r -N' to compare the original gawk source tree with your version. (I find context diffs to be more readable, but unified diffs are more compact.) I recommend using the GNU version of diff. Send the output produced by either run of diff to me when you submit your changes. See section Reporting Problems and Bugs, for the electronic mail information. Using this format makes it easy for me to apply your changes to the master version of the gawk source code (using patch). If I have to apply the changes manually, using a text editor, I may not do so, particularly if there are lots of changes.
Although this sounds like a lot of work, please remember that while you may write the new code, I have to maintain it and support it, and if it isn't possible for me to do that with a minimum of extra work, then I probably will not.
Porting gawk to a New Operating System
If you wish to port gawk to a new operating system, there are
several steps to follow.
- Follow the guidelines in section Adding New Features, concerning coding style, submission of diffs, and so on.
- When doing a port, bear in mind that your code must co-exist peacefully with the rest of gawk, and the other ports. Avoid gratuitous changes to the system-independent parts of the code. If at all possible, avoid sprinkling `#ifdef's just for your port throughout the code. If the changes needed for a particular system affect too much of the code, I probably will not accept them. In such a case, you will, of course, be able to distribute your changes on your own, as long as you comply with the GPL (see section GNU GENERAL PUBLIC LICENSE).
- A number of the files that come with gawk are maintained by other people at the Free Software Foundation. Thus, you should not change them unless it is for a very good reason. That is, changes are not out of the question, but changes to these files will be scrutinized extra carefully. The files are `alloca.c', `getopt.h', `getopt.c', `getopt1.c', `regex.h', `regex.c', `dfa.h', `dfa.c', `install-sh', and `mkinstalldirs'.
- Be willing to continue to maintain the port. Non-Unix operating systems are supported by volunteers who maintain the code needed to compile and run gawk on their systems. If no one volunteers to maintain a port, that port becomes unsupported, and it may be necessary to remove it from the distribution.
- Supply an appropriate `gawkmisc.???' file. Each port has its own `gawkmisc.???' that implements certain operating system specific functions. This is cleaner than a plethora of `#ifdef's scattered throughout the code. The `gawkmisc.c' in the main source directory includes the appropriate `gawkmisc.???' file from each subdirectory. Be sure to update it as well. Each port's `gawkmisc.???' file has a suffix reminiscent of the machine or operating system for the port. For example, `pc/gawkmisc.pc' and `vms/gawkmisc.vms'. The use of separate suffixes, instead of plain `gawkmisc.c', makes it possible to move files from a port's subdirectory into the main subdirectory, without accidentally destroying the real `gawkmisc.c' file. (Currently, this is only an issue for the MS-DOS and OS/2 ports.)
- Supply a `Makefile' and any other C source and header files that are necessary for your operating system. All your code should be in a separate subdirectory, with a name that is the same as, or reminiscent of, either your operating system or the computer system. If possible, try to structure things so that it is not necessary to move files out of the subdirectory into the main source directory. If that is not possible, then be sure to avoid using names for your files that duplicate the names of files in the main source directory.
- Update the documentation. Please write a section (or sections) for this book describing the installation and compilation steps needed to install and/or compile gawk for your system.
- Be prepared to sign the appropriate paperwork. In order for the FSF to distribute your code, you must either place your code in the public domain, and submit a signed statement to that effect, or assign the copyright in your code to the FSF.
Following these steps will make it much easier to integrate your changes
into gawk, and have them co-exist happily with the code for other
operating systems that is already there.
In the code that you supply, and that you maintain, feel free to use a coding style and brace layout that suits your taste.
Probable Future Extensions
AWK is a language similar to PERL, only considerably more elegant.
Arnold Robbins

Hey!
Larry Wall
This section briefly lists extensions and possible improvements
that indicate the directions we are
currently considering for gawk. The file `FUTURES' in the
gawk distribution lists these extensions as well.
This is a list of probable future changes that will be usable by the
awk language programmer.
- Localization
  The GNU project is starting to support multiple languages. It will at least be possible to make gawk print its warnings and error messages in languages other than English. It may be possible for awk programs to also use the multiple language facilities, separate from gawk itself.
- Databases
  It may be possible to map a GDBM/NDBM/SDBM file into an awk array.
- A PROCINFO Array
  The special files that provide process-related information (see section Special File Names in gawk) may be superseded by a PROCINFO array that would provide the same information, in an easier to access fashion.
- More lint warnings
  There are more things that could be checked for portability.
- Control of subprocess environment
  Changes made in gawk to the array ENVIRON may be propagated to subprocesses run by gawk.
This is a list of probable improvements that will make gawk
perform better.
- An Improved Version of dfa
  The dfa pattern matcher from GNU grep has some problems. Either a new version or a fixed one will deal with some important regexp matching issues.
- Use of mmap
  On systems that support the mmap system call, its use would provide much faster file input, and considerably simplified input buffer management.
- Use of GNU malloc
  The GNU version of malloc could potentially speed up gawk, since it relies heavily on the use of dynamic memory allocation.
- Use of the rx regexp library
  The rx regular expression library could potentially speed up all regexp operations that require knowing the exact location of matches. This includes record termination, field and array splitting, and the sub, gsub, gensub and match functions.
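To make the mmap item above more concrete, here is a minimal sketch of record scanning over a memory-mapped file. It is an illustration only, not code from gawk: the function name count_records_mmap is hypothetical, it assumes a POSIX system, and it treats a newline as the only record terminator. It also uses an ANSI prototype rather than the old style headers described under Adding New Features, so that it compiles with current compilers.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * count_records_mmap --- count newline-terminated records in a file.
 * Hypothetical helper for illustration; returns -1 on error.
 */
long
count_records_mmap(const char *path)
{
	struct stat sb;
	const char *data;
	long count = 0;
	off_t i;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &sb) < 0) {
		close(fd);
		return -1;
	}
	if (sb.st_size == 0) {	/* mmap rejects zero-length mappings */
		close(fd);
		return 0;
	}

	/* Map the whole file read-only; no read() calls, no buffer refills. */
	data = mmap(NULL, (size_t) sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping stays valid after the descriptor is closed */
	if (data == MAP_FAILED)
		return -1;

	/* The file now looks like one flat array of bytes. */
	for (i = 0; i < sb.st_size; i++)
		if (data[i] == '\n')
			count++;

	munmap((void *) data, (size_t) sb.st_size);
	return count;
}
```

The point of the sketch is the absence of any buffer management: a record can never straddle the end of an input buffer, because the whole file is addressable at once. Input that cannot be mapped, such as pipes and terminals, would still need the ordinary read path.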
Suggestions for Improvements
Here are some projects that would-be gawk hackers might like to take
on. They vary in size from a few days to a few weeks of programming,
depending on which one you choose and how fast a programmer you are. Please
send any improvements you write to the maintainers at the GNU project.
See section Adding New Features,
for guidelines to follow when adding new features to gawk.
See section Reporting Problems and Bugs, for information on
contacting the maintainers.
- Compilation of awk programs: gawk uses a Bison (YACC-like) parser to convert the script given to it into a syntax tree; the syntax tree is then executed by a simple recursive evaluator. This method incurs a lot of overhead, since the recursive evaluator performs many procedure calls to do even the simplest things. It should be possible for gawk to convert the script's parse tree into a C program which the user would then compile, using the normal C compiler and a special gawk library to provide all the needed functions (regexps, fields, associative arrays, type coercion, and so on). An easier possibility might be for an intermediate phase of awk to convert the parse tree into a linear byte code form like the one used in GNU Emacs Lisp. The recursive evaluator would then be replaced by a straight-line byte code interpreter that would be intermediate in speed between running a compiled program and doing what gawk does now.
- The programs in the test suite could use documenting in this book.
- See the `FUTURES' file for more ideas. Contact us if you would seriously like to tackle any of the items listed there.
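The difference between a recursive evaluator and a linear byte code interpreter, as described in the compilation item above, can be sketched on a toy expression language. Everything here is an assumption made for illustration: the Node and Instr types, the opcodes, and the functions tree_eval, bc_compile and bc_run are hypothetical and bear no relation to gawk's real data structures. The sketch uses ANSI prototypes, not the old style headers that the gawk coding rules call for, so that it compiles with current compilers.

```c
#include <assert.h>
#include <stddef.h>

#define BC_STACK_MAX 64	/* assumed limit on operand stack depth */

typedef enum { N_NUM, N_ADD, N_MUL } NodeKind;

typedef struct Node {
	NodeKind kind;
	double value;			/* used when kind == N_NUM */
	const struct Node *left;	/* used by N_ADD and N_MUL */
	const struct Node *right;
} Node;

/* tree_eval --- recursive evaluator: one C call per tree node */
double
tree_eval(const Node *n)
{
	switch (n->kind) {
	case N_NUM:
		return n->value;
	case N_ADD:
		return tree_eval(n->left) + tree_eval(n->right);
	case N_MUL:
		return tree_eval(n->left) * tree_eval(n->right);
	}
	return 0.0;
}

typedef enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT } OpCode;

typedef struct {
	OpCode op;
	double operand;	/* used by OP_PUSH only */
} Instr;

/* bc_emit --- flatten a tree into postfix order, return next free slot */
static size_t
bc_emit(const Node *n, Instr *code, size_t pos)
{
	if (n->kind == N_NUM) {
		code[pos].op = OP_PUSH;
		code[pos].operand = n->value;
		return pos + 1;
	}
	pos = bc_emit(n->left, code, pos);
	pos = bc_emit(n->right, code, pos);
	code[pos].op = (n->kind == N_ADD) ? OP_ADD : OP_MUL;
	code[pos].operand = 0.0;
	return pos + 1;
}

/* bc_compile --- compile once; code[] needs one slot per node plus one */
size_t
bc_compile(const Node *root, Instr *code)
{
	size_t pos = bc_emit(root, code, 0);

	code[pos].op = OP_HALT;
	code[pos].operand = 0.0;
	return pos + 1;
}

/* bc_run --- straight-line interpreter: a single loop, no recursion */
double
bc_run(const Instr *code)
{
	double stack[BC_STACK_MAX];
	size_t sp = 0;
	const Instr *ip;

	for (ip = code; ip->op != OP_HALT; ip++) {
		switch (ip->op) {
		case OP_PUSH:
			assert(sp < BC_STACK_MAX);
			stack[sp++] = ip->operand;
			break;
		case OP_ADD:
			assert(sp >= 2);
			sp--;
			stack[sp - 1] += stack[sp];
			break;
		case OP_MUL:
			assert(sp >= 2);
			sp--;
			stack[sp - 1] *= stack[sp];
			break;
		case OP_HALT:	/* not reached: the loop condition stops first */
			break;
		}
	}
	assert(sp == 1);
	return stack[0];
}
```

For the tree representing 2 + 3 * 4, bc_compile produces the six instructions PUSH 2, PUSH 3, PUSH 4, MUL, ADD, HALT, and both tree_eval and bc_run yield the same value. The tree walk happens once, at compile time; a program that evaluates the same expression for every input record then pays only for the loop in bc_run.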
