I like words.

I like words. Sentences are fine. Paragraphs tend to make me sleepy. Short stories are almost always too long. Books are tiring. Series are evil. But, I like words.

- Submitted by Todd Coram

Friday, March 31 12:52 PM

A word or two about Gawking.

EFX is all about text parsing and generation.

Storybot talks to a POP server and downloads some email. The email is parsed and certain bits are extracted: commands, story bodies, pictures and other publishable stuff. XHTML site templates are read, parsed and used to generate the text that makes up the website. The website files are then transferred to the web host via FTP.

Along the way, my mind grapples with scheduling, filesystems, recursive templates and how to make EFX template code dynamically co-exist with Tcl (without trampling all over the engine).

I've got a working EFX made up of nothing but GAWK, Tcl and a smidgen of bourne shell. It manages two sites which are (potentially) updated every 5 to 10 minutes. It all runs under a cron job on top of cygwin+Win2k on an old PC sitting in my spare bedroom. It hasn't crashed yet.

So, I am spending this time just thinking... The latest thought: everything, as I just said, is about text processing. Job control (scheduling) is handled by cron, but everything else is text processing. Heck, POP3 and FTP are all about text processing (command/query and response).
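POP3 shows the point well: every exchange is a command line answered by a status line of text. As a minimal sketch (POSIX sh, which ksh93 also runs; the function name `pop3_stat` is hypothetical and not part of EFX), here is how the reply to the POP3 STAT command could be parsed. Reading the reply off the network is left out; only the text handling is shown.

```shell
# pop3_stat: parse the reply to a POP3 STAT command, which RFC 1939 defines
# as "+OK <message-count> <size-in-octets>", and print "<count> <octets>".
# Anything else (e.g. "-ERR ...") makes the function fail.
pop3_stat() {
  reply=$(printf '%s' "$1" | tr -d '\r')   # POP3 lines end in CRLF
  case $reply in
    "+OK "*)
      set -- $reply                         # words: +OK, count, octets
      printf '%s %s\n' "$2" "$3" ;;
    *)
      return 1 ;;
  esac
}

pop3_stat "+OK 2 320"    # prints: 2 320
```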

Is there a way to further simplify the most difficult part of EFX (that is, the EFX engine)? Tcl, Perl, Scheme, Ksh -- it doesn't matter. I've always thought that I should leverage the language environment (essentially by using eval) so that the template code could be the same as the host language. Gawk has no eval, so gawk was not considered for the core.

But, what if a mini-language for EFX template programming is the way to go? The templates themselves are in an XML mini-language that I parse -- why not take that one step further? Then I wouldn't need eval. An EFX language parser in gawk... more control. The mini-language, for the most part, would be the same as it is now. But instead of being executed by Tcl (evaluated directly within the Tcl interpreter), it would be parsed and interpreted explicitly by gawk. Safer. Saner?
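To make the idea concrete, here is a toy interpreter for a hypothetical two-construct template language: a `set NAME VALUE` line and `${NAME}` substitution. The syntax is an assumption for illustration, not the actual EFX template language; the point is that template text is matched and looked up, never handed to `eval`. It is a small awk program wrapped in a shell function.

```shell
# render < template: toy interpreter for a hypothetical template language,
# with no eval anywhere:
#   set NAME VALUE   defines a variable (the line itself produces no output;
#                    runs of whitespace inside VALUE collapse to one space)
#   ${NAME}          anywhere else is replaced by that variable's value
# Unknown names expand to the empty string.
render() {
  awk '
    $1 == "set" && NF >= 2 {
      name = $2
      $1 = ""; $2 = ""            # rebuild $0 without the first two words
      sub(/^ +/, "")
      vars[name] = $0
      next
    }
    {
      out = ""; rest = $0
      while (match(rest, /[$][{][A-Za-z_][A-Za-z0-9_]*[}]/)) {
        key = substr(rest, RSTART + 2, RLENGTH - 3)   # strip "${" and "}"
        out = out substr(rest, 1, RSTART - 1) vars[key]
        rest = substr(rest, RSTART + RLENGTH)
      }
      print out rest
    }'
}

printf 'set site EFX\nWelcome to ${site}!\n' | render   # prints: Welcome to EFX!
```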

Now is the time I take another step back. How do I make EFX simpler?

- Submitted by Todd Coram

Thursday, March 30 4:06 PM

...and then there is ALL ksh?

Re: my last post

I wasn't convinced that "everything" in EFX could be done in ksh. It supports networking but not binary data manipulation, so I would still need an external FTP program. (I am doing some rather sophisticated scripting of file transfers so a stock FTP command isn't enough -- some things that are reported as errors aren't errors, etc.)
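A stock FTP client mostly hides the server's reply codes, and that is exactly where this kind of custom handling has to live. As a sketch that assumes only the reply format from RFC 959 (the names `read_reply` and `reply_kind` are made up for illustration), reading a possibly multi-line reply and classifying its code could look like this:

```shell
# read_reply: read one FTP reply (RFC 959) from stdin and print its 3-digit code.
# A multi-line reply starts with "NNN-" and ends at the first line that
# starts with the same code followed by a space ("NNN ").
read_reply() {
  IFS= read -r line || return 1
  code=${line%"${line#???}"}              # first three characters
  case $line in
    "$code-"*)
      while IFS= read -r line; do         # skip continuation lines
        case $line in "$code "*) break ;; esac
      done ;;
  esac
  printf '%s\n' "$code"
}

# reply_kind: classify a code by its first digit (RFC 959):
# 1yz/2yz/3yz are positive, 4yz transient failures, 5yz permanent failures.
# A script that knows a particular failure code is harmless can special-case it here.
reply_kind() {
  case $1 in
    [123][0-9][0-9]) echo positive ;;
    4[0-9][0-9])     echo transient ;;
    5[0-9][0-9])     echo permanent ;;
    *)               echo unknown ;;
  esac
}

printf '220-Welcome\r\n220 Ready\r\n' | read_reply   # prints: 220
```

On a live connection the reply would come from the control socket rather than a pipe, e.g. after `exec 3<>/dev/tcp/somehost/21` in ksh93, with `read_reply <&3`.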

A few experiments using ksh93 under Solaris yield a rather interesting capability:

  $ cat /tmp/somebinaryfile >/dev/tcp/somehost/someport

Cat is a built-in and the above line happily transfers binary data (a requirement for writing an FTP program). Currently I do some rather suspect stuff in gawk (using the INET extension). FTP in pure ksh is very possible (without a single external command).
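The other piece a pure-ksh FTP client needs is the data connection. In passive mode the server announces where to connect in its 227 reply to PASV. A hedged sketch of turning that reply into a host and port (the helper name `pasv_addr` is hypothetical):

```shell
# pasv_addr: turn the reply to PASV, e.g.
#   227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)
# into "HOST PORT", where HOST is h1.h2.h3.h4 and PORT is p1*256 + p2 (RFC 959).
pasv_addr() {
  nums=${1#*\(}          # drop everything through "("
  nums=${nums%%\)*}      # drop ")" and everything after it
  old_ifs=$IFS
  IFS=,
  set -- $nums           # split the six numbers on commas
  IFS=$old_ifs
  printf '%s.%s.%s.%s %s\n' "$1" "$2" "$3" "$4" "$(( $5 * 256 + $6 ))"
}

pasv_addr "227 Entering Passive Mode (192,168,0,10,19,137)"   # prints: 192.168.0.10 5001
```

With those two values, the upload itself can use the same redirection shown above, `cat /tmp/somebinaryfile >/dev/tcp/$host/$port`, alongside a STOR command on the control connection.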

(Question: What is all of this stuff about avoiding external commands? Answer: I want to install just a handful of files with EFX and not rely on whatever utilities a user happens to have installed.)

If I can convert my "email parsing" stuff to (at least) nawk (gawk overlaps ksh too much in functionality for my purposes), then I could ship (install) EFX 2.0 as just "ksh and nawk". Eventually, I could always port the email parsing stuff to ksh.

- Submitted by Todd Coram

Friday, March 24 2:45 PM

ksh + gawk = Full EFX?

These two make an interesting pair. (G)awk is excellent as a quick language for processing text and expressing algorithms. Storybot's pop3 fetcher, mime parser and command processor are written in gawk. They would be more cumbersome in ksh (or Tcl, for that matter). There is a certain beauty in brevity, both syntactic and semantic. Without looking like line noise, you can express a lot in awk.
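For a taste of that brevity, here is a small header extractor of the kind a MIME parser needs. It is a sketch, not Storybot's code: the function name `get_header` is hypothetical, the awk program sticks to POSIX awk, and it handles only the common cases (case-insensitive names, CRLF line endings, folded continuation lines).

```shell
# get_header NAME < message: print the value of the first "NAME:" header.
# The name is matched case-insensitively, CRLF line endings are accepted,
# folded continuation lines (starting with space or tab) are joined with their
# leading whitespace collapsed to one space, and the scan stops at the blank
# line that ends the header section.
get_header() {
  awk -v want="$1" '
    BEGIN { want = tolower(want) ":" }
    { sub(/\r$/, "") }                      # tolerate CRLF
    /^$/ { exit }                           # end of headers
    found && /^[ \t]/ { sub(/^[ \t]+/, " "); val = val $0; next }
    found { exit }                          # next header: done
    tolower(substr($0, 1, length(want))) == want {
      val = substr($0, length(want) + 1); found = 1
    }
    END { sub(/^[ \t]+/, "", val); print val }'
}

printf 'Subject: Hi\r\n\r\nbody\r\n' | get_header subject   # prints: Hi
```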

However, (g)awk falls down when dealing with OS and control issues. Watch awk turn ugly when you just want to schedule a download or copy files from one location to another. Awk isn't prepared to deal with filesystems and process control. But, that's okay. That is where the Korn Shell (ksh) excels.
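As an example of the file shuffling ksh handles comfortably, consider dropping downloaded mail into an "incoming" directory. A common trick, similar in spirit to maildir's tmp-then-rename delivery, is to write under a temporary name and rename the file into place so a reader never picks up a half-written message. A minimal sketch (the function name `deliver_msg` is hypothetical):

```shell
# deliver_msg FILE DIR: copy FILE into DIR so that a reader scanning DIR never
# sees a half-written message: write to a hidden temporary name inside DIR,
# then rename it into place (mv within one filesystem is a rename).
deliver_msg() {
  src=$1
  dir=$2
  mkdir -p "$dir" || return 1
  tmp="$dir/.tmp.$$"                       # same directory => same filesystem
  cp "$src" "$tmp" && mv "$tmp" "$dir/${src##*/}"
}
```

The rename is only atomic when the temporary file and the destination are on the same filesystem, which is why the temporary name lives inside DIR.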

Currently, I've collapsed all of the gawk programs (about 8-10 of them) down to 2 gawk programs: pop3fetch, storybot.awk. Frankly, pop3fetch could be written (and has been written) in ksh. But, I don't want to commit myself to a full ksh (versus just plain old "sh") just yet.

So, the current components of EFX 2.0 consist of:

  1. pop3fetch - download email into an "incoming" directory
  2. storybot - process that email, validate users, follow commands, put stories into proper EFX content directory.
  3. xmlparser.awk - Parses EFX template files and produces a streaming list of elements.
  4. efx core engine - (Tcl code) calls xmlparser.awk and calls embedded Tcl code to produce site.
  5. efx content apps - (Tcl code) application stuff to do calendar, news, menus, etc.
  6. ftp publisher - (Gawk code) specialized ftp client that pushes content to web host.
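The "streaming list of elements" idea in item 3 can be illustrated with a deliberately naive tokenizer. This is not xmlparser.awk; the event names (`start`, `end`, `text`) and the one-event-per-line format are assumptions chosen for the sketch.

```shell
# xml_events < file: print a flat stream of events, one per line:
#   start NAME | end NAME | text CONTENT
# A deliberately naive tokenizer: attributes are dropped, <x/> yields
# start+end, and comments, CDATA, <?...?> and entities are not handled.
xml_events() {
  awk '
    { buf = buf $0 "\n" }                 # slurp the whole input
    END {
      while (match(buf, /<[^>]*>/)) {
        text = substr(buf, 1, RSTART - 1)
        tag  = substr(buf, RSTART + 1, RLENGTH - 2)
        buf  = substr(buf, RSTART + RLENGTH)
        sub(/^[ \t\n]+/, "", text)        # trim surrounding whitespace
        sub(/[ \t\n]+$/, "", text)
        if (text != "") print "text", text
        if (tag ~ /^\//) { print "end", substr(tag, 2); continue }
        self = (tag ~ /\/$/)              # self-closing <x/>
        sub(/\/$/, "", tag)
        split(tag, parts, /[ \t\n]+/)     # parts[1] is the element name
        print "start", parts[1]
        if (self) print "end", parts[1]
      }
    }'
}

printf '<p>Hi</p>\n' | xml_events   # prints: start p / text Hi / end p (one per line)
```

A line-oriented stream like this is easy for the next stage (Tcl, ksh or awk) to consume with a simple read loop.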

The end result... You need /bin/sh, gawk and Tcl to run EFX. Not bad.

However, let's try a little experiment... Let's replace Tcl and /bin/sh with ksh. This would require a complete rewrite of the core engine and the content apps. The blend of ksh/gawk is more natural than Tcl/gawk. But, why not all Tcl? Well, I love Tcl, but gawk is (currently, for me) more fun (and more natural for some of the tasks), and I didn't want to use any "standard" libraries due to my desire for fanatical control. Ksh would work just like Tcl for the core engine and template language. The syntax is very similar.

Now, let's get even further out... EFX using ksh won't work without gawk (there is no XML processor written in ksh and I don't want to write one). Plus, gawk AND Tcl would be more natural for some of the email mangling. I respect ksh, but it starts to take on Perl-like complexity when you want to do text manipulation.

So... would an "all gawk" EFX fly? Well, gawk doesn't have an eval, so the EFX core would be a problem. But, wait, I could write a very simple template language parser in awk. This is not as crazy as it sounds. Most of the language is about setting variables and calling applications (generate a news story, get a news item, etc.). Currently, the Tcl application code carries very little state (no side effects), so replacing the apps with gawk isn't the problem. The template code relies on variables and variable substitution for a lot of its work. This is no problem for Tcl or ksh, but gawk can't evaluate its own code, so I would have to write a simple "language".

A mini template markup language in gawk would have a few benefits. The biggest would be sandboxing. Unlike exposing all of Tcl or Ksh, the mini language would limit access to the underlying system.
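A sketch of what that limit could look like: template code may reach the system only through one dispatch function with a fixed whitelist. The app names `news` and `menu` and the `app_*` functions are hypothetical stand-ins for the content apps; the same whitelist idea would carry over unchanged to an interpreter written in gawk.

```shell
# Hypothetical content apps; in EFX these would be the calendar/news/menu code.
app_news() { printf 'news: %s\n' "$*"; }
app_menu() { printf 'menu: %s\n' "$*"; }

# call_app NAME ARGS...: the only door from template code to the system.
# NAME must be on the whitelist; any other name is rejected, never evaluated
# or executed.
call_app() {
  name=$1
  shift
  case $name in
    news|menu) "app_$name" "$@" ;;
    *) printf 'call_app: no such app: %s\n' "$name" >&2; return 1 ;;
  esac
}

call_app news top 3   # prints: news: top 3
```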

The biggest problem would be a lack of file "globbing". Gawk has no access to the file system aside from reading/writing files it has been told about. I could throw in "find" or "ls", but maybe I shouldn't rely on the filesystem contents to begin with? Maybe everything referenced by EFX should be either traceable by file inclusion (files referencing other files) or through mini databases?

For example, rather than reading a directory to find all publishable stories, I would maintain a text database of the filenames of the publishable stories.
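A minimal sketch of such a text database, assuming one file name per line in a file called `stories.db` (the file name and the `db_*` function names are hypothetical):

```shell
# A hypothetical story index: one publishable file name per line, kept in $DB.
DB=${DB:-stories.db}

db_add() {              # append NAME unless it is already listed
  touch "$DB" || return 1
  grep -Fqx -- "$1" "$DB" || printf '%s\n' "$1" >> "$DB"
}

db_remove() {           # rewrite the index without NAME
  [ -f "$DB" ] || return 0
  grep -Fvx -- "$1" "$DB" > "$DB.tmp"
  [ $? -le 1 ] && mv "$DB.tmp" "$DB"   # grep exits 1 when no lines remain
}

db_list() {             # print every listed name, in insertion order
  [ -f "$DB" ] && cat "$DB"
}
```

Gawk could then walk the list without touching the directory at all, e.g. with `while ((getline name < "stories.db") > 0)`.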

Yikes, I am starting to get very un-unix here. Time to calm down. Time to think...

- Submitted by Todd Coram

Friday, March 24 2:22 PM

Stupid Korn shell tricks, Part 1

I've been avoiding serious programming at home lately. I'm just too tired... EFX inspiration please hit me soon!

However, I have been doing a whole lot of Korn shell (ksh93) programming at work. It's been bleeding into my home/personal time. There are quite a few interesting language features in ksh. I find it interesting to stretch out those features to see how far they go before snapping. In particular, I like how you can treat functions just like unix commands. No big deal here. But, it allows you to do some not-quite-obvious stuff.

Consider the classic factorial algorithm. Here it is in straightforward ksh:

function fact {
    integer n=$1; nameref result=$2       # result aliases the caller's variable
    if ((n==0)); then
      result=1                            # 0! = 1
    else
      fact $((n-1)) result; result=$((n*result))   # n! = n * (n-1)!
    fi
}
fact 5 res; print $res

The above code uses reference variables to hold return values. Ksh function return values are like program exit statuses: they are limited to a small integer range and convey status, not data.

However, with a little unix mind warping (think pipe streams), you can force ksh to do this:

function fact {
    integer n=$1
    if ((n==0)); then
      print 1
    else
      print $((n*$(fact $((n-1)))))   # each $(...) runs fact in a subshell
    fi
}
res=$(fact 5); print $res


- Submitted by Todd Coram

Wednesday, March 15 8:35 PM

EFX 2.0 vs other work

EFX 1.0 runs. It runs well. I was working on the 2.0 rewrite when I encountered the problems with all "2.0 rewrites": interest wanes; other projects begin to bloom.

EFX is far from dead. It still runs 2 sites (.000000N% of all online websites? ;-), but I have another project brewing that will use some components from the Storybot portion of EFX.

Stay tuned, all 2 or 4 of you who read this blog, for more on the new project.

For now, EFX will continue along its 1.0 trajectory -- a bunch of simple scripts that can be used to compose and publish websites.

I'll probably make the code available as a tarball (sans freshmeat/sourceforge/cvs trappings).

- Submitted by Todd Coram

Tuesday, March 07 2:54 PM