Help with the usenet-web.rc file

The usenet-web.rc file is the single most important file for anyone maintaining a Usenet-Web 1.0 archive. It contains almost all of the configuration options you need, not only to make an archive work, but also to put your own personal stamp on it.

The usenet-web.rc file has two major sections: the global options, which define the default behavior of the software, and the group-specific options, which allow you to tweak an individual group.

The Global Options

COMPRESS
COMPRESS specifies the full path to the compression program used by Usenet-Web. At present, 'gzip' is the only program supported, and it is required: Usenet-Web has no direct option for turning compression off. If you are determined to run without compression, you will have to fool the program by writing a shell script that renames files to the .gz extension ('mv "$1" "$1.gz"' will do it) and setting COMPRESS to point to it.
UNCOMPRESS
UNCOMPRESS points to the decompression program to be used. Usenet-Web is designed to use one of the following: 'zcat', 'gunzip -c' or 'gzip -d -c'.

If you are operating without file compression by fooling the program, 'cat' will work.
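
The renaming trick does not have to be a shell script. Here is a minimal sketch of the same idea in Python. It assumes that Usenet-Web calls COMPRESS with the file name as its argument and expects to find the file under the same name plus '.gz' afterwards, as gzip would leave it. The function name fake_compress is made up for this illustration.

```python
#!/usr/bin/env python3
"""Stand-in for gzip: renames FILE to FILE.gz without compressing it."""
import os
import sys


def fake_compress(path):
    """Rename path to path + '.gz' and return the new name."""
    target = path + ".gz"
    os.rename(path, target)
    return target


if __name__ == "__main__":
    # Assumption: the files to "compress" arrive as command-line arguments.
    for name in sys.argv[1:]:
        fake_compress(name)
```

Saved as an executable file, a script like this would be what COMPRESS points to, with UNCOMPRESS set to 'cat' as described above.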

URLPATH

URLPATH gives the URL at which the fetch program is reached. For example, if you install fetch in the master cgi-bin directory on your system, then URLPATH should be set to '/cgi-bin/fetch'. You can rename the program to anything you like and it will still work, as long as URLPATH points to it.

This also provides a mechanism for passing extra CGI variables, such as those used by 'cgiwrap'. Just add the extra variables to the URLPATH as if you were passing them from a browser. So if you wanted to add 'user=snowhare&script=fetch' it would read: URLPATH=/cgi-bin/fetch?user=snowhare&script=fetch
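
To see what those extra variables look like from the receiving end, the sketch below takes the URLPATH above apart with Python's standard urllib.parse module. It only illustrates the query-string syntax; it is not code from Usenet-Web or cgiwrap.

```python
from urllib.parse import urlsplit, parse_qs

urlpath = "/cgi-bin/fetch?user=snowhare&script=fetch"

parts = urlsplit(urlpath)
script_url = parts.path              # the part that names the CGI program
extra_vars = parse_qs(parts.query)   # the variables passed along with it

print(script_url)   # /cgi-bin/fetch
print(extra_vars)   # {'user': ['snowhare'], 'script': ['fetch']}
```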

What could be easier?

ICONPATH
ICONPATH gives the URL of the directory in which you have placed the icons used by Usenet-Web: search.gif, results.gif, day.gif, month.gif, year.gif, next.gif and previous.gif. They must be placed in a single directory, and must be named as listed. So if you put them in a directory with the URL '/icons/', set ICONPATH to '/icons/'.
SYSTEMIDENT
This is the first of the customization options: the ability to add a banner with anything you want in it. SYSTEMIDENT gives the file system path of a text file whose contents are included at the bottom of every page presented by Usenet-Web. The text is presented as HTML, so you can use it for anything you can use HTML for. An example file, 'system-ident', is included in this package.
ACTIVE_FILE
ACTIVE_FILE points to the news system file that tells what messages and newsgroups your system has available. Usenet-Web uses this information to speed up the archiving process by only looking at new messages.

This file can frequently be found in one of these places: /usr/lib/news/active, /usr/spool/news/active, or /usr/local/lib/inn/active

If you cannot find it on your system, try asking your system administrator. They will certainly know where it is.
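
If you would rather check those usual places yourself first, something like the following will do. It is only a sketch: the list holds just the three paths mentioned above, and find_active_file is a name made up for this example.

```python
import os

# The common locations named above; extend the list for your own system.
COMMON_ACTIVE_FILES = [
    "/usr/lib/news/active",
    "/usr/spool/news/active",
    "/usr/local/lib/inn/active",
]


def find_active_file(candidates=COMMON_ACTIVE_FILES):
    """Return the first candidate that exists as a regular file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


print(find_active_file())
```

A result of None means the file is somewhere else, and it is time to ask.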

NEWS_SPOOL
This is the directory where all news articles are normally stored by the news system. The most frequent place for this is '/usr/spool/news/'.
CACHE
To assist with the threading of archive searches and to reduce the load caused by people doing the same search repeatedly, Usenet-Web maintains a cache of the results of recent search requests. The 'CACHE' global points to the directory in which these cached search results are stored.

IMPORTANT: This directory must be readable and writable by the 'fetch' CGI script. If running the script under user 'nobody', this means that the cache directory must be world readable and writable (privs 0777), unless it is owned by 'nobody'.

If you are running Usenet-Web under 'cgiwrap', the cache directory only needs to be readable and writable by you (privs 0700).
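
The two permission settings above can be applied with a few lines of Python. This is a sketch, not part of Usenet-Web, and make_cache_dir is a hypothetical helper name.

```python
import os
import stat


def make_cache_dir(path, mode):
    """Create the cache directory if needed and set its permission bits."""
    os.makedirs(path, exist_ok=True)
    # chmod is applied separately because the mode handed to makedirs
    # would be filtered through the process umask.
    os.chmod(path, mode)
    # Report the permission bits actually in effect.
    return stat.S_IMODE(os.stat(path).st_mode)
```

For a script running as 'nobody' you would call it with mode 0o777 on the directory named by CACHE; under 'cgiwrap', 0o700 is enough.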

CACHE_EXPIRY
This is the time (in seconds) the cache contents will be retained. I suggest 3600 (one hour) as a reasonable time.
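
As an illustration of what a one hour expiry means in practice, the sketch below decides whether a cached file is stale. It assumes that age is measured from the file's last modification time; how Usenet-Web itself measures it may differ. The name is_expired is made up for this example.

```python
import os
import time

CACHE_EXPIRY = 3600  # seconds, the value suggested above


def is_expired(path, expiry=CACHE_EXPIRY, now=None):
    """True if the file was last modified more than expiry seconds ago."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path) > expiry
```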
ARCHIVE_DIRECTORY
This is Usenet-Web's equivalent of the NEWS_SPOOL. It is the top-level directory where, by default, Usenet-Web stores archived articles in a tree very similar to an ordinary news spool, with a few elaborations. Subdirectories are created automatically as needed.
TRIPLE_HEADER_INDEX_FILE
This is a feature provided mainly for the fiddle factor. It gives the name that Usenet-Web will use for its triple header file, which contains the Subject:, Date: and From: lines of every message in the archive. I personally use 'Archive_Index'.
MESSAGE_ID_INDEX_FILE
Again, this really is not that useful, except for fiddle factor. It gives the name used for the index files containing the archive index number, Message-ID: and location in the archive of each message. I use 'Id-Index'.
That is all of the global options.

Group Options

Each of these options applies ONLY to the newsgroup it is listed after. All options are, well, optional, unless otherwise stated. Where a default of 'yes' or 'no' is given, you can override the default by specifying OPTION_NAME=no or OPTION_NAME=yes, whichever is the opposite of the default.
NEWSGROUP - [required]
The name of the newsgroup being archived.
DESCRIPTION (default=blank)
The description of this group - I personally try to use the same descriptions as given to the groups on the Usenet, but you don't have to.
FANCY_DISPLAY (default=yes)
Set this to 'no' to turn off the hotlinking of URLs contained in messages and the italicizing of text that appears to be quoted from other messages. This is mainly useful in a few non-English and binary groups (maybe).
USE_ACTIVE_FILE (default=yes)
Set this to 'no' when, for one reason or another, no active file is available for a newsgroup. This might occur, for example, if you were obtaining the articles for a newsgroup via mail or some other non-local mechanism and then storing them in a directory for archiving.

Turning this off (USE_ACTIVE_FILE=no) results in every message in the news spool directory for this group being processed every time the archiver runs. This creates a major speed penalty unless you also use REMOVE_ORIGINAL=yes to remove the old messages.

REMOVE_ORIGINAL (default=no)
If you do not want to retain the copy of the message in the news spool directory after it has been added to the archive, set this to yes. Note: This is really only useful in cases where you are obtaining news articles from somewhere other than a normal news spool (see USE_ACTIVE_FILE).
ACTIVE_FILE
If you have a system that mounts more than one news spool, then you may need to point the archiver to a different active file than the default one for a particular newsgroup. This option lets you specify which active file to use for this group.
NEWS_SPOOL
This allows you to override the global news spool and explicitly point the archiver to a directory used specifically by this one newsgroup. That is, not the top of the news spool tree, but the bottom level. This is useful both when you have multiple news spools and for unconventional newsfeeds (i.e. news->mail->local disk).
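
The difference between the top of the tree and the bottom level can be sketched like this. The example assumes the traditional spool layout, in which each dot in the group name becomes a directory level; the function name group_spool_dir and the '/home/users/snowhare/mail-feed' path are made up for illustration.

```python
import os


def group_spool_dir(newsgroup, global_spool, group_spool=None):
    """Return the directory holding a group's articles.

    A group-level NEWS_SPOOL is used as-is, because it already names the
    bottom-level directory. Otherwise the group name is mapped onto the
    global spool tree, one directory level per dot.
    """
    if group_spool:
        return group_spool
    return os.path.join(global_spool, *newsgroup.split("."))


# Global setting only: the group name is mapped onto the tree.
print(group_spool_dir("alt.devilbunnies", "/usr/spool/news"))
# Group-level override: the directory is taken exactly as given.
print(group_spool_dir("alt.devilbunnies", "/usr/spool/news",
                      "/home/users/snowhare/mail-feed"))
```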
ARCHIVE_DIRECTORY
For one reason or another, you might want to store an archive somewhere other than the common tree provided by Usenet-Web. This allows you to specify the exact directory you want a particular newsgroup archived in.
HIGH_MESSAGE - [optional/required]
If USE_ACTIVE_FILE=yes (which is the default), then you must also use HIGH_MESSAGE. This tracks the highest message in the news spool that the archiver has already processed (actually +1) so it does not have to reprocess old messages. When you first start an archive, this should be set to '0' (zero): HIGH_MESSAGE=0
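
To make the bookkeeping concrete, here is a sketch of how HIGH_MESSAGE and the active file could combine to pick out the unprocessed articles. It reads "actually +1" as meaning that HIGH_MESSAGE holds the number of the first article not yet archived, and it assumes the usual active file line layout of group name, high article number, low article number and flags. The sample line and both function names are invented for the example; this is not the archiver's own code.

```python
def parse_active_line(line):
    """Split one active-file line of the form 'group high low flags'."""
    group, high, low, _flags = line.split()
    return group, int(high), int(low)


def pending_articles(high_message, active_high, active_low=0):
    """Article numbers still to archive, given HIGH_MESSAGE (last done + 1)."""
    first = max(high_message, active_low)
    return range(first, active_high + 1)


group, high, low = parse_active_line("alt.devilbunnies 0000000120 0000000100 y")
todo = pending_articles(high_message=0, active_high=high, active_low=low)
print(group, todo[0], todo[-1], len(todo))   # alt.devilbunnies 100 120 21
next_high_message = high + 1                 # value to record after the run
```

On the next run, with HIGH_MESSAGE at 121 and no new articles, the range is empty and nothing is reprocessed.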
MESSAGE_COUNT - [required]
This keeps track of the number of messages currently in the archive. The software must know this number for a variety of reasons. When you are first setting up an archive for a newsgroup it should be set to '0' (zero): MESSAGE_COUNT=0
MESSAGE_ID_INDEX_FILE
More fiddle factor. If for some reason you wanted to have a different name for the MESSAGE_ID_INDEX_FILE than the global one - here is where you would change it.
TRIPLE_HEADER_INDEX_FILE
More fiddle factor. If for some reason you wanted to have a different name for the TRIPLE_HEADER_INDEX_FILE than the global one - here is where you would change it.
ICONPATH
In the spirit of customizing by the group, you could use this to change the URL for the various icons to a different directory for just this one group. Possible uses would be changing the icons for archives of foreign languages, 'cutesy' icons for particular groups or just plain because you wanted to.
URLPATH
Allows the changing of the CGI script called by the group. I haven't thought of a good reason to do this, but what the hell.
SYSTEMIDENT
Allows you to change the bottom-of-page banner for each group individually.

This then is what an initial usenet-web.rc file could look like for someone archiving two newsgroups from their news spool (details will differ depending on system configuration):


# Globals

UNCOMPRESS=/usr/bin/zcat
COMPRESS=/usr/bin/gzip
URLPATH=/cgi-bin/fetch
ICONPATH=/icons/
SYSTEMIDENT=/home/users/snowhare/usenet-web/system-ident
CACHE=/home/users/snowhare/usenet-web/cache
CACHE_EXPIRY=3600
ACTIVE_FILE=/usr/lib/news/active
NEWS_SPOOL=/usr/spool/news
ARCHIVE_DIRECTORY=/home/users/snowhare/usenet-web/archive
TRIPLE_HEADER_INDEX_FILE=Archive_Index
MESSAGE_ID_INDEX_FILE=Id-Index

# -------------------------------------------------------------------

# Group Specific

NEWSGROUP=alt.devilbunnies
DESCRIPTION=Probably better left undescribed
HIGH_MESSAGE=0
MESSAGE_COUNT=0

# -------------------------------------------------------------------

NEWSGROUP=soc.culture.vietnamese
DESCRIPTION=Issues and discussions of Vietnamese culture.
HIGH_MESSAGE=0
MESSAGE_COUNT=0

# -------------------------------------------------------------------
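
If you want to check a file like this with a script of your own, the layout is easy to read by machine. The following is a sketch under assumptions drawn from the example above: lines starting with '#' are comments, every other non-blank line is NAME=value, and each NEWSGROUP line starts a new group whose options override the globals. The names parse_rc and effective, and the '/icons/vietnamese/' path in the sample, are made up for illustration. Usenet-Web's own parser may behave differently.

```python
def parse_rc(text):
    """Parse usenet-web.rc text into (globals dict, list of group dicts)."""
    global_opts = {}
    groups = []
    current = global_opts
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split at the first '=' only: a value such as
        # /cgi-bin/fetch?user=snowhare&script=fetch contains '=' itself.
        key, _, value = line.partition("=")
        key = key.strip()
        if key == "NEWSGROUP":
            current = {}  # each NEWSGROUP line opens a new group
            groups.append(current)
        current[key] = value.strip()
    return global_opts, groups


def effective(group, global_opts, name, default=None):
    """Group setting if present, else the global one, else the default."""
    return group.get(name, global_opts.get(name, default))


SAMPLE = """\
# Globals
URLPATH=/cgi-bin/fetch?user=snowhare&script=fetch
ICONPATH=/icons/

NEWSGROUP=alt.devilbunnies
HIGH_MESSAGE=0
MESSAGE_COUNT=0

NEWSGROUP=soc.culture.vietnamese
ICONPATH=/icons/vietnamese/
HIGH_MESSAGE=0
MESSAGE_COUNT=0
"""

global_opts, groups = parse_rc(SAMPLE)
for g in groups:
    print(g["NEWSGROUP"], effective(g, global_opts, "ICONPATH"))
```

The second group picks up its own ICONPATH, while the first falls back to the global one.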



Benjamin "Snowhare" Franz / snowhare@netimages.com