1. Introduction
2. LBT Archive Features
3. How should the archive be structured
4. The World-Wide Web Project and the LBT Archive
5. Tools for the Trade
6. Conclusions
References
1. Introduction
Like any large and complex project, the LBT project has produced,
since its very beginning, a huge amount of documentation
of both direct and historical interest.
Moreover, with the start of the detailed design phase and
of the actual construction, the rate
at which documents, drawings, papers, etc. are produced is bound to
increase dramatically.
Most of the documentation generated in the design and construction phases will remain of interest throughout the operating life of the LBT, for activities such as instrumentation design, observation scheduling, troubleshooting, maintenance and refurbishing. Moreover, during operation more data and information will be recorded in order to improve our knowledge of the various aspects of observing, which will allow a better use of the LBT.
This is clearly the right time to design a model of the information system that will allow us to store in a suitable database all the relevant pieces of information produced during the various phases of the LBT project, and to develop a set of tools that will give all interested parties easy access to, and retrieval of, the stored items.
The LBT is intrinsically a ``distributed'' project: documents are produced in a number of different places (basically two at the moment, but possibly more in the future), and this distributed nature of the project will to some extent be reflected in the distributed nature of the required database. On the other hand, the information must be accessible on request to all participants within a short time: archives must be easily searchable and browsable, and documents must be retrievable from remote locations.
In the last few years a number of application programs for distributed data retrieval have been made available and are currently in an advanced phase of development; some of these tools may help in implementing a distributed information system such as the one described above. The following pages discuss how the existing network services and tools can be used to create, maintain and access the LBT archive.
A detailed discussion about the implementation of such a system is deferred to more technically oriented reports to follow.
2. LBT Archive Features
The LBT archive system must fulfill a number of general requirements;
here follows a list of the most obvious ones:
3. How should the archive be structured
When discussing the structure of the archive we must distinguish between
the physical structure, i.e. the way information is actually stored
on the storage media, and the logical structure, i.e. the structure of the
archive as seen by the user when accessing pieces of information.
The two can and will be different: while the physical structure must
be kept as stable as possible over time, the logical one can change
according to needs, and there may be different logical
organizations of the archive for different users. For example, some pieces of
information may be hidden from certain groups of users, or the ``travel''
through the structure may be different for different groups of users.
The following sections discuss a proposed implementation of the structure of the archive, together with a set of tools for creating, maintaining and accessing it.
Given the above constraints, the archive will simply consist of many files stored in a set of directory trees on various disks, possibly on various machines of a LAN. This ensures the best mapping onto the native data structures of the host system and allows the archive to grow: more files and directories can be added on a single disk, then more disks on a single machine, then more machines on a LAN, as needed. Moreover, parts of the archive might reside on different devices (e.g. a CD-ROM jukebox) or on special purpose database machines, if needed.
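As an illustration of how one archive tree can be spread over several disks and machines, the following minimal Python sketch maps an archive-relative path to a physical location by longest matching subtree. The mount table, host names and directories are hypothetical and are not part of the proposal.

```python
from pathlib import PurePosixPath

# Hypothetical mount table: archive subtree -> (host, physical directory).
# The empty key is the default location for everything not listed explicitly.
MOUNTS = {
    "": ("archive-a", "/disk1/lbt"),
    "drawings": ("archive-a", "/disk2/lbt-drawings"),
    "simulation": ("sim-ws", "/data/lbt-sim"),
}

def locate(archive_path, mounts=MOUNTS):
    """Map an archive-relative path to (host, physical path).

    The longest subtree listed in the mount table wins, so a subtree can be
    moved to another disk or machine by adding one entry to the table.
    """
    parts = PurePosixPath(archive_path).parts
    for depth in range(len(parts), -1, -1):
        prefix = "/".join(parts[:depth])
        if prefix in mounts:
            host, base = mounts[prefix]
            rest = "/".join(parts[depth:])
            return host, (f"{base}/{rest}" if rest else base)
    raise KeyError(archive_path)
```

With this table, `locate("drawings/mirror/cell.dxf")` resolves to the second disk of the hypothetical host `archive-a`, while any path outside the listed subtrees falls back to the default location.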
This also ensures the availability, now and in the future, of standard tools and procedures for maintaining the archive (moving it to other places, backup, etc.), independently of the actual architecture of the host system.
Other solutions could be chosen, e.g. the use of a standard database system; although this could make some operations easier, it could hardly fulfill the above requirements. Moreover, keeping the physical structure of the archive as simple and as close as possible to the file structure of the host machine will ease any future transfer of the data to different machines, software environments or storage media, should technological changes require it.
The use of special purpose archiving systems could nevertheless be useful for the development of parts of the archive. For example, a source code control system will very likely be used for the development of software, some specialized database system could be used for the management of drawings, and so on.
These tools can, however, be regarded as different logical organizations of parts of the whole database, provided that the file tree structure remains compatible with the overall archive. (For example, I would strongly oppose a system in which data are stored in database specific formats or structures, because such formats are not general enough to be managed by other programs, nor do they give adequate guarantees of remaining compatible with the evolution of technology over a long time.)
This suggests the need to duplicate most of the data at more than one site.
On the other hand, data will need to be available within a reasonably short time of any request from groups of people located in far away places. In spite of the increased speed of communication links (which will presumably continue to increase in the future), a completely distributed system, i.e. a system in which each file is stored in a single place, would soon become impractical, because any retrieval of data held elsewhere would require access to a wide area network.

Figure 1: LBT Archive Architecture
This also indicates the desirability of duplicating data at more than one site.
The resulting structure should look something like the example depicted in Figure 1. The archive is a collection of directory trees stored on various CPUs on a network. The archive nodes need not be on the same LAN; some nodes may be accessed remotely. At the same time, the information contained in some subtrees can be duplicated on the other participating nodes.
Thus a user sitting at Site A will have local access to the data stored on the ``Mechanical design workstation'' and to a local copy of the static part of the archive (likely stored on another node of the same LAN), while using network access tools to reach the documents stored on the ``Simulation workstation'' at Site B. Vice versa, a user at Site B will have network access to the data stored on the ``Mechanical design workstation'' and local access to the rest of the archive.
Both users will see the same archive structure, i.e. the set of data enclosed within the dotted line.
A proper set of tools must then be devised to maintain the archive structure so that, for example, duplicated data items are guaranteed to stay properly aligned.
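One way such an alignment check could work is to compare checksum manifests of the master copy and of a duplicate. The sketch below is only an assumption about how this might be done: the manifest layout (a mapping from archive path to digest) and the function names are hypothetical.

```python
import hashlib

def digest(data: bytes) -> str:
    """Return a SHA-256 checksum of a file's content."""
    return hashlib.sha256(data).hexdigest()

def misaligned(master, replica):
    """Compare two manifests of the form {archive path: digest}.

    Returns the paths that are missing from the replica, those whose content
    differs from the master copy, and those present only on the replica.
    """
    report = {"missing": [], "stale": [], "extra": []}
    for path, checksum in sorted(master.items()):
        if path not in replica:
            report["missing"].append(path)
        elif replica[path] != checksum:
            report["stale"].append(path)
    report["extra"] = sorted(set(replica) - set(master))
    return report
```

A periodic job at each site could build the manifest of its local copy and use the report to decide which files to fetch again.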
As for the logical structure of the archive, documents will not be identified by filenames alone: more meaningful descriptions could be associated with them, and various search paths could be allowed, according to date, strings in names, associated keywords, and so on.
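The search paths mentioned above (date, strings in names, keywords) can be sketched as follows. The record layout and field names are hypothetical; the report does not prescribe how the descriptions are stored.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Entry:
    """Hypothetical descriptive record attached to one archived file."""
    path: str
    description: str
    created: date
    keywords: set = field(default_factory=set)

def search(entries, name_contains=None, keyword=None, since=None):
    """Return the entries that satisfy every criterion that was supplied."""
    hits = []
    for entry in entries:
        if name_contains and name_contains.lower() not in entry.path.lower():
            continue
        if keyword and keyword.lower() not in {k.lower() for k in entry.keywords}:
            continue
        if since and entry.created < since:
            continue
        hits.append(entry)
    return hits
```

Because the search runs on the descriptive records rather than on the files themselves, it applies equally to drawings and other files that contain no text.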
As part of the logical structure of the archive, documents or groups of documents will have associated proprietors, i.e. parties which are allowed to write or modify them (or are otherwise ``responsible'' for the document content). All other parties which have access to a document will have read access only.
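The proprietor rule can be expressed in a few lines. In this sketch ownership is assumed to be assigned per top-level subtree, and the group names are hypothetical; the report leaves the granularity of ownership open.

```python
# Hypothetical ownership table: top-level archive subtree -> proprietor.
PROPRIETORS = {
    "drawings": "mechanical-group",
    "simulation": "simulation-group",
}

def may_write(party, path, proprietors=PROPRIETORS):
    """Only the proprietor of the enclosing subtree may write or modify a document."""
    subtree = path.split("/", 1)[0]
    return proprietors.get(subtree) == party

def may_read(party, parties_with_access):
    """Every party that has access to a document has at least read access to it."""
    return party in parties_with_access
```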
It is thus clear that the logical structure of the archive must be superimposed on the physical one by means of additional pieces of information which accompany the actual data files. Some ideas on how this can be accomplished are discussed in the following pages.
The archive structure will also allow different ``views'' of the data for different groups of users. Some parts of the data could be hidden from given groups of users, and different search paths may be offered to different groups according to their different needs in accessing the data. Also, as stated above, some parts of the archive might be managed with specialized tools (e.g. a database system) by the development team, while remaining within the overall structure of the archive.
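A per-group view can be obtained by filtering the same physical file list. The following sketch assumes that hiding is decided per top-level subtree; the group names and hidden subtrees are hypothetical.

```python
# Hypothetical view table: user group -> subtrees hidden from that group.
HIDDEN = {
    "public": {"contracts", "minutes"},
    "partners": {"contracts"},
    "staff": set(),
}

def view(paths, group, hidden=HIDDEN):
    """Return the archive paths visible to `group`; unknown groups see nothing."""
    if group not in hidden:
        return []
    return [p for p in paths if p.split("/", 1)[0] not in hidden[group]]
```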
4. The World-Wide Web Project and the LBT Archive
Many of the ideas, and the corresponding tools, which lie behind the proposed
LBT archive are derived from concepts and applications related to
the World-Wide Web project, so a brief discussion
of the WWW project itself and of some of the concepts and terms involved should be useful.
The WWW project, started by CERN
(the European Laboratory for Particle Physics)
and since grown to include many more participants,
seeks to build a set of protocols and software tools which allow
access to information on the worldwide network known as ``The Internet''.
It uses hypertext and multimedia techniques to make finding,
browsing and retrieving information as easy as possible.
The functionality required for the proposed LBT archive fits well into the WWW scheme. Moreover, while part of the archive will only be used by a limited number of sites, a subset of it (increasing in size with time) could naturally become a part of the WWW itself, providing access to LBT related information to the astronomical community at large by means of widespread standard tools.
As a result of the WWW project, a number of concepts have been developed and applied to programs which make it possible to manage a complex network of information providers such as the LBT archive. A brief description of the main ones follows (further explanations are available for each underlined term in the text).
The URLs above are ``absolute'' references, in that they specify the full path to the resource. The standard adds more flexibility by allowing ``relative'' URLs, where only a part of the path is specified and the rest is taken from the URL of the document containing the reference.
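The difference between absolute and relative URLs can be illustrated with Python's standard `urllib.parse.urljoin`, which resolves a relative reference against the URL of the referring document. The host name and paths used here are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical URL of the hypertext document that contains the links.
BASE = "http://lbt.example.org/archive/optics/index.html"

def resolve(reference, base=BASE):
    """Turn a relative reference found in the document at `base` into an absolute URL."""
    return urljoin(base, reference)

# A reference relative to the directory of the referring document.
print(resolve("mirror/cell.html"))
# A reference that climbs one level up in the directory tree.
print(resolve("../drawings/"))
# An absolute reference is returned unchanged.
print(resolve("ftp://ftp.example.org/pub/lbt/readme.txt"))
```

Relative references are what make a subtree of hypertext documents movable: as long as the internal layout of the subtree is preserved, the links keep working from the new location.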
5. Tools for the Trade
The first aim of the archiving system will be to impose discipline on the physical structure of the archive, i.e. mainly on the structure of the directory tree containing the data files.
Some guidelines for a consistent way to name files and arrange the directory structure have already been proposed [P. Gray, ``Proposed LBT Project Drawing and Document Numbering System''] and represent a good starting point, although it is not obvious that the proposed scheme is suitable for the whole archive.
It seems unlikely, however, that the naming strategy can be enforced in a totally automatic way. Here discipline must be imposed by means of clear and well stated rules such as those in the proposal quoted above, possibly with the help of an automatic tool which periodically scans the archive and flags deviations from the formal rules wherever possible.
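Such a scanning tool could be little more than a pattern check on file names. The rule used in the sketch below is purely hypothetical and does not reproduce the numbering system of the quoted proposal; it only shows the mechanism.

```python
import re

# Hypothetical naming rule, NOT the scheme of the quoted proposal:
# three digits, one lowercase letter, three digits, then an extension.
NAME_RULE = re.compile(r"\d{3}[a-z]\d{3}\.[a-z0-9]+")

def naming_violations(paths, rule=NAME_RULE):
    """Return the archive paths whose file name does not match the naming rule."""
    return [p for p in paths if not rule.fullmatch(p.rsplit("/", 1)[-1])]
```

A periodic job could run this check over the full file list and mail the resulting list to the proprietor of each offending subtree.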
A completely automatic system can, instead, be devised for the alignment of the various copies of the archive. Provided that each site is responsible for a part of the archive (i.e. generates data in a number of subtrees of the archive hierarchy), the files can be propagated to the other sites by means of suitable ftp-like programs controlled by some periodic procedure.
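The periodic procedure could start by working out which files have to be sent where. The following sketch assumes that each top-level subtree has exactly one responsible site and that modification times are comparable numbers; the site names and the data layout are hypothetical, and the actual transfer is left out.

```python
def plan_transfers(local_site, owners, sites, files, last_sync):
    """List the (path, destination site) copies needed since the last run.

    owners:    {top-level subtree: responsible site}
    sites:     all sites holding a copy of the archive
    files:     {archive path: modification time} as seen at the local site
    last_sync: time of the previous successful run
    Only files in subtrees owned by the local site are propagated.
    """
    transfers = []
    for path, mtime in sorted(files.items()):
        subtree = path.split("/", 1)[0]
        if owners.get(subtree) != local_site or mtime <= last_sync:
            continue
        for site in sites:
            if site != local_site:
                transfers.append((path, site))
    return transfers
```

Restricting propagation to the subtrees a site owns means that each file has a single master copy, so two sites never overwrite each other's changes.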
Here some points must be taken into account:
The file transfer mechanism will thus need to authenticate the calling party in order to allow or deny access. This points to http as the software tool best suited to the job.
The browsing system must be easy to use, must handle all the required data formats, must be easily available, and must support the various ``points of view'' of the archive whose rationale was discussed above.
NCSA Mosaic (introduced in the previous section) is the software tool which seems to meet most of the above requirements and, moreover, is available in the public domain. Here follow some of the most notable capabilities of Mosaic.
NCSA Mosaic needs a server on the other side of the connection to obtain the pieces of information. Such a server may be as simple as an anonymous ftp server, which most unix machines have active by default, or one of several well established protocol servers (gopher, WAIS, News), or a specific http server. The latter is particularly interesting in that it supports some special features of NCSA Mosaic (such as the form capability) and allows a great deal of flexibility in the structure of the accessed database.
Moreover, it enforces authentication of accesses and provides a flexible mechanism to allow or deny access to information according to the client location.
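Access control by client location amounts to testing the client address against a list of permitted networks. The sketch below uses Python's standard `ipaddress` module and is not the configuration syntax of any particular http server; the networks listed are documentation address ranges standing in for the participating sites.

```python
import ipaddress

# Hypothetical networks of the participating sites (documentation address ranges).
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def is_allowed(client_ip, networks=ALLOWED_NETWORKS):
    """Return True if the client address belongs to one of the permitted networks."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in networks)
```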
Figure 2 sketches the structure of the LBT archive as it could be seen through Mosaic. The networked hypertext nature of Mosaic allows easy linking of documents which are logically related but stored in different parts of the directory hierarchy, and even at different sites.
Note also that the hypertext documents associated with the actual data files, besides specifying one of many possible logical structures of the archive, will add useful pieces of information to the data which might be difficult to associate in other ways.

Figure 2: Archive Layout
Different logical structures (i.e. access paths) can be defined. For example, a document list strictly based on document creation date could be maintained, together with other lists based on different approaches and needs: by subassembly, by contractor, by author, and so on. The date-ordered list, as well as some other ``standard'' views of the archive, could easily be created and maintained by automatic procedures.
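An automatic procedure for the date-ordered list could simply regenerate a hypertext index from the document records. In this minimal sketch the record layout (ISO date, link target, title) and the file names are hypothetical.

```python
from html import escape

def date_index(docs):
    """Build an HTML list of documents, newest first.

    docs: iterable of (ISO date string, href, title) tuples; ISO dates sort
    chronologically when compared as plain strings.
    """
    items = [
        f'<li>{escape(day)}: <a href="{escape(href)}">{escape(title)}</a></li>'
        for day, href, title in sorted(docs, reverse=True)
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Lists by subassembly, contractor or author could be produced in the same way by sorting or grouping on a different field of the record.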
Here the Mosaic browser can be integrated with a WAIS search engine. This will make it possible to define search paths through the archive structure which are well integrated with the Mosaic based browsing system.
The presence of Mosaic related documents may be particularly useful for searching, in that files which do not contain text, such as drawings and other data files, can be retrieved by means of searches performed on their description files.
The limited search capabilities of current WAIS (only ``full text'' searches are allowed, with no boolean expressions) seem to be of little consequence for this particular application.
6. Conclusions
In this report we have discussed the problem arising from the huge amount
of information which is produced in the development of the LBT project
and which must be easily accessible now to all the people who cooperate
in the project and, in the future, to the operating staff, the maintainers,
the observers, etc.
We propose to start the development of an archiving system which will support, in both the development and the operating phases of the project, the search for and retrieval of the information needed to perform any LBT related job.
A number of concepts and corresponding software tools, mostly derived from the WWW project, have been indicated which can be profitably employed to build such a system, together with some procedures to be implemented for particular tasks.
A detailed description of the architecture and of the configuration of the tools will be the subject of a further report.
References
[1] Marc Andreessen. Getting started with NCSA Mosaic. Technical report, National Center for Supercomputing Applications, May 1993.
[2] Marc Andreessen. NCSA Mosaic technical summary. Technical Report 2.1, Software Development Group, National Center for Supercomputing Applications, Champaign, Illinois, May 1993.
[3] F. Anklesaria, M. McCahill, P. Lindner, D. Johnson, D. Torrey, and B. Alberti. The Internet gopher protocol. Technical report, University of Minnesota, March 1993.
[4] Tim Berners-Lee. Hypertext transfer protocol. Internet Draft, CERN, November 1993. Work in progress.
[5] Tim Berners-Lee and Daniel Connolly. Hypertext markup language. Internet Draft, CERN, June 1993. Work in progress.
[6] Brewster Kahle and Art Medlar. An information system for corporate users: wide area information servers. Technical Report TMC199, Thinking Machines Corp., April 1991.
[7] Craig Stanfill. Massively parallel information retrieval for wide area information servers. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Charlottesville, Virginia, October 1991.