Large Infrastructure in Mathematics Enhanced Services
Cellule MathDoc : drafts & proposals
Update procedures : present and future
MathDoc Cell (October 26, 2000, last revision November 6, 2000)
The w3-server database (w3-db) is derived from the main database (Compindas system at FIZ Karlsruhe) through the PFS system (a proprietary system, trademark of the company Kramer & Hofmann, Saarbrücken). The w3-db is made of subsets (PFS-db's), each small enough for its PFS version to fit on a CD-ROM (600 MB is the upper limit for such a medium). There are now five PFS-db's : the first four are stable and contain definitive entries, while the last one changes regularly, through the addition of recent entries (incomplete, with bibliographical data only, or definitive, with the review or author's summary) and the deletion of superseded entries.
Once these PFS-db's have been produced in the Berlin office, they are installed on each server by a simple copy of their data part and a mapping of the PFS indexes to EDBM indexes. The software tool needed for this mapping operation (which is almost immediate and requires no special memory resources) was created by C. Goutorbe (MathDoc Cell) in 1997 and is used by Zentralblatt-MATH under a contract between MathDoc Cell and FIZ.
The last PFS-db is transferred by sending the CD-ROM by post : the transformation to the EDBM-index format is done on each server.
In 1998, C. Goutorbe (MathDoc Cell) wrote a piece of software to perform incremental updates. It takes three PFS databases :
| PFS-db1 | : | a current PFS data-set |
| PFS-db2 | : | a PFS data-set of new complete entries (with their mapping to existing bibliographical entries, if any) |
| PFS-db3 | : | a PFS data-set of preliminary entries |
The resulting EDBM-db consists of the existing definitive entries of PFS-db1, to which the new complete entries of PFS-db2 and the preliminary entries of PFS-db3 are added; a preliminary entry of PFS-db1 is replaced by its corresponding complete entry in PFS-db2, if one exists. The creation of indexes is not resource consuming, and this upgrading system requires only the transfer of the (small) databases PFS-db2 and PFS-db3.
This incremental procedure has been successfully tested, but has never entered production.
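The merge rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the actual tool : each data-set is modelled as a dictionary keyed by an entry identifier, the `status` and `replaces` fields are hypothetical names, and preliminary entries of the current data-set that have no complete counterpart are assumed to be kept.

```python
def incremental_update(db1, db2, db3):
    """Merge a current data-set (db1) with new complete entries (db2)
    and new preliminary entries (db3).

    Each argument maps an identifier to an entry (a dict).
    Hypothetical fields:
      "status"   : "definitive" or "preliminary"
      "replaces" : identifier of the preliminary entry that a complete
                   entry of db2 supersedes, if any
    """
    # Start from the definitive entries of the current data-set.
    merged = {an: e for an, e in db1.items() if e["status"] == "definitive"}

    # Preliminary entries of db1 that now have a complete version.
    superseded = {e["replaces"] for e in db2.values() if e.get("replaces")}

    # Assumption: preliminary entries not yet completed are kept.
    for an, e in db1.items():
        if e["status"] == "preliminary" and an not in superseded:
            merged[an] = e

    # Add the new preliminary entries, then the new complete entries.
    merged.update(db3)
    merged.update(db2)
    return merged
```

The function returns a new dictionary and leaves its three inputs untouched, so a failed update can simply be discarded.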
The production of the EDBM database w3-db is currently being changed (the operating period will begin in April 2001). Ultimately, ASCII tagged files will be produced from the core database of Zentralblatt-MATH, and EDBM-db's (data and index files) will be derived directly from them. This process is based on a new component of the EDBM suite, an indexer which is being developed by the MathDoc Cell and will be available very soon. Intellectual property and usage rights attached to this software will be set according to the Limes consortium agreements and the rules of ''free software'' : free usage for academic institutions and individuals (bundling with non-free or commercial products is not allowed), with due mention of the MathDoc Cell copyright.
The EDBM/w3 component (search and retrieval) will be unaffected by this new production system for the EDBM-db's. Changes to the database structure (such as the splitting of some fields or the introduction of better author or serial recognition tools) will modify the EDBM/w3 ZM applications, but these modifications are unrelated to the new updating process.
The initialization will consist of producing N files (F1,...,FN) and the corresponding EDBM-db's : the first N-1 EDBM-db's will be stable and correspond to the years 1931-2000 of Zentralblatt-MATH; the last one (FN) will include the latest definitive entries and the incomplete (bibliographical) entries. These EDBM-db's will be derived from ASCII files through the EDBM indexer and delivered on DVD (2 DVDs should be necessary) for initialization.
The regular monthly upgrade mechanism will consist of transferring two ASCII tagged files over the network :
| f1 : | new and modified entries, |
| f2 : | identifiers (AN) of entries to be deleted. |
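Applying such a pair of files on a server could look like the sketch below. The tagged format is not specified in this document, so the layout used here is purely an assumption : one `TAG: value` field per line, a blank line between records, an `AN` tag carrying the identifier, and one identifier per line in the deletion file. The function names are hypothetical as well.

```python
def parse_tagged(text):
    """Parse records in an ASSUMED tagged layout:
    'TAG: value' lines, records separated by a blank line.
    Returns a dict mapping the AN identifier to the record's fields."""
    records = {}
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if not line.strip():
                continue  # tolerate stray blank lines
            tag, _, value = line.partition(":")
            fields[tag.strip()] = value.strip()
        if "AN" in fields:
            records[fields["AN"]] = fields
    return records


def apply_update(db, f1_text, f2_text):
    """Return a copy of db with the entries listed in f2 removed and
    the new or modified entries of f1 added or overwritten."""
    updated = dict(db)
    # f2: identifiers (AN) of entries to be deleted, one per line.
    for an in f2_text.split():
        updated.pop(an, None)
    # f1: new and modified entries replace any entry with the same AN.
    updated.update(parse_tagged(f1_text))
    return updated
```

Deletions are applied before additions, so an identifier present in both files ends up in the database; that ordering is a design choice of this sketch, not something the procedure prescribes.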
At some appropriate time, the file FN will be declared stable and a file FN+1 introduced. A new DVD (possibly with updated F1,...,FN-1) will be produced.
As ftp is no longer considered a secure protocol, a new one must be chosen (see note below).
This system will apply to the servers of the Zentralblatt-MATH international access net (ZM-IAN).
For regional/national servers, a similar system could be used, but the distribution system for the files (updated EDBM files or updating ASCII files) must be carefully studied : in particular, staff resources must be estimated so that the proposed delivery times (1, 2 or 6 months) are strictly honored (otherwise users would be frustrated).
For local servers, updates every 6 months could be maintained. Here again, the core problem is that of distribution.
The objective is to have synchronized servers with the same interface, so that registered users can use any of them and get the same service. Hence, ZM-IAN would be a complete graph, with the highest symmetry, that of the full permutation group of its node set (this is not the case nowadays).
The current list (taken from the Berlin server) is : Athens, Berkeley, Cornell, Berlin, Lecce, Mexico, New-York, Rehovot, Rio, Santiago, Strasbourg.
Any EMIS server currently offers ZM users a link to the Berlin server alone : this organisation on EMIS should be changed so that users can choose their server among the international servers (useful when some node of the ZM-IAN is out of service).
An international server is missing in the Asian and Oceanic areas.
| Date | Period | Volumes |
|---|---|---|
| 1996/10 | | 701-824 |
| 1997/09/02 | 1996/01 - 1997/06 | 826-862 |
| 1998/01/20 | 1996/01 - 1997/12 | 826-874 |
| 1998/07/31 | 1996/01 - 1998/06 | 826-887 |
| 1999/03/15 | 1996/01 - 1998/12 | 826-899 |
| 1999/09/13 | 1996/01 - 1999/06 | 826-912 |
| 2000/04/27 | 1996/01 - 1999/12 | 826-924 |
| 2000/09/12 | 2000/01 - 2000/06 | 926-937 |
| Date | Type |
|---|---|
| 1998/01/31 | |
| 1998/03/31 | |
| 1998/06/20 | |
| 1998/07/01 | cd: --> 887 |
| 1998/09 | ftp: 888-891 |
| 1998/11/02 | ftp: 892-895 |
| 1998/11/20 | ftp: 896-897 |
| 1999/01/04 | cd : --> 900 |
| 1999/02/26 | ftp : 901-902 |
| 1999/03/17 | ftp : 903-906 |
| 1999/04/20 | ftp : 907-908 |
| 1999/05/20 | ftp : 909-910 |
| 1999/07/16 | cd : --> 912 |
| 1999/08/17 | cd : --> 914 |
| 1999/09/06 | cd : --> 916 |
| 1999/10/08 | cd : --> 918 |
| 1999/10/15 | cd : --> 920 |
| 1999/11/12 | cd : --> 922 |
| 2000/01/25 | cd : --> 924 |
| 2000/03/06 | cd : --> 929 |
| 2000/04/26 | cd : --> 931 |
| 2000/05/23 | cd : --> 933 |
| 2000/06/26 | cd : --> 935 |
| 2000/07/13 | cd : --> 937 |
| 2000/09/08 | cd : --> 941 |
| 2000/10/16 | cd : --> 943 |
| Date | # IP |
|---|---|
| 1999/03/01 | 678 |
| 1999/03/24 | 688 |
| 1999/07/16 | 866 |
| 1999/10/29 | 1090 |
| 1999/12/14 | 1227 |
| 2000/01/25 | 1251 |
| 2000/02/08 | 1255 |
| 2000/03/07 | 1311 |
| 2000/04/12 | 1324 |
| 2000/06/26 | 1420 |
| 2000/09/08 | 1480 |
| 2000/10/16 | 1496 |
Depending on the exact details of the extraction process, they may be sorted once on the main server, or at each site just before actual updating. (I am in favour of sorting once on the main server).
In the future, to be consistent with the general Zentralblack box interface specifications, there should be a tool that generates these files in some XML-like format (to be specified as part of the central system interface specifications). These files would then have to be converted into a more compact format (used by the local updating process), so as to keep the transfer size reasonable. (This may be unnecessary if we manage to have high-frequency updates).
(C. Goutorbe, 2000/11/06)
Could you please explain why "ftp is no more considered as a secure protocol"? (M. Jost, 2000/11/01)
The main reason is that the identification (e.g. the password) circulates on the net without encryption : anybody can sniff it. In this sense, other protocols (e.g. telnet) are insecure as well. Such protocols must be replaced by tools using encryption : ssh (www.ayahuasca.net/ssh/ssh-faq.html) is currently the most widely used, for direct connections or for tunnelling insecure applications. Servers are available for common Unix systems (e.g. Linux), and clients for most operating systems (Windows, MacOS, Unix).
If the first reason were not sufficient, one could also mention the poor state of ftp server software, as shown by the frequency of CRC notes.
A lot of information can be found at www.math.jussieu.fr/informatique/acces.html
(L. Guillopé, 2000/11/06)
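As an illustration of the encrypted transfer suggested in the note above, the update files could be pushed to a server with `scp`, the file-copy client that ships with ssh. The sketch below only builds and runs the command line; the function names, and any user, host, and directory passed to them, are hypothetical, and it assumes an `scp` client is installed and that key-based authentication is already configured.

```python
import subprocess


def scp_command(files, user, host, dest_dir):
    """Build the argument list for copying files to a remote
    directory over ssh (all argument values are illustrative)."""
    if not files:
        raise ValueError("no files to transfer")
    # -q suppresses the progress meter; the last argument is the target.
    return ["scp", "-q", *files, f"{user}@{host}:{dest_dir}"]


def push_update(files, user, host, dest_dir):
    """Run scp; raises subprocess.CalledProcessError if the copy fails."""
    subprocess.run(scp_command(files, user, host, dest_dir), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with unusual file names.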