LIMES: Large Infrastructure in Mathematics, Enhanced Services

MathDoc Cell : drafts & proposals

Update procedures : present and future

MathDoc Cell (October 26, 2000, last revision November 6, 2000)

 

  1. Updates : current situation
  2. Updates : new mechanism
Appendix
  A. ZM servers : categories and their main features
  B. ZM servers : notes and comments
  C. Time-table of recent updates
  D. Questions and notes
1. Updates : current situation

The w3-server database (w3-db) is derived from the main database (Compindas system at FIZ Karlsruhe) through the PFS system (a proprietary system, trademark of the company Kramer & Hofmann, Saarbrücken). The w3-db is made of subsets (PFS-db's), each small enough for its PFS version to fit on a CD-ROM (600 MB is the maximum capacity of such a medium). There are currently five PFS-db's: the first four are stable and contain definitive entries, while the last one changes regularly through the addition of recent entries (either incomplete, with bibliographical data only, or definitive, with the review or author's summary) and the deletion of superseded entries.

Once these PFS-db's have been produced in the Berlin office, they are installed on each server by simply copying their data part and mapping the PFS indexes to EDBM indexes. The software tool for this mapping (which runs almost instantly and needs no special memory resources) was created by C. Goutorbe (MathDoc Cell) in 1997 and is used by Zentralblatt-MATH under a contract between MathDoc Cell and FIZ.

The last PFS-db is transferred by mailing the CD-ROM; the conversion to the EDBM index format is done on each server.

In 1998, C. Goutorbe (MathDoc Cell) wrote a piece of software to perform incremental updates. Given three PFS databases:

PFS-db1 : a current PFS data-set
PFS-db2 : a PFS data-set of new complete entries (with their mapping to existing bibliographical entries, if any)
PFS-db3 : a PFS data-set of preliminary entries


this tool merges them into a new EDBM database (EDBM-db).

The EDBM-db consists of the existing definitive entries of PFS-db1. The new complete entries of PFS-db2 and the preliminary entries of PFS-db3 are added, while a preliminary entry of PFS-db1 is replaced by its corresponding complete entry in PFS-db2, if one exists. Index creation is not resource-consuming, and this upgrade scheme only requires transferring the (small) databases PFS-db2 and PFS-db3.

This incremental procedure has been successfully tested, but has never been put into production.
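The merge rules of the 1998 incremental tool can be sketched as follows. This is a minimal Python illustration, not the MathDoc code: the `Entry` record, its `an` key and the `replaces` field (standing for the mapping of a complete entry to its preliminary entry) are hypothetical, and keeping the preliminary entries of PFS-db1 that have no complete version yet is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Entry:
    an: str                         # accession number used as record key (hypothetical)
    complete: bool                  # True: definitive entry; False: preliminary (bibliographic only)
    replaces: Optional[str] = None  # AN of the preliminary entry this complete entry supersedes


def incremental_merge(db1: Iterable[Entry], db2: Iterable[Entry],
                      db3: Iterable[Entry]) -> List[Entry]:
    """Merge the current set (db1), new complete entries (db2) and new
    preliminary entries (db3) into the contents of a new EDBM-db, sorted by AN."""
    db2 = list(db2)
    superseded = {e.replaces for e in db2 if e.replaces is not None}
    merged = {}
    # Keep db1 entries, dropping preliminary ones superseded by a complete entry of db2.
    # Assumption: preliminary entries of db1 without a complete version are kept.
    for e in db1:
        if not e.complete and e.an in superseded:
            continue
        merged[e.an] = e
    # New complete entries override anything with the same AN.
    for e in db2:
        merged[e.an] = e
    # New preliminary entries never overwrite an entry already present.
    for e in db3:
        merged.setdefault(e.an, e)
    return sorted(merged.values(), key=lambda e: e.an)


db1 = [Entry("0001", True), Entry("0002", False)]
db2 = [Entry("0003", True, replaces="0002")]
db3 = [Entry("0004", False)]
print([e.an for e in incremental_merge(db1, db2, db3)])  # ['0001', '0003', '0004']
```

Only PFS-db2 and PFS-db3 enter as new data; PFS-db1 is already on the server, which is why the scheme needs so little transfer.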

2. Updates : new mechanism

The production of the EDBM database w3-db is currently being changed (the new system should be operational from April 2001). Ultimately, ASCII tagged files will be produced from the core database of Zentralblatt-MATH, and the EDBM-db's (data and index files) will be derived directly from them. This process relies on a new component of the EDBM suite, an indexer being developed by the MathDoc Cell, which will be available very soon. The intellectual property and usage rights attached to this software will be set according to the LIMES consortium agreements and the rules of "free software": free use for academic institutions and individuals (bundling with non-free or commercial products is not allowed), with due mention of the MathDoc Cell copyright.
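The EDBM index format is not described in this note. As a rough illustration only of what an indexer derives from tagged records, here is a toy inverted index in Python; the record layout and the field tags `ti` (title) and `au` (author) are hypothetical, and this is not the EDBM file format.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def build_index(records: Dict[str, Dict[str, str]]) -> Dict[Tuple[str, str], List[str]]:
    """Map each (field tag, lower-cased word) to the sorted ANs containing it."""
    postings = defaultdict(set)
    for an, fields in records.items():
        for tag, text in fields.items():
            for term in re.findall(r"\w+", text.lower()):
                postings[(tag, term)].add(an)
    return {key: sorted(ans) for key, ans in postings.items()}


# Two hypothetical tagged records, keyed by AN.
records = {
    "0001": {"ti": "Prime numbers", "au": "Euler"},
    "0002": {"ti": "Numbers and graphs", "au": "Erdos"},
}
index = build_index(records)
print(index[("ti", "numbers")])  # ['0001', '0002']
```

A search on a field then reduces to looking up posting lists and intersecting them, which is why data and index files can be regenerated together from the ASCII files.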

The EDBM/w3 component (search and retrieval) will not be affected by this new production system for the EDBM-db's. (Changes to the database structure, such as splitting some fields or introducing better recognition tools for authors or serials, will modify the EDBM/w3 ZM applications, but these modifications are unrelated to the new update process.)

Initialization will consist of producing N files (F1, ..., FN) and the corresponding EDBM-db's: the first N-1 EDBM-db's will be stable and cover the years 1931-2000 of Zentralblatt-MATH; the last one (FN) will contain the most recent definitive entries and the incomplete (bibliographical) entries. These EDBM-db's will be derived from the ASCII files through the EDBM indexer and delivered on DVD (two DVDs should be needed) for initialization.

The regular monthly update mechanism will consist of transferring two ASCII tagged files over the network:

f1 : new and modified entries,
f2 : identifiers (AN) of entries to be deleted.


With monthly updates and compression, the size of this file set is expected to be a few megabytes. On each server, a software tool based essentially on the EDBM indexer will update the EDBM file set FN according to f1 and f2 (see the notes in appendix D).
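The intended effect of f1 and f2 on FN can be stated as a small in-memory Python sketch. The record representation (an AN paired with the entry text) and the rule that a deletion in f2 wins over an f1 entry with the same AN are assumptions; the actual tool, based on the EDBM indexer, works by merging sorted files (see appendix D).

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (AN, tagged text of the entry): hypothetical representation


def apply_monthly_update(fn: Iterable[Record], f1: Iterable[Record],
                         f2: Iterable[str]) -> List[Record]:
    """Return the new content of FN: f1 adds or replaces entries, f2 lists ANs to delete."""
    current: Dict[str, str] = dict(fn)
    # f1: new entries are inserted, modified entries overwrite the old version.
    current.update(f1)
    # f2: deletions are applied last (assumption: a deletion wins over f1).
    for an in f2:
        current.pop(an, None)
    return sorted(current.items())


fn = [("0001", "old A"), ("0002", "B")]
f1 = [("0001", "new A"), ("0003", "C")]
f2 = ["0002"]
print(apply_monthly_update(fn, f1, f2))  # [('0001', 'new A'), ('0003', 'C')]
```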

At some appropriate time, the file FN will be declared stable and a file FN+1 introduced. A new DVD (possibly with updated F1, ..., FN-1) will then be produced.

As ftp is no longer considered a secure protocol, another one must be chosen (see the notes in appendix D).

This system will apply to the servers of the Zentralblatt-MATH international access net (ZM-IAN).

For regional/national servers, a similar system could be used, but the distribution of the files (updated EDBM files or ASCII update files) must be studied carefully: in particular, the human resources required must be estimated so that the announced delivery intervals (1, 2 or 6 months) can be strictly honored (otherwise users will be frustrated).

For local servers, updates every 6 months could be maintained. Here again, the core problem is distribution.

Appendix
A. ZM servers : categories and their main features
  1. Zentralblatt-MATH international access net (ZM-IAN)
    1. Bibliographical and review data : monthly updates
    2. Subscribers IP list : monthly update (or less)
    3. Scanned images for reviews not available in full text
    4. Multilinguality for user interface
    5. Usage statistics report delivered to Berlin
    6. Links to document delivery services
  2. Regional and national mirrors
    1. Bibliographical and review data : should be monthly
    2. Distribution procedures must be set up.
    3. Subscriber IP list managed locally
    4. Multilinguality (local languages): special display needs for specific languages can be expected in well-defined user communities (e.g. Greek). There is interest in local languages (Basque, Catalan, ...).
    5. No scanned images for old reviews
    6. Additional features of limited scope, plus a link to the national union catalogue of journals (locations and holdings), as in France.
  3. Local servers
    1. Bibliographical and review data : twice a year
    2. Subscriber IP list managed locally
    3. Multilinguality (local languages)
    4. No scanned images for old reviews
    5. Additional features of limited scope (links to local catalogues, local interfaces)
B. ZM servers : notes and comments
  1. Zentralblatt-MATH international access net

    The objective is to have synchronized servers with the same interface, so that registered users get the same service from any of them. Hence ZM-IAN should be a complete graph with the highest possible symmetry: every permutation of its nodes should be a symmetry of the network (this is not the case at present).

    The current list (taken from the Berlin server) is: Athens, Berkeley, Cornell, Berlin, Lecce, Mexico, New-York, Rehovot, Rio, Santiago, Strasbourg.

    At present, every EMIS server offers ZM users a link to the Berlin server only. This should be changed so that users can choose their server among the international servers (useful when some node of the ZM-IAN is out of service).

    1. Bibliographical and review data updates
      The monthly schedule is more or less kept (see appendix C). On 2000/10/15, the last volume available on the different servers (Rehovot was omitted from this survey, apologies!) was:
      • 943 : Berlin ;
      • 941 : Strasbourg, New-York, Lecce, Mexico, Santiago, Berkeley, Rio ;
      • 937 : Cornell ;
      • unavailable : Athens (mythos.hms.gr doesn't appear in the DNS).
      A delay of 2 volumes (one month of production) is unavoidable because of transmission time; a longer delay probably indicates a problem.

      There is no international server in the Asian and Oceanic areas.

    2. Subscribers IP list
      Monthly update. See appendix C for a report on recent updates.
    3. Scanned images for reviews
      As the text database contains the full text of reviews only from vol. 531 (1985) onwards, it was proposed to offer the older reviews as scanned images. These images are only accessible on (and through) the Berlin server: for the time being, only vols. 1-217 (1931-1971) are available (on 2000/10/15, only the years 1946-1956 were actually accessible). There is as yet no estimate of the disk storage needed to install them on a generic server.
    4. Multilinguality for user interface
      This concerns English, French, German (all available), Italian, Spanish (to be integrated) and Portuguese (planned). Other languages either create difficulties for users (special fonts and browser customization needed, e.g. Greek, Russian) or are of limited scope (Catalan, Galician, Danish). The set of languages on ZM-IAN servers must be kept to the main languages.
      On 2000/10/15, all servers had an English entry page, except Santiago (Spanish). All servers must use the same language on their entry page (when accessed from another ZM-IAN server).
    5. Usage statistics report
      Only available for the Berlin server
    6. Links to document delivery services
      Services present: Göttingen, Bielefeld, Hannover; link to the Catalogo Nazionale Periodici Scienze Matematiche (Lecce).
  2. Regional mirrors
    Examples : France (network of 3 servers), Poland
    1. Bibliographical and review data : should be monthly
    2. Subscriber IP list managed locally
    3. Multilinguality (local languages)
    4. No scanned images for old reviews
    5. Additional features of limited scope (links to specific catalogues, localized document delivery services)
  3. Local servers
    1. Bibliographical and review data : twice a year
    2. Subscriber IP list managed locally
    3. Multilinguality (local languages)
    4. No scanned images for old reviews
    5. Additional features of limited scope (links to local catalogues, local interfaces)
C. Time-table of recent updates
CD-ROM updates for local servers (in principle every 6 months)
(distribution done by Springer)
Date        Period              Volumes
1996/10                         701-824 & 826-837
1997/09/02  1996/01 - 1997/06   826-862
1998/01/20  1996/01 - 1997/12   826-874
1998/07/31  1996/01 - 1998/06   826-887
1999/03/15  1996/01 - 1998/12   826-899
1999/09/13  1996/01 - 1999/06   826-912
2000/04/27  1996/01 - 1999/12   826-924
2000/09/12  2000/01 - 2000/06   926-937
Data updates on Strasbourg ZM-IAN node
Date        Type
1998/01/31
1998/03/31
1998/06/20
1998/07/01  cd:  --> 887
1998/09     ftp: 888-891
1998/11/02  ftp: 892-895
1998/11/20  ftp: 896-897
1999/01/04  cd:  --> 900
1999/02/26  ftp: 901-902
1999/03/17  ftp: 903-906
1999/04/20  ftp: 907-908
1999/05/20  ftp: 909-910
1999/07/16  cd:  --> 912
1999/08/17  cd:  --> 914
1999/09/06  cd:  --> 916
1999/10/08  cd:  --> 918
1999/10/15  cd:  --> 920
1999/11/12  cd:  --> 922
2000/01/25  cd:  --> 924
2000/03/06  cd:  --> 929
2000/04/26  cd:  --> 931
2000/05/23  cd:  --> 933
2000/06/26  cd:  --> 935
2000/07/13  cd:  --> 937
2000/09/08  cd:  --> 941
2000/10/16  cd:  --> 943
IP list updates (on-line subscribers)
Updates on Strasbourg ZM-IAN node
Date # IP
1999/03/01 678
1999/03/24 688
1999/07/16 866
1999/10/29 1090
1999/12/14 1227
2000/01/25 1251
2000/02/08 1255
2000/03/07 1311
2000/04/12 1324
2000/06/26 1420
2000/09/08 1480
2000/10/16 1496
D. Questions and notes

Do the entries in files FN, f1-3 need to be sorted in some special way? (M. Jost, 2000/11/01)
Since the updating process works by merging, f1 has to be sorted the same way as the current database. f2 may or may not be sorted (by AN); this is only an efficiency issue.

Depending on the exact details of the extraction process, the files may be sorted once on the main server, or at each site just before the actual update. (I am in favour of sorting once on the main server.)

In the future, to be consistent with the general Zentralblack box interface specifications, there should be a tool that generates these files in some XML-like format (to be specified as part of the central system interface specifications). These files would then have to be converted into a more compact format (used by the local updating process), so as to keep the transfer size reasonable. (This may be unnecessary if we manage to have high-frequency updates.)
(C. Goutorbe, 2000/11/06)
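The merge mentioned in this answer explains the sorting requirement: the current database and f1 are read in a single sequential pass, which only works if both follow the same order, whereas the ANs of f2 can be held in a set and checked in any order. A minimal Python sketch, assuming that the database order is by AN and that records are (AN, text) pairs (both hypothetical):

```python
from typing import Iterable, Iterator, Set, Tuple

Record = Tuple[str, str]  # (AN, record text): hypothetical representation


def merge_update(current: Iterable[Record], f1: Iterable[Record],
                 deleted: Set[str]) -> Iterator[Record]:
    """Single sequential pass over the current database and f1, both sorted by AN.

    A record of f1 replaces the current record with the same AN; ANs listed in
    `deleted` (read from f2 into a set, so f2 itself need not be sorted) are dropped.
    """
    cur_it, upd_it = iter(current), iter(f1)
    cur, upd = next(cur_it, None), next(upd_it, None)
    while cur is not None or upd is not None:
        if upd is None or (cur is not None and cur[0] < upd[0]):
            rec, cur = cur, next(cur_it, None)       # only the current database has this AN
        else:
            if cur is not None and cur[0] == upd[0]:
                cur = next(cur_it, None)             # f1 supersedes the current version
            rec, upd = upd, next(upd_it, None)
        if rec[0] not in deleted:
            yield rec


current = [("a", "1"), ("b", "2"), ("d", "4")]
f1 = [("b", "2'"), ("c", "3")]
print(list(merge_update(current, f1, {"d"})))  # [('a', '1'), ('b', "2'"), ('c', '3')]
```

Each input is read once, so the cost is linear in the size of the database plus f1; sorting f2 would only matter if it were too large to keep in memory and had to be merged as well.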

Could you please explain why "ftp is no more considered as a secure protocol"? (M. Jost, 2000/11/01)

The main issue is that the authentication data (e.g. the password) travel over the network without encryption: anybody can sniff them. In this sense, other protocols (e.g. telnet) are insecure as well. Such protocols must be replaced by tools using encryption: ssh (www.ayahuasca.net/ssh/ssh-faq.html) is currently the most widely used, both for direct connections and for tunnelling insecure applications. Servers are available for common Unix systems (e.g. Linux), and clients for most operating systems (Windows, Mac OS, Unix).

If the first reason were not sufficient, one could mention the poor state of ftp server software, as shown by the frequency of CRC notes.

A lot of information can be found at www.math.jussieu.fr/informatique/acces.html
(L. Guillopé, 2000/11/06)
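As an illustration of replacing ftp with an ssh-based tool, a server could fetch the monthly f1 and f2 files with scp, which copies files over an encrypted ssh connection. The following Python sketch only builds the command line; the account, host name and paths are hypothetical, and key-based authentication is assumed so that the transfer can run unattended.

```python
import shlex
from typing import List


def scp_fetch_command(user: str, host: str, remote_path: str, local_dir: str) -> List[str]:
    """Build an scp command that fetches an update file over ssh."""
    return [
        "scp",
        "-B",  # batch mode: never prompt for passwords or passphrases
        "-C",  # enable compression on the ssh connection
        f"{user}@{host}:{remote_path}",
        local_dir,
    ]


# Hypothetical account, host and paths.
cmd = scp_fetch_command("zmupdate", "master.example.org", "/updates/f1.gz", "/var/zm/incoming/")
print(shlex.join(cmd))
# It could then be run with subprocess.run(cmd, check=True).
```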