From - Fri Apr 5 13:39:06 2002
Return-Path:
Received: from tthep2.phys.ttu.edu (tthep2.phys.ttu.edu [129.118.41.23])
    by castor.ts.infn.it (8.11.6/8.11.6) with SMTP id g343PuY29488
    for ; Thu, 4 Apr 2002 05:25:56 +0200 (MET DST)
Received: by tthep2.phys.ttu.edu; Wed, 3 Apr 2002 21:26:05 -0600
Date: Wed, 3 Apr 2002 21:26:05 -0600
From: ALANSILL@tthep2.phys.ttu.edu
To: Stefano.Belforte@ts.infn.it
CC: r.stdenis@physics.gla.ac.uk, badgett@fnal.gov, cranshaw@fnal.gov,
    ALANSILL@tthep2.phys.ttu.edu
Message-Id: <020403212605.20a0082d@tthep2.phys.ttu.edu>
Subject: Picking up on database export and offsite needs / grid use

Hi Stefano,

Picking up the thread from below, I'd like to update you on our current
thinking and plans, and to see whether these match what you, Rick, and
others who are doing offsite analysis are doing with respect to the Grid.

We've seen on several occasions that both experiments and inadvertent
use (abuse) can overload the current CDF offline "production" db server,
fcdfora1.fnal.gov (= cdfofprd). This is a Sun machine; plans call for
future replacement and/or augmentation with a multiprocessor Linux-based
machine.

Bottlenecks exist both in hardware performance (e.g. SCSI bus layout and
implementation) and in the number of simultaneous processes allowed;
ultimately the latter is controlled mostly by the number of Fermilab
licenses for simultaneous average connections, though some allowance is
made by Oracle for "spiking." The total number of Fermilab licenses for
Oracle now stands at 135. The offline machine process limit is 200, and
we have seen several cases of it being reached or exceeded, as evidenced
by refused connections, monitoring, and complaints from users.
Nonetheless, most of these cases seem to trace back to unusual and/or
abusive use: for example, an experiment by an offsite institution to
export the entire database contents to try an import into MySQL (not
recommended for a number of reasons right now, but I'll get to this),
timed-out processes, etc. Bill Badgett and the Oracle DBAs, Anil, etc.
have been trying to track down some of the worst of the non-disconnecting
processes and make some changes to the CDF calib db code to cut down on
this problem.

For the moment things are OK, i.e., good under routine conditions, and
people are almost always able to connect from offsite and onsite to get
their work done.

How this might change in the future:

1) The CAF is coming up. This will greatly increase the number of
   concurrently running jobs.

2) People are running an increasing number of analysis jobs from their
   desktop workstations, both remotely and on site at FNAL.

   --> 1 a) and 2 a): both sets of users above will be running on
       increasingly concentrated data sets.

3) The CDF grid projects and CDF SAM are starting up. SAM makes and
   will make heavy use of the offline database both to access usual
   CDF DB entries and to manage & store its own catalog information.

4) CDF is getting more data. (This may seem trivial, and of course is
   the kind of problem we would like to have at a considerably
   increased rate, with more luminosity from the accelerator than we
   are currently getting, but it ultimately will have an effect. Also,
   reprocessing of existing datasets with new versions of production
   will put an increased load on the database and will cause us to
   refine our procedures for storing multiple copies and versions of
   calibration constants, etc.)

What are we doing about this? So far just talking, but with plans toward
the future.

- We want to implement a more advanced (note: not "Oracle Advanced")
  version of replication: perhaps between the online and offline db
  servers, and definitely between the offline server and (new) copies
  of it.

- Some of these copies or duplicates of the offline database, including
  the data file catalog and/or whatever it grows to in the future, can
  be housed off-site, and probably should.

- Some of these copies can probably be achieved, or at least this is
  the suggestion from Oracle, by a technology called "Oracle Streams."
  This would require that we move to a newer version of the Oracle
  software than we are using. It is a lot easier to implement than the
  previous technique, "advanced replication," which everyone seems to
  hate, and it seems to be made to order for copying to a limited
  number of off-site and on-site locations.

- We probably will have to look at some kind of organized export to
  off-site freeware databases also, if only for license limitation
  reasons (we have a limited number of licenses to share between FNAL
  and off-site institutions). There are however some limitations in
  even the best of the currently available freeware solutions that
  make it impossible to cover everything that the CDF database does
  right now, even (as far as I know) in MySQL.

That's it for the moment. We need volunteers to work on some of the
exporting issues -- e.g. to work in a more coordinated way toward
solving some of the missing features in MySQL, provide some of the
missing features for offsite querying of the database and insertion
needed to support SAM, etc. -- let me know if you are interested in a
list, and I can provide it.

We all want to get to the point of analyzing CDF data more smoothly and
efficiently, both from off-site and on; I'm just trying to look ahead
and see what we can do to avoid a "crash" of the offline database and
its exporting system.

Let me know if I can do anything more to help,

Alan

---------------------------------------------------------------------------
> Date: Thu, 15 Mar 2001 08:57:08 -0600
> To: r.stdenis@physics.gla.ac.uk
> Cc: Stefano Belforte ,
>     Igor Gorelov , franco.semeria@bo.infn.it
> From: Jack Cranshaw
> Subject: Re: DB export to Italy
>
> Stefano,
>
> Have you tried just using the oracle client which you
> get distributed automatically with the offline software?
> If you've had problems, then please send us the numbers
> on performance or lack thereof. But I would actually push
> people to first try just using the database at Fermilab.
>
> One indicator that this may serve well is that Vladimir,
> who works with silicon, now does his work on a machine
> at Texas Tech because the PC there is faster than anything
> available at Fermilab, and although the network connection
> is poor, the database access is not noticeably different than
> at Fermilab.
>
> We're slowly getting a handle on the export, but I don't see
> a viable export of all the pieces of the database that you need
> for analysis for at least one month. Due to the chronically
> undermanned effort given to the database software, things
> move slowly. Offers of help are gladly appreciated.
>
> Cheers,
>
> Jack
>
> "Rick St. Denis" wrote:
>
> > Dear Stefano
> > I have not heard from Mark for some time and lost touch with operations
> > of the database as I have been teaching since February.
> > So the proposals and intentions for export were fine, but needed action
> > from proponents. Jack has had to put as first and only priority the second
> > commissioning run. Er. I mean Run II. So I am frankly not sure what
> > happened to export and could not chase it.
> > We can mumble lots about who has responsibility to provide what, which
> > committee wants what thing, but if you really want to get the full CDFDB,
> > then I think we need someone who is doing what Igor Gorelov of New Mexico
> > has done.
> > He has studied Oracle, having taken official courses, set up a
> > server on a Linux box at New Mexico and started replication to it. He is
> > now going to FNAL for 6 months, having realized how much he needs to know
> > to actually get a server working and how much he can learn from the
> > Fermilab Computing Division people, like Nelly Stanfield and others.
> > But even then, you have a whole new problem: the network
> > connectivity. The fact is that when we modify the schema -- and we do
> > this more than we would have liked -- we have to wipe out the replication
> > and refresh it all. As the database has grown, this becomes quite
> > demanding!
> > Also, when you talk about the full database, you probably don't want
> > every silicon raw calibration. But to pick and choose, you need Mark's
> > solution. If you go with the Oracle solution, you are looking at a lot of
> > work and resource usage. Kinda like buying a Jaguar.
> > I will try to find out therefore what Mark is up to, but suspect I will
> > not get a lot of happiness.
> > cheers
> > rick
> >
> > *****************************
> > Dr. Richard St. Denis, Dept. of Phys. & Astr., Glasgow University
> > Glasgow G12 8QQ; United Kingdom; UK Phone: [44] (141) 330 5887
> > UK Fax : [44] (141) 330 5881
> > ======================================
> > FermiLab PO Box 500; MS 318 Batavia, Illinois 60510 USA
> > FNAL Phone: [00] 1-630-840-2943 FNAL Fax: [00] 1-630-840-2968
> > Sidet: [00] 1-630-840-8630 FCC: [00] 1-630-840-3707
> >
> > On Wed, 14 Mar 2001, Stefano Belforte wrote:
> >
> > > Rick,
> > > given the kind of network connectivity we have among Italian sites,
> > > and between Italy and Fnal, it makes sense for us to look for a
> > > solution that exports the full CDF DB to just one place, and then
> > > processes running all over Italy could access this common location
> > > rather than having local copies.
> > > This would hopefully simplify our work over there.
> > > While this sort of naturally blends (I think) with a full blown
> > > Oracle replica, it may still be a work-saving solution also for
> > > the freeware export.
> > > Franco kindly offered to set this up on a central server in Bologna
> > > if/when a proper technical proposal is singled out.
> > > We very much need your opinion and input on this and definitely
> > > would like to talk about it sometime in the next couple of weeks
> > > when Franco will also be at Fermilab (I am at Fnal all month).
> > > We definitely can start by e-mail, just as you find more convenient.
> > >
> > > Looking forward to hearing from you
> > > Stefano
> > > --
> > > Stefano Belforte - I.N.F.N. tel : +39 040 375-6261 (fax: 375-6258)
> > > Area di Ricerca - Padriciano 99 e-mail: Stefano.Belforte@ts.infn.it
> > > 34012 TRIESTE TS - Italy Web : http://www.ts.infn.it/~belforte
> > > at Fermilab: CDF trailers 169-N tel: (630)840-8698
> > >