Dear all,
I spent some time investigating what could be done to
improve timing for 4/5 SVT operation.
Lacking smarter ideas, I looked at the feasibility of
inserting a "ghost buster" stage before the TF.
For simplicity I call this a "road buster" (RB) board.
I was immensely helped by Alberto Annovi and Bill Ashmanskas.
There are 2 steps to this:

1. what can we gain
2. how can we do it in real life


1. This has already been investigated independently by Alberto.
   I will use some plots of mine just for convenience, but
   when we compared apples with apples we got the same numbers.
   As most of you know there is room for substantially reducing
   the TF load by removing the roads with 4 hits that are
   a subset of a 5-hit road. There is then a more modest gain
   by removing 4-hit roads that differ only in the SuperStrip
   of the 5th (missing) SVX layer. I call the first category 4in5,
   the second 4in4 ghosts (or duplicates). There
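
To make the two categories concrete, here is a minimal sketch in Python. The
representation is my assumption, not the SVT data format: a road is a tuple of
SuperStrip ids (one per layer) plus a tuple of booleans saying which layers
actually have a hit. The function name ghost_kind is hypothetical.

```python
def ghost_kind(cand_ss, cand_hits, kept_ss, kept_hits):
    """Classify a candidate road against a road that has already been kept.

    Returns "4in5", "4in4", or None when the candidate is not a ghost of the
    kept road. All names and the data layout are illustrative assumptions.
    """
    if all(cand_hits):
        return None  # a road with all hits present is never busted here
    for c_ss, c_hit, k_ss, k_hit in zip(cand_ss, cand_hits, kept_ss, kept_hits):
        # Every layer where the candidate has a hit must also be hit in the
        # kept road, with the same SuperStrip.
        if c_hit and (not k_hit or c_ss != k_ss):
            return None
    if all(kept_hits):
        return "4in5"  # candidate is a subset of a road with all hits
    if tuple(kept_hits) == tuple(cand_hits):
        return "4in4"  # same hits, differs at most in the missing layer's SS
    return None
```

For example, a road with SS (10, 20, 30, 40, 51) and the last layer missing is
a 4in5 ghost of the full road (10, 20, 30, 40, 50), and a second road
(10, 20, 30, 40, 77) with the same missing layer is a 4in4 ghost of the first.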

2. It has been widely suggested to use a properly reprogrammed GB
   board to do this. The question was: will the needed logic (and
   above all the storage needed for the roads to be compared) fit?
   There are 3 scenarios in which a GB can be made to work as an RB.
   In all of these the AMS is set back to the mode of
   outputting all 5/5 roads before the 4/5 ones (this takes 700 nsec more).
   2.A: RB after HB, 4 boards
        The RB receives the hit-road packets, uses the hits to
        find the list of SS for each road (this can be done without
        a full-fledged 128K SSMAP RAM) and creates for each road
        a vector of 72 bits (12 bits x 6 layers). As roads arrive,
        the vector is compared with all stored ones; if there is no
        match, the road is sent out and stored, otherwise it is killed.
        All the logic fits in the GB Apex, each GB board handles
        3 wedges. I have described this in Verilog using a brute-force
        approach: 30 72-bit registers are instantiated
        to hold the "non busted" roads as they arrive. 30 is
        large enough that it should not add inefficiency. Then
        a brute-force comparison is made in each clock cycle
        between the 30 12-bit registers holding the SS value for
        each layer and the 12 bits of the incoming hit SS, and
        6 "match" bits are generated and later ANDed. This
        monster combinatorial logic apparently takes 14 nsec (according
        to Quartus). I assume that as hits arrive the SS is
        obtained on the fly without an SSmap. For SVX hits the
        present algorithm from hit to SS is as simple as
        "take 10 bits from the hit word", then add z(0-5)*1000.
        I am assuming this can be replaced by z*1024, i.e. take
        9 bits from the hit word and add 3 bits of z, left shifted
        by 9. Since SS are wide (at present) for 4/5, this should be OK.
        Probably this or a very similar scheme will also be OK with
        narrow patterns. At worst, the 3 z bits and e.g. the 5 MSB
        of the hit word can be passed through an 8x8 LUT using one
        of the many unused Apex embedded RAM blocks. Same for
        Layer 5 (XFT), here the incoming data is already 12-bit,
        it can simply be used "as it is" without mapping to the
        local phi (as present SS map does) (in the Verilog I
        treat layer 5 the same way as SVX layers).
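
The on-the-fly SS computation described above can be sketched as follows. This
is only an illustration in Python, not the Verilog: the function names and the
shift parameter are hypothetical, and which 9 bits of the hit word are taken
is an assumption. Note that a left shift by 9 corresponds to z*512; a factor
of 1024 would need 10 low bits and 13 bits in total, so the sketch follows the
9+3 bit split that fits a 12-bit SS.

```python
SS_LOW_BITS = 9   # bits taken from the hit word (from the text)
SS_WIDTH = 12     # SS width per layer (from the text)

def svx_superstrip(hit_word, z, shift=0):
    """Build a 12-bit SS from an SVX hit word and a z index.

    shift selects where the 9 bits sit inside the hit word; the real bit
    position is not given in the text, so this is a placeholder.
    """
    if not 0 <= z <= 5:
        raise ValueError("z is expected in the range 0-5")
    low = (hit_word >> shift) & ((1 << SS_LOW_BITS) - 1)
    return (z << SS_LOW_BITS) | low

def xft_superstrip(xft_word):
    """Layer 5 (XFT): use the 12-bit word as it is, with no local-phi mapping."""
    return xft_word & ((1 << SS_WIDTH) - 1)
```

With z limited to 0-5 the largest value is 5*512 + 511 = 3071, which fits in
12 bits.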
        As hits arrive, they should also be sent on to the TF, so no
        extra time is needed once the road is processed and declared
        good/no-good. A road to be busted can then only be flagged, e.g.
        by overwriting its wedge number with 15; the TF will then
        have to be taught to deal with this properly.
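
As a behavioural model of scenario 2.A, here is a rough Python sketch of the
store-and-compare logic. It is not the Verilog and makes several assumptions:
the class name RoadBuster is hypothetical, a layer without a hit is
represented by None and treated as matching anything on the incoming road
(the text does not say how the missing layer of a 4/5 road enters the 72-bit
comparison), and when the 30 slots are full new roads are simply passed
through without being stored.

```python
N_LAYERS = 6        # 12 bits x 6 layers = 72-bit vector
MAX_STORED = 30     # registers holding the "non busted" roads
BUSTED_WEDGE = 15   # wedge number used to flag a busted road

class RoadBuster:
    def __init__(self, capacity=MAX_STORED):
        self.capacity = capacity
        self.stored = []  # one tuple of 6 SS values per kept road

    def reset(self):
        """Clear the stored roads (assumed to happen at each new event)."""
        self.stored.clear()

    def process(self, ss, wedge):
        """Return the wedge number to forward: unchanged, or 15 if busted."""
        if len(ss) != N_LAYERS:
            raise ValueError("expected one SS per layer")
        key = tuple(None if s is None else s & 0xFFF for s in ss)
        for kept in self.stored:
            # One match bit per layer, then AND them all together.
            match_bits = [new is None or old == new
                          for old, new in zip(kept, key)]
            if all(match_bits):
                return BUSTED_WEDGE
        if len(self.stored) < self.capacity:
            self.stored.append(key)
        return wedge
```

Because the 5/5 roads come out of the AMS first, a 4/5 road arriving later
finds its 5/5 parent already stored and is flagged; the hits themselves are
forwarded regardless, only the wedge number changes.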

   2.B: RB before HB, 4 boards