Class HashStore


  • public class HashStore
    extends java.lang.Object
    HashStore is a sorted, basic-block aware, store for Instruction "n-grams" to help quickly match similar sequences of Instructions between two functions. The Instructions comprising a single n-gram are hashed for quick lookup by the main matching algorithm (HashedFunctionAddressCorrelation). Hash diversity is important to minimize collisions, even though the number of hashes calculated for a single function pair match is small. Hashes are built and sorted respectively using the calcHashes() and insertHashes() methods. The main sort is on the number of collisions for a hash (indicating that there are duplicate or near duplicate instruction sequences), the hashes with fewer (or no) duplicates come first. The secondary sort is on "n", the number of Instructions in the n-gram, which effectively describes the significance of the match, or how unlikely the match is to occur at random. The main matching algorithm effectively creates a HashSort for both functions, and then in a loop calls hash = getFirstEntry() on one side to get the most significant possible match getEntry(has) to see if there is a matching n-gram on the other side If there is a match it is declared to the sort with the matchHash() call, allowing overlapping n-grams to be removed and deconflicting information to be updated. If there is no match, hashes can be removed with the removeHash() method to allow new hashes to move to the top of the sort. The store uses a couple of methods to help deconflict very similar sequences of instructions within the same function. Primarily, the sort is basic-block aware. All n-grams are contained within a single basic block, and when an initial match is found, hashes for other n-grams within that block (and its matching block on the other side) are modified so that n-grams within that block pair can only match each other.
    • Method Detail

      • getTotalInstructions

        public int getTotalInstructions()
        Returns:
        total number of Instructions in the whole function
      • numMatchedInstructions

        public int numMatchedInstructions()
        Returns:
        number of instructions that have been matched so far
      • removeHash

        public void removeHash​(HashEntry hashEntry)
        Remove a particular HashEntry. This may affect multiple instructions.
        Parameters:
        hashEntry - is the entry
      • calcHashes

        public void calcHashes​(int minLength,
                               int maxLength,
                               boolean wholeBlock,
                               boolean matchOnly,
                               HashCalculator hashCalc)
                        throws MemoryAccessException
        Calculate hashes for all blocks
        Parameters:
        minLength - is the minimum length of an n-gram for these passes
        maxLength - is the maximum length of an n-gram for these passes
        wholeBlock - if true, allows blocks that are smaller than the minimum length to be considered as 1 n-gram.
        matchOnly - if true, only generates n-grams for sequences in previously matched blocks
        hashCalc - is the hash function
        Throws:
        MemoryAccessException
      • insertHashes

        public void insertHashes()
        Insert all hashes associated with unknown (i.e not matched) blocks and instructions
      • matchHash

        public void matchHash​(HashStore.NgramMatch match,
                              java.util.List<Instruction> instResult,
                              java.util.List<CodeBlock> blockResult)
        Mark a particular n-gram hash and instruction as having a match. Set of instructions covered by n-gram are removed, and data structures are updated
        Parameters:
        match - is the n-gram being declared as a match
        instResult - collects the explicit set of Instructions matched
        blockResult - collects the explicit set of CodeBlocks matched
      • extendMatch

        public static void extendMatch​(int nGramSize,
                                       InstructHash srcInstruct,
                                       HashStore.NgramMatch srcMatch,
                                       InstructHash destInstruct,
                                       HashStore.NgramMatch destMatch,
                                       HashCalculator hashCalc)
                                throws MemoryAccessException
        Try to extend a match on a pair of n-grams to the Instructions right before and right after the n-gram. The match is extended if the Instruction adjacent to the n-gram, and its corresponding pair on the other side, hash to the same value using the hash function. The NgramMatch objects are updated to reflect the original n-gram match plus any additional extension.
        Parameters:
        nGramSize - is the original size of the matching n-gram.
        srcInstruct - is the first Instruction in the "source" n-gram
        srcMatch - is the "source" NgramMatch object to be populate
        destInstruct - is the first Instruction in the "destination" n-gram
        destMatch - is the "destination" NgramMatch object to populate
        hashCalc - is the hash function object
        Throws:
        MemoryAccessException
      • getUnmatchedInstructions

        public java.util.List<Instruction> getUnmatchedInstructions()
        Returns:
        list of unmatched instructions across the whole function
      • clearSort

        public void clearSort()
        Clear the main sort structures, but preserve blocks and instructions
      • isEmpty

        public boolean isEmpty()
        Returns:
        true if there are no n-grams left in the sort
      • getFirstEntry

        public HashEntry getFirstEntry()
        Returns:
        the first HashEntry in the sort. The least number of matching n-grams and the biggest n-gram.
      • getEntry

        public HashEntry getEntry​(Hash hash)
        Get the HashEntry corresponding to a given hash
        Parameters:
        hash - is the Hash to match
        Returns:
        the set of n-grams (HashEntry) matching this hash
      • getBlock

        public Block getBlock​(Address addr)
        Get the basic-block with the corresponding start Address
        Parameters:
        addr - is the starting address
        Returns:
        the Block object
      • getMonitor

        public TaskMonitor getMonitor()
        Returns:
        the TaskMonitor for this store