Below is a description of the most commonly used advanced preferences available on the PatSnap Bio platform:
Sequence Identity - This lets you set the percentage match required in your results. It takes into account the characters used in the sequence and the alignment. The higher the percentage, the more closely the results should match your sequence.
Query Coverage - This specifies the percentage of your query sequence that must be matched. For example, if you search with a sequence of 100 amino acids and want to see sequences that match at least 70 amino acids of your query, set the slider range to 70%-100%. This setting does not take the position of the amino acids into account.
Match with Gaps - Allows a sequence to match, and to appear in your results, even if the result sequence contains a gap in place of an amino acid.
E-Value - Indicates how likely it is that a sequence is similar to yours simply by chance. For instance, if your sequence is very short, there is a higher likelihood that it appears in several locations by chance. The greater the E-value, the more likely it is that the match is down to chance.
Algorithm - There are 3 algorithms to choose from, depending on the search you wish to perform.
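To make the E-Value description above more concrete, here is a minimal Python sketch of the relation commonly used in BLAST-style statistics, E = m * n * 2^(-S'), where m is the query length, n is the database length, and S' is the bit score. It is an illustration only: the function name, the database size, and the bit scores are assumptions, and the platform's own E-value calculation is not documented here.

```python
def expected_chance_hits(bit_score: float, query_len: int, database_len: int) -> float:
    """Approximate E-value from the BLAST-style relation E = m * n * 2**(-S').

    m is the query length, n the total database length, S' the bit score.
    Length corrections used by real tools are deliberately ignored here.
    """
    return query_len * database_len * 2.0 ** (-bit_score)


# Illustrative numbers only (assumptions, not platform output).
DATABASE_LEN = 1_000_000_000

# A short query can only reach a low bit score, so its E-value stays large.
short_hit = expected_chance_hits(bit_score=20.0, query_len=10, database_len=DATABASE_LEN)

# A longer query with a strong match can reach a much higher bit score.
long_hit = expected_chance_hits(bit_score=200.0, query_len=100, database_len=DATABASE_LEN)

print(f"short query: E = {short_hit:.3g}")
print(f"long query:  E = {long_hit:.3g}")
```

Under these assumptions, each additional bit of score halves the E-value, which is why a short, low-scoring match is far more likely to be a chance hit than a long, high-scoring one.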
Search Results Cap
Within Advanced Preferences, you can change the maximum number of search results returned by adjusting the 'Max Target Sequence' setting. The default is 5000, but this can be changed to 1000 or 10,000. For results greater than 10,000, there will be a range selection option to View Sources.
Advanced Preferences Search Parameters
The descriptions above covered 2 of the 6 search parameters you can change to get broader or more specific results. The other 4 are the following:
Subject Length - This is the length of the subject sequence that the system matches your query against. You can use this parameter to limit your search results to subjects of a given length.
Alignment Identity (%) - This number describes how similar the query sequence is to the target sequence (how many characters in each sequence are identical). The higher the percent identity, the more significant the match.
Query Identity (%) - This is the percentage of matching amino acids or nucleotides.
Subject Coverage (%) - This is the percentage of the subject sequence that matches the query sequence. If you would like the entire subject to be present in the query sequence, select 100%.
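The percentage parameters above can be illustrated with a small Python sketch. It computes Alignment Identity, Query Coverage, and Subject Coverage for one pairwise alignment, using common BLAST-style conventions. These conventions and the function name `alignment_metrics` are assumptions made for illustration; the platform's exact formulas are not stated in this article.

```python
def alignment_metrics(aligned_query: str, aligned_subject: str,
                      query_len: int, subject_len: int) -> dict:
    """Return identity and coverage percentages for one gapped alignment.

    aligned_query / aligned_subject: the aligned region, with '-' for gaps.
    query_len / subject_len: lengths of the full, ungapped sequences.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned strings must have the same length")

    columns = len(aligned_query)
    # Count columns where both sequences carry the same residue (gaps excluded).
    identical = sum(1 for q, s in zip(aligned_query, aligned_subject)
                    if q == s and q != "-")
    # Count residues of each sequence that take part in the alignment.
    query_residues = sum(1 for q in aligned_query if q != "-")
    subject_residues = sum(1 for s in aligned_subject if s != "-")

    return {
        "alignment_identity": 100.0 * identical / columns,
        "query_coverage": 100.0 * query_residues / query_len,
        "subject_coverage": 100.0 * subject_residues / subject_len,
    }


# Example: a 20-residue linker peptide found intact inside a
# hypothetical 200-residue subject sequence.
linker = "GGGGSGGGGSGGGGSGGGGS"
metrics = alignment_metrics(linker, linker, query_len=20, subject_len=200)
print(metrics)
```

In this example the short query is fully covered and identical across the alignment, while only a small fraction of the long subject takes part in the match, so Subject Coverage is low even though the hit is exact.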
The following are some setting recommendations you might find helpful for different search types:
- When the target sequence to be retrieved is similar in length to the query sequence, for example, when using wild-type sequences to find mutant sequences, you might not want very short or very long sequences in your results. In this case, you can set Query Identity and Subject Coverage to 90-100.
- When you want to retrieve short target sequences by using a long query sequence, for example, using a gene sequence with thousands of base pairs to find an siRNA or short fragments, you can set Alignment Identity to 95-100 and Query Coverage to 0-10. Along with this, you can also set a lower word size and use the BlastN algorithm.
- When using short sequences to retrieve long sequences, for example, if you know the sequence of a linker peptide (GGGGSGGGGSGGGGSGGGGS) and want to find the long sequences that contain it, you can use an Alignment Identity of 95-100 and a Subject Coverage of 0-10. Along with this, you can use the Blastn-short algorithm.
- To conduct a broad search, it is recommended to use the default settings and to enable Match with Gaps, so that you don't miss relevant results.
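One way to read the recommendations above is as range filters applied to each hit. The Python sketch below encodes the three numeric recommendations as presets and applies them to made-up hit records. The preset names, field names, and hit values are all hypothetical and chosen only for illustration; they are not part of the platform.

```python
# Ranges taken from the recommendations above; the key names are hypothetical.
PRESETS = {
    "similar_length": {
        "query_identity": (90, 100),
        "subject_coverage": (90, 100),
    },
    "long_query_short_target": {
        "alignment_identity": (95, 100),
        "query_coverage": (0, 10),
    },
    "short_query_long_target": {
        "alignment_identity": (95, 100),
        "subject_coverage": (0, 10),
    },
}


def passes(hit: dict, preset: dict) -> bool:
    """True if every metric named in the preset falls inside its range."""
    return all(low <= hit[name] <= high for name, (low, high) in preset.items())


def filter_hits(hits: list, preset_name: str) -> list:
    """Keep only the hits that satisfy the chosen preset."""
    preset = PRESETS[preset_name]
    return [hit for hit in hits if passes(hit, preset)]


# Made-up hit records, for illustration only.
hits = [
    {"id": "hit_A", "alignment_identity": 100.0, "query_identity": 100.0,
     "query_coverage": 100.0, "subject_coverage": 10.0},
    {"id": "hit_B", "alignment_identity": 97.0, "query_identity": 92.0,
     "query_coverage": 95.0, "subject_coverage": 96.0},
    {"id": "hit_C", "alignment_identity": 80.0, "query_identity": 40.0,
     "query_coverage": 50.0, "subject_coverage": 50.0},
]

linker_style = [h["id"] for h in filter_hits(hits, "short_query_long_target")]
mutant_style = [h["id"] for h in filter_hits(hits, "similar_length")]
print(linker_style, mutant_style)
```

Here the short-query preset keeps only the hit that covers a small part of a long subject, while the similar-length preset keeps only the hit where query and subject match along almost their full lengths.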