For simple organisms such as viruses and bacteria, we offer de novo genome assembly based on the sequence data alone, i.e. with no reference genome used to guide the assembly.
For more complex organisms we offer reference-guided assembly. We will use the most appropriate and up-to-date reference genome for your organism and experimental plan.
- Assembled contigs in FASTA format
- BAM alignment file of sequencing reads
After genome assembly, we can annotate genomes in silico using gene prediction software, either alone or in combination with RNA-Seq (transcriptome) data. Putative genes are annotated by comparing their sequences against several genome databases; for genes with weaker sequence similarity, function can be predicted by comparison with known functional domains.
If annotated genomes for close relatives exist, the annotations can be improved by transferring gene information to the unannotated genome using sequence alignment approaches.
- Comprehensive list of genes and their coordinates
- Fully annotated genes based upon homology searches
- Validation of annotated genes with the addition of RNA-Seq data
We employ current best practices for variant calling, resulting in a statistically supported and reliable set of variants. Single nucleotide polymorphisms (SNPs) can be called against any reference genome in any organism, or even against a combination of genomes compiled as part of a sequencing project for better representation.
In addition to high-confidence variants, we report regions of low coverage where we were unable to determine the sequence. Whole-genome, whole-exome and targeted DNA sequencing all enable good variant calling. We can also combine, compare and filter variant lists across samples, for example to find disease-causing de novo mutations in trio studies.
- Full variant lists
- Filtered variant lists based on your criteria
- Low-coverage regions where variation could not be called
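As an illustration of how low-coverage regions can be reported, here is a minimal Python sketch. It assumes per-base read depths are already available as a list for one chromosome; the function name `low_coverage_regions` and the `min_depth` threshold of 10 are hypothetical choices for this example, not a description of our production pipeline.

```python
from typing import List, Sequence, Tuple


def low_coverage_regions(
    depths: Sequence[int], min_depth: int = 10
) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals where depth < min_depth.

    `depths` holds one read-depth value per base; `min_depth` is an
    illustrative threshold (an assumption, not a recommended value).
    """
    regions: List[Tuple[int, int]] = []
    start = None  # start of the low-coverage run currently being extended
    for pos, depth in enumerate(depths):
        if depth < min_depth:
            if start is None:
                start = pos  # open a new run
        elif start is not None:
            regions.append((start, pos))  # close the run at the first covered base
            start = None
    if start is not None:
        regions.append((start, len(depths)))  # run extends to the end
    return regions


# Example: two runs fall below the threshold.
print(low_coverage_regions([12, 3, 2, 15, 20, 0], min_depth=10))
```

Half-open intervals are used here because they match the BED convention, which makes the output easy to intersect with gene or exon coordinates.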
Genetic variants are annotated with their location in the genome, zygosity (homozygous/heterozygous), functional classification for exonic variants, amino acid changes, database identifiers for known variants, and allele frequencies in genome databases or your own datasets.
We also provide pathogenicity predictions for exonic variants using prediction software. Ranking and filtering the variants based on the annotations enables easy interpretation of complex genomic data.
- Location and functional annotation of every variant
- Database identifiers for known variants
- Pathogenicity predictions
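Ranking and filtering on annotations can be sketched as follows. This is a simplified Python example under stated assumptions: the `Variant` fields, the `filter_and_rank` function, the 1% allele-frequency cutoff and the 0-to-1 pathogenicity score are all hypothetical and stand in for whatever annotations and criteria a given project uses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    population_af: Optional[float]  # None if absent from the databases consulted
    pathogenicity: float  # hypothetical score in [0, 1]; higher = more damaging


def filter_and_rank(
    variants: List[Variant], max_af: float = 0.01, min_score: float = 0.5
) -> List[Variant]:
    """Keep rare, predicted-damaging variants and sort them by score.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    kept = [
        v
        for v in variants
        if (v.population_af is None or v.population_af <= max_af)
        and v.pathogenicity >= min_score
    ]
    # Highest predicted pathogenicity first.
    return sorted(kept, key=lambda v: v.pathogenicity, reverse=True)


variants = [
    Variant("chr1", 1000, "A", "G", 0.20, 0.90),    # too common
    Variant("chr2", 2000, "C", "T", 0.001, 0.70),   # rare and damaging
    Variant("chr3", 3000, "G", "A", None, 0.95),    # novel and damaging
    Variant("chr4", 4000, "T", "C", 0.0005, 0.10),  # rare but likely benign
]
for v in filter_and_rank(variants):
    print(v.chrom, v.pos, v.pathogenicity)
```

Treating variants that are missing from population databases as rare is a design choice in this sketch; a stricter analysis might handle them separately.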
Copy number variations can be determined from sequence data using our statistical approaches for analysing coverage and allele frequency. The analysis yields copy numbers for chromosome segments, genes and exons. This information can be further integrated with expression data to find significant effects.
- Copy number for each chromosome, gene, and exon
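The coverage-based part of this idea can be illustrated with a deliberately simplified Python sketch. It assumes mean depths per segment have already been normalised for library size and compared against a matched diploid reference; the function names and segment labels are hypothetical, and allele-frequency information, GC correction and segmentation statistics are left out.

```python
from typing import Dict


def estimate_copy_number(
    sample_depth: float, reference_depth: float, ploidy: int = 2
) -> int:
    """Estimate integer copy number from a depth ratio.

    Assumes both depths are normalised and that the reference carries
    `ploidy` copies of the segment (a simplification for illustration).
    """
    if reference_depth <= 0:
        raise ValueError("reference_depth must be positive")
    return round(ploidy * sample_depth / reference_depth)


def copy_numbers_by_segment(
    sample: Dict[str, float], reference: Dict[str, float], ploidy: int = 2
) -> Dict[str, int]:
    """Apply the estimate to every segment present in both inputs."""
    return {
        seg: estimate_copy_number(sample[seg], reference[seg], ploidy)
        for seg in sample
        if seg in reference
    }


# Hypothetical segments with mean depths (sample vs. reference).
sample = {"chr1:0-1000": 45.0, "chr2:0-1000": 15.0, "chr3:0-1000": 30.0}
reference = {"chr1:0-1000": 30.0, "chr2:0-1000": 30.0, "chr3:0-1000": 30.0}
print(copy_numbers_by_segment(sample, reference))
```

The same function could be applied at gene or exon resolution by changing what each key represents.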
When whole-genome sequencing is coupled with mate-pair information from paired-end sequencing, genomic rearrangements such as inversions and translocations can be elucidated. These can result in fusion genes, some of which are linked to the formation of cancer. We deliver a ranked list of candidate fusion genes that can be validated with RNA-Seq (transcriptome) data.
- List of potential fusion genes
- List of all rearrangements
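To show how paired-end information points to rearrangements, here is a minimal Python sketch that labels individual read pairs. It assumes a standard library in which concordant mates map to the same chromosome, on opposite strands, within an expected distance; the `ReadPair` type, the labels and the `max_insert` value of 1000 are hypothetical. Real rearrangement calling requires many supporting pairs and further filtering, which this example does not attempt.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ReadPair:
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str  # "+" or "-"


def classify_pair(pair: ReadPair, max_insert: int = 1000) -> str:
    """Label a read pair by the kind of rearrangement it might support.

    `max_insert` is an illustrative cutoff (an assumption); in practice it
    would be derived from the observed insert-size distribution.
    """
    if pair.chrom1 != pair.chrom2:
        return "translocation_candidate"  # mates on different chromosomes
    if pair.strand1 == pair.strand2:
        return "inversion_candidate"  # mates unexpectedly on the same strand
    if abs(pair.pos2 - pair.pos1) > max_insert:
        return "unexpected_distance"  # mates too far apart
    return "concordant"


def summarise(pairs: Iterable[ReadPair], max_insert: int = 1000) -> Counter:
    """Count how many pairs fall into each class."""
    return Counter(classify_pair(p, max_insert) for p in pairs)


pairs = [
    ReadPair("chr9", 100, "+", "chr22", 500, "-"),
    ReadPair("chr1", 100, "+", "chr1", 400, "+"),
    ReadPair("chr1", 100, "+", "chr1", 400, "-"),
]
print(summarise(pairs))
```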
We can work with sequence data you have already produced, advise on sequencing requirements so you can use your preferred facility, or provide sequencing expertise through one of our carefully selected commercial sequencing providers.
When we use commercial sequencing providers, we don’t just work with anyone with a sequencer; we select our providers based on the quality of the sequencing they have provided us in the past.
- Raw FASTQ sequence data for future analyses
- Deliverables for any downstream analyses we carry out at your request, including quality control and PhiX contaminant removal