You can test IMgc with this example file.
IMgc is written in Perl. The source code and installation
instructions are available for download here.
Please cite IMgc as:
Woerner, A.E., M.P. Cox and M.F. Hammer. (2007) Recombination-Filtered Genomic
Datasets by Information Maximization. Bioinformatics 23:1851-1853.
PDF
Questions and comments may be directed to August Woerner, Murray Cox or
Michael Hammer.
About IMgc Online:
IMgc reads an aligned fasta file and returns the largest non-recombining
block of DNA sequence. This is necessary for downstream analyses that require
datasets with no evidence of recombination (such as Jody Hey's IM).
IMgc maximizes the information content of the final dataset (as defined
by the user). Users can favour retention of segregating sites relative to
individuals, or vice versa. IMgc defaults to an equal weighting of
individuals and segregating sites.
Notes
The chromosome copy weighting parameter $\alpha$ changes the
retention of chromosome copies $C_r$ relative to segregating
sites $S_r$. This relationship is defined by the inclusiveness
score $I$, such that
$I = S_r C_r^{\alpha}$
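As a rough illustration of how the weighting parameter shifts the balance, the sketch below computes the inclusiveness score for two made-up candidate datasets. It assumes the score reads I = S_r × C_r^α and that α = 1 corresponds to the default equal weighting; the function name and the numbers are hypothetical and are not taken from the IMgc source code.

```python
def inclusiveness(segregating_sites: int, chromosome_copies: int,
                  alpha: float = 1.0) -> float:
    """Inclusiveness score, assuming I = S_r * C_r ** alpha.

    alpha = 1.0 is assumed here to represent equal weighting of
    segregating sites and chromosome copies.
    """
    return segregating_sites * chromosome_copies ** alpha


# Two hypothetical filtered datasets (illustrative numbers only):
# A retains more segregating sites, B retains more chromosome copies.
dataset_a = {"segregating_sites": 12, "chromosome_copies": 20}
dataset_b = {"segregating_sites": 8, "chromosome_copies": 40}

for alpha in (1.0, 0.5):
    score_a = inclusiveness(alpha=alpha, **dataset_a)
    score_b = inclusiveness(alpha=alpha, **dataset_b)
    preferred = "A" if score_a > score_b else "B"
    print(f"alpha={alpha}: I(A)={score_a:.1f}, I(B)={score_b:.1f}, prefer {preferred}")
```

With α = 1 the dataset with more chromosome copies scores higher; lowering α to 0.5 down-weights copies, so the dataset with more segregating sites is preferred instead.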
Indels are included for the purpose of identifying four-gamete violations.
Sequence data must be haplotypic and fully phased (i.e., ambiguity codes are
not allowed). The characters G, A, T, C, N and - are permitted,
where N signifies missing data and - indicates an indel.
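For readers unfamiliar with the four-gamete test, the following is a minimal sketch of the general idea, not IMgc's actual implementation. It assumes each site has at most two character states, that '-' marks an indel and counts as a character state (as in the alignment examples on this page), and that N is skipped as missing data. The function names are hypothetical.

```python
from itertools import combinations


def violates_four_gametes(seqs, i, j):
    """True if columns i and j together show all four two-site haplotypes.

    Sequences with N (missing data) at either column are ignored.
    Assumes each column has at most two character states.
    """
    gametes = {(s[i], s[j]) for s in seqs if s[i] != "N" and s[j] != "N"}
    return len(gametes) == 4


def four_gamete_violations(seqs):
    """Return every pair of column indices that fails the four-gamete test."""
    length = len(seqs[0])
    return [(i, j) for i, j in combinations(range(length), 2)
            if violates_four_gametes(seqs, i, j)]


# Toy alignments (hypothetical data): the first shows all four gametes.
print(four_gamete_violations(["AA", "AT", "TA", "TT"]))
print(four_gamete_violations(["AA", "AT", "TT"]))
```

A pair of sites showing all four combinations cannot be explained by single mutations on one tree without recombination, which is why such pairs are treated as evidence of recombination.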
Complex multi-base indels are sometimes observed, e.g.:
ATT
--T
ATT
--T
---
IMgc treats these three bases as a single unit. Because this unit shows three
character states (ATT, --T and ---), it violates the infinite sites model. At
any site that violates the infinite sites model, IMgc currently changes all but
the two highest-frequency character states to N.
The example above would become:
ATT
--T
ATT
--T
NNN
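The masking step illustrated above can be sketched as follows. This is only an approximation of the described behaviour, not the IMgc code: the function name is hypothetical, the multi-base block is assumed to have been identified already, and ties between equally frequent states are broken by order of first appearance, which may differ from what IMgc does.

```python
from collections import Counter


def mask_extra_states(block):
    """Keep the two most frequent states of a multi-base unit; mask the rest.

    `block` holds one string per sequence, covering the columns that are
    treated as a single unit. States made up entirely of N are not counted.
    """
    counts = Counter(state for state in block if set(state) != {"N"})
    keep = {state for state, _ in counts.most_common(2)}
    return [state if state in keep else "N" * len(state) for state in block]


block = ["ATT", "--T", "ATT", "--T", "---"]
print(mask_extra_states(block))  # ['ATT', '--T', 'ATT', '--T', 'NNN']
```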