Free Software Links
Genetree (by R. C. Griffiths)
RepeatMasker (a.k.a. RepeatHamster)
Hey! Lab software
Amy's software links
UCSC Genome Browser
International HapMap Project
Family Tree DNA
Geeky Computer Links
The Genomic Analysis and Technology Core
BioDesk Session Manager
Comprehensive Perl Archive Network
Sun Developer Network (SDN)
The PERL Directory
For more links visit the Links page
Software Generated by the Hammer Lab
Note that all software available here is released under the GNU General Public License (GPL),
meaning that, amongst other things, there are neither warranties nor expectations of correctness.
See the GPL
The online interface for IMgc. IMgc reads an aligned fasta file and returns the
largest non-recombining block of DNA sequence. This is necessary for downstream
analyses that require datasets with no evidence of recombination (such as Jody Hey's
Isolation-with-Migration (IM) software).
IMgc maximizes the information content of the final dataset (as defined by the user). Users can favour retention of segregating sites relative to individuals, or vice versa. IMgc defaults to an equal weighting of individuals and segregating sites.
Nx/Na estimation software
Also, see some of Vincent Plagnol's software
Reads in a fasta file, and returns either the largest non-recombining
block (by removing data), or multiple non-recombining blocks
(without removing data). This was originally written to generate input for Jody Hey's
IM software, although it is useful for many other tasks as well.
IMgc requires working installations of Perl, BioPerl, and in particular, the
IMgc components as a .zip file
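To make the idea of a non-recombining block concrete, here is a minimal sketch in Python (not the lab's Perl, and not IMgc's actual algorithm). It assumes phased, biallelic segregating sites with no missing data, and it only looks for the longest run of consecutive sites in which no pair fails the four-gamete test. IMgc itself also weighs the removal of individuals, which this sketch ignores. The function names are hypothetical.

```python
def fails_four_gamete(col_a, col_b):
    """True if the two site columns show all four two-site haplotypes.

    Assumes biallelic sites, so four distinct pairs is the maximum.
    """
    return len(set(zip(col_a, col_b))) >= 4


def largest_compatible_block(sites):
    """Return (start, end) with end exclusive: the longest run of consecutive
    sites in which every pair passes the four-gamete test.

    `sites` is a list of columns, one string per segregating site, with one
    character per individual.
    """
    best = (0, 0)
    start = 0
    for end in range(len(sites)):
        # Walk left from the new site; the first conflict found is the
        # rightmost one, so the run must restart just after it.
        for i in range(end - 1, start - 1, -1):
            if fails_four_gamete(sites[i], sites[end]):
                start = i + 1
                break
        if end + 1 - start > best[1] - best[0]:
            best = (start, end + 1)
    return best
```

For example, with the columns `["AAGG", "AAGG", "AGAG", "AGAG", "AGGG"]`, the first two sites conflict with the third, so the longest compatible run is the last three sites.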
This is probably the most widely used script in the lab. As this webpage can attest,
I am not good at making things look pretty; Wallop, however, tries to buck that trend
by making an Excel sheet of the segregating sites found in an alignment file. It uses
the modules below to help with this feat, which means it can also give you the genomic
coordinates of the SNPs (assuming you have a local copy of the genome you are working in).
It can also show you probable sequence-finishing errors (see DiploidLD.pm) and/or recombinant
haplotypes. It also works with multiple outgroups, which can be handy if they are not all
Not only is pasta delicious, but it's also a great file format! Pasta files are pseudo-fasta
files. The general idea is as follows: each individual is composed of multiple ab1s that may overlap.
As such, each person's final fasta sequence is by default the reference sequence, and in places where
the maximum phred quality is below a given threshold, the basecall becomes undefined (N). After this
interpolation, the segregating sites are overlaid on top of the final sequence, giving something very close
to a fasta file. Note that if you don't call a segregating site, it will be omitted from the final product,
so the finisher must be very cognizant of false negatives.
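The construction described above can be sketched roughly as follows. This is Python rather than the lab's Perl, and the function name, the input layout (one maximum phred value per reference position, called sites as a position-to-base mapping), and the default threshold of 20 are all assumptions made for illustration.

```python
def build_pasta_sequence(reference, max_quality, called_sites, min_quality=20):
    """Build one individual's pseudo-fasta sequence.

    reference    -- reference sequence (string)
    max_quality  -- best phred score per position across overlapping reads
    called_sites -- dict of 0-based position -> called base
    min_quality  -- hypothetical threshold below which a base becomes N
    """
    # Start from the reference, masking low-quality positions with N.
    seq = [
        base if qual >= min_quality else "N"
        for base, qual in zip(reference, max_quality)
    ]
    # Overlay the segregating sites the finisher actually called.
    for pos, allele in called_sites.items():
        seq[pos] = allele
    return "".join(seq)
```

Note how a segregating site that was never called simply keeps the reference base, which is why false negatives matter so much here.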
This script is used to give you a visual representation of the distribution of quality values across a locus
for your individuals. It uses a fixed window size (e.g., 100 bases), and for each individual it will show you
what the average quality is for the given range. This is a very useful tool for figuring out where to design primers
to patch the holes in your alignment file. The results are written to an Excel sheet...
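The windowed averaging can be sketched in a few lines. This is a Python illustration under assumed inputs (a list of per-base quality values for each individual); the function names are hypothetical and the real script writes a spreadsheet rather than returning lists.

```python
def window_means(qualities, window=100):
    """Average quality for each consecutive, non-overlapping window.

    The last window may be shorter than `window`.
    """
    means = []
    for start in range(0, len(qualities), window):
        chunk = qualities[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


def quality_profile(per_individual, window=100):
    """Map each individual's name to its list of window averages."""
    return {
        name: window_means(quals, window)
        for name, quals in per_individual.items()
    }
```

Windows with a low average across many individuals are the natural candidates for new primers.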
This script reads in fasta files and subsamples the data using
regular expressions for populations
and population sample sizes. Available options also include subsampling of sequences
that include the first N and/or last N segregating sites. This script requires the Perl module FindSegSites...
This script might be useful if you just want 10 Biaka from, say, the
For this you would type: perl fasta_subsample.pl -popR 'BIA' -popS 10 *fasta
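The population filter in that command can be pictured with the following Python sketch. It is not the script itself: the function name and the record layout are hypothetical, and drawing the sample at random is an assumption, since the page does not say how the individuals are chosen.

```python
import random
import re


def subsample_population(records, pop_regex, pop_size, seed=None):
    """Keep at most `pop_size` records whose name matches `pop_regex`.

    records -- list of (name, sequence) tuples from a fasta file
    seed    -- optional seed so that a draw can be repeated
    """
    pattern = re.compile(pop_regex)
    matching = [rec for rec in records if pattern.search(rec[0])]
    if len(matching) <= pop_size:
        return matching
    # Assumption: individuals are drawn at random without replacement.
    return random.Random(seed).sample(matching, pop_size)
```

Calling `subsample_population(records, "BIA", 10)` mirrors the `-popR 'BIA' -popS 10` options shown above.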
This script is a wrapper around
query_tracedb.pl; both let you interrogate the
trace archives. call_ncbi.pl is, however, quite a bit more friendly,
in that you can specify that you want fasta files, quality files, etc. for a particular individual or set of individuals. In
essence, this script crafts the NCBI query for you and uses query_tracedb to invoke the transaction.
This script reads in a file with genetic coordinates (genes, exons, etc.) and
takes a recombination rate value as an argument. As output, it extends each given coordinate by a
distance equal to that recombination rate value. The script uses genetic map
files for each chromosome, downloaded from
, where recombination rates (in cM) are given for chromosome positions. A file
with chromosome lengths is also required for this script.
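One way to picture the extension step is the Python sketch below. It rests on assumptions the page does not confirm: that the map gives a cumulative genetic position (in cM) at sorted physical positions, that positions between map points are linearly interpolated, and that the result is clamped to the chromosome length. All names are hypothetical.

```python
from bisect import bisect_right


def interpolate(x, xs, ys):
    """Linearly interpolate y at x over sorted xs, clamping at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # xs[i - 1] <= x < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def extend_interval(start, end, distance_cm, positions, cms, chrom_length):
    """Extend [start, end] (bp) outward by `distance_cm` on each side.

    positions -- sorted physical positions (bp) from the genetic map
    cms       -- cumulative genetic positions (cM) at those positions
    """
    left_cm = interpolate(start, positions, cms) - distance_cm
    right_cm = interpolate(end, positions, cms) + distance_cm
    new_start = round(interpolate(left_cm, cms, positions))
    new_end = round(interpolate(right_cm, cms, positions))
    # Never shrink the interval, and stay on the chromosome.
    new_start = max(1, min(start, new_start))
    new_end = min(chrom_length, max(end, new_end))
    return new_start, new_end
```

Because the extension is measured in genetic rather than physical distance, the same cM value adds fewer bases in a recombination hotspot than in a cold region.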
FindSegSites.pm Example Script:
This module takes in a file and finds the segregating sites therein.
The return value is a hash, with references to a 2D array of segregating sites
amongst other things. This module allows for a reference sequence to be in the file
and for outgroups to be present as well.
Production-level example scripts: Wallop
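As a rough picture of what finding segregating sites involves, here is a small Python sketch. It is not the module's API: the function name, the dictionary input, and the decision to skip the reference, N, and gap characters are assumptions for illustration, and the real module returns a richer hash.

```python
def find_segregating_sites(alignment, exclude=("ref",)):
    """Return (index, sorted alleles) for each segregating site.

    alignment -- dict of sequence name -> aligned sequence (equal lengths)
    exclude   -- names left out of the comparison, e.g. the reference
                 sequence or outgroups
    """
    names = [name for name in alignment if name not in exclude]
    if not names:
        return []
    length = len(alignment[names[0]])
    sites = []
    for i in range(length):
        # Ignore undefined bases and gaps when collecting alleles.
        alleles = {alignment[name][i].upper() for name in names} - {"N", "-"}
        if len(alleles) > 1:
            sites.append((i, sorted(alleles)))
    return sites
```

For instance, with sequences `ACGT`, `ATGT`, and `ANGA` (plus a reference named `ref`), positions 1 and 3 are reported as segregating.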
This module takes in a reference sequence, which is by definition a subsequence of
the genome, the name of the chromosome, and an array of indexes in the reference sequence,
and it returns a remapping of the original indexes in the chromosome specified. In general we use
this to remap SNPs from an alignment (of, say, a gene), so we can see what the genomic
coordinates of the SNPs are (so you can compare your data with, say, HapMap).
Note that Ns in the reference are taken as a delimiter; i.e., if you have two disconnected introns, you
can pass the reference of the first intron, a few Ns, and the second intron (all concatenated), and the module
will remap as appropriate. Recently I also added support for genes on the negative strand,
but that code is still a little fresh.
Production-level example scripts: Wallop
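The N-delimited remapping can be sketched as follows. This Python illustration is not the module's implementation: it assumes 0-based indexes, a chromosome held in memory as a string, exact matches on the positive strand, and segments that appear in the chromosome in the same order as in the reference. The function name is hypothetical.

```python
import re


def remap_indexes(reference, chromosome, indexes):
    """Map 0-based reference indexes to 0-based chromosome indexes.

    Runs of N in the reference separate segments that are located
    independently. Indexes that fall on an N map to None.
    """
    mapping = {}
    search_from = 0
    for match in re.finditer(r"[^Nn]+", reference):
        segment = match.group()
        chrom_start = chromosome.find(segment, search_from)
        if chrom_start == -1:
            raise ValueError("segment not found in chromosome: " + segment)
        for offset in range(len(segment)):
            mapping[match.start() + offset] = chrom_start + offset
        # Later segments must lie downstream of this one.
        search_from = chrom_start + len(segment)
    return [mapping.get(i) for i in indexes]
```

Searching each segment only downstream of the previous one keeps two introns from being mapped out of order, at the cost of not handling the negative strand.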
Note that this module is a pale imitation of
This module looks at a pair of sites and tells you if there's a 4-gamete violation. Although
this is a little unexciting, it spruces things up by telling you if there must be a 4-gamete
violation (i.e., it fails the 4-gamete test) when the data contain IUPAC codes (and yes, the
ambiguous codes can drive a 4-gamete violation). As if it couldn't possibly get more exciting, it will also give you
the minimal set of individuals you can remove such that the pair of sites is no longer a 4-gamete
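A simplified version of the pairwise check and the removal step might look like this in Python. It is only a sketch: it assumes phased, biallelic sites, it skips any individual with an ambiguous or undefined base at either site (so it does not reproduce the module's IUPAC reasoning), and the function names are hypothetical.

```python
from collections import defaultdict

UNAMBIGUOUS = set("ACGT")


def gamete_carriers(site_a, site_b):
    """Group individual indexes by the two-site haplotype they carry."""
    carriers = defaultdict(list)
    for idx, (a, b) in enumerate(zip(site_a, site_b)):
        if a in UNAMBIGUOUS and b in UNAMBIGUOUS:
            carriers[(a, b)].append(idx)
    return carriers


def four_gamete_violation(site_a, site_b):
    """True if all four two-site haplotypes are observed."""
    return len(gamete_carriers(site_a, site_b)) >= 4


def minimal_removal(site_a, site_b):
    """Smallest set of individuals whose removal clears the violation.

    With two alleles per site, removing every carrier of the rarest
    haplotype leaves at most three haplotypes.
    """
    carriers = gamete_carriers(site_a, site_b)
    if len(carriers) < 4:
        return []
    return min(carriers.values(), key=len)
```

If one haplotype is carried by a single individual, that individual alone is the answer, which is often a hint of a sequence-finishing error rather than real recombination.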
ParseAceWithReference.pm (see the top of the module for a description) This
is used to parse ace/phd files from
AceFileQuality.pm (see the top of the module for a description). This is used to
give quality information about ace/phd files from
AceConstants.pm (see the top of the module for a description)
This is a "helper" module used to get/set constants used by the two
For scripts that use these ace modules see Consed2Pasta and qualityThis.
Note that my Perl modules use the
, which means that global variables in the Perl module are used (and set to
defaults a la our own in-house formatting). If you want to change them, there
will be getter/setter methods. For example, our fasta files have a reference sequence, which
is a subsection of the human genome. No matter what, for all loci, the reference sequence is
called 'ref', and our modules have that as their default value. If you want to change the reference
to 'foo', you'll have to say something like: SegSites::setReference('foo');
More from the Hammer lab coming soon!