
Free Software Links

Arlequin 3.0
Genetree (by R. C. Griffiths)
Primer 3
RepeatMasker (a.k.a. RepeatHamster)
Hey! Lab software
Amy's software links
Stop! Hammertime!


UCSC Genome Browser
International HapMap Project
Perlegen Sciences
Seattle SNPs
Family Tree DNA

Geeky Computer Links

The Genomic Analysis and Technology Core
BioDesk Session Manager
Comprehensive Perl Archive Network
Sun Developer Network (SDN)
The PERL Directory

For more links visit the Links page

Software Generated by the Hammer Lab

Note that all software available here is released under the GNU General Public License (GPL), which means, amongst other things, that there are neither warranties nor expectations of correctness. See the GPL for details.

     IMgc Online

The online interface for IMgc. IMgc reads an aligned fasta file and returns the largest non-recombining block of DNA sequence. This is necessary for downstream analyses that require datasets with no evidence of recombination (such as Jody Hey's Isolation-with-Migration (IM) software).

IMgc maximizes the information content of the final dataset (as defined by the user). Users can favour retention of segregating sites relative to individuals, or vice versa. IMgc defaults to an equal weighting of individuals and segregating sites.


Download Jeff Wall's Nx/Na estimation software here
Also, see some of Vincent Plagnol's software here

Perl Scripts


Reads in a fasta file and returns either the largest non-recombining block (by removing data) or multiple non-recombining blocks (without removing data). This was originally written to generate input for Jody Hey's Isolation-with-Migration (IM) software, although it is useful for many other tasks as well.

IMgc requires working installations of Perl and BioPerl, in particular the Bio::SeqIO module.
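
The core idea behind a non-recombining block can be sketched with the four-gamete test. If two biallelic sites show all four allele combinations among the haplotypes, there must have been at least one recombination (or a recurrent mutation) between them. The Python below illustrates that idea. It is not IMgc's Perl implementation and does not use IMgc's weighting scheme. It assumes phased (haploid), uppercase, equal-length sequences, and treats N, - and ? as missing. It only trims sites, never individuals, and returns the longest run of consecutive segregating sites with no pairwise violation. All function names are hypothetical.

```python
"""Sketch: longest run of segregating sites passing the four-gamete test.

Hypothetical helpers, not IMgc itself. Assumes phased, equal-length,
aligned sequences; only biallelic columns are treated as segregating.
"""

MISSING = set("N-?")


def segregating_columns(seqs):
    """0-based alignment columns with exactly two non-missing bases."""
    cols = []
    for i in range(len(seqs[0])):
        bases = {s[i] for s in seqs} - MISSING
        if len(bases) == 2:
            cols.append(i)
    return cols


def four_gamete_violation(seqs, i, j):
    """True if columns i and j show all four haplotypes (gametes)."""
    gametes = {(s[i], s[j]) for s in seqs
               if s[i] not in MISSING and s[j] not in MISSING}
    return len(gametes) == 4


def largest_compatible_block(seqs):
    """Longest run of consecutive segregating columns with no pairwise
    four-gamete violation, found with a sliding window."""
    sites = segregating_columns(seqs)
    best_start, best_end = 0, 0          # half-open range into `sites`
    start = 0
    for end in range(len(sites)):
        # Move the left edge past the nearest site that conflicts with the new one.
        for m in range(end - 1, start - 1, -1):
            if four_gamete_violation(seqs, sites[m], sites[end]):
                start = m + 1
                break
        if end + 1 - start > best_end - best_start:
            best_start, best_end = start, end + 1
    return sites[best_start:best_end]


if __name__ == "__main__":
    alignment = ["AGGA", "ATTA", "CGGA", "CTTT"]
    print(largest_compatible_block(alignment))   # [1, 2, 3]
```

In the example, columns 0 and 1 carry all four haplotypes (AG, AT, CG, CT), so the block starts at column 1. The sliding window works because any sub-run of a compatible run is itself compatible. IMgc's option of dropping individuals instead of sites is not modelled here.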

      Component Files:

      Read Me

      Download all IMgc components as a .zip file

This is probably the most widely used script in the lab. As this webpage can attest, I am not good at making things look pretty; Wallop, however, tries to buck that trend by making an Excel sheet of the segregating sites found in an alignment file. It uses the modules below to help with this feat, which means it can also give you the genomic coordinates of the SNPs (assuming you have a local copy of the genome you are working with). It can also show you probable sequence-finishing errors (see DiploidLD.pm) and/or recombinant haplotypes. It also works with multiple outgroups, which can be handy if they are not all concordant.

Not only is pasta delicious, but it's also a great file format! Pasta files are pseudo-fasta files. The general idea is as follows: each individual's data consist of multiple ab1 trace files that may overlap. Each person's final fasta sequence therefore starts out as the reference sequence. Wherever the maximum phred quality (across that individual's overlapping reads) is below a given threshold, the base call becomes undefined (N). After this masking step, the called segregating sites are overlaid on top of the final sequence, giving something very close to a fasta file. Note that if you don't call a segregating site, it will be omitted from the final product, so the finisher must be very cognizant of false negatives.
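
Here is a minimal Python sketch of how one individual's pasta sequence could be assembled. It assumes each trace has already been placed on reference coordinates as a (start, qualities) pair, and that the finisher's calls arrive as a {position: base} dictionary. These input shapes, the function names, and the default threshold of 20 are hypothetical.

```python
"""Sketch: build a pasta-style sequence for one individual (hypothetical helpers)."""


def max_quality_per_position(length, traces):
    """Highest phred quality seen at each reference position.

    traces -- list of (start, qualities): each trace's 0-based start on the
              reference and its per-base qualities. Uncovered positions stay 0.
    """
    best = [0] * length
    for start, quals in traces:
        for offset, q in enumerate(quals):
            pos = start + offset
            if 0 <= pos < length:
                best[pos] = max(best[pos], q)
    return best


def build_pasta(reference, max_quality, called_sites, threshold=20):
    """Reference by default, N where max quality < threshold, then overlay calls."""
    if len(max_quality) != len(reference):
        raise ValueError("need one quality value per reference base")
    seq = [base if q >= threshold else "N"
           for base, q in zip(reference, max_quality)]
    for pos, base in called_sites.items():   # called segregating sites win
        seq[pos] = base
    return "".join(seq)


if __name__ == "__main__":
    ref = "ACGTACGT"
    quals = max_quality_per_position(len(ref), [(0, [30, 30, 30, 10]),
                                                (2, [40, 5, 5, 35, 35, 35])])
    print(build_pasta(ref, quals, {4: "G"}))   # ACGNGCGT
```

Position 3 stays N because no trace reaches the threshold there. Position 4 is masked and then overwritten by the called G. The sketch also makes the false-negative risk concrete: a real variant at a high-quality but uncalled position silently keeps the reference base.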

This script gives you a visual representation of the distribution of quality values across a locus for your individuals. It uses a fixed window size (e.g., 100 bases), and for each individual it shows you the average quality within each window. This is a very useful tool for figuring out where to design primers to patch the holes in your alignment file. The results are written to an Excel sheet.
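
The windowed summary can be sketched in Python as follows. The sketch assumes each individual's qualities are already a list with one value per alignment position. It uses non-overlapping windows (the last one may be shorter) and prints rows rather than writing an Excel sheet. The function names and the cutoff of 20 used to flag windows are hypothetical.

```python
"""Sketch: mean quality per fixed window, per individual (hypothetical helpers)."""


def window_means(qualities, window=100):
    """Mean of each consecutive, non-overlapping window; the last may be shorter."""
    if window <= 0:
        raise ValueError("window must be positive")
    means = []
    for i in range(0, len(qualities), window):
        chunk = qualities[i:i + window]
        means.append(sum(chunk) / len(chunk))
    return means


def low_quality_windows(per_individual, window=100, cutoff=20.0):
    """Start positions of windows where any individual's mean is below cutoff,
    i.e. candidate holes to patch with new primers."""
    table = {name: window_means(q, window) for name, q in per_individual.items()}
    n_windows = max((len(m) for m in table.values()), default=0)
    return [w * window for w in range(n_windows)
            if any(w < len(m) and m[w] < cutoff for m in table.values())]


if __name__ == "__main__":
    data = {"ind1": [30] * 4 + [10] * 4, "ind2": [40] * 8}
    for name, q in data.items():
        print(name, window_means(q, window=4))
    print(low_quality_windows(data, window=4))   # [4]
```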

This script reads in fasta files and subsamples the data, using regular expressions to select populations along with a sample size for each population. Available options also include restricting the output to the first N and/or last N segregating sites. This script requires the Perl module FindSegSites. It might be useful if you just want 10 Biaka from, say, the hominid project! For this you would type:
    perl fasta_subsample.pl -popR 'BIA' -popS 10 *fasta
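
The selection step can be sketched in Python, loosely mirroring the -popR (population regex) and -popS (sample size) options. The page does not say whether fasta_subsample.pl draws individuals at random or takes the first matches. Random sampling with an optional seed is therefore an assumption, as are the function names. The segregating-site trimming options are not covered.

```python
"""Sketch: subsample fasta records by population regex and sample size."""
import random
import re


def read_fasta(text):
    """Parse fasta text into {name: sequence}, keeping file order."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None and line:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def subsample(records, pop_regex, size, seed=None):
    """Randomly keep `size` records whose names match `pop_regex` (assumption)."""
    pattern = re.compile(pop_regex)
    matching = [n for n in records if pattern.search(n)]
    if size > len(matching):
        raise ValueError(f"only {len(matching)} records match {pop_regex!r}")
    keep = set(random.Random(seed).sample(matching, size))
    return {n: s for n, s in records.items() if n in keep}   # file order


if __name__ == "__main__":
    fasta = ">BIA1\nACGT\n>BIA2\nACGA\nTT\n>BIA3\nACTT\n>MBU1\nAGGT\n"
    recs = read_fasta(fasta)
    print(subsample(recs, "BIA", 2, seed=1))
```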

This script, call_ncbi.pl, is a wrapper around query_tracedb.pl; both let you interrogate the NCBI Trace Archive. call_ncbi.pl is, however, quite a bit friendlier, in that you can specify that you want fasta files, quality files, etc. for a particular individual or set of individuals. In essence, it crafts the NCBI query for you and uses query_tracedb.pl to invoke the transaction.

This script reads in a file of genomic coordinates (genes, exons, etc.) and takes a recombination value (in cM) as an argument. As output, it extends each given coordinate by a genetic distance equal to that value. The script uses the genetic map file for each chromosome, downloaded from HapMap, which gives genetic map positions (in cM) for chromosome positions. A file of chromosome lengths is also required.
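
The extension can be sketched as a lookup on a cumulative genetic map. Convert the interval ends to cM, step outward by the requested distance, and convert back to base pairs. The Python below makes several assumptions:

- The map is given as strictly increasing physical positions with non-decreasing cumulative cM values.
- Positions are interpolated linearly between markers.
- Both ends of the interval are extended.
- Beyond the first or last marker, the interval runs out to the chromosome ends (1 and the chromosome length).

These choices and all names are assumptions, not a description of the lab's script.

```python
"""Sketch: extend genomic intervals by a genetic distance (cM)."""
from bisect import bisect_left


def _interpolate(x, xs, ys):
    """Linear interpolation of y at x; xs sorted, x strictly inside (xs[0], xs[-1])."""
    i = bisect_left(xs, x)
    if xs[i] == x:
        return ys[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def cm_at(pos, map_pos, map_cm):
    """Genetic position (cM) of a physical position, clamped to the map ends."""
    if pos <= map_pos[0]:
        return map_cm[0]
    if pos >= map_pos[-1]:
        return map_cm[-1]
    return _interpolate(pos, map_pos, map_cm)


def extend_interval(start, end, distance_cm, map_pos, map_cm, chrom_length):
    """Extend [start, end] by distance_cm on both sides, in 1-based bp."""
    lo_cm = cm_at(start, map_pos, map_cm) - distance_cm
    hi_cm = cm_at(end, map_pos, map_cm) + distance_cm
    # Beyond the first/last marker, run out to the chromosome ends (assumption).
    new_start = 1 if lo_cm <= map_cm[0] else _interpolate(lo_cm, map_cm, map_pos)
    new_end = chrom_length if hi_cm >= map_cm[-1] else _interpolate(hi_cm, map_cm, map_pos)
    return max(1, round(new_start)), min(chrom_length, round(new_end))


if __name__ == "__main__":
    positions = [1000, 2000, 3000, 4000]
    genetic = [0.0, 1.0, 1.5, 3.5]
    print(extend_interval(2500, 2500, 0.5, positions, genetic, 5000))  # (1750, 3125)
    print(extend_interval(1500, 3500, 2.0, positions, genetic, 5000))  # (1, 5000)
```

The extension is asymmetric in base pairs whenever the local recombination rate differs on the two sides, which is the point of working in cM.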

Perl Modules

FindSegSites.pm Example Script: segSites.pl

This module takes in a file and finds the segregating sites therein. The return value is a hash containing, amongst other things, a reference to a 2D array of segregating sites. The file may also contain a reference sequence and outgroups. Production-level example scripts: Wallop.
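
The kind of result FindSegSites.pm returns can be sketched in Python as a dictionary. It holds the segregating-site positions and a 2D matrix (one row per individual, one column per site). This is an illustration only. The key names are assumptions, and so is the rule that the reference and outgroups are excluded when deciding whether a column segregates. The treatment of N, - and ? as missing is likewise assumed, not the module's documented behaviour.

```python
"""Sketch: find segregating sites among ingroup sequences (hypothetical helper)."""

MISSING = set("N-?")


def find_seg_sites(records, reference="ref", outgroups=()):
    """records -- {name: aligned sequence}, all the same length.

    A column is segregating if the ingroup (everything except the reference
    and the outgroups) shows more than one non-missing base there.
    """
    ingroup = [n for n in records if n != reference and n not in outgroups]
    length = len(next(iter(records.values())))
    positions = [i for i in range(length)
                 if len({records[n][i].upper() for n in ingroup} - MISSING) > 1]
    matrix = [[records[n][i] for i in positions] for n in ingroup]
    return {"individuals": ingroup, "positions": positions, "matrix": matrix}


if __name__ == "__main__":
    aln = {"ref": "ACGT", "out": "TCGA",
           "i1": "ACGT", "i2": "ACTT", "i3": "NCTT"}
    print(find_seg_sites(aln, outgroups=("out",)))
    # positions: [2]; matrix: [['G'], ['T'], ['T']]
```

IUPAC codes are simply counted as distinct characters here, so a heterozygote (say, an R among As) makes a column segregating.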


This module takes in a reference sequence (which is by definition a subsequence of the genome), the name of the chromosome, and an array of indexes in the reference sequence, and it returns the corresponding positions of those indexes on the specified chromosome. In general we use this to remap SNPs from an alignment (of, say, a gene) so that we can see the genomic coordinates of the SNPs (and so you can compare your data with, say, HapMap). Note that Ns in the reference are treated as delimiters; i.e., if you have two disconnected introns, you can pass the reference of the first intron, a few Ns, and the second intron (all concatenated), and the module will remap as appropriate. Recently I also added support for genes on the negative strand, but that code is still a little fresh. Production-level example scripts: Wallop.
Note that this module is a pale imitation of Jim Kent's BLAT.
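
The remapping can be approximated in Python by splitting the reference on runs of N. Each piece is then located in the chromosome by exact string search, in order. This sketch tolerates no mismatches (unlike BLAT), handles the plus strand only, and returns 0-based coordinates. The function name and those conventions are assumptions.

```python
"""Sketch: remap reference indexes to chromosome coordinates (hypothetical helper)."""
import re


def remap(reference, chromosome, indexes):
    """Map 0-based indexes in `reference` to 0-based positions in `chromosome`.

    Runs of N split the reference into segments (e.g. two introns); each
    segment must occur exactly, and in order, on the plus strand.
    Indexes that fall on an N map to None.
    """
    offsets = {}
    search_from = 0
    for m in re.finditer(r"[^Nn]+", reference):
        hit = chromosome.find(m.group(), search_from)
        if hit < 0:
            raise ValueError(f"segment at reference index {m.start()} not found")
        for i in range(m.start(), m.end()):
            offsets[i] = hit + (i - m.start())
        search_from = hit + len(m.group())
    return [offsets.get(i) for i in indexes]


if __name__ == "__main__":
    chrom = "TTTTACGTAAAAGGGCCCTTTT"
    ref = "ACGT" + "NNN" + "GGGCCC"        # two segments joined by Ns
    print(remap(ref, chrom, [0, 3, 5, 7, 12]))   # [4, 7, None, 12, 17]
```

Searching from the end of the previous hit keeps the segments in their original order along the chromosome.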


This module looks at a pair of sites and tells you if there's a 4-gamete violation. Although this is a little unexciting, it spruces things up by telling you whether there must be a 4-gamete violation (i.e., the pair fails the 4-gamete test) even when the data contain IUPAC ambiguity codes (and yes, the ambiguity codes can drive a 4-gamete violation). As if it couldn't possibly get more exciting, it will also give you the minimal set of individuals you can remove such that the pair of sites is no longer a 4-gamete violation.
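
The logic can be sketched in Python for unphased diploid genotypes at two biallelic sites. An individual heterozygous at both sites can be phased two ways, so a violation is forced only if every combination of phasings yields all four gametes. The minimal removal set is found here by brute force over subsets. That approach is exponential and only practical for small samples. This is an illustration with hypothetical names, not the module's own algorithm.

```python
"""Sketch: forced four-gamete violations with IUPAC-coded diploid genotypes."""
from itertools import combinations, product

# IUPAC code -> the two alleles it represents (homozygotes repeat the base).
IUPAC = {"A": "AA", "C": "CC", "G": "GG", "T": "TT",
         "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def phasings(code1, code2):
    """All possible sets of gametes one individual contributes at two sites."""
    a, b = IUPAC[code1], IUPAC[code2]
    options = [{a[0] + b[0], a[1] + b[1]}]
    if a[0] != a[1] and b[0] != b[1]:          # double heterozygote: 2 phasings
        options.append({a[0] + b[1], a[1] + b[0]})
    return options


def must_violate(genotypes):
    """genotypes -- list of (code at site 1, code at site 2), one per individual.
    True if every phasing shows all four gametes (biallelic sites assumed)."""
    for combo in product(*(phasings(g1, g2) for g1, g2 in genotypes)):
        if len(set().union(*combo)) < 4:
            return False                        # some phasing avoids a violation
    return True


def minimal_removal(genotypes):
    """Smallest set of individual indexes whose removal avoids a forced
    violation; subsets are tried smallest first, and k == n always succeeds."""
    n = len(genotypes)
    for k in range(n + 1):
        for drop in combinations(range(n), k):
            kept = [g for i, g in enumerate(genotypes) if i not in drop]
            if not must_violate(kept):
                return set(drop)


if __name__ == "__main__":
    # The W (A/T) heterozygote supplies gametes AC and TC on its own.
    sample = [("W", "C"), ("A", "G"), ("T", "G")]
    print(must_violate(sample))      # True
    print(minimal_removal(sample))   # {0}
```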

ParseAceWithReference.pm (see the top of the module for a description). This is used to parse ace/phd files from consed.
AceFileQuality.pm (see the top of the module for a description). This is used to give quality information about ace/phd files from consed.
AceConstants.pm (see the top of the module for a description). This is a "helper" module used to get/set constants used by the two consed modules above.
For scripts that use these ace modules see Consed2Pasta and qualityThis.
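
For readers unfamiliar with ace files, here is a minimal Python sketch that pulls each contig's consensus and consensus qualities out of one. It reads only two records. The CO record gives the contig name, followed by padded consensus lines up to a blank line. The BQ record that follows gives qualities for the unpadded bases. Pads ('*') get a quality of None. Reads, tags, and phd files are ignored. The sketch is not how ParseAceWithReference.pm or AceFileQuality.pm work internally, and the function names are hypothetical.

```python
"""Sketch: read contig consensus and BQ qualities from an ace file."""


def _block(lines, i):
    """Collect stripped non-blank lines from index i; return (block, next index)."""
    block = []
    while i < len(lines) and lines[i].strip():
        block.append(lines[i].strip())
        i += 1
    return block, i


def parse_ace_consensus(text):
    """Return {contig: (padded consensus, qualities)}; pads get None."""
    lines = text.splitlines()
    seqs, quals, current, i = {}, {}, None, 0
    while i < len(lines):
        fields = lines[i].split()
        i += 1
        if fields[:1] == ["CO"]:
            current = fields[1]
            block, i = _block(lines, i)
            seqs[current] = "".join(block)
        elif fields[:1] == ["BQ"] and current is not None:
            block, i = _block(lines, i)
            quals[current] = [int(q) for row in block for q in row.split()]
    result = {}
    for name, seq in seqs.items():
        q = iter(quals.get(name, []))
        # BQ values cover unpadded bases only, so skip '*' when consuming them.
        result[name] = (seq, [None if base == "*" else next(q, None) for base in seq])
    return result


if __name__ == "__main__":
    ace = "\n".join(["AS 1 1", "", "CO Contig1 6 1 1 U", "AC*G", "TA", "",
                     "BQ", " 30 31 32", " 33 34", "", "AF read1 U 1"])
    print(parse_ace_consensus(ace))
    # {'Contig1': ('AC*GTA', [30, 31, None, 32, 33, 34])}
```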

Note that my Perl modules use the mutator method, which means that they rely on package-level global variables (set to defaults that follow our in-house formatting). If you want to change them, use the getter/setter methods. For example, our fasta files have a reference sequence, which is a subsection of the human genome. No matter what, for all loci, the reference sequence is called 'ref', and our modules use that as their default value. If you want to change the reference to 'foo', you'll have to say something like: SegSites::setReference('foo');
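
The same default-plus-setter convention looks like this as a Python sketch. The names set_reference, get_reference and split_reference are hypothetical stand-ins for SegSites::setReference and friends. Because the setting lives in module-level state, changing it affects every caller in the same process. That is the main trade-off of this style.

```python
"""Sketch: module-level defaults with getter/setter functions."""

_reference_name = "ref"   # in-house default: the reference record is called 'ref'


def set_reference(name):
    """Change the name used to recognise the reference sequence."""
    global _reference_name
    _reference_name = name


def get_reference():
    """Return the name currently used for the reference sequence."""
    return _reference_name


def split_reference(records):
    """Return (reference sequence or None, {other names: sequences})."""
    ref = records.get(_reference_name)
    others = {n: s for n, s in records.items() if n != _reference_name}
    return ref, others


if __name__ == "__main__":
    aln = {"foo": "ACGT", "ind1": "ACTT"}
    print(split_reference(aln))   # (None, {'foo': 'ACGT', 'ind1': 'ACTT'})
    set_reference("foo")
    print(split_reference(aln))   # ('ACGT', {'ind1': 'ACTT'})
```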

More from the Hammer lab coming soon!

Copyright © 2006 The University of Arizona