High-throughput genotyping software for short reads
Make sure you have Perl and the GD module installed.

(1) Build the pseudo-reference sequences of the RILs' parents from SNPs.

If you already have high-quality pseudo-molecule sequences for both parents, please go to step (2).

Run the Perl script to build the pseudo-reference:

perl PseudoMaker.pl snp_file reference.fa parent_genome

snp_file: A file listing all the high-quality SNPs identified between one parent and the high-quality reference pseudo-molecule sequence.
The file format should be as follows:
chromosome01	325	T
chromosome01	335	T
chromosome01	362	A
chromosome01	411	G
chromosome01	482	C
chromosome01	579	A
chromosome01	872	T
chromosome01	1032	T
chromosome01	1141	G
chromosome01	1350	A
...
One SNP per line.
The three columns indicate the chromosome, the position, and the SNP base, respectively.

reference.fa: The high-quality pseudo-molecule sequences of the fully sequenced reference genome, in FASTA format, normally one sequence per chromosome.

parent_genome: The name of the parent.

The output pseudo-reference file will be parent_genome.fa.
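
The substitution described above can be illustrated with a minimal Python sketch. It is not the code of PseudoMaker.pl; it only shows the idea of writing each SNP base into a copy of the reference. The function names are hypothetical, and the positions in snp_file are assumed to be 1-based.

```python
def parse_snp_lines(lines):
    """Parse 'chromosome<TAB>position<TAB>base' lines, skipping anything else."""
    for line in lines:
        fields = line.split()
        if len(fields) == 3:
            yield fields[0], int(fields[1]), fields[2]


def apply_snps(reference, snps):
    """Return a copy of `reference` with every SNP base substituted in.

    reference: dict mapping chromosome name -> sequence string
    snps: iterable of (chromosome, position, base)
    """
    pseudo = {name: list(seq) for name, seq in reference.items()}
    for chrom, pos, base in snps:
        pseudo[chrom][pos - 1] = base  # assumption: positions are 1-based
    return {name: "".join(bases) for name, bases in pseudo.items()}


# Toy data, not taken from a real genome.
reference = {"chromosome01": "ACGTACGTAC"}
snp_lines = ["chromosome01\t2\tT", "chromosome01\t5\tG"]
pseudo = apply_snps(reference, parse_snp_lines(snp_lines))
print(pseudo["chromosome01"])
```

The reference itself is left unchanged, so the same reference can be reused to build the pseudo-reference of the second parent from its own SNP file.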


(2) Convert Solexa fastq files to Sanger fastq files.
After the Solexa base-calling pipeline has processed paired-end sequences, you will get two fastq files for each lane,
for example s_1_1_sequence.txt and s_1_2_sequence.txt.
You may rename the files while converting the file format.

To convert Solexa fastq to Sanger fastq:

For versions before Solexa_Pipeline1.3:
Download MAQ from the MAQ page:
http://maq.sourceforge.net/

maq sol2sanger s_1_1_sequence.txt Lane1_1.fastq
maq sol2sanger s_1_2_sequence.txt Lane1_2.fastq

Here, "Lane1" is the lane used for sequencing the RILs.

For Solexa_Pipeline1.3 or later:
Download slx2fastq from ftp://ftp.sanger.ac.uk/pub/zn1/solexa/slx2fastq/

./slx2fastq s_1_1_sequence.txt Lane1_1.fastq
./slx2fastq s_1_2_sequence.txt Lane1_2.fastq
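
The reason two different converters are needed is that the quality encoding changed with Pipeline 1.3. The Python sketch below is not the code of maq or slx2fastq; it only illustrates the standard conversions, assuming that files from before Pipeline 1.3 store Solexa scores with ASCII offset 64 and that files from Pipeline 1.3 or later store Phred scores with ASCII offset 64. Sanger fastq stores Phred scores with ASCII offset 33. The function names are hypothetical.

```python
import math


def solexa_char_to_sanger(ch):
    """Convert one pre-1.3 quality character (Solexa score, offset 64)."""
    q_solexa = ord(ch) - 64
    # Standard Solexa-to-Phred formula, rounded to the nearest integer.
    q_phred = round(10 * math.log10(10 ** (q_solexa / 10) + 1))
    return chr(q_phred + 33)


def illumina13_char_to_sanger(ch):
    """Convert one Pipeline 1.3+ quality character (Phred score, offset 64)."""
    return chr(ord(ch) - 64 + 33)


def convert_quality(quality, converter):
    """Apply a per-character converter to a whole quality string."""
    return "".join(converter(ch) for ch in quality)


print(convert_quality("h@", solexa_char_to_sanger))
print(convert_quality("h@", illumina13_char_to_sanger))
```

The two encodings agree for high scores but differ for low ones, which is why the wrong converter silently distorts low-quality bases rather than failing.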

(3) Sort and split the fastq files for each RIL.
With a 3-bp index, you can run 16 RIL samples per Solexa lane.

perl Split16.pl Lane1_1.fastq
perl Split16.pl Lane1_2.fastq

You will get files such as Lane1_1.AAT.fastq, Lane1_1.AAC.fastq...,
and Lane1_2.AAT.fastq, Lane1_2.AAC.fastq...

Here, "AAT" is the tag for a particular RIL.


With a 4-bp index, you can run 64 RIL samples per Solexa lane.

perl Split64.pl Lane1_1.fastq
perl Split64.pl Lane1_2.fastq

You will get files such as Lane1_1.AAAT.fastq, Lane1_1.AAAC.fastq...,
and Lane1_2.AAAT.fastq, Lane1_2.AAAC.fastq...

Here, "AAAT" is the tag for a particular RIL.

Then you must remove the unpaired sequences for each RIL (index).

perl Get_Paired.pl Lane1 AAT

This gives you the sequence files for both ends of each index, such as Lane1_1.AAT.PE.fastq and Lane1_2.AAT.PE.fastq.
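
What "removing unpaired sequences" means can be illustrated with a minimal Python sketch. It is not the code of Get_Paired.pl. It assumes 4-line fastq records and assumes that the two mates of a pair share a read name ending in /1 and /2; the function names are hypothetical.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) from 4-line fastq records."""
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i][1:], lines[i + 1], lines[i + 3]


def mate_key(name):
    """Strip a trailing /1 or /2 so both mates share one key (assumed naming)."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def keep_paired(end1, end2):
    """Keep only the records whose mate is present in the other end."""
    shared = {mate_key(r[0]) for r in end1} & {mate_key(r[0]) for r in end2}
    paired1 = [r for r in end1 if mate_key(r[0]) in shared]
    paired2 = [r for r in end2 if mate_key(r[0]) in shared]
    return paired1, paired2


# Toy reads: r1 has lost its second end, r3 has lost its first end.
end1 = list(read_fastq(["@r1/1", "ACGT", "+", "IIII", "@r2/1", "TTGA", "+", "IIII"]))
end2 = list(read_fastq(["@r2/2", "CCAT", "+", "IIII", "@r3/2", "GGTA", "+", "IIII"]))
paired1, paired2 = keep_paired(end1, end2)
print([r[0] for r in paired1], [r[0] for r in paired2])
```

Reads can lose their mate in the splitting step, for example when the index of one end cannot be recognized, so this filtering is done per index after the split.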


(4.1) The genotype assignment pipeline for SSAHA2 alignment results.
The fastq sequences should be aligned to both parent genomes with SSAHA2.
The following commands show how to align Solexa paired-end (PE) sequences to both parent genomes using SSAHA2.

./ssaha2-2.3_x86_64 -rtype solexa -mthresh 30 -skip 2 -diff 0 -depth 500 -align 0 -pair 50,900 ./Parent1_genome.fa Lane1_1.AAT.PE.fastq Lane1_2.AAT.PE.fastq > Lane1.AAT.PE.fastq.p1
./ssaha2-2.3_x86_64 -rtype solexa -mthresh 30 -skip 2 -diff 0 -depth 500 -align 0 -pair 50,900 ./Parent2_genome.fa Lane1_1.AAT.PE.fastq Lane1_2.AAT.PE.fastq > Lane1.AAT.PE.fastq.p2

Make sure SSAHA2 (v2.3 or later) is installed on your computer.
To get the SSAHA2 package, download it from the SSAHA2 page:
http://www.sanger.ac.uk/Software/analysis/SSAHA2/

And then run the genotype assignment pipeline:

perl Ssaha2rlt.pl Lane1 AAT 36

Here, "36" is the length of your reads.

(4.2) The genotype assignment pipeline for MAQ alignment results.
The fastq sequences should be aligned to both parent genomes with MAQ.

First, both parent sequences need to be converted into MAQ *.bfa files with the following commands:
./maq fasta2bfa Parent1_genome.fa Parent1_genome.bfa
./maq fasta2bfa Parent2_genome.fa Parent2_genome.bfa

Second, the RILs' fastq sequences need to be converted into MAQ *.bfq files with the following commands:
./maq fastq2bfq Lane1_1.AAT.PE.fastq Lane1_1.AAT.PE.bfq
./maq fastq2bfq Lane1_2.AAT.PE.fastq Lane1_2.AAT.PE.bfq

Then run the MAQ alignment:
./maq match -a 900 Lane1.AAT.PE.p1.map Parent1_genome.bfa Lane1_1.AAT.PE.bfq Lane1_2.AAT.PE.bfq
./maq match -a 900 Lane1.AAT.PE.p2.map Parent2_genome.bfa Lane1_1.AAT.PE.bfq Lane1_2.AAT.PE.bfq

After that, the *.map files should be converted to plain text files with the same file names as the SSAHA2 results:
./maq mapview Lane1.AAT.PE.p1.map >Lane1.AAT.PE.fastq.p1
./maq mapview Lane1.AAT.PE.p2.map >Lane1.AAT.PE.fastq.p2

And then run the genotype assignment pipeline:

perl Maq2rlt.pl Lane1 AAT 36

Here, "36" is the length of your reads.

(4.3) The genotype assignment pipeline for SOAPaligner alignment results.
To run SOAPaligner, you first need to build index files for both parent genomes:
./2bwt-builder Parent1_genome.fa
./2bwt-builder Parent2_genome.fa

Then align the reads against the formatted index files:
./soap -a Lane1_1.AAT.PE.fastq -b Lane1_2.AAT.PE.fastq -D Parent1_genome.fa.index -o Lane1.AAT.PE.fastq.p1.PE -2 Lane1.AAT.PE.fastq.p1.SE -m 50 -x 900
./soap -a Lane1_1.AAT.PE.fastq -b Lane1_2.AAT.PE.fastq -D Parent2_genome.fa.index -o Lane1.AAT.PE.fastq.p2.PE -2 Lane1.AAT.PE.fastq.p2.SE -m 50 -x 900

Then combine the paired-end (PE) and single-end (SE) results:
cat Lane1.AAT.PE.fastq.p1.PE Lane1.AAT.PE.fastq.p1.SE > Lane1.AAT.PE.fastq.p1
cat Lane1.AAT.PE.fastq.p2.PE Lane1.AAT.PE.fastq.p2.SE > Lane1.AAT.PE.fastq.p2

And then run the genotype assignment pipeline:
perl Soap2rlt.pl Lane1 AAT 36

Here, "36" is the length of your reads.

(5) The genotype calling pipeline to get the recombination map.

perl Seq2Bin.pl Lane1.AAT.PE.fastq.rlt jap_v4_length_list

Here, the second argument (jap_v4_length_list in this example) is the genome length list, which gives the length of each chromosome of the organism.
The file format should be as follows:
chromosome01	45064769
chromosome02	36823111
chromosome03	37257345
chromosome04	35863200
chromosome05	30039014
chromosome06	32124789
chromosome07	30357780
chromosome08	28530027
chromosome09	23843360
chromosome10	23661561
chromosome11	30828668
chromosome12	27757321
...

Lane1.AAT.PE.fastq.rlt is the input file produced in step (4).

There are three output files for each RIL.
Lane1.AAT.bin is the bin file, with one bin per line.
The six columns indicate the chromosome, bin start, bin end, parent (1 or 2), last SNP's read, and bin length.

Lane1.AAT.combine.png shows a combined figure for each RIL.
Each grey background bar indicates one chromosome, and there are three colored horizontal bars on each background bar.
The first, continuous bar shows the bin map. Blue, red and yellow represent regions from parent1, regions from parent2, and heterozygous regions, respectively.
The second and third, discontinuous bars show the distribution of SNPs along the chromosome: blue for SNPs from parent1 and red for SNPs from parent2.

Lane1.AAT.PE.fastq.rlt.win15.edge is a temporary file used for genotyping.


(6) Build the bin map and the input files for linkage mapping.
First, make a list of the bin files of all RILs:
ls *.bin > rils_file

Second, create a file that includes all traits of each RIL.
The file format is as follows:

RILs	Trait1	Trait2	Trait3
RIL_001	94	94	94
RIL_002	124.8	124.8	124.8
RIL_003	103	103	103
RIL_004	99.9	99.9	99.9
RIL_005	121.3	121.3	121.3
RIL_006	101.14	101.14	101.14
RIL_007	124	124	124
RIL_008	.	.	.
RIL_009	153.6	153.6	153.6
RIL_010	161.9	161.9	161.9
RIL_011	136.6	136.6	136.6
RIL_012	169.2	169.2	169.2
...

Here, the first column gives the RILs' names. The rows must be in the same order as the file names in <rils_file>,
although the RIL names themselves can be chosen freely.
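
Because the trait rows are matched to the bin files by position and not by name, it can help to check the pairing before running the next step. The Python sketch below is only an illustration and is not part of the package. It assumes a tab- or space-delimited trait file and assumes that "." marks a missing trait value, as the RIL_008 row above suggests; the function name and the bin file names are hypothetical.

```python
def read_traits(lines):
    """Parse the trait table into (trait_names, {ril_name: {trait: value}})."""
    rows = [line.split() for line in lines if line.strip()]
    trait_names = rows[0][1:]
    traits = {}
    for fields in rows[1:]:
        # Assumption: "." stands for a missing value.
        values = [None if v == "." else float(v) for v in fields[1:]]
        traits[fields[0]] = dict(zip(trait_names, values))
    return trait_names, traits


trait_lines = [
    "RILs\tTrait1\tTrait2",
    "RIL_007\t124\t124",
    "RIL_008\t.\t.",
]
trait_names, traits = read_traits(trait_lines)

# Hypothetical bin file list, in the order it would appear in rils_file.
bin_files = ["Lane1.AAC.bin", "Lane1.AAT.bin"]
pairing = list(zip(bin_files, traits))
print(pairing)
```

Printing the pairing makes it easy to see which bin file each trait row will be assigned to. Note that ls writes file names in sorted order, so the trait rows have to follow that order.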

Then run the following Perl script to get the result:

perl Bin2MCD.pl rils_file rils_traits

rils_file.map is the bin map file. Each line shows the genotypes of one bin.
The first and second columns indicate the chromosome and the physical position (1.0 = 100 kb).
Each of the remaining columns represents one RIL. "A" is from parent1, "B" is from parent2, and "H" is heterozygous.
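
As an illustration of how a line of the bin map can be read, the Python sketch below converts the position to base pairs and counts the genotypes. It is not part of the package. The example line is made up, the columns are assumed to be separated by whitespace, and the function name is hypothetical.

```python
from collections import Counter


def parse_map_line(line):
    """Split one bin map line into chromosome, position in bp, and genotype counts."""
    fields = line.split()
    chrom = fields[0]
    position_bp = round(float(fields[1]) * 100_000)  # 1.0 = 100 kb
    counts = Counter(fields[2:])  # one genotype letter (A, B or H) per RIL
    return chrom, position_bp, counts


# Made-up line describing one bin across four RILs.
chrom, position_bp, counts = parse_map_line("chromosome01\t3.5\tA\tA\tB\tH")
print(chrom, position_bp, dict(counts))
```

A strongly skewed ratio of "A" to "B" in a bin, or an unexpectedly high number of "H" calls, is easy to spot from such counts.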

rils_file.map.mcd is a standard input file for linkage mapping software such as Windows QTL Cartographer.

The other three files, rils_file.bin.mark, rils_file.edge.comb and rils_file.edge.sort, are temporary files used for bin map construction.



Thank you for your interest in our high-throughput genotyping software.


If you have questions about this software, you can contact us via: zqiang@ncgr.ac.cn