
Welcome to Ancestry Server
- 01
The HAP format
--------------------
HAP is a text file format. It contains several meta-information lines, each starting with ##, followed by one line containing only the # sign, one header line, and the data lines. Each data line contains the information for one SNP, including the chromosome ID, the position of the SNP on the chromosome, the rs ID, and the haplotypes of each individual.
1. An example of the HAP format.
    ##fileformat=HAPv1.0
    ##reference=hg19
    #
    CHROM	POS	ID	NA00001	NA00002
    1	126113	.	A:A	A:A
    1	535131	.	G:T	G:T
    1	567239	.	C:C	C:C
    1	570254	.	G:A	G:A
    1	592368	.	G:A	G:A
2. Meta-information lines. Meta-information lines begin with the ## string. A single ‘fileformat’ line is required, and it must be the first line in the data file. For example, for HAP version 1.0, this line should read:
    ##fileformat=HAPv1.0
The other meta-information lines are not required for AncestryHub to function, but we strongly encourage users to include them to provide any informative messages.
3. Header line syntax. The header line names the 3 mandatory columns (CHROM, POS, ID) and additional columns of sample IDs. Duplicate sample IDs are not allowed. The header line is tab-delimited.
4. Data lines. All data lines are tab-delimited.
1) CHROM: The ID of the chromosome where a SNP is located. All entries for a specific CHROM should form a contiguous block within the HAP file. This information is required.
2) POS: The position of a SNP along a chromosome, given as an integer. Positions must be sorted numerically in increasing order within each chromosome. This information is required. Please note that AncestryHub 1.0 only supports GRCh37 (hg19) coordinates.
3) ID: The rs ID of each SNP. Please note that no identifier should be present in more than one data line. If there is no identifier available, a dot (‘.’) can be put here to represent a missing value.
4) Haplotypes: An individual's two haplotypes should be put in one column named with the corresponding sample ID, separated by ‘:’. For diploid calls, the value should be A:A, A:G, etc. For haploid calls, e.g. on the Y chromosome, the male non-pseudoautosomal X, or the mitochondrial chromosome, only one allele should be given. Triploid calls are not supported by this version of AncestryHub. Where an allele call cannot be made at a given locus, ‘.’ should be specified for each missing allele: for example, ‘.:.’ for a diploid genotype and ‘.’ for a haploid genotype.
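The rules above can be sketched as a small validating parser. This is only an illustration, not the check that AncestryHub itself runs: the function name `parse_hap`, the `EXAMPLE` string, and the assumption that every allele is a single base (A, C, G, T) or a dot are all hypothetical. The sketch also tolerates a header written as `#CHROM ...` on one line.

```python
def parse_hap(text):
    """Parse HAP text into (meta, samples, records); raise ValueError on a rule violation."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("##fileformat="):
        raise ValueError("the first line must be the ##fileformat line")

    meta, body = {}, []
    for ln in lines:
        if ln.startswith("##"):
            key, _, value = ln[2:].partition("=")
            meta[key] = value
        elif ln.strip() != "#":          # skip the line holding only the # sign
            body.append(ln)
    if not body:
        raise ValueError("missing header line")

    # Assumption: a leading '#' glued to the header (e.g. '#CHROM') is tolerated.
    header = body[0].lstrip("#").strip().split("\t")
    if header[:3] != ["CHROM", "POS", "ID"]:
        raise ValueError("header must start with CHROM, POS, ID")
    samples = header[3:]
    if len(set(samples)) != len(samples):
        raise ValueError("duplicate sample IDs are not allowed")

    records, seen_ids, finished = [], set(), set()
    prev_chrom, prev_pos = None, None
    for ln in body[1:]:
        fields = ln.split("\t")
        if len(fields) != len(header):
            raise ValueError("wrong number of columns: " + ln)
        chrom, pos, snp_id = fields[0], int(fields[1]), fields[2]
        if chrom != prev_chrom:
            if prev_chrom is not None:
                finished.add(prev_chrom)
            if chrom in finished:
                raise ValueError("entries for chromosome %s are not contiguous" % chrom)
        elif pos < prev_pos:
            raise ValueError("positions are not sorted on chromosome %s" % chrom)
        if snp_id != ".":
            if snp_id in seen_ids:
                raise ValueError("identifier %s appears more than once" % snp_id)
            seen_ids.add(snp_id)
        calls = []
        for call in fields[3:]:
            alleles = tuple(call.split(":"))
            # One allele for haploid calls, two for diploid; '.' marks a missing allele.
            if len(alleles) not in (1, 2) or any(a not in "ACGT." or len(a) != 1 for a in alleles):
                raise ValueError("unsupported haplotype call: " + call)
            calls.append(alleles)
        records.append((chrom, pos, snp_id, calls))
        prev_chrom, prev_pos = chrom, pos
    return meta, samples, records


# A shortened copy of the example file, built with explicit tab separators.
EXAMPLE = "\n".join([
    "##fileformat=HAPv1.0",
    "##reference=hg19",
    "#",
    "\t".join(["CHROM", "POS", "ID", "NA00001", "NA00002"]),
    "\t".join(["1", "126113", ".", "A:A", "A:A"]),
    "\t".join(["1", "535131", ".", "G:T", "G:T"]),
])

meta, samples, records = parse_hap(EXAMPLE)
print(meta["fileformat"], samples, len(records))
```

Equal positions on the same chromosome are accepted here and left to the duplicate-site check; that choice is an assumption of the sketch.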
- 03
Create a sorted, compressed *.vcf.gz file using VCFtools and tabix (which includes bgzip):
vcf-sort mystudy_chr1.vcf | bgzip -c > mystudy_chr1.vcf.gz
- 04
If you need help creating files in this format, please click here.
- 05
1) File size. Each file must be no larger than 1 GB.
2) File format. We will check the following:
HAP format check.
Quality control statistics: duplicate sites, SNPs removed, non-SNP sites, monomorphic sites, MAF check.
Check of the number of SNPs for each chromosome or specific region (2000 SNPs are required for each chromosome or specific region).
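The quality control statistics listed above could be computed along these lines. This is a rough sketch under stated assumptions, not the server's actual pipeline: the function name `qc_summary`, the record layout, the order of the filters, and the idea that only SNPs surviving the filters count toward the 2000-SNP requirement are all hypothetical. MAF is computed as the frequency of the rarest observed allele, which matches the usual definition for biallelic sites.

```python
from collections import Counter


def qc_summary(records, min_snps=2000):
    """records: iterable of (chrom, pos, snp_id, calls), with calls such as ["A:A", "G:T"]."""
    seen_sites = set()
    per_chrom = Counter()
    summary = {"duplicate": 0, "non_snp": 0, "monomorphic": 0, "maf": {}}
    for chrom, pos, snp_id, calls in records:
        per_chrom[chrom] += 0            # register the chromosome even if every site is dropped
        site = (chrom, pos)
        if site in seen_sites:           # duplicate site: same chromosome and position
            summary["duplicate"] += 1
            continue
        seen_sites.add(site)
        # Count every non-missing allele across all individuals.
        alleles = Counter(a for call in calls for a in call.split(":") if a != ".")
        if any(a not in ("A", "C", "G", "T") for a in alleles):
            summary["non_snp"] += 1      # assumption: anything but a single base is a non-SNP site
            continue
        if len(alleles) < 2:
            summary["monomorphic"] += 1
            continue
        per_chrom[chrom] += 1
        summary["maf"][site] = min(alleles.values()) / sum(alleles.values())
    summary["snps_per_chrom"] = dict(per_chrom)
    summary["too_few_snps"] = sorted(c for c, n in per_chrom.items() if n < min_snps)
    return summary


demo = [
    ("1", 100, ".", ["A:A", "A:A"]),     # monomorphic
    ("1", 200, ".", ["G:T", "G:G"]),     # kept
    ("1", 200, ".", ["G:T", "G:G"]),     # duplicate site
    ("1", 300, ".", ["AT:A", "A:A"]),    # non-SNP site
]
print(qc_summary(demo, min_snps=2))
```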
- 06
1) File size. Each file must be no larger than 1 GB.
2) File format. We will check the following:
VCF check: validity, plus statistics such as the number of samples, chromosomes, SNPs, and chunks, phased / unphased status, and reference build.
Quality control statistics: duplicate sites, SNPs removed, non-SNP sites, monomorphic sites, MAF check.
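As an illustration of the kind of statistics mentioned above, the following sketch summarizes uncompressed VCF text with plain string handling. It is not the server's validator: the function name `vcf_summary` and the returned keys are hypothetical, chunk counting is left out, and the sketch assumes the standard VCF layout of eight fixed columns plus FORMAT before the sample columns, with GT as the first FORMAT field.

```python
def vcf_summary(text):
    """Collect simple statistics from uncompressed VCF text."""
    stats = {"samples": 0, "chromosomes": [], "snps": 0,
             "phased": 0, "unphased": 0, "reference": None}
    for line in text.splitlines():
        if line.startswith("##reference="):
            stats["reference"] = line.split("=", 1)[1]
        elif line.startswith("#CHROM"):
            # Sample columns follow CHROM..INFO (8 columns) and FORMAT.
            stats["samples"] = max(0, len(line.split("\t")) - 9)
        elif line and not line.startswith("#"):
            fields = line.split("\t")
            chrom, ref, alts = fields[0], fields[3], fields[4].split(",")
            if chrom not in stats["chromosomes"]:
                stats["chromosomes"].append(chrom)
            # A SNP here means a single-base REF and single-base ALT alleles.
            if len(ref) == 1 and ref in "ACGT" and all(len(a) == 1 and a in "ACGT" for a in alts):
                stats["snps"] += 1
            for sample in fields[9:]:
                genotype = sample.split(":", 1)[0]
                if "|" in genotype:
                    stats["phased"] += 1
                elif "/" in genotype:
                    stats["unphased"] += 1
    return stats


VCF_DEMO = "\n".join([
    "##fileformat=VCFv4.1",
    "##reference=hg19",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "NA00001", "NA00002"]),
    "\t".join(["1", "535131", ".", "G", "T", ".", "PASS", ".", "GT", "0|1", "0|1"]),
    "\t".join(["1", "567239", ".", "C", "CA", ".", "PASS", ".", "GT", "0/0", "0/1"]),
])
print(vcf_summary(VCF_DEMO))
```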
- 08
Your data will be transferred to our server, where a wide array of security measures is enforced:
The complete interaction with the server is secured with HTTPS.
All results are encrypted with a one-time password, so only you can access them.
- 11
Please cite:
Ma Y, Zhao J, Wong JS, Ma L, Li W, Fu G, Xu W, Zhang K, Kittles RA, Li Y, Song Q. Accurate inference of local phased ancestry of modern admixed populations. Scientific Reports. 2014 Jul 23;4:5800. PMID:25052506