genoCNA {genoCN}    R Documentation

Copy Number Aberration

Description

Extract genotype and copy number information for copy number aberrations, which are often observed in tumor tissues.

Usage

genoCNA(snpNames, chr, pos, LRR, BAF, pBs, sampleID, 
  Para=NULL, fixPara=FALSE, cnv.only=NULL, estimate.pi.r=TRUE, 
  estimate.pi.b=TRUE, estimate.trans.m=TRUE, outputSeg = TRUE, 
  outputSNP=3, outputTag=sampleID, outputViterbi=TRUE, 
  Ds=c(1e10, 1e10, rep(1e8, 7)), pBs.alpha=0.001, contamination=TRUE, 
  normalGtp=NULL, geno.error=0.01, min.tp=1e-4, max.diff=0.1, 
  distThreshold=1e6, transB=c(0.5,.05,.05,0.1,0.1,.05,.05,.05,.05), 
  epsilon=0.005, K=5, maxIt=200, seg.nSNP=3, traceIt=5)
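
The per-SNP arguments (snpNames, chr, pos, LRR, BAF, pBs) are parallel vectors that must be ordered by chromosome location. The following is a minimal sketch of that preparation step in base R. The data frame `snp`, its column names, and the sample name are hypothetical, and the final call is left commented out because it requires the genoCN package and real data.

```r
# Hypothetical SNP table; the column names are illustrative only.
snp <- data.frame(
  name = c("rs3", "rs1", "rs4", "rs2"),
  chr  = c(2, 1, 2, 1),
  pos  = c(500, 1200, 100, 300),
  LRR  = c(0.05, -0.40, 0.10, -0.35),
  BAF  = c(0.52, 0.00, 0.98, 1.00),
  pB   = c(0.30, 0.10, 0.80, 0.45),
  stringsAsFactors = FALSE
)

# Order by chromosome, then by position within each chromosome.
snp <- snp[order(snp$chr, snp$pos), ]

# Sanity check: chromosomes are non-decreasing after sorting.
stopifnot(!is.unsorted(snp$chr))

print(snp)

# With the package loaded, the ordered columns map onto the first
# arguments of the usage shown above (sample name is hypothetical):
# genoCNA(snp$name, snp$chr, snp$pos, snp$LRR, snp$BAF, snp$pB,
#         sampleID = "sample1")
```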

Arguments

snpNames a vector of SNP names. SNPs must be ordered by chromosome locations
chr chromosomes of all the SNPs specified in snpNames
pos positions of all the SNPs specified in snpNames
LRR Log R Ratio of all the SNPs specified in snpNames
BAF B Allele Frequency of all the SNPs specified in snpNames
pBs population frequency of the B allele for all the SNPs specified in snpNames
sampleID symbol/name of the studied sample. Only one sample is studied each time
Para a list of initial parameters for the HMM. If Para is NULL, the default initial parameters, init.Para.CNA, are used
fixPara if fixPara is TRUE, the parameters in Para are fixed, and are used directly to calculate posterior probabilities. It is not recommended to set fixPara as TRUE for CNA studies.
cnv.only a vector indicating the CNV-only probes, for which only the Log R Ratio is considered. If it is NULL, there are no CNV-only probes
estimate.pi.r whether to estimate pi.r (proportion of the uniform component for LRR). By default, estimate.pi.r=TRUE. If it is FALSE, the initial value of pi.r is kept fixed and used to estimate the other parameters
estimate.pi.b whether to estimate pi.b (proportion of the uniform component for BAF). By default, estimate.pi.b=TRUE. If it is FALSE, the initial value of pi.b is kept fixed and used to estimate the other parameters
estimate.trans.m whether to estimate the transition probability matrix. By default, estimate.trans.m=TRUE. If it is FALSE, the initial value of the transition probability matrix is kept fixed and used to estimate the other parameters
outputSeg whether to output the information of copy number altered segments
outputSNP if outputSNP is 0, do not output SNP specific information (genotype, copy number and the corresponding posterior probability); if outputSNP is 1, output the information of the SNPs that are within copy number altered regions; if outputSNP is 2, output the information of all the SNPs
outputTag the prefix of the output files. Output of copy number altered segments is written to the file outputTag_segment.txt, and output of SNP information is written to the file outputTag_SNP.txt
outputViterbi whether to output the copy number altered regions identified by the Viterbi algorithm. See Note
Ds parameters for the transition probabilities of the HMM. A vector of length N, where N is the number of states in the HMM
pBs.alpha pBs.alpha is the lower limit of population B allele frequency, and the upper limit is 1 - pBs.alpha
contamination whether tissue contamination is considered
normalGtp normalGtp is specified only if a paired tumor-normal SNP array is available. It is the normal tissue genotype for all the SNPs specified in snpNames, and can take only four values: -1, 0, 1, and 2. Values 0, 1, 2 correspond to the number of B alleles, and value -1 indicates that the normal genotype is missing. By default, it is NULL, in which case all the normal genotypes are set to missing (-1)
geno.error probability of genotyping error in normal tissue genotypes
min.tp the minimum transition probability
max.diff Due to the normalization procedure, the BAF may not be symmetric. Take the state (AAA, AAB, ABB, BBB) as an example. Ideally, the mean values of the normal components AAB and ABB, denoted by mu1 and mu2, respectively, should satisfy mu1 = 1-mu2 if BAF is symmetric. However, this may not hold due to normalization procedures. The difference between mu1 and (1-mu2) is restricted by this parameter max.diff.
distThreshold If the distance between adjacent probes is larger than distThreshold, the transition probability is reset to the default values in transB.
transB The default transition probability.
epsilon see explanation of K
K epsilon and K are used to specify the convergence criterion. The parameter estimation is considered converged if, for K consecutive updates, the maximum change of the parameter estimates between adjacent steps is smaller than epsilon
maxIt the maximum number of iterations of the EM algorithm to estimate parameters
seg.nSNP the minimum number of SNPs per segment
traceIt if traceIt is an integer n, the running time is printed every n iterations of the EM algorithm. If traceIt is 0 or negative, no tracing information is printed.
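
The convergence rule described for K and epsilon can be sketched as follows. This is an illustration of the stated criterion only, not the package's internal code; the function name `has_converged` and the layout of `history` (one numeric parameter vector per EM iteration) are assumptions made for the example.

```r
# TRUE once each of the last K updates changed every parameter by less
# than epsilon. `history` is a hypothetical list holding one numeric
# parameter vector per EM iteration.
has_converged <- function(history, epsilon = 0.005, K = 5) {
  n <- length(history)
  if (n < K + 1) return(FALSE)       # need K updates, i.e. K + 1 estimates
  recent <- history[(n - K):n]
  max_changes <- vapply(
    seq_len(K),
    function(i) max(abs(recent[[i + 1]] - recent[[i]])),
    numeric(1)
  )
  all(max_changes < epsilon)
}

# Toy sequence of estimates that approaches its limit geometrically.
history <- lapply(0:15, function(i) c(mu = 1 + 0.5^i, sd = 0.2))

# First iteration at which the criterion is satisfied.
converged_at <- which(vapply(
  seq_along(history),
  function(n) has_converged(history[seq_len(n)]),
  logical(1)
))[1]
print(converged_at)
```

Requiring K consecutive small updates, rather than a single one, guards against stopping on an update that happens to be small while the estimates are still moving.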

Value

results are written into output files
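
As a small sketch, the output file names described under outputTag can be built and checked from R. The tag value "sample1" is hypothetical, and the layout of the files themselves is not described on this page, so the example only locates them.

```r
outputTag <- "sample1"  # hypothetical; by default outputTag is sampleID

# File names as described under the outputTag argument.
seg_file <- paste0(outputTag, "_segment.txt")
snp_file <- paste0(outputTag, "_SNP.txt")

# Report whether each file exists in the current working directory.
for (f in c(seg_file, snp_file)) {
  cat(f, if (file.exists(f)) "found" else "not found", "\n")
}
```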

Note

Copy number altered regions are identified, by default, based on the SNP level copy number calls. A CNV region boundary is declared simply when adjacent SNPs have different copy numbers. An alternative approach is to use the Viterbi algorithm to output the "best path". Most of the time, the results based on the SNP level copy number calls are the same as the results from the Viterbi algorithm. For follow-up association studies, the SNP level information is more relevant if we examine the association SNP by SNP.
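
The default boundary rule, where a region ends when adjacent SNPs have different copy numbers, can be illustrated with a short base R sketch. The toy calls below are invented for the example, and treating copy number 2 as the only unaltered state is a simplifying assumption; the package works with HMM states, which this sketch does not model.

```r
# Toy SNP-level calls (hypothetical values), already ordered by location.
chr <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2)
pos <- c(10, 20, 30, 40, 50, 60, 10, 20, 30, 40)
cn  <- c(2, 2, 3, 3, 3, 2, 1, 1, 1, 2)

# A new run starts whenever the chromosome or the copy number changes.
n <- length(cn)
new_run <- c(TRUE, chr[-1] != chr[-n] | cn[-1] != cn[-n])
run_id  <- cumsum(new_run)

# Summarize each run as one segment.
segments <- data.frame(
  chr   = as.vector(tapply(chr, run_id, function(x) x[1])),
  start = as.vector(tapply(pos, run_id, min)),
  end   = as.vector(tapply(pos, run_id, max)),
  cn    = as.vector(tapply(cn, run_id, function(x) x[1])),
  nSNP  = as.vector(table(run_id))
)

# Keep altered segments (assumption: copy number 2 is unaltered)
# that contain at least seg.nSNP SNPs.
seg.nSNP <- 3
altered <- segments[segments$cn != 2 & segments$nSNP >= seg.nSNP, ]
print(altered)
```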

Author(s)

Wei Sun and Zhengzheng Tang


[Package genoCN version 1.07 Index]