Shared Flashcard Set

Details

Applied Bioinformatics: Quiz 1
University of Guelph BIOL*3300
139
Biology
Undergraduate 3
02/06/2017


Cards

Term
Absolute path
Definition
The kind of path printed by the pwd command: the full path to a directory, starting from the root directory. Starts with a / character.
Term
Accession number
Definition
A unique identifier for a sequence record in GenBank, used to search for nucleotide sequences.
Term
Alignment software
Definition

When you have a read, you must align it to a sequence in the reference genome. This is what alignment software does: it locates the read within the genome (3 billion bp in humans), which allows you to identify nucleotide differences. Examples include Bowtie, BWA, and STAR. The user can set criteria for alignments, allowing a certain number of gaps and mismatches between the read and the reference genome. Allowing more differences increases both the number of true positives and the number of false positives. Output is in the .SAM format. The possible outcomes for a read are:

1. The read does not match anywhere in the genome. It may be because the reference genome is incomplete, or the read sequence has diverged from the reference genome (indels and SNPs). The software determines how much difference is accepted for a match.

2. The read matches to one site very well.

3. The read matches up equally well to more than one site. This is caused by genome repetition.

4. The two ends of a read pair match up at different sites: the paired reads map to distant sites, even on separate chromosomes. This is caused by pieces of DNA breaking and moving around relative to the reference genome (chromosomal rearrangements).

Term
Allelic differences
Definition
Differences between alleles, either within protein-coding sequences or within sequences that regulate how genes are expressed. Contributes to genetic variation.
Term
ALX homeobox 1 (ALX1)
Definition
The gene which controls beak shape in Darwin's finches. Involved in craniofacial development. A homeobox gene.
Term
Ancestral trait
Definition
The trait from which the derived trait evolved; it is found in the common ancestor. In Darwin's finches, pointed beaks are the ancestral trait.
Term
APOA1
Definition
A gene at position 5.2 million bp in the chicken genome. Contains a SNP in close linkage with the yellow skinned trait.
Term
Arabidopsis
Definition
Its genome is 130 Mbp.
Term
Array
Definition
Stores zero or more scalars as elements which can be accessed with an integer index. You can add, remove, or replace elements or sets of elements. Denoted with the @ sigil.
Term
Bartter syndrome
Definition
A rare inherited condition, with which the case study GIT 264-1 was initially diagnosed. Caused by a defect in the kidneys' ability to reabsorb sodium. The kidneys also remove too much potassium from the body.
Term
Bases mapped to exome
Definition
In exome sequencing, the proportion of sequenced DNA that belongs to the exome. Although only exome sequences are on the capture chip, hybridization is not perfect, and off-target fragments get sequenced as well. Ideally it is 100%. In the GIT 264-1 sequencing, the proportion of bases mapped to the exome was 34.8%.
Term
BCDO2
Definition
The gene for yellow skin in chickens is suspected to be this gene. Encodes beta-carotene dioxygenase 2, which cleaves colourful carotenoids into colourless apocarotenoids. A SNP is found in chicken breeds with yellow skin, and not in chicken breeds with white skin; the allele is fixed. When the gene was sequenced in white- and yellow-skinned chicken breeds, researchers did not find nucleotide changes that would be expected to change protein function. Instead, the alleles differed at nucleotides important for transcriptional regulation, so the yellow allele must regulate BCDO2 differently than the white allele. The mRNA is weakly expressed in yellow-skinned chickens relative to white-skinned chickens, but only in skin tissues.
Term
Bioinformatics
Definition
Analysis of molecular data. Uses existing techniques to make sense of information in DNA sequences.
Term
Bipolar disorder
Definition
A common disease with environmental and genetic factors that contribute to it.
Term
Bowtie
Definition

http://bowtie-bio.sourceforge.net/index.shtml

An alignment software. Outputs some optional fields for each alignment in the .SAM format, depending on the type of alignment, including edit distance.

Term
BWA
Definition

http://bio-bwa.sourceforge.net/

An alignment software. Used in the study with Darwin's finches.

Term
Chi square (χ²)
Definition

Tests for an association between a trait and DNA sequences; here the trait must have two possible states. Uses a contingency table and gives a p value. The greater the deviation of the observed counts from those expected under the null hypothesis (no association), the larger the χ² statistic and the stronger the evidence for association.

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
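Below is a minimal Python sketch of the calculation for a 2 × 2 contingency table (allele × case/control status), using hypothetical counts. Expected counts are computed as (row total × column total) / grand total, which assumes no association. The p value relies on the fact that a χ² statistic with 1 degree of freedom is the square of a standard normal variable. That shortcut applies only to 2 × 2 tables, and no continuity correction is applied.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # expected count if trait and allele are independent (null hypothesis)
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_1df(stat):
    """Upper-tail p value for chi-square with 1 degree of freedom (2 x 2 tables only)."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical allele counts: rows = cases, controls; columns = allele A, allele a
table = [[60, 40],
         [40, 60]]
stat = chi_square(table)
print(stat, p_value_1df(stat))  # 8.0 and a p value of about 0.0047
```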

Term
C. elegans
Definition
A roundworm. Its genome is 97 Mbp.
Term
Chicken
Definition

Gallus gallus

Its ancestor is jungle fowl. It has 78 chromosomes, all of which  may have nucleotide differences with each other. Its genome is 1,200 Mbp.

Term
Chromosomal rearrangement
Definition
Includes inversions and translocations. FISH staining can reveal them. They are fairly common. They are important for creating variation, but are not used to assay DNA polymorphisms. Important for fertility; they can produce unbalanced, inviable gametes.
Term
CIGAR string
Definition
The sixth field given in the .SAM format. Represents how the query aligns to the reference. The string "3S97M" indicates that the first 3 bases were soft-clipped (present in the read but not aligned), followed by 97 aligned bases (M counts both matches and mismatches).
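The following Python sketch splits a CIGAR string into (length, operation) pairs and totals the aligned and soft-clipped bases. The operation letters follow the SAM specification (M, I, D, N, S, H, P, =, X). The helper names are illustrative, and the "*" placeholder used for unmapped reads is not handled.

```python
import re

# One CIGAR operation: a length followed by an operation letter
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Return a list of (length, operation) pairs, e.g. "3S97M" -> [(3, "S"), (97, "M")]."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Reject strings containing anything the pattern did not consume
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def aligned_and_clipped(cigar):
    """Count bases aligned to the reference (M, =, X) and soft-clipped bases (S)."""
    ops = parse_cigar(cigar)
    aligned = sum(n for n, op in ops if op in "M=X")
    clipped = sum(n for n, op in ops if op == "S")
    return aligned, clipped


def query_length(cigar):
    """Read length implied by the CIGAR: M, I, S, = and X consume read bases."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MIS=X")


print(parse_cigar("3S97M"))          # [(3, 'S'), (97, 'M')]
print(aligned_and_clipped("3S97M"))  # (97, 3)
```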
Term
Coding single nucleotide variant (cSNV)
Definition
Single nucleotide differences within coding sequences.
Term
Conservation score
Definition
The higher the score, the rarer SNPs are at a certain position, and the more conserved the gene. High scores are found in TFB sites.
Term
Contingency table
Definition
Used for doing a chi square test. A table showing the frequencies of combinations of traits and DNA sequences.
Term
Corn
Definition
Its genome is 2,500 Mbp.
Term
Coronary artery disease
Definition
A common disease with environmental and genetic factors that contribute to it.
Term
Critical value
Definition
The value chosen as the significance threshold. If a p value is below this value, the null hypothesis is rejected. Typically it is 0.05, but in bioinformatics it is often much smaller. This is because very many tests are performed (for example, one per SNP), and some would pass a 0.05 threshold by chance.
Term
Crohn's disease
Definition
A common disease with environmental and genetic factors that contribute to it.
Term
Darwin's finches
Definition

Geospiza sp.

Famous because they are mentioned in On the Origin of Species. They have been widely studied as a model of speciation and adaptive evolution, and illustrate a young adaptive radiation. One trait that has diverged is beak shape; different beak shapes enable the birds to eat different foods. Lamichhaney et al. (2015) discovered a likely gene contributing to beak evolution. They made 100 bp reads, with 10x coverage, for 200 individuals. The reads were aligned using BWA alignment software, and GATK was used to identify SNPs and indels. At loci that contribute to beak shape, birds with blunt beaks should have different alleles than those with pointed beaks, so allele frequencies should differ at these loci. Blunt beak was the derived trait, and had little genetic diversity. The gene controlling beak shape was found to be ALX homeobox 1.

Term
Database
Definition
A repository of information that enables entering and extracting data. Usually consists of tables containing records and fields. A simple database would be a single page with information about students in a class. Usually have a carefully controlled format and restricted vocabulary, so computers can retrieve information easily by searching across fields.
Term
Depth
Definition
The number of times a given position in an individual's genome is sequenced, i.e. the number of reads covering that position.
Term
Derived trait
Definition

Novel trait

The trait which evolved from the ancestral trait. Loci for derived traits have less genetic diversity. In Darwin's finches, blunt beaks are the derived trait.

Term
DNA
Definition
It consists of adenine (A), guanine (G), cytosine (C), and thymine (T). It is double-stranded. Written from 5' end to 3' end. Genes and other attributes can be encoded on either strand.
Term
DNA protein interactions
Definition
A use of sequencing technology. ChIP sequencing is used. Finds the DNA sites where proteins (such as transcription factors) bind.
Term
E. coli
Definition
A bacterium. Its genome is 5 Mbp.
Term
Edit distance
Definition
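The number of single-nucleotide changes (substitutions, insertions, or deletions) needed to make the query sequence identical to the reference. Reported as an optional field in the .SAM format output of Bowtie. For example, an edit distance of 2 means it would take two changes to make the query sequence the same as the reference.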
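Below is a Python sketch of edit distance using the standard Levenshtein dynamic-programming algorithm, where each substitution, insertion, or deletion costs 1. Treat it as an illustration of the idea only. An aligner reports the number of edits in the alignment it actually chose, which can differ from a global Levenshtein distance between the whole read and a reference segment.

```python
def edit_distance(query, reference):
    """Levenshtein distance: minimum substitutions + insertions + deletions."""
    # prev[j] holds the distance between the query prefix so far and reference[:j]
    prev = list(range(len(reference) + 1))
    for i, q in enumerate(query, start=1):
        curr = [i]  # turning query[:i] into an empty string takes i deletions
        for j, r in enumerate(reference, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion from the query
                curr[j - 1] + 1,         # insertion into the query
                prev[j - 1] + (q != r),  # match (cost 0) or substitution (cost 1)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("ACGTACGT", "ACGAACGA"))  # 2
```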
Term
Environment
Definition
Contributes to trait variation. Can include diet. Includes things which we don't understand.
Term
Environmental variation
Definition
Variation due to environmental causes. Can be due to known or unknown causes.
Term
Epigenetics
Definition
Changes in chromatin and histones that can affect phenotype. Involves epimutations and epialleles. The DNA sequence is the same, but with chemical differences. Contributes to trait variation.
Term
FST
Definition

A function of allele frequencies between populations. Based on the frequency of heterozygotes expected if the whole group were one random-mating population (HT), compared to the frequency of heterozygotes expected within subpopulations (HS). If there is low diversity within a population, one expects a low frequency of heterozygotes. A value of 0 indicates no allelic differentiation between populations. A value of 1 indicates maximum allelic differentiation between populations.

$$F_{ST} = \frac{H_T - H_S}{H_T}$$
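Here is a minimal Python sketch for one biallelic locus. It assumes each subpopulation mates randomly, so its expected heterozygosity is 2pq. HS is the (optionally weighted) average of the subpopulation heterozygosities, and HT is the heterozygosity expected from the pooled allele frequency. The allele frequencies used are hypothetical.

```python
def expected_het(p):
    """Expected heterozygote frequency 2pq at a biallelic locus under random mating."""
    return 2 * p * (1 - p)


def fst(freqs, weights=None):
    """F_ST = (H_T - H_S) / H_T from the frequency of allele A in each subpopulation."""
    if weights is None:
        weights = [1 / len(freqs)] * len(freqs)  # equal-sized subpopulations
    h_s = sum(w * expected_het(p) for w, p in zip(weights, freqs))
    p_total = sum(w * p for w, p in zip(weights, freqs))  # pooled allele frequency
    h_t = expected_het(p_total)
    if h_t == 0:
        return 0.0  # locus is fixed everywhere: no differentiation to measure
    return (h_t - h_s) / h_t


print(fst([0.9, 0.1]))  # about 0.64: strong differentiation
print(fst([0.5, 0.5]))  # 0.0: identical subpopulations
```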

Term
False positive
Definition
A genetic variant that appears to contribute to a trait, but does not. If controls have different ancestry than diseased individuals, then regions of the genome where the two populations differ can associate with disease. This can arise if one population is more susceptible to disease than another.
Term
Fine mapping
Definition
Using more than one site on a chromosome to map the order of genes and their distances from each other. The objective is to identify the location on a chromosome of the sequence that controls a trait. One can map the DNA sequence contributing to trait variation to between two chromosomal positions on the physical map. The mapping can be very precise, down to an individual gene or to a chromosomal region. Crossovers are used to identify gene locations. There may be a weak correlation between genetic and physical distance, so all mapping is approximate.
Term
Flag
Definition
The sum of all applicable flags is the second field given in the .SAM format. Gives information about the mate pair mapping, among other things.
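Each flag is a power of two, so the sum can be decoded bit by bit. The Python sketch below uses the bit values listed in the SAM specification; the short descriptions are paraphrased.

```python
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "not passing quality controls",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """List the meaning of every bit that is set in a SAM FLAG value."""
    return [meaning for bit, meaning in SAM_FLAGS.items() if flag & bit]


# 99 = 64 + 32 + 2 + 1
print(decode_flag(99))
```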
Term
Fruit fly
Definition
Its genome is 165 Mbp.
Term
G6PD
Definition
A gene where a mutation at site 90 provides some protection against malaria. As this allele increased in frequency in the population, it carried closely linked alleles with it. G6PD deficiency can cause hemolytic anemia (sickle cell anemia, by contrast, is caused by a mutation in the beta-globin gene).
Term
GenBank
Definition
An annotated collection of all publicly available DNA sequences, gathered from many sources. You can search its nucleotide database with accession numbers. Output is in GenBank format.
Term
GenBank format
Definition
The output of searching GenBank with an accession number. Contains a summary of information about the sequence, sequence attributes, and the sequence itself. The page is populated by data extracted from the database.
Term
Gene annotation
Definition

https://genome.ucsc.edu/index.html

The UCSC Genome Browser provides genome data for many organisms, including chickens. It allows you to investigate a region of a chromosome, and shows the gene sequence along with annotation of genes and protein-coding sites. Genome sequence projects produce the DNA sequence of chromosomes, as completely as possible.

Term
Genetic disease
Definition
Disease-causing mutations typically introduce a premature stop codon, change an amino acid, or alter a splice site.
Term
Genetic distance
Definition

The distance between two genes can be measured in cM using this formula:

Distance = (# recombinant types / # gametes) x 100
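The formula is simple arithmetic. In the Python sketch below, the backcross counts are hypothetical: 18 recombinant offspring out of 200 gives 9 cM, the same distance reported for SNP 2 in the yellow-skinned chicken study. The second function runs the formula in reverse to estimate how many recombinants to expect at a given distance.

```python
def map_distance_cm(recombinants, total_offspring):
    """Genetic distance in centimorgans: percentage of recombinant offspring (gametes)."""
    return 100 * recombinants / total_offspring


def expected_recombinants(distance_cm, total_offspring):
    """Approximate number of recombinant offspring expected for a given distance."""
    return distance_cm * total_offspring / 100


print(map_distance_cm(18, 200))        # 9.0
print(expected_recombinants(20, 200))  # 40.0
```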

Term
Genetic map
Definition
If the genome is sequenced, SNPs can be located in the sequence.
Term
Genetic variation
Definition
Trait variation that is due to DNA sequence differences or epigenetic differences. May be due to nucleotide differences at major genes, or to nucleotide differences at many minor genes. May refer to differences between DNA sequences, produced by mutations. Includes allelic differences. These cause functional differences in genes through changes to the genes' coding sequences or to their regulatory sequences.
Term
Genome
Definition
DNA is housed in the nucleus, in chromosomes. There is also cytoplasmic DNA in mitochondria and plastids. In humans, there are 3 billion base pairs in one copy of the genome; with two copies in each cell, that is a total of 6 billion base pairs! This amount of data is equal to a stack of printed paper 1,524 m tall!
Term
Genome analysis toolkit (GATK)
Definition
SNP and indel identification software. Used in the study on Darwin's finches.
Term
Genome-wide association studies (GWAS)
Definition
One samples an existing population that is the product of an unknown/complex pedigree to test for associations between nucleotide and trait variation.
Term
Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls
Definition
A study on common illnesses with environmental factors and polygenic allele distributions. The final results were in the form of a case matrix and a control matrix. The researchers looked for alleles present at different frequencies in cases compared to controls. 500,000 loci were sampled; it is likely that some of the sampled SNPs are in linkage disequilibrium with neighboring genes that could affect disease. Diseases analyzed include bipolar disorder, coronary artery disease, Crohn's disease, hypertension, rheumatoid arthritis, type I diabetes, and type II diabetes.
Term
GIT 264-1
Definition
A case study of a Turkish male who was evaluated at 5 months of age for failure to thrive, dehydration, and diarrhea. He was born prematurely at 30 weeks to consanguineous parents; the family history included two spontaneous abortions and the death of a premature sibling on day 4. He was diagnosed with Bartter syndrome. There were 20 homozygous deletions, all listed in the Database of Genomic Variants, but none altered protein-coding sequences or was known to associate with Bartter syndrome. Whole exome sequencing found 2,495 genes homozygous by consanguineous descent and 1,493 nucleotide variants in coding regions, including 10 in highly conserved proteins, among them SLC26A3. The researchers first looked for SNPs using a pre-manufactured chip, but found no unusual SNPs in important regions. They then sequenced his exome using massively parallel DNA sequencing (5 billion bp, 100x coverage). Huge amounts of sequence were acquired, and a series of filters led to a diagnosis of a mutation in SLC26A3, which causes chloride-losing diarrhea.
Term
Group
Definition
An important aspect of genetic variation analysis. You cannot make a blanket statement about a population based on one group. Genetic variation observed is a function of the group being investigated.
Term
HS
Definition
The expected proportion of heterozygous individuals within subpopulations, averaged across them, given that each is a separate random-mating population.
Term
HT
Definition
The proportion of heterozygous individuals expected from two populations if they form one, random mating population.
Term
Haplotype
Definition
The combination of alleles at two or more linked sites on a single chromosome. For two sites with alleles A/a and B/b, the possible haplotypes are AB, Ab, aB, and ab.
Term
Hash
Definition
A data structure like an array, but instead of associating a scalar with an index position, it associates a scalar with a string key. The key-value pairs aren't stored in the order in which they were added. Denoted with the % sigil.
Term
Heterozygous odds ratio
Definition
[individuals heterozygous for risk allele] / [individuals heterozygous for control allele]
Term
Homeobox
Definition
A DNA sequence found in a group of genes that encode transcription factors. These transcription factors determine in which cells genes are expressed.
Term
Homozygous SNP
Definition
Both copies of the chromosome carry the same non-reference nucleotide, so every read covering the site shows the SNP relative to the reference genome. For case GIT 264-1, there were 9,045, and 117 of them were novel (never seen before). For a recessive disease, the allele causing disease must be homozygous; if both parents are unaffected by the disease, they must be heterozygotes.
Term
Human
Definition
Its genome is 3,000 Mbp. A gene is about 10,000 bp, and there are about 30,000 genes, so genes span about 300 Mbp.
Term
Hutt
Definition
Suggested that four species of jungle fowl contributed to the domestic chicken. Yellow skin alleles for BCDO2 seem to come from grey jungle fowl. Grey jungle fowl must have interbred with early domestic chickens.
Term
Hypertension
Definition
A common disease with environmental and genetic factors that contribute to it.
Term
Identical by descent
Definition
Two individuals which have the same allele which arose from a common ancestor.
Term
Identical in state
Definition
Two individuals which have the same alleles, but they did not arise from a common ancestor.
Term
Illumina FASTQ
Definition

The sequence output format of Illumina sequencing. The manufacturer includes in the output an estimate of the confidence that each base is correct. The raw data output for each read is as follows:

1. Line 1 begins with an '@' character and sequence identifier.

2. Line 2 contains the nucleotide sequence. This is the read.

3. Line 3 begins with a '+' character and may be followed by the sequence identifier.

4. Line 4 encodes the quality values for the sequence in line 2. The quality symbol associated with each nucleotide is ASCII-coded and can be translated into a Q score.
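Here is a minimal Python sketch that reads four-line records of this kind. It assumes each sequence and quality string sits on a single line, as in Illumina output. It also checks that the quality string has one symbol per base. The function name is illustrative.

```python
import io


def read_fastq(handle):
    """Yield (identifier, sequence, quality) for each four-line FASTQ record."""
    lines = (line.rstrip("\n") for line in handle)
    for header in lines:
        if not header:
            continue  # tolerate blank lines, e.g. at the end of a file
        try:
            seq, plus, qual = next(lines), next(lines), next(lines)
        except StopIteration:
            raise ValueError(f"truncated record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence and quality lengths differ: {header!r}")
        yield header[1:], seq, qual


example = io.StringIO("@read1\nGATTACA\n+\nIIIII5+\n@read2\nCCGT\n+read2\nII!!\n")
for record in read_fastq(example):
    print(record)
```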

Term
Indel
Definition

Insertion/deletion event

Includes repeat number differences. A site in the DNA where a sequence is present in one individual and absent in another. Indels can be very large (up to megabases) or only a few nucleotides long.

Term
Jungle fowl
Definition
The natural ancestor of chickens, as thought by Darwin. They all have white skin. Hutt suggested that grey jungle fowl, which have yellow legs, may be the source of the genes for yellow skinned chickens. Found in South Asia. There are four species: red, grey, Ceylon, and green.
Term
Kernel
Definition
The part of an operating system that turns program instructions into commands the hardware understands.
Term
Linkage disequilibrium (D)
Definition

The non-random association of alleles at two or more chromosomal sites within a population. It is typical of alleles that are very close together on a chromosome, because they are inherited together. Calculated using haplotype frequencies. D is not meaningful by itself, and is often expressed relative to the maximum possible D given the allele frequencies. Haplotype frequencies can never be less than zero, so a positive D can never be greater than (FA x Fb) or (Fa x FB); the lesser of these two values is the maximum value of D.

$$F_{AB} = F_A F_B + D$$

$$F_{Ab} = F_A F_b - D$$

$$F_{aB} = F_a F_B - D$$

$$F_{ab} = F_a F_b + D$$
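The Python sketch below computes D, D' (|D| divided by its maximum) and r² from the four haplotype counts of two biallelic sites. It uses D = FAB - (FA x FB). For a positive D, the maximum is the lesser of (FA x Fb) and (Fa x FB). For a negative D, the maximum is the lesser of (FA x FB) and (Fa x Fb). r² is D² / (FA x Fa x FB x Fb). The counts are hypothetical.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return D, D' and r^2 from haplotype counts (or frequencies) for two biallelic sites."""
    total = n_AB + n_Ab + n_aB + n_ab
    f_AB, f_Ab, f_aB, f_ab = (n / total for n in (n_AB, n_Ab, n_aB, n_ab))
    # allele frequencies are sums of haplotype frequencies
    f_A, f_a = f_AB + f_Ab, f_aB + f_ab
    f_B, f_b = f_AB + f_aB, f_Ab + f_ab
    d = f_AB - f_A * f_B
    # largest |D| possible without any haplotype frequency dropping below zero
    d_max = min(f_A * f_b, f_a * f_B) if d >= 0 else min(f_A * f_B, f_a * f_b)
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denominator = f_A * f_a * f_B * f_b
    r2 = d * d / denominator if denominator > 0 else 0.0
    return d, d_prime, r2


# Hypothetical haplotype counts from 100 chromosomes
print(ld_stats(40, 10, 10, 40))  # D about 0.15, D' about 0.6, r^2 about 0.36
```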

Term
Major histocompatibility complex (MHC)
Definition
Has strong geographical variation, mostly in the NW/SE axis.
Term
Manhattan plot
Definition
A way to display tests from multiple sites. The x axis gives the position of each tested SNP. The y axis gives -log10 of the p value, i.e. of the probability of obtaining the test statistic given that the null hypothesis (SNP and trait are not associated) is true. The most significant SNPs therefore appear highest. It gets its name because the graph looks like the tall buildings of a city skyline.
Term
Massively parallel sequencing
Definition
Many DNA molecules are sequenced at the same time. A chip holds a collection of DNA fragments, each read as 150 bp paired ends.
Term
Mean base coverage (k)
Definition
The average number of reads covering each nucleotide position. In the GIT 264-1 sequencing, the mean base coverage was 40.1.
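Mean coverage can be estimated before sequencing with the standard relationship k = (number of reads × read length) / size of the target region. The Python sketch below uses the 3,000 Mbp human genome size from these cards; the read numbers are hypothetical. The estimate assumes every read maps uniquely, so real mapped coverage is lower.

```python
import math


def mean_coverage(n_reads, read_length, target_size):
    """Expected mean base coverage k = N * L / G."""
    return n_reads * read_length / target_size


def reads_needed(coverage, read_length, target_size):
    """Number of reads needed to reach a desired mean coverage."""
    return math.ceil(coverage * target_size / read_length)


human_genome = 3_000_000_000  # 3,000 Mbp
print(reads_needed(10, 100, human_genome))            # 300000000 reads of 100 bp for 10x
print(mean_coverage(300_000_000, 100, human_genome))  # 10.0
```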
Term
Mendelian disease
Definition
Diseases caused by Mendelian genes. Relatively rare; there are 2,600. Approximately 85% affect protein coding regions or mRNA splice sites.
Term
Mendelian loci
Definition

Mendelian genes

Have a large effect on phenotype. Includes the gene controlling wrinkled vs. smooth peas in Mendel's experiments. Not typically the situation; it is rare.

Term
Methylation sequencing
Definition
A use of sequencing technology. Gives information on gene function. DNA is treated with a chemical which digests methylated DNA, but not unmethylated DNA. Tells you what sequences are methylated and which aren't.
Term
Missense mutation
Definition
A type of mutation that changes an amino acid. In the GIT 264-1 sequencing, there were 5,091, and 357 of these were novel. These are the most important type of mutation.
Term
Mouse
Definition
Its genome is 2.9 Gbp.
Term
Multiplexed
Definition
When multiple samples are sequenced in the machine at once. Often done to reduce cost and data size.
Term
National Center for Biotechnology Information (NCBI)
Definition

www.ncbi.nlm.nih.gov

A main source for molecular information. Has many databases that contain nucleotide/protein information and relevant literature. One main database is "nucleotide", a collection of sequences from GenBank. It is a little confusing compared to other databases. A record consists of a feature table, summary, and sequence.

Term
Next-generation sequencing
Definition
DNA is hybridized onto a pre-manufactured chip. Each "read" is 75 bp, but lots of reads are made, some of which overlap with each other in the DNA sequence.
Term
Non-synonymous changes
Definition
A mutation which changes amino acid sequence. There were 5,091 in the GIT 264-1 sequencing.
Term
Nuclear genome
Definition
Has 20 - 40 thousand genes. For a diploid organism, there is twice this amount in a somatic cell.
Term
Odds ratio
Definition

The further the value is from 1, the stronger the association between allele differences and disease status. The value may be high, but the chance of being a case could still be low.

Odds ratio = [odds of being a case, given one genotype] / [odds of being a case, given another genotype], where odds = probability of being a case / probability of not being a case
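This Python sketch computes the odds ratio from a 2 × 2 table of hypothetical case/control counts. In a case-control study, the odds of being a case for a genotype are estimated as (cases with that genotype) / (controls with that genotype). The odds ratio therefore simplifies to (cases with × controls without) / (controls with × cases without).

```python
def odds_ratio(cases_with, controls_with, cases_without, controls_without):
    """Odds of being a case given genotype G, divided by the odds given the other genotype."""
    odds_with = cases_with / controls_with
    odds_without = cases_without / controls_without
    return odds_with / odds_without


# Hypothetical counts for carriers and non-carriers of a risk allele
print(odds_ratio(30, 10, 70, 90))  # about 3.86: carriers have higher odds of being cases
```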

Term
Operating system (OS)
Definition
A platform that consists of a specific set of libraries and infrastructure for applications to be built upon and interact with each other. A software package that provides a desktop, shortcuts to applications, a web browser, and a media player. Examples include Microsoft Windows, Mac OS X, and Ubuntu.
Term
On the Origin of Species
Definition
A book by Charles Darwin. It first discusses how people can change traits in animals through breeding, and then concludes that selection must occur in nature as well. Uses Darwin's finches as an example of this.
Term
P value
Definition
Given two independent traits, the probability of getting results like those observed, or more extreme, given that the null hypothesis is true (no association between alleles and the trait). For a χ² test, it is the area under the χ² distribution curve beyond the observed χ² value. A measure of how probable your data are if the null hypothesis is true.
Term
Parent directory
Definition
A directory that contains another directory.
Term
Parental type
Definition
Genotypes which are identical to one of the parents. If the two genes are linked, parental types will be frequent.
Term
Perl
Definition
A programming language often used in bioinformatics to manipulate text based genetic data.
Term
Physical map
Definition
Physical locations of SNPs. You can get ever more precise locations, mapping SNPs onto the genome.
Term
Pig
Definition
Its genome is 2,600 Mbp.
Term
Plants
Definition
There is no ethical concern involved with collecting DNA, unlike in humans.
Term
Plastid genome
Definition
Have dozens of genes. Mitochondria in plants have 100 - 1,000 kb. Mitochondria in animals have 16 kb. Chloroplasts have around 120 - 160 kb.
Term
Ploidy
Definition
The number of sets of chromosomes in a cell. Humans have two sets.
Term
Population structure
Definition
Alleles are not equally distributed across different subpopulations. Can arise from non-random mating or natural selection.
Term
Premature termination
Definition
A type of mutation where a premature stop codon is introduced, truncating the protein. In the GIT 264-1 sequencing, there were 33.
Term
Q score
Definition

Encoded as ASCII symbols in line 4 of Illumina FASTQ output. It is based on the estimated probability that the base call is wrong.

$$Q = -10 \log_{10} P(\text{base call is wrong})$$
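Here is a Python sketch of the conversion. It assumes the Phred+33 encoding used by current Illumina FASTQ files and by the SAM quality field, where a character's ASCII code minus 33 gives Q. Some older Illumina pipelines used an offset of 64 instead.

```python
import math


def decode_quality(qual, offset=33):
    """Convert an ASCII quality string into a list of Q scores."""
    return [ord(symbol) - offset for symbol in qual]


def q_to_error(q):
    """Probability that the base call is wrong: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_q(p):
    """Q = -10 * log10(P)."""
    return -10 * math.log10(p)


print(decode_quality("I5+!"))  # [40, 20, 10, 0]
print(q_to_error(20))          # about 0.01: a 1 in 100 chance the base is wrong
print(error_to_q(0.001))       # about 30
```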

Term
Quantile-quantile plot
Definition

Q-Q plot

Plots the theoretical or expected value of a statistic given the null hypothesis on the x axis. Plotted from small to large, with corresponding observed values on the y axis.

Term
Quantitative loci
Definition

Quantitative genes

Have a small effect on phenotype. Many genes contribute to production of phenotype. More complex than Mendelian loci. They are the typical situation.

Term
r²
Definition
Measures association of alleles. It is calculated from D: r² = D² / (FA x Fa x FB x Fb).
Term
Recombinant types
Definition
Genotypes which are recombinant versions of the parents' genotypes due to crossovers. If the two genes are linked, recombinant types will be rare. A double crossover between the two genes can make a recombinant chromosome look like a parental type, but this is rare.
Term
Relative path
Definition
A path that starts from the current directory. Does not start with a /.
Term
Relative risk
Definition

A ratio of estimated probabilities. The value can be high but have little real-world relevance, for example when the disease is rare.

Relative risk = [probability of disease, given A] / [probability of disease, given a]
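The Python sketch below uses hypothetical numbers to show why a high relative risk can matter little in practice: doubling a very small risk still leaves a very small risk. It assumes the probabilities are estimated by following groups of people with each allele.

```python
def relative_risk(affected_A, total_A, affected_a, total_a):
    """Probability of disease given A divided by probability of disease given a."""
    return (affected_A / total_A) / (affected_a / total_a)


# Hypothetical: 2 of 1,000 people with A are affected, versus 1 of 1,000 with a
rr = relative_risk(2, 1000, 1, 1000)
print(rr)        # 2.0
print(2 / 1000)  # 0.002: absolute risk for A is still only 0.2%
```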

Term
Rheumatoid arthritis
Definition
A common disease with environmental and genetic factors that contribute to it.
Term
Rice
Definition
Its genome is 400 Mbp.
Term
Risk allele
Definition
The allele which is more common in cases than in controls.
Term
Root directory
Definition

First directory

Denoted by a / character alone.

Term
S. cerevisiae
Definition
A yeast. Its genome is 12 Mbp.
Term
.SAM format
Definition

The output of alignment software for sequence alignment. Each aligned sequence gets a single line of output. Information located on the line:

1. Name of read.

2. Sum of all applicable flags.

3. Name of sequence where alignment occurs, indicating chromosome.

4. 1-based offset into the forward reference strand, where the leftmost character of alignment occurs.

5. Mapping quality.

6. CIGAR string.

7. Name of reference sequence where mate's alignment occurs.

8. 1-based offset into the forward reference strand where the leftmost character of the mate's alignment occurs. Equal to 0 if there is no mate.

9. Inferred insert size. Size is negative if the mate's alignment occurs upstream of this alignment. Size is 0 if there is no mate.

10. Read sequence, reverse complemented if aligned to the reverse strand.

11. ASCII-encoded read quality. Encoded using Phred quality scale, offset by 33, similar to in FASTQ files.

12. Optional fields, tab-separated, each in TAG:TYPE:VALUE form. Includes edit distance in Bowtie.
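This Python sketch splits one alignment line into the eleven mandatory fields listed above, plus any optional TAG:TYPE:VALUE fields. It uses the column names from the SAM specification (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL). Header lines, which begin with @, are not handled. The example line is made up; NM is the optional tag the SAM specification uses for edit distance.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INTEGER_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_sam_line(line):
    """Parse one tab-separated SAM alignment line into a dictionary."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    record = dict(zip(SAM_FIELDS, columns[:11]))
    for name in INTEGER_FIELDS:
        record[name] = int(record[name])
    tags = {}
    for field in columns[11:]:  # optional fields: TAG:TYPE:VALUE
        tag, tag_type, value = field.split(":", 2)
        tags[tag] = int(value) if tag_type == "i" else value
    record["TAGS"] = tags
    return record


line = "read1\t99\tchr1\t100\t60\t3S7M\t=\t300\t210\tACGTACGTAC\tIIIIIIIIII\tNM:i:2"
record = parse_sam_line(line)
print(record["FLAG"], record["CIGAR"], record["TAGS"]["NM"])  # 99 3S7M 2
```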

Term
Scalar
Definition
A variable used to hold a single value such as a string, an integer, or a floating point number (a number with a decimal). Perl treats a scalar as a number or as a string depending on how you use it. Denoted with the $ sigil.
Term
Scope
Definition
The lifespan and visibility of named entities, most often variables. Created by blocks. Most common scope is lexical.
Term
Sensitivity
Definition
Percentage of known SNPs that are captured by DNA sequencing. In the GIT 264-1 sequencing, sensitivity was 96.3%.
Term
Sequencing technologies
Definition
Can be used for transcriptome sequencing (mRNAs), genome sequencing, genome subset sequencing (exomes, certain restriction sites), DNA-protein interactions (ChIP sequencing), methylation sequencing, and small RNA sequencing. The smallest sequencer generates 400 million read pairs from 400 million molecules. Each read is 150 bp, so each paired-end molecule yields 300 bp, for a total of 400 million x 300 bp = 120 Gbp. The largest sequencer generates 1,800 Gbp.
Term
Single nucleotide polymorphism (SNP)
Definition
A site in the DNA which differs between two individuals at one nucleotide. It is the primary tool used for DNA investigation.
Term
SLC26A3
Definition
A highly conserved gene that encodes an epithelial Cl-/HCO3- exchanger. Mutation D652N changes aspartic acid to asparagine, causing congenital chloride-losing diarrhea. Case study GIT 264-1 had a homozygous variant in this gene. Screening 39 other patients with suspected Bartter syndrome, it was found that 5 had homozygous mutations in this gene: 3 with watery diarrhea and no renal losses, and 2 with high stool chloride levels.
Term
Small RNA (sRNA)
Definition
A very short nucleotide sequence that can target and regulate sequences of DNA.
Term
SNP detection software
Definition
Identifies SNPs and indels from .SAM outputs. Examples include mpileup from SAMtools and HaplotypeCaller from GATK. Output is in variant call format.
Term
Specificity
Definition
Percentage of SNPs that are sequenced correctly. A reference genome is used as a cross-reference. In the GIT 264-1 sequencing, specificity was 98.6%.
Term
Splice site
Definition
A type of mutation that alters a splice site. In the GIT 264-1 sequencing, there were 84.
Term
STAR
Definition

https://github.com/alexdobin/STAR

An alignment software.

Term
String
Definition
A series of characters, including numbers, surrounded by either single or double quotes. Single-quoted strings are taken as is; their contents are not interpolated. Double-quoted strings are interpolated: the Perl interpreter identifies variables and special characters (such as \n for a newline) and replaces them with their contents.
Term
Synonymous changes
Definition

"Wobble"

A type of mutation where there is no change in amino acid sequence. In the GIT 264-1 sequencing, there were 6,462, and 253 of these were novel.

Term
Trait variation
Definition
Caused by genetic variation, epigenetics, and the environment. In this course, the focus is on genetic variation. It is important to refer to a specific attribute and population, because different populations may have different variation.
Term
Transcription factor binding (TFB) site
Definition
An upstream enhancer site with a high conservation score. A motif that occurs before gene sequences. When it is mutated, transcription factor binding can be changed or eliminated.
Term
Type I diabetes
Definition
A common disease with environmental and genetic factors that contribute to it.
Term
Type II diabetes
Definition
A common disease with environmental and genetic factors that contribute to it.
Term
UNIX
Definition
A kernel and an operating system. A platform of file management infrastructure and software libraries upon which applications can be built and interact. There are several standards that are variously built.
Term
Variant call format (.vcf)
Definition

Output files of SNP detection software. A tab-delimited format with each data line consisting of the following fields:

1. CHROM: Chromosome name.

2. POS: The left-most position of the variant.

3. ID: Unique variant identifier.

4. REF: Reference allele.

5. ALT: Alternate allele(s), comma separated.

6. QUAL: Variant/reference quality.

7. FILTER: Filters applied.

8. INFO: Information of variant, semicolon separated.

9. FORMAT: Format of genotype fields. Optionally colon separated.

10. SAMPLE: Sample genotypes and per-sample information.
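This Python sketch splits one VCF data line into the fields listed above. Following the VCF convention, INFO entries are key=value pairs (or bare flags) separated by semicolons. Each sample column holds colon-separated values in the order given by FORMAT. The example line and sample name are hypothetical, and header lines beginning with # are not handled.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def parse_vcf_line(line, sample_names):
    """Parse one tab-separated VCF data line into a dictionary."""
    columns = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, columns[:9]))
    record["POS"] = int(record["POS"])
    record["ALT"] = record["ALT"].split(",")  # several alternate alleles are allowed
    info = {}
    for entry in record["INFO"].split(";"):
        if entry == ".":
            continue  # "." means no INFO data
        key, has_value, value = entry.partition("=")
        info[key] = value if has_value else True  # a bare key is a flag
    record["INFO"] = info
    keys = record.get("FORMAT", "").split(":")
    record["SAMPLES"] = {
        name: dict(zip(keys, column.split(":")))
        for name, column in zip(sample_names, columns[9:])
    }
    return record


line = "chr1\t12345\t.\tG\tA,T\t50\tPASS\tDP=20;DB\tGT:DP\t0/1:20"
record = parse_vcf_line(line, ["sample1"])
print(record["ALT"], record["INFO"], record["SAMPLES"])
```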

Term
Variation
Definition
Differences between things.
Term
Wheat
Definition
Its genome is 15,000 Mbp.
Term
While loop
Definition

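Runs a block of code multiple times, for example on different pieces of data. Evaluates a condition and runs the block if the condition is true, like an if statement. It then re-evaluates the condition and repeats until the condition is false.

my $i = 0;
while ($i < 3) { print "$i\n"; $i++; }   # prints 0, 1, 2; stops once $i < 3 is false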

Term
Yellow skinned chickens
Definition
Popular among consumers. The skin colour is a result of the accumulation of carotenoids, which is a sign of good health in natural populations. The gene was found to be on chromosome 24. Chromosomal sites close to the gene were found using results from a backcross of F1 to the yellow parent: SNP 1 was found to be 20 cM away from the gene, and SNP 2 was 9 cM away. The trait is in close association with a SNP in APOA1, and allelic variation at any gene upstream or downstream of APOA1 could cause the trait difference. A SNP in the BCDO2 gene is found in chicken breeds with yellow skin, and not in breeds with white skin. The dominant allele is the white allele, which is the opposite of what one might expect. The yellow allele could have one or more nucleotide changes that alter an amino acid important for enzyme function, or a sequence important for transcriptional regulation.