Tuesday, March 10, 2009

Simple statistical compression algorithm .doc (Paper Presentation)

Abstract:

This paper introduces a novel algorithm for biological sequence compression that makes use of both statistical properties and repetition within sequences. A panel of experts is maintained to estimate the probability distribution of the next symbol in the sequence to be encoded. Expert probabilities are combined to obtain the final distribution. The resulting information sequence provides insight for further study of the biological sequence. Each symbol is then encoded by arithmetic coding. Most compression algorithms fall into one of two categories, namely substitutional compression and statistical compression. Those in the former class replace a long repeated subsequence with a pointer to an earlier instance of the subsequence or to an entry in a dictionary. Experiments show that our algorithm outperforms existing compressors on typical DNA and protein sequence datasets while maintaining a practical running time.
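
The abstract describes a panel of experts whose predictions are combined into one distribution for the next symbol, which is then arithmetic-coded. The Python sketch below illustrates that idea under stated assumptions; it is not the paper's method. It uses two hypothetical experts: `OrderKExpert`, an order-2 context model standing in for "statistical properties", and `CopyExpert`, which looks up the most recent earlier occurrence of the last few symbols, standing in for "repetition". The weighting rule (each expert weighted by 2 to the power of minus the bits it spent on the last 20 symbols), the `hit_prob=0.9` confidence and the `window=20` size are all assumptions. Instead of running an arithmetic coder, the sketch returns -log2 P(x_i) for each symbol, i.e. an information sequence; its sum is the length an ideal arithmetic coder would approach to within a couple of bits.

```python
import math
from collections import defaultdict, deque

ALPHABET = "ACGT"  # assumption: DNA only; protein would need a 20-letter alphabet


class OrderKExpert:
    """Hypothetical statistical expert: order-k context model with add-one counts."""

    def __init__(self, k):
        self.k = k
        self.counts = defaultdict(lambda: {s: 1 for s in ALPHABET})

    def _context(self, history):
        return history[-self.k:] if self.k else ""

    def predict(self, history):
        c = self.counts[self._context(history)]
        total = sum(c.values())
        return {s: c[s] / total for s in ALPHABET}

    def update(self, history, symbol):
        self.counts[self._context(history)][symbol] += 1


class CopyExpert:
    """Hypothetical repeat expert: finds the most recent earlier occurrence of the
    last m symbols and predicts the symbol that followed it."""

    def __init__(self, m=4, hit_prob=0.9):
        self.m = m
        self.hit_prob = hit_prob  # assumed confidence in the copied symbol

    def predict(self, history):
        uniform = {s: 1 / len(ALPHABET) for s in ALPHABET}
        if len(history) <= self.m:
            return uniform
        ctx = history[-self.m:]
        # Search history[:-1] so any match is followed by at least one known symbol.
        pos = history.rfind(ctx, 0, len(history) - 1)
        if pos == -1:
            return uniform
        guess = history[pos + self.m]
        miss = (1 - self.hit_prob) / (len(ALPHABET) - 1)
        return {s: self.hit_prob if s == guess else miss for s in ALPHABET}

    def update(self, history, symbol):
        pass  # stateless: the history is rescanned on every prediction


def information_sequence(seq, experts, window=20):
    """Return -log2 P(x_i) for every symbol under the mixed prediction."""
    recent = [deque(maxlen=window) for _ in experts]  # per-expert recent costs in bits
    info = []
    for i, sym in enumerate(seq):
        history = seq[:i]  # O(n^2) copying, acceptable for a sketch
        preds = [e.predict(history) for e in experts]
        # Weight each expert by 2^-(bits spent on the last `window` symbols),
        # shifted by the best score to avoid underflow, then normalise.
        costs = [sum(r) for r in recent]
        best = min(costs)
        raw = [2.0 ** -(c - best) for c in costs]
        z = sum(raw)
        p = sum((w / z) * pr[sym] for w, pr in zip(raw, preds))
        info.append(-math.log2(p))
        for e, pr, r in zip(experts, preds, recent):
            r.append(-math.log2(pr[sym]))
            e.update(history, sym)
    return info


if __name__ == "__main__":
    seq = "ACGT" * 10 + "GATTACA" + "ACGT" * 10
    bits = information_sequence(seq, [OrderKExpert(2), CopyExpert(m=4)])
    print(f"{sum(bits):.1f} bits for {len(seq)} symbols "
          f"({sum(bits) / len(seq):.3f} bits/symbol vs. 2.0 for a flat code)")
```

Because every expert assigns a non-zero probability to each base, the mixture never assigns probability zero, so every symbol stays codable; peaks in the information sequence mark symbols that none of the experts anticipated.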
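
The abstract's description of substitutional compression, replacing a long repeated subsequence with a pointer to an earlier instance, can be illustrated with a greedy LZ77-style parse. This is a generic textbook sketch, not the paper's algorithm; the names `lz_parse` and `lz_decode` and the `min_len=4` threshold are hypothetical choices.

```python
def lz_parse(seq, min_len=4):
    """Greedy LZ77-style parse: emit ("copy", offset, length) when the text at i
    repeats an earlier instance of at least min_len symbols, else ("lit", symbol)."""
    tokens = []
    i = 0
    while i < len(seq):
        best_len, best_off = 0, 0
        for j in range(i):  # every earlier start position (O(n^2), sketch only)
            length = 0
            # The match may run into the region being encoded (overlapping copy).
            while i + length < len(seq) and seq[j + length] == seq[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_len:
            tokens.append(("copy", best_off, best_len))
            i += best_len
        else:
            tokens.append(("lit", seq[i]))
            i += 1
    return tokens


def lz_decode(tokens):
    """Rebuild the sequence; copies go symbol by symbol so overlaps work."""
    out = []
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, offset, length = tok
            for _ in range(length):
                out.append(out[-offset])
    return "".join(out)


if __name__ == "__main__":
    tokens = lz_parse("ACGTACGTACGT")
    print(tokens)  # [('lit', 'A'), ('lit', 'C'), ('lit', 'G'), ('lit', 'T'), ('copy', 4, 8)]
    print(lz_decode(tokens))  # ACGTACGTACGT
```

Here the pointer ("copy", 4, 8) means "go back 4 symbols and copy 8", which is how a short earlier instance can stand in for a longer repeat.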

1. Introduction

Modelling DNA and protein sequences is an important step in understanding biology. Deoxyribonucleic acid (DNA) contains the genetic instructions for an organism. A DNA sequence is composed of nucleotides of four types: adenine (abbreviated A), cytosine (C), guanine (G) and thymine (T). In its double-helix form, two complementary strands are held together by hydrogen bonds that pair A with T and C with G. The reverse complement of a DNA sequence is also considered when comparing DNA sequences. Certain regions in a

...and so on (download the complete paper presentation as a Word document from any of the following links)

Ziddu Link

Uploaded.to Link

Mediafire Link

Adrive Link

Rapidshare Link
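
The introduction notes that the reverse complement of a DNA sequence is also considered when comparing DNA sequences. As a minimal Python sketch (the function names are hypothetical, not from the paper), the reverse complement replaces A with T, C with G and vice versa, then reverses the string, because the two strands of the double helix run in opposite directions. `match_either_strand` is an assumed helper showing how a comparison can accept a match on either strand.

```python
# Hypothetical helper names; only the A-T / C-G pairing comes from the text.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base (A<->T, C<->G), then reverse to read the other strand."""
    return seq.translate(COMPLEMENT)[::-1]


def match_either_strand(a: str, b: str) -> bool:
    """True if `a` equals `b` as written or equals the reverse complement of `b`."""
    return a == b or a == reverse_complement(b)


if __name__ == "__main__":
    print(reverse_complement("AACG"))           # CGTT
    print(match_either_strand("CGTT", "AACG"))  # True
```

Applying `reverse_complement` twice returns the original sequence, and a sequence such as GAATTC is its own reverse complement, so it reads the same on both strands.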