API Reference
FastxChunk
-
struct
FastaChunk
-
struct
rabbit::fq::FastqChunk
Public Members
-
FastqDataChunk *
chunk
Chunk data. FastqDataChunk is defined as:
typedef core::DataChunk FastqDataChunk;
-
class
rabbit::fa::FastaFileReader
Public Functions
-
FastaFileReader(int fd, FastaDataPool &pool_, uint64 halo = 21, bool isZippedNew = false)
FastaFileReader constructor.
- Parameters
fd: FASTA file descriptor (if the FASTA file is already open)
pool_: Data pool
halo: Halo size
isZippedNew: If true, the input is read with gzopen
-
bool
Eof() const
Returns whether the end of the file has been reached.
-
FastaChunk *
readNextChunk()
Read the next chunk.
- Return
FastaChunk pointer if the next chunk contains data, otherwise NULL
-
FastaChunk *
readNextChunkList()
Read the next listed chunk.
This function makes sure that one FastaChunk (dataPart) contains at least one whole sequence.
- Return
FastaChunk pointer if the next chunk contains data, otherwise NULL
-
void
Close()
Close mFile.
-
int64
Read(byte *memory_, uint64 size_)
Read data from the file.
- Parameters
memory_: Pointer to the buffer that receives the data read from the file
size_: Number of bytes to read
-
class
rabbit::fq::FastqFileReader
Public Functions
-
FastqFileReader(const std::string &fileName_, FastqDataPool &pool_, std::string fileName2_ = "", bool isZippedNew = false)
FastqFileReader constructor.
- Parameters
fileName_: FASTQ file name
pool_: Data pool
fileName2_: Name of the second file if the source is paired-end data
isZippedNew: If true, gzopen is used to read fileName_ and fileName2_
-
FastqFileReader(int fd, FastqDataPool &pool_, int fd2 = -1, bool isZippedNew = false)
FastqFileReader constructor.
- Parameters
fd: FASTQ file descriptor
pool_: Data pool
fd2: Second file descriptor if the source is paired-end data
isZippedNew: If true, gzopen is used to read fd and fd2
-
FastqDataChunk *
readNextChunk()
Read the next chunk.
- Return
FastqDataChunk pointer if the next chunk contains data, otherwise NULL
-
FastqDataPairChunk *
readNextPairChunk()
Read the next paired chunk in parallel (two threads).
- Return
FastqDataPairChunk pointer if the next chunk contains data, otherwise NULL
-
FastqDataPairChunk *
readNextPairChunk1()
Read the next paired chunk in a single thread.
- Return
FastqDataPairChunk pointer if the next chunk contains data, otherwise NULL
-
ChunkFormater
-
namespace
rabbit
-
namespace
fa
Functions
-
string
getSequence(FastaDataChunk *&chunk, uint64 &pos)
Get a sequence from the chunk, starting at the given position.
- Return
New sequence string if pos < chunk size, otherwise an empty string ("")
- Parameters
chunk: The data to get the sequence from
pos: Start position in the chunk data
-
string
getLine(FastaDataChunk *&chunk, uint64 &pos)
Get a new line from the chunk, starting at the given position.
- Return
New line string if pos < chunk size, otherwise an empty string ("")
- Parameters
chunk: The data to get the line from
pos: Start position in the chunk data
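The position-advancing contract described for getLine can be illustrated with a small self-contained sketch. It does not use the library's types: the raw `data`/`size` pair stands in for the bytes of a chunk, and `getLineSketch` is a hypothetical name. It assumes that lines end with '\n' and that `pos` is moved past the terminator; the library's actual handling of terminators may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper, not part of the library.
// Returns the line that starts at pos and advances pos past its '\n'.
// Returns "" when pos is already at or beyond the end of the data.
std::string getLineSketch(const char *data, std::uint64_t size, std::uint64_t &pos) {
    assert(data != nullptr || size == 0);
    if (pos >= size) return "";
    // Look for the next newline in the remaining bytes.
    const void *nl = std::memchr(data + pos, '\n', size - pos);
    std::uint64_t end =
        nl ? static_cast<std::uint64_t>(static_cast<const char *>(nl) - data) : size;
    std::string line(data + pos, end - pos);
    pos = (end < size) ? end + 1 : end;  // skip the terminator if there was one
    return line;
}
```

Calling the helper repeatedly with the same `pos` variable walks through the buffer line by line until it returns an empty string.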
-
int
chunkListFormat(FastaChunk &fachunk, vector<Reference> &refs)
Format FASTA chunks (listed) into a vector of Reference structs.
- Return
Total number of Reference instances in vector refs.
- Parameters
fachunk: Source FASTA chunk data to format
refs: Destination vector to store the results in
-
int
chunkFormat(FastaChunk &fachunk, vector<Reference> &refs)
Format FASTA chunks into a vector of Reference structs.
- Return
Total number of Reference instances in vector refs.
- Parameters
fachunk: Source FASTA chunk data to format
refs: Destination vector to store the results in
-
int
chunkFormat(FastaChunk &fachunk, vector<Reference> &refs, int kmerSize)
Format FASTA chunks into a vector of Reference structs, filtering out sequences shorter than kmerSize.
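The length filter of the kmerSize overload can be sketched without the library. In the example below, `SimpleRef` and `formatWithMinLength` are hypothetical names, plain FASTA text stands in for a FastaChunk, and only the name and sequence are kept; it is an illustration of the filtering rule, not the library's implementation.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified record; the library's Reference has more fields.
struct SimpleRef {
    std::string name;
    std::string seq;
};

// Parse FASTA text and append the records whose sequence has at least
// kmerSize bases. Returns the total number of records in refs.
int formatWithMinLength(const std::string &fasta, std::vector<SimpleRef> &refs,
                        int kmerSize) {
    assert(kmerSize >= 0);
    std::istringstream in(fasta);
    std::string line;
    SimpleRef cur;
    bool open = false;  // true once the first header line has been seen
    auto flush = [&]() {
        if (open && cur.seq.size() >= static_cast<std::size_t>(kmerSize))
            refs.push_back(cur);
    };
    while (std::getline(in, line)) {
        if (!line.empty() && line[0] == '>') {
            flush();  // the previous record is complete
            cur = SimpleRef{line.substr(1), ""};
            open = true;
        } else if (open) {
            cur.seq += line;  // a sequence may span several lines
        }
    }
    flush();  // last record in the text
    return static_cast<int>(refs.size());
}
```

Sequences shorter than a k-mer cannot contribute any k-mer, which is presumably why the overload drops them at formatting time.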
-
Reference
getNextSeq(FastaChunk &fachunk, bool &done, uint64 &pos)
Get the next Reference data.
- Return
A Reference instance starting at pos in fachunk
- Note
Because FASTA data contains no quality or other information beyond the sequence itself, a Sequence in FASTA is equivalent to a Reference in FASTQ.
- Parameters
fachunk: Source FASTA chunk data to format
done: Whether the end of fachunk has been reached
pos: Start position in fachunk to format
-
namespace
fq
Functions
-
void
print_read(neoReference &ref)
-
int
chunkFormat(FastqChunk *fqChunk, std::vector<neoReference> &data)
Format FASTQ chunks into a vector of neoReference structs (no-copy format).
- Return
Total number of neoReference instances in vector data.
- Parameters
fqChunk: Source FASTQ chunk data to format
data: Destination vector to store the results in
mHasQuality: Whether the FASTQ data has quality information (default: true)
-
int
chunkFormat(FastqDataChunk *fqDataChunk, std::vector<neoReference> &data)
-
int
chunkFormat(FastqChunk *fqChunk, std::vector<Reference> &data, bool mHasQuality = true)
Format FASTQ chunks into a vector of Reference structs (copy format).
- Return
Total number of Reference instances in vector data.
- Parameters
fqChunk: Source FASTQ chunk data to format
data: Destination vector to store the results in
mHasQuality: Whether the FASTQ data has quality information (default: true)
-
int
chunkFormat(FastqDataChunk *fqDataChunk, std::vector<Reference> &data, bool mHasQuality = true)
-
string
getLine(FastqDataChunk *&chunk, int &pos)
-
int
neoGetLine(FastqDataChunk *&chunk, uint64_t &pos, uint64_t &len)
-
namespace
std
Sequencing data
-
struct
neoReference
Reference struct that stores FASTA and FASTQ information. Unlike Reference, neoReference records only the start position and length of the name, comment, sequence, strand and quality, relative to the data of a given chunk (base).
Public Members
-
uint64_t
pname
Name offset from base
-
uint64_t
pcom
Comment offset from base
-
uint64_t
pseq
Sequence offset from base
-
uint64_t
pqual
Quality offset from base
-
uint64_t
pstrand
Strand offset from base
-
uint64_t
lname
Length of name
-
uint64_t
lcom
Length of comment
-
uint64_t
lseq
Length of sequence
-
uint64_t
lqual
Length of quality
-
uint64_t
lstrand
Length of strand
-
uint64_t
gid
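The offset-and-length layout of neoReference can be illustrated with a reduced, self-contained sketch. `MiniRef` and `fieldToString` are hypothetical names, only the name and sequence fields are modelled, and the example assumes that the name offset points just past the leading '@' of a FASTQ header, which the reference above does not state.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical reduced version of the struct: name and sequence only.
struct MiniRef {
    std::uint64_t pname, lname;  // name offset from base, and its length
    std::uint64_t pseq, lseq;    // sequence offset from base, and its length
};

// Materialise one field as a string. The record itself holds no text;
// a copy is made only here, when the caller asks for it.
std::string fieldToString(const char *base, std::uint64_t offset, std::uint64_t len) {
    assert(base != nullptr);
    return std::string(base + offset, static_cast<std::size_t>(len));
}
```

Because each record is just a handful of integers, formatting a chunk this way avoids copying the read data, at the cost of keeping the chunk alive for as long as the records are in use.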
DataPool
-
template<class _TDataType>
class rabbit::core::TDataPool
DataPool class. This class provides a data pool for reusing memory space.
Public Functions
-
TDataPool(uint32 maxPartNum_ = DefaultMaxPartNum, uint32 bufferPartSize_ = DefaultBufferPartSize)
Constructor.
- Parameters
maxPartNum_: Maximum number of parts held in the DataPool
bufferPartSize_: Size of each part in bytes (e.g. 1<<22 gives 4 MB per part)
-
void
Acquire(DataType *&part_)
Acquire a DataType part from the DataPool and assign it to part_.
The part is taken from availablePartsPool; if no part is available there, the caller waits.
- Parameters
part_: Pointer that receives the acquired space
-
void
Release(const DataType *part_)
Release data to the DataPool.
Returns the data in part_ to availablePartsPool and notifies waiters.
- Parameters
part_: Pointer to the data to be released to the DataPool
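The Acquire/Release behaviour described above (wait when nothing is available, notify on release) can be sketched with a mutex and a condition variable. `MiniPool` below is a hypothetical minimal pool, not the library's TDataPool: it takes only a part count, its Release takes a non-const pointer, and `availableCount` exists purely for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical minimal pool; T must be default-constructible.
template <class T>
class MiniPool {
public:
    explicit MiniPool(std::size_t maxParts) : storage_(maxParts) {
        for (auto &part : storage_) available_.push_back(&part);
    }

    // Block until a part is available, then hand it out through part.
    void Acquire(T *&part) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !available_.empty(); });
        part = available_.back();
        available_.pop_back();
    }

    // Return a part to the pool and wake one waiting thread.
    void Release(T *part) {
        assert(part != nullptr);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            available_.push_back(part);
        }
        cv_.notify_one();
    }

    // Number of parts that can currently be acquired without waiting.
    std::size_t availableCount() {
        std::lock_guard<std::mutex> lock(mutex_);
        return available_.size();
    }

private:
    std::vector<T> storage_;      // parts are allocated once and reused
    std::vector<T *> available_;  // parts currently free to acquire
    std::mutex mutex_;
    std::condition_variable cv_;
};
```

In a producer/consumer pipeline, the reader thread would acquire a part before filling it and the worker would release it after formatting, so the number of parts bounds the memory in flight.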
-