API Reference

FastxChunk

struct FastaChunk
struct rabbit::fq::FastqChunk

Public Members

FastqDataChunk *chunk

chunk data

FastqDataChunk is defined as:

typedef core::DataChunk FastqDataChunk;

struct rabbit::fq::FastqPairChunk

Public Members

FastqDataPairChunk *chunk

chunk data

class rabbit::fa::FastaFileReader

Public Functions

FastaFileReader(int fd, FastaDataPool &pool_, uint64 halo = 21, bool isZippedNew = false)

FastaFileReader Constructor.

Parameters
  • fd: Fasta file descriptor (if fasta file is opened)

  • pool_: Data pool

  • halo: halo size

  • isZippedNew: if true, gzopen is used to read the file

bool Eof() const

Returns true if the end of the file has been reached.

FastaChunk *readNextChunk()

Read the next chunk.

Return

FastaChunk pointer if the next chunk contains data; otherwise NULL

FastaChunk *readNextChunkList()

Read the next listed chunk.

This function ensures that each FastaChunk (dataPart) contains at least one whole sequence.

Return

FastaChunk pointer if the next chunk contains data; otherwise NULL

void Close()

Close the underlying file (mFile).

int64 Read(byte *memory_, uint64 size_)

Read data from the file.

Parameters
  • memory_: pointer to the buffer that receives the data read

  • size_: number of bytes to read

class rabbit::fq::FastqFileReader

Public Functions

FastqFileReader(const std::string &fileName_, FastqDataPool &pool_, std::string fileName2_ = "", bool isZippedNew = false)

FastqFileReader Constructor.

Parameters
  • fileName_: Fastq file name

  • pool_: Data pool

  • fileName2_: the second file name if the source is paired-end sequence data

  • isZippedNew: if true, gzopen is used to read fileName_ and fileName2_

FastqFileReader(int fd, FastqDataPool &pool_, int fd2 = -1, bool isZippedNew = false)

FastqFileReader Constructor.

Parameters
  • fd: Fastq file descriptor

  • pool_: Data pool

  • fd2: the second file descriptor if the source is paired-end sequence data

  • isZippedNew: if true, gzopen is used to read fd and fd2

FastqDataChunk *readNextChunk()

Read the next chunk.

Return

FastqDataChunk pointer if the next chunk contains data; otherwise NULL

FastqDataPairChunk *readNextPairChunk()

Read the next paired chunk in parallel (two threads).

Return

FastqDataPairChunk pointer if the next chunk contains data; otherwise NULL

FastqDataPairChunk *readNextPairChunk1()

Read the next paired chunk in a single thread.

Return

FastqDataPairChunk pointer if the next chunk contains data; otherwise NULL
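
The readers above signal the end of input by returning NULL, so a caller typically loops until that happens. The following is a minimal, self-contained sketch of that loop. `SketchChunk`, `SketchReader` and `sketchCountBytes` are hypothetical stand-ins invented for illustration; they read from an in-memory string rather than a file and do not use a data pool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a chunk; not the library's FastqDataChunk.
struct SketchChunk {
    std::string data;
};

// Hypothetical stand-in for a reader that hands out fixed-size chunks.
class SketchReader {
public:
    SketchReader(std::string text, std::size_t chunkSize)
        : text_(std::move(text)), chunkSize_(chunkSize), pos_(0) {
        assert(chunkSize_ > 0);
    }

    // Return the next chunk, or nullptr once the input is exhausted.
    SketchChunk *readNextChunk() {
        if (pos_ >= text_.size()) return nullptr;
        std::size_t n = std::min(chunkSize_, text_.size() - pos_);
        SketchChunk *chunk = new SketchChunk{text_.substr(pos_, n)};
        pos_ += n;
        return chunk;
    }

private:
    std::string text_;
    std::size_t chunkSize_;
    std::size_t pos_;
};

// Drain the reader, processing each chunk until nullptr signals the end.
std::size_t sketchCountBytes(SketchReader &reader) {
    std::size_t total = 0;
    while (SketchChunk *chunk = reader.readNextChunk()) {
        total += chunk->data.size();
        delete chunk;  // in this sketch the caller owns the chunk
    }
    return total;
}
```

With the documented API, a chunk would presumably be handed back to its data pool after processing rather than deleted; that is an assumption based on the DataPool description, not something this section states.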

ChunkFormater

namespace rabbit
namespace fa

Functions

string getSequence(FastaDataChunk *&chunk, uint64 &pos)

Get a sequence from the chunk, starting at pos.

Return

New sequence string if pos < chunk size; otherwise an empty string ("")

Parameters
  • chunk: The data to get the sequence from

  • pos: start position in the chunk data

string getLine(FastaDataChunk *&chunk, uint64 &pos)

Get a new line from the chunk, starting at pos.

Return

New line string if pos < chunk size; otherwise an empty string ("")

Parameters
  • chunk: The data to get the line from

  • pos: start position in the chunk data
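
The cursor-style contract described for getLine (return the line at pos, or an empty string once pos reaches the chunk size) can be sketched as follows. `SketchChunk` and `sketchGetLine` are hypothetical names, and the sketch assumes that pos is advanced past the line terminator, which the by-reference parameter suggests but the text above does not state.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a data chunk: a byte buffer plus its valid size.
struct SketchChunk {
    std::string data;
    std::uint64_t size;
};

// Return the line starting at pos and advance pos past the '\n'.
// Returns "" once pos has reached the end of the chunk.
std::string sketchGetLine(const SketchChunk &chunk, std::uint64_t &pos) {
    assert(chunk.size <= chunk.data.size());
    if (pos >= chunk.size) return "";
    std::uint64_t start = pos;
    while (pos < chunk.size && chunk.data[pos] != '\n') ++pos;
    std::string line = chunk.data.substr(start, pos - start);
    if (pos < chunk.size) ++pos;  // skip the line terminator
    return line;
}
```

Note that in this sketch an empty line inside the chunk also yields "", so a caller that needs to tell the two cases apart should compare pos with the chunk size.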

int chunkListFormat(FastaChunk &fachunk, vector<Reference> &refs)

Format FASTA chunks (listed) into a vector of Reference structs.

Return

Total number of Reference instances in the vector refs.

Parameters
  • fachunk: Source FASTA chunk data to format

  • refs: Destination vector to store the results in

int chunkFormat(FastaChunk &fachunk, vector<Reference> &refs)

Format FASTA chunks into a vector of Reference structs.

Return

Total number of Reference instances in the vector refs.

Parameters
  • fachunk: Source FASTA chunk data to format

  • refs: Destination vector to store the results in

int chunkFormat(FastaChunk &fachunk, vector<Reference> &refs, int kmerSize)

Format FASTA chunks into a vector of Reference structs, filtering out sequences shorter than kmerSize.

Return

Total number of Reference instances in the vector refs.

Parameters
  • fachunk: Source FASTA chunk data to format

  • refs: Destination vector to store the results in

  • kmerSize: Formatted References whose sequence length is less than kmerSize are dropped
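
The length filter described for this overload can be illustrated with a small stand-alone parser. This is only a sketch under stated assumptions: `SketchRef` and `sketchFormat` are hypothetical, the input is a plain string rather than a FastaChunk, and the record keeps only a name, a sequence and a length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, trimmed-down record; the documented Reference has more fields.
struct SketchRef {
    std::string name;
    std::string seq;
    std::uint64_t length;
};

// Parse FASTA text into records, dropping sequences shorter than kmerSize.
// Returns the number of records stored in refs.
int sketchFormat(const std::string &fasta, std::vector<SketchRef> &refs, int kmerSize) {
    assert(kmerSize >= 0);
    SketchRef cur;
    bool open = false;  // true once a header line has been seen
    auto flush = [&]() {
        if (open && cur.seq.size() >= static_cast<std::size_t>(kmerSize)) {
            cur.length = cur.seq.size();
            refs.push_back(cur);
        }
    };
    std::size_t pos = 0;
    while (pos < fasta.size()) {
        std::size_t end = fasta.find('\n', pos);
        if (end == std::string::npos) end = fasta.size();
        std::string line = fasta.substr(pos, end - pos);
        pos = end + 1;
        if (!line.empty() && line[0] == '>') {
            flush();  // finish the previous record before starting a new one
            cur = SketchRef{line.substr(1), "", 0};
            open = true;
        } else if (open) {
            cur.seq += line;  // a sequence may span several lines
        }
    }
    flush();
    return static_cast<int>(refs.size());
}
```

For example, with kmerSize set to 4, a two-base record is skipped while longer records, including ones split across lines, are kept.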

Reference getNextSeq(FastaChunk &fachunk, bool &done, uint64 &pos)

Get the next Reference.

Return

A Reference instance starting at pos in fachunk

Note

Because FASTA data contains no quality or other information beyond the sequence itself, a Sequence in FASTA is equivalent to a Reference in FASTQ.

Parameters
  • fachunk: Source FASTA chunk data to format

  • done: Set to true when the end of fachunk is reached

  • pos: Start position in fachunk to format

namespace fq

Functions

void print_read(neoReference &ref)
int chunkFormat(FastqChunk *fqChunk, std::vector<neoReference> &data)

Format FASTQ chunks into a vector of neoReference structs (no-copy format).

Return

Total number of neoReference instances in the vector data.

Parameters
  • fqChunk: Source FASTQ chunk data to format

  • data: Destination vector to store the results in

  • mHasQuality: Whether the FASTQ data has quality information (default: true)

int chunkFormat(FastqDataChunk *fqDataChunk, std::vector<neoReference> &data)
int chunkFormat(FastqChunk *fqChunk, std::vector<Reference> &data, bool mHasQuality = true)

Format FASTQ chunks into a vector of Reference structs (copy format).

Return

Total number of Reference instances in the vector data.

Parameters
  • fqChunk: Source FASTQ chunk data to format

  • data: Destination vector to store the results in

  • mHasQuality: Whether the FASTQ data has quality information (default: true)

int chunkFormat(FastqDataChunk *fqDataChunk, std::vector<Reference> &data, bool mHasQuality = true)
string getLine(FastqDataChunk *&chunk, int &pos)
int neoGetLine(FastqDataChunk *&chunk, uint64_t &pos, uint64_t &len)
namespace std

Sequencing data

struct Reference

Reference struct that stores FASTA and FASTQ information.

Public Members

std::string name
std::string comment
std::string seq
std::string quality
std::string strand
uint64_t length
uint64_t gid
struct neoReference

Reference struct that stores FASTA and FASTQ information. Unlike Reference, neoReference records only the start position and length of the name, comment, sequence, strand and quality, relative to the base of a given chunk's data.

Public Members

uint64_t pname

name offset from base

uint64_t pcom

comment offset from base

uint64_t pseq

sequence offset from base

uint64_t pqual

quality offset from base

uint64_t pstrand

strand offset from base

uint64_t lname

length of name

uint64_t lcom

length of comment

uint64_t lseq

length of sequence

uint64_t lqual

length of quality

uint64_t lstrand

length of strand

uint64_t gid

global id

rabbit::byte *base

base address of the chunk data that the offsets refer to
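
The no-copy idea behind neoReference, storing offsets and lengths into a shared buffer instead of owning strings, can be sketched as below. `SketchNeoRef` and `sketchField` are hypothetical, only two of the fields are modelled, and the offsets used in the usage note are illustrative rather than what the library's parser would necessarily produce.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical subset of neoReference: offsets and lengths into a shared buffer.
struct SketchNeoRef {
    std::uint64_t pname, pseq;  // offsets from base
    std::uint64_t lname, lseq;  // field lengths in bytes
    const char *base;           // start of the chunk data
};

// Materialize a field only when needed; the record itself owns no string data.
std::string sketchField(const SketchNeoRef &r, std::uint64_t offset, std::uint64_t len) {
    assert(r.base != nullptr);
    return std::string(r.base + offset, len);
}
```

Given the buffer "@r1\nACGT\n+\nIIII\n", a record with pname = 1, lname = 2, pseq = 4 and lseq = 4 yields the name "r1" and the sequence "ACGT". Because the record only points into the chunk, it stays valid only as long as the chunk's buffer does.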

typedef Reference OneSeqInfo

Information for one sequence, only for FASTA data.

typedef std::vector<Reference> SeqInfos

Information for multiple sequences, only for FASTA data.

DataPool

template<class _TDataType>
class rabbit::core::TDataPool

DataPool class. This class provides a data pool for reusing memory space.

Public Functions

TDataPool(uint32 maxPartNum_ = DefaultMaxPartNum, uint32 bufferPartSize_ = DefaultBufferPartSize)

Constructor.

Parameters
  • maxPartNum_: the maximum number of parts held in the DataPool.

  • bufferPartSize_: size in bytes of each part in the DataPool (e.g. 1<<22 means 4 MB per part)

void Acquire(DataType *&part_)

Acquire a DataType object from the DataPool and assign it to part_.

The object is taken from availablePartsPool; if none is available, the call waits until one is released.

Parameters
  • part_: receives the pointer to the acquired space

void Release(const DataType *part_)

Release data to the DataPool.

Returns the data in part_ to availablePartsPool and notifies waiting callers.

Parameters
  • part_: the pointer to be released to the DataPool
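
The Acquire/Release behaviour described above (block when nothing is available, notify on release) is commonly built from a mutex and a condition variable. The following is a minimal sketch of that pattern, not the library's implementation: `SketchPool` and its `Available` helper are hypothetical, and Release here takes a non-const pointer for simplicity.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical minimal pool: a fixed set of parts handed out and returned.
template <class T>
class SketchPool {
public:
    explicit SketchPool(std::size_t maxParts) {
        for (std::size_t i = 0; i < maxParts; ++i) {
            storage_.push_back(std::unique_ptr<T>(new T()));
            available_.push_back(storage_.back().get());
        }
    }

    // Block until a part is free, then hand it to the caller.
    void Acquire(T *&part) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !available_.empty(); });
        part = available_.back();
        available_.pop_back();
    }

    // Return a part to the pool and wake one waiting Acquire call.
    void Release(T *part) {
        assert(part != nullptr);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            available_.push_back(part);
        }
        cv_.notify_one();
    }

    // Number of parts currently free (helper for this sketch only).
    std::size_t Available() {
        std::lock_guard<std::mutex> lock(mutex_);
        return available_.size();
    }

private:
    std::vector<std::unique_ptr<T>> storage_;  // owns the parts
    std::vector<T *> available_;               // parts not currently in use
    std::mutex mutex_;
    std::condition_variable cv_;
};
```

In a producer/consumer pipeline the reader thread would call Acquire before filling a part and a worker thread would call Release once it has finished with it, which bounds memory use to the number of parts in the pool.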

DataQueue

Buffer

struct rabbit::core::DataChunk

rabbitio chunk data wrapper

Public Members

Buffer data

chunk data

uint64 size

chunk size

DataChunk *next = NULL

list pointer used to maintain all sequence chunks in one part

Public Static Attributes

const uint64 DefaultBufferSize = 1 << 22

default swap buffer size

class rabbit::core::Buffer

buffer to store chunk data

Public Functions

byte *Pointer() const

return a pointer to the buffer

void Extend(uint64 size_, bool copy_ = false)

resize the buffer
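
The text above does not say what copy_ does. One plausible reading, assumed here, is that it controls whether the existing bytes are carried over into the resized buffer. The sketch below illustrates that reading with a hypothetical `SketchBuffer` class; its `Size` accessor and grow-only restriction are inventions of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical stand-in for a growable byte buffer.
class SketchBuffer {
public:
    explicit SketchBuffer(std::uint64_t size)
        : data_(new unsigned char[size]()), size_(size) {}

    unsigned char *Pointer() const { return data_.get(); }
    std::uint64_t Size() const { return size_; }

    // Grow to size bytes; keep the old contents only when copy is true.
    // Assumption: this is what copy_ means in the documented Extend.
    void Extend(std::uint64_t size, bool copy = false) {
        assert(size > size_);  // this sketch only grows
        std::unique_ptr<unsigned char[]> grown(new unsigned char[size]());
        if (copy) std::memcpy(grown.get(), data_.get(), size_);
        data_ = std::move(grown);
        size_ = size;
    }

private:
    std::unique_ptr<unsigned char[]> data_;
    std::uint64_t size_;
};
```

Skipping the copy is cheaper when the caller is about to overwrite the whole buffer anyway, which may be why the flag defaults to false.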