This webpage makes available a prototype implementation of the e-CCC-Biclustering algorithm coded in Java together with the dataset and examples used in the paper:

Sara C. Madeira and Arlindo L. Oliveira, "A polynomial time biclustering algorithm for finding genes with approximate expression patterns in gene expression time series", Algorithms for Molecular Biology 2009, 4:8 (4 June 2009). [DOI Article Link]

## Synthetic

## Real

- Heat Stress [.txt]

## Synthetic

- Illustrative example
  - Maximal CCC-Biclusters with at least two rows [.txt]
  - Maximal 1-CCC-Biclusters with at least three rows/columns [.txt]
  - Maximal 1-CCC-Biclusters with at least three rows/columns and errors restricted to the 1-neighborhood of the alphabet {D,N,U} [.txt]
- Illustrative example with missing values
  - Maximal 1-CCC-Biclusters with at least three rows/columns when missing values are considered as valid errors [.txt]
  - Maximal 1-CCC-Biclusters with at least three rows and two columns when missing values are "jumped over" [.txt]
  - Maximal 1-CCC-Biclusters with sign-changes with at least three rows/columns when missing values are "jumped over" [.txt]

## Real

- e-CCC-Biclustering
  - Sorted by statistical significance p-value [.txt]
  - Sorted by statistical significance p-value, with p-values not passing the statistical test at the 1% level (after Bonferroni correction) filtered out [.txt]
  - Sorted by statistical significance p-value, with p-values not passing the statistical test at the 1% level (after Bonferroni correction) and similarities above 25% filtered out [.txt]
  - Biological validation details
- CCC-Biclustering
  - Sorted by statistical significance p-value [.txt]
  - Sorted by statistical significance p-value, with p-values not passing the statistical test at the 1% level (after Bonferroni correction) filtered out [.txt]
  - Sorted by statistical significance p-value, with p-values not passing the statistical test at the 1% level (after Bonferroni correction) and similarities above 25% filtered out [.txt]
  - Biological validation details
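
The filtered result files keep only the biclusters whose p-value passes a test at the 1% level after Bonferroni correction. The sketch below illustrates that kind of filter. It assumes the usual form of the correction (the significance level is divided by the number of p-values tested) and a strict comparison. The class and method names are hypothetical and are not part of the distributed .jar files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the distributed .jar files.
final class BonferroniFilterSketch {

    // Keeps the p-values that pass the test at level alpha after Bonferroni
    // correction, assumed here to mean p < alpha / (number of p-values tested).
    static List<Double> significant(List<Double> pValues, double alpha) {
        List<Double> kept = new ArrayList<>();
        if (pValues.isEmpty()) {
            return kept;
        }
        double threshold = alpha / pValues.size();
        for (double p : pValues) {
            if (p < threshold) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Four tests at the 1% level: the corrected threshold is 0.01 / 4 = 0.0025.
        List<Double> pValues = Arrays.asList(1e-6, 0.002, 0.004, 0.3);
        System.out.println(significant(pValues, 0.01));
    }
}
```

With four p-values the corrected threshold is 0.0025, so only the first two values in the example are kept.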

The software available here allows the reproduction of the results in the paper and also the execution of the e-CCC-Biclustering algorithm using a gene expression matrix provided by the user. The gene expression matrix must be a .txt file formatted as in the examples provided below.

The algorithm is coded in **Java**. Before running the examples below please make sure the version of

In order to run the algorithm copy the **.jar** file together with the

If you have any questions please contact Sara C. Madeira.

## Reproduce Results in the Paper

- Heat Stress Data

- CCC-Biclustering: follow the instructions on the CCC-Biclustering webpage.

- e-CCC-Biclustering: [.jar] [heat_stress.txt]
java -jar -Xss50M -Xms1024M -Xmx1024M Test_AMB_Heat_Stress.jar

## Run e-CCC-Biclustering with Other Datasets and Options

- Running standard e-CCC-Biclustering (no extensions)
Copy the following [.jar] file into the directory where you want to run the algorithm, together with the .txt file containing your expression matrix.

Then type the command below in your command line, replacing the 5 parameters with the values of your choice.

java -jar -Xss50M -Xms1024M -Xmx1024M Test_AMB_E_CCC_Biclustering.jar yourExpressionMatrix.txt maxErrors rowQuorum columnQuorum overlapping

yourExpressionMatrix.txt - name of the .txt file containing your expression matrix

maxErrors - integer containing the number of errors allowed, per gene, in the e-CCC-Biclustering algorithm (value of e)

rowQuorum - integer containing the row quorum (minimum number of genes allowed in e-CCC-Biclusters)

columnQuorum - integer containing the column quorum (minimum number of contiguous time points allowed in e-CCC-Biclusters)

overlapping - float in [0,1] containing the maximum percentage of overlapping allowed (all e-CCC-Biclusters overlapping more than this value are filtered)
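
This page does not say how the overlapping percentage between two e-CCC-Biclusters is measured. As an illustration only, the sketch below assumes a Jaccard index computed over the matrix cells (gene, time point) covered by each bicluster. Because the columns of an e-CCC-Bicluster are contiguous, each bicluster is represented by a set of row indices and an inclusive column range. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; the measure used by the tool may differ.
final class OverlapSketch {

    // Jaccard index (intersection size / union size) over the matrix cells of
    // two biclusters, each given as a row set and an inclusive column range.
    static double jaccard(Set<Integer> rowsA, int firstColA, int lastColA,
                          Set<Integer> rowsB, int firstColB, int lastColB) {
        Set<Integer> sharedRows = new HashSet<>(rowsA);
        sharedRows.retainAll(rowsB);
        int sharedCols = Math.max(0,
                Math.min(lastColA, lastColB) - Math.max(firstColA, firstColB) + 1);
        long intersection = (long) sharedRows.size() * sharedCols;
        long sizeA = (long) rowsA.size() * (lastColA - firstColA + 1);
        long sizeB = (long) rowsB.size() * (lastColB - firstColB + 1);
        long union = sizeA + sizeB - intersection;
        return union == 0 ? 0.0 : (double) intersection / union;
    }

    public static void main(String[] args) {
        Set<Integer> rowsA = new HashSet<>(Arrays.asList(1, 2, 3));
        Set<Integer> rowsB = new HashSet<>(Arrays.asList(2, 3, 4));
        // A covers columns 0..2 and B covers columns 1..3: 4 shared cells out of 14.
        double overlap = jaccard(rowsA, 0, 2, rowsB, 1, 3);
        double maxOverlap = 0.25; // the "overlapping" parameter
        System.out.println(overlap > maxOverlap ? "filtered" : "kept");
    }
}
```

Under this assumed measure, the two biclusters in the example share 4 of 14 cells (about 29%), which exceeds an overlapping value of 0.25.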

For example, if you want to use a matrix in the file matrix.txt, with e=1, row quorum = 3 and column quorum = 2, and filter e-CCC-Biclusters overlapping more than 25%, you should type and execute the following command:

java -jar -Xss50M -Xms1024M -Xmx1024M Test_AMB_E_CCC_Biclustering.jar matrix.txt 1 3 2 0.25
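
If the command is launched from another Java program, the five parameters can be checked before starting the process. The sketch below builds the same command line as a list of arguments. The .jar name and JVM options are the ones given on this page. The class name and the range checks (non-negative error count, quorums of at least 1) are assumptions made for illustration, not documented behaviour of the tool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher helper; not part of the distributed .jar files.
final class StandardRunSketch {

    // Validates the 5 parameters and returns the command line as a list.
    static List<String> buildCommand(String matrixFile, int maxErrors,
                                     int rowQuorum, int columnQuorum,
                                     double overlapping) {
        if (!matrixFile.endsWith(".txt")) {
            throw new IllegalArgumentException("expression matrix must be a .txt file");
        }
        if (maxErrors < 0) {
            throw new IllegalArgumentException("maxErrors must not be negative");
        }
        if (rowQuorum < 1 || columnQuorum < 1) {
            throw new IllegalArgumentException("quorums must be at least 1");
        }
        // Written this way so that NaN is rejected as well.
        if (!(overlapping >= 0.0 && overlapping <= 1.0)) {
            throw new IllegalArgumentException("overlapping must be in [0,1]");
        }
        List<String> cmd = new ArrayList<>(Arrays.asList(
                "java", "-jar", "-Xss50M", "-Xms1024M", "-Xmx1024M",
                "Test_AMB_E_CCC_Biclustering.jar", matrixFile));
        cmd.add(Integer.toString(maxErrors));
        cmd.add(Integer.toString(rowQuorum));
        cmd.add(Integer.toString(columnQuorum));
        cmd.add(Double.toString(overlapping));
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand("matrix.txt", 1, 3, 2, 0.25);
        System.out.println(String.join(" ", cmd));
        // To run it: new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Passing the list to `ProcessBuilder` avoids quoting problems when the matrix file name contains spaces.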

- Running extended e-CCC-Biclustering
Copy the following [.jar] file into the directory where you want to run the algorithm, together with the .txt file containing your expression matrix.

Then type the command below in your command line and replace the 8 parameters with the values of your choice.

java -jar -Xss50M -Xms1024M -Xmx1024M Test_AMB_E_CCC_Biclustering_Extensions.jar yourExpressionMatrix.txt maxErrors rowQuorum columnQuorum overlapping missings anticorrelation restrictedErrors

yourExpressionMatrix.txt - name of the .txt file containing your expression matrix

maxErrors - integer containing the number of errors allowed, per gene, in the e-CCC-Biclustering algorithm (value of e)

rowQuorum - integer containing the row quorum (minimum number of genes allowed in e-CCC-Biclusters)

columnQuorum - integer containing the column quorum (minimum number of contiguous time points allowed in e-CCC-Biclusters)

overlapping - float in [0,1] containing the maximum percentage of overlapping allowed (all e-CCC-Biclusters overlapping more than this value are filtered)

missings - char with three possible values:

R - remove genes with missing values

A - allow missing values as valid errors

J - "jump over" missing values

anticorrelation - char with two possible values:

N - no anticorrelation allowed

Y - anticorrelation allowed; the algorithm will look for e-CCC-Biclusters with Sign-Changes

restrictedErrors - char with two possible values:

N - errors are not restricted

Y - errors are restricted to the symbols in the 1-neighborhood of the symbols in the alphabet. Since the alphabet {D,N,U} is used in the predefined discretization step provided in this version of the prototype, the number of neighbors used in the restricted errors extension can only be equal to 1.
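
The restricted errors option can be illustrated with a small sketch. It assumes that the alphabet is ordered D < N < U and that the 1-neighborhood of a symbol contains the symbols exactly one position away in that order, so D may only be replaced by N, U only by N, and N by either D or U. With three symbols, a neighborhood of size 2 would cover the whole alphabet, which would be the same as not restricting errors. The class and method names are hypothetical.

```java
// Hypothetical helper for illustration; not part of the distributed .jar files.
final class RestrictedErrorSketch {

    // Assumed ordering of the discretization alphabet {D,N,U}.
    static final String ALPHABET = "DNU";

    // Returns true if replacing 'expected' by 'observed' is an error allowed
    // under the 1-neighborhood restriction (symbols one position apart).
    static boolean isAllowedError(char expected, char observed) {
        int i = ALPHABET.indexOf(expected);
        int j = ALPHABET.indexOf(observed);
        if (i < 0 || j < 0) {
            throw new IllegalArgumentException("symbols must be D, N or U");
        }
        return Math.abs(i - j) == 1;
    }

    public static void main(String[] args) {
        for (char expected : ALPHABET.toCharArray()) {
            for (char observed : ALPHABET.toCharArray()) {
                if (expected != observed) {
                    System.out.println(expected + " -> " + observed + ": "
                            + isAllowedError(expected, observed));
                }
            }
        }
    }
}
```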

For example, if you want to use a matrix in the file matrix.txt, with e=1, row quorum = 3 and column quorum = 2, filter e-CCC-Biclusters overlapping more than 25%, "jump over" missing values, allow anticorrelation and restrict the errors to the 1-neighborhood of the symbols in the alphabet, you should type and execute the following command:

java -jar -Xss50M -Xms1024M -Xmx1024M Test_AMB_E_CCC_Biclustering_Extensions.jar matrix.txt 1 3 2 0.25 J Y Y

The e-CCC-Biclustering algorithm and its extended version are integrated in the software BiGGEsTS (Biclustering Gene Expression Time Series), a free and open source software tool providing an integrated environment for the biclustering analysis of time series gene expression data. This software enables user-friendly use of the algorithm in a graphical environment, together with the possibility to preprocess the data and to postprocess and analyse the results using several criteria.

Last Update: July 2009