NAME

runWolfPsortHtmlTables - Run WoLF PSORT subcellular localization prediction on input sequences and write relatively detailed html files as output.


SYNOPSIS

runWolfPsortHtmlTables [OPTIONS] organismType outputDir [queryName]

runWolfPsortHtmlTables (--usage|--help|--man)

Pipe sequences in from standard input.


DESCRIPTION

Run WoLF PSORT subcellular localization prediction on input sequences and write results as html to files under the outputDir directory.

Input is expected in fasta format on standard input.

The summary output is written to the file outputDir/htdocs/results/queryName.html. The output looks roughly like

  seq1 details extr_plas: 11.5, plas: 11, extr: 10, E.R.: 4, lyso: 4, pero: 1.5, cyto_pero: 1.5, vacu: 1
  seq2 details extr: 25, lyso: 3, plas: 2, nucl: 1, E.R.: 1
  seq3 details extr: 31, lyso: 1

Tables showing the values of each localization feature for the query and nearest neighbor sequences can be found by following the ``details'' link.

Each line contains several localization classes with their scores. The localization classes are:

        abbrev.  site                          GO cellular component number
        extr     extracellular                 0005576, 0005618
        cysk     cytoskeleton                  0005856
        cyto     cytosol (sans cytoskeleton)   0005829
        E.R.     endoplasmic reticulum         0005783
        golg     Golgi apparatus               0005794
        mito     mitochondria                  0005739
        nucl     nucleus                       0005634
        plas     plasma membrane               0005886
        pero     peroxisome                    0005777
        vacu     vacuolar membrane             0005774
        chlo     chloroplast                   0009507, 0009543
        lyso     lysosome                      0005764

The GO cellular component number is given here, but most entries in our current dataset are actually based on UniProt and depend on its annotation. Localization classes containing underscores indicate the possibility of localizing to two sites; for example, ``cyto_nucl'' indicates proteins which can localize to both the cytosol and the nucleus. No distinction is made between conditional and constitutive dual localization.
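As an illustration of the line format and class abbreviations described above, here is a minimal Python sketch that parses one summary line and expands class labels into site names. It assumes the text of the line has already been extracted from the html page and that the sequence name does not itself contain the string `` details ''. The function names parse_summary_line and expand_class are hypothetical and are not part of WoLF PSORT.

```python
# Abbreviation -> site name, following the localization class table.
# "lyso" is mapped to lysosome, matching GO cellular component 0005764.
SITE_NAMES = {
    "extr": "extracellular",
    "cysk": "cytoskeleton",
    "cyto": "cytosol",
    "E.R.": "endoplasmic reticulum",
    "golg": "Golgi apparatus",
    "mito": "mitochondria",
    "nucl": "nucleus",
    "plas": "plasma membrane",
    "pero": "peroxisome",
    "vacu": "vacuolar membrane",
    "chlo": "chloroplast",
    "lyso": "lysosome",
}


def parse_summary_line(line):
    """Split 'name details class: score, ...' into (name, [(class, score), ...])."""
    # Assumption: the name and the scores are separated by the word "details".
    name, _, rest = line.strip().partition(" details ")
    scores = []
    for item in rest.split(","):
        # rpartition keeps abbreviations that contain dots, such as "E.R.", intact.
        label, _, value = item.strip().rpartition(":")
        scores.append((label.strip(), float(value)))
    return name, scores


def expand_class(label):
    """Expand a class label; an underscore marks dual localization (two sites)."""
    return [SITE_NAMES.get(part, part) for part in label.split("_")]


name, scores = parse_summary_line("seq2 details extr: 25, lyso: 3, plas: 2, nucl: 1, E.R.: 1")
for label, score in scores:
    print(name, expand_class(label), score)
```

Unknown abbreviations are passed through unchanged rather than rejected, since the set of classes may differ between datasets.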


OPTIONS

-n, --just-print
Print the commands that would be executed without actually executing them. Mainly useful for debugging. Mnemonic: like make -n.

-p, --preserve-temporary-files
Do not remove temporary files that are generated (and normally deleted) during processing.

--no-classical-psort-prediction
Suppress classical PSORT II kNN prediction. Skip the last step of running classical PSORT II, in which standard kNN is used to make a localization prediction. This prediction is redundant with the (also kNN based) main WoLF PSORT prediction, and it is based on less data than the WoLF PSORT prediction. In the future this may become the default behavior.

--no-classical-psort-verbose-output
Skip PSORT II verbose output entirely.


ARGUMENTS

organismType
Type of the organism. Currently supported organism types are: ``animal'', ``plant'', and ``fungi''. This determines which dataset is used for the prediction. Note that although the results may not be interesting, the software does not care if the organism type matches the actual organism of the protein.

outputDir
Directory in which to write output files. This directory should exist before you run this command.

[queryName]
Sequence name to use for the query sequence. Defaults to ``query''.


EXAMPLES

runWolfPsortHtmlTables animal outdir < hamster.fasta
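The same invocation can be driven from a script. The following Python sketch builds the argument list from the synopsis and pipes a fasta file to standard input. It assumes runWolfPsortHtmlTables is on the PATH and that outputDir already exists; build_command and run_on_fasta are hypothetical helper names, and the exit status conventions of the script are not documented here, so the return code is simply handed back to the caller.

```python
import subprocess

# Organism types listed under ARGUMENTS.
ORGANISM_TYPES = ("animal", "plant", "fungi")


def build_command(organism_type, output_dir, query_name=None, just_print=False,
                  program="runWolfPsortHtmlTables"):
    """Return the argument list: program [OPTIONS] organismType outputDir [queryName]."""
    if organism_type not in ORGANISM_TYPES:
        raise ValueError("unsupported organism type: %r" % (organism_type,))
    argv = [program]
    if just_print:
        argv.append("--just-print")  # print commands without executing them
    argv += [organism_type, output_dir]
    if query_name is not None:
        argv.append(query_name)  # otherwise the script defaults to "query"
    return argv


def run_on_fasta(fasta_path, organism_type, output_dir, **options):
    """Run the script with the fasta file connected to standard input."""
    with open(fasta_path, "rb") as fasta:
        completed = subprocess.run(
            build_command(organism_type, output_dir, **options), stdin=fasta)
    return completed.returncode


# Equivalent to: runWolfPsortHtmlTables animal outdir < hamster.fasta
print(build_command("animal", "outdir"))
```

Passing the arguments as a list avoids shell quoting issues when the output directory or query name contains spaces.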


FILES

../data/animal.psort
../data/fungi.psort
../data/plant.psort
Dataset sequence data with localization site labels

../data/animal.wolff
../data/fungi.wolff
../data/plant.wolff
Dataset localization feature values

../data/animal.wolfw
../data/fungi.wolfw
../data/plant.wolfw
Feature weights

../data/animal.wolfu
../data/fungi.wolfu
../data/plant.wolfu
Utility matrix. Stipulates the value of predicting a protein of localization class A to be of class B.

OUTPUT FILES

In this section, scriptDir denotes the directory in which this script resides, and seqNo represents the number (e.g. 1, 2, etc.) of the input sequence when the query fasta stream contains multiple sequences.

outputDir/htdocs/results/queryName.html
The main html output page.

outputDir/htdocs/results/queryName.PSORTverboseOutput.html
Output of traditional PSORT in verbose mode.

outputDir/htdocs/WoLFPSORTdoc/
Some general WoLF PSORT documentation.

outputDir/htdocs/results/queryName.detailedseqNo.html
Detailed information, including tables showing the value of localization features of the seqNoth query and its most similar proteins in the dataset.

outputDir/htdocs/results/queryName.alignmentseqNo.html
Alignment of the seqNoth query and its most similar proteins in the dataset. Similarity is based on localization features, which correlates with, but is different from, standard sequence similarity.

outputDir/htdocs/results/alignment.queryName.html
Alignment of similar sequences in dataset (if present) based on global sequence similarity. As of this writing I believe just a stub is output because the sequence similarity step was time consuming and seemed to have a bug.
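To locate these files from a script, the naming patterns above can be expanded as in the following Python sketch. It is one reading of the patterns, in which seqNo is substituted directly into the file name (for example query.detailed1.html), and it assumes sequences are numbered from 1. The function name output_paths is hypothetical. Whether each file actually exists depends on the options used, for instance --no-classical-psort-verbose-output.

```python
import posixpath


def output_paths(output_dir, query_name="query", seq_count=1):
    """List the expected result files for a run with seq_count input sequences."""
    results = posixpath.join(output_dir, "htdocs", "results")
    numbers = range(1, seq_count + 1)  # assumption: seqNo starts at 1
    return {
        "summary": posixpath.join(results, query_name + ".html"),
        "psort_verbose": posixpath.join(
            results, query_name + ".PSORTverboseOutput.html"),
        "detailed": [
            posixpath.join(results, "%s.detailed%d.html" % (query_name, n))
            for n in numbers],
        "alignment": [
            posixpath.join(results, "%s.alignment%d.html" % (query_name, n))
            for n in numbers],
        "global_alignment": posixpath.join(
            results, "alignment.%s.html" % query_name),
    }


for kind, path in output_paths("outdir", seq_count=2).items():
    print(kind, path)
```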

Temporary Files

scriptDir/tmp/queryName.fasta
Holds input sequence after filtering with checkFastaInput.pl

scriptDir/tmp/query.wolff
Holds localization features computed for the input sequences.


AUTHOR

Paul Horton horton-p AT aist.go.jp


COPYRIGHT

This Script: Copyright (C) 2004-2006, Paul B. Horton & C.J. Collier, All Rights Reserved.

PSORT: Copyright (C) 1997, 2004-2006, Kenta Nakai & Paul B. Horton, All Rights Reserved.


REFERENCE

Paul Horton, Keun-Joon Park, Takeshi Obayashi & Kenta Nakai, ``Protein Subcellular Localization Prediction with WoLF PSORT'', Proceedings of the 4th Annual Asia Pacific Bioinformatics Conference APBC06, Taipei, Taiwan. pp. 39-48, 2006.


SEE ALSO

http://wolfpsort.org/

runWolfPsortHtmlSummary