pISTil: a pipeline for yeast two-hybrid Interaction Sequence Tags identification and analysis

Background
High-throughput screening of protein-protein interactions opens new systems biology perspectives for the comprehensive understanding of cell physiology in normal and pathological conditions. In this context, the yeast two-hybrid system appears as a promising approach to efficiently reconstruct protein interaction networks at the proteome-wide scale. This protein interaction screening method generates a large amount of raw sequence data, i.e. the ISTs (Interaction Sequence Tags), which urgently need appropriate tools for their systematic and standardised analysis.

Findings
We developed pISTil, a bioinformatics pipeline combined with a user-friendly web interface: (i) to establish a standardised system to analyse and annotate ISTs generated by two-hybrid technologies with high performance and flexibility and (ii) to provide high-quality protein-protein interaction datasets for systems-level approaches. This pipeline has been validated on a large dataset comprising more than 11,000 ISTs. As a case study, a detailed analysis of ISTs obtained from yeast two-hybrid screens of Hepatitis C Virus proteins against human cDNA libraries is also provided.

Conclusion
We have developed pISTil, an open source pipeline made of a collection of several applications governed by a Perl script. The pISTil pipeline is intended for laboratories with IT expertise in system administration, scripting and database management that want to automatically process large amounts of IST data for accurate reconstruction of protein interaction networks in a systems biology perspective. pISTil is publicly available for download at .


ABOUT THIS DOCUMENTATION
This documentation is intended to inform informatics or bioinformatics users on how to use pISTil. Several formatting conventions are used throughout this documentation:
Commands are written in this style.
pISTil output is written in this style.
Names of programs and packages are written in this style.
References to web sites are written in this style.

ABOUT THE LICENCE AGREEMENT
All scripts, programs and applications used are free software; you can redistribute them and/or modify them under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
They are distributed in the hope that they will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with pISTil; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.

I. INTRODUCTION
pISTil (a pipeline for Interaction Sequence Tag identification and analysis) is a collection of scripts and programs, running on both Linux and Mac OS X systems, for fast analysis of large yeast two-hybrid sequence datasets. pISTil is composed of (i) a database, (ii) a web interface and (iii) a Perl script.
The pISTil Perl script takes as input sequence chromatogram files generated by automated sequencing technology, in either (i) Applied Biosystems Inc. (ABI) format or (ii) Standard Chromatogram Format (SCF).
The pISTil package provides a combination of functionalities that allow you:
• to convert trace files to bases and quality indices by using the Phred software
• to analyse chromatograms with different Phred parameters and/or BlastX protein sequence databases
• to automatically carry out sequence alignments and store aligned sequences
• to store results from all analyses in a relational database
• to apply different search criteria, such as the frequency of interaction or the number of distinct interactors, and different filters (E-value, identity, frame)
• to export lists of interactions in different file formats (Excel, PSI-MI: Proteomics Standards Initiative - Molecular Interactions)
The pISTil distribution includes, as a case study, the HCV (Hepatitis C Virus) dataset produced by the IMAP team (Infection MAPping), which can be used with the tutorial described in section IV.3.
Note: pISTil was developed to analyse large datasets of cDNA sequences produced by high-throughput yeast two-hybrid screens. However, it can be extended to other applications dedicated to protein-protein interaction identification, like MAPPIT (MAmmalian Protein-Protein Interaction Trap), LUMIER (luminescence-based mammalian interactome mapping) or PCA (protein complementation assay) by modifying the open source code available at http://sourceforge.net/projects/pistil .

II. REQUIREMENTS
We have tested the software on Mac OS X 10.5.x and Linux, and recommend the following system specifications:
• Operating systems:
-Mac OS X 10.4.x or higher.
-Linux Fedora 2.6.18-1.2798.fc6 or equivalent.
• Server specifications:
-1.5 GB of hard drive space
-1 GB of RAM or better
pISTil is distributed as source code for Linux and Mac OS X systems. It runs on top of several software packages, which must be installed and configured before you can run pISTil.
You can access this requirements list on this page:

1. PostgreSQL -- http://www.postgresql.org
PostgreSQL is a powerful, open source relational database system, used here to store various pieces of information: sequences, annotation, alignments, etc. A relational database is an ideal way to store large datasets as it allows very fast storage and retrieval of information. To run pISTil, you must be able to create and access a PostgreSQL database. A diagram of the pISTil database structure is included at the end of this document (see Annex 1).

2. Apache Web Server -- http://www.apache.org
The Apache web server is the industry standard open source web server for Unix and Windows systems. On Mac OS X systems, MAMP can be used.

4. Perl -- http://www.cpan.org
Perl is a high-level programming language and CPAN is the Comprehensive Perl Archive Network, a large collection of Perl software and documentation.
The Perl interpreter is present on most Unix distributions. Type perl -v at the command line to find which version of Perl is available on your system (version 5.8.8 or higher is preferred).
Note: If Perl is not installed under /usr/bin/perl, either create a symbolic link at /usr/bin/perl that points to the location where Perl is installed, or modify the first line of all Perl scripts in the pISTil directory so that they point to the correct location.

5. Standard Perl modules -- http://www.cpan.org
The following Perl modules can be found on CPAN and must be installed for pISTil to work:
• CGI
• DBI
• Carp
• Text::Wrap
• Math::BigFloat

6. BioPerl version 1.5.2 or higher -- http://www.bioperl.org
BioPerl is a collection of Perl modules devoted to bioinformatics. It is not usually installed on Unix systems and has to be installed separately. You can find out whether it is installed by running perl -MBio::Perl -e '1' from a terminal window. If the command does not return an error, BioPerl is installed.
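The module requirements above can be checked in one pass. The following is a minimal sketch in Python (used here only for illustration; pISTil itself is written in Perl) that applies the same `perl -MModule -e 1` test to each module. The function names are hypothetical, and using Bio::Perl as the probe for BioPerl is an assumption taken from the command quoted above.

```python
import subprocess

# Modules named in the requirements; Bio::Perl is used as the probe for BioPerl.
REQUIRED_MODULES = ["CGI", "DBI", "Carp", "Text::Wrap", "Math::BigFloat", "Bio::Perl"]


def module_check_command(module, perl="perl"):
    """Build the `perl -MModule -e 1` command that tries to load one module."""
    return [perl, "-M" + module, "-e", "1"]


def missing_modules(modules=REQUIRED_MODULES, perl="perl", run=subprocess.run):
    """Return the modules for which the Perl load test exits with an error."""
    missing = []
    for module in modules:
        result = run(module_check_command(module, perl),
                     stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            missing.append(module)
    return missing
```

Calling `missing_modules()` on a machine where Perl is on the PATH returns the list of modules that still need to be installed; an empty list means every load test succeeded.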

7. NCBI BLAST Toolkit -- ftp://ftp.ncbi.nih.gov/blast/executables/release/
BLAST (Basic Local Alignment Search Tool) is used to search a formatted database for sequences that show similarities to a query sequence. Within pISTil, it is used to identify sequences that show significant similarities to a well-annotated protein, and thereby to putatively assign a protein accession number to each IST (Interaction Sequence Tag). Two binaries are required: blastall (which carries out the search) and formatdb (which prepares a database for searching).

8. Staden package -- http://staden.sourceforge.net
pISTil uses Pregap4, a Staden package program, to prepare sequence chromatogram data for analysis. pISTil has been tested with the rel-1-6-0 release of the Staden package. Install the package as described in the accompanying documentation. Make sure:
• to include the directory where the Staden binaries reside in your path.
• to set the STADENROOT environment variable.
• to source the appropriate Staden script as described in the Staden documentation.
For pISTil, you also have to set the 'STADLIB' environment variable. If you use sh, or a variant such as bash, and have installed the Staden package in /usr/local/staden, set 'STADLIB' with the commands:
>STADLIB=/usr/local/staden/lib
>export STADLIB
Note: pISTil uses its own Pregap4 configuration file, 'pregap4_pistil.config', provided in the pISTil directory. All settings in this file can be changed to suit your own parameters.

9. Phred software -- http://www.phrap.org/phredphrapconsed.html
The Phred software reads DNA sequencing trace files, calls bases and assigns a quality value to each called base.
pISTil has been tested with version 0.020425.c of Phred.
Install Phred as described in the INSTALL file that comes with the Phred software. Make sure to set the 'PHRED_PARAMETER_FILE' environment variable correctly. It should point to the phredpar.dat parameter file that comes with Phred.

10. JDK -- http://www.sun.com
To view trace files on the web, the pISTil interface uses BMC TraceViewer (available from Baylor College of Medicine: http://www.hgsc.bcm.tmc.edu/downloads/software/trace_viewer/index.html), a Java applet that displays DNA sequencing traces. The BMC TraceViewer source files are included in the pISTil source code. You only have to check that the JDK is installed.

11. csh shell
A shell is a program that provides a user interface. With a shell, users can type in commands and run programs on a Unix system. The C shell was written by Bill Joy at the University of California at Berkeley. Check that the C shell is available on your Unix system, and install it if it is not.
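Before moving on to installation, the command-line requirements listed above can be verified with a short script. The sketch below is only an illustration in Python: the executable names (`pregap4`, `phred`, and so on) are the usual names of these tools and are assumptions about your installation, and the check does not cover PostgreSQL, Apache or the JDK.

```python
import os
import shutil

# Executable names assumed from the requirements list; adjust to your installation.
REQUIRED_PROGRAMS = ["perl", "csh", "blastall", "formatdb", "pregap4", "phred"]
# Environment variables named in the Staden and Phred requirements.
REQUIRED_VARIABLES = ["STADENROOT", "STADLIB", "PHRED_PARAMETER_FILE"]


def preflight(environ=None, which=shutil.which):
    """Return (missing_programs, missing_variables) for the current setup."""
    environ = os.environ if environ is None else environ
    missing_programs = [p for p in REQUIRED_PROGRAMS if which(p) is None]
    missing_variables = [v for v in REQUIRED_VARIABLES if not environ.get(v)]
    return missing_programs, missing_variables


programs, variables = preflight()
print("Programs not found on PATH:", programs or "none")
print("Environment variables not set:", variables or "none")
```

The `environ` and `which` parameters exist only so that the check can be exercised without touching the real environment.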

1. Downloading and unzipping pISTil
The home page of the pISTil project is available on SourceForge at http://sourceforge.net/projects/pistil.
To download the pISTil sources, click the Download link.
The download of the latest release of pISTil will start. You can also browse pISTil releases by clicking on the "Files" link.
Note: You do not need to create a SourceForge account to download pISTil.
Unzip the archive and move the pISTil directory to a subdirectory of your main web directory:
-For MAMP users, the standard web directory is /Applications/MAMP/htdocs.
-For Linux users, the standard web directory varies, but generally takes the form of /var/www/html.

2. Creating the pISTil database
pISTil uses a single database with 16 tables. The "create_database.csh" script in the pISTil/db folder automatically creates the database.
You must use a PostgreSQL account that has all privileges. If you do not have one, use the following command in your shell to create the pISTil user 'ist_user' with password 'istdb':
>createuser ist_user -d -l -W -P
At the questions:
>Shall the new role be a superuser? (y/n)
-You can answer no 'n'.
>Shall the new role be allowed to create more new roles? (y/n)
-You can answer no 'n'.
>Password:
Note: Depending on your work environment, the password can be requested at the beginning.
Now you can launch the csh script in the pISTil/db directory to create the pISTil database. 'create_database.csh' needs two arguments: the first one is the name of the database (ex: 'pistil'), the second one is the user of the database (ex: 'ist_user'). To execute the csh script, go to the pISTil/db directory and launch the following command:
>csh create_database.csh pistil ist_user
Note:
-In the example above, we use 'pistil' for the name of the database, and 'ist_user' for the user name. However, you can use any database and user names you want.
-This script will try to drop the database given as argument before starting to create it. You now have the pISTil database installed with, by default, some data used for the analysis of the HCV dataset in 5 tables (see section IV.3).
-For more information about the pISTil tables, please see Annex 2.

3. Setting up the pISTil configuration file
pISTil uses a central configuration file named "config_analyse.pm" that contains variables and settings that can be customized. It is located in the pISTil root directory.
• You must configure each variable before using pISTil:
-dbname: name of the database you created for pISTil.
-dbhost: name of the PostgreSQL server.
-dbuser: user that has access privileges for the pISTil database.
-dbpass: password for that user.
-path_to_pregap_config: location of the Pregap4 configuration file used by pISTil.
-temp_dir: some of the scripts need scratch space. pISTil will create this temporary directory in the pISTil root directory.
-regex_location: regular expression for pulling out the well location.
-save_BLASTN: yes ('y') or no ('n'), to save BLASTN results in a file or not.
-save_BLASTX: yes ('y') or no ('n'), to save BLASTX results in a file or not.
-log_file: yes ('y') or no ('n'), to keep a log file or not.
Note: To see how to configure the "config_analyse.pm" file for the HCV dataset analysis, please see Annex 3.
• About regular expressions: A regular expression (also "regex") is a string that is used to describe or match a set of strings according to certain syntax rules. You must specify two regular expressions that extract the plate name and the well location from the names of the traces. If you are not familiar with regex rules, you can find a short help in the configuration file.
Example with this trace name: HCV15_1_96-A01-Y2H_AD-9
If we 'translate' this name into regex form:
Name:  HCV15_1_96  -   A   0   1   -Y2H_AD-9
Regex: ^\w+        \-  \w  \d  \d  .*
We define the plate name as 'HCV15_1_96'. To capture it, we use '()':
Regex: ^(\w+)\-\w\d\d.*
The well location is 'A01':
Regex: ^\w+\-(\w\d\d).*
Note: The trace file names of one plate must follow the same pattern to work with one regex. For instance, if one chromatogram file is named 'HCV15_1_96-A01-Y2H_AD-9' and a second one 'HCV15_1_96_A02-Y2H_AD-9', the regex '^(\w+)\-\w\d\d.*' will not match both. You then have two options: rename the trace file, or use a regex that works with both names, like '^(\w+)[\-_]\w\d\d.*'.
• About Phred processing options: Phred can automatically remove low-quality base calls from the start and the end of DNA sequences, a process called "trimming" or "clipping". When generating trimmed output files, you will lose bases at the start and the end of sequences, so trimming should be used with care. If you plan to generate trimmed sequences, you may first want to experiment with different cutoff scores to see which setting works best for you (see Annex 9).
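The plate-name and well-location expressions from the regular expression example above can be tried out before they are written into the configuration file. The sketch below uses Python's `re` module, whose `\w`, `\d` and grouping syntax behaves like Perl's for these patterns; the function name is hypothetical, and the well pattern with `[\-_]` is our own extension of the plate pattern suggested in the note.

```python
import re

# Patterns that accept either '-' or '_' before the well location.
PLATE_REGEX = r"^(\w+)[\-_]\w\d\d.*"
WELL_REGEX = r"^\w+[\-_](\w\d\d).*"


def plate_and_well(trace_name, plate_regex=PLATE_REGEX, well_regex=WELL_REGEX):
    """Return (plate, well) for a trace name, or None if either regex fails."""
    plate = re.match(plate_regex, trace_name)
    well = re.match(well_regex, trace_name)
    if plate is None or well is None:
        return None
    return plate.group(1), well.group(1)


print(plate_and_well("HCV15_1_96-A01-Y2H_AD-9"))   # ('HCV15_1_96', 'A01')
print(plate_and_well("HCV15_1_96_A02-Y2H_AD-9"))   # ('HCV15_1_96', 'A02')

# The hyphen-only patterns do not match the underscore variant:
print(plate_and_well("HCV15_1_96_A02-Y2H_AD-9",
                     r"^(\w+)\-\w\d\d.*", r"^\w+\-(\w\d\d).*"))  # None
```

Running every trace name of a plate through such a check is a quick way to spot files that need renaming before the pipeline is launched.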

4. Downloading and creating the BLAST databases
pISTil relies on protein sequence databases to analyse the screening data. You have to use a sequence database referenced in the PSI-MI 2.5 ontology (see Annex 6). Each database has its repository in the pISTil/localdb directory.
For instance, you can download NCBI and ENSEMBL flat files from:
-NCBI: ftp://ftp.ncbi.nih.gov/blast/db/FASTA/ for the GenBank database.
Move the downloaded FASTA file to pISTil/localdb/ddbj-embl-genbank/ for the GenBank database or to pISTil/localdb/ensembl/ for the Ensembl database.
You must then use this file to construct the index for the BLAST database by using the 'formatdb' program from NCBI. In the following example, formatdb is used to construct the BLAST database called 'Homo_sapiens.NCBI36.50.pep.all' from the FASTA file 'Homo_sapiens.NCBI36.50.pep.all.fa' containing multiple protein sequences:
In the directory pISTil/localdb/ensembl/ type:
Note:
-Downloading and creating the database may take several minutes, depending on both your internet connection and your processor speed.
-If you want to use your own database that is not referenced by PSI-MI (see Annex 6), move your FASTA file into pISTil/localdb/other/
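The formatdb call for the Ensembl example is not reproduced in this section. As a hedged sketch, the Python helper below builds a command shaped like the RefSeq command given in section IV.3 (`-p T` for protein input, `-i` for the FASTA file, `-o` to parse sequence identifiers, `-n` for the database name). Writing `-o T` explicitly and the helper name itself are our assumptions, not something taken from the pISTil distribution.

```python
import subprocess


def formatdb_command(fasta_file, db_name, protein=True):
    """Build a formatdb argument list modelled on the RefSeq example."""
    return ["formatdb",
            "-p", "T" if protein else "F",   # T = protein sequences
            "-i", fasta_file,                # input FASTA file
            "-o", "T",                       # parse sequence identifiers
            "-n", db_name]                   # base name of the BLAST database


def build_database(fasta_file, db_name, directory, run=subprocess.run):
    """Run formatdb inside the database directory (requires formatdb on PATH)."""
    return run(formatdb_command(fasta_file, db_name), cwd=directory, check=True)


print(" ".join(formatdb_command("Homo_sapiens.NCBI36.50.pep.all.fa",
                                "Homo_sapiens.NCBI36.50.pep.all")))
```

`build_database` would be called with `directory="pISTil/localdb/ensembl"` for the example described above; only `formatdb_command` is exercised when the snippet is run as is.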

5. Creating the pattern BLAST database
pISTil relies on BLASTN to accurately locate the beginning of the cDNA insert by making use of a database of vector construct sequences (see Annex 11). Thus, according to the cDNA library screened, pISTil will align the vector sequence located before the cDNA and will retain only the cDNA sequence for protein assignment. Accurate localization of the vector construct is also crucial to characterise cDNAs that were encoded "in-frame" in the two-hybrid system (or other systems, according to the fusion protein).
To insert library and vector data into the database, you have to use the pISTil interface (see section III.8).

6. Configuring "the bait parameter file"
The file 'define_bait' is located by default in the pISTil root directory. This file is used to identify the baits present in each of the 96 wells of a plate.
To configure this file for the pISTil software you must give: the first and then the last well where one bait is present, the product of this bait and, optionally, its database accession number and its PSI-MI database identifier. The values are separated by tabs. In this example, in A01, the bait is NS3, and from A02 to A04 the bait is NS4, both from Hepatitis C virus (taxon=11103). The GenBank accession for both bait products is CAB466677, a polyprotein. The PSI-MI database identifier for GenBank is 0475. 'Bait proteinid' and 'PSIMI database id' are required if you are going to export protein-protein interaction lists to PSI-MI format. 'Bait proteinid' is the identifier of the bait according to the database described in the following field. 'PSIMI database id' is the PSI-MI identifier for this database (see Annex 6 to choose the right identifier). If you use a personal database to identify your bait, interactions involving this bait will not be exportable in PSI-MI format.
If you have several plates for a single project, you can analyse all traces at once. However, you must configure the bait parameter file by specifying the plate name before the description of the plate content. In this example, pISTil will analyse two plates, 'HCV15_1_96' with NS3 in all wells, and 'MARIE1' with NS4 in all wells.
Note:
-Do not forget to write '--' before the plate name.
-The plate name must be identical to the one extracted by the regex (section III.3).
-Do not change the format of the configuration file used to identify baits.
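To make the bait parameter file easier to picture, here is a small Python sketch that reads text laid out as described above: tab-separated fields (first well, last well, bait product, optional accession, optional PSI-MI database identifier) and plate names introduced by '--'. The exact layout of the real 'define_bait' file may differ, so this layout, the row-by-row ordering of wells between the first and last well, and all function names are assumptions for illustration only.

```python
ROWS = "ABCDEFGH"                       # 96-well plate: rows A to H
WELLS = [f"{row}{col:02d}" for row in ROWS for col in range(1, 13)]  # A01 .. H12


def wells_between(first, last):
    """Wells from first to last inclusive, counted row by row (assumed order)."""
    start, end = WELLS.index(first), WELLS.index(last)
    if start > end:
        raise ValueError(f"{first} comes after {last}")
    return WELLS[start:end + 1]


def parse_define_bait(text):
    """Map plate name -> {well: bait record}; the plate is None if never named."""
    plates, current = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("--"):               # '--PLATE' opens a new plate
            current = line[2:].strip()
            plates.setdefault(current, {})
            continue
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 tab-separated fields: {line!r}")
        first, last, product = fields[:3]
        record = {
            "product": product,
            "accession": fields[3] if len(fields) > 3 and fields[3] else None,
            "psimi_db": fields[4] if len(fields) > 4 and fields[4] else None,
        }
        for well in wells_between(first, last):
            plates.setdefault(current, {})[well] = record
    return plates


example = "A01\tA01\tNS3\tCAB466677\t0475\nA02\tA04\tNS4\tCAB466677\t0475\n"
for well, bait in sorted(parse_define_bait(example)[None].items()):
    print(well, bait["product"])
```

With the single-plate example from the text, wells A01 to A04 are assigned NS3, NS4, NS4 and NS4 respectively.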

7. Setting up the pISTil interface
The pISTil web interface (ex: http://localhost/pISTil/www) provides a powerful and user-friendly way to query and navigate through the pISTil results.
First, you need to fill in a configuration file named 'config_www.inc' in the pISTil/www/inc directory. This file contains variables and settings that can be customized:
-$HOST_NAME: name of the PostgreSQL server.
-$DATABASE_NAME: name of the database you created for pISTil.
-$DATABASE_USER: user that has all access privileges for the pISTil database.
-$DATABASE_PASSWORD: password for that user.
-$LOCAL_DIR: location of the pISTil directory that contains all the data and the scripts for the interface.
-$FORMATDB_EXEC: absolute path to the formatdb program used when formatting the BLAST pattern database. Type which formatdb in your terminal to find its path.
Note: To see how to configure the 'config_www.inc' file for the HCV datasets, please see Annex 5.

8. Edit library and vector data
To insert or remove library or vector data in the pISTil database, use the pISTil web interface.
• On the pISTil home page, select "Library screening" from the "Information" drop-down menu. This page shows all vectors and libraries already inserted in the database.
• To insert a new library you need to specify a vector, so you must first insert the vector if it is not already in the database.
• To insert a vector, fill out the vector form and click the insert button.
Note: When you insert a new vector, the pISTil interface will automatically format the pattern database.
After that, the new vector will appear in the vector field of the library form.
• To insert a library, fill out the library form and click the insert button.
• To remove a vector or a library, select it and click on the remove selected vector or library button. Note that if you delete a vector, the database server also deletes any libraries associated with that vector.

1. Quick start
Running pISTil is very simple once the configuration files have been set up.
The default command in your shell takes as input a zip file containing all the traces from one or more plates of the same project.
Note: The zip file must be one of the archive files in the pISTil/dataset directory.

2. Running with your own bait parameter file
If you have more than one configuration file to define the baits, or if you have renamed the 'define_bait' file, run pISTil with a second argument naming that file.
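The two ways of launching the pipeline described above can be summarised in a small wrapper. This Python sketch only assembles the command line; it assumes that the script is called `ist_analyse.pl` (the name used in the Miscellaneous section), that the zip file is passed by name as first argument, and that the bait parameter file is passed as second argument. The function name and the `pistil_root` parameter are hypothetical.

```python
from pathlib import Path


def pistil_command(zip_name, bait_file=None, pistil_root="."):
    """Build the argument list for a pISTil run (argument order assumed)."""
    archive = Path(pistil_root) / "dataset" / zip_name
    if not archive.is_file():
        # pISTil expects the archive to sit in its dataset directory.
        raise FileNotFoundError(f"{archive} is not in the dataset directory")
    command = ["perl", "ist_analyse.pl", zip_name]
    if bait_file is not None:
        command.append(bait_file)   # own bait parameter file, second argument
    return command
```

The returned list can be handed to `subprocess.run(..., cwd=pistil_root)`; because the pipeline asks interactive questions, the call should be left attached to the terminal.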

3. Example with the two HCV datasets
In this example, we analyse two datasets from I-MAP team experiments (de Chassey B, Navratil V, Tafforeau L et al., Hepatitis C Virus infection protein network. Molecular Systems Biology 4:230, 2008).
These two datasets are distributed with pISTil and are already in the pISTil/dataset directory. HCV.zip contains 96 trace files from a yeast two-hybrid screen against a Homo sapiens spleen library. HCV2.zip contains 96 trace files from a two-hybrid screen against a Homo sapiens fetal brain library.
We consider that the pISTil database has already been created as described in section III.2, using 'pistil' as database name, 'ist_user' as PostgreSQL user and 'istdb' as password. Please adapt the corresponding variables in the "config_analyse.pm" and "config_www.inc" files if you have used other parameters. Now we have to format this file to construct the index for the BLAST database by using formatdb program from NCBI.
In the directory pISTil/localdb/refseq/ execute this command:

> formatdb -p T -i ./human.protein.faa -o -n refseq_human_prot
Please make sure to correctly:
• configure the config_analyse.pm file (see Annex 3) located in the pISTil directory
• configure the config_www.inc file (see Annex 5) located in the pISTil/www/inc directory
For the demo, library and vector data are already integrated into the pISTil database, so you do not have to insert them for this example. They are listed on the library and vector page of the web interface.
Let's start the first analysis with HCV.zip. Write '0' to create a new project, then choose a project name and a description:
Project identifier= 0
You decide to create a new project:
Project name: Hepatitis C virus
Project description: Screening from the IMAP team
All the data needed to create this new project is now recorded
You must select the appropriate library for the analysis. This first dataset comes from a screen against the Homo sapiens spleen library, identified by '1'.
pISTil analyses all trace files and tests your regex. If it is correct, write 'y' for yes.
At the end of the pISTil pipeline, you have the choice to insert all results automatically into the pISTil database, or to do it manually using the SQL files generated during the analysis.
At this step, we have analysed the first dataset. We now have to check two parameters before starting with the second one, named 'HCV2.zip'. First, we must be sure that all regexes in the "config_analyse.pm" file are correct for the trace file names. Here, the regexes are the same as for the first analysis. Secondly, we must change the "define_bait" file and configure it according to the content of the second plate.
We then select the appropriate library, identifier '2'. pISTil asks whether the regex is correct, analyses the traces and identifies ISTs.
Finally, we insert all information into the pISTil database.
Note: A summary of the pISTil analysis results for the complete HCV dataset is given in Annex 10.

4. Miscellaneous
Running 'perl ist_analyse.pl' without an argument displays the pISTil error: "Must give a zip file name localized in dataset directory".
Running 'perl ist_analyse.pl --fasta' or 'perl ist_analyse.pl -f' allows the use of ASCII FASTA sequence files instead of chromatogram files. The method of analysis remains the same, but without Phred extraction and quality analysis.

V. pISTil WEB INTERFACE
After pipeline processing of the chromatogram dataset and data insertion into the pISTil database, open your web browser and go to the web folder in which pISTil is located, for example http://localhost/pISTil/www/.
You should see a welcome page with some global statistics about all analyses run by pISTil and a menu to navigate through the results.

1. Viewing projects
Once projects have been added to the database, they can be browsed using the web menu. A project includes one or more plates of DNA sequences, which have been analysed by the pISTil software to identify interactors.
To see all projects inserted in the pISTil database, use the menu and click on the "Projects" tab.
By checking the remove radio button and clicking on the delete button, you can remove a project and all associated information. A confirmation page will appear.
By clicking on a project name, you can access detailed information on the current project, including the plates that have been added to this project.
By clicking on an analysis link, which corresponds to the number of analyses done for this plate, you can access plate analysis information. A plate may have been analysed only once or more than once; for example, the plate "MARIE1" was analysed with two different BLASTX databases and different Phred parameters.
By clicking on the green arrow, you can access plate information.

2. Viewing plates
You can access plate information using the "Plates" tab from the menu or by clicking on a plate name on a project information page, described above.
If you click on the name of the project, you will be brought to the project information page.
If you click on the name of the plate, you will be brought to the plate information page, which shows more detailed information about each well on the plate.
If you have analysed a plate more than once, for example with another BLASTX database, you must choose one analysis before seeing all plate information: select one of the analyses and click on the "Select analysis" button.
The top table lists general information about the plate and the analysis done by the pISTil software. By using the filter table, users can choose a combination of filters to generate different lists. After searching and, if needed, filtering interactions, you can export the resulting table to tab-delimited format for Excel (or a text editor) by clicking on the "export to tab-delimited format" link (please save the file first, before opening). You can also export the list of chosen interactions to MIMIx PSI-MI format (see section V.5).
The second table lists all of the wells along with their analysis results. Bait and Protein columns include direct links towards public databases according to the 'define_bait' configuration file for baits and the Blast databank used for the IST identification for preys.
You can see the IST sequence corresponding to a well by clicking on the corresponding "View" link.
General format for the FASTA sequence header:
> HCV15_1_96-A01-Y2H_AD-9; Phred base calling with trim cutoff=0.05; 662 bp
The three fields are, in order: the trace file name, the Phred analysis and the sequence length.
• If you click on one of the "good quality length" links, you will see the corresponding quality page. This HTML page shows the FASTA sequence, colour-coded with the quality values assigned by Phred. During quality analysis, Pregap4 calculates the average confidence level for a sliding window. The low-quality regions (at the start and end of the sequence) are shown in red.
Note: To compare Phred FASTA extraction with or without base calling, see Annex 9.
If you click on one of the PSI-MI interaction detection methods, you will see the corresponding method description page.
If you click on one of the location links, you will see the corresponding protein-protein interaction (ppi) page (see section V.3).
If you click on the "Sorted by distinct ISTs" link, you can sort interactions by their frequency of observation and, as before, apply multiple filters.
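The FASTA header format given in this section (trace file name, Phred analysis, length, separated by semicolons) is easy to split programmatically, for example when post-processing exported sequences. The following Python sketch assumes that every header follows exactly the three-field pattern of the example; the function name and dictionary keys are hypothetical.

```python
def parse_ist_header(header):
    """Split '> name; Phred analysis; N bp' into its three documented fields."""
    if not header.startswith(">"):
        raise ValueError("a FASTA header must start with '>'")
    parts = [part.strip() for part in header[1:].split(";")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 ';'-separated fields, got {len(parts)}")
    trace_name, phred_analysis, length = parts
    number, unit = length.split()
    if unit != "bp":
        raise ValueError(f"unexpected length unit: {unit!r}")
    return {"trace_name": trace_name,
            "phred_analysis": phred_analysis,
            "length_bp": int(number)}


info = parse_ist_header(
    "> HCV15_1_96-A01-Y2H_AD-9; Phred base calling with trim cutoff=0.05; 662 bp")
print(info["trace_name"], info["length_bp"])   # HCV15_1_96-A01-Y2H_AD-9 662
```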

3. Viewing protein-protein interaction (ppi)
The ppi page lists all information concerning a specific well:
• Project and plate information:
This part shows the project name, the project description, the plate name for the interaction, and the analysis date.

• Bait information:
Here you find the bait name, the well location in the plate and, optionally, its protein accession number and PSI-MI database identifier. If you click on the PSI-MI link, you will be redirected to the PSI-MI databases page (see section V.6).

• Trace information:
The nucleic sequence is the trace sequence extracted by Pregap4, which has calculated the start and the end location for the good quality sequence.
If you click on one of the "View quality file" links, you will see the corresponding quality page.
If you click on the 'Visualize' link, you see the chromatogram in the Trace Viewer applet.
If you click on the 'Download' link, you download the trace (in SCF format) to your computer.

• Phred analysis:
The Phred sequence is the nucleic sequence used by BLASTX to identify the IST. This sequence depends on the Phred parameters, so if you analyse this trace with two different Phred parameter sets, you can obtain two different IST sequences.

• Pattern information:
This table contains all information about the pattern search, according to the vector used in the library. For an explanation of the "correction" term, please see Annex 11.

• Blast information:
This last part of the page gives all BLASTX result information. The minimum information about the IST is the accession number of the protein hit, in the database used during the analysis. In this case, 94% of the aligned query sequence was found to be identical to the protein NP_057698. This hit is not in frame with the GAL4-AD pattern (Frame=1).

4. Search page
Once you have analysed a number of ISTs, it can become difficult to find an individual interactor, a bait or a particular interaction. The pISTil web interface therefore provides a search page, which is accessible via the "Search" tab in the menu.
You can query interactions found by pISTil according to:
-a specific bait: select one bait in the bait drop-down menu.
-a specific prey: specify a protein accession number.
-a short description of a prey.
You can also restrict the results to one project by selecting it in the project drop-down field.
• Example 1: here we search for all interactions of the bait NS2 in the HCV project. After selecting the correct bait and the HCV project, click on the 'Search' button to see the results. We find 14 records. We can filter the results using the filter table by:
-BLAST values: identity and/or frame and/or e-value
-BLAST database
-Phred base calling
• Example 2: we want all interactions in frame with the GAL4-AD pattern, with at least 80% identity and an e-value lower than or equal to 1e-40. We enter these search criteria in the filter table and click on the filter button.
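The combined filter of Example 2 can be expressed as a simple predicate, which may help when the same selection has to be reproduced on an exported table. The Python sketch below is only an illustration: the `Hit` record, its field names and the four sample hits are hypothetical and are not pISTil data.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    bait: str
    prey: str
    identity: float   # percent identity of the BLASTX alignment
    evalue: float     # BLASTX e-value
    in_frame: bool    # True if the hit is in frame with the vector pattern


def passes(hit, min_identity=80.0, max_evalue=1e-40, require_in_frame=True):
    """Example 2 criteria: in frame, identity >= 80%, e-value <= 1e-40."""
    return (hit.identity >= min_identity
            and hit.evalue <= max_evalue
            and (hit.in_frame or not require_in_frame))


# Hypothetical hits, for illustration only.
hits = [
    Hit("NS2", "prey_A", 94.0, 1e-60, True),
    Hit("NS2", "prey_B", 94.0, 1e-60, False),   # rejected: out of frame
    Hit("NS2", "prey_C", 75.0, 1e-50, True),    # rejected: identity too low
    Hit("NS2", "prey_D", 88.0, 1e-20, True),    # rejected: e-value too high
]
print([hit.prey for hit in hits if passes(hit)])   # ['prey_A']
```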
After searching and, if needed, filtering interactions, you can export the resulting table to tab-delimited format for Excel (or a text editor) by clicking on the "export to tab-delimited format" link (please save the file first, before opening). You can also export the list of chosen interactions to MIMIx PSI-MI format (see section V.5).
Finally, you can sort the interactions by the number of times they were found (click on the "Sorted by number of interactions" link). As with the previous search, you can then filter interactions and export the displayed table.
The column 'Number of IST(s)' gives the number of ISTs found for a given protein-protein interaction, i.e. for a given bait and prey protein. If you click on it, you will see the interaction domain. The first part of this page is a graphic representation of all ISTs supporting the interaction: the minimal interaction domain (MID) is shown in blue, the protein in green and the ISTs in red. The second part is a table with all information about the IST alignments.
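This section does not spell out how the minimal interaction domain is derived. One common reading, used here purely as an assumption, is that the MID is the region of the prey protein covered by every IST supporting the interaction, i.e. the intersection of the aligned intervals. A minimal Python sketch of that reading, with hypothetical coordinates:

```python
def minimal_interaction_domain(alignments):
    """Intersection of (start, end) intervals on the prey protein.

    Assumption: the MID is the region shared by all ISTs. Returns None when
    there are no alignments or when they have no region in common.
    """
    if not alignments:
        return None
    start = max(begin for begin, _ in alignments)
    end = min(stop for _, stop in alignments)
    return (start, end) if start <= end else None


# Hypothetical amino-acid coordinates of three ISTs on the same prey protein.
ists = [(10, 200), (50, 180), (40, 250)]
print(len(ists), "ISTs, MID:", minimal_interaction_domain(ists))
# 3 ISTs, MID: (50, 180)
```

If pISTil defines the MID differently, only the body of the function changes; the interval representation stays the same.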

PSI-MI export
MIMIx is the minimum information required for reporting a molecular interaction experiment, building on the PSI-MI XML v2.5 interchange standard format. It allows you to describe your experimental protein interaction data in a journal article, display it on a website or submit it directly to a public database. The link "export to PSI-MI MIMIx format" leads to a form where you have to enter some administrative and experimental information. The validity of the created file depends on how you fill in the form.
Please note that:
-Only distinct interactions are considered.
By clicking on the 'see' link you can view the XML file in your browser.
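The restriction to distinct interactions can be pictured as collapsing repeated bait-prey pairs before export. The sketch below is only an illustration of that idea, assuming an interaction is identified by its (bait, prey) pair; it is not the code used by pISTil, and the prey names are invented.

```python
def distinct_interactions(records):
    """Collapse (bait, prey) records to unique pairs, keeping first-seen order."""
    # dict.fromkeys preserves insertion order and drops repeated keys.
    return list(dict.fromkeys(records))


# Invented records: the NS2/PREY_A interaction is supported by two ISTs.
records = [("NS2", "PREY_A"), ("NS2", "PREY_B"), ("NS2", "PREY_A")]
print(distinct_interactions(records))
```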

Information
To see current BLAST databases or to add vectors or libraries in the database, use the "Information" tab in the menu.
If you have already analysed sequences, click on the "Databases" drop-down menu to see which BLAST databases are used. Before launching an analysis, you must insert vector and library information into the pISTil database. Click on the "Library and vector" tab from the "Information" drop-down menu (see section III.8 to learn how to insert vector and library data).
For all information about the PSI-MI databases, click on the "PSI-MI databases" tab from the "Information" drop-down menu.

VI. pISTil PROGRAM FLOW
-pISTil shows all libraries in the pISTil database.
• After the first run with the HCV dataset, you will generate:
tmp -Contains temporary files
www/data/1/ -'1' corresponds to the project identifier in the pISTil database
www/data/1/outfile_blast_dir/ -Contains all Blast result files (depends on your pISTil configuration)
www/data/1/phred_scf_dir/ -Contains all scf trace files
www/data/1/qual_dir/ -Contains all html quality files
www/data/1/sql_dir/ -Contains all sql files
www/data/1/pISTil.log -pISTil log (depends on your pISTil configuration)
• After a second run on the same project, you will also have:
www/data/1/sql_dir_X -Contains all sql files created at date and time X.
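After a run, a quick check that the expected output directories exist can help spot an incomplete analysis. The following sketch is not part of pISTil; the helper name is hypothetical and the demonstration builds a throw-away directory tree instead of touching a real installation.

```python
import tempfile
from pathlib import Path

# Directories listed above for a project; outfile_blast_dir is only
# produced with some pISTil configurations.
EXPECTED = ["outfile_blast_dir", "phred_scf_dir", "qual_dir", "sql_dir"]


def missing_outputs(www_data, project_id):
    """Return the expected sub-directories that are absent for a project."""
    project_dir = Path(www_data) / str(project_id)
    return [name for name in EXPECTED if not (project_dir / name).is_dir()]


# Demonstration on a temporary tree standing in for www/data.
with tempfile.TemporaryDirectory() as root:
    for name in ("outfile_blast_dir", "qual_dir"):
        (Path(root) / "1" / name).mkdir(parents=True)
    missing = missing_outputs(root, 1)
    print(missing)
```

On a real installation, the first argument would be the path to the `www/data` directory and the second the project identifier.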

VIII. BUGS AND PROBLEMS
Crashes can occur when you run pISTil. Errors may be due to incorrectly configured required programs.

Environment variable STADLIB
If the run stops prematurely, displaying the message:
then you need to define the 'STADLIB' environment variable. Please follow the instructions in II.8.

Stash not found
If the run stops prematurely with the message:
then make sure that you have defined the following in your environment: LD_LIBRARY_PATH, TCL_LIBRARY, TK_LIBRARY and $STADENROOT/staden.profil (please see the Staden instructions for more details).
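Both problems above come from environment variables that are not set. A small pre-flight check, such as the sketch below, can be run before launching pISTil. It only tests that the variables named in this section are set and non-empty; it is not part of pISTil, and the path in the example is hypothetical.

```python
import os

# Variables named in this section.
REQUIRED = ["STADLIB", "LD_LIBRARY_PATH", "TCL_LIBRARY", "TK_LIBRARY"]


def missing_variables(environ=os.environ, required=REQUIRED):
    """Return the names in `required` that are unset or empty in `environ`."""
    return [name for name in required if not environ.get(name)]


# Example with an explicit mapping; the path is hypothetical.
example_env = {"STADLIB": "/opt/staden/lib", "TK_LIBRARY": ""}
print(missing_variables(example_env))
```

Calling `missing_variables()` without arguments checks the environment of the current process.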
Please report pISTil problems and bugs to johann.pellet@inserm.fr

bait: Stores bait information and its location in the plate. By convention, the bait protein is the investigator's protein of interest and is fused to the DNA Binding Domain (BD) of the transcription factor Gal4 (Gal4-BD). It is assayed against a cDNA library encoding proteins fused to the Activation Domain (AD) of the transcription factor Gal4 (Gal4-AD), which are referred to as prey proteins. To identify the bait (protein identifier and name), pISTil uses the 'define_bait' file. Each bait can interact with one and only one prey during an analysis.
blast: Stores BLAST results. Each analysed sequence is aligned with BLASTX (only in the three positive frames) against a protein database to identify an IST.
database: Stores database information. Preys are identified using BLASTX against a protein database.

library: Stores library information.
A library consists of a collection of protein-encoding sequences that represent all the proteins expressed in a particular organism, tissue and/or cellular type.
method: Stores method information. All information in this table comes from the PSI-MI 2.5 method information.
midb: Stores PSI-MI database information.
Database collecting nucleic or amino acid sequences mainly derived from genomic sequences.

pattern: Stores pattern information.
During the trace analysis, the first step consists of looking in the trace sequence (by BLASTN alignment) for a sequence corresponding to the last nucleotides of Gal4-AD, which is defined as the pattern.
plate: Stores information about plates. A traditional two-hybrid plate contains 96 wells, so one plate can contain between 1 and 96 bait(s).
ppi: Stores the physical interaction between the bait and the prey. It corresponds to a protein-protein interaction (ppi) between a given bait protein (fused to Gal4-BD) and a prey protein (fused to Gal4-AD). If bait and prey interact, the two functional domains of Gal4 are brought closer, leading to the expression of a reporter gene in the yeast two-hybrid system.

prey: Stores prey information.
The prey protein is fused to the activation domain (AD) of the transcription factor Gal4 (Gal4-AD). It can be a known protein, in the case of a yeast two-hybrid assay designed to test an a priori interaction between two known proteins, or an unknown protein encoded by a cDNA of a yeast two-hybrid library.
project: Stores generic project information.
A project includes the analysis of one or several plate(s).
quality: Stores sequence quality information. Each trace is analysed to define the sequence quality.
reference: Stores the bibliographic reference of a method.
trace: Stores trace information.
pISTil tries to identify each prey from its traces. These traces come from the sequencing of the cDNA encoding the prey protein fused to Gal4-AD (obtained by PCR on positive yeast colonies of the yeast two-hybrid screen). pISTil uses Extract_seq (Pregap4 module) to extract the sequence component from traces and experiment files.
vector: Stores vector information. cDNA libraries are cloned into a yeast two-hybrid vector, allowing the expression of a prey protein fused to Gal4-AD. The resulting vectors, which together make up a library of vectors, are transformed into yeast in order to be screened by the two-hybrid method.
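The 'pattern' and 'blast' entries above describe two steps that can be pictured with a short sketch: locating the end of the Gal4-AD pattern in a read, then reading the downstream insert in the three forward frames. This is a simplified illustration only. It uses an exact substring search as a stand-in for the BLASTN alignment, it does not call BLASTX, and the read and pattern sequences are invented.

```python
def locate_insert(read, pattern):
    """Return the 0-based index just after the first occurrence of `pattern`
    in `read`, or None if the pattern is absent (exact match, case-insensitive)."""
    pos = read.upper().find(pattern.upper())
    return None if pos == -1 else pos + len(pattern)


def forward_frames(insert):
    """Return the insert read from offsets 0, 1 and 2 (the three forward frames),
    each trimmed to a whole number of codons."""
    frames = []
    for offset in range(3):
        frame = insert[offset:]
        frames.append(frame[: len(frame) - len(frame) % 3])
    return frames


read = "NNACGTGGATCCATGGCTAAGTTTGA"  # invented read
pattern = "GGATCC"                   # hypothetical pattern, not the real Gal4-AD end
start = locate_insert(read, pattern)
insert = read[start:]
for offset, frame in enumerate(forward_frames(insert)):
    print(offset, frame)
```

A real analysis has to tolerate sequencing errors in the pattern, which is why an alignment tool is used rather than an exact match.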