We support open-access science and therefore allow the downloading of full primary screening data sets.
We recommend GNU Wget for downloading the raw data of the RNAi screening datasets. GNU Wget is a free, non-interactive command-line tool for retrieving files over HTTP, HTTPS and FTP; it runs on Linux and is also available for most other operating systems. Instructions for downloading the individual screening datasets are given below, but you can also browse directly to http://www.infectome.ethz.ch/data/ and use your own download software.
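As a minimal sketch of such a non-interactive, recursive download, the Python snippet below assembles a Wget command line for one sub-directory of the data server and can optionally run it. The options used (--recursive, --no-parent, --continue, --no-host-directories, --directory-prefix) are standard Wget options. The helper name build_wget_command, the local target directory infectome_data and the assumption that a library directory such as 49K sits directly below /data/ on the server are illustrative only and are not part of the official instructions.

```python
import subprocess

BASE_URL = "http://www.infectome.ethz.ch/data/"


def build_wget_command(subpath="", target_dir="infectome_data"):
    """Return a Wget argument list for mirroring BASE_URL/subpath.

    Hypothetical helper: the sub-directory layout on the server is assumed
    to follow the LIBRARY/ASSAY/PLATE/DATA structure described on this page.
    """
    url = BASE_URL + subpath.strip("/")
    if not url.endswith("/"):
        url += "/"  # trailing slash so --no-parent treats the URL as a directory
    return [
        "wget",
        "--recursive",            # follow links into sub-directories
        "--no-parent",            # never ascend above the starting directory
        "--continue",             # resume partially downloaded files
        "--no-host-directories",  # do not create a www.infectome.ethz.ch/ folder
        "--directory-prefix=" + target_dir,  # local folder to download into
        url,
    ]


if __name__ == "__main__":
    command = build_wget_command("49K")
    print(" ".join(command))
    # Uncomment to start the download (requires Wget to be installed):
    # subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if it is later passed to subprocess.run.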
The directory structure of all datasets available for download is shown in the figure on the left and described below:
LIBRARY The first level describes the siRNA library, e.g. 49K contains data for the 49 kinases tested in triplicate with three independent siRNAs, while DG contains data for the Qiagen druggable-genome RNAi screens.
ASSAY Below the siRNA library directories you find the assay directories, which are named after the virus and HeLa strain abbreviations, such as SV40_CNX, which contains data for SV40 infection in HeLa CNX cells.
PLATE Below the assay directories you find the cell plate (CPXXX-YXX) directories, each of which combines all the data for an individual multi-well plate. Each CP-number represents an individual plate layout, so screens with replicate plates contain several plates with the same CP-number.
DATA Each cell plate directory contains multiple data directories. JPG contains compressed, stitched and merged JPEG image overviews per well and per plate. MATLAB holds the MATLAB files, which include CellProfiler-formatted single-cell measurements, BASICDATA*.mat files describing the gene symbol, gene ID and siRNA number assayed in each well of that plate, and files containing single-cell SVM classification results. PNG contains the original (losslessly compressed) images per channel and site. SEGMENTATION_IMG contains segmentation images per CellProfiler object, corresponding to each image site in PNG. SEGMENTATION_WELL contains merged and relabeled segmentation images per well of a multi-well plate.
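The four levels described above can be sketched as a small path parser. This is an illustration only: the function name parse_dataset_path, the example plate name CP001-1aa and the rule that the assay name splits into virus and HeLa strain at its last underscore are assumptions, not something this page specifies.

```python
from pathlib import PurePosixPath
from typing import NamedTuple

# Data directory names listed in the DATA level description.
DATA_DIRS = {"JPG", "MATLAB", "PNG", "SEGMENTATION_IMG", "SEGMENTATION_WELL"}


class DatasetPath(NamedTuple):
    library: str    # e.g. "49K" or "DG"
    virus: str      # e.g. "SV40"
    cell_line: str  # HeLa strain abbreviation, e.g. "CNX"
    plate: str      # cell plate directory, CPXXX-YXX
    data_dir: str   # one of DATA_DIRS


def parse_dataset_path(path):
    """Split a relative LIBRARY/ASSAY/PLATE/DATA path into its levels."""
    parts = PurePosixPath(path).parts
    if len(parts) < 4:
        raise ValueError("expected LIBRARY/ASSAY/PLATE/DATA, got: " + str(path))
    library, assay, plate, data_dir = parts[:4]
    if not plate.startswith("CP"):
        raise ValueError("not a cell plate directory: " + plate)
    if data_dir not in DATA_DIRS:
        raise ValueError("unknown data directory: " + data_dir)
    # Assumption: the text after the last underscore is the HeLa strain.
    virus, _, cell_line = assay.rpartition("_")
    return DatasetPath(library, virus, cell_line, plate, data_dir)


if __name__ == "__main__":
    # "CP001-1aa" is a made-up plate name used purely for illustration.
    print(parse_dataset_path("49K/SV40_CNX/CP001-1aa/PNG"))
```

Any path components after the data directory (for example an image file name) are ignored, so the same function can be applied to full file paths from a downloaded tree.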
49K: 34 RNAi screens, for 17 mammalian viruses with 49 kinases in various HeLa strains
Complete data size: 624 GB, in 1,853 directories and 870,332 files. The data cover 303 96-well plates; the first-channel images show the DAPI stain and the second-channel images show the corresponding virus signal.
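Given the size of this dataset, it may be worth checking the free disk space before starting a download. The following is a minimal sketch using Python's standard shutil.disk_usage. The helper name has_enough_space, the 5% safety margin and the reading of GB as 10^9 bytes are assumptions made for illustration.

```python
import shutil

DATASET_SIZE_GB = 624  # size quoted above for the complete 49K dataset


def has_enough_space(free_bytes, required_gb=DATASET_SIZE_GB, margin=0.05):
    """Return True if free_bytes covers required_gb plus a safety margin.

    Assumption: 1 GB is taken as 10**9 bytes; the margin is arbitrary.
    """
    required_bytes = required_gb * 10**9 * (1 + margin)
    return free_bytes >= required_bytes


if __name__ == "__main__":
    free = shutil.disk_usage(".").free  # free bytes on the current drive
    print("Free space: %.1f GB" % (free / 10**9))
    print("Enough for the 49K dataset:", has_enough_space(free))
```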