ImprintPocket: a program for generating binding-site negative images

Zheng Ouyang

 

The main aim of this study is to develop and implement "ImprintPocket", an algorithm for the rapid, accurate, and automated identification of the negative image (pseudoligand) of a binding pocket in protein structures for structure-based virtual screening. The program that performs the negative image search is an extension of CASTp, a suite of programs developed by Jie Liang et al. for locating pockets and cavities in protein crystal structures and quantifying their size. The method is a computational geometry treatment of complex shapes, based on alpha shape and discrete flow theory. ImprintPocket was developed primarily for proteins, but the algorithm is sufficiently robust to allow the analysis of any molecular system, including nucleic acids and inorganic materials. Calculations can be performed using discrete structures from crystallographic analysis and NMR experiments, as well as trajectories from molecular dynamics simulations. It has been integrated into a new structure-based pharmacophore search method called Shape4. ImprintPocket is freely available from the web site http://sts.bioengr.uic.edu/pni/.

1.2  Algorithm

Based on alpha-shape and discrete-flow methods [1-3], ImprintPocket creates pocket negative images in seven steps: it (1) identifies the atoms forming pockets, (2) computes the volume and area of pockets, (3) identifies the atoms forming the "rims" of the pocket mouth(s), (4) computes the number of mouth openings for each pocket, (5) computes the area and circumference of mouth openings, (6) locates cavities and measures their size, and (7) builds the pocket negative image from the pocket tetrahedra. Pocket prediction relies on the calculation of the so-called "dual complex" (or alpha shape), which is summarized here for a simplified two-dimensional depiction of binding site atoms. The procedure includes the calculation of the Voronoi diagram, which consists of Voronoi cells. Each Voronoi cell contains one protein atom and covers all points in space that are closer to that atom than to any other. The Voronoi diagram is mathematically equivalent (dual) to the Delaunay triangulation of the convex hull drawn around the protein atom centers. To obtain the dual complex, edges of the triangulation are discarded if the corresponding Voronoi edges and vertices lie completely or partly outside the molecule. A triangle with one or more omitted edges is called "empty". Neighboring empty triangles are combined by the "discrete-flow" method to outline continuous voids at the protein surface. In this process, an obtuse empty triangle flows to its neighboring triangle, whereas acute empty triangles act as sinks that collect the flow from neighboring triangles.
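The discrete-flow rule can be illustrated with a minimal two-dimensional sketch in Python. It assumes the empty triangles are already known and uses unweighted points, whereas the actual method works on atom spheres in three dimensions. The function names (obtuse_vertex, group_by_sink) and the handling of flow that leaves the given set (grouped under None) are assumptions of this sketch, not part of ImprintPocket.

```python
from collections import defaultdict


def obtuse_vertex(tri):
    """Return the index (0-2) of the obtuse-angled vertex, or None if there is none."""
    for i in range(3):
        p, q, r = tri[i], tri[(i + 1) % 3], tri[(i + 2) % 3]
        # The angle at p is obtuse when (q - p) . (r - p) is negative.
        dot = (q[0] - p[0]) * (r[0] - p[0]) + (q[1] - p[1]) * (r[1] - p[1])
        if dot < 0:
            return i
    return None


def group_by_sink(triangles):
    """Group empty triangles by the acute (sink) triangle their flow ends in."""
    # Index triangles by edge so the neighbor across an edge can be looked up.
    edge_owners = defaultdict(list)
    for t, tri in enumerate(triangles):
        for i in range(3):
            edge_owners[frozenset((tri[i], tri[(i + 1) % 3]))].append(t)

    def flow_target(t):
        tri = triangles[t]
        i = obtuse_vertex(tri)
        if i is None:
            return t  # acute (or right) triangle: a sink, flow stops here
        # An obtuse triangle flows across the edge opposite its obtuse angle.
        edge = frozenset((tri[(i + 1) % 3], tri[(i + 2) % 3]))
        others = [o for o in edge_owners[edge] if o != t]
        return others[0] if others else None  # None: flow leaves the set

    groups = defaultdict(list)
    for start in range(len(triangles)):
        current, visited = start, set()
        while True:
            visited.add(current)
            target = flow_target(current)
            if target == current:
                sink = current
                break
            if target is None or target in visited:
                sink = None  # assumption: no sink inside the given set
                break
            current = target
        groups[sink].append(start)
    return dict(groups)


# Triangle 1 is obtuse and shares its longest edge with acute triangle 0;
# triangle 2 is an isolated acute triangle.
triangles = [
    ((0, 0), (4, 0), (2, 6)),
    ((0, 0), (4, 0), (2, -1)),
    ((10, 0), (14, 0), (12, 6)),
]
groups = group_by_sink(triangles)
```

Each key of groups is the index of a sink triangle, and its value lists the triangles merged into the same void.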

Methods for extracting binding site negative images have been developed and used in various docking programs. For example, DOCK [16] places spheres into the binding site randomly and then applies clustering analysis to identify the best set of spheres to represent the shape of the binding site. Surflex [24] uses a different approach, generating so-called "prototype molecules" to represent the binding site shape and other information. Our approach is unique in that it uses the alpha shape to detect the binding site atoms deterministically, followed by a geometric casting algorithm that generates the negative image as a collection of spheres. Here, we define a pocket negative image as a set of circumscribed spheres derived from the discrete set of Delaunay tetrahedra and triangles of a pocket. For a tetrahedron abcd, there is a unique point z, the orthogonal center, that has the same power distance to the four atom centers at points a, b, c, and d.

Figure 1: 2D representation of the orthogonal center. C is the orthogonal center; r is the power distance; V0, V1, V2 are atoms 1, 2, 3, respectively.
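As a rough illustration of the orthogonal center, the Python sketch below solves for the point z with equal power distance to four atoms. It assumes the usual definition of the power distance from z to an atom with center a and radius r_a, namely |z - a|^2 - r_a^2. Subtracting the equation for the first atom from the other three leaves a 3x3 linear system, solved here with Cramer's rule. The function names are hypothetical and are not taken from ImprintPocket or CASTp.

```python
def power_distance(z, center, radius):
    """Power distance from point z to an atom sphere: |z - center|^2 - radius^2."""
    return sum((zi - ci) ** 2 for zi, ci in zip(z, center)) - radius ** 2


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def orthogonal_center(atoms):
    """atoms: four (center, radius) pairs. Returns (z, common power distance)."""
    a, ra = atoms[0]
    wa = sum(c * c for c in a) - ra ** 2
    rows, rhs = [], []
    for b, rb in atoms[1:]:
        # Equal power to a and b gives 2 (b - a) . z = (|b|^2 - rb^2) - (|a|^2 - ra^2).
        rows.append([2.0 * (bi - ai) for ai, bi in zip(a, b)])
        rhs.append(sum(c * c for c in b) - rb ** 2 - wa)
    d = det3(rows)
    if abs(d) < 1e-12:
        raise ValueError("degenerate tetrahedron: atom centers are coplanar")
    z = []
    for col in range(3):
        m = [row[:] for row in rows]
        for i in range(3):
            m[i][col] = rhs[i]  # Cramer's rule: replace one column by the rhs
        z.append(det3(m) / d)
    return tuple(z), power_distance(z, a, ra)


# Equal radii: the orthogonal center coincides with the circumcenter.
equal = [((0.0, 0.0, 0.0), 1.0), ((2.0, 0.0, 0.0), 1.0),
         ((0.0, 2.0, 0.0), 1.0), ((0.0, 0.0, 2.0), 1.0)]
z_equal, power_equal = orthogonal_center(equal)

# Unequal radii: the center shifts, but all four power distances still agree.
mixed = [((0.0, 0.0, 0.0), 1.7), ((3.0, 0.0, 0.0), 1.5),
         ((0.0, 3.5, 0.0), 1.2), ((0.5, 0.5, 4.0), 1.9)]
z_mixed, power_mixed = orthogonal_center(mixed)
```

When the common power distance is non-negative, its square root is the radius of the sphere centered at z that is orthogonal to all four atom spheres.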

 

The center tangent spheres are used as circumscribed spheres for pocket triangles. Overlap checking is performed to prevent circumscribed spheres from overlapping with other pocket atoms, and a threshold is applied to remove tiny spheres. The remaining circumscribed spheres make up the negative image of the ligand pocket. Ranked by pocket size, a certain number of the largest negative images are output in PDB format. Each negative image consists of points and their radii.
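The filtering and output steps might look like the following Python sketch. Several details are assumptions made for illustration only: the minimum radius of 0.5, the small tolerance that lets a sphere touch an atom without counting as an overlap, the residue name NEG, and the choice to store each sphere radius in the B-factor column of a HETATM record. The actual file layout written by ImprintPocket is not specified here.

```python
import math


def filter_spheres(spheres, pocket_atoms, min_radius=0.5, tol=1e-6):
    """Keep spheres that are not tiny and do not overlap any pocket atom.

    spheres and pocket_atoms are lists of ((x, y, z), radius) pairs.
    """
    kept = []
    for center, radius in spheres:
        if radius < min_radius:
            continue  # threshold step: drop tiny spheres
        overlaps = any(
            math.dist(center, atom_center) < radius + atom_radius - tol
            for atom_center, atom_radius in pocket_atoms
        )
        if not overlaps:
            kept.append((center, radius))
    return kept


def to_pdb_lines(spheres, resname="NEG", chain="X"):
    """Format spheres as fixed-column HETATM records (radius in the B-factor field)."""
    template = ("HETATM{:5d} {:4s} {:>3s} {:1s}{:4d}    "
                "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}")
    lines = []
    for serial, ((x, y, z), radius) in enumerate(spheres, start=1):
        # " C  " is a placeholder atom name for the virtual atom.
        lines.append(template.format(serial, " C  ", resname, chain, 1,
                                     x, y, z, 1.0, radius))
    return lines


pocket_atoms = [((0.0, 0.0, 0.0), 1.5), ((6.0, 0.0, 0.0), 1.5)]
candidates = [
    ((3.0, 0.0, 0.0), 1.5),  # tangent to both atoms: kept
    ((1.0, 0.0, 0.0), 1.0),  # overlaps the first atom: removed
    ((3.0, 3.0, 0.0), 0.2),  # below the size threshold: removed
]
negative_image = filter_spheres(candidates, pocket_atoms)
pdb_lines = to_pdb_lines(negative_image)
```

Writing the radius into a standard numeric column keeps the file readable by ordinary PDB viewers, at the cost of overloading a field that normally has another meaning.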

 

1.3  Implementation

The method was implemented in C++ on both Linux and Windows. The program uses the publicly licensed newmat library for matrix calculations, and the core functions of CASTp are integrated into it. In tests on 50 protein structures, the calculation time for most structures was less than 2 minutes. A web server (http://sts.bioengr.uic.edu/pni/) was set up so that users can upload a target structure for negative image computation; the result files are packed and emailed back, or the user can download them online. Multiple structures can be uploaded as long as the user compresses them into one file. A stand-alone negative image viewer with a user-friendly GUI is available for Windows users to visualize the output. Program performance was tested on a Pentium IV 1.6 GHz machine with 1 GB RAM running Windows XP Professional and on an AMD Sempron(tm) machine running RedHat 8.0.

 

1.4  Conclusions

A new algorithm for identifying negative images of surface pockets in large molecular systems was developed and implemented in the ImprintPocket program, which is available in the public domain. In this method, the negative image of the ligand-binding pocket of the target protein is generated using virtual atoms, with the expectation that it could represent an optimal ligand of the target protein. The algorithm automatically explores all kinds of surface cavities. Rigorous shape representation and an adjustable level of detail give the user an accurate and customizable tool for a variety of structure-based research. The user needs only to provide the molecular PDB file and probe radii to enable the analysis of any molecular system. The algorithm is sufficiently rapid and robust for the routine analysis of large numbers of structures. As an example, consider p38 MAP kinase and RO3201195, an orally bioavailable and highly selective inhibitor of p38 that was selected for advancement into Phase I clinical trials. The binding pocket and the negative image are illustrated in Figures 2 and 3.

Figure 2: Mitogen-activated protein kinase (2gfs) and inhibitor

 

  

 

Figure 3: Mitogen-activated protein kinase (2gfs) pocket and negative image. Blue spheres are pocket atoms and red spheres are negative image atoms.

 

ImprintPocket has been used to develop a new virtual screening tool, Shape4. The negative image is further represented as Gaussian functions using the Shape toolkit (from OpenEye Scientific), so that fast shape overlays between the negative image and database molecules can be performed. Such an implementation can capture the intricate details of a binding site shape, the effect of which in virtual screening experiments has been demonstrated by five test cases in this work. Shape4 performed very well in all virtual screening experiments and in some cases significantly better than the other virtual screening methods (ROCS and FRED) studied in this work. Shape4 offers a fast, effective, and intuitive virtual screening alternative to computationally more expensive docking calculations in cases where the X-ray crystal structure of the target is known.
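To give a sense of what a Gaussian shape comparison involves, the Python sketch below scores the overlap between two sets of spheres at a fixed relative position. It is not the Shape toolkit API and not the Shape4 implementation. It assumes one spherical Gaussian per sphere with amplitude P = 2*sqrt(2) and a width chosen so that each Gaussian integrates to the volume of its sphere, keeps only pairwise (first-order) overlap terms, and does no alignment optimization. All names are hypothetical.

```python
import math

P = 2.0 * math.sqrt(2.0)  # assumed Gaussian amplitude


def alpha(radius):
    """Gaussian width chosen so the Gaussian integrates to (4/3) pi radius^3."""
    return math.pi * (3.0 * P / (4.0 * math.pi)) ** (2.0 / 3.0) / radius ** 2


def overlap_volume(spheres_a, spheres_b):
    """Sum of pairwise Gaussian overlap integrals between two sphere sets."""
    total = 0.0
    for center_a, radius_a in spheres_a:
        for center_b, radius_b in spheres_b:
            aa, ab = alpha(radius_a), alpha(radius_b)
            d2 = sum((x - y) ** 2 for x, y in zip(center_a, center_b))
            # Closed-form integral of the product of two spherical Gaussians.
            total += (P * P * math.exp(-aa * ab * d2 / (aa + ab))
                      * (math.pi / (aa + ab)) ** 1.5)
    return total


def shape_tanimoto(spheres_a, spheres_b):
    """Tanimoto-style similarity: 1 for identical shapes, near 0 for disjoint ones."""
    v_ab = overlap_volume(spheres_a, spheres_b)
    v_aa = overlap_volume(spheres_a, spheres_a)
    v_bb = overlap_volume(spheres_b, spheres_b)
    return v_ab / (v_aa + v_bb - v_ab)


def shifted(spheres, dx):
    """Translate every sphere by dx along the x axis."""
    return [((x + dx, y, z), r) for (x, y, z), r in spheres]


negative_image = [((0.0, 0.0, 0.0), 1.5), ((2.0, 0.0, 0.0), 1.2)]
score_same = shape_tanimoto(negative_image, negative_image)
score_near = shape_tanimoto(negative_image, shifted(negative_image, 1.0))
score_far = shape_tanimoto(negative_image, shifted(negative_image, 100.0))
```

Because every term has a closed form, the score is cheap to evaluate, which is what makes overlay-based screening fast compared with docking.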


 

 

1. Tropsha, A. and H. Edelsbrunner, Biogeometry: applications of computational geometry to molecular structure. Pac Symp Biocomput, 2005: p. 1-3.

2. Natarajan, V. and H. Edelsbrunner, Simplification of three-dimensional density maps. IEEE Trans Vis Comput Graph, 2004. 10(5): p. 587-597.

3. Edelsbrunner, H. and P. Koehl, The weighted-volume derivative of a space-filling diagram. Proc Natl Acad Sci U S A, 2003. 100(5): p. 2203-2208.