Welcome to the PDBbind-CN database


In the past, we have received a number of queries about the status of the PDBbind core set. We have also noticed some confusion in the literature regarding the naming convention of the CASF benchmark developed by our group. Here, we would like to make a formal statement about the PDBbind core set and the CASF benchmark, in the hope of answering those queries and clarifying the confusion.

Our group has a long-standing interest in scoring function development. The PDBbind database is a notable outcome of that work (Liu et al., Acc. Chem. Res. 2017, 50, 302-309). The PDBbind database is now updated on an annual basis, and each release is named after the release year, such as PDBbind v.2016, PDBbind v.2017, and so on. The PDBbind database collects experimentally measured binding affinity data for four types of molecular complexes: protein-ligand complexes, nucleic acid-ligand complexes, protein-protein complexes, and protein-nucleic acid complexes. We have named the collection of protein-ligand complexes the "general set". We focus on this data set because it is the most relevant to drug design and discovery studies. Naturally, not every entry in the general set is suitable for calibrating or validating docking/scoring methods, owing to various problems in the 3D structure, the binding data, or other aspects. Therefore, we have selected the relatively "healthy" entries from the general set to compile the so-called "refined set". The refined set serves as a generally acceptable data set for docking/scoring studies. Other researchers may apply the refined set directly to their studies, or use it as the starting point for compiling data sets with their own focus. Both the general set and the refined set are updated with the PDBbind database on an annual basis. They should be cited as, for example, "the PDBbind general set v.2016", "the PDBbind refined set v.2017", and so on.

As another part of our efforts, we have established the CASF benchmark (Comparative Assessment of Scoring Functions), which aims to provide an objective platform for assessing scoring functions. The first published work was CASF-2007 (Cheng et al., J. Chem. Inf. Model. 2009, 49, 1079-1093). The next major update, CASF-2013, was published a few years later (Li et al., J. Chem. Inf. Model. 2014, 54, 1700-1716; J. Chem. Inf. Model. 2014, 54, 1717-1736). The CASF benchmark employs a high-quality set of protein-ligand complexes as the primary test set. This data set, which we call the PDBbind "core set", is selected from the PDBbind refined set through a systematic, non-redundant sampling procedure. Accordingly, each public release of the CASF benchmark is named after the version of the PDBbind database from which the test set is selected. For example, the test set in CASF-2007 was compiled based on PDBbind v.2007, the test set in CASF-2013 was compiled based on PDBbind v.2013, and so on. We do not name a CASF benchmark after its publication year, because we cannot predict when the paper will be published while we are still preparing the manuscript.
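The naming rules described above can be summarized in a short Python sketch. This is only an illustration: the function names `pdbbind_citation` and `casf_source_release` are hypothetical helpers invented here, not part of any PDBbind or CASF tool.

```python
import re

# Subsets that the statement says are updated annually with PDBbind.
ANNUAL_SUBSETS = ("general", "refined")


def pdbbind_citation(subset: str, year: int) -> str:
    """Build a citation string such as 'the PDBbind general set v.2016'.

    Hypothetical helper: it only formats the name and does not check
    that a release for the given year actually exists.
    """
    if subset not in ANNUAL_SUBSETS:
        raise ValueError(f"expected one of {ANNUAL_SUBSETS}, got {subset!r}")
    return f"the PDBbind {subset} set v.{year}"


def casf_source_release(casf_name: str) -> str:
    """Return the PDBbind release a CASF benchmark is named after.

    The year in 'CASF-YYYY' is the PDBbind version from which the test
    set was selected, not the year the paper was published.
    """
    match = re.fullmatch(r"CASF-(\d{4})", casf_name)
    if match is None:
        raise ValueError(f"not a CASF benchmark name: {casf_name!r}")
    return f"PDBbind v.{match.group(1)}"


print(pdbbind_citation("refined", 2017))
print(casf_source_release("CASF-2013"))
```

For instance, CASF-2007 was published in 2009, yet `casf_source_release("CASF-2007")` still points to PDBbind v.2007, which is the intent of the convention.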

It is important to point out that, unlike the PDBbind database, the PDBbind core set is not updated on an annual basis. As implied above, the PDBbind core set is a component of the CASF benchmark rather than of the PDBbind database. The CASF benchmark is not updated on an annual basis for the following reasons:

• A HUGE amount of effort is needed to finish each CASF update. The CASF benchmark is more than a simple data set. Instead, it consists of a whole set of evaluation methods, the test set, and a large panel of standard scoring functions that are tested as a demonstration. A lot of material needs to be prepared, and a lot of computation needs to be conducted, for each CASF update.

• Even if it were doable, in our opinion there is no need to update CASF so frequently. Our current plan is to update the CASF benchmark every three years. In fact, we have already finished CASF-2016 and are preparing a manuscript on it. We hope that this paper will be published in 2018.

As mentioned above, the last published version of the PDBbind core set is v.2013. This data set was not updated with PDBbind v.2014 or v.2015, so there is no PDBbind core set v.2014 or v.2015. For historical reasons, the PDBbind core set used to be included in the downloadable data package of some previous PDBbind releases. To avoid further confusion, we have removed the core set from the data packages of recent PDBbind releases (e.g. PDBbind v.2014, v.2015, v.2016, and v.2017). If needed, users can find the PDBbind core set in the data package of the corresponding CASF benchmark (e.g. CASF-2007 and CASF-2013), which is also downloadable from the PDBbind-CN web site.

In conclusion, the take-home messages are:

• The CASF benchmark should not be referred to as the "PDBbind benchmark". This incorrect name does appear in the literature, and now you know what the correct one is.

• The data package of the CASF benchmark can be downloaded from the PDBbind-CN web site under the "CASF" tab (http://www.pdbbind-cn.org/casf.php). At this point, we do not think it is necessary to set up two separate web sites to host PDBbind and CASF.

• Currently, the latest public release of the CASF benchmark is CASF-2013. CASF-2016 will follow soon.

----------------------------------------------------------------------------------------------------------------------------

By Prof. Renxiao Wang, Mar 3rd, 2018

Dear PDBbind users,

We have received a good number of queries regarding the next release of PDBbind. The PDBbind database has had a long-standing tradition of regular annual updates since its inception. However, it is already 2023 and the latest available release is still version 2020. We understand your concern.

In fact, our team has been working diligently on PDBbind version 2021 for the past three years. It is important to note that version 2021 is not a regular update but the most significant update in the history of PDBbind, encompassing more binding data (increased by ~20%), a new workflow for processing structures, new on-line functions, and a new cloud-based server. Achieving all these objectives has required much more effort than we had anticipated. After version 2021, we expect to return to the tradition of annual updates in the near future.

Our current plan is to release PDBbind version 2021 officially before the start of 2024. We would like to express our gratitude for your continued support of PDBbind. Please keep an eye on new announcements posted on this website.

Best wishes,

Prof. Renxiao Wang, on behalf of the PDBbind team
Department of Medicinal Chemistry, School of Pharmacy, Fudan University
Shanghai, P. R. China
E-mail: wangrx@fudan.edu.cn

Dear All,

We are excited to announce that the beta version of our new PDBbind+ web site is now ready for testing. Starting from version 2021, all new versions of the PDBbind database will be released solely on PDBbind+. We cordially invite you to experience the upgraded features of the PDBbind+ web site.

Currently registered PDBbind users will soon receive an e-mail from which they can activate their PDBbind+ account directly after transferring their user profile from PDBbind-CN to the new web site. Others are encouraged to visit PDBbind+ at www.pdbbind-plus.org.cn. Registration on the PDBbind+ web site as a demo user is FREE. Demo users may access the contents of the PDBbind database up to version 2020 on the new web site.

We plan to release version 2021, as well as additional functional modules, on PDBbind+ once the beta test is completed. The official release of version 2021 is anticipated this month, so please stay tuned. For the sake of current PDBbind users, the PDBbind-CN web site will remain up and running as is, but no further updates of PDBbind-CN are planned.

If you need any assistance or have any questions regarding PDBbind+, please feel free to reach us at support@pdbbind-plus.org.cn. Thank you for your continued support of the PDBbind database!

Best regards,

The PDBbind Team
School of Pharmacy, Fudan University

Dear valued PDBbind users,

We are pleased to announce the official release of PDBbind version 2024 on the PDBbind+ platform (https://www.pdbbind-plus.org.cn/). Note that the previous release was version 2021: we have chosen to skip versions 2022 and 2023 and provide version 2024 directly. From now on, the PDBbind database will return to regular annual updates.

The key highlights of PDBbind version 2024 include:

(1) Expanded collection of binding data: The new release encompasses experimental binding affinity data for 33,660 biomolecular complexes sourced from the Protein Data Bank, marking a 23% growth from the previous release (version 2021). Compared to the last freely accessible release (version 2020), the growth reaches 43%. PDBbind version 2024 provides binding data for >27,300 protein-ligand complexes, >200 nucleic acid-ligand complexes, >4,500 protein-protein complexes, and >1,400 protein-nucleic acid complexes. This expansion enables a broader and deeper exploration of molecular interactions and supports applications such as training deep-learning models.

(2) Carefully processed complex structural files: Since version 2021, we have used a new workflow to ensure that the structural files of protein-ligand complexes are processed properly and are compatible with other popular software (such as RDKit). For version 2024, we have further refined this workflow to achieve even higher accuracy and reliability in data interpretation. This workflow has been applied to nucleic acid-ligand complexes as well.

(3) Attention to biomacromolecular complexes: For the first time in the history of PDBbind, version 2024 provides processed structural files for the protein-protein complexes and protein-nucleic acid complexes in PDBbind. For this purpose, the necessary annotations have been added to the binding data so that one can identify the interacting chains in those complexes. A new workflow has been established to process the original PDB structural files and fix certain defects in them. This new feature is expected to facilitate computational research on such molecular systems.

(4) New functions implemented on the PDBbind+ platform: Our web platform offers useful features for structural visualization and data analysis. Additionally, computational tools developed by our team, such as COMET (target-fishing for bioactive molecules) and PLANET (ultra-fast structure-based virtual screening), have been integrated to enrich the user experience, while cloud resources enable efficient on-line computation. Even demo users may use these computing services on PDBbind+.
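As a rough sanity check on the figures quoted in item (1), the following Python sketch adds up the per-category lower bounds and back-calculates the approximate sizes of the earlier releases from the stated growth rates. The variable and function names are illustrative only, and because the percentages are rounded, the back-calculated values are estimates rather than official counts.

```python
# Figures quoted in the announcement for PDBbind version 2024.
TOTAL_V2024 = 33_660
LOWER_BOUNDS = {
    "protein-ligand": 27_300,
    "nucleic acid-ligand": 200,
    "protein-protein": 4_500,
    "protein-nucleic acid": 1_400,
}


def implied_previous_size(current: int, growth_percent: float) -> float:
    """Estimate an earlier release size from a rounded growth percentage.

    If current = previous * (1 + g/100), then previous = current / (1 + g/100).
    """
    return current / (1 + growth_percent / 100)


# The four ">" figures are lower bounds, so their sum must not exceed the total.
lower_bound_sum = sum(LOWER_BOUNDS.values())
remainder = TOTAL_V2024 - lower_bound_sum

# Approximate sizes of versions 2021 (23% growth) and 2020 (43% growth).
approx_v2021 = implied_previous_size(TOTAL_V2024, 23)
approx_v2020 = implied_previous_size(TOTAL_V2024, 43)

print(f"sum of lower bounds: {lower_bound_sum} (remainder {remainder})")
print(f"estimated version 2021 size: about {approx_v2021:.0f}")
print(f"estimated version 2020 size: about {approx_v2020:.0f}")
```

The lower bounds sum to 33,400, which is consistent with the stated total of 33,660, and the growth rates imply roughly 27.4 thousand complexes in version 2021 and roughly 23.5 thousand in version 2020.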

As part of our commitment to fostering collaboration and knowledge exchange, registration on PDBbind+ as a demo user remains FREE. Demo users have access to a range of data and computing services free of charge. For those seeking access to the latest data collection and the complete set of functions, we offer the option to purchase the PDBbind dataset (version 2024 for now) for a modest licensing fee. Upon becoming a paid user, you will unlock full access to all available data and computing services on PDBbind+.

We must emphasize that the PDBbind+ platform is the only legal source from which one can obtain the PDBbind dataset since version 2021. Public re-distribution of the PDBbind dataset, or of a derivative dataset, is prohibited by the user license agreement.

We express our heartfelt gratitude to you for your unwavering support, which serves as the cornerstone of our endeavors. Your feedback and engagement continue to inspire us as we strive to evolve the PDBbind database into a more valuable community resource.

With best regards,

The PDBbind+ Team

School of Pharmacy, Fudan University & TopScience Ltd., Shanghai