
DEVELOPING A PIPELINE FOR SPEEDING UP STRUCTURAL VENOMICS STUDIES

BIOPHYSICS HONOURS THESIS — 6th of November, 2009

Tomas Marko Miljenović 1, 2
Supervisor: Dr. Mehdi Mobli 2

  1. School of Mathematics and Physics, The University of Queensland, St Lucia, Qld 4072, Australia
  2. King Group, Institute for Molecular Bioscience, St Lucia, Qld 4072, Australia


Chapter 1.  Introduction

1.1  The need for new insecticides

The world is currently experiencing a major food crisis1-5, with more than one billion people estimated to be living in hunger in 2009. The rising world population continues to increase the demand for food, while decreasing the amount of free arable land per capita1; 3. The global food supply is further being threatened by global warming2; 4; 5. Climate change is predicted to create a significant net decrease in fertile land area worldwide2; 4, and global temperature rises are expected to increase the population and spread of many agricultural pests2. Modern agriculture is dependent on the use of insecticides to maintain current supply levels, with current yields and crop quality predicted to decrease dramatically if the use of existing pesticides were abandoned. However, the efficacy of several insecticides has decreased, with pest populations becoming increasingly resistant6-8. Furthermore, in addition to their effect on their intended targets, many existing pesticides are capable of adversely affecting the health of humans, livestock and the environment9. A significant share of the insecticides in use today were developed decades ago during the golden age of pesticide development, when there was less concern about resistance and about health and environmental issues7. While the prospective loss of ineffectual and unsafe insecticides is likely to have a detrimental impact on food supply, the development of novel insecticides against a range of insect pests could be a major boon to global food security. Yet despite modern demand for new, safe and effective insecticides, few have been developed in recent years7.

Similar problems are faced in efforts to control arthropod-borne diseases. Human population growth and migration have facilitated the spread of insect vectors, threatening disease outbreaks in previously unaffected areas8; 10-14. Containment and eradication strategies for these diseases rely heavily upon the use of insecticides8; 15. However, the number of vector populations showing resistance to current insecticides is increasing, and novel replacements are lacking6; 8; 16; 17. The use of chemical insecticides with adverse health or environmental effects is necessitated in several countries due to the lack of effective alternatives, and resistance to these chemicals is increasing. The small number of compounds able to control certain insect vectors frustrates resistance-prevention strategies, which often rely upon rotational use of different compounds17. Warnings have been made that new insecticides for disease vector control will be required to avert or combat health crises8; 18-20.

1.2  Spider toxins as novel insecticide candidates

Spiders have been natural predators of insects for millions of years21; 22. Most spiders are venomous creatures21; 23, producing venom comprising numerous different types of peptide toxins24. (Non-proteinaceous toxins are also produced, but they will not be discussed here.) Spider toxins characteristically possess fairly short peptide chains of about 40–60 amino acids, which are folded into very stable globular conformations due to a common post-translational modification involving the formation of a disulphide bond between two cysteine residues. Spider toxins typically contain 3–5 disulphide bonds, often arranged to produce a knot-like geometry, resulting in highly stable molecules capable of withstanding many of the biochemical insults they may be exposed to in an envenomated organism. Collectively, toxins are active against a diverse range of targets within numerous species. Modes of action vary, with neurotoxic, haemolytic, antiarrhythmic and cytolytic activity being some examples of lethal and non-lethal activity exhibited.25 Interest in spider-venom peptides as potential insecticide candidates is growing26; 27, and several potent toxins with high insecticidal activity have already been isolated28-30. Considering that there are currently ~41,000 species of spider classified31, and it is estimated that each species could produce over one thousand unique toxins, the potential pool of molecules with insecticidal activity may number several million28; 32. This contrasts with the meagre 521 spider toxins that have thus far been isolated and published25. Fortunately, however, the number of toxins discovered each year is rising exponentially34 (see Fig. 1.1). This dramatic expansion necessitates a depository for the storage and interrogation of the ever-growing accumulation of data.
Ideally, such a database would include information on each toxinʼs amino acid sequence, three-dimensional structure, disulphide-bonding pattern, biological activity, molecular target, phyletic specificity, and, where available, pharmacophore and genomic information. The development of such a database is considered in Chapter 2.


The process of translating an insecticidal spider toxin into a novel insecticide involves detailed biochemical and biophysical characterisation of the toxin at an atomic level. A very important part of this process is the production of pure toxins; Chapter 3 will discuss the available technologies for this purpose. Finally, producing stable and effective small-molecule insecticides from these toxins requires detailed knowledge of the three-dimensional (3D) structures of the toxins (as will be discussed in Chapter 4). This information can then be used to guide the chemical synthesis of potent mimetics, which may be mass-produced and utilised using conventional agricultural methods.


Figure 1.1

Figure 1.1: Plot of the number of spider toxins discovered per year, taken from ArachnoServer (www.arachnoserver.org). The discovery year was taken to be the year in which the toxinʼs amino acid sequence was first isolated, published or accepted into a database (whichever is earliest).

This approach is currently greatly hampered by the lack of available toxin structures: only 39 structures of spider-venom peptides have been deposited in the Protein Data Bank (PDB)25; 27; 33. Unfortunately, the exponential rise seen in the number of sequences discovered annually (Fig. 1.1) has not translated into a greater number of toxin structures being determined, as illustrated in Fig. 1.2.25 The preferred method for high-resolution structure determination of toxins is nuclear magnetic resonance (NMR) spectroscopy. The largest factors prohibiting high-throughput structure determination are the time-consuming data acquisition and analysis steps. Chapter 4 will discuss novel approaches to data acquisition capable of significantly reducing the time requirements, making NMR analysis a viable high-throughput method in structural studies of proteins. The various components discussed in this thesis are part of what is emerging as a new field dubbed “Structural Venomics”27.


Figure 1.2

Figure 1.2: Plot of the number of unique 3D structures of spider-venom toxins discovered per year. Discovery year was taken to be the year of publication or deposition into the Protein Data Bank (PDB), whichever was earliest. The data used for this plot, including toxin names and PDB accession codes, are listed in Appendix A. Additional structures for the same toxin were disregarded, as were theoretically derived model structures.


Spider-venom peptides are also of interest as lead compounds for drug development21; 27. Isolated toxins have been found to target a variety of ion channels and receptors of pharmacological interest21; 35; 36, and they are currently being investigated for the treatment of atrial fibrillation37, erectile dysfunction38 and acute and neuropathic pain39. Although the results and methods developed herein are equally applicable to such toxins, the focus of this thesis will be insecticidal toxins.


Chapter 2.  ArachnoServer

Ongoing mapping of spider venomes (a venome being a complete catalogue of all peptides and proteins in the venom of a spider) will result in the accrual of a wealth of diverse yet interrelated information on numerous spider toxins. Information on amino acid sequence, three-dimensional structure, disulphide-bonding pattern, biological activity, molecular target, phyletic specificity, pharmacophore, and genomic context is expected to be collected for the continually increasing range of identified toxins (see Section 1.2). Some of this information will be a prerequisite for collecting the remaining data (e.g. knowledge of the amino acid sequence is generally required before the 3D structure can be determined). This information will also be crucial for selecting candidates for insecticide and drug development. Ideally, this information should be readily and freely accessible to all researchers, and stored in a manner that is easy to query and highly scalable (see estimated number of toxins in Section 1.2).

Online databases containing some of this information are already available. Tox-Prot (www.expasy.ch/sprot/tox-prot) and the Animal Toxin Database (protchem.hunnu.edu.cn/toxin) are two repositories designed to hold information on all animal toxins. The generality of these databases makes them useful for comparing toxins between lineages and species, but they do not allow for the detailed storage of rich, manually curated information that is critical for spider toxin research focussed on insecticide and drug development. An example of a more specialised database, tailored for research on a particular class of venoms, is ConoServer (www.conoserver.org), a database of peptide toxins from venomous marine cone snails. An analogous database that provided detailed information on all characterised spider toxins would greatly assist with structural venomics research and insecticide/drug development. For these reasons, a web-based database was built to serve as a central repository for information on proteinaceous spider toxins25. Named ArachnoServer (to allow for future expansion from Araneae-sourced toxins to toxins from all Arachnida), it can be accessed online at http://www.arachnoserver.org.

2.1  Requirements

Certain requirements would need to be met in order for this database to be an invaluable component of the venomics pipeline. It would need the capability to logically store all information generally relevant to such research, along with citations for all provided information. Foremost, each toxin's data record would need to describe the property that makes it unique, namely its mature toxin sequence. Data for (potentially) multiple precursor sequences would also need to be included, as well as notable sequence features for each precursor (including signal peptide, propeptide, and excised regions). Also important would be information on features of the mature toxins such as post-translational modifications, pharmacophores, and disulphide-bonding pattern. In the case of disulphide bonds, the way in which the bond connectivity was determined (i.e. by experiment, homology, or theoretical modelling) would need to be recorded without exception for each disulphide bridge, to assist with assessing this information's reliability; most other databases do not include this information. The ability to include genetic information in the form of DNA, cDNA, and mRNA sequences is also required.


In order to easily find and compare toxins, entries would need to be stored using a suitable classification system. This would best be done by following the recently devised rational nomenclature for spider-venom peptides, which names each toxin on the basis of molecular target, channel or receptor type, toxin family (grouping similar toxins), source spider (genus/species), and isoform57. The location where the source spider was collected and its taxonomy could assist with collecting samples and searching for toxins that have yet to be isolated. Importantly, the database would also need to store or link to structural information. Ideally, an online viewer would be incorporated to allow for an immediate examination of each solved 3D structure for a particular toxin. Structure files should also be available for download in Brookhaven PDB format, given its widespread use.
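Names built with such a scheme can be split mechanically into their components, which is helpful when grouping or sorting records. The following is a minimal Python sketch, assuming names follow the pattern of those appearing in this thesis (e.g. U2-segestritoxin-Sf1a and δ-hexatoxin-Ar1a). The regular expression, the function name and the field names are illustrative assumptions, not part of ArachnoServer, and how each field maps onto the categories listed above is likewise an assumption.

```python
import re
from typing import NamedTuple, Optional


class ToxinName(NamedTuple):
    """Components of a toxin name (field names are hypothetical)."""
    activity_prefix: str           # e.g. "δ" or "U"
    prefix_number: Optional[int]   # e.g. 2 in "U2"; None when absent
    generic_name: str              # e.g. "segestritoxin"
    species_code: str              # e.g. "Sf" for Segestria florentina
    toxin_number: int              # e.g. 1 in "Sf1a"
    isoform: str                   # e.g. "a" in "Sf1a"


# Assumed pattern: <prefix><optional digits>-<generic name>-<Xx><digits><letter>
_NAME_RE = re.compile(
    r"^(?P<prefix>[^\d-]+)(?P<prefix_number>\d*)"
    r"-(?P<generic>[A-Za-z]+)"
    r"-(?P<species>[A-Z][a-z]+)(?P<number>\d+)(?P<isoform>[a-z])$"
)


def parse_toxin_name(name: str) -> ToxinName:
    """Split a toxin name into its parts, or raise ValueError if it does not fit."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised toxin name: {name!r}")
    digits = match.group("prefix_number")
    return ToxinName(
        activity_prefix=match.group("prefix"),
        prefix_number=int(digits) if digits else None,
        generic_name=match.group("generic"),
        species_code=match.group("species"),
        toxin_number=int(match.group("number")),
        isoform=match.group("isoform"),
    )
```

For example, `parse_toxin_name("U2-segestritoxin-Sf1a")` yields the prefix `U` with number 2, the generic name `segestritoxin`, the species code `Sf`, toxin number 1 and isoform `a`. Names with more elaborate prefixes would need a richer pattern.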


Information on toxicity would also be required. All applicable modes of action (such as whether a toxin is neurotoxic or antimicrobial) would need to be included. Inclusion of molecular targets and quantitative measures of efficacy such as the inhibitory concentration (IC50), dissociation constant (Kd), and effective dose (ED50) for each target would be of extreme importance for insecticide and drug development. Furthermore, biological activity on a per-species basis would be required, with lethal dose (LD50), paralytic dose (PD50), and further qualitative information included as relevant for the induced effect.


The above were deemed key features needed to make the database a comprehensive repository consistent with the needs of mapping the spider venome. While the above would allow for much useful information to be stored, it was recognised that certain studies could require information beyond the capabilities and scope of the completed database. The inclusion of accessions to external databases would provide links to more specialised information stored elsewhere, and a flexible data structure would allow later upgrades to include additional information.

2.2  Database Design and Implementation

A database structure was devised, based upon the requirements outlined in the previous section. A relational database model was selected, with unique primary and foreign key pairs used to ensure data integrity. A more normalised data structure was favoured, though this was not always possible due to the complex nature of, and relationships between, the data being stored. The database schematic can be seen in figure 2.3.


The initial database was implemented using MySQL (mysql.com). The final application was built with a Model View Controller (MVC) architecture using the Spring (springsource.org) framework. A MySQL database was selected as the storage backend, with Hibernate (hibernate.org) Object Relational Mapping (ORM) used to create an abstraction layer for the controller. With database tables mapped to Plain Old Java Object (POJO) beans, the ORM layer was capable of generating all required SQL statements for querying and modifying the database. This approach simplifies future alterations to the data structure. A web application interface was designed for the view, using JavaServer Pages (JSP) and Asynchronous JavaScript and XML (AJAX) to serve content. A Spring service layer connects the interface to the ORM abstraction layer, facilitating the viewing, searching and curation of database content.
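To illustrate how primary and foreign keys protect data integrity in a relational model, the following is a minimal sketch using Python's built-in sqlite3 module. It is not the ArachnoServer schema, and it uses SQLite rather than the MySQL and Hibernate stack described above. All table names, column names and functions are hypothetical, chosen only to mirror the requirement that each disulphide bond records how its connectivity was determined.

```python
import sqlite3

# Hypothetical two-table schema: each bond row must point at an existing toxin.
SCHEMA = """
CREATE TABLE toxin (
    toxin_id        INTEGER PRIMARY KEY,
    name            TEXT NOT NULL UNIQUE,
    mature_sequence TEXT NOT NULL
);
CREATE TABLE disulphide_bond (
    bond_id  INTEGER PRIMARY KEY,
    toxin_id INTEGER NOT NULL REFERENCES toxin(toxin_id),
    cys_a    INTEGER NOT NULL,
    cys_b    INTEGER NOT NULL,
    evidence TEXT NOT NULL
             CHECK (evidence IN ('experiment', 'homology', 'predicted'))
);
"""


def open_database() -> sqlite3.Connection:
    """Create an in-memory database with foreign-key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def add_toxin(conn: sqlite3.Connection, name: str, sequence: str) -> int:
    """Insert a toxin and return its primary key."""
    cur = conn.execute(
        "INSERT INTO toxin (name, mature_sequence) VALUES (?, ?)",
        (name, sequence),
    )
    return cur.lastrowid


def add_bond(conn: sqlite3.Connection, toxin_id: int,
             cys_a: int, cys_b: int, evidence: str) -> None:
    """Insert a disulphide bond; raises sqlite3.IntegrityError if the toxin
    does not exist or the evidence type is not one of the allowed values."""
    conn.execute(
        "INSERT INTO disulphide_bond (toxin_id, cys_a, cys_b, evidence) "
        "VALUES (?, ?, ?, ?)",
        (toxin_id, cys_a, cys_b, evidence),
    )
```

With this arrangement, an attempt to attach a bond to a toxin identifier that is not present in the toxin table is rejected by the database itself rather than relying on application code.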


The database was initially populated from existing records in the UniProt (www.uniprot.org), EMBL/GenBank (http://www.ebi.ac.uk/embl/) and PDB databases. Initial records were created by import scripts, populating fields supported by the external databases. Manual curation verified imported data and added information from other literature sources, in addition to populating remaining fields. Source spider information was obtained from the World Spider Catalog31, and post-translational modifications to mature toxins were classified using a system based upon that implemented for ConoServer (www.conoserver.org). A new ontology was developed for classifying molecular targets, conforming with International Union of Basic & Clinical Pharmacology recommendations63, with targets able to be classified at the subtype level of receptors or ion channels.


Figure 2.1

Figure 2.1: Spider Toxin Record (STR) for δ-hexatoxin-Ar1a.


2.3  Interface and Functionality

For visualisation purposes, repository data was grouped into spider toxin records (STR), with an STR for each entry in the TOXIN table of the database (see figure 2.3). Information on an individual toxin is displayed in an STR card (an example is shown in figure 2.1). The initial view displays basic descriptive information at the top of the card, including the toxin name, source species (together with location and a photograph) and discovery year. A dynamically generated display of the mature toxin sequence is shown prominently. This display graphically marks the locations of any disulphide bonds (differentiating between bonding patterns determined experimentally, by homology or predicted), pharmacophoric residues (distinguishing between predicted and determined) and post-translational modifications (together with modification type). Hovering a cursor over the image provides further information, including residue number (with respect to the mature sequence) and full residue name.

Below the default view items, the STR card holds a number of expandable sections, in which all other information on that toxin is grouped. The sections are carefully categorised so that researchers need only view information relevant to their discipline or project stage. The Taxonomy section provides the full taxonomy of the source spider, as well as any historical taxonomy and alternate or deprecated names for the species. Biological Activity provides mode of action, toxicity in different species, and activity for specific molecular targets and subtargets. Accessions provides hyperlinks and accessions relevant to this toxin in external databases, such as links to PDB entries. Literature References cites other source information, including original publications. Protein Information displays annotated precursor sequences and includes a table displaying disulphide bond connectivity. Sequences contains peptide, precursor and genetic sequences in FASTA format, with links to perform BLAST (www.ncbi.nlm.nih.gov/BLAST/) searches with each sequence. Toxin Synonyms contains common, alternate and historical names for toxins (important because the rational nomenclature system is relatively new). Toxin Structure contains a Jmol (http://www.jmol.org/) applet which can be used to interactively view each available toxin structure. Placeholder headers for expandable sections are only displayed if some or all of the relevant database records for the section are populated.

An important feature of the Spring interface is its advanced search capability. While basic keyword searches can be performed across key fields, advanced searches allow for multi-clause conditional queries across several data fields. Most displayable database fields can be searched with a set of different operators depending on data type. Additional clauses can be dynamically added to the search page, then grouped and joined by Boolean operators. The advanced search capabilities allow researchers to quickly locate potential lead compounds or find other toxins of interest. An example search shown in Figure 2.2 reveals all toxins with known structure exhibiting antimicrobial activity against Escherichia coli.
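The idea of grouping search clauses and joining them with Boolean operators can be sketched as a small tree of conditions evaluated against each record. The Python below is an illustrative sketch only and does not reflect how ArachnoServer implements its search. The operator names, record fields and example records are all hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Union

Record = Dict[str, Any]

# Hypothetical operator set; which ones apply depends on the field's data type.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "equals": operator.eq,
    "less_than": operator.lt,
    "greater_than": operator.gt,
    "contains": lambda value, target: target in value,
}


@dataclass
class Clause:
    """A single condition on one field, e.g. has_structure equals True."""
    field: str
    op: str
    target: Any

    def matches(self, record: Record) -> bool:
        if self.field not in record:
            return False
        return OPERATORS[self.op](record[self.field], self.target)


@dataclass
class Group:
    """Clauses or nested groups joined by a single Boolean operator."""
    joiner: str  # "AND" or "OR"
    children: List[Union["Clause", "Group"]]

    def matches(self, record: Record) -> bool:
        results = (child.matches(record) for child in self.children)
        if self.joiner == "AND":
            return all(results)
        if self.joiner == "OR":
            return any(results)
        raise ValueError(f"unknown joiner: {self.joiner!r}")


def search(records: List[Record], query: Union[Clause, Group]) -> List[Record]:
    """Return the records that satisfy the query tree."""
    return [record for record in records if query.matches(record)]


# Made-up records, for illustration only.
EXAMPLE_RECORDS: List[Record] = [
    {"name": "hypothetical-toxin-A", "has_structure": True,
     "activities": ["antimicrobial: Escherichia coli"]},
    {"name": "hypothetical-toxin-B", "has_structure": False,
     "activities": ["antimicrobial: Escherichia coli"]},
    {"name": "hypothetical-toxin-C", "has_structure": True,
     "activities": ["insecticidal"]},
]

# Analogue of the example search: known structure AND activity against E. coli.
EXAMPLE_QUERY = Group("AND", [
    Clause("has_structure", "equals", True),
    Clause("activities", "contains", "antimicrobial: Escherichia coli"),
])
```

Calling `search(EXAMPLE_RECORDS, EXAMPLE_QUERY)` returns only the first made-up record. Because a `Group` may contain other groups, arbitrarily nested conditions can be expressed with the same two classes.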


Figure 2.2

Figure 2.2: Example of the advanced search capabilities of ArachnoServer. For brevity, only the first four results are shown in (A) and (B). Most database fields may be searched, with numerical, string and list operators available for search conditions. Multiple search clauses can be linked by Boolean operators and grouped hierarchically.


Figure 2.3

Figure 2.3: ArachnoServer database schema


Chapter 3.  Bacterial expression system

The production of toxins on the scale required for structural venomics studies is said to be the “greatest remaining bottleneck”27 preventing realisation of the full potential benefits of animal venoms. For spider toxins, sufficient quantities for the full range of structural and functional studies cannot be obtained from milking or sacrificing spiders. Methods such as bacterial expression58, solid-phase peptide synthesis in combination with native chemical ligation59 and insect cell expression60 can be used to produce toxins in vivo. However, structural studies require these toxins be in their native conformation. Spider toxins typically contain between three and five disulphide bonds24; 28; as this results in between 15 and 945 possible isomers after disulphide-bridge formation, care must be taken to ensure that the produced toxins have the correct disulphide-bonding pattern prior to attempting structural and functional studies.
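The isomer counts quoted above follow from counting the ways in which 2n cysteine residues can be paired into n disulphide bonds, which is (2n)! / (2^n n!), assuming every cysteine takes part in exactly one bond. A short Python sketch of this calculation (the function name is illustrative):

```python
from math import factorial


def disulphide_isomers(n_bonds: int) -> int:
    """Number of distinct ways to pair 2*n_bonds cysteines into n_bonds bonds.

    Assumes all cysteines are paired and that bonds are unordered.
    """
    if n_bonds < 0:
        raise ValueError("number of bonds cannot be negative")
    return factorial(2 * n_bonds) // (2 ** n_bonds * factorial(n_bonds))


for n in (3, 4, 5):
    print(n, disulphide_isomers(n))  # 3 -> 15, 4 -> 105, 5 -> 945
```

Only one of these pairings is the native fold, which is why the proportion of correctly folded product can be small unless folding is assisted.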

Production of recombinant toxins in bacteria was selected as the method of production for the structural venomics pipeline. The bacterial species chosen was Escherichia coli, as it has been studied extensively, is the most commonly used host for recombinant protein expression, and allows for relatively quick, uncomplicated peptide expression. The periplasm of E. coli contains sophisticated molecular machinery for ensuring correct disulphide-bond formation, and it was surmised that this machinery could be utilised to produce toxins with the correct disulphide-bonding pattern.

Furthermore, E. coli is an ideal system for the production of uniformly 15N/13C-labelled toxin for 3D triple resonance NMR experiments. The most abundant natural isotopes of carbon and nitrogen are not suitable for NMR studies: 12C has a spin quantum number of zero, and hence is not NMR-active56, whereas 14N is a highly insensitive quadrupolar nucleus. Thus, replacing these nuclei with the stable isotopes 13C and 15N, both sensitive spin-1/2 nuclei, opens up a range of additional 3D and 4D NMR experiments that both aid resonance assignments and lead to higher-quality structures. In order to make isotopically labelled proteins, E. coli is grown in minimal medium containing 15NH4Cl as the sole nitrogen source and, if 13C labelling is also required, with [U-13C6]glucose as the sole carbon source for growth. So long as appropriate vitamins and minerals are provided in the minimal medium, the levels of protein expression are typically not significantly different to those of bacteria grown in a nutrient-rich medium.

In order to test the bacterial expression pipeline, five test toxins with potential as bioinsecticides were selected from ArachnoServer. Each toxin was required to have proven insecticidal activity, possess no more than five disulphide bonds, and be less than 70 amino acid residues in length. Sequences that were likely to encode novel 3D structures were also given high priority. From these five, the following two were selected for my thesis project:

Toxin name: U2-segestritoxin-Sf1a (Sf1a)

Source: Segestria florentina (Tube-web spider)

Sequence: KECMTDGTVCYIHNHNDCCGSCLCSNGPIARPWEMMVGNCMCGPKA

Number of disulphide bonds: 4


Toxin name: U1-agatoxin-Ta1a (Ta1a)

Source: Tegenaria agrestis (Hobo spider)

Sequence: EPDEICRARMTHKEFNYKSNVNGCGDQVAACEAECFRNDVYTACHEAQK

Number of disulphide bonds: 3


3.1  Experimental procedures

(See also Low, C.F. (2009). Development of an efficient bacterial expression system for production of disulfide-rich toxins for structural venomics, MSc PROJECT REPORT, Uni. of Queensland.)

3.1.1  Materials

Synthetic genes, with codons optimised for E. coli expression, were synthesised by GENEART (BioPark, Regensburg, Germany), cloned into the pLICC expression plasmid (see Section 3.1.3), and sequenced to ensure fidelity with the supplied amino acid sequence. Trifluoroacetic acid (TFA) was sourced from Auspep Pty Ltd (Vic., Australia). HPLC-grade acetonitrile was sourced from Lab-Scan Analytical Sciences (Lab-Scan Co. Ltd, Bangkok, Thailand). BenchMark™ Prestained Protein Ladder was obtained from Invitrogen (Vic., Australia). Ni-NTA Superflow column resin was sourced from Qiagen (Vic., Australia). Luria-Bertani (LB) broth and agar were obtained from United States Biochemical (Ohio, USA). Ampicillin (Amp), Tris hydrochloride (Tris-HCl) and isopropyl β-D-1-thiogalactopyranoside (IPTG) were obtained from Astral Scientific (NSW, Australia). Ethylenediaminetetraacetic acid (EDTA) was obtained from Fisher Scientific (Vic., Australia).

3.1.2  Common abbreviations

  • LB/Amp refers to 100 μg of ampicillin mixed into 5.0 mL of Luria-Bertani media.
  • TSA buffer refers to 30 mM Tris-HCl pH 7.2, 40% sucrose and 2 mM EDTA.
  • TNG buffer refers to 20 mM Tris pH 8.0, 200 mM NaCl and 10% glycerol.
  • Redox buffer refers to 6 mM reduced and 0.6 mM oxidised glutathione in TNG.

3.1.3  Plasmid design

Each toxin gene was cloned into the pLICC plasmid vector by GENEART to allow for periplasmic expression in E. coli. In this plasmid vector, the toxin is encoded as a MalE-His6-MBP-toxin fusion protein, with a tobacco etch virus (TEV) protease site engineered between the MBP and toxin coding regions. The MalE signal sequence at the beginning of the construct is used to ensure that the protein is exported to the periplasm (the signal sequence is cleaved off during this process). The hexahistidine tag enables purification of the MBP-toxin fusion protein using nickel affinity chromatography. The MBP (maltose binding protein) is used to aid folding and solubility. The TEV cleavage site enables the pure recombinant toxin to be released from the MBP fusion protein. The pLICC vector contains an ampicillin resistance gene (AmpR) for selection, and expression of the target gene is inducible with IPTG via a T7 promoter. The plasmid map for the pLICC:Sf1a vector used to express recombinant U2-segestritoxin-Sf1a is shown in Fig. 3.1.

Figure 3.1

Figure 3.1: pLICC:Sf1a, the plasmid used to produce a MalE-His6-MBP-Sf1a fusion protein via bacterial expression. The regions encoding the His6 tag, MBP, Sf1a, and TEV cleavage site are indicated, as are the T7 promoter and ampicillin resistance gene.

3.1.4  Test expressions

The following was performed in duplicate for each of the two plasmid vectors. Plasmid vectors obtained from GENEART were transformed into E. coli strain BL21(λDE3) using the heat-shock method. An aliquot of 10 mg/μL plasmid was added to 50 μL of competent cells, then the mixture was vortexed and incubated on ice for 30 min. The culture was heated for 55 s at 42℃, then chilled on ice for 2 min. Transformed BL21(λDE3) cells were incubated at 37℃ in 1.0 mL LB media for 1–3 h, then centrifuged at 8,000 rpm. The pellets were then separated from the supernatant and resuspended in a further 50–100 μL LB. The resuspended pellets and media were streaked onto LB/Amp plates that had been preheated to 37℃; these were then incubated at 37℃ overnight. Six colonies were selected, suspended in LB/Amp, and again incubated overnight at 37℃. 500 μL of the BL21(λDE3) culture was transferred to new LB/Amp and incubated at 37℃ with shaking until an OD600 between 0.8 and 1.3 (relative to pure LB) was attained. A 1.0 mL aliquot of each culture was induced with 1 μL of 1 M IPTG and incubated for 3 h at 37℃ with shaking. An additional 1.0 mL aliquot of each culture was taken as a negative control and incubated under the same conditions without being induced by IPTG. Cells were then harvested from induced samples and controls after 10 min of centrifugation at 8,000 rpm.
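The induction step corresponds to a final IPTG concentration of close to 1 mM, which can be confirmed with a simple dilution calculation. The Python sketch below uses only the volumes and stock concentration stated above; the function name is illustrative.

```python
def final_concentration(stock_conc: float, added_volume: float,
                        initial_volume: float) -> float:
    """Concentration after adding a stock solution to an existing volume.

    C_final = C_stock * V_added / (V_initial + V_added).
    Both volumes must be in the same unit; the result has the unit of stock_conc.
    """
    total_volume = initial_volume + added_volume
    if added_volume < 0 or initial_volume < 0 or total_volume == 0:
        raise ValueError("volumes must be non-negative and not both zero")
    return stock_conc * added_volume / total_volume


# 1 uL (0.001 mL) of 1 M (1000 mM) IPTG added to a 1.0 mL culture aliquot.
iptg_mM = final_concentration(stock_conc=1000.0, added_volume=0.001,
                              initial_volume=1.0)
print(f"Final IPTG concentration: {iptg_mM:.3f} mM")  # approximately 1 mM
```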

The pellets from the induced samples were resuspended in 100 μL TSA buffer. A 30 μL aliquot of each suspension (hereafter referred to as “whole cell extract – induced” or simply “WC induced”) was removed and set aside, and the remainder was centrifuged at 12,000 rpm for 10 min. The pellets were resuspended in 100 μL ice-chilled water to induce hypotonic shock and break the outer membrane, then again centrifuged at 12,000 rpm for a further 10 min. A 30 μL sample of the supernatant was collected (hereafter referred to as “periplasmic cell extract – induced” or simply “PE”). The pellets were then resuspended in 100 μL SDS-PAGE running buffer, and a 30 μL aliquot was retained (henceforth referred to as “cytoplasmic cell extract – induced” or simply “CE”). The pellets from the control samples were resuspended in 100 μL SDS-PAGE running buffer; a 30 μL sample (hereafter referred to as “whole cell extract – uninduced” or simply “WC uninduced”) was retained. 30 μL of 2× SDS-PAGE loading dye was mixed into each of the WC induced, WC uninduced, PE and CE samples, which were then boiled for 5 min prior to PAGE analysis on a 12.5% polyacrylamide gel using a running potential of 200 V.

3.1.5  Production of isotopically-labelled toxins

For each toxin, the transformants whose PE samples showed the highest level of expression, based on SDS-PAGE analysis of the test expressions, were selected for the production of uniformly 15N-labelled toxin. The protocol used was adapted from a previously published method56. Two of the solutions prescribed in the original publication (a thymine-uracil solution and a solution containing tricine and MOPS buffer) were not used for the following expressions.


Vitamin solution, metal stock solution, O solution, SBMX solution, S solution and thiamine solution were prepared as described in the literature56. Growth media was prepared for each toxin by mixing 40 mL SBMX solution, 1 mL S solution, and 940 mL distilled water in a 5 L baffled conical flask and autoclaving. The selected E. coli transformants were streaked on fresh LB/Amp plates and incubated overnight at 37℃. Colonies selected from the plate were inoculated into 5 mL LB/Amp media to produce starter cultures that were then incubated at 37℃ with shaking. 2 mL O solution, 1 mL vitamin solution, and 1 mL thiamine solution were then mixed, filter sterilised, and added to the growth medium. Filter-sterilised solutions of 4 g glucose in 10 mL of water and 1 g 15NH4Cl in 5 mL of water were also added to the medium, followed by 1 mL of 100 mg/mL ampicillin. Conical flasks with 50 mL of minimal media were prepared for each of the starter cultures, then inoculated with 0.4 mL of the culture after 4 h. The cultures were incubated at 37℃ with shaking until an OD600 between 2 and 3 was reached. The 50-mL cultures were then added to the remaining minimal medium for that toxin. Once each suspension reached an OD600 of 0.8, 1 mL of 1 M IPTG was added to induce toxin expression. After 3 h, cells were harvested by centrifugation at 8,000 rpm for 10 min; the pellets were retained and stored at −80℃.

3.1.6  Toxin purification

Toxin purifications were carried out on ice with pre-chilled buffers. The pellets obtained from enriched media culture were defrosted on ice and resuspended in 60 mL TSA buffer. The mixtures were centrifuged at 12,000 rpm for 20 min at 4℃, then the pellets were resuspended in 60 mL chilled water. 5.4 mL of 20 mM magnesium chloride was mixed into the suspension, prior to 10 min incubation followed by centrifugation at 12,000 rpm for 20 min at 4℃. 650 μL TNG buffer was added to the extracted supernatant. 6 mL Ni2+-equilibrated Ni-NTA Superflow resin was added to the solution, and mixed for 1 h prior to settling in an empty column. After four washes with 50 mL TNG containing 15 mM imidazole to remove unbound proteins, 20 mL TNG containing 250 mM imidazole was used to elute the fusion protein from the resin matrix. The eluate was concentrated down to 5 mL using an Amicon Ultra 3 kDa centrifugal filter, then diluted with 10–12 mL TNG buffer prior to reconcentration to 5 mL. The concentrate was then made up to 10 mL by the addition of redox buffer. 1 mg/mL TEV protease was added to the solution, before incubation for 12–13 h with shaking, in order to release the recombinant toxin from the MBP fusion protein. Following incubation, the solution was passed over the Ni-NTA Superflow matrix in order to capture the His6-MBP. The column was washed with TNG, and purified recombinant peptide toxin was collected in TNG buffer.

3.1.7  RP–HPLC

Reverse-phase high pressure liquid chromatography (RP-HPLC) was used to further purify the recombinant toxin. A Shimadzu Prominence HPLC system fitted with a Vydac analytical C18 RP-HPLC column (218TP54 series, 5 μm, 4.6 mm × 250 mm) was utilised for all purifications, with a Shimadzu Prominence SPD-20 UV/Vis detector operating at 214 nm and 280 nm used to monitor peptide elution. Trials with small amounts of peptide were used to determine optimal concentration gradients for purification. 0.1% TFA in water was used as solvent A, while solvent B comprised 0.09% TFA, 90% acetonitrile and 9.91% H2O. Toxins were eluted at a flow rate of 1 mL/min using the following gradient profiles: 5–25% solvent B over 10 min, 25–65% solvent B over 40 min, then 80% solvent B over 5 min for Sf1a; 5–18% solvent B over 5 min, 18–35% solvent B over 26 min, then 35–80% solvent B over 5 min for Ta1a. Toxins were purified to >98% homogeneity.

3.1.8 NMR and MALDI–TOF MS

Toxin samples were analysed using both NMR and mass spectrometry. 2D 1H-15N HSQC NMR spectra were acquired on a Bruker 900 MHz NMR spectrometer equipped with a cryogenic triple-resonance probe. Sf1a (54.4 μM at pH 5.0) and Ta1a (43.5 μM at pH 6.0) were used for NMR data collection. Matrix-assisted laser desorption/ionisation time-of-flight (MALDI-TOF) mass spectra were acquired using an Applied Biosystems 4700 Proteomics Bioanalyser with 5 mg/mL α-cyano-4-hydroxycinnamic acid in 1:1 acetonitrile/H2O as the matrix.


3.2  Results

Uniformly 15N-labelled U2-segestritoxin-Sf1a and U1-agatoxin-Ta1a were both successfully produced using the E. coli periplasmic expression system. An SDS-PAGE gel showing overexpression and initial purification of the toxins can be seen in Fig. 3.2. Comparison of the whole-cell extracts before IPTG induction (lane 1, labelled “WC” in black) and after IPTG induction (lane 2 for U2-segestritoxin-Sf1a and lane 3 for U1-agatoxin-Ta1a, labelled “WC” in red) reveals that an intense band at ~48 kDa, consistent with the expected size of the His6-MBP-toxin fusion protein, is only present after IPTG induction. Moreover, as expected, this protein is found predominantly in the periplasmic fraction (labelled “PE”), with only minimal amounts, if any, present in the cytoplasmic extract (labelled “CE”). It was concluded that the MBP-toxin fusion proteins have been highly overexpressed and are found almost exclusively in the periplasm.

Figure 3.2a
(a) SDS-PAGE gel: Sf1a test expressions
Figure 3.2b
(b) SDS-PAGE gel: Ta1a test expressions

Figure 3.2: SDS-PAGE gels showing test expressions for Sf1a and Ta1a. Each test expression was performed in duplicate, with both being run on the same gel. Lane markings in red text indicate samples induced with IPTG. WC, PE, and CE indicate whole-cell extract, periplasmic extract, and cytoplasmic extract, respectively, while L is the ‘ladder’ of protein molecular-weight standards. The masses for each standard are shown on the left of the gel. The bands highlighted with a box in the PE lanes are presumed to be the MBP-toxin fusion proteins.

Following nickel affinity chromatography and TEV cleavage of the fusion protein, the recombinant toxins were eluted from the Ni-NTA column and further purified using RP-HPLC. The resulting HPLC chromatograms (see Fig. 3.3) show that a single disulphide-bond isoform of each peptide was isolated (retention times of ~27 min and ~18 min for U2-segestritoxin-Sf1a and U1-agatoxin-Ta1a, respectively). The insets shown in Fig. 3.3  are MALDI-TOF mass spectra of the collected peaks and they indicate that the correct toxins were produced and isolated based on mass parity. The peaks eluting at ~40 min are presumed to be MBP.

Figure 3.3a
(a) HPLC chromatogram: Sf1a sample
Figure 3.3b
(b) HPLC chromatogram: Ta1a sample

Figure 3.3: Reverse-phase HPLC chromatograms showing purification of Sf1a and Ta1a. As shown in (a) and (b), Sf1a and Ta1a were found to elute at 38–40% and 22–23% solvent B, respectively. The inset in each chromatogram is a MALDI-TOF-MS spectrum of the peak taken to be the purified toxin. The experimentally determined masses for each toxin (5118.51 Da for Sf1a and 5836.73 Da for Ta1a) are identical, within experimental error, to their predicted monoisotopic, fully oxidised masses (5118.44 Da and 5836.99 Da), indicating that the correct toxins have been produced and purified.

The 2D 1H-15N HSQC spectra obtained for each of the recombinant toxins (Figs 3.4 and 3.5 for U2-segestritoxin-Sf1a and U1-agatoxin-Ta1a, respectively) indicate that the peptides have folded into well-ordered structures, based on the excellent chemical shift dispersion in both the 15N and 1H frequency dimensions. Moreover, each HSQC spectrum contains the expected number of backbone amide peaks based on the known amino acid sequence. The expected number of peaks is the total number of amino acid residues minus the number of prolines (which do not have a backbone amide proton and therefore do not give rise to correlations in 1H-15N HSQC spectra) minus the N-terminal residue (the amide group of this residue is not involved in a peptide bond, and hence its protons exchange rapidly with those of the solvent and are broadened beyond the level of detection).

Figure 3.4

Figure 3.4: HSQC spectrum of purified U2-segestritoxin-Sf1a. The peaks (numbered) are well dispersed, indicating the peptide is in an ordered, globular conformation. There is a peak for each HSQC-sensitive (i.e., non-proline) amino acid residue. The horizontal lines join pairs of sidechain amide protons from Asn/Gln residues. 1D projections of the 1H and 15N frequency dimensions are shown above and to the left of the 2D spectrum, respectively.

Figure 3.5

Figure 3.5: HSQC spectrum of purified U1-agatoxin-Ta1a. The peaks (numbered) are well dispersed, indicating the peptide is in an ordered, globular conformation. There is a peak for each HSQC-sensitive (i.e., non-proline) amino acid residue. The horizontal lines join pairs of sidechain amide protons from Asn/Gln residues. 1D projections of the 1H and 15N frequency dimensions are shown above and to the left of the 2D spectrum, respectively.

Chapter 4.  Speeding Up Heteronuclear NMR

4.1  Rationale

4.1.1  Spider Toxins and NMR

The two most prominent methods for protein structure determination are X-ray crystallography and Nuclear Magnetic Resonance (NMR) spectroscopy64. X-ray crystallography is viewed as a more mature technique26; 65; 66. However, unlike X-ray crystallography, NMR spectroscopy does not require complex crystallisations64; 66, can be used on proteins in solution46; 50; 64; 69, and is better suited to molecules with a molecular weight under ~30 kDa.

Peptide toxins from insect venoms are generally small, tightly conformed molecules less than 10 kDa in size. Most of the three-dimensional structures of toxins in the Protein Data Bank were obtained using NMR spectroscopy27. Spider toxins are no exception, with all but one published structure obtained via solution NMR (see Appendix A).

Structure determination with NMR spectroscopy uses resonance data to create a set of structural restraints. A nucleus with spin s and gyromagnetic ratio γ will possess a magnetic moment μ = γħs, where ħ is the reduced Planck constant. When placed inside a magnetic field B0, this moment will experience a torque and its angular momentum will precess about B0 at its Larmor frequency ω = γBnet 67; 68. For the same nuclear isotope, spins in different chemical environments will resonate at differing frequencies, due to locally induced magnetic fields contributing to Bnet at each nucleus's location70; 71. A detector coil on an axis normal to B0 can measure the relaxation of the net magnetisation vector M = ∑ μi. The contribution from each moment is a decaying sinusoid; the relaxation or free induction decay (FID) observed from the magnetisation vector is the superposition of these signals. For this reason NMR spectra are usually viewed in Fourier space, with unique moments and resonance effects viewed as separate signal peaks.
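As a rough numerical illustration of the Larmor relation, the sketch below converts a field strength into resonance frequencies ν = |γ|B/2π for the three nuclei observed in this work. The gyromagnetic ratios are approximate textbook values and the function names are illustrative assumptions; neither is taken from the thesis.

```python
import math

# Approximate gyromagnetic ratios in rad s^-1 T^-1 (textbook values, assumed here).
GAMMA = {"1H": 267.522e6, "13C": 67.283e6, "15N": -27.116e6}

def larmor_mhz(nucleus, b_field_tesla):
    """Larmor frequency |gamma| * B / (2*pi), returned in MHz."""
    return abs(GAMMA[nucleus]) * b_field_tesla / (2 * math.pi) / 1e6

def field_for_proton_mhz(proton_mhz):
    """Field strength (T) at which 1H resonates at the given frequency."""
    return 2 * math.pi * proton_mhz * 1e6 / GAMMA["1H"]

b0 = field_for_proton_mhz(900.0)  # field of a nominal "900 MHz" spectrometer
for nucleus in GAMMA:
    print(nucleus, round(larmor_mhz(nucleus, b0), 1), "MHz at", round(b0, 2), "T")
```

Small chemical-environment contributions to Bnet shift each of these frequencies by parts per million, which is what separates the peaks within a spectrum.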

4.1.2  Multidimensional NMR and Indirect Dimensions

For more complex molecules such as proteins, there is often too much signal overlap for a spectrum with only one frequency dimension, or a 1D spectrum, to be useful in isolation. Additional frequency dimensions may be added by introducing time delays (evolution times) between the excitation of nuclei and data acquisition. A set of n-dimensional data points can be obtained by using n different time series, corresponding to the evolution times for each dimension. In a two-dimensional case, a standard 1D dataset for the first dimension is acquired for each evolution time of the second dimension, resulting in a spectrum with two frequency dimensions in Fourier space. The first dimension is referred to as a direct dimension and the second as an indirect dimension, due to the methods by which data for each has been acquired8. Further indirect dimensions may be added by introducing another time series of evolution times, for all data points in lower dimensions41. This allows for heteronuclear experiments to be conducted, with resonances for separate nuclei being recorded in each dimension.

4.1.3  Time Acquisition Problem, Toxin Structures and NUS

The Nyquist theorem holds that if a spectrum contains a maximum frequency fmax, then that spectrum should be sampled at a frequency greater than or equal to twice fmax to avoid aliasing42. This results in a need to frequently collect data for long periods43. As samples must be taken at intervals set by 1/fmax across [0, tmax] for each indirect dimension, the temporal cost of increasing the dimensionality of an NMR experiment increases exponentially with each additional indirect dimension44. In addition, the accuracy with which a frequency component can be measured (i.e., the resolution of the spectrum) is related to tmax, the total length of the data record. The total number of points (N) required to achieve a given resolution whilst sampling at the Nyquist rate is N = tmax / Δt, where Δt = 1/(2 fmax) is the sampling interval (also referred to as the dwell time).


Both the Nyquist theorem and the need to obtain data over large time intervals apply for each indirect dimension in multidimensional NMR45. Consequently, the temporal cost of increasing the dimensionality of an NMR experiment increases exponentially with each additional indirect dimension44. Figure 4.1 shows a list of NMR experiments commonly used to collect through-bond restraint data for a peptide or protein (i.e., to assign all resonances in the structures), along with an estimate of the NMR time required for each experiment. All experiments are heteronuclear, and all but one require two indirect dimensions and well over a day of NMR time. A further two to three days would be required to collect through-space restraints with NOESY experiments. One can see that over two weeks of NMR data collection is required to determine the structure of just one spider toxin. There are potentially millions of spider toxins with undetermined structure (see Section 1.2). While it is predicted that spider toxins are structurally based upon a limited number of folds57, at the least several hundred experimentally determined toxin structures would be required to map the structural scaffold of the venome.

Experiment     t1    t2   R1   R2   scans   acqu. + rel. (s)   NMR time (days)
HSQC           512   –    2    –    8       1.1                0.1
CbCa(CO)NH     60    80   2    2    8       1.1                1.96
HNCaCb         60    80   2    2    16      1.1                3.91
HNCO           60    60   2    2    8       1.1                1.47
(H)CC(CO)NH    60    80   2    2    16      1.1                3.91
H(CC)(CO)NH    60    80   2    2    16      1.1                3.91

Figure 4.1: NMR time required to collect through-bond restraint data using FT-NMR and Nyquist sampling. All experiments listed are heteronuclear, with 1H resonances measured in the direct dimension; the first (HSQC) is 2D and collects 15N resonances in the first indirect dimension (t1), and the remainder are 3D with 15N resonances in the first indirect dimension (t1) and 13C resonances in the second indirect dimension (t2). The first four experiments are required to assign resonances in the peptide backbone, while the HCCCONH experiments are for assigning side chain resonances. Here, t1 and t2 are the number of samples in each dimension required by the Nyquist theorem, R indicates a multiplicative factor accounting for the collection of hypercomplex (as opposed to real) data in each dimension, ‘scans’ indicates the number of scans to be performed for the purpose of signal averaging, and ‘acqu. + rel.’ the acquisition and relaxation time required for each data point to be collected. The NMR time in the rightmost column is the product of the preceding columns, converted from seconds to days; the total estimated time to perform all experiments would be 15.26 days. (Note that NOESY data is also required to produce an accurate 3D structure.)
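The arithmetic behind Figure 4.1 can be checked with a short sketch. The values are those listed in the figure; the function and variable names are illustrative assumptions, and the linear scaling used for a non-uniformly sampled experiment assumes that acquisition time is simply proportional to the number of indirect-dimension points collected.

```python
SECONDS_PER_DAY = 86400
ACQ_PLUS_RELAX_S = 1.1  # acquisition + relaxation time per data point (s)

# (experiment, t1, t2, R1, R2, scans) as listed in Figure 4.1;
# a missing dimension is entered as 1 so that it drops out of the product.
EXPERIMENTS = [
    ("HSQC",        512,  1, 2, 1,  8),
    ("CbCa(CO)NH",   60, 80, 2, 2,  8),
    ("HNCaCb",       60, 80, 2, 2, 16),
    ("HNCO",         60, 60, 2, 2,  8),
    ("(H)CC(CO)NH",  60, 80, 2, 2, 16),
    ("H(CC)(CO)NH",  60, 80, 2, 2, 16),
]

def nmr_days(t1, t2, r1, r2, scans, per_point_s=ACQ_PLUS_RELAX_S):
    """Total experiment time in days for a fully (Nyquist) sampled grid."""
    return t1 * t2 * r1 * r2 * scans * per_point_s / SECONDS_PER_DAY

def nus_days(full_days, m, n):
    """Time for M of N indirect points, assuming time scales linearly with M."""
    return full_days * m / n

times = {name: nmr_days(*params) for name, *params in EXPERIMENTS}
total_days = sum(times.values())

for name, days in times.items():
    print(f"{name:12s} {days:5.2f} days")
print(f"{'Total':12s} {total_days:5.2f} days")
```

Under the same linear assumption, `nus_days(times["CbCa(CO)NH"], 480, 4800)` gives the time for a hypothetical schedule retaining one tenth of the 60×80 grid.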


Fortunately, conventional NMR experiments often result in highly redundant data sets being obtained. The data tend to be sparse, meaning that there are few signals dispersed in a large volume; it is not uncommon for several orders of magnitude more data to be collected than is strictly required for the desired information to be extracted46. This is due to the requirement of high resolution in all dimensions whilst conforming to the Nyquist condition, resulting in very large, densely sampled datasets. New processing techniques have been developed that do not require data from indirect dimensions to be obtained at the Nyquist interval, but can instead utilise spectral data sampled non-uniformly in each indirect dimension42; 45; 47; 48. The number of samples taken (M) is typically less than the total number prescribed by the Nyquist theorem (N) to achieve the desired resolution. Non-uniform sampling (NUS) allows sampling to be spread over a wide domain, but concentrated in regions that have high signal-to-noise or are more likely to contain unique data43. Non-uniform sampling thus allows a subset of the data to be sampled at appropriate locations, meaning enough data to reconstruct an NMR spectrum can be collected in a fraction of the experimental time taken for discrete Fourier transform (DFT) methods46-50.


There are several methods for reducing the temporal cost of heteronuclear NMR data collection using NUS. These include G-matrix Fourier Transform-NMR (GFT-NMR)44; Three-Way Decomposition (TWD), also known as multi-dimensional NMR spectra interpretation (MUNIN)46; 51; Back-Projection Reconstruction52; Maximum Likelihood Reconstruction53; and Maximum Entropy Reconstruction54.

4.1.4  Maximum Entropy Reconstruction (MaxEnt)

Perhaps the most prominent46 of the aforementioned methods, MaxEnt is a reconstruction method that uses a least-information approach to rebuild a spectrum from a subset of the data required to produce a conventional spectrum. It allows for accurate reconstructions with high sensitivity, and is able to produce usable spectra in situations where DFT produces spurious or unusable results49. The algorithm effectively uses entropy as a smoothing function48, reducing the relative size of noise and artifact peaks while preserving the sharpness of signal peaks48-50. A Lagrange multiplier is used to balance producing spectra with the maximum possible entropy against retaining fidelity with experimental data. A solution is obtained by maximising Q(f) in the following function:

$$Q(f) = S(f) - \lambda\, C(f), \qquad C(f) \le C_0$$

where f is the frequency set or spectrum being produced by the algorithm, λ is the Lagrange multiplier, S(f) is the entropy function and C(f) provides the experimental constraint41. Here, C(f) produces a χ2 statistic by taking f into the time domain and comparing it to the set of data d collected during the NMR experiment:

$$C(f) = \sum_{i=1}^{M} \left| \mathrm{IDFT}(f)_i - d_i \right|^2$$

where IDFT is used for the inverse DFT49. A global maximum exists, due to the convex nature of the entropy function41; 49. C0 is often referred to as aim, and λ spelt out as lambda, when discussing computational implementations of MaxEnt; that convention will be followed here.
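To make the roles of S(f), C(f) and lambda concrete, the following is a minimal one-dimensional sketch of how the objective could be evaluated for a trial spectrum. It is not the RNMRTK implementation: the entropy function shown is a simple stand-in (the entropy functional actually used for complex NMR data is described in the cited references), and all function names are illustrative assumptions. Only the evaluation of Q(f) is shown, not the optimisation.

```python
import cmath
import math

def idft(spectrum):
    """Inverse discrete Fourier transform of a 1D complex spectrum."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

def constraint(spectrum, schedule, data):
    """C(f): squared misfit between the mock FID and the M measured points."""
    mock_fid = idft(spectrum)
    return sum(abs(mock_fid[t] - d) ** 2 for t, d in zip(schedule, data))

def placeholder_entropy(spectrum, scale=1.0):
    """Stand-in entropy, -sum p*log(p) with p = |f|/scale (an assumption)."""
    total = 0.0
    for value in spectrum:
        p = abs(value) / scale
        if p > 0.0:
            total -= p * math.log(p)
    return total

def objective(spectrum, schedule, data, lam, entropy=placeholder_entropy):
    """Q(f) = S(f) - lambda * C(f)."""
    return entropy(spectrum) - lam * constraint(spectrum, schedule, data)

# Demo: a two-line spectrum, 'measured' at M = 4 of N = 8 time points.
true_spectrum = [0, 3, 0, 0, 0, 1, 0, 0]
schedule = [0, 1, 3, 6]
full_fid = idft(true_spectrum)
data = [full_fid[t] for t in schedule]
flat_spectrum = [0.5] * 8

print(constraint(true_spectrum, schedule, data))  # consistent with the data
print(constraint(flat_spectrum, schedule, data))  # smoother, but misfits the data
```

The featureless trial spectrum scores better on the stand-in entropy but violates the data constraint; lambda (or, equivalently, the bound C0) sets how that trade-off is resolved.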


MaxEnt is one of the oldest NUS techniques, and perhaps the best studied; thus the peculiarities and potential problems with this method appear to be the best understood49. For this reason, this project will use MaxEnt as the method for reconstructing spectra from NUS datasets.

(For more on the principles behind and algorithms for MaxEnt reconstruction, the following41; 49; 54 are recommended reading.)

4.1.5  NUS Pre-Selection Problem

While many of the techniques used for processing of non-uniformly sampled data differ greatly in terms of their underlying principles or implementations, it has been shown that similar reconstructions can be obtained with each if like NUS sampling schedules are employed48. This indicates that the sample size and distribution of sample points is more important than the technique selected. However there is presently no way in which to predict the minimum set of samples required to produce an accurate reconstruction, with any of the NUS-NMR techniques55.

4.2  Finding a Minimal M

Non-uniform sampling looks to be a promising way to speed up NMR studies for determining the three-dimensional structures of spider toxins. With reductions in data collection time directly related to the number of samples discarded from the full Nyquist grid in the indirect dimensions, the exact temporal gain is dependent on the sampling schedules used. NUS-NMR is already being employed for some NMR experiments on arachnid toxins. However, the values of M are deliberately selected to be far larger than required, as the limitations of the method are unknown and the manner in which the method decays is also poorly understood72.

In order to harness the full potential of NUS-NMR for high throughput NMR studies of spider toxins, a solution for the pre-selection problem is required. This would allow the smallest possible dataset that still results in a suitable NMR spectrum to be collected, giving the greatest possible temporal saving. In the absence of any reliable model for predicting this sample set, a comprehensive empirical study on the effects of varying the sampling schedule of a NUS-NMR experiment was performed. MaxEnt reconstructions of data from the same NMR experiment were performed with varying sample sizes and distributions, followed by analysis of all the resulting spectra. NUS-NMR data collection was simulated by taking a full dataset and discarding data not in a specified sampling schedule. It was hoped a means would be found of either predicting the minimal sample set prior to the NMR experiment, or (by monitoring data as it is collected) determining when sufficient samples have been obtained. Remarkably, no such study has thus far been published despite more than two decades of research and speculation on NUS-NMR.

4.2.1  Previous work: HSQC

A trial study had previously been undertaken using data from a two-dimensional HSQC experiment72. An unpublished 41-residue toxin from an Australian arachnid, currently designated P1, was used as the test subject (see Figure 4.2 for sequence). A two-dimensional 1H-15N HSQC experiment, with one indirect dimension containing 400 points (fully sampled), was collected with a Bruker 900 MHz ARX spectrometer. Sampling schedules with M sample points on the Nyquist grid were created for selected M, 0 < M ≤ 400. For each schedule, mock NUS-NMR experiments were conducted by ‘sampling’ from the full dataset. NUS data sets were reconstructed using MaxEnt as implemented in the Rowland NMR Toolkit (RNMRTK)73, and the number of HSQC signal peaks retained for each M was determined. The signal peaks from these spectra were then compared to those present in a conventional (DFT) spectrum produced from the full dataset. It was shown that all expected signal peaks (viz., all signal peaks in the full DFT spectrum) were present in reconstructed spectra, except where M was below a threshold value (see Figure 4.2). This value was ~40 sample points, or 10% of the Nyquist grid.


Similar results could be expected for HSQC experiments on other spider toxins, due to the nature of the HSQC experiment. HSQC spectra of proteins result in a predictable number of peaks within a well-defined area. One peak is observed per amino acid residue other than proline, with additional resonances for amide side chains (see Figures 3.4 and 3.5). The co-ordinate of each peak is within a well-known chemical shift range, dependent on amino acid residue type. Substituting P1 for another spider toxin of like size would affect the co-ordinates of signal peaks, but not significantly alter other properties of the spectrum. Thus the results of this test case could be deemed representative for all 15N-1H HSQC spectra with this peptide size (at the measured sample concentration), in that only ~40 data points are required to reconstruct a satisfactory spectrum. As a precaution against small variations from this minimum (due either to a physical property of a peptide, or a random factor such as noise) a minimum sample size of M=80 is recommended for HSQC experiments on toxins containing a similar number of residues. This cautious doubling of the ‘threshold’ would still result in a temporal saving of ca. 2 h, cutting NMR time by 80%.

(a) Figure 4.2a
(b) Figure 4.2b

Figure 4.2: Results from previous systematic study on the effects of changing non-uniform sample size (M). Reconstructions were performed for various M, with the maximum being 400 (the full number of Nyquist regime samples available). In (a), contour plots of spectra reconstructed at M=10, M=40 and M=400 can be seen, while (b) displays the number of signal peaks from the DFT spectrum found to have a matching signal peak in the reconstructed spectrum.

From (b) one can see that M≳40 was required for all signal peaks to be reconstructed; below this threshold value, the number of identified peaks drops sharply. The contour plots in (a) show how peak content is virtually unchanged between M=40 and M=400, and demonstrate the loss of signal peaks and dominance of artifacts for the M=10 spectrum.


4.2.2  Current Work: CbCa(CO)NH

While beneficial, the results from the above HSQC analysis cannot be directly applied to any of the other experiments required for resonance assignment (see Figure 4.1). Firstly the complexity, dimensionality and quality of spectra from the other experiments all differ from that of HSQC spectra, such that the temporal requirements are vastly different. However, it was predicted that a similar behaviour would be observed in terms of number of peaks found before and after a threshold M. Also, while data for the above experiment explored many different sample sizes, it did not investigate the distribution of sample points for each particular size; for more complicated spectra, the location as well as number of points may be key to finding a minimum sample set.

The work described in the remainder of this chapter is on data collected with a 3D CbCa(CO)NH NMR experiment. The CbCa(CO)NH spectrum is vital for the sequential assignment of backbone resonances, which in turn is required for the protein structure determination process (see Figure 4.1). The experiment requires the collection of data in two indirect dimensions, one for 13C shifts and another for 15N (with 1H frequencies in the direct dimension). NUS can be employed in both the 13C and 15N dimensions, potentially allowing for a far greater time saving than a two-dimensional experiment such as HSQC (see sections 4.1.2 and 4.1.3 above).

In many NMR experiments, the signal-to-noise ratio (S/N) is higher for some data points than others76, which complicates the question of which samples to acquire and whether to bias NUS towards certain regions. However, this experiment employs a data acquisition mode known as constant time acquisition, where the total length of the decay period in the indirect dimensions is kept constant. Theoretically, this would result in a constant S/N value for all data points, allowing another unknown to be eliminated.

4.2.3  Determining Spectral Quality

4.2.3.1  Automatic Peak-Picking and Verification

The standard methods for evaluating the quality of reconstructed spectra are all based upon peak-picking. Peak-picking involves the identification of signal or real peaks within a spectrum, based upon the line shape, intensity or chemical shift co-ordinates of spectral features. In the above experiment, quality was measured in terms of the number of signal peaks identified. The number and co-ordinates of peaks in the spectrum were extracted using an automated peak-picking tool. Then, the number of peaks retained in relation to a reference peak list (extracted using the full dataset) was used as an indication of spectral quality.

A similar approach was used with the CbCa(CO)NH analysis to reveal the quality of reconstructed spectra in terms of how much signal information was retained. The automated peak-picker employed, PEAKY, is one that operates within the software platform used (Rowland NMR Toolkit). The program performs lineshape analysis to reject noise spikes or baseline errors. However, this approach may result in rejection of real peaks due to failures in the fitting procedure, where peaks are assumed to have a mixed Lorentz-Gaussian shape. While one would expect signal peaks to theoretically conform to this behaviour41, this is not always borne out in experiment, as a result of deviations from homogeneity in the reference field B0 due to inadequate shimming or choice of RF pulses, and resolution enhancements75.

No automatic peak-picker has thus far been proven to select all signal peaks without false positives or negatives, and the human eye is still seen by some as the best way of identifying peaks in spectra. However, manually finding peaks in all of the spectra created during the course of this work is not remotely feasible. With the use of a reference list, spurious peaks can be filtered out, and manual checking can be used to uncover consistent false negatives.
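The verification step against a reference list can be sketched as follows. This is an illustration only, not the procedure implemented in PEAKY or RNMRTK: the greedy first-match rule, the function name, and the tolerance and peak co-ordinate values in the demonstration are all hypothetical.

```python
def match_peaks(picked, reference, tolerances):
    """Split picked peaks into signals (near a reference peak) and artifacts.

    picked, reference: lists of co-ordinate tuples (e.g. 1H, 15N, 13C shifts in ppm).
    tolerances: per-dimension matching windows; the values used are assumptions.
    Returns (signals, artifacts, missed reference peaks).
    """
    unmatched_refs = list(reference)
    signals, artifacts = [], []
    for peak in picked:
        hit = next(
            (ref for ref in unmatched_refs
             if all(abs(p - r) <= tol for p, r, tol in zip(peak, ref, tolerances))),
            None,
        )
        if hit is None:
            artifacts.append(peak)
        else:
            signals.append(peak)
            unmatched_refs.remove(hit)  # each reference peak is matched at most once
    return signals, artifacts, unmatched_refs

# Hypothetical peak lists, for illustration only.
reference = [(8.20, 120.5, 55.1), (7.95, 118.2, 30.4)]
picked = [(8.21, 120.4, 55.0), (6.50, 110.0, 40.0)]
signals, artifacts, missed = match_peaks(picked, reference, tolerances=(0.03, 0.3, 0.3))
print(len(signals), "retained,", len(artifacts), "spurious,", len(missed), "missed")
```

The count of retained peaks corresponds to the information-content measure described above, while the list of missed reference peaks is the natural starting point for manual checking of consistent false negatives.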

4.2.3.2  Quality factor

While peak-picking provides a useful measure of information content, it does not provide a measure of the inherent quality of a spectrum. A spectrum may contain most or all desired signal peaks, yet still be useless if one cannot distinguish them from noise or artifact peaks. Currently, there is no quantitative definition of spectral quality in the literature. For the purposes of this analysis, a quality factor (Qf) was defined to fit this role:

$$Q_f = \frac{\sum (\text{ten smallest signal amps})}{\sum (\text{ten smallest signal amps}) + \sum (\text{ten largest artifact amps})}$$

where amp is an abbreviation for amplitude. If either the artifact or signal count is below 10 for a given spectrum, zero-intensity peaks of the appropriate type and quantity are introduced to make up the count. Qf is intended to quantify the relative dominance of signal peaks over artifact peaks when comparing two different spectra from similar experiments. As can be seen from the definition, Qf tends towards unity as signals become more distinguishable from artifacts and background noise. If Qf ~ 0.5, then there exist signal peaks which cannot be distinguished from artifacts.

For the purposes of calculating Qf, artifacts were deemed to be any peaks identified by the peak-picker but not present in the reference list. The rationale behind this is that any artifact not mistakenly classified as a signal after lineshape analysis could no longer be considered a spurious peak. The remaining artifacts would be identifiable as such, and thus not considered to have an impact on spectral quality.
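The definition of Qf, including the zero-padding rule, can be expressed as a short sketch. The function name is an illustrative assumption, amplitudes are assumed to be supplied as non-negative magnitudes, and returning zero when no peaks of either kind are present is a choice made here for the undefined 0/0 case rather than something specified in the text.

```python
def quality_factor(signal_amps, artifact_amps, n=10):
    """Q_f = sum(n smallest signal amps) / (that sum + sum(n largest artifact amps))."""
    # Pad each list with zero-intensity peaks if it holds fewer than n entries.
    signals = list(signal_amps) + [0.0] * max(0, n - len(signal_amps))
    artifacts = list(artifact_amps) + [0.0] * max(0, n - len(artifact_amps))

    weakest_signals = sum(sorted(signals)[:n])
    strongest_artifacts = sum(sorted(artifacts, reverse=True)[:n])

    denominator = weakest_signals + strongest_artifacts
    if denominator == 0:
        return 0.0  # assumption: an empty spectrum is assigned the lowest quality
    return weakest_signals / denominator

print(quality_factor([5.0] * 12, []))         # no artifacts picked
print(quality_factor([2.0] * 10, [2.0] * 10)) # weakest signals match strongest artifacts
print(quality_factor([4.0] * 5, [1.0] * 10))  # fewer than ten signals: zeros are padded in
```

Because missing signals are padded with zeros, a spectrum that has lost signal peaks is penalised even when the surviving peaks are strong.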


4.2.4  Monitoring MaxEnt parameters

Although these are good measures of spectral quality, they are not ideal if one wishes to analyse and determine the quality of a spectrum as it is being acquired. Techniques such as MaxEnt are compatible with incremental acquisition74, and one could theoretically peak-pick spectra created ‘on the flyʼ, but without an existing reference list the above measures cannot be used. Any alternative measure of quality that is based upon peak counts would instead require an a priori method of verifying peaks. However, the end-state values of MaxEnt parameters used to arrive at a reconstructed spectrum could provide an alternate real-time measure of quality. It was hypothesised that since entropy is a measure of information content, it may be a suitable and independent variable to monitor as a guide for when to terminate the experiment.

It was thought that the investigation of other reconstruction parameters might also reveal information on spectral quality. The MaxEnt algorithm operates in one of two different modes: one in which aim is held constant during optimisation, allowing lambda to vary freely, and another where lambda is constrained and aim may vary. It has been proposed that the constant lambda mode results in better continuity in incrementally acquired spectra61. Performing reconstructions in both of these modes would allow one to monitor the values of lambda (with constant aim) and aim (with constant lambda) determined by the algorithm. As an alternative to manually specifying a value for either parameter, an approach (hereafter referred to as auto mode) has been described in which a value for lambda is set automatically through inspection of the experimental data62.

It was decided that the values of the entropy (S), aim and lambda parameters would be monitored, to assess their suitability as monitors of spectral quality. It was hoped that a ‘quality critical pointʼ, or other indication that sufficient samples had been collected to produce a quality spectrum, would be revealed.

4.2.5  Experiment

The study was designed in a similar manner to that used in the 2D case for the HSQC experiment, in that a full 60×80 (N=4800) dataset was acquired from which subsets of size M were extracted. The variability in the absolute distribution of M was monitored by generating sampling schedules using different random seeds (4 seeds used). Values of M were taken at intervals of 50 from 4800 to 50 (resulting in 96 values of M). Spectra were then reconstructed from these datasets via MaxEnt, with three sets of initial parameters. The output parameters (aim, lambda and entropy (S)) were monitored for all datasets. In addition, the spectra were peak-picked and verified, and an inherent quality factor value (Qf) assigned to each spectrum. A schematic of the analysis system can be seen in Figure 4.3, and each step of the process is detailed below.

Figure 4.3

Figure 4.3: Workflow diagram for generation and analysis of P1 CbCa(CO)NH spectra.

4.2.5.1  Input Dataset (data)

P1 was again used as a study toxin. A Nyquist-sampled CbCa(CO)NH dataset was collected with a Bruker 900 MHz spectrometer. 1200 points were collected in the 1H (direct) dimension, 60 in the 15N dimension and 80 in the 13C dimension. The maximum number of sample points in the indirect dimensions, N, is 60×80=4800 (all points are always collected in the direct dimension).

4.2.5.2  Random seed (RND seed)

The program sched3d (described here77) was selected as the means of generating sampling schedules. The program outputs a schedule containing a user-specified number of samples. The sample co-ordinates chosen are selected with the use of a pseudo-random number generator (PRNG), hence, changing the input seed (s) alters the spread of the output sample set. Further, this means that for a particular seed, a sampling schedule for M points would contain all samples specified in a schedule with M-1 points, allowing one to simulate incremental acquisition. Four three-digit seeds were obtained, in order to evaluate the effect of sample distribution. The seeds were themselves provided by an entropy-seeded PRNG (/dev/random from a Linux kernel).

4.2.5.3  Generating sampling schedules (NUS)

Non-uniform sampling schedules were produced using each of the random seed values. The samples were all on points specified by the 60×80 Nyquist grid. Schedules were created for each M = 50x, 1 ≤ x ≤ 96.
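The nesting of schedules generated from a single seed can be illustrated with a minimal sketch. It does not reproduce the sched3d algorithm; it only assumes that a seeded PRNG fixes one ordering of the 60×80 grid, and that a schedule of M points is the first M entries of that ordering. The function name nested_schedule is hypothetical.

```python
import random

N15_POINTS = 60   # increments in the 15N dimension
C13_POINTS = 80   # increments in the 13C dimension


def nested_schedule(seed, m):
    """Return the first m grid points of a seed-specific random ordering.

    Assumption: this is NOT the sched3d algorithm, only an illustration of
    how one seed can yield schedules that grow by appending points.
    """
    grid = [(i, j) for i in range(N15_POINTS) for j in range(C13_POINTS)]
    rng = random.Random(seed)      # same seed gives the same ordering
    rng.shuffle(grid)
    return grid[:m]


# Schedules for M = 50x, 1 <= x <= 96, for one three-digit seed.
sizes = [50 * x for x in range(1, 97)]
schedules = {m: nested_schedule(123, m) for m in sizes}
```

Under this assumption, every point in the M = 50 schedule also appears in the M = 100 schedule for the same seed, which is the property that allows incremental acquisition to be simulated.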

4.2.5.4  Setting of MaxEnt parameters (MaxEnt)

The MaxEnt algorithm (see section 4.1.4 of this chapter) requires either aim or lambda to be user-specified. The aim parameter is related to the noise in the experimental data, and lambda determines the weighting between the entropy maximisation constraint and the requirement for the reconstructed spectrum to match the experimentally-collected data. Three sets of reconstructions were performed for each NUS schedule: one where aim was fixed to 100.0, one where lambda was kept constant at 0.5, and a third using auto mode. The auto mode is convenient, as it removes the requirement to carefully set a value of aim or lambda. The value 100.0 was taken from the noise level in the collected dataset (determined by inspection). A value of 0.5 was used for constant lambda experiments, as this was found to be the average value of lambda achieved during constant aim reconstructions.

4.2.5.5  Spectral Reconstructions (spectra)

Reconstructed spectra were obtained for each random seed, sample size and MaxEnt setting, resulting in 1152 different spectra. The Rowland NMR Toolkit (RNMRTK)73 implementation of the MaxEnt algorithm was used for all reconstructions. While many spectra were examined by eye, it was readily apparent that manual inspection of all spectra and collected information could not be done in a timely and reliable manner. A set of automated tests was therefore conducted to examine the quality of the reconstructed spectra (quality), and to search for any trends in MaxEnt parameters for the reconstructions (param).
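The count of 1152 reconstructions follows from the three factors varied in the study. A short sketch of the enumeration is given below; the seed labels are placeholders, as the actual seed values are not all listed in the text.

```python
from itertools import product

seeds = ["s1", "s2", "s3", "s4"]        # placeholders for the four seeds
sample_sizes = range(50, 4801, 50)      # M = 50, 100, ..., 4800 (96 values)
modes = ["constant aim", "constant lambda", "auto"]

# One reconstruction per (seed, sample size, MaxEnt mode) combination.
jobs = list(product(seeds, sample_sizes, modes))
print(len(jobs))  # 4 * 96 * 3 = 1152
```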

4.2.6  Analysis of Spectral Quality

The quality of all spectra generated was assessed using the two aforementioned measures: counts of identified signal peaks, and the quality factor Qf. The former was seen to be a good measure of how much sought or required information (viz., signal peaks) was reconstructed by MaxEnt from the NUS data. Qf was seen as an effective way of evaluating the quality inherent in a spectrum, independent of the presence or absence of any expected information. Plots for spectra reconstructed in both constant aim and constant lambda mode can be seen in Figure 4.4. Similar results were also obtained for auto mode, though they were omitted from the figure to avoid redundancy.

4.2.6.1  Information Quality (Peak Counting)

A peak-picker was used to create lists of real peaks in reconstructed spectra. Each list was then compared with a standard peaklist, obtained from a conventional (DFT) CbCa(CO)NH spectrum produced with this dataset. Tolerances of ± 0.28 ppm for 1H, ± 0.55 ppm for 15N and ± 1.2 ppm for 13C chemical shift values were used to determine matches, with each tolerance being equal to five times the step size in its respective dimension. The greater the number of standard peaks identified as real peaks in a reconstructed spectrum, the greater the information quality of that spectrum was perceived to be.
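The tolerance-based comparison can be sketched as follows. This is an illustration only, not the matching code used in the study: it assumes peaks are stored as (1H, 15N, 13C) chemical shift tuples, that each picked peak may match at most one standard peak, and that a simple first-come matching is adequate. The function name and the shift values in the usage lines are invented.

```python
# Matching tolerances in ppm, as stated in the text: (1H, 15N, 13C).
TOLERANCES = (0.28, 0.55, 1.2)


def count_matched_peaks(picked, reference, tolerances=TOLERANCES):
    """Count reference peaks that have a picked peak within tolerance.

    Each peak is a (1H, 15N, 13C) tuple of chemical shifts in ppm.
    Assumption: a picked peak may match at most one reference peak,
    and the first picked peak within tolerance is taken as the match.
    """
    unused = list(picked)
    matched = 0
    for ref in reference:
        for peak in unused:
            if all(abs(p - r) <= t for p, r, t in zip(peak, ref, tolerances)):
                matched += 1
                unused.remove(peak)   # do not reuse this picked peak
                break
    return matched


# Invented shift values, for illustration only.
reference = [(8.20, 120.0, 55.0), (7.90, 115.5, 30.2)]
picked = [(8.25, 120.3, 55.9), (9.50, 110.0, 40.0)]
print(count_matched_peaks(picked, reference))  # prints 1
```

A first-come matching of this kind can miscount when peaks are crowded within the tolerance window; an optimal assignment would be a more careful choice in that case.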

In a manner similar to that shown for HSQC experiments, the number of signal peaks reconstructed remained fairly constant for spectra recreated with more than a threshold number of data points. Below this value, the number of signal peaks found tended sharply to zero with decreasing M. This threshold value (hereafter referred to as M0) appears to be around M=150, varying by 50 depending on the random seed used. For these analyses, the range of available sample sizes was covered in increments of 50 samples; if a smaller step size Mi was used (such as Mi=1) this variance could potentially have been found to be smaller.
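One possible way to estimate the threshold M0 from peak counts is sketched below. The definition used here is an assumption and is not taken from the study: the plateau is taken as the maximum count observed, and M0 is the smallest M such that all larger sample sizes retain at least a chosen fraction of the plateau. The counts in the usage lines are invented.

```python
def estimate_threshold(counts, fraction=0.95):
    """Return the smallest M above which the peak count stays near the plateau.

    `counts` maps sample size M to the number of matched signal peaks.
    Assumptions: the plateau is the maximum count observed, and the 0.95
    cut-off is an arbitrary illustrative choice.
    """
    plateau = max(counts.values())
    threshold = None
    # Walk down from the largest M until the count drops below the cut-off.
    for m in sorted(counts, reverse=True):
        if counts[m] < fraction * plateau:
            break
        threshold = m
    return threshold


# Invented counts, for illustration only.
counts = {50: 3, 100: 41, 150: 68, 200: 69, 250: 69, 300: 69}
print(estimate_threshold(counts))  # prints 150
```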

Examining the amino acid sequence of P1, one will notice that the maximum number of peaks identified is lower than the 71 one might expect. In fact, spectra with M greater than 400 were selected randomly and manually checked for peaks; there appear to be two weak signal peaks present in several reconstructions, but consistently not identified by the peak-picker. An exception to this occurs for reconstructions using s=123 when M is close to 800, where the peak is identified as such by the peak-picker.

Figure 4.4a
(a) No. real peaks and Qf, constant aim
Figure 4.4b
(b) No. real peaks and Qf, constant lambda

Figure 4.4: Reconstruction fidelity (# r.p.) and inherent spectral quality (Qf) for (a) constant aim and (b) constant lambda. For each, information quality sharply rises with increasing sample size, then plateaus after a threshold value ~ M=150. Spectral quality also increases sharply up to this threshold, but then decays as information on weaker peaks and sampling artifacts are reconstructed.

The above results indicate that all signal information present in the fully-sampled dataset is retained when M=200, which is ~4% of the full dataset. As the duration of an NMR experiment is directly proportional to M, this is quite promising. The CbCa(CO)NH dataset analysed for this experiment took ca. 2 days to collect, and is identical to the full dataset that would be used to produce a conventional DFT spectrum. However, a spectrum with the same signal peaks could have been collected in ca. 2 h with the use of NUS and MaxEnt.
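The arithmetic behind these figures is shown in the short sketch below. It assumes that acquisition time scales linearly with the number of sampled points, and takes "ca. 2 days" to mean 48 h; the function name is hypothetical.

```python
def nus_summary(m, n_total=4800, full_time_h=48.0):
    """Return (fraction sampled, estimated acquisition time in hours).

    Assumes acquisition time is directly proportional to the number of
    sampled points and that the full dataset took 48 h ("ca. 2 days").
    """
    fraction = m / n_total
    return fraction, fraction * full_time_h


fraction, hours = nus_summary(200)
print(f"{fraction:.1%} of the grid, about {hours:.1f} h")
# prints: 4.2% of the grid, about 2.0 h
```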

4.2.6.2  Inherent Spectral Quality (Qf)

Informational quality is the single most important measure of quality, as it indicates how much of the information being sought (i.e., identifiable real peaks) is present in a reconstructed spectrum. However, a spectrum with all peaks present may still be unsuitable due to other problems with the spectrum, such as the presence of artifacts resembling real peaks. For this reason, Qf was examined in the context of its relationship with the count of identifiable real peaks.

From the perspective of increasing M, the inherent spectral quality was seen to spike at the same threshold M (on a random seed basis) where information quality begins to decrease rapidly. After this, Qf fluctuated about an exponential decay line whose asymptote was well above Qf = 0.5. This may appear to contradict the information content definition of quality, but in fact complements it. Spectra begin to lose information and deviate from the DFT spectrum with decreasing M. When rebuilding spectra with less information, entropy is being effectively used as a smoothing function to suppress artifacts while preserving signal peaks (cf. section 4.1 of this chapter). Further, as weaker signals begin disappearing, the average signal amplitude becomes relatively higher than the noise level, resulting in a better quality spectrum. Potentially, weak signals might even be ‘traded’ for higher inherent quality when selecting sample sets. As Qf was shown to increase rather than decrease around M0, it appears that usable spectra can be obtained from datasets of this size.

4.2.6.3  Noise and Spectral Quality

As a ‘reality check’ for both of the measures of quality discussed above, the change in noise and artifact levels relative to the amplitude of select signal peaks was investigated. The weakest peak in the spectrum was selected, along with a strong peak from a highly intense plane. 1- and 2-dimensional slices of the spectra about these peaks were qualitatively examined, by means of including them as frames in an animation. Both the strong and weak peaks are located next to more intense neighbour peaks, which were used as references; each plot was scaled by the intensity of the respective reference peak.

In both cases, the noise level remained well below the amplitude of the signal peaks for the densest three quarters of the sample space (25%-100% sampled). After this point the weak peak began to noticeably deteriorate with decreasing M, until near M=50 it was no longer distinguishable as a signal peak. As this peak was never located by the peak-picker, this loss would not have been shown as a decrease in inherent spectral or information quality in Figure 4.4. Conversely the strong peak was always classified as a peak by the peak-picker, and always distinguishable as a signal peak by remaining well above the noise level. In both cases the baselines were relatively smooth in the first two panels, but became more jagged due to noise in the last panel. These qualitative checks further confirm the general comments made about spectral quality, and in the case of the weak peak provide a further caution about the reliance on automatic peak-picking for measuring spectral quality.

4.2.7  MaxEnt parameters

The values of reconstruction parameters were found to vary with NUS sample size and reconstruction inputs, though not with random seed (sample spread). As spectral reconstructions are carried out on a plane-by-plane basis (13C-15N planes), the mean value over all planes is used for each reconstruction. The constant aim and lambda reconstructions both yielded similar results (see Figure 4.5). The error bars displayed on the plots are an unbiased estimate of the standard deviation of the parameter over all planes; it can be seen from the tightness of the error ranges that use of the mean is suitable.
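The per-reconstruction summary can be sketched as below. It assumes that the "unbiased estimate" refers to the usual sample standard deviation with an n - 1 denominator, which is what statistics.stdev computes. The per-plane values are invented.

```python
import statistics


def summarise_planes(values):
    """Return (mean, sample standard deviation) of a parameter over all planes."""
    # statistics.stdev divides by n - 1 (sample standard deviation).
    return statistics.mean(values), statistics.stdev(values)


# Invented per-plane lambda values, for illustration only.
plane_lambdas = [0.48, 0.52, 0.50, 0.49, 0.51]
mean_lambda, sd_lambda = summarise_planes(plane_lambdas)
```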

a) Figure 4.5a
b) Figure 4.5b
c) Figure 4.5c

Figure 4.5: Analysis of MaxEnt reconstruction parameters, with (a) the change in entropy with M for constant aim and lambda experiments, together with (b) change in lambda in the constant aim experiment and (c) change in aim in the constant lambda experiment. Results for all random seeds are displayed. The latter two plots are best modelled as power functions, while the trends in (a) are close to linear (R² > 0.987).

As can be seen from the plot, entropy has a clear relationship with M for both reconstruction modes. Unfortunately these pseudo-linear monotonic decreases do not appear to reveal any information which may assist with the NUS pre-selection problem. The R² values for the linear fits were greater than 0.987 in both cases, making it unlikely that S reveals any such information. Both the values of aim for constant lambda mode and lambda for constant aim mode relate to M as power decay functions with very high R² values. As the plots lack any distinguishing points, the values of aim and lambda gave no indication that monitoring MaxEnt parameters would mark any sample sets as significantly different from any others.
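The two kinds of fit discussed here can be sketched with ordinary least squares. This is an illustration under stated assumptions, not the fitting procedure used in the study: the power function is fitted by a straight line in log-log space, so the R² it reports is that of the log-log fit, and the data in the usage lines are synthetic.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


def power_fit(xs, ys):
    """Fit y = c * x**k via a linear fit in log-log space.

    Returns (c, k, r_squared), where r_squared refers to the log-log fit.
    Requires strictly positive x and y values.
    """
    log_c, k, r2 = linear_fit([math.log(x) for x in xs],
                              [math.log(y) for y in ys])
    return math.exp(log_c), k, r2


# Synthetic data only: an exact power decay y = 500 * M**-0.8.
ms = list(range(50, 4801, 50))
c, k, r2 = power_fit(ms, [500.0 * m ** -0.8 for m in ms])
```

A high R² in either fit indicates a smooth trend without distinguishing features, which is why such fits offer little help in marking particular sample sizes as special.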

The auto results were found to be largely unsuitable for these analyses. As (effectively) a constant lambda value is calculated and set for each individual reconstruction, the results were not self-consistent across values of M, with local trends appearing only where lambda was stable over several consecutive M. For these reasons, no plot of results for auto mode is included here.

It was hoped that analysis of reconstruction parameters would somehow reveal that a spectrum of suitable quality or information content had been collected, in a manner suitable for on-the-fly collection of NMR data. While analysis of the values of S, aim and lambda achieved by the algorithm at various sample sizes did not reveal any useful quality indicators in isolation, it is hoped that comparisons with informational and inherent spectral quality may yield some useful correlations. Work on relating MaxEnt parameter values to the quality measures discussed above is still in its infancy. While initial results indicate possible connections (see Figure 4.6), further research is required to draw definite conclusions. It is even possible that threshold values of the entropy, aim or lambda, as opposed to trends, could be used as indicators.

Figure 4.6

Figure 4.6: Plot of the first derivative of lambda and the number of real peaks identified vs M for a constant aim experiment. While no definite link between reconstruction parameter values and spectral quality has been found, further research is definitely warranted. This plot suggests a potential yet unproven link between lambda and the number of signal peaks in a reconstruction.
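A first derivative of lambda with respect to M can be approximated from the sampled values by finite differences. The sketch below is an assumption about how such a curve might be computed, since the method used for the figure is not stated; it uses central differences in the interior and one-sided differences at the ends. The lambda values are invented.

```python
def first_derivative(ms, values):
    """Approximate d(values)/dM by central differences (one-sided at the ends)."""
    n = len(ms)
    if n < 2:
        raise ValueError("need at least two points")
    deriv = []
    for i in range(n):
        lo = max(i - 1, 0)          # previous index, clamped at the start
        hi = min(i + 1, n - 1)      # next index, clamped at the end
        deriv.append((values[hi] - values[lo]) / (ms[hi] - ms[lo]))
    return deriv


# Invented lambda values at M = 50, 100, 150, 200, for illustration only.
ms = [50, 100, 150, 200]
lambdas = [4.0, 2.0, 1.0, 0.5]
dlambda_dm = first_derivative(ms, lambdas)
```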

Further study of MaxEnt parameters as a means of monitoring quality is recommended. The constant lambda and constant aim modes of the MaxEnt algorithm would be better suited for these purposes, due to their self-consistency across sample sizes. However, if an isolated reconstruction is to be performed, as would be the case if a method or empirical rule is found for predetermining which and how many samples to take, there is no reason not to use auto mode and save having to manually input reconstruction parameters. Indeed, there is no guarantee that the constant aim and lambda values chosen for this experiment are ideal; further empirical research would be required to test their suitability.


Chapter 5.  Conclusions

With the production of ArachnoServer, a database of structural venomics-related information on spider toxins (see Chapter 2), an expression system for the production of natively-conformed, isotopically labelled toxins (see Chapter 3) and initial findings into methods of speeding up heteronuclear NMR studies of spider toxins (see Chapter 4), the groundwork has been laid for a pipeline to speed up structural venomics studies of spider toxins. Findings from all stages of the venome mapping process can be deposited in the database as they are collected, and used as the basis for further structural venomics studies. The bacterial expression system can be used to produce suitable toxins at the required concentrations for NMR experiments. Also, a system has been developed for analysing the effects of NUS on structure-determination experiments. Initial results indicate that two NMR experiments may now be performed in a fraction of the time they would normally require, and an analysis system has been developed which could investigate whether similar savings may be achieved with other experiments.

5.1  ArachnoServer

ArachnoServer was created as a repository for spider toxin information suitable for structural venomics research. The database is already in use by research groups that were involved with or consulted during its production. ArachnoServer was also used to identify viable insecticide candidates for the purpose of testing the bacterial expression system (see Chapter 3).

Some improvements could be made for a future release of the database. While the web application facilitates the use of a sequence alignment tool (BLAST), the inclusion of online structural alignment tools would also be beneficial. The low number of known spider toxin structures means that there is presently no great demand for structural comparisons of spider toxins. However, as it is expected that use of the pipeline could increase the number of solved structures by a significant amount, such a feature could become increasingly useful over time. Further, the ability to store experimental details such as NUS sampling schedules, pulse sequences or even raw NMR datasets would be of use, and could assist with acquiring NMR data for similar toxins.

5.2  Bacterial Expression System

With the development of the new expression system, one of the greatest obstacles to rapid mapping of the spider venome may now have been removed. It was shown in Chapter 3 that the expression system could successfully produce conformed, isotopically labelled toxins in quantities suitable for NMR structure determination experiments. Now that the viability of the expression system has been demonstrated, work is ongoing to produce other toxins and make any necessary refinements to the procedure used. With this widening of the “greatest remaining bottleneck”27 in structural venomics research on spider toxins, the largest remaining challenge is to speed up structure determination itself.

5.3  Speeding Up Heteronuclear NMR

Advances have also been made in using non-uniform sampling to speed up NMR experiments on spider toxins. Chapter 4 describes a framework created to systematically analyse the effects of varying NUS sample sets on various measures of spectral quality. This framework was used to analyse a CbCa(CO)NH experiment, an NMR study highly important for 3D structure determination of peptides. The results indicated that only ~4% of the full dataset specified by the Nyquist theorem was actually required to obtain a spectrum containing all available signal information, resulting in a time reduction of ca. 46 h. However, as the number of reconstructed signal peaks and inherent quality of the spectrum both decrease rapidly for sample sizes below this threshold, and the value of this threshold may vary slightly with the spread of the sample set, one may wish to collect some additional samples as a ‘buffer’. With reasoning similar to that used to apply the results of the HSQC experiment on P1 to other peptides (see section 4.2.1 of Chapter 4), one could expect similar results for other CbCa(CO)NH experiments on spider toxins. Conservatively, it is proposed that collecting 10% of the available sample points – or ~2.5× the threshold value – should be sufficient for procuring similar spectra. Now that a target sample size has been identified, it remains to trial this with other toxins and confirm that all available real peaks are reconstructed.


Further, the analysis pipeline that was developed may now be used to check for similar results with other NMR experiments. If threshold sample sizes can be determined for each, this could greatly improve the speed at which NMR structures for spider toxins can be determined. Additionally, if a link between MaxEnt parameters such as aim or lambda and signal content can be found, it could allow the use of incremental acquisition to obtain only the data required for structure determination.

5.4  Structural Venomics Pipeline

With the completion of the above components, a pipeline for speeding up structural venomics studies of spider toxins can be developed. Information on spider toxins can now be deposited into a central repository, ArachnoServer, allowing other researchers to easily locate candidate toxins. Mature sequence information can then be obtained from the database, used to design synthetic genes, and cloned into plasmid vectors designed for use with the bacterial expression system described above. The expression system could then be used to produce conformed, isotopically labelled toxins in sufficient quantities for structure determination with NMR. Finally, results from the NMR analysis system could be applied to significantly decrease NMR experiment time.

Given the need for new, effective insecticides, the rate at which the spider venome is being mapped needs to be increased significantly. Use of a pipeline similar to that developed for this project could considerably decrease the time required for structural venomics studies. While it was designed with insecticidal spider toxins in mind, it could be adapted for several other types of animal toxins. Improvements could be made to the database and expression system, and more analysis is required for several types of NMR experiments before the pre-selection problem can be addressed in their case. In spite of this, the pipeline in its present form is already of value for streamlining structural venomics studies. With further research and refinement, it has the potential to play a significant role in speeding the coming structural venomics revolution.

References

1. Rosegrant, M. W. & Cline, S. A. (2003). Global Food Security: Challenges and Policies, Science 302 : 1917-1919.

2. Schmidhuber, J. & Tubiello, F. (2007). Global food security under climate change, Proceedings of the National Academy of Sciences 104 : 19703.

3. Ramankutty, N.; Foley, J. & Olejniczak, N. (2008). 3. In: Braimoh, A. K. & Vlek, P. L. G. (Ed.), Land-Use Change and Global Food Production, Springer Netherlands.

4. Brown, M. E. & Funk, C. C. (2008). Climate: Food Security Under Climate Change, Science 319 : 580-581.

5. FAO (2009). The State of Food Insecurity in the World.

6. Roberts, D. R. & Andre, R. G. (1994). Insecticide resistance issues in vector-borne disease control, The American Journal of Tropical Medicine and Hygiene 50 : 21.

7. Casida, J. & Quistad, G. (1998). Golden age of insecticide research: past, present, or future?, Annual review of entomology 43 : 1-16.

8. Zaim, M. & Guillet, P. (2002). Alternative insecticides: an urgent need, Trends in parasitology 18 : 161-163.

9. Foley, J. A.; DeFries, R.; Asner, G. P.; Barford, C.; Bonan, G.; Carpenter, S. R.; Chapin, F. S.; Coe, M. T.; Daily, G. C.; Gibbs, H. K.; Helkowski, J. H.; Holloway, T.; Howard, E. A.; Kucharik, C. J.; Monfreda, C.; Patz, J. A.; Prentice, I. C.; Ramankutty, N. & Snyder, P. K. (2005). Global Consequences of Land Use, Science 309 : 570-574.

10. Rogers, D. J. & Randolph, S. E. (2000). The Global Spread of Malaria in a Future, Warmer World, Science 289 : 1763-1766.

11. Martens, P. & Hall, L. (2000). Malaria on the move, Emerging Infectious Diseases 6 : 7-13.

12. Gubler, D. J. (2002). Epidemic dengue/dengue hemorrhagic fever as a public health, social and economic problem in the 21st century, Trends in Microbiology 10 : 100 - 103.

13. Hume, J.; Lyons, E. & Day, K. (2003). Human migration, mosquitoes and the evolution of Plasmodium falciparum, Trends in parasitology 19 : 144-149.

14. Abu-Raddad, L. J.; Patnaik, P. & Kublin, J. G. (2006). Dual Infection with HIV and Malaria Fuels the Spread of Both Diseases in Sub-Saharan Africa, Science 314 : 1603-1606.

15. Cooper, J. & Dobson, H. (2007). The benefits of pesticides to mankind and the environment, Crop Protection 26 : 1337 - 1348.

16. Hemingway, J. & Ranson, H. (2000). Insecticide Resistance in Insect Vectors of Human Disease, Annual Review of Entomology 45 : 371-391.

17. Nauen, R. (2007). Insecticide resistance in disease vectors of public health importance, Pest management science 63 : 628-633.

18. Walther, B. & Walther, M. (2007). What does it take to control malaria?, Annals of Tropical Medicine and Parasitology 101 : 657-672.

19. Kasai, S.; Shono, T.; Komagata, O.; Tsuda, Y.; Kobayashi, M.; Motoki, M.; Kashima, I.; Tanikawa, T.; Yoshida, M.; Tanaka, I. & others (2007). Insecticide resistance in potential vector mosquitoes for West Nile virus in Japan, Journal of Medical Entomology 44 : 822-829.

20. Cantrell, C.; Klun, J.; Pridgeon, Y.; Becnel, J.; Green Iii, S. & Fronczek, F. (2009). Structure-Activity Relationship Studies on the Mosquito Toxicity and Biting Deterrency of Callicarpenal Derivatives, Chemistry and Biodiversity 6 : 447-458.

21. King, G. (2004). The wonderful world of spiders: preface to the special Toxicon issue on spider venoms, Toxicon 43 : 471-475.

22. Penney, D. & Selden, P. (2007). Spinning with the dinosaurs: the fossil record of spiders, Geology Today 23 : 231.

23. Goldfrank, L.; Flomenbaum, N.; Nelson, L.; Howland, M. & Hoffman, R. (2002). Goldfrank's toxicologic emergencies.

24. Escoubas, P.; Diochot, S. & Corzo, G. (2000). Structure and pharmacology of spider venom neurotoxins, Biochimie 82 : 839-907.

25. Wood, D.; Miljenović, T.; Cai, S.; Raven, R.; Kaas, Q.; Escoubas, P.; Herzig, V.; Wilson, D. & King, G. (2009). ArachnoServer: a database of protein toxins from spiders, BMC genomics 10 : 375.

26. Ménez, A.; Stöcklin, R. R. & Mebs, D. (2006). ‘Venomics’ or: The venomous systems genome project, Toxicon 47 : 255 - 259.

27. Escoubas, P. & King, G. (2009). Venomics as a drug discovery platform, Expert Rev. Proteomics 6 : 221-224.

28. Tedford, H. W.; Sollod, B. K.; Maggio, F. & King, G. F. (2004). Australian funnel-web spiders: master insecticide chemists, Toxicon 43 : 601-618.

29. Fitches, E.; Edwards, M. G.; Mee, C.; Grishin, E.; Gatehouse, A. M. R.; Edwards, J. P. & Gatehouse, J. A. (2004). Fusion proteins containing insect-specific toxins as pest control agents: snowdrop lectin delivers fused insecticidal spider venom toxin to insect haemolymph following oral ingestion, Journal of Insect Physiology 50 : 61 - 71.

30. Khan, S.; Zafar, Y.; Briddon, R.; Malik, K. & Mukhtar, Z. (2006). Spider venom toxin protects plants from insect attack, Transgenic research 15 : 349-357.

31. Platnick, N. I. (Accessed 15 October 2009). World Spider Catalog v9.5.

32. Escoubas, P. & Rash, L. (2004). Tarantulas: eight-legged pharmacists and combinatorial chemists, Toxicon 43 : 555-574.

33. Berman, H.; Battistuz, T.; Bhat, T.; Bluhm, W.; Bourne, P.; Burkhardt, K.; Feng, Z.; Gilliland, G.; Iype, L.; Jain, S. & others (2002). The protein data bank, Acta Crystallographica Section D: Biological Crystallography 58 : 899-907.

34. King, G. F.; Gentz, M. C.; Escoubas, P. & Nicholson, G. M. (2008). A rational nomenclature for naming peptide toxins from spiders and other venomous animals, Toxicon 52 : 264 - 276.

35. Grishin, E. (2001). Polypeptide neurotoxins from spider venoms, European Journal of Biochemistry 264 : 276-280.

36. Lewis, R. & Garcia, M. (2003). Therapeutic potential of venom peptides, Nature Reviews Drug Discovery 2 : 790-802.

37. Bode, F.; Sachs, F. & Franz, M. (2000). Tarantula peptide inhibits atrial fibrillation, Circulation 101 : 2200-2205.

38. Nunes, K.; Costa-Gonçalves, A.; Lanza, L.; Cortes, S.; Cordeiro, M.; Richardson, M.; Pimenta, A.; Webb, R.; Leite, R. & Lima, M. D. (2008). Tx2-6 toxin of the Phoneutria nigriventer spider potentiates rat erectile function, Toxicon 51 : 1197 - 1206.

39. Mazzuca, M.; Heurteaux, C.; Alloui, A.; Diochot, S.; Baron, A. & Voilley, N. (2007). A tarantula peptide against pain via ASIC1a channels and opioid mechanisms., Nature neuroscience 10 : 943-5.

40. Chong, Y.; Hayes, J. L.; Sollod, B.; Wen, S.; Wilson, D. T.; Hains, P. G.; Hodgson, W. C.; Broady, K. W.; King, G. F. & Nicholson, G. M. (2007). The ω-atracotoxins: Selective blockers of insect M-LVA and HVA calcium channels, Biochemical Pharmacology 74 : 623 - 638.

41. Hoch, J. C. & Stern, A. S., 1996. NMR Data Processing. Wiley-Liss, Inc., 605 Third Avenue, New York, NY 10158-0012.

42. Maciejewski, M.; Qui, H.; Rujan, I.; Mobli, M. & Hoch, J. (2009). Nonuniform sampling and spectral aliasing, Journal of Magnetic Resonance 199 : 88-93.

43. Marion, D. (2005). Fast acquisition of NMR spectra using Fourier transform of non-equispaced data, Journal of Biomolecular NMR 32 : 141-150.

44. Kim, S. & Szyperski, T. (2003). GFT NMR, a new approach to rapidly obtain precise high-dimensional NMR spectral information, J. Am. Chem. Soc 125 : 1385-1393.

45. Maciejewski, M.; Stern, A.; King, G. & Hoch, J. (2006). 2. In: Ed.), Nonuniform Sampling in Biomolecular NMR, Springer.

46. Orekhov, V.; Ibraghimov, I. & Billeter, M. (2003). Optimizing resolution in multidimensional NMR by three-way decomposition, Journal of Biomolecular NMR 27 : 165-173.

47. Rovnyak, D.; Frueh, D.; Sastry, M.; Sun, Z.; Stern, A.; Hoch, J. & Wagner, G. (2004). Accelerated acquisition of high resolution triple-resonance spectra using non-uniform sampling and maximum entropy reconstruction, Journal of Magnetic Resonance 170 : 15-21.

48. Mobli, M. & Hoch, J. (2008). Maximum entropy spectral reconstruction of nonuniformly sampled data, Concepts in Magnetic Resonance Part A 32 : 436-448.

49. Schmieder, P.; Stern, A.; Wagner, G. & Hoch, J. (1997). Quantification of Maximum-Entropy Spectrum Reconstructions, Journal of Magnetic Resonance 125 : 332-339.

50. Stern, A.; Li, K. & Hoch, J. (2002). Modern Spectrum Analysis in Multidimensional NMR Spectroscopy: Comparison of Linear-Prediction Extrapolation and Maximum-Entropy Reconstruction, American Chemical Society 124 : 1982-1993.

51. Orekhov, V.; Ibraghimov, I. & Billeter, M. (2001). MUNIN: A new approach to multi-dimensional NMR spectra interpretation, Journal of Biomolecular NMR 20 : 49-60.

52. Stillman, A.; Levin, D.; Lauterbur, P.; Marr, R. & Yang, D. (1986). Back projection reconstruction of spectroscopic NMR images from incomplete sets of projections, Journal of Magnetic Resonance 69 : 168-175.

53. Chylla, R. & Markley, J. (1995). Theory and application of the maximum likelihood principle to NMR parameter estimation of multidimensional NMR data, Journal of Biomolecular NMR 5 : 245-258.

54. Grant, A. I. & Packer, K. J. (1989). Enhanced Information Recovery from Spectroscopic Data Using MaxEnt, Maximum entropy and Bayesian methods, Cambridge, England, 1988 : 251-9.

55. Weber, D.; Gittis, A.; Mullen, G.; Abeygunawardana, C.; Lattman, E. & Mildvan, A. (2004). NMR docking of a substrate into the X-ray structure of staphylococcal nuclease, Proteins: Structure, Function, and Bioinformatics 13 : 275-287.

56. Mobli, M.; Stern, A. & Hoch, J. (2006). Spectral reconstruction methods in fast NMR: reduced dimensionality, random sampling and maximum entropy, Journal of Magnetic Resonance 182 : 96-287.

57. King, G. F.; Gentz, M. C.; Escoubas, P. & Nicholson, G. M. (2008). A rational nomenclature for naming peptide toxins from spiders and other venomous animals, Toxicon 52 : 264-276.

58. Maggio, F. and King, G.F. (2002) Scanning mutagenesis of a Janus-faced atracotoxin reveals a bipartite surface patch that is essential for neurotoxic function. J. Biol. Chem. 277, 22806–22813.

59. Jensen, J.E.; Durek, T.; Alewood, P.F.; Adams, D.J.; King, G.F.; and Rash, L.D. (2009) Chemical synthesis and folding of APETx2, a potent and selective blocker of acid sensing ion channel 3. Toxicon 54, 56–61.

60. Escoubas, P.; Bernard, C.; Lambeau, G.; Lazdunski, M.; and Darbon, H. (2003) Recombinant production and solution structure of PcTx1, the specific peptide inhibitor of ASIC1a proton-gated cation channels. Protein Sci. 7, 1332–1343.

61. Schmieder, P.; Stern, A. S.; Wagner, G.; Hoch, J. C. J. Magn. Reson. (1997), 125, 332-339.

62. Mobli, M.; Maciejewski, M. W.; Gryk, M. R.; Hoch, J. C. Nature Methods (2007), 4, 467-468.

63. Alexander, S. P. H.; Mathie, A. & Peters, J. A. (2008). Guide to receptors and channels (GRAC), 3rd edition, Br J Pharmacol 153(Suppl 2) : S1-S209.

64. Drenth, J.: Principles of Protein X-Ray Crystallography Springer Verlag, (1999).

65. Baker, D. & Sali, A. (2001). Protein Structure Prediction and Structural Genomics, Science 294 : 93-96.

66. Edwards, A.; Arrowsmith, C.; Christendat, D.; Dharamsi, A.; Friesen, J.; Greenblatt, J. & Vedadi, M. Protein production: feeding the crystallographers and NMR spectroscopists (2000). Nature Structural Biology, Nature America Inc,2000, 47, 970-972

67. Abragam, A. (1961). Principles of Nuclear Magnetism. Oxford Science Publications.

68. Keeler, J (2005). Understanding NMR Spectroscopy, Wiley.

69. Mobli, M.; Maciejewski, M.; Gryk, M. & Hoch, J (2007). Automatic maximum entropy spectral reconstruction in NMR, Journal of Biomolecular NMR, Springer, 39, 133-139.

70. Pfeffer, J. I. & Nir, S (2000). Modern Physics: An Introductory Text. Imperial College Press.

71. Que, L (2000). Physical Methods in Bioinorganic Chemistry: Spectroscopy and Magnetism. University Science Books.

72. Mobli, M.; Bermel, W.; Miljenovic, T.; Pierens, G. & King, G (7-11-2008). ASAP-NMR: automated processing and analysis of non-uniformly sampled NMR data, Proceedings of the Australian and New Zealand Society for Magnetic Resonance (ANZMAG), Couran Cove, Queensland, Australia.

73. Hoch, J. C. & Stern, A. S. (2005). RNMR Toolkit, Version 3.

74. Jaravine, V. & Orekhov, V. (2006). Targeted acquisition for real-time NMR spectroscopy, J. Am. Chem. Soc 128 : 13421-13426.

75. Laatikainen, R.; Niemitz, M.; Malaisse, W.; Biesemans, M. & Willem, R. (1996). A Computational Strategy for the Deconvolution of NMR Spectra with Multiplet Structures and Constraints: Analysis of Overlapping 13C-2H multiplets of 13C Enriched Metabolites from Cell Suspensions Incubated in Deuterated Media, Magnetic Resonance in Medicine, New York: Academic Press, 36 : 359-365.

76. Rovnyak, D.; Hoch, J.; Stern, A. & Wagner, G (2004). Resolution and sensitivity of high field nuclear magnetic resonance spectroscopy, Journal of Biomolecular NMR, Springer, 30, 1-10.

77. Saez, N. J.; Mobli, M. & King, G. F (2009). ESI (In press).

Appendix A.  Spider Toxin Structures by Discovery Year

A table of all proteinaceous spider toxins with known three-dimensional structure is shown below. Structures not determined by experiment were excluded from this list. Listed is the PDB accession code for the earliest experimentally-determined structure for each toxin, its deposition year and the method used to solve the structure. The publication year was substituted for the deposition year, if earlier. Solution NMR was the method used for all structures (both listed accessions and later structures) for each of the toxins, excluding Sphingomyelinase D. This 285-residue phosphodiesterase toxin has two known structures (1XX1 and 2F9R), both of which were solved using X-ray crystallography.


Table A.1  Experimentally-Determined Spider Toxin Structures (earliest deposition)
Table A.1