Non-synonymous genetic variation in exonic regions of canine Toll-like receptors

Background Toll-like receptors (TLRs) are pattern recognition receptors (PRRs) considered to be the primary sensors of pathogens in innate immunity. Genetic variants could be associated to differences in breed innate immune response to pathogens and thus to susceptibility to infections or autoimmune diseases. There is therefore great interest in the characterization of canine TLRs. Results Polymorphisms in canine TLRs have been characterized by massive sequencing after enrichment of their exonic regions. DNAs from 335 dogs (seven different breeds) and 100 wolves (two different populations) were used in pools. The ratio of SNP discovery was 76.5% (in relation to CanFam 3.1); 155 out of 204 variants identified were new. Functional annotation identified 64 non-synonymous variants (43 new), 73 synonymous variants (56 new) and 67 modifier variants (57 new). 12 out of 64 non-synonymous variants are breed or wolf specific. TLR5 has been found to be the most polymorphic among canine TLRs. Finally, a TaqMan OpenArray® plate containing 64 SNPs with a possible functional effect in the protein (4 frameshifts and 60 non-synonymous codons) has been designed and validated. Conclusions Non-synonymous genetic variation has been characterized in exonic regions of canine Toll-like Receptors. The TaqMan OpenArray® plate developed to capture the individual variability that affects protein function will allow high-throughput genotyping either to study association to infection susceptibility or even TLR evolution in the canine genome. Electronic supplementary material The online version of this article (doi:10.1186/2052-6687-1-11) contains supplementary material, which is available to authorized users.

Toll-like receptors (TLRs) are pattern recognition receptors (PRRs) and are the primary sensors of pathogens in the body. Genetic variants could be associated with differences in breed response to pathogens and also to susceptibility to infections and/or autoimmune diseases. There is great interest in the characterization of canine TLRs.
Genetic variation in canine TLRs has been characterized using massive parallel sequencing. DNA from 335 dogs (seven breeds: Beagle, German Shepherd dog, Yorkshire terrier, French bulldog, Boxer, Labrador and Shar Pei) plus 100 wolves (two populations: Iberian and Russian) were sequenced in 16 pools of 25 dogs or 50 wolves. In total, we found 204 variants, of which 155 were new. Comparison of these variants with the published dog genome sequence (called CanFam 3.1) Functional annotation identified 64 non-synonymous variants (43 new), 73 synonymous variants (56 new) and 67 modifier variants (57 new). Twelve of 64 non-synonymous variants were breed or wolf specific. TLR5 has been found to be the most polymorphic among canine TLRs. Finally, a TaqMan OpenArray(R) plate containing 64 SNPs with a possible functional effect in the protein (4 frameshifts and 60 non-synonymous codons) has been designed and validated.
Non-synonymous genetic variation has been characterized in exonic regions of canine Toll-like Receptors. The TaqMan OpenArray(R) plate developed to capture the individual variability that affects protein function will allow high-throughput genotyping either to study association to infection susceptibility or even TLR evolution in the canine genome.

Background
Toll-like receptors (TLRs) are the most widely studied pattern recognition receptors (PRRs) and are considered to be the primary sensors of pathogens in innate immunity. These molecules are constituted by leucine-rich repeat (LRR) domains, a unique intramembrane domain and a Toll/Interleukin-1 receptor (TIR) domain. Pathogenassociated molecular Patterns (PAMPs) are sensed through LRR domain, and signals are transduced through TIR domain, which is always located in the cytoplasm, in order to activate innate immunity response (for a review, see [1]).
Ten TLRs have been identified in dogs. TLRs can be classified into groups, depending on the PAMPs detected and their cellular location. TLR 1, 2, 4, 5 and 6 detect pathogen extracellular components. TLRs 3, 7, 8 and 9 target nucleic acids. The ligand for TLR10 is unknown [2].
Another way to classify TLRs is their cellular location. TLRs 1, 5, 6 and 10 are expressed at the cell surface and mainly recognize bacterial products. On the other hand, TLRs 3, 7, 8 and 9 are located almost exclusively in intracellular compartments and are specialized in recognition of nucleic acids, with self versus non-self discrimination provided by the exclusive localization of the ligands rather than their different molecular structure from that of the host. TLRs 2 and 4 can be located both on the cell surface and intracellular [2,3]. In this study, TLRs will be divided in two groups: TLRs 1, 2, 4, 5, 6 and 10 as extracellular TLRs and TLRs 3, 7, 8 and 9 as intracellular TLRs and nucleic acid sensors.
TLRs are conserved through evolution, from Drosophila to mammals (reviewed at [4]), because of its essential role in innate immunity. However, there are significant distinctions between intracellular and extracellular TLRs. Intracellular TLRs do not accept much variability, because they have evolved under strong purifying selection [5]. Viruses can only be detected through their nucleic acids; therefore intracellular TLRs have an essential non-redundant role in host survival. Moreover, mutations in those TLRs could end up with an autoimmune disease against own nucleic acids or with high susceptibility to some viral infections. On the other hand, membrane or extracellular TLRs have evolved under less evolutionary pressure, due to they can recognize one pathogen through different PAMPs (immunological redundancy). So they show a higher rate of damaging non-synonymous and STOP mutations.
Although infective pressure that has reached these molecules is one of the main mechanisms of evolution, it is not the only one. Non-adaptative evolution has also an important role, through genetic drift, bottlenecks and migratory routes [6]. This kind of evolution should be seen in dogs, due to a first bottleneck with domestication and a second one for the artificial selection of breeds [7]. For these reason it should be taken into account the need for dealing with different breeds, and even with the wolf, for the analysis of canine TLR polymorphism.
In humans, many studies are addressed to find out possible links between some TLR polymorphism and susceptibility or resistance to disease (for a review see [6]). Some genetic variants in TLRs in dogs could be associated to differences in breed innate immune response to pathogens and thus to susceptibility to infections or autoimmune diseases. So far, polymorphisms in TLR4 and TLR5 have been associated with Inflammatory Bowel disease (IBD) in German Shepherd dogs (GSD) [8], but only protective SNPs from TLR5 have been associated with IBD in other 38 dog breeds [9] There is therefore great interest in the characterization of canine TLRs. TLR5 risk-associated haplotype for canine IBD confers hyper-responsiveness to flagellin [10]. Moreover, dogs with spontaneous IBD exhibit alterations in the enteric microbiota, which bear resemblance to dysbiosis reported in humans with chronic intestinal inflammation [11].
So our aim is the analysis of genetic variation in exonic regions of canine TLRs by massive sequencing, focusing in non-synonymous substitutions and their segregation in different dog breeds and wolf populations. A second objective is to design and validate a TaqMan OpenArray® plate of SNPs with a possible functional effect in the protein (STOP, frameshift and non-synonymous codons). High-throughput genotyping of canine TLRs with this TaqMan OpenArray® plate will allow studying the association of non-synonymous variants with individual differences in immune response, their relationship with either the commensal or the disease associated microbiota and TLR evolution in the canine genome.

Results
We have identified 156 new variants in canine TLRs by massive sequencing after the enrichment of exonic regions. DNAs from 335 dogs (seven breeds) and 100 wolves (two populations) were pooled in 16 pools and sequenced in 2 lanes of Illumina Hiseq, with a mean coverage value of 15,162.23. Dog breeds included were Beagle, Labrador, German Shepherd, Yorkshire, French Bulldog, Boxer and Shar Pei. Wolves included were Iberian (Canis lupus signatus) and Russian (European grey wolf, Canis lupus lupus). A total of 204 variants were detected: 193 SNP and 11 insertions or deletions (1 to 18 bases). Only one of the indels (insertion/deletion) mapped to an exonic region (TLR7 3′ UTR), meanwhile the others were mapping to intronic regions (5 out of 11) and intergenic regions upstream or downstream a TLR gene (5 out of 11). The SNPs identified were classified by functional annotation from ENSEMBL [20] (effect and effect impact): 73 synonymous variants, 64 non-synonymous variants and 67 modifier variants which include intergenic (upstream and downstream a TLR gene), intronic and 3′ UTR (untranslated region) variants (see Table 1). None of the variants detected in the pools analyzed had a high effect (STOP codon, frameshift mutation or splicing) on the protein function. The ratio of SNP discovery was 76.5% (in relation to CanFam 3.1); 156 out of 204 variants identified were new: 43/64 non-synonymous variants (nsSNP), 56/73 synonymous variants (synSNP) and 57/67 modifier variants.
Genetic variation differs among all TLRs. Variants detected in either extracellular or intracellular canine TLRs by massive sequencing and its classification according their effect in the protein are shown in Table 1. TLR5 gene presents the highest polymorphism, with 28 synonymous changes and 23 non-synonymous changes (Additional file 1, Table 1), although it also codifies for the longest annotated protein (1422 aminoacids). Table 2 shows the aminoacid (AA) change ratio, which are AA changes caused by nsSNPs or frameshift mutations divided by total number AA for each one of the TLRs. The AA change ratio confirms that indeed TLR5 and TLR4 are the most polymorphic ones. On the other hand, TLR3 seem to be the most conserved receptor, just presenting one AA change in 905 AA.

Non-synonymous SNPs
A more exhaustive analysis was performed for the 64 nsSNP detected through massive sequencing, because they are expected to have a greater effect on the protein function. First, a glimpse on allelic frequencies of the nsSNP was performed. The frequencies of the alternative allele for all 64 nsSNPs are shown for each breed and wolf pools in Additional file 2.
Allelic frequencies for alternative variants in nsSNPs differ among breeds. Beagle and Russian wolf are the most variable pools, with 35 out of 64 nsSNPs segregating. Some of the variants identified are breed-specific (8 out of 64) or wolf-specific (4 out of 64). Most of the breed specific variants are found in TLR5 and TLR4, which as seen before, are the most polymorphic TLRs. German Shepherd dogs (GSD) and wolf share 3 nsSNPs, all located in TLR4. The same happens with Shar Pei and wolf, they share 3 nsSNPs located in TLR2, TLR5 and TLR6.
SNPs with a MAF (Minor allele frequency) <0.05 have been considered to be fixed in the cohort [21]. Usually it is the reference allele the one which is fixed, but in some cases (perhaps due to bad annotation of the SNP) is the alternative one. Iberian wolves' cohort is the one with more fixed variants, with only 24 out of 64 that are segregating, followed by Yorkshire and Boxer, with 25 out of 64 segregating variants.

Predicted impact of canine TLRs amino acid substitutions
Polyphen-2 [22,23], SIFT [24,25], and PROVEAN [26,27] tools were used in order to predict the effect of each nsSNP in the protein structure. Each of these tools uses a different algorithm to predict the consequence of the aminoacid change on the protein and classifies it as benign/ tolerated/neutral or damaging/affect protein function/ deleterious (for more detail, see Methods). 28 out of 64 nsSNPs were predicted to have an effect on the protein structure by at least one of the tools used (Table 3). When frequency of the alternative variant was high for all the cohorts tested, the alternative allele was exchanged with the reference allele in ENSEMBL sequences [20] in order to perform the Polyphen-2 analysis with the less frequent variant as the "alternative variant". Therefore, SNPs with frequencies greater than 0.25 for the alternative allele were tested also for the annotated reference allele. Then, 3 more SNPs were predicted to affect the protein structure (indicated as reference on the column dbSNP ID in Table 3). When considering also these ones, 31 out of 64 nsSNPs (48%) were predicted to have an impact on the protein structure. Results from Polyphen-2, SIFT and PROVEAN were convergent in predicting damaging effects for 8 out of 31 nsSNPs (27%). On the other hand, 6 out of 64 nsSNPs were not correctly predicted, giving unknown or low confidence results, because they were not aligning to enough similar sequences to give a reliable result. Curiously most of this nsSNPs were located on the N-terminal region of TLR5. Protein structure of the canine TLRs was assessed using SMART [28,29], which predicts domains taking into account aminoacid sequences: 6 out of 31 nsSNPs predicted to be damaging in canine TLRs were found to be in a Leucine Rich Repeat C-terminal (LRRCT) or really close to it. Only 1 out of 31 was found to affect TIR domain in TLR 5, other 2 were found to be really close to this domain in TLR5 and 10. With the exception of these last ones, nsSNPs were in most cases located in the sensor domain of TLRs (Table 3).
Frequencies in Table 3 are an average of all dog pools tested and both wolf populations respectively, so all variants are polymorphic (MAF > 0.05) at least in one breed. 17 out of 31 show a MAF > 0.05 when considering the average frequencies in all the pools together (15 out of 31 with MAF > 0.05 in wolf populations). However, as mentioned above, frequencies of nsSNPs differ among breeds (see Additional file 2). It's worthy to note the differences on the alternative allele frequency observed for the 8 nsSNPs that were predicted to affect protein function by the three tools used (Figure 1).

TaqMan open array design and SNP validation
A TaqMan OpenArray® plate has been developed for the validation of the nsSNPs by individual genotyping (Table 4). This panel contains (i) 27 out of 31 nsSNPs that were predicted to have an impact on the protein structure (4 wolfspecific SNPs were not considered for the array: TLR1 A525V, TLR5 N833K, TLR6 P579L and TLR10 F787L; see Table 3); (ii) 28 out of the 33 remaining nsSNPs segregating in dogs (5 SNPs that were not suitable for a correct primer design were rejected for posterior analysis: TLR4 T36A, TLR4 T36I, TLR5 T243A, TLR5 Q213R and TLR9 A442V); and (iii) 4 frameshift and 4 non-synonymous TLR polymorphisms described on CanFam 3.1 but not detected in our cohorts (see Table 4). One of the non-synonymous variants added (rs23572381, TLR1 N634K) was designed with two different TaqMan assays due to the presence of other variants close to the interrogated SNP.
A total of 99 DNA samples of the first massive sequencing pools were chosen to be individually genotyped in order to validate the SNPs with the TaqMan Open Array® designed: 15 Beagle, 15 Boxer, 14 French bulldog, 15 Labrador, 15 German Shepherd dog, 13 Yorkshire and 12 Shar Pei were used. One Shar-Pei and 2 Yorkshires do not pass the quality control for samples (call rate > 0.9) and were removed from the posterior analysis. Finally, analysis was performed with a total of 96 individuals. Fifty-nine out of the 64 SNPs (92%) included in the OpenArray have been successfully validated and all of them had a call rate greater than 0.9.
Some downstream analyses have been performed with the individual genotypes. However, it should be taken into account that these are just preliminary results, which need to be validated with larger cohorts of dogs.
All the TLR SNPs were in Hardy-Weinberg Equilibrium (HWE) on Beagle, Boxer, German Shepherd, Labrador and Shar-Pei. In Yorkshire, TLR10 has two SNPs in linkage disequilibrium which are not in HWE, one of them is predicted to affect protein function by the algorithms tested (Table 3). French Bulldog was the breed that had more SNPs that did not follow HWE proportions, with 3 SNPs in TLR4 and one SNP in TLR5 (Table 5). TLR7 and 8 were not included because they are both located in chromosome X.
Principal components analysis (PCA) combined data from the individual genotypes obtained for the subset of SNPs which were not in linkage disequilibrium. It was used to illustrate if dogs cluster by breed for genetic variants in TLRs. The first two components from the PCA have been plotted in Figure 2. Visual examination of this plot shows overlapping for most breeds, excluding Labrador and perhaps German Shepherd, which seem to be more differentiated for these receptors.

Discussion
Canine breed specific variants in TLRs could be associated to differences in breed innate immune response to pathogens and thus to susceptibility to infections or autoimmune diseases. So far, polymorphisms in TLR4 and TLR5 have been associated with IBD in German Shepherd dogs [8], but only protective SNPs from TLR5 have been associated with IBD in other 38 dog breeds [9]. There is therefore great interest in the characterization of canine TLRs. Different dog breeds and 2 different populations of wolves (Iberian and Russian) were included in the analysis to represent some of the major phylogenetic radiations: Wolves, Ancient&Spitz breeds, Scent hounds, Working dogs, Mastiff-like dogs, Small Terriers and Retrievers [30]. SNPs functionally annotated as non-synonymous are expected to have a greater effect on protein function, and therefore a more exhaustive analysis was performed on them. Although allelic frequencies for nsSNPs differ among breeds and 12.5% of them are breed-specific (6.25% are wolf specific), dogs from different breeds share most non-synonymous variants in TLRs.
A TaqMan OpenArray® plate containing 64 SNPs with a possible functional effect in the protein (4 frameshifts and 60 nsSNPs) has been designed and validated. 55 out of 64 SNPs contained in the OpenArray® plate have been identified in this work through massive sequencing by HISEQ; the remaining 9 were obtained from CanFam 3.1.
As shown in Figure 2, the individual genotypes are not clustering by breed, with the exception of Labrador and German Shepherd dogs.
The functional impact of non-synonymous variants in dog TLRs was predicted using Polyphen-2, SIFT and PRO-VEAN. Knowing that TLRs are highly conserved receptors, it is not unexpected that half non-synonymous mutations in dogs have a benign effect, which agrees with results from similar approaches in other non-primate species such as bovine [31]. In dogs, TLR5 is the one that presents more damaging non-synonymous mutations (possibly damaging + probably damaging), followed by TLR4, both of them extracellular receptors.
Results from SIFT and Polyphen-2 from some nsSNPs located in TLR5 returned no output and no prediction ("unknown" or "low confidence"). In dogs, TLR5 was described as a longer protein compared to their homologs in other species. In CanFam 3.1 TLR5 has 1422 aminoacids, however other species like human, cow and pig have 858 aa, 858 aa and 856 aa, respectively. A protein BLAST was performed with the extra 5′ and 3′ TLR5 fragments, but no result was obtained. Furthermore, the 5′ sequence begins with ATG codon in the same phase as the initial coding ATG in other species, whereas the 3′ sequence eliminates the STOP codon due to some repeats in tandem (data not shown). So, bad annotation of this gene in CanFam3.1 is suggested. However, SNPs have been found in this region. In fact, there are 2 SNPs that had already been wrongly described as an aminoacid change   (in ENSEMBL) moreover in this study 5 more SNPs have been detected. So it would be interesting to either determine the existence and functionality of these extra fragments in canine TLR5 cDNA or correctly annotate it in CanFam 3.1. Intracellular TLRs, which detect nucleic acids, have less nsSNPs (15), moreover these are predicted to be less damaging variants than those identified in extracellular TLRs, suggesting that intracellular TLRs are selectively constrained. TLR9 is the intracellular TLR that accepts more nsSNPs in dogs, but the predicted effect of these nsSNPs is usually benign.
These results agree with previously reported data revealing major differences in the intensity of selection acting upon the different members of the TLR family. Different TLRs differ in their immunological redundancy, reflecting their distinct contributions to host defense [5,32]. Intracellular TLRs act as nucleic acid sensors and have evolved under strong purifying selection, indicating their essential non-redundant role in host survival. Higher rates of  damaging non-synonymous and nonsense mutations are tolerated in cell-surface or extracellular TLRs, which recognize compounds other than nucleic acids, suggesting a higher redundancy. Location of the SNPs in the protein was approached using the software SMART [28], which identifies TLR domains using the aminoacid sequence. The intracellular TIR domain is highly conserved between different TLRs and species due to its involvement in intracellular signaling [33]. Also in dogs, TIR domains have few SNPs; only one is present in the predicted TIR domain (TLR5 H1017Y) and another two are really close to it (TLR5 A1078T and TLR10 F787L). Extracellular domains of TLRs are those that recognize PAMPs, and they have an enhanced susceptibility to mutate adapting to different microbiologic environments [33]. It can also be seen that a high number of mutations (some with damaging effects) are located in LRR domains, which form the extracellular domain of TLRs.
So far, polymorphisms in TLRs have been associated with Inflammatory Bowel disease (IBD) in German Shepherd dogs (GSD) and in other breeds. Variants in TLR5 previously reported to be associated to IBD (G22A, C100T and T1844C from [8,9]) have been also detected in our cohorts and correspond with TLR5 T243A, TLR5 R269C and TLR5 L850S respectively. SNP G22A, where the risk allele is A in G22A (corresponding to Thr in TLR5 T243A as named in this work), is found to be an additive allele. So when a GSD is homozygous for the risk allele it has more susceptibility to suffer IBD. This risk allele is not segregating in our GSD cohort. This could be due to the difference in the geographical origin of the GSD cohort between both studies. In [9] GSD are from UK, whilst our cohort is from Spain. SNPs C100T and T1844C were found to be significantly protective against canine IBD in many breeds [9]. The frequencies of the protective alleles (T in C100T and T in T1844C or Cys in TLR5 R269C and Leu in TLR5 L850S as named in this work) differ among breeds (Figure 3), with a frequency higher than 0.5 in Yorkshire and GSD.

Conclusions
Polymorphisms in the exonic regions of canine TLRs have been characterized by massive sequencing and 156 out of 204 variants identified were new: 43/64 non-synonymous variants, 56/73 synonymous variants and 57/67 modifier variants. None of the variants detected in the pools analyzed had a high effect (STOP codon, frameshift mutation or splicing) on the protein function.
A TaqMan OpenArray® plate containing 64 SNPs with a possible functional effect in the protein (4 frameshifts and 60 nsSNPs) has been designed and validated to allow the high throughput genotyping of canine TLRs.

Ethics statement
The dogs in the study were examined during routinary veterinary procedures by the veterinary clinics participating in the study. All samples were collected for routine diagnostic and clinical purposes. The samples were obtained during veterinary procedures that would have been carried out anyway and DNA was extracted from residual surplus of samples and used in the study with verbal owner consent. This is a very special situation in veterinary medicine. As the data are from client-owned dogs that underwent normal veterinary exams, there was no "animal experiment" according to the legal definitions in Spain and the United Kingdom, and approval by an ethical committee was not necessary.

DNA sources
Samples available from the DNA bank at the SVGM (Molecular Genetics Veterinary Service, UAB) were used. Total DNA from blood cells had been extracted either as described elsewhere [34]  Exon capture and massive sequencing for SNP discovery Twenty exonic regions of 10 canine TLR genes annotated in CanFam 2.0 were chosen to perform the enrichment (see Additional file 3 with corresponding coordinates in CanFam 3.1).
Oligonucleotides were first automatically designed for the enrichment of selected regions [35]. Regions rejected in the automated design, because of the presence of gaps, repeats or shorter sizes than required (at least 120 nucleotides) were manually redesigned. Finally 235 ultra-long 120-mer biotinylated cRNA baits were designed to capture the exonic regions of canine TLRs (28,200 bases) by the Agilent Sure Select technique. High-throughput sequencing was performed using 2 lanes of Illumina HISEQ, with 8-labelled pools each, at CNAG (Centre Nacional d' Anàlisi Genòmica, Barcelona, Spain).
Sequences obtained were mapped to CanFam 3.1 (released September 2012). All pools were analysed together for variant calling, for better comparison. Alternative variant frequencies were estimated for each breed pool and wolf populations. The variants were annotated with statistical information from the Genome Analysis Tool Kit (GATK) and functional annotations were added from Ensembl using snpEff [36].

Prediction of functional impact of non-synonymous SNPs
The functional impact of non-synonymous mutations detected was predicted using Polyphen-2 [22,23], SIFT [25,24] and PROVEAN [27,26]. When the mean frequency of an alternative variant on the dog population analyzed was more than 0.25, both alleles of those SNPs were tested with algorithms mentioned before as reference and alternative.
Polyphen-2 classifies mutations in three categories: benign, possibly damaging and probably damaging. Polyphen algorithm considers protein structure and/or sequence conservation information for each gene [23]. SIFT is based on the evolutionary conservation of the amino acids within protein families performing multiple sequencing analyses using PSI-Blast algorithm. Highly conserved positions tend to be intolerant to substitution, whereas those with a low degree of conservation tolerate most substitutions. Therefore, it classifies each non-synonymous polymorphism as tolerated or affect protein function and provides also a confidence measure [24]. PROVEAN introduced a regionbased "delta alignment score" which measures the impact of an amino acid variation not only based on the amino acid residue at the position of interest but also the quality of sequence alignment derived from the neighborhood flanking sequences. It classifies variants either as neutral or deleterious [26].