Demographic assessment of the Dalmatian dog – effective population size, linkage disequilibrium and inbreeding coefficients

Background The calculation of demographic measures is a useful tool for evaluating the genomic architecture of dog breeds and enables ranking dog breeds in terms of genetic diversity. To achieve this for the German Dalmatian dog population, 307 purebred animals of this breed were genotyped on the Illumina Canine high density BeadChip. The analysis of pedigree-based inbreeding was performed based on a pedigree with 25,761 dogs including the genotyped dogs. Results The effective population size derived from squared correlation coefficients between SNP alleles (r2) was 69. The maximum value of r2 was 0.56, resulting in a 50% decay value of 0.28 at a marker distance of 37.5 kb. The effective population size calculated from pedigree data using individual increase in inbreeding over equivalent generations was 116. The pedigree inbreeding coefficient was 0.026. The genomic inbreeding coefficient based on the length of runs of homozygosity (ROH) was calculated for seven length categories of ROHs, and ranged from 0.08 to 0.28. The fixation coefficients FIS_PED and FIS_GENO were at 0.017 and 0.004. PANTHER statistical overrepresentation analysis of genes located in consensus ROHs revealed highly underrepresented biological processes in 50% of the investigated dogs. One of those is the 0.28 fold enriched “immune response”, which might be associated to the high prevalence of allergic dermatitis in the breed. Candidate genes for congenital sensorineural deafness (CCSD, a highly prevalent disease in the Dalmatian) were discovered in consensus ROHs. Conclusions The fast decay of r2 and the moderate inbreeding coefficients indicate that the German Dalmatian dog population is rather diverse. Pedigree- and genomic-based inbreeding measures were highly correlated and therefore prove good reliability for the given population. 
Analyses of consensus ROHs with genes coding for deafness and other breed-defining traits, such as hyperuricosuria, indicate that those ROH became fixed in the Dalmatian population about 500 years ago. In case of the Dalmatian dog, a ROH of 40 SNPs length is enough to investigate signatures of selection (e.g. the ROH with the fixed hyperuricosuria mutation) as far back as the breed formation point approximately 500 years ago.


Plain English summary
Dalmatian dogs are widely known for their uniquely spotted coat. The sporty, medium-sized dog was originally bred as a carriage and fire wagon dog and is now a versatile companion animal. Dalmatians are a dog breed with a high prevalence of canine congenital sensorineural deafness (CCSD). It is necessary to combat genetic diseases of purebred dogs, including the Dalmatian, to increase animal welfare. To achieve this, it is useful to gain insight into the genomic architecture of the breed, to assess whether diseases could stem from increased inbreeding, selection for particular traits or accidentally enriched deleterious mutations. With pedigree analyses, one can also identify ancestors with many offspring. An overuse of popular breeding animals can lead to increased inbreeding and loss of genetic diversity in the following generations. This phenomenon is common in dog breeding and is called the "popular sire effect". In this study, we aimed to gain an insight into the demography of the Dalmatian dog breed. We calculated inbreeding coefficients from pedigree and genomic data. The pedigree encompassed 25,761 dogs in total and the genomic data were obtained from genotyping 307 Dalmatians on a single nucleotide polymorphism (SNP) array. Pedigree analysis revealed a mean inbreeding coefficient of 0.026 and an effective population size of 116. From the SNP array data, the genomic inbreeding coefficient was derived and ranged from 0.08 to 0.28. The effective population size amounted to 69. Our results support the view that inbreeding estimates from genomic data tend to be more accurate than those from pedigree analysis. The results of both approaches were nevertheless highly correlated, which indicates good reliability of both methods. The Dalmatian was found to be a rather genetically diverse dog breed. Investigations of runs of homozygosity (regions with loss of genetic variation) revealed four deafness candidate genes, which might indicate a connection to the high prevalence of CCSD. 
Those genes therefore deserve further investigation for their contribution to CCSD.

Background
The issues of inbreeding and inherited diseases of purebred dogs have steadily gained more public recognition in recent years [1]. To combat those issues and improve animal health, it is necessary to gain solid knowledge of the genomic architecture of the particular breed [2]. High-density single nucleotide polymorphism (SNP) arrays are tools for genetic diversity assessment, as they provide genomic data necessary for the calculation of demographic measures. In this study, we aimed to gain an insight into the demography of the Dalmatian dog breed using pedigree and genome-wide SNP data from high-density arrays.
Other studies have already focused on the demography of other dog breeds using various data sources. For example, with SNP array data, a high amount of within-breed genetic differentiation was found in Labrador Retrievers. The total length of runs of homozygosity (ROH) was highly correlated with the pedigree-based inbreeding coefficient (F PED ) [3]. Further work on the dog breeds Golden Retriever, Rottweiler and Newfoundland focused on the linkage disequilibrium (LD). The 50% decay value of r 2 (squared correlation coefficients between SNP alleles) was used to measure the variation of LD between breeds [4]. The Lundehund exhibited an extraordinarily high homozygosity indicative of a highly inbred breed. The low genetic variability was reflected in numerous, extensive ROHs, as well as a very low effective population size (N e ), and a slow decay of r 2 . Recent efforts in outcrossing the breed were reflected in visible changes in the effective population size [5]. In the Korean aboriginal Sapsaree dog, SNP array and pedigree data were used to estimate N e . The declining N e -values over recent generations demonstrate the need for improved breeding strategies to preserve genetic variability in the Sapsaree dog [6]. For the Nova Scotia Duck Tolling Retriever and Lancashire Heeler dog, their respective effective population sizes were estimated. From the analysis of those pedigree data, it was concluded that different breeding programs are necessary to increase the genetic variation, namely outcrossing with other breeds for the Nova Scotia Duck Tolling Retriever [7]. Already available data for the Dalmatian are a pedigree inbreeding coefficient of 0.024 and a realized effective population size of 120 [2]. An overview of the analyzed data and results obtained with the studies above is given in Table 1.
As demonstrated above, the calculation of demographic measures and genetic variability from pedigree data and SNP arrays has been proven a useful tool for critically rethinking the breeding strategies of purebred dogs. While inbreeding measures derived from SNP array data more accurately depict, amongst other, the individual inbreeding [8], pedigree data can provide valuable insights into distant animals where no DNA samples were obtained, and therefore into the population history. No study specifically addressing the demography of the Dalmatian has been published yet. Therefore, the objectives of the present study were to estimate the effective population size (N e ) of the Dalmatian from data on linkage disequilibria (LD) obtained by genotyping 307 Dalmatians with the Illumina Canine high density BeadChip (Illumina Inc., San Diego, CA, USA) and genealogical data, as well as identifying runs of homozygosity (ROH) as regions with local loss of genetic variation. We then identified the genes located in those consensus ROHs. Inbreeding coefficients were calculated based on pedigree data (F PED and fixation coefficient F IS ) and genotype information (genomic inbreeding coefficient, F ROH and F IS ) to compare the results of these

Inbreeding measures by pedigree analysis
We identified highly inbred matings in the whole population. There were 31 (0.12%) matings between full siblings

Inbreeding measures from SNP array data
The mean r 2 in the Dalmatian was 0.56 at its maximum value (Fig. 1). The LD decreased to values below r 2 = 0.11 for SNPs 1 Mb apart (Fig. S1). The 50% decay value of r 2 from its maximum was 0.28, corresponding to a mean marker distance of 37.5 kb. N e was calculated for the last 50 generations. Between 10 and 4 generations ago, N e remained stable at a level of 109 to 104 (Fig. 2).
In the very recent generations, N e decreased to a value of 69 (Fig. S2). The increase in inbreeding per generation (ΔF) reached a maximum of 0.007 in the current generation (Fig. 3). From 50 to 10 generations ago, ΔF steadily rose from 0.0025 to 0.004. The mean F IS_GENO -value was 0.004 and ranged from −0.141 to 0.228. The 10- to 358-SNP thresholds resulted in ROHs of at least 120 to 5012 kb length. The mean F ROH358_SNP for all Dalmatians was 0.08 (0.002 to 0.23) and the F ROH65_SNP , F ROH50_SNP , F ROH40_SNP , F ROH30_SNP , F ROH20_SNP and F ROH10_SNP -values were 0.16 (0.06 to 0.31), 0.17 (0.07 to 0.32), 0.18 (0.08 to 0.33), 0.19 (0.09 to 0.34), 0.22 (0.12 to 0.36) and 0.28 (0.19 to 0.4), respectively. The Pearson correlation coefficient between the inbreeding coefficients F PED and F IS_GENO was 0.675, and correlation coefficients between F IS_GENO and F ROH10 through F ROH358 ranged from 0.81 to 0.917. Correlations among the different F ROH ranged from 0.81 to 0.99 (Table 4). We identified 13, 5, 4 and 1 consensus ROHs for the 10, 20, 30 and 40-SNP thresholds (Table 5). There were 40 genes located in the consensus ROHs, with some genes associated with disease predispositions of the Dalmatian dog (Table 6, Table S1).

PANTHER statistical overrepresentation analysis and functional classification test
Underrepresented biological processes were identified in the 50% fixed (ROHs common to 50% of the genotyped dogs) 10-SNP ROHs. Most underrepresented were "immune response" (0.28 fold enrichment), "sensory perception of smell" (0.23 fold enrichment) and "sensory perception of chemical stimulus" (0.36 fold enrichment) (Table S2). Biological processes that were overrepresented were mainly found in the 75% fixed 20-SNP and 30-SNP ROHs, namely "digestive tract mesoderm development" with a 27.46 and 35.34 fold enrichment, respectively. The genes in the consensus ROH were mostly assigned to the functional classes "cellular process" and "metabolic process" (Table S3).
Table 3 Average inbreeding coefficient per generation. The average F PED per generation was calculated over the whole pedigree. Generation 0 is the founder generation and generation 24 the youngest generation

Discussion
The genomic inbreeding coefficient F ROH assumes different values depending on the length of the ROH used for calculation. Therefore, we calculated the F ROH over each of the seven ROH length thresholds in SNPs. Short ROHs represent inbreeding incidences of distant ancestors, as the homozygous chromosome segments are broken down over time due to the crossing-over in meiosis. Consequently, long ROH are an indication of recent mating of related individuals [9]. Short ROH are more numerous in the Dalmatian and cover a larger amount of the genome, thus resulting in a higher F ROH . This was expected as it was previously demonstrated in humans that even in inbred populations, the short ROH make up the majority of ROH [9]. The results, namely few detectable long ROH in the Dalmatian (F ROH358 = 0.08) indicate that there are only few recent inbreeding incidences. Choosing shorter ROH for inbreeding calculations shows that there are probably more distant inbreeding incidences (F ROH20 = 0.22). The estimation of inbreeding by pedigree analysis is subjected to the completeness and quality of pedigree data. Missing family members as well as the assumed unrelatedness of founders lead to a bias of F PED to lower values [10][11][12]. Additionally, the F PED cannot reliably predict the actual proportion of IBD genome that related individuals share. The actual amount of the genome that is identical by descent (IBD) varies around the value predicted by pedigree data because of Mendelian segregation [13][14][15][16]. Thus the estimation of inbreeding by F ROH is more accurate than by F PED [8]. This is demonstrated here by an almost 10 times higher F ROH of a moderate SNP length threshold (F ROH20 = 0.22) than F PED (0.026). The mean F PED for the reference population was 0.035. It appears that the genotyped Dalmatians were on average less inbred than the rest of the population. This is plausible since we chose distantly related and not highly inbred Dalmatians for genotyping. 
The few highly inbred matings were thus not included in the genomic analyses and may have contributed to a higher F PED for the total reference population. Nonetheless the Dalmatians chosen for genotyping are representative for the population, as they were collected from the best documented birth years and across all major German Dalmatian breed clubs. Additionally, we compared the average coancestries of the two reference populations. The average coancestry for the Beadchip sample was 0.019 and 0.022% for the reference population, which are very similar values. The assessment of ancestors and founders also shows that the founders and ancestors from the reference population are sufficiently represented in the Beadchip sample.
The effective population sizes derived from linkage disequilibria showed a significant drop from 102 to 69 in the last three generations. This corresponds to a rise in the increase in inbreeding from 0.004 per generation to 0.007 per generation. The breakdown of the average F PED per generation also shows an increase in inbreeding over the last generations. The Food and Agriculture Organization (FAO) of the United Nations recommends that, in order to maintain fitness in a population, the increase in inbreeding should not exceed 1% per generation, which equals an effective population size of 50 [17]. The Dalmatian is still below that critical threshold, but the steep rise in ΔF is a cause for concern if the trend continues. Thus, the development of the increase in inbreeding should be monitored further, to confirm or reject the hypothesis of rapidly increasing inbreeding. The results of the pedigree-derived effective population size computation over the individual increase in inbreeding showed high conformity to the one derived from r 2 (N e 116.16). The calculation of N e over the paired increase in coancestry showed even more accordance (N e 70.93). This parameter was calculated additionally, as the N e over the increase in inbreeding tends to be inaccurate if there is substructure in the population [18], which our F IS value suggests. Nonetheless, both approaches to calculate pedigree N e appear appropriate for N e calculation of this and other rather large, well-documented pedigrees.
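The relation between N e and ΔF used in this argument can be checked numerically. The following minimal Python sketch evaluates ΔF = 1/(2N e) for the FAO threshold and for the LD-derived N e of 69; the function names are illustrative and are not part of any software used in the study.

```python
def delta_f_from_ne(ne: float) -> float:
    """Increase in inbreeding per generation for an effective population size ne."""
    if ne <= 0:
        raise ValueError("ne must be positive")
    return 1.0 / (2.0 * ne)


def ne_from_delta_f(delta_f: float) -> float:
    """Effective population size implied by an increase in inbreeding delta_f."""
    if delta_f <= 0:
        raise ValueError("delta_f must be positive")
    return 1.0 / (2.0 * delta_f)


if __name__ == "__main__":
    # FAO threshold: an increase in inbreeding of 1% per generation.
    print(f"N e at the 1% threshold: {ne_from_delta_f(0.01):.1f}")
    # LD-derived N e of the most recent generation reported in the text.
    print(f"Delta F for N e = 69: {delta_f_from_ne(69):.4f}")
```

With these inputs, an N e of 69 yields a ΔF of roughly 0.007, which is below the 1% threshold that corresponds to an N e of 50.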

Comparison of decay of r 2 in different dog breeds
The Dalmatian featured a fast decay of LD. We compared the point of 50% decay of r 2 (defined as the point at which r 2 reaches 50% of its maximum value) to that of large-sized dog breeds from another study, which also made use of SNP array data [4]. Golden Retrievers, Rottweilers and Newfoundland dogs had a point of 50% decay of r 2 at r 2 = 0.24 with corresponding marker distances of 714 kb, 833 kb and 344 kb, respectively. The Dalmatian had a steeper decrease of r 2 values, reaching the 50% decay point at a marker distance of 37.5 kb. When we chose the point at which r 2 reaches 0.24, we still found a much shorter marker distance of 62.5 kb than in the other breeds. Another study chose the arbitrary point of r 2 = 0.2 to compare wolves and domestic dog breeds [19]. [3,20]. The moderate genomic and pedigree inbreeding coefficients and the fast decay of r 2 , all in relation to comparable breeds, indicate that the Dalmatian belongs to the more diverse breeds. None of the over- or underrepresented biological processes of genes located in consensus ROH point to a contribution to frequently occurring diseases in the Dalmatian. The underrepresented "immune response" in the 50% fixed ROHs might, however, be associated with the high prevalence of allergic dermatitis [21].
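The 50% decay point used for this comparison can be located on a binned LD curve by interpolation. The sketch below is a minimal illustration in Python, assuming linear interpolation between neighbouring distance bins; the function name and the example curve are invented for illustration and are not the data of this study.

```python
from typing import Sequence, Tuple


def half_decay_distance(distances_kb: Sequence[float],
                        mean_r2: Sequence[float]) -> Tuple[float, float]:
    """Return (target r2, distance in kb) where r2 first drops to half its maximum.

    The distance is linearly interpolated between the two bins that
    bracket the target value.
    """
    if len(distances_kb) != len(mean_r2) or len(mean_r2) < 2:
        raise ValueError("need two equally long sequences with at least two bins")
    peak = max(range(len(mean_r2)), key=lambda i: mean_r2[i])
    target = mean_r2[peak] / 2.0
    for i in range(peak + 1, len(mean_r2)):
        if mean_r2[i] <= target:
            d0, d1 = distances_kb[i - 1], distances_kb[i]
            r0, r1 = mean_r2[i - 1], mean_r2[i]
            fraction = (r0 - target) / (r0 - r1)
            return target, d0 + fraction * (d1 - d0)
    raise ValueError("r2 never decays to half of its maximum")


if __name__ == "__main__":
    # Invented example curve (not the study data).
    distances = [0.0, 25.0, 50.0, 100.0]
    r2_values = [0.56, 0.35, 0.21, 0.15]
    target, distance = half_decay_distance(distances, r2_values)
    print(f"r2 target {target:.2f} reached at about {distance:.1f} kb")
```

The same routine can be applied with a fixed target such as r 2 = 0.24 or 0.2 by replacing the computed target, which is how comparisons at an arbitrary r 2 value could be made.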

Analysis of genes located in consensus ROHs
Forty genes were identified in ROHs shared by all genotyped Dalmatians. Among them were four genes associated with deafness [grainyhead like transcription factor 2 (GRHL2), BCL2 like 11 (BCL2L11), ELMO domain containing 3 (ELMOD3), usherin (USH2A)]. With the high prevalence of CCSD in the Dalmatian in mind, those genes could possibly harbor variants contributing to CCSD. From the length of those ROH, the time of origin of the ROH was estimated. Those consensus ROH were 169,156 to 420,395 bp long, which implies an origin from ~119 to ~337 generations ago. With a generation interval of 4.33 years, assuming the generation interval did not change much over the last centuries, this amounts to ~515 to ~1459 years ago. Deafness-contributing variants in small consensus ROH may explain the widespread occurrence of CCSD in all Dalmatian breeding lines.
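The conversion from ROH length to age can be sketched as follows. This minimal Python example assumes the relations stated in the methods (age in generations = 1/(2c), with 100 Mb corresponding to 1 Morgan) and the generation interval of 4.33 years; the function names are illustrative.

```python
MB_PER_MORGAN = 100.0              # assumption from the methods: 100 Mb ~ 1 Morgan
GENERATION_INTERVAL_YEARS = 4.33   # generation interval given in the text


def roh_age_generations(length_bp: float) -> float:
    """Estimated age of a ROH in generations, computed as 1 / (2c)."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    c_morgan = length_bp / (MB_PER_MORGAN * 1_000_000)
    return 1.0 / (2.0 * c_morgan)


def roh_age_years(length_bp: float) -> float:
    """Estimated age of a ROH in years, assuming a constant generation interval."""
    return roh_age_generations(length_bp) * GENERATION_INTERVAL_YEARS


if __name__ == "__main__":
    longest = 420_395  # longest consensus ROH in bp
    print(f"{roh_age_generations(longest):.0f} generations, "
          f"{roh_age_years(longest):.0f} years")
```

For the longest consensus ROH, this gives about 119 generations or about 515 years. Because the age is inversely proportional to the length, shorter ROHs map to earlier origins.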
Also, shorter ROH allow for the identification of signatures of selection. In the case of the Dalmatian, the 40-SNP threshold seems short enough, as with this threshold a consensus ROH containing SLC2A9 was found. This gene harbors the mutation for hyperuricosuria, which all Dalmatians are afflicted with [22], except for low uric acid (LUA) Dalmatians. By outcrossing with a Pointer, the wild type allele was introduced to a line of Dalmatians [23]. None of those LUA Dalmatians were included in the Beadchip sample. From the length of the ROH (417,597 bp) we can assume the hyperuricosuria mutation became fixed in the Dalmatian ~119 generations (~515 years) ago. The Dalmatian's spots are another trait fixed in the breed. One gene responsible for this phenotype is MITF, which causes the extreme white spotting [24,25]. The colored spots result from the influence of a ticking locus [25]. Although the trait is fixed, MITF could not be found in any consensus ROH. MITF is located on CFA 20 and stretches from 21,612,927 to 21,870,578. It contains 23 Beadchip SNPs, which is within the average SNP density of the Illumina CanineHD BeadChip. Further inspection of the gene with PLINK could not identify consensus ROHs of all 307 animals even with a ROH threshold of 5 SNPs, 60 kb length and 2 missing SNPs. We therefore speculate that the initial ROH created by the early trait fixation in the breed is so old that it has been broken down over time and is now too small to be detected by our thresholds. This is plausible since the spotting is the main and therefore earliest breed-defining trait.
It has already been hypothesized that the hyperuricosuria mutation became fixed in the Dalmatian population due to selection for large and solid pigmented spots [23,26]. Also it is widely accepted that the extreme white spotting plays a role in CCSD in the Dalmatian [27]. Together, the estimated age of consensus ROHs with genes coding for breed defining traits (hyperuricosuria and deafness), and the connection of those traits with the extreme white spotting, all point to a presumable breed formation about 500 years ago. These assumptions coincide with the earliest documentation of the Dalmatian breed in the 14th century [28].

Conclusions
In comparison to other dog breeds, the Dalmatian is a rather genetically diverse breed. Pedigree- and genomic-based inbreeding calculation results showed high conformity and therefore indicate a good quality of the pedigree used and good reliability of both methods if a high-quality dataset is given. In consensus ROHs of the investigated Dalmatian population, four genes associated with deafness were identified. Those genes deserve further investigation for their possible contribution to Dalmatian CCSD. The short length of these ROHs indicates an early emergence of variants contributing to congenital hearing loss. This finding may explain the widespread prevalence of CCSD in all Dalmatian lines. Consensus ROH with genes coding for breed-defining traits point to a breed formation approximately 500 years ago.

Materials and methods
Computation of inbreeding measures based on pedigree data
ENDOG v4.8 [29] was utilized for the calculations. The whole pedigree encompassed 25,761 animals, including the 313 genotyped animals. A reference population including all animals born in the years 1995-2015, including the genotyped animals, was created.
As a measure of pedigree completeness, the mean equivalent generations (sum of (1/2) n terms over all known ancestors, where n is the number of generations separating the individual from the ancestor, [30]) were calculated.
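The definition of equivalent generations given above can be illustrated with a short recursive computation. The following Python sketch is only a toy illustration and is not the ENDOG implementation; the pedigree layout (a dictionary mapping each animal to its sire and dam) and all animal names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical pedigree layout: animal -> (sire, dam); None marks an unknown parent.
Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]


def equivalent_generations(animal: str, pedigree: Pedigree) -> float:
    """Sum of (1/2)^n over all known ancestors, n generations back.

    Each known parent contributes 1/2 plus half of its own
    equivalent generations, which yields the sum recursively.
    """
    total = 0.0
    for parent in pedigree.get(animal, (None, None)):
        if parent is not None:
            total += 0.5 * (1.0 + equivalent_generations(parent, pedigree))
    return total


if __name__ == "__main__":
    toy: Pedigree = {
        "pup": ("sire", "dam"),
        "sire": ("grandsire_1", "granddam_1"),
        "dam": ("grandsire_2", None),  # one grandparent unknown
    }
    print(equivalent_generations("pup", toy))
```

In the toy pedigree, two known parents contribute 2 × 1/2 and three known grandparents contribute 3 × 1/4, giving 1.75 equivalent generations. For a pedigree of realistic size, the recursion would need memoization.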
The individual inbreeding coefficient F was computed according to Meuwissen and Luo [31]. F IS as a part of F-statistics was calculated for the reference population and Beadchip sample as subpopulations [32]. The approach is $F_{IS} = (\tilde{F} - \tilde{f})/(1 - \tilde{f})$, with $\tilde{F}$ being the mean inbreeding coefficient for the entire metapopulation, and $\tilde{f}$ the average coancestry for the subpopulation [33,34]. Furthermore, we calculated the effective number of founders f e and the effective number of ancestors f a [10]. The effective number of founders is defined as the number of equally contributing founders that are expected to produce the same genetic diversity as the studied population. It is computed as $f_e = 1/\sum_k q_k^2$, with $q_k$ being the probability of gene origin of the kth ancestor [35,36]. The effective number of ancestors f a states the minimum number of ancestors explaining the complete genetic diversity of the population. This parameter accounts for the unbalanced reproductive use of founders. The computation is similar to the effective number of founders: $f_a = 1/\sum_j q_j^2$. Here $q_j$ is the marginal contribution of an ancestor j, which is the genetic contribution by an ancestor that is not attributable to the other ancestors chosen before. N e was estimated from the individual increase in inbreeding ΔF i [18,37] and also by the increase in coancestry for all pairs of individuals j and k for the Beadchip sample. The parameter is calculated as follows: $\Delta c_{jk} = 1 - \sqrt[(g_j + g_k)/2]{1 - c_{jk}}$, with $c_{jk}$ as the inbreeding value of an offspring from j and k, and $g_j$ and $g_k$ as the discrete equivalent generations of individuals j and k [38]. ΔF i is computed as $\Delta F_i = 1 - \sqrt[t-1]{1 - F_i}$, with $F_i$ = individual inbreeding coefficient and t = equivalent complete generations [30]. For the N e calculation over the increase in inbreeding, the ΔF i of the individuals in the reference population are averaged and N e is calculated as $N_e = 1/(2\overline{\Delta F})$. This method to estimate N e is also called realized effective population size [39]. 
The generation intervals were defined as the average age of parents at the birth of their offspring kept for reproduction and were calculated over the four pathways (father-son, mother-daughter, father-daughter and mother-son) [40]. The average coancestries were calculated with PEDIG [41].
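The pedigree-based parameters described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the ENDOG code: it assumes that the effective numbers of founders and ancestors are the reciprocal of the sum of squared contributions, and that the individual increase in inbreeding uses a root of order t − 1; all function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def effective_number(contributions: Sequence[float]) -> float:
    """Reciprocal of the sum of squared contributions (used for f_e and f_a)."""
    return 1.0 / sum(q * q for q in contributions)


def individual_delta_f(f_i: float, t: float) -> float:
    """Individual increase in inbreeding, assuming the root of order t - 1."""
    if t <= 1:
        raise ValueError("t must exceed 1 equivalent generation")
    return 1.0 - (1.0 - f_i) ** (1.0 / (t - 1.0))


def pair_delta_c(c_jk: float, g_j: float, g_k: float) -> float:
    """Increase in coancestry for a pair of individuals j and k."""
    return 1.0 - (1.0 - c_jk) ** (1.0 / ((g_j + g_k) / 2.0))


def realized_ne(increases: Sequence[float]) -> float:
    """Effective population size from the mean increase in inbreeding or coancestry."""
    return 1.0 / (2.0 * mean(increases))


if __name__ == "__main__":
    # Four equally contributing founders give an effective number of four.
    print(effective_number([0.25, 0.25, 0.25, 0.25]))
    # Unequal contributions reduce the effective number.
    print(round(effective_number([0.5, 0.25, 0.25]), 2))
    # Invented inbreeding coefficients and equivalent generations.
    deltas = [individual_delta_f(f, t) for f, t in [(0.03, 6.0), (0.05, 7.5)]]
    print(round(realized_ne(deltas), 1))
```

The same `realized_ne` function serves both estimators, since N e is derived from the mean ΔF i in one case and from the mean Δc jk in the other.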

Statistical analysis
The dataset for the estimation of diversity measures consisted of 168,360 SNPs with a genotyping rate > 0.90 in 307 Dalmatians. Six Dalmatians did not pass quality control, but were included as "genotyped animals" in the pedigree analysis. For the detection of ROHs we applied quality control with PLINK v.1.09 [42] and excluded all SNPs from sex chromosomes (n = 5295) and SNPs that could not be assigned to a chromosome (n = 523), resulting in a reduced dataset of 162,542 autosomal SNPs. We calculated r 2 as a measure of LD among SNP alleles per chromosome using PLINK v.1.09. The r 2 -values for SNP pairs with distances of 1 kb to 33.3 Mb were grouped into distance bins of 0.1 Mb. For each bin the mean r 2 -values were calculated and the effective population size was estimated as N e = (1-r 2 )/(4cr 2 ) with c = recombination rate in Morgan units [43]. For the genetic distance c between two SNPs, we assumed that 100 Mb ≈ 1 Morgan. The number of generations in the past was estimated as t = 1/(2c). The increase in inbreeding was computed as ΔF = 1/(2N e ) [18]. [44,45]. We did not allow for heterozygous SNPs and only for five missing SNP genotypes per 50, 65 and 358 SNP-window, four missing SNP genotypes per 40 SNP-window, three per 30 SNP-window and two for the 20 and 10 SNP-windows [45]. The matching proportions of ROHs overlapping in all Dalmatians were pooled to consensus ROHs. The time of origin of the consensus ROHs in generations was estimated as 1/(2c) [18]. We used SAS v.9.4 (Statistical Analysis System Institute Inc., Cary, NC, USA) to identify genes that are located in the consensus ROHs. We screened the NCBI database (National Center for Biotechnology Information, U.S. National Library of Medicine) for known genes for common diseases in the Dalmatian, according to [21] and searched for overlaps with consensus ROHs (Table S1). We also identified ROHs that were partially fixed in the population and therefore common to 50% or 75% of all Dalmatians. 
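The LD-based estimators given above can be combined into a single step per distance bin. The following Python sketch assumes the stated conversion of 100 Mb to 1 Morgan; the function name and the example bin are illustrative and do not reproduce the actual analysis pipeline.

```python
from typing import Tuple


def ne_from_ld(mean_r2: float, distance_mb: float,
               mb_per_morgan: float = 100.0) -> Tuple[float, float, float]:
    """Return (N e, generations ago, Delta F) for one distance bin.

    N e   = (1 - r2) / (4 c r2)
    t     = 1 / (2 c)
    DeltaF = 1 / (2 N e)
    with c the genetic distance in Morgan.
    """
    if not 0.0 < mean_r2 < 1.0:
        raise ValueError("mean_r2 must lie strictly between 0 and 1")
    if distance_mb <= 0:
        raise ValueError("distance_mb must be positive")
    c = distance_mb / mb_per_morgan
    ne = (1.0 - mean_r2) / (4.0 * c * mean_r2)
    generations_ago = 1.0 / (2.0 * c)
    delta_f = 1.0 / (2.0 * ne)
    return ne, generations_ago, delta_f


if __name__ == "__main__":
    # Illustrative bin: SNPs 1 Mb apart with a mean r2 of 0.11.
    ne, t, delta_f = ne_from_ld(mean_r2=0.11, distance_mb=1.0)
    print(f"N e = {ne:.1f}, t = {t:.0f} generations ago, Delta F = {delta_f:.4f}")
```

Under these assumptions, a bin at 1 Mb refers to about 50 generations in the past, and shorter marker distances refer to correspondingly older generations.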
The inbreeding coefficient F ROH for each dog was estimated as the length of all ROH per threshold in the respective individual divided by the total length of all autosomes covered by SNPs: $F_{ROH} = \sum L_{ROH} / L_{AUTO}$ [46]. We used PLINK v.1.09 to calculate F IS -values for each individual i as $F_{IS,i} = (O_i - E_i)/(n_{SNP,i} - E_i)$, with $E_i$ = number of expected homozygous SNPs, $O_i$ = number of observed homozygous SNPs and $n_{SNP,i}$ = number of all SNPs genotyped in the respective individual [47]. PANTHER (Protein Analysis Through Evolutionary Relationships) [48] was used to investigate all genes that are located in the consensus and partially fixed (ROHs common to 50 and 75% of all investigated dogs) ROHs. The gene lists were analyzed with the "functional classification" analysis and the "statistical overrepresentation" test.
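The two genomic inbreeding coefficients defined above reduce to simple ratios. The following Python sketch illustrates both with invented numbers; the function names are hypothetical and the values are not taken from the study.

```python
from typing import Sequence


def f_roh(roh_lengths_bp: Sequence[float], autosome_length_bp: float) -> float:
    """Summed ROH length divided by the autosomal length covered by SNPs."""
    if autosome_length_bp <= 0:
        raise ValueError("autosome_length_bp must be positive")
    return sum(roh_lengths_bp) / autosome_length_bp


def f_is(observed_hom: float, expected_hom: float, n_snps: int) -> float:
    """Individual fixation coefficient (O - E) / (n_SNP - E)."""
    if n_snps <= expected_hom:
        raise ValueError("n_snps must exceed the expected number of homozygous SNPs")
    return (observed_hom - expected_hom) / (n_snps - expected_hom)


if __name__ == "__main__":
    # Invented example: two ROHs on a 15 Mb "genome".
    print(f_roh([1_000_000, 500_000], 15_000_000))
    # An excess of homozygous SNPs gives a positive value, a deficit a negative one.
    print(f_is(observed_hom=110, expected_hom=100, n_snps=200))
    print(f_is(observed_hom=90, expected_hom=100, n_snps=200))
```

Negative values of the fixation coefficient, as seen in the reported range, arise when an individual carries fewer homozygous SNPs than expected.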
For the screening of genes located in the consensus ROHs for important disease-associated genes, we applied a candidate gene list (Table S1). This table contains a list of candidate genes for frequent diseases of the Dalmatian dog, as stated by Bell et al. [21]. A candidate gene search was performed in the NCBI database, and the non-canine candidate genes were transformed into orthologous dog genes with g:Profiler Orthology search [49].
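Pooling ROHs into consensus ROHs and screening them for candidate genes amounts to interval intersection followed by an overlap test. The Python sketch below illustrates the idea for a single chromosome; it is not the PLINK and SAS workflow used here, it assumes that the ROHs of each animal do not overlap each other, and all coordinates and gene names are invented.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end) in bp on one chromosome


def intersect_two(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted lists of non-overlapping intervals."""
    result: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start <= end:
            result.append((start, end))
        # Advance the list whose current interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def consensus_roh(per_animal: List[List[Interval]]) -> List[Interval]:
    """Segments that lie inside a ROH in every animal."""
    if not per_animal:
        return []
    consensus = sorted(per_animal[0])
    for rohs in per_animal[1:]:
        consensus = intersect_two(consensus, sorted(rohs))
    return consensus


def genes_in_consensus(genes: Dict[str, Interval],
                       consensus: List[Interval]) -> List[str]:
    """Names of genes overlapping at least one consensus segment."""
    return sorted(name for name, (g_start, g_end) in genes.items()
                  if any(g_start <= c_end and g_end >= c_start
                         for c_start, c_end in consensus))


if __name__ == "__main__":
    animals = [[(100, 500), (900, 1200)], [(300, 1000)], [(250, 450), (950, 1100)]]
    shared = consensus_roh(animals)
    print(shared)
    print(genes_in_consensus({"geneA": (400, 600), "geneB": (600, 800)}, shared))
```

Partially fixed ROHs (common to 50% or 75% of animals) would require counting coverage per segment instead of a strict intersection.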