Selected canine abstracts from the Companion Animal Genetic Health conference 2018 (CAGH 2018): Canine Genetics and Epidemiology

About this supplement

These abstracts have been published as Canine Genetics and Epidemiology Volume 5 Supplement 2, 2018: Selected canine abstracts from the Companion Animal Genetic Health conference 2018 (CAGH 2018): Canine Genetics and Epidemiology. A meeting report from the conference has been published as Canine Genetics and Epidemiology Volume 5 Supplement 1, 2018 and is available online at https://doi.org/10.1186/s40575-018-0061-0. Selected feline abstracts from the conference have been published as Irish Veterinary Journal Volume 71 Supplement 1, 2018 and are available online at https://doi.org/10.1186/s13620-018-0126-0.

Human gliomas are brain cancers with a 5-year survival rate of only 5%, even with the single reference treatment based on radio- and chemotherapy. Interestingly, among the many dog breeds prone to spontaneously developing cancers, brachycephalic breeds (Boxers, Bulldogs, Boston terriers…) are particularly affected by glial tumors. Dogs share the same environment as humans and also have anatomical and physiological similarities, thus constituting a relevant model for the genetics and therapies of brain tumors. Thanks to the national Cani-DNA biobank and its veterinary network (the 4 Veterinary Schools, Antagene, private practices and cancer centers) managed at CNRS Rennes (France), samples from 50 glioma-affected dogs and >100 control dogs, as well as from 1400 brachycephalic dogs, have been collected, and DNA has been extracted and stored. With the goal of comparing dog and human gliomas, we performed a retrospective study of 100 canine glioma cases, allowing a clinical, epidemiological and histological characterization of these canine tumors. The predominant localization of glioma to the frontal lobe, the predisposed breeds (mainly brachycephalic dogs from the European Mastiff line) and the mean age of onset were revealed by the analysis of 20 cases with imaging and 15 cases with histology. We showed that dog gliomas present striking anatomic and clinical homologies with human gliomas, with comparable histopathological subtypes.
These results led us to analyze 2 cases for which brain tissue had been collected. We identified a BRAF-MBP gene fusion in one case using RNAseq, and we are currently checking for recurrence in the collected samples, as well as for the presence of this translocation in human glioma cases. Using affected cases and controls of the same breeds, we plan to pursue the identification of somatic alterations by transcriptome analyses (RNAseq) and exome sequencing (WES), and to carry out genetic linkage and/or genetic association studies (GWAS) to identify genomic regions involved in predisposition. We will also investigate whether and how the artificial selection that led to specific morphological characteristics, such as the shape of the dog's skull

for many other breeds reaching sufficient numbers within any national database is not likely. Further, collection of data pertaining to diseases through national Kennel Clubs is usually limited to a very few already established grading systems for specific diseases, such as the British Veterinary Association's schemes for Hip and Elbow Dysplasia. Thus, databases created by independent breed societies, combining records across countries and on breed-specific issues, could become sources of data for genetic analyses and EBV predictions in numerically small breeds. We present a preliminary analysis of the heritability of longevity and heart-related deaths (HEART) in the Doberman Pinscher, based on data collated by The Doberman Welfare Community (DWC), an independent group of breed enthusiasts. The data included over 350,000 dogs over 37 generations, born between 1890 and 2017, and from 18 regions. Phenotypic records on longevity and cause of death were available for 10,549 and 5,844 dogs, respectively. Neither longevity nor cause of death is currently recorded by national kennel clubs, which highlights the role of the DWC in collecting this type of data.
Among the causes of death, HEART was the most common (48%) and was more frequent in males than in females (55% males, 45% females). The average longevity (LONG, the number of months between birth and death) was 89 months (7 years) for males and 100 months (8 years) for females. Several mixed linear models were fitted to identify significant environmental factors affecting LONG and HEART, and to estimate the heritability of the traits. LONG was Box-Cox transformed to improve normality of the data, and binomial models were fitted to estimate the heritability of the underlying liability for HEART. Factors identified as significant for HEART were sex, region, season of birth, and year of death. LONG was affected by year and season of birth, as well as year of death. The heritability of HEART and LONG was 0.29 (±0.02) and 0.11 (±0.02), respectively. To the best of our knowledge, these are the first published estimates of the heritability of longevity and heart-related deaths in Dobermans using owner-collated data. The significant genetic variance detected for both traits indicates that selection could bring improvement in these traits, which is particularly important for HEART: heart conditions are believed to affect as many as 20% of Dobermans, and the symptoms of the disease often appear after a dog has already been used for breeding. Further, the significant estimates obtained in the presented analyses indicate the validity of the data, thus opening a new window of opportunity for genetic analyses of complex traits in numerically small breeds through the recruitment and collation of data by breed enthusiasts.

Through the French Cani-DNA biobank, developed in the team since 2005, we have collected over 3000 samples (blood and paired tumour/normal tissues) from many dogs affected by breed-specific cancers, as well as controls of the same breeds, for which there are specific issues in the corresponding human cancers.
Indeed, naturally occurring canine cancers have recently been receiving attention in comparative oncology because of their high similarity to human cancers, both in their clinical and histological presentations and in their response to treatments. We have assembled large collections of cases and controls as well as large family pedigrees and, through genome-wide association studies (GWAS) and genetic linkage approaches, we have identified predisposition loci for Histiocytic Sarcoma (HS) and oral melanomas. In parallel, through the search for somatic genetic alterations in the tumour (whole exome sequencing (WES), capture/sequencing and RNAseq techniques), we have identified relevant genetic alterations in canine lymphomas, sarcomas, melanomas and gliomas. Either we found new genes implicated in dogs and could identify the same genes in the corresponding human cancers, or we found already known genes, especially oncogenes with the same hotspots as in humans, as well as gene fusions with the same partners and the same over-expression mechanism as in humans, for lymphoma, sarcoma and glioma (Ulvé, Rault et al., 2017). We have identified such genes and their pathogenic somatic alterations for HS (TP53 and a MAPK oncogene), melanomas (over 50 genes, including PTEN and NRAS), lymphomas (26 genes, including cyclins) and gliomas (a BRAF-MBP fusion). For oral melanomas, we have also identified specific copy number alterations (CNA) that we showed to be significantly linked to survival. Finally, we developed cell lines for these canine cancers (8 for HS, 10 for oral and uveal melanomas, 2 for gliomas and 1 for lymphoma), and were able to demonstrate the effect on proliferation of drugs targeting MAPK pathway oncogenes and cyclin genes, for HS and lymphomas respectively.
We thus showed that canine cancers might be highly useful for clinical trials, as in vitro and in vivo models to screen drugs in dog/human homologous cancers prior to testing them in humans, within the framework of treating the dogs and with the owners' partnership and consent. Finally, to also benefit breeders, we developed a genetic risk test for Histiocytic Sarcoma, made up of 9 markers predictive of a "protective" or "at risk" haplotype and status, available to breeders to help them select against HS in the Bernese Mountain Dog breed. To conclude, these genetic findings bring a better understanding of the genetics of, and potential treatments for, these cancers in both canine and human medicine. More widely, these results show the value of the dog model for deciphering the genetic bases of rare and/or aggressive-refractory human cancers and for planning clinical trials in dogs.

O4
Association of primary open angle glaucoma ADAMTS17 mutations with height in two domestic dog breeds
Emily Jeanes 1, James Oliver 2, Sally L. Ricketts 2, David Gould 3, Cathryn Mellersh 2

There are currently five known ADAMTS17 mutations in the dog that are associated with the development of either primary open angle glaucoma or primary lens luxation. Interestingly, these mutations have been identified in breeds of generally short stature, including terriers and Basset breeds. In humans, mutations in the ADAMTS17 gene are associated with Weill-Marchesani syndrome, a disorder whose clinical characteristics include ocular manifestations such as microspherophakia, myopia, glaucoma, and cataract, in addition to brachydactyly and short stature. This led us to hypothesise that these mutations may also be associated with height in these breeds. To test this, we conducted an association analysis between breed-specific ADAMTS17 mutations and height in two of these breeds, the Petit Basset Griffon Vendeen and the Shar Pei. Two hundred and twenty-seven Petit Basset Griffon Vendeen and 65 Shar Pei were genotyped for their breed-specific ADAMTS17 mutations. The height of each dog was measured at the withers. We used linear per allele regression to assess the association between ADAMTS17 mutations and height as a continuous variable, and linear regression and log-likelihood ratio tests to assess the shape of the association by comparing a general model with a linear per allele model. The mean heights of affected (n=21), carrier (n=84) and clear (n=122) Petit Basset Griffon Vendeen were 33.41 cm, 34.78 cm and 34.93 cm, respectively. The mean heights of affected (n=9), carrier (n=30) and clear (n=26) Shar Pei were 43.32 cm, 47.93 cm and 48.38 cm, respectively. Each breed-specific ADAMTS17 mutation showed a strong association with height in both breeds: Petit Basset Griffon Vendeen (P = 7.9 × 10⁻³); Shar Pei (P = 6.9 × 10⁻⁵).
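As a rough illustration of what a linear per allele model estimates, the sketch below computes the least-squares slope of height on mutant-allele count from the group sizes and mean heights reported above. It assumes clear, carrier and affected dogs carry 0, 1 and 2 mutant alleles respectively; the function name `per_allele_slope` is hypothetical. Only the slope can be recovered from these summaries; the reported P-values need the individual measurements, so this is not a reproduction of the authors' analysis.

```python
def per_allele_slope(groups):
    """Least-squares slope of height (cm) on mutant-allele count.

    groups: list of (allele_count, n_dogs, mean_height_cm).
    A line fitted to group means weighted by group size has the same
    slope as a line fitted to the individual dogs.
    """
    n_total = sum(n for _, n, _ in groups)
    x_bar = sum(x * n for x, n, _ in groups) / n_total
    y_bar = sum(y * n for _, n, y in groups) / n_total
    s_xy = sum(n * (x - x_bar) * (y - y_bar) for x, n, y in groups)
    s_xx = sum(n * (x - x_bar) ** 2 for x, n, _ in groups)
    return s_xy / s_xx


# (mutant alleles, number of dogs, mean height in cm) as reported in the abstract
PBGV = [(0, 122, 34.93), (1, 84, 34.78), (2, 21, 33.41)]
SHAR_PEI = [(0, 26, 48.38), (1, 30, 47.93), (2, 9, 43.32)]

pbgv_slope = per_allele_slope(PBGV)
shar_pei_slope = per_allele_slope(SHAR_PEI)
print(f"PBGV: {pbgv_slope:.2f} cm per mutant allele")
print(f"Shar Pei: {shar_pei_slope:.2f} cm per mutant allele")
```

A negative slope means that height decreases with each additional mutant allele. Comparing this single-slope model with a general model that gives each genotype its own mean is what the log-likelihood ratio test mentioned above addresses.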
The shape of the associations appeared similar between the two breeds. In humans, ADAMTS17 affects skeletal development by modulating the extracellular matrix. A similar mechanism may be present in the dog. We speculate that selection for short stature might have inadvertently increased ADAMTS17 mutant allele frequencies and thus increased the prevalence of primary open angle glaucoma in these breeds.

Keywords: Canine, Morphology

In domestic dogs, the "flat-faced" brachycephalic head shape is a risk factor for developing the respiratory defect Brachycephalic Obstructive Airway Syndrome (BOAS). As the popularity of breeds such as the French bulldog continues to increase in the UK, so too does the expected incidence of BOAS. For this reason, we became interested in the Norwich terrier, a non-brachycephalic breed which presents with Upper Airway Syndrome (UAS), a condition highly reminiscent of BOAS. Here, we have studied this single breed to identify genetic association(s) with UAS. Pathological assessments and grading from laryngoscopic examinations held at the Vetsuisse Faculty of the University of Bern were used as phenotypes in conjunction with microarray genotypes to perform GWAS. In total, 233 Norwich terriers were examined. We identified the same QTL on canine chromosome 13 (CFA13) as being associated with both the abnormal positioning of laryngeal cartilage and everted saccules in the dogs most severely affected by UAS. We phased genotypes at the CFA13 QTL to conduct haplotype mapping, which led us to define a 413 kb critical interval that encompasses a single positional candidate gene. The derived haplotype within this interval is overrepresented: it is found to be homozygous in 61 of 81 (74%) severely affected cases. In contrast, this homozygous haplotype was identified among 7 of 86 (8.1%) mild/unaffected controls. We have resequenced four dogs representing phenotypic extremes to sixteen-fold depth to identify putatively causal variants.
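To give a sense of how strongly the homozygous haplotype is enriched among severe cases, the following minimal sketch takes the reported counts at face value (61 of 81 cases, 7 of 86 controls) and computes an odds ratio and a one-sided Fisher's exact test. This is an illustrative calculation only, not the GWAS or haplotype-mapping analysis the authors describe, and the function names are hypothetical.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is the top-left cell under the
    hypergeometric null with all row and column totals held fixed.
    """
    row1 = a + b            # severely affected cases
    col1 = a + c            # dogs homozygous for the haplotype
    total = a + b + c + d
    upper = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(total, row1)


# Rows: cases, controls. Columns: homozygous, not homozygous.
cases_hom, cases_other = 61, 81 - 61
controls_hom, controls_other = 7, 86 - 7

ratio = odds_ratio(cases_hom, cases_other, controls_hom, controls_other)
p_value = fisher_one_sided(cases_hom, cases_other, controls_hom, controls_other)
print(f"odds ratio = {ratio:.1f}, one-sided p = {p_value:.2e}")
```

Exact integer arithmetic via `math.comb` avoids the rounding problems that factorials of this size would cause in floating point.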
We will provide an update to this ongoing project, which is expected to guide Norwich terrier breeding and inspire additional exploration of the CFA13 locus to improve animal welfare.

Keywords: Canine, Inherited Disease, Morphology

*Authors contributed equally

Inherited ataxias are typically incurable and lack disease-modifying treatments. Four Norwegian Buhunds were diagnosed with cerebellar ataxia at the Animal Health Trust. Pedigree analysis was suggestive of an autosomal recessive mode of inheritance, which is typical of inherited canine ataxias. The causal variant for ataxia in these dogs was hypothesised to be private to the breed. Whole genome sequence (WGS) was obtained for two sibling cases and compared to WGS from 405 dogs of other breeds. The WGS used included 44 genomes that were generated for the study of other diseases in our laboratory, and 361 additional genomes that are part of the Dog Biomedical Variant Database Consortium. Filtering out benign variants left nine that were present only in the cases and predicted to directly affect a protein coding sequence or alter a transcript. These were assessed in 14 related and unrelated Buhunds, leaving one variant that fully segregated with the disease. Its association with ataxia was confirmed by typing in an extended set of 148 Buhunds containing two additional cases, and by its absence in 359 dogs of 122 other breeds. This research has resulted in the development of a DNA test enabling breeders to avoid producing affected dogs. Importantly, the causal mutation is within a gene not previously reported to be associated with ataxia in any species. A combination of approaches was used to characterise this gene in the dog, as the current CanFam 3.1 annotation is incorrect. The gene is highly conserved and, in humans and mice, encodes multiple transcripts with alternative first exons.
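The variant-filtering step described above (keep variants shared by the cases, absent from every control genome, and predicted to affect a protein or transcript) can be sketched with plain set operations. The variant identifiers, the function name `private_candidates` and the toy data are all invented for illustration; a real pipeline would work from VCF files and an effect-prediction tool.

```python
def private_candidates(case_variants, control_variants, protein_affecting):
    """Variants present in every case, in no control, and predicted damaging.

    case_variants:     list of sets, one set of variant IDs per case genome
    control_variants:  list of sets, one per control genome
    protein_affecting: set of variant IDs predicted to alter a protein
                       coding sequence or a transcript
    """
    shared_by_cases = set.intersection(*case_variants)
    seen_in_controls = set().union(*control_variants)
    private = shared_by_cases - seen_in_controls
    return private & protein_affecting


# Toy data (hypothetical identifiers in chrom:pos:ref>alt form)
cases = [
    {"1:100:A>G", "2:200:C>T", "3:300:G>A", "4:400:T>C"},
    {"1:100:A>G", "2:200:C>T", "3:300:G>A"},
]
controls = [
    {"1:100:A>G", "5:500:A>T"},
    {"6:600:C>G"},
]
damaging = {"2:200:C>T", "4:400:T>C"}

candidates = private_candidates(cases, controls, damaging)
print(sorted(candidates))
```

In this toy example only `2:200:C>T` survives: `1:100:A>G` is seen in a control, `3:300:G>A` is not predicted to be damaging, and `4:400:T>C` is missing from one case.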
Expression of multiple transcripts in the canine cerebellum was confirmed through RNA sequencing, and through RT-PCR of samples from two Norwegian Buhund cases and five unaffected dogs of other breeds. RT-qPCR analysis and in-silico protein modelling have been used to further investigate the mutation's effect on RNA expression and protein stability.

Keywords: Canine, Inherited Disease

Chronic degenerative diseases (CDGs) are a major welfare concern in canine medicine, with myxomatous mitral valve disease (MMVD) being an important example. For some breeds CDGs can have an inherited basis, but this is often polygenic, so understanding the mechanisms that drive disease pathogenesis requires examining molecular events in tissue. Specifically, for CDGs this requires examining changes in gene and protein expression in both temporal and spatial terms. In this study we have examined valvular gene expression at different stages of disease (temporal), at different locations (spatial) and in different cell culture models of MMVD.

Methods
Transcriptomic profiling (Affymetrix canine 1.1ST microarray), with validation using RT-qPCR for selected genes, was performed on whole normal valves and valves from the 4 grades of MMVD (n=6), on normal and diseased regions of grade 2 valves (n=7), and on cultured cells (all experiments n=3): normal and diseased valve interstitial cells (VICs), normal cells treated with 5 ng/μL TGFβ1, and diseased cells treated with 10 μM of the TGFβ pathway inhibitor SB431542. Microarray data were analysed using a range of bioinformatics platforms (Affymetrix Console, IPA, Miru (Biolayout Express)).

Results
Significantly differentially expressed genes (DEGs) were identified comparing: 1) normal valves and the 4 grades of MMVD (1002 genes); 2) diseased and normal tissue within the same valve (315 genes); 3) normal and diseased VICs (1027 genes); 4) normal VICs and normal VICs treated with TGFβ1 (302 genes); 5) diseased VICs and diseased VICs treated with SB431542 (269 genes). Grade-dependent up- and down-regulated gene clusters were identified, and microarray data were validated by RT-PCR for ACTA2, TAGLN and 5HTR2B. Important GO terms were found to be associated with myofibroblast differentiation and extracellular matrix homeostasis. In all data sets the DEGs implicated TGFβ1 as the important upstream regulator of disease pathogenesis, with minor contributions from TNF and IFNG. Seventy-five DEGs were shared between grade 4 whole valves and the diseased sections of the dissected valves. Cultured cell data predicted, in addition to TGFβ1, genes involved in cell cycle and apoptosis as important upstream regulators.

Conclusions

This study shows how transcriptomic profiling of chronic degenerative disease over an entire lifetime, in tandem with cell culture models, can identify the signalling pathways important in disease pathogenesis. TGFβ1 signalling has been identified as the fundamentally important pathway in MMVD initiation and development, and in progression to eventual end-stage valve pathology. The authors gratefully acknowledge funding from Dogs Trust.

Keywords: Canine

Canine progressive retinal atrophy (PRA) is a degenerative retinal disease characterised by photoreceptor degeneration that increases in severity over time and ultimately leads to vision loss. PRA affects multiple breeds and significantly impacts welfare. In the Lhasa Apso (LA) dog, PRA typically manifests as a mid-to-late onset form.
Whole-exome sequencing (WES) data previously generated in our laboratory from three PRA-affected LA (cases) and three PRA-unaffected LA (controls) did not reveal any obvious exonic or splice site polymorphisms segregating with the disease, suggesting a non-coding mutation. This presented the opportunity for further investigation using a genome-wide association study (GWAS) and whole-genome sequencing (WGS) approach to identify the genetic cause of PRA and develop a DNA test. A GWAS was conducted by genotyping 44 LA dogs (17 cases, 27 controls) on the Illumina Canine HD 170K chip. Allelic association statistics were adjusted for multiple testing using the PLINK Max(T) permutation procedure, and for population stratification and relatedness using Efficient Mixed-Model Association eXpedited (EMMAX). After stringent filtering and quality control, we tested 108,263 SNPs on 42 dogs, comprising 15 cases and 27 controls (call rate ≥97%; minor allele frequency ≥95%; genotype calls ≥90%). Analysis revealed a genome-wide significant association on canine chromosome 33 (p_raw = 2.2 × 10⁻¹⁶) which remained significant after correcting for multiple testing (p_genome = 0.9 × 10⁻⁵) and population substructure (p_raw = 1.6 × 10⁻¹⁷). A 1.3 megabase homozygous disease-associated region was defined, harbouring two candidate genes previously associated with human retinal degeneration. WGS was undertaken on a single PRA-affected LA, and manual interrogation of the critical region identified a long interspersed nuclear element-1 (LINE-1) insertion, situated within the predicted promoter region of a retinal candidate gene. Because of its position, the LINE-1 insertion was not detected in the original WES data of the same case. The LINE-1 insertion was genotyped in 447 dogs across 122 breeds, including 63 LA dogs, and is private to the LA.
Seventeen LA dogs (all clinically affected with PRA) were homozygous for the LINE-1 insertion, eight were heterozygous and thirty-eight were homozygous for the wildtype allele. As a result of this study a DNA test for this form of PRA, termed PRA4, has been developed at the Animal Health Trust. To date, 457 LA from 15 countries have been tested for PRA4 (354 UK dogs; carrier frequency 17%; allele frequency 0.9054). This study highlights the power of utilising several genetic approaches to identify a PRA mutation and develop a diagnostic test to help dog breeders make informed breeding choices, minimising the risk of producing PRA-affected LA dogs.

Keywords: Canine, Inherited Disease

O9
A dog spontaneous model for human sensory neuropathies: identification of a mutation in a regulatory region of GDNF and DNA screening in human patients
Solenne Correard 1*, Jocelyn Plassais 1*, Laëtitia Lagoutte 1, Manon Paradis

Similar neuropathies are diagnosed in dogs, and several breeds are at risk of developing certain forms. Neuropathy has been described in hunting dogs, where the condition results in progressive mutilation of the distal extremities of the paws (Paradis et al., 2005). Pedigree analysis pointed to a monogenic autosomal recessive mode of inheritance. Blood samples from affected and unaffected hunting dogs from France and Canada were collected through the French Cani-DNA biobank (dog-genetics.genouest.org). Genetic studies (GWAS and sequencing) led to the identification of a locus on canine chromosome 4, and of a mutation located 90 kb upstream of GDNF, a gene encoding a neurotrophic factor involved in the survival of dopaminergic neurons. This mutation segregates as expected in 300 hunting dogs of known clinical status and is not found in 900 dogs of 90 other non-predisposed breeds.
Functional experiments have shown that the mutation causes a decrease in GDNF expression in the dorsal root ganglia and also a decrease in the affinity of a regulatory complex for the DNA sequence to which it binds (Plassais et al., 2016). This gene had not previously been implicated in human forms of sensory neuropathy and appears to be a good candidate. Through French and Belgian reference centers, we collected 111 DNA samples from patients affected by different forms of sensory neuropathy, and we sequenced the GDNF exons as well as two regions predicted to be regulatory, orthologous to the mutated regulatory region in the dog. Twenty-three variants were identified and classified as:
i. New variants (not listed in human databases).
ii. Rare variants (listed in databases with a minor allele frequency below 1%).
No new variants were found in the coding parts of the gene; however, 6 new variants were identified in the UTRs and regulatory regions upstream of GDNF, and 17 rare variants were also identified.
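The two-way classification above reduces to a lookup against a population database followed by a frequency cut-off. The sketch below assumes each variant has already been annotated with its database minor allele frequency, with `None` standing for "not listed"; the function name, the "common" label for everything else, and the example values are hypothetical.

```python
from collections import Counter


def classify_variant(database_maf, rare_threshold=0.01):
    """Classify a variant by its minor allele frequency in human databases.

    database_maf: frequency between 0 and 1, or None if the variant
                  is not listed in any database.
    """
    if database_maf is None:
        return "new"
    if database_maf < rare_threshold:
        return "rare"
    return "common"


# Hypothetical annotated frequencies for a handful of variants
example_mafs = [None, 0.004, 0.0002, None, 0.12]
tally = Counter(classify_variant(maf) for maf in example_mafs)
print(dict(tally))
```

Keeping the threshold as a parameter makes it easy to check how sensitive the "rare" class is to the 1% cut-off.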
In conclusion, the dog model has allowed the identification of a new gene for canine and potentially human sensory neuropathies. New and rare variants in this gene are being analyzed to tentatively identify their potential role in human neuropathies.

Data collection by breed on demography (gender, age, country of origin), measures of weight, Body Condition Score (BCS) and conformational measurements (i.e. width of nares (WN), craniofacial ratio (CFR), neck girth ratio (NGR), photo in standardised position (whole body and skull)), and surveying of general and specific health conditions by a survey to owners and a veterinary examination (clinical data including BOAS).
Cheek swabs will be collected from each dog for genomic analyses.
Analyses of the data are intended to compare variation within and between breeds regarding age, gender, origin and the indicated measures.
Out of all dogs described and sampled, 100 individuals of each breed from each country will be selected for more extensive studies across-breed quantitative trait locus. By genotyping around 400 brachycephalic dogs using the Illumina high density 170K SNP array, we will estimate the genomic variation in each breed.
4. Control measures by a reporting form where deaths and performed surgical procedures regarding BOAS are registered, as well as an obligatory puppy health certificate to be issued at the time of delivery to a new home.
5. The development of a screening procedure for evaluation of breathing capacity, thermoregulation and anatomical features relevant for breathing in adult dogs, to be used and registered voluntarily at first but intended to serve in the future as a mandatory requirement for breeding animals.

The advent of whole genome sequencing (WGS) has promised to revolutionise genetic research, and the rapid fall in per-sample costs in recent years has made the revolution an affordable reality for geneticists.
The technology is especially useful for the study of simple Mendelian conditions, where disease-causing mutations have the potential to be identified from the WGS of a single case. However, when comparing a typical canine genome with the reference sequence (CanFam3.1) or a control genome, at least 2-3 million variants will typically be identified. Many of these variants are likely to be common polymorphisms, which could be excluded by comparing with multiple control genomes. We devised the Give a Dog a Genome (GDG) project to build a resource of canine genetic variants across the genome using WGS, currently projected to contain 90 genomes from 78 breeds, and to investigate genetic diseases in at least 69 breeds. We used a crowd-funding approach, with the costs of the project being shared between multiple stakeholders. To date (two years after GDG was launched), 74 samples from 69 breeds have been sequenced, comprising 62 dogs affected with a suspected genetic condition (27 conditions in total) and 12 apparently healthy older dogs. The GDG variant bank has been used to validate several disease-associated mutations, and DNA tests have been developed to improve the health and wellbeing of dogs. All of the WGS data generated through GDG will be shared with the Dog Biomedical Variant Database Consortium (DBVDC), and specific sequences will be shared with at least 20 scientists from Europe and the USA to contribute to their research.

Keywords: Canine, Genomics and Variation, Inherited Disease

Recent efforts have extended the dog genome annotation with the discovery of thousands of long non-coding RNAs (lncRNAs) using the machine-learning based tool FEELnc [1]. Although lncRNAs have been shown to play important roles in many biological processes, and particularly in cancers [2], it remains challenging to assign functions to lncRNAs and classify them in order to interpret their impact on cancers and genetic diseases.
Here, we integrated genomic and transcriptomic features from the extended canFam3.1-plus annotation to perform bioinformatic functional predictions of lncRNAs. We first characterized expression patterns of 10,444 canine lncRNAs in 26 distinct tissues representing various histological and anatomical localizations. We defined tissue specificity profiles of lncRNAs and deduced potential functionality and evolutionary origins through comparative genomic and transcriptomic analysis with human data from the ENCODE project (ENCyclopedia Of DNA Elements).
As in humans, we show that canine lncRNAs are expressed at lower levels than protein coding genes (mRNAs). Among the 26 tissues, we detected 4,600 tissue-specific lncRNAs. Unsupervised hierarchical clustering based on lncRNA expression levels recapitulates tissue origins and pinpoints candidate lncRNAs likely associated with specialized functions, such as those in the nervous and integumentary clusters. Furthermore, we identified more than 900 conserved dog-human lncRNAs, for which comparative transcriptomics shows overall reproducible expression patterns in both species. We then constructed co-expression networks and found significant correlations (|rho| > 0.5 and adjusted p-value (BH) < 0.05) for 7,615 lincRNA:mRNA and 524 antisense:mRNA pairs. These results revealed co-expressed modules that may predict regulatory relationships and/or the evolutionary origin of subsets of lncRNAs. Using functional annotations based on GO biological process terms, we found 23 significantly enriched clusters (adjusted p-value (BH) < 0.05) corresponding to developmental processes such as 'sensory organ development', 'axon development' or 'hindbrain development'. As a pilot study of melanoma in dogs, we performed differential expression analysis using matched tumour/control RNA-seq samples from canine buccal melanomas. We identified 930 lncRNAs with significant differential expression between tumour and control samples (FDR < 0.01). These lncRNAs represent potential biomarkers and/or candidates for studying tumorigenesis of melanomas in dogs. Moreover, more than 100 of the 930 lncRNAs are conserved in humans and can be used for further evaluation of their therapeutic potential in both human and veterinary medicine. Altogether, this integrative genomic and transcriptomic study of lncRNAs constitutes a major resource for biomedical research in the dog.
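The co-expression criterion quoted above (|rho| > 0.5 and Benjamini-Hochberg adjusted p-value < 0.05) can be sketched as follows. This is a minimal illustration that assumes the correlation coefficient and raw p-value of each lncRNA:mRNA pair have already been computed and that the adjustment is made over all listed pairs; the pair names and values are invented, and the function names are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_pairs(pairs, rho_cutoff=0.5, alpha=0.05):
    """Names of pairs passing both the correlation and the BH cut-offs.

    pairs: list of (name, rho, raw_p_value).
    """
    adjusted = bh_adjust([p for _, _, p in pairs])
    return [
        name
        for (name, rho, _), q in zip(pairs, adjusted)
        if abs(rho) > rho_cutoff and q < alpha
    ]


# Invented example pairs: (name, correlation, raw p-value)
example_pairs = [
    ("lnc1:geneA", 0.82, 0.001),
    ("lnc2:geneB", -0.61, 0.004),
    ("lnc3:geneC", 0.30, 0.0005),  # significant but weakly correlated
    ("lnc4:geneD", 0.70, 0.20),    # well correlated but not significant
]
selected = significant_pairs(example_pairs)
print(selected)
```

Requiring both conditions discards pairs that are statistically significant but only weakly correlated, as well as strong correlations that do not survive multiple-testing correction.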
document the process reproducibly and values identified as 'outliers' are commonly deleted without reporting the possible causes of error. Our aim was to develop a novel, automated data cleaning algorithm for growth (height and weight) that could be applied to large datasets. Dogslife is an internet-based, longitudinal cohort study of Kennel Club registered Labrador Retrievers living in the UK, which was launched in 2010 and has over 7500 registered dogs to date. The main objective of Dogslife is to identify risk factors for canine health and disease by collecting information from owners via regular questionnaires. In addition to questionnaire data, the study has collected DNA and faecal samples from subsets of the cohort, which have produced genomic and microbiome data. We developed our data cleaning pipeline in R and used rule-based approaches, non-linear mixed-effects mathematical models and text analysis to identify common errors such as duplicate entries and typing, decimal point, unit, menu/option, intentional, website-generated and measurement errors. Individuals were permitted to differ from the population by making use of repeated measurements and alternative data sources. The method avoids the modification of unusual but biologically plausible values, prioritises data repair over removal and explicitly reports the decision-making process behind why a particular data entry is modified or deleted.
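The rule-based part of such a pipeline can be illustrated with a deliberately simplified sketch. It is written in Python rather than the R used by the authors, and it assumes an adult dog whose weight is roughly stable, so that the dog's own median stands in for the non-linear growth model described above. The 30% tolerance, the function name and the example weights are all invented; the point is only to show repair being tried before removal, with every decision logged.

```python
from statistics import median

LB_TO_KG = 0.45359237  # kilograms per pound


def clean_adult_weights(weights_kg, tolerance=0.3):
    """Repair implausible weights for one dog and log every decision.

    Values within `tolerance` (relative) of the dog's own median are kept
    untouched. Other values are tested against simple error rules and are
    repaired if a rule brings them back into range, otherwise set to None.
    Returns (cleaned_values, log).
    """
    reference = median(weights_kg)

    def close(value):
        return abs(value - reference) / reference <= tolerance

    cleaned, log = [], []
    for index, weight in enumerate(weights_kg):
        if close(weight):
            cleaned.append(weight)  # plausible: never modified
            continue
        repairs = {
            "unit error (lb entered as kg)": weight * LB_TO_KG,
            "decimal point error (10x too large)": weight / 10,
            "decimal point error (10x too small)": weight * 10,
        }
        plausible = {rule: v for rule, v in repairs.items() if close(v)}
        if plausible:
            rule = min(plausible, key=lambda r: abs(plausible[r] - reference))
            cleaned.append(plausible[rule])
            log.append((index, weight, plausible[rule], rule))
        else:
            cleaned.append(None)
            log.append((index, weight, None, "no rule explains the value"))
    return cleaned, log


# Invented owner-reported weights (kg) for one adult Labrador Retriever
reported = [30.2, 31.0, 68.0, 3.1, 30.5, 120.0]
cleaned, log = clean_adult_weights(reported)
for entry in log:
    print(entry)
```

Returning the log alongside the cleaned values is what makes the cleaning reportable: each modified or removed entry carries the rule that triggered the change.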
We validated our cleaning algorithm for growth variables (weight and height) on three other independent data sources from studies with fundamentally different designs: veterinary consultation Labrador Retriever weight records from SAVSNET (Small Animal Veterinary Surveillance Network), clinical Labrador Retriever weight records from a veterinary hospital network, and publicly available (via the UK Data Service) human weight and height data from CLOSER (Cohort & Longitudinal Studies Enhancement Resources), with varying proportions of artificially simulated errors. We found that our algorithm could be reproducibly applied as an effective data cleaning method on all of the validation datasets. We also compared our method to uncleaned data and six different cleaning methods and found that our algorithm out-performed these, with greater accuracy and fewer unnecessary data deletions. There is an increasing demand for data cleaning methodologies to be thoroughly reported so that they can be reproduced, tested and adapted by the wider research community. In the future, it is vital that data cleaning is treated as an integral part of study design and considered as early as possible in order to ensure that the quality of the data is conserved. Our methods have broad applicability to longitudinal and cross-sectional growth data, and we propose that they could be adapted for use in other breeds, species and fields.

Keywords: Canine, Morphology, Technical Advances

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.