An Ibero-American inter-laboratory trial to evaluate serological tests for the detection of anti-Neospora caninum antibodies in cattle

We carried out an inter-laboratory trial to compare the serological tests commonly used for the detection of specific Neospora caninum antibodies in cattle in Ibero-American countries. A total of eight laboratories participated from the following countries: Argentina (n = 4), Brazil (n = 1), Peru (n = 1), Mexico (n = 1), and Spain (n = 1). A blind panel of well-characterized cattle sera (n = 143) and sera representative of the target population (n = 351) was tested by seven in-house indirect fluorescent antibody tests (IFATs 1–7) and three enzyme-linked immunosorbent assays (ELISAs 1–3; two in-house and one commercial). Diagnostic performance of the serological tests was calculated and compared according to the following criteria: (1) the “Pre-test information,” which uses previous epidemiological and serological data; (2) the “Majority of tests,” which classifies a serum as positive or negative according to the results obtained by most tests evaluated. Unexpectedly, six tests showed either sensitivity (Se) or specificity (Sp) values lower than 90%. In contrast, the best tests in terms of Se, Sp, and area under the ROC curve (AUC) values were IFAT 1 and optimized ELISA 1 and ELISA 2. We evaluated a high number of IFATs, which are the most widely used tests in Ibero-America. The significant discordances observed among the tests, regardless of the criterion employed, hinder control programs and call for the use of a common test, or of tests with performances similar to those of the optimized IFAT 1 and ELISAs 1 and 2.


Introduction
Neospora caninum is a protozoan parasite that is considered one of the main bovine abortifacient pathogens worldwide. Notably, South American countries and Mexico account for more than 386 million cattle, representing one of the main agricultural activities in this region (Moore 2005). Economic losses associated with bovine neosporosis may exceed 1 billion dollars worldwide, and for South America and Mexico, the disease cost was estimated at 403 million dollars (Reichel et al. 2013). Different studies have shown that N. caninum is widespread with high seroprevalence rates in dairy cattle (Moore 2005). In Argentina, a study performed in La Pampa region reported 9 and 20.5% seroprevalences in beef and dairy cattle, respectively (Fort et al. 2015). In Uruguay, an overall seroprevalence of 13.9% in beef cattle was estimated (Bañales et al. 2006). Seroprevalences of 46.7 and 10.6-21.6% were reported for dairy herds in Peru and Brazil, respectively (Granados et al. 2014; Boas et al. 2015). Similarly, in Mexico, seroprevalence rates varied from 11.6 to 42% for beef and dairy cattle, respectively (García-Vázquez et al. 2005). However, these studies are not comparable due to the different experimental designs, serological tests, and cutoff values employed. Cutoff values are particularly relevant since diagnostic performance may significantly influence the success of control programs. Unfortunately, there is no vaccine currently available for N. caninum. Therefore, the control of neosporosis relies on management measures coupled with diagnosis (McAllister 2016; Reichel et al. 2015). At this stage, serological monitoring is the most useful tool for decision-making during disease control.
Even though there are many serological assays available, there is no appropriate reference test to define a true-positive or true-negative animal. Agglutination tests (NAT) have the advantage of not requiring specific conjugates and, therefore, are suitable for wildlife species (Almería 2013; Donahoe et al. 2015). However, false-positive results are a major drawback (Moraveji et al. 2012). Immunoblot (IB) is highly sensitive and specific, but laborious and time consuming, and is therefore used as a confirmatory test for doubtful results. Hence, the most commonly used techniques for the detection of anti-N. caninum antibodies in cattle in Mexico and South America are the indirect fluorescent antibody test (IFAT) and ELISA, the latter being suitable for large-scale investigations and more objective in result interpretation than IFAT. There are many commercial ELISAs available with wide distribution in Europe and North America (Álvarez-García et al. 2013). Unfortunately, their high purchase cost and the lengthy importation process are significant obstacles for many Ibero-American countries. In this scenario, many local laboratories use in-house serological tests for anti-N. caninum antibody detection, most frequently IFAT.
Unfortunately, comparisons and interpretations of data are less reliable and more difficult due to the lack of inter-laboratory standardization trials among South American countries as well as other countries in the Americas, such as Mexico. This type of study has been performed in Europe and North America (von Blumröder et al. 2004; Wapenaar et al. 2007; Álvarez-García et al. 2013), where a comparison of the diagnostic performances of the most routinely used in-house and commercial tests led to a readjustment of the techniques. Moreover, a constant reassessment and adaptation of diagnostic tests to different epidemiological situations are highly recommended (World Organization for Animal Health (OIE) 2013).
To address this issue, the aim of the present study was to compare the serological tests commonly used for the detection of anti-N. caninum specific antibodies (i.e., seven IFATs and three ELISAs) in Ibero-American countries with the ultimate goal of standardizing the serological tests to obtain comparable results. Each laboratory provided serum samples that were submitted to the Immunoparasitology Laboratory in Argentina, where the panel was blind coded, and aliquots were shipped on dry ice to each participating laboratory. Most sampled animals were older than 6 months to avoid the presence of colostral antibodies, and precolostral sera from newborn calves were also included.

Materials and methods
Sera from Group 1 came from Spain, whereas sera from Groups 2 and 3 came from Mexico, Brazil, Argentina, and Spain. The serum panel (n = 523) comprised the following three categories: Group 1 comprised sera from animals infected with Besnoitia besnoiti (n = 29). The cross-reactivity with the apicomplexan parasite B. besnoiti was studied to determine the analytical specificity (Sp). Sera came from herds with clinically affected animals that showed the clinical signs of chronic besnoitiosis, such as pathognomonic tissue cysts in scleral conjunctiva, hyperkeratosis, and alopecia. Besnoitia besnoiti infection was confirmed by immunoblot (García-Lunar et al. 2013).
Group 2 included well-characterized sera (n = 143) from naturally and experimentally infected cattle. From naturally infected cattle, 80 serum samples from dairy cattle (38 positive and 42 negative sera) were analyzed; of these, 56 samples came from mother-calf pairs (n = 28 pairs); 6 samples came from precolostral calves; and 18 sera came from cows. The criterion to classify the sera as positive or negative was based on a combination of clinical data and a well-defined serostatus as follows: (1) mothers and their corresponding calves were both either seropositive (n = 11 pairs) or seronegative (n = 17 pairs); (2) 4 seropositive precolostral and 2 seronegative precolostral calves were born from either seropositive or seronegative cows, respectively; (3) 12 seropositive and 6 seronegative cows. In addition, positive sera came from herds with a previous history of Neospora-associated abortions, and three seropositive cows had previously aborted due to N. caninum infection. The serostatus was assessed by two complementary tests (ELISA or IFAT by the submitting laboratory and by a complementary immunoblot) to discriminate between positive and negative results, and all samples showed repeatable serological results in at least two consecutive monthly samplings.
In addition, 63 samples were collected from experimentally infected heifers. Twenty-three heifers were intravenously infected (iv) with 10^7 live N. caninum tachyzoites of NC-7 (n = 6) and NC-8 (n = 6) isolate (Regidor-Cerrillo et al. 2014) and with 10^8 live tachyzoites of NC-1 isolate (n = 11). Eleven heifers received phosphate-buffered saline (PBS) iv and remained as negative controls. Sequential serum samples were collected twice a week until 13 days post infection (dpi), then once a week until the end of the experiment (35 dpi). These samples were assayed by CIVTEST ELISA (de Yaniz et al. 2007; Hecker et al. 2013). Infected animals seroconverted from 14 dpi and all samples from 21 dpi were positive and included in the present study. In summary, 45 positive and 18 negative sera were analyzed. Group 2 was considered as reference sera according to “Pre-test information” (see “Statistical analysis of data” section).

Serological assays
Serum samples were analyzed by seven in-house IFATs, two in-house ELISAs, and one commercial ELISA. The tests were performed following the laboratory protocols for the in-house tests and the manufacturer's instructions for the commercial test.

IFATs
A similar procedure was carried out by all participants. The most relevant differences lay in the secondary antibodies and the fluorescence microscope employed (see Table 1). Sera were diluted by two-fold serial dilutions starting at a 1:50 dilution in PBS to the endpoint titer. Suspensions of intact formalin-fixed N. caninum (NC-1 isolate) tachyzoites (10^7/mL) and tachyzoites purified by a Percoll gradient (IFAT 3) were air-dried on glass slides (10 μL/well) and fixed with either ice-cold acetone or methanol. Sera diluted in PBS were added and incubated for 30 min (37°C). Then, the slides were gently rinsed with carbonate buffer at pH 9 and washed for 10 min. A fluorescein isothiocyanate (FITC)-labeled affinity-purified rabbit anti-bovine IgG antibody conjugate was incubated with the samples at the appropriate dilution in PBS. After a 30-min incubation (37°C), the washing step was repeated. Slides were observed with a fluorescence microscope. Unbroken fluorescence of the tachyzoite membrane was considered a positive reaction. A cutoff value of 1:100 was applied for all IFATs.
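The dilution scheme and cutoff described above can be illustrated with a short Python sketch. The function names and the example readings are hypothetical; the 1:50 starting dilution, the two-fold steps, and the 1:100 cutoff come from the text. Treating a serum as positive when its endpoint titer is at or above the cutoff dilution is an assumption of this sketch, not a statement of any participating laboratory's protocol.

```python
def dilution_series(start=50, steps=8):
    """Reciprocal dilutions of a two-fold series: 50, 100, 200, ..."""
    return [start * 2 ** i for i in range(steps)]


def endpoint_titer(reactions, start=50):
    """Reciprocal of the last consecutive dilution read as positive.

    `reactions` holds one boolean per dilution, in ascending order.
    Returns None when the first dilution is already negative.
    """
    titer = None
    for dilution, positive in zip(dilution_series(start, len(reactions)), reactions):
        if not positive:
            break
        titer = dilution
    return titer


def ifat_positive(titer, cutoff=100):
    """Assumption: positive when the endpoint titer reaches the cutoff dilution."""
    return titer is not None and titer >= cutoff
```

For example, a serum read as positive at 1:50, 1:100, and 1:200 but negative at 1:400 has an endpoint titer of 200 and is classified as positive under this reading of the 1:100 cutoff.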

ELISAs
The ELISA procedures were carried out as previously described by others (see Table 1). The three ELISAs employed sonicated lysate of NC-1 tachyzoites as antigen to coat the wells. The major differences lay in the secondary antibodies and cutoff values employed. The test results for ELISA 1 were expressed as percentage of positivity (PP), calculated as follows: PP = (OD405 sample × 100)/(OD405 positive control). ELISA 2 and ELISA 3 results were expressed as follows: the optical density (OD) was converted into a relative index percent (RIPC) by the following formula: RIPC = [(OD405 sample − OD405 negative control)/(OD405 positive control − OD405 negative control)] × 100. The cutoffs employed for ELISA 1, ELISA 2, and ELISA 3 were PP ≥ 25, RIPC > 10 (RIPC values between 6 and 10 were considered doubtful), and RIPC > 8.2 (RIPC values between 6 and 12 were considered doubtful), respectively.
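The two indices above can be computed as in the following Python sketch. The function names and OD readings are hypothetical. The classification helper uses the ELISA 1 and ELISA 2 thresholds as stated in the text; treating RIPC values below the doubtful range as negative is an assumption of this sketch.

```python
def percent_positivity(od_sample, od_positive):
    """PP = (OD405 sample x 100) / (OD405 positive control)."""
    return od_sample * 100.0 / od_positive


def ripc(od_sample, od_positive, od_negative):
    """RIPC = (sample - negative) / (positive - negative) x 100."""
    return (od_sample - od_negative) / (od_positive - od_negative) * 100.0


def classify_elisa1(pp, cutoff=25.0):
    """ELISA 1: positive when PP >= 25."""
    return "positive" if pp >= cutoff else "negative"


def classify_elisa2(value, cutoff=10.0, doubtful_low=6.0):
    """ELISA 2: positive when RIPC > 10, doubtful between 6 and 10.

    Assumption: values below the doubtful range are negative.
    """
    if value > cutoff:
        return "positive"
    if value >= doubtful_low:
        return "doubtful"
    return "negative"
```

With a sample OD of 0.75, a positive control of 1.25, and a negative control of 0.25, `ripc` returns 50.0, which `classify_elisa2` labels as positive.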

Statistical analysis of data
Diagnostic performance of serological tests for the detection of antibodies to N. caninum was calculated according to the following criteria based on previous works (von Blumröder et al. 2004; Álvarez-García et al. 2013). The first criterion was based on the pre-test information (“Pre-test information”). This information was only available for samples from Group 2 and the criteria to consider a sample as positive or negative have been thoroughly described in the “Experimental design and serum panel conformation” section. The second criterion was based on the results of the majority of the tests here evaluated (“Majority”). Samples from Groups 2 and 3 were analyzed using the last criterion. For Group 3, “Majority” values were defined by combining all ten tests, only the seven IFATs or only the three ELISAs in separate analyses (see Tables 4, 5, and 6). Two-graph receiver operating characteristic (TG-ROC) analyses were carried out relative to the Pre-test information criterion (SigmaPlot 12.0 software, Systat Software, Inc., San José, CA, USA). According to an arbitrary guideline for the ROC analysis, the area under the curve (AUC) was evaluated as follows: non-informative (AUC = 0.5), less accurate (0.5 < AUC ≤ 0.7), moderately accurate (0.7 < AUC ≤ 0.9), highly accurate (0.9 < AUC < 1), and perfect tests (AUC = 1) (Swets 1988). According to TG-ROC analyses, improved cutoff values (recalculated cutoffs) were applied when plausible.
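As an illustration of the Majority criterion and of how Se and Sp follow from a reference classification, a minimal Python sketch is given below. The function names and data are hypothetical. The text does not say how ties were handled when an even number of tests split equally, so leaving such sera unclassified (None) is an assumption of this sketch.

```python
def majority_status(results):
    """Classify one serum from the results of several tests.

    `results` is a list of booleans (True = positive in that test).
    Returns True or False for a clear majority, None for a tie (assumption).
    """
    positives = sum(results)
    negatives = len(results) - positives
    if positives == negatives:
        return None
    return positives > negatives


def sensitivity_specificity(test_results, reference):
    """Se and Sp of one test against a reference classification.

    Sera with an unclassified reference (None) are skipped.
    Assumes at least one positive and one negative reference serum.
    """
    tp = fn = tn = fp = 0
    for result, truth in zip(test_results, reference):
        if truth is None:
            continue
        if truth and result:
            tp += 1
        elif truth and not result:
            fn += 1
        elif not truth and not result:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)
```

The same `sensitivity_specificity` function applies to both criteria: the reference list holds the Pre-test information status for Group 2, or the output of `majority_status` for each serum when the Majority criterion is used.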

Sensitivity (Se), specificity (Sp), and agreement (k values) of the tests and TG-ROC analyses according to the Pre-test information and Majority gold standard criteria
Group 2: Well-characterized sera
The Se, Sp, and k values were calculated for each test based on the original cutoff values recommended by each laboratory relative to the Pre-test information criterion for Group 2. In addition, the cutoff values for those tests in which Sp could be increased without a significant reduction in Se were recalculated (Table 2). The results showed variability among the tests with Se or Sp values lower than 90% in five of the ten evaluated tests. Moreover, two IFATs and one ELISA showed Se values lower than 80%. Initially, the best test in terms of Se, Sp, and k values was IFAT 1. Improved cutoff values were suggested for IFAT 1 and ELISAs 1, 2, and 3 (see Table 2) according to TG-ROC analyses. The ROC curves were calculated for each test, and the resulting AUCs were almost perfect for IFAT 1, ELISA 1, and ELISA 2; highly accurate for ELISA 3, IFAT 3, IFAT 4, IFAT 5, and IFAT 6; and moderately accurate for IFAT 2 and IFAT 7 (Table 2). The AUCs of IFAT 2 and IFAT 7 showed significant differences compared to the AUCs of the other tests (p < 0.05). With the recalculated cutoffs, the performance and k values of the ELISAs improved. Thus, the best tests in terms of Se, Sp, k, and AUC values were IFAT 1, ELISA 1, and ELISA 2.
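The AUC categories used here can be made concrete with a small Python sketch. It is not the SigmaPlot procedure used in the study: the AUC is computed with the rank-based (Mann-Whitney) formulation, the function names and scores are hypothetical, and labelling values below 0.5 as non-informative is an assumption, since the guideline only names AUC = 0.5.

```python
def auc(positive_scores, negative_scores):
    """Probability that a random positive serum scores above a random negative one.

    Ties count as half a win (Mann-Whitney formulation of the AUC).
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


def auc_category(value):
    """Map an AUC to the categories of the guideline cited in the text (Swets 1988)."""
    if value == 1.0:
        return "perfect"
    if value > 0.9:
        return "highly accurate"
    if value > 0.7:
        return "moderately accurate"
    if value > 0.5:
        return "less accurate"
    return "non-informative"  # assumption: also covers values below 0.5
```

For IFAT, the scores would be reciprocal endpoint titers; for the ELISAs, PP or RIPC values.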
Finally, the Se, Sp, and k values were calculated for each test based on the cutoff values recommended by each laboratory relative to the Majority criterion (Table 3). In general, most tests experienced a moderate improvement with few exceptions as follows: IFAT 5 Sp and ELISA 3 Sp decreased compared to Table 2. The highest performance corresponded to IFAT 1, IFAT 6, and ELISA 1. When the recalculated cutoff values were employed, the results improved to those obtained by the Pre-test information criterion. IFAT 1, IFAT 3, IFAT 6, ELISA 1, and ELISA 2 showed the highest Se, Sp, and k values.

Group 3: Field sera
The Se, Sp, and k values were calculated for each test based either on the original cutoff values recommended by the laboratory or the recalculated cutoff values relative to the Majority criterion (Tables 4, 5, and 6). When all the tests were compared by considering the original cutoff values, the performance of the following five tests notably diminished compared to Table 3: IFAT 1 Sp, IFAT 2 Se, IFAT 4 Sp, IFAT 5 Se, and IFAT 6 Se. In contrast, the performances of the remaining five tests were barely affected (IFAT 3, IFAT 7, ELISA 1, ELISA 2, and ELISA 3). The highest k values corresponded to ELISA 1 and ELISA 2 (Table 4). IFAT 6 and ELISA 2 Se improved notably when either the IFATs (n = 7) (Table 5) or the ELISAs (n = 3) (Table 6) were compared separately.

Test agreement (k values)
The agreement between the tests was calculated prior to and after the TG-ROC analyses for Group 2 (Supplemental Table 1) and Group 3 (Supplemental Table 2).
For Group 2, considering the original cutoff values, 7 of the 45 pairs of tests showed a moderate agreement, 33 of the 45 pairs of tests showed a substantial agreement, and 5 of the 45 pairs of tests showed an almost perfect agreement. When considering the agreement for the 30 pairs of tests with recalculated cutoff values (after the TG-ROC analyses), 1 of the 30 pairs of tests showed a moderate agreement, 22 of the 30 pairs of tests showed a substantial agreement, and 7 of the 30 pairs of tests showed an almost perfect agreement.
For Group 3, considering the original cutoff values, 15 of the 45 pairs of tests showed a moderate agreement, 29 of the 45 pairs of tests showed a substantial agreement, and 1 of the 45 pairs of tests showed an almost perfect agreement. When considering the agreement for the 30 pairs of tests with recalculated cutoff values, 7 of the 30 pairs of tests showed a moderate agreement and 23 of the 30 pairs of tests showed a substantial agreement.
As expected, the agreement for the 30 pairs of tests with recalculated cutoff values increased for Group 2 (28/30) and Group 3 (26/30).
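The pairwise agreement reported above can be sketched in Python as follows. The function names and data are hypothetical. The labels "moderate", "substantial", and "almost perfect" match the widely used Landis and Koch scale, so those bands are assumed here; the text does not state which scale was applied.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two tests with binary results on the same sera."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa = sum(a) / n  # proportion positive in test a
    pb = sum(b) / n  # proportion positive in test b
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both tests give one constant result")
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Assumed Landis and Koch bands for interpreting kappa."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


def pairwise_kappa(results_by_test):
    """Kappa for every pair of tests; `results_by_test` maps test name to results."""
    return {
        (t1, t2): cohen_kappa(results_by_test[t1], results_by_test[t2])
        for t1, t2 in combinations(sorted(results_by_test), 2)
    }
```

Ten tests give 45 unordered pairs, which is the number of pairs reported for the original cutoff values.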

Discussion
We carried out an inter-laboratory trial among eight Ibero-American laboratories from Argentina, Brazil, Peru, Mexico, and Spain. The purpose of this study was to compare a wide   (Jacobson 1998). Few comparative studies of serological tests have been carried out in Europe and North America for anti-N. caninum antibody detection, most of them consisting of one participating laboratory evaluating several serological tests (Wu et al. 2002;Álvarez-García et al. 2003;Frössling et al. 2003;Waldner et al. 2004;Björkman et al. 2006;Hall et al. 2006;Álvarez-García et al. 2013;Roelandt et al. 2015). However, only one interlaboratory trial has been performed with the purpose of standardizing the serological tests used for antibody detection (von Blumröder et al. 2004). This type of study showed the usefulness of a continuous validation process to provide an accurate diagnosis and standardize different seroprevalence studies to obtain comparable results in Europe (Bartels et al. 2006). In the present study, most of the evaluated assays (9/10) were in-house tests unlike the previous comparative studies of von Blumröder et al. (2004) and Wapenaar et al. (2007) where half of the evaluated tests were commercially available. Additionally, to our knowledge, this study evaluated the greatest number of IFATs.
The standardization approach herein started with selecting a panel of well-characterized sera composed of experimentally infected and naturally exposed animals. Furthermore, a serum panel that was geographically representative and reflected the spectrum of disease was analyzed to avoid bias resulting from host responses and overestimating the Sp, which is crucial in chronic infections (Nielsen et al. 2011). This last issue is relevant since there is no perfect reference serological assay. In a previous study, IFAT was ruled out as a true reference test (Frössling et al. 2003). Therefore, instead of using a single test as a reference test, we relied on the Pre-test information and Majority criteria to reduce bias, as reported previously by von Blumröder et al. (2004) and García-Lunar et al. (2013). In the present work, the congruent results obtained from sera of Groups 2 and 3 suggest that a well-characterized population reflects field population conditions. We found unexpectedly high variability among the tests. As stated by Álvarez-García et al. (2013), it is widely known that discrepancies among serological tests exist. However, different validation studies managed to overcome this limitation. As expected, the tests showed stronger diagnostic performance on Group 2 sera than on Group 3 sera. For most tests evaluated, diagnostic characteristics worsened when analyzed using the Majority criterion. Agreement among tests increased after the application of the recalculated cutoff values for Group 2, whereas the k values hardly varied for Group 3. In the present study, initially, only IFAT 1 showed good diagnostic performance. All ELISAs improved after the application of recalculated cutoff values, and ELISAs 1 and 2 performed similarly to IFAT 1. The performance of these tests is comparable to that of commercial ELISAs with excellent Se and Sp values (> 95%) and was supported by high AUC values (Álvarez-García et al. 2013).
In addition, IFAT 3 also performed well when the Majority test criterion was applied. In contrast, the performance of IFAT 6 and ELISA 3 showed inconsistent results when using different analyses in both groups and should be improved prior to use for routine diagnosis since they frequently demonstrated low Se or Sp values (< 90%). Furthermore, three IFATs (IFATs 2, 5, and 7) had unacceptable Se values. In the present study, Sp was not a major drawback since only ELISA 3 showed a low Sp, which increased significantly after the TG-ROC analysis. As a result, the prevalence of N. caninum infection might be notably underestimated using methods with low sensitivity.
The discrepancies among IFATs found in this work could be related to technical procedures rather than to the antigen used. The low Se values evidenced by IFATs 2, 5, 6, and 7 are most likely linked to methodological issues in the laboratories rather than the existence of false-negative reactors as stated below. Moreover, results could be influenced by interoperator variability since subjective interpretation is a major disadvantage of IFAT. In addition, it is difficult to adjust IFAT cutoff values since the results are expressed as a discrete variable obtained through two-fold serial dilutions. Thus, only one IFAT cutoff value could be recalculated (IFAT 1) to improve Sp without a detrimental effect on Se. In previous studies, validated IFATs gave variable results compared to ELISAs (Frössling et al. 2003; von Blumröder et al. 2004; Wapenaar et al. 2007). In the present study, all three ELISAs showed acceptable diagnostic performance, although slightly lower Se and Sp were recorded in comparison with those of European and American studies (von Blumröder et al. 2004; Wapenaar et al. 2007).
However, two main limitations arise from the approach followed in the present study. First, the existence of false-negative results that have been often attributed to either antibody fluctuations through pregnancy below the cutoff value or persistently infected seronegative animals (Aguado-Martínez et al. 2008; Guido et al. 2016) cannot be ruled out. Persistently infected cattle may remain undetected when using tachyzoite-based serological tests and could be detected by tests that employ bradyzoite stage-specific proteins (Guido et al. 2016). Thus, in order to avoid false-negative results, we employed a very restrictive criterion to select the negative population. Second, if a test is more specific and does not agree with most other tests, it does not mean that it is not a good test. In order to minimize this major drawback, we only compared tests based on whole tachyzoite antigens that are expected to behave similarly, as in previous studies (von Blumröder et al. 2004; Álvarez-García et al. 2013). In fact, main findings did not vary regardless of the criterion employed herein. However, when comparing tests based on different parasite-stage antigens, the Majority criterion might lead to confusing results, and a combination of sequential serological analyses and sensitive and specific complementary serological tests based on tachyzoite and bradyzoite antigens should be used to define the reference cattle populations.
We also investigated if cross-reaction with the closely related apicomplexan parasite B. besnoiti existed, since bovine besnoitiosis is a re-emerging disease that is spreading in Europe (Álvarez-García 2016). Moreover, a cross-reaction between anti-N. caninum antibodies and the B. besnoiti antigen has been recorded (Shkap et al. 2002; García-Lunar et al. 2015). However, we do not know whether specific anti-B. besnoiti antibodies may cross-react with N. caninum antigens. Although the disease is not present in American cattle, countries in the Americas should test cattle for besnoitiosis to prevent its introduction. Notably, the results showed that IFAT 1 (regardless of the cutoff value employed), IFAT 6, and readjusted ELISA 1 should not be used in areas where bovine besnoitiosis is present since they showed a rate of 10-30% of false positives. For IFAT 1, the number of false positives may increase up to almost 50% with the original cutoff value. Therefore, the analytical Sp of serological tests for the detection of anti-N. caninum antibodies should be evaluated in areas where B. besnoiti is present. Veterinary laboratory diagnosticians from Ibero-American countries should take into consideration the discordant results obtained herein among labs. Thus, there is a need to adopt a common test or at least tests with performances similar to those of the optimized IFAT 1 or ELISAs 1 and 2. For IFAT, operator training and microscope and reagent quality should be carefully reviewed as they greatly affect the results. In the future, the implementation of a commercial test may help to harmonize the diagnosis among labs to guarantee control program success. This recommendation is supported by the study performed by Álvarez-García et al. (2013) where commercially available ELISAs were compared.
Moreover, these ELISAs are routinely employed in voluntary control programs for bovine neosporosis developed in Spain that contributed to reducing seroprevalence after a few years of monitoring the epidemiological situation (Guido et al. 2016). This study might set the basis for creating inter-laboratory control and monitoring networks for serological diagnosis of N. caninum infection to overcome the discrepancies and lack of consistent results. Additionally, the present study reinforces the need for regional validation of serological assays. A pending issue for animal health authorities is the accreditation of laboratories that use validated assays based on multicenter studies such as the one presented here.