Ground-and space-based GPS data ingestion into the NeQuick model

This paper presents a technique for ingesting ground-and space-based dual-frequency GPS observations into a semi-empirical global electron density model. The NeQuick-2 model is used as the basis for describing the global electron density distribution. This model is mainly driven by the F2 ionosphere layer parameters (i.e. the electron density, N m F 2, and the height, h m F 2 of the F2 peak), which, in the absence of directly measured values, are computed from the ITU-R database (ITU-R 1997). This data-base was established using observations collected from 1954 to 1958 by a network of around 150 ionospheric sounders with uneven global coverage. It allows computing monthly median values of N m F 2 and h m F 2 (intra-month variations are averaged), for low and high solar activity. For intermediate solar activity a linear interpolation must be performed. Ground-based GNSS observations from a global network of ∼ 350 receivers are pre-processed in order to retrieve slant total electron content (sTEC) information, and space-based GPS observations (radio occultation data from the FORMOSAT-3/COSMIC constellation) are pre-processed to retrieve electron density (ED) information. Both, sTEC and ED are ingested into the NeQuick-2 model in order to adapt N m F 2 and h m F 2, and reduce simultaneously both, the observed minus computed sTEC and ED differences. The ﬁrst experimental results presented in this paper suggest that


Introduction
Researches exploring the potentialities of GPS observations to study the Earth's ionosphere can be traced back to the eighties (e.g.Kleusberg 1986;Feess and Stephens 1987;Lanyi and Roth 1988;Wild et al. 1989).A milestone in that field of research was the establishment of the International GNSS Service (IGS) Ionospheric Working Group (IGS-IWG), in May 1998, which has been producing uninterrupted time series of Global Ionospheric Maps (GIMs) (Feltens and Schaer 1998;Schaer et al. 1996;Manucci et al. 1998;Feltens 1998;Hernández-Pajares et al. 1999).Based on dual-frequency observations from the IGS network, the GIMs provide a world-wide map of the vertical total electron content (vTEC) with a time resolution of 2 h.As a new product, IGS GIMs were cautiously considered by the Aeronomy community but, at present, they are well valued and routinely used for both, scientific and technological studies of the Earth's ionosphere (Hernández-Pajares et al., 2009).
Most TEC models developed within the geodetic community can be characterized as "data-driven models", meaning that they rely mostly on the information provided by the GNSS data rather than on the chemical and physical processes that actually drive the vTEC distribution.A common way to avoid the formulation of those complex models is to adopt the so-called "thin shell approximation", in which the whole ionosphere is approximated by one (in some cases two) shell(s) of infinitesimal thickness(es) with equivalent vTEC, and a geometrical mapping function is used to relate the satellite-to-receiver slant total electron content (sTEC) to the vTEC on the shell(s) (Manucci et al. 1999).Spatial and temporal variations of vTEC on the shell are reproduced using different kinds of mathematical techniques, e.g.spherical harmonic expansion (Azpilicueta et al. 2005), kriging (Orús et al. 2005), B-spline expansion (Schmidt et al. 2008), etc.The numerical coefficients involved in the mathematical description of the vTEC are estimated from the GNSS observations using some optimization technique (e.g.least squares or Kalman filter when real-time results are desired).Simultaneously to the mathematical coefficients, the so-called inter frequency biases (IFBs) must be estimated in order to account for frequencydependent delays that are not produced by the ionosphere but by the GNSS satellites and receivers hardware and firmware (Sardon et al. 1994).An alternative to the thin shell approach is the "ionospheric tomography" (Hernández-Pajares et al. 1999).In this case the ionosphere is parcelled in 3dimensional cells or "voxels", and the electron density (ED) inside each voxel is estimated from the satellite-to-receiver ray paths that pass through it.Mathematical constraints must be imposed to fill up voxels that are not crossed by any ray path.
Another milestone for the ionospheric research based on GPS observations was the GPS/MET experiment, conducted by the University Corporation for Atmospheric Research (UCAR), which placed a dual-frequency GPS receiver on board a low-Earth orbiter (LEO) satellite (Hajj et al. 1994).This and other dual-frequency GPS receivers flying on LEO satellites (e.g.SAC-C, CHAMP, GRACE, FORMOSAT-3/COSMIC, etc.) put the so-called "radio occultation techniques" into practice and made possible the use of GPS observations to derive information about the ED distribution at different heights in the ionosphere (Jakowski et al. 2004).These techniques are also based on "data-driven model" which can be classified into two main groups: the already mentioned "ionospheric tomography" (Leitinger et al. 1997) and the "Abel transform" (Schreiner et al. 1999).The last one retrieves the ED from either the bending angle or the sTEC of the LEO-GPS ray path.The classical formulation of the Abel transform technique relies upon the assumption of spherical symmetry of the ED distribution, but an improved version of it overcame that assumption by including information about the horizontal gradients present in the neighboring area of the radio occultation event.Such a kind of information can be extracted, for instance, from the vTEC distribution provided by the IGS GIMs (Hernández-Pajares et al. 2000).
In spite of its relative simplicity, GNSS-data driven models are able to produce a reliable representation of both, ED and TEC, even with better accuracy than other models founded in more sophisticated empirical or chemical and physical basements.Even if true, this fact does not implies at all that the efforts for improving the last kind of models lack of sense: far from this, the aeronomy community is convinced that only a full synergy between models and data can help to understand the great complexity of ionospheric processes and predict the main ionospheric variables.The techniques being developed to exploit that synergy can be classified into two major groups: "data assimilation" and "data ingestion".The first one is based on the use of the so-called "first principle ionospheric models", which have to solve a non linear and coupled system of differential equations for each one of the most abundant ion species in the Earth's atmosphere (Schunk 2002;Wang et al. 2004).That system accounts for the mass conservation principle through the continuity equation, the momentum conservation principle through the Navier-Stokes equation, the energy conservation principle through the temperature equation, etc. Solving the system requires the use of models for computing the main driving forces, such as the neutral winds, the gravity and Lorenz forces, etc.The integration of the differential equation system originates a number of initial and boundary conditions that are estimated by means of the data assimilation process, which is usually done, state by state of the model, by means of a Kalman filter.Data ingestion differs from data assimilation by the fact that physical and chemical models are simplified and parameterized in terms of a given set of model parameters, which are estimated in order to minimize the deviations between observed and computed values.Simplified models can be based either on empirical facts (e.g.IRI (Bilitza 2001), Ne-Quick (Radicella and Leitinger 2001), etc.) or physical laws (SUPIM (Bailey et al. 1997), SAMI2 (Huba et al. 2000), etc.).
In this contribution, we present a technique to ingest GPS-derived sTEC and FORMOSAT-3/COSMIC-derived ED into the NeQuick empirical model.In Sect. 2 we describe how the NeQuick model is parametrized in order to fulfil the requirements of the data ingestion technique; in Sect. 3 we describe the data ingestion technique itself; in Sect. 4 we present the obtained results which constitute the first assessment regarding the goodness of the data ingestion technique; finally, we summarize the main conclusions learned from this research, including some interesting possibilities to further improve the data ingestion technique here presented that will be investigated in forthcoming works.

Base model for data ingestion
Version 2 of the NeQuick model (Nava et al. 2008) is used as the basis to describe the time varying global ED distribution at global scale.This version is a recent evolution of the one that is being implemented by the Galileo GNSS for correcting the ionospheric range delay error for single-frequency operation.NeQuick-2 describes the vertical profile of ED by means of the superposition of six semi-Epstein functions that represent the lower and upper parts of the E and F1 layers, the lower part of the F2 layer, and the topside ionospheric profile.The functions are anchored to the ED, N m , and the height, h m , of the corresponding layer peak, which can be either measured (with ionospheric sounders) or modelled.Besides, the thickness of each layer is modelled with different functions for its lower and upper parts.
Despite the existence of three anchor points in the NeQuick-2 formulation, the shape of the profile is dominated by the F2 parameters, N m F2 and h m F2.In the absence of measurements NeQuick-2 proceeds in the same way as the IRI model and computes values for those parameters based on a climatological database.As it will be soon explained, this database allows computing monthly mean values of the critical frequency, f 0 F2, and the transfer parameter, M 3000 F2 and, from them, NeQuick-2 computes N m F2 using the simple relation: and h m F2 using the Dudeney (1974) formulae in connection with M 3000 F2 and the f 0 F2/ f 0 E ratio: where the M factor is given by The critical frequency and the transfer parameter in the Eqs.
(2) are given in MHz, the ED in electrons/m 3 and the height in km.
According to the mapping technique developed by Jones and Gallet (1965), the diurnal variation of f 0 F2 and M 3000 F2 is given by Fourier series: where is the parameter to be mapped; t is UT; and J is the maximum number of harmonics for mapping the diurnal variation (J = 6 for f 0 F2 and J = 4 for M 3000 F2).The geographic variation of these parameters is accounted for the Fourier coefficients in the form where K = 75 for f 0 F2 and K = 49 for M 3000 F2; U are the numerical coefficients of the expansion [998 U f coefficients for f 0 F2 (13 from Eq. 3.a by 76 from Eq. 3.b) and 450 U M coefficients for M 3000 F2(9 from Eq. 3.a by 50 from Eq. 3.b)]; and G are special functions whose explicit form depends on the k index, for example where ϕ and λ are the geographic latitude and longitude and μ is the so-called "modip" latitude defined as I is the magnetic dip at 350 km above the Earth's surface.
The maximum number of terms for mapping the diurnal and geographic variations of f 0 F2 and M 3000 F2 (J and K in Eqs.3.a, 3.b) was established by fitting a set of measured values with an increasing number of terms until the root mean square difference between data and model did not further improve.The final remaining root mean square was about 0.5 MHz for f 0 F2 and 0.5 for M 3000 F2 (see Jones and Gallet 1965 for details).
The particular G functions chosen by Jones and Gallet as bases for mapping the geographical variability of f 0 F2 and M 3000 F2 seem to be very well adapted for f 0 F2.With only 998 coefficients, the technique is able to map very well the sharp peaks and the deep valley between these peaks that f 0 F2 exhibits in response to the Appleton anomaly (features that are not so well mapped with other base-function such as the associated Legendre functions).Besides, the use of the modip latitude helps to cope with the distortion that the Earth's magnetic field causes on this complex structure.

Data ingestion technique
NeQuick-2 relies upon the climatological database provided by the Radio Communication Sector of the International Telecommunication Union, namely the ITU-R database (ITU-R 1997), to compute the U coefficients required by the f 0 F2 and M 3000 F2 expansions given in Eq. (3).This database was established using observations collected from 1954 to 1958 by a network of around 150 ionospheric sounders unevenly distributed around the world and provides two sets of U coefficients, one for low and another for high solar activity, for every month of the year.The solar activity is characterized by the 12-month running mean value of the monthly mean sunspot number, R 12 .For a given month and R 12 value, the U coefficients must be linearly interpolated from the tabulated values, which correspond to R 12 = 0 (low) and R 12 = 100 (high solar activity) The outdated ITU-R database, the monthly median averages that do not account for the day-by-day variations, the linear interpolation used to account for the solar activity, and the intrinsic errors of the database, cause significant differences between the computed and the actual values of f 0 F2 and M 3000 F2, thus inducing large errors on NeQuick-2 (e.g.Jodogne et al. 2004).Under this premise we look for a data ingestion technique that allows using GPS-sTEC and FORMOSAT-3/COSMIC-ED to compute corrections U f and U M to the U f,0 and U M,0 values computed from the ITU-R database, so that the corrected values, can reduce the differences between observed and computed values.With this idea in mind, the NeQuick-2 model is parametrized in terms of the 998 + 450 = 1, 448 U coefficients: where the ∂ N eN ∂ f 0 F 2 and ∂ N eN ∂ M 3000 F 2 derivatives are numerically computed and the other derivatives are analytically computed from Eq. (3): k = 1, . . ., 75 and k = 1, . . ., 49 and j = 1, . . ., 6; j = 1, . . ., 4 (4.c) The data ingestion is performed by means of an adaptative and robust Kalman filter through which the set of 998 + 450 = 1, 448U coefficients is updated every hour.Two kinds of observations were simultaneously ingested by the filter into the NeQuick-2 model: • GPS derived sTEC, which give rise to the following equation of observation: where γ is the satellite-to-receiver line-of-sight, and β S and β R are the GPS satellite and receiver inter-frequency biases.
• FORMOSAT-3/COSMIC derived ED, which give rise to the following equation of observation: It is worth pointing out that Eq. (4.d) does not contain corrections to the M 3000 F2 parameter.This is because the ground-based GPS sTEC does not provide enough information on the ED variation with height and are hence almost insensitive to M 3000 F2.

Goodness assessment of the data ingestion technique
The GPS observations were pre-processed with the La Plata ionospheric model (LPIM) (Azpilicueta et al. 2005) in order to derive unambiguous but uncalibrated sTEC.LPIM computes the geometry-free linear combination from both, carrier-and code-phase dual-frequency GPS observations; detects and corrects (when possible) carrier-phase cycle slip; and estimates and reduces carrier-phase ambiguities by levelling the carrier-to the code-phase geometry-free linear combination.These pre-processes produce carrier-phase derived sTEC, free from ambiguities and cycle slips but affected by the GPS satellite and receiver inter-frequency biases, β S and β R .
FORMOSAT-3/COSMIC radio occultation data were processed by means of the improved Abel inversion technique to retrieve ED from the dual frequency GPS observations collected by the receiver on board the LEOs (see details in Aragón-Ángel et al. 2009).The technique used in this research overcomes the limitation imposed by the spherical symmetry assumption that underlies on the classical Abel inversion technique, which is equivalent to neglect the horizontal gradients of the actual ED distribution and considering only its variation with height.According to the "separability hypothesis" introduced by Hernández-Pajares et al. (2000), the ED distribution can be expressed as the product of the vTEC distribution, which accounts for the horizontal ED gradients, and the "shape function", which accounts for the ED dependence on height.The IGS GIMs supply the necessary information regarding the vTEC distribution while the shape function, and hence the ED, is estimated from the L 1 carrier-phase excess.Just to provide an example, Fig.
(1) illustrates the geographical variability of the corrections to f 0 F2 (upper panel) and M 3000 F2 (bottom panel), at 12 h UT of one of the 10 days computed in this research.More specifically, the figure shows the percentile value of these corrections in terms of the values computed from the ITU-R database, i.e. 100 × ( f 0 F2 − f 0 F2 0 ) / f 0 F2 0 and 100 × (M 3000 F2 − M 3000 F2 0 ) /M 3000 F2 0 .Dashed white lines represent the modip parallels of ±60 • , ±30 • and 0 • that roughly delimit the high-, mid-and low-latitude ionosphere.The solar terminator (i.e. the imaginary line that divides the day from the night-sector) is also depicted with a dashed white line.We do not attempt here to discuss the physical significance of the corrections computed for a particular day.
Nevertheless, some general remarks may be interesting at this point: • the f 0 F2 correction reaches values around −100% at high North latitude and +100% at high South latitude; at mid latitude and day-time the correction is around −15% in the Northern and +20% in the Southern Hemisphere; at low latitude and night-time the correction is around −40%. • the M 3000 F2 correction is around +5% at high North latitude and reaches values greater than +20% at high South latitude; at mid and low latitude and day-time the correction is around −15% in the Southern Hemisphere; at mid and low latitude and night-time the correction is around +15% in the Northern Hemisphere.
Figure (2) shows typical examples extracted from the ∼9, 000 radio occultations analysed in this work.The figures show the ED (electron/m 3 ) in the x axe and the height (kilometres) in the y axes.The three panels on the lefthand-side correspond to night-time radio occultations and the three panels in the right-hand-side correspond to daytime radio occultations.From top to bottom the panels correspond to high-, mid-and low-latitude radio occultations.The points represent the FORMOSAT-3/COSMIC derived ED, the dashed line represents the ED from NeQuick before data ingestion, and the solid line represents the ED from NeQuick after data ingestion.In all the cases presented in this figure, the data ingestion technique is able to significantly reduce the deviations between the observed and computed ED.This fact is confirmed by the statistics presented in Fig. (3), which show the distribution of the observed minus computed ED deviations (in electron/m 3 in the x axis) and percentage of samples (in the y axis) for the ∼9000 radio occultations analysed in this work.The goodness of the technique is evident when the distributions after (solid line) and before (dashed line) data ingestion are compared: both, the mean value and the standard deviation are reduced from −2.9×10 10 ± 7.0× 10 10 to 0.1 × 10 10 ± 2.1 × 10 10 electron/m 3 .Figure (4) shows typical examples extracted from the ∼ 350 ground-based GPS stations considered in this study.Each group of three panels of this figure shows sTEC variations (in the y axis, in TECu) as function of the local time (x axis, in hour) for selected stations located, from top to bottom, in high-, mid-and low-latitude regions, for one of the 10 days analysed in this work.The upper panel of each group shows the sTEC computed from NeQuick-2 after data ingestion, while the middle and bottom panels show the observed minus computed deviations before and after data ingestion.In all the cases presented in this figure, the data ingestion technique is able to significantly reduce the deviations between the observed and computed sTEC.This fact is confirmed by the statistics presented in

Conclusions and further works
This paper presented a global-scale technique to ingest ground-based GPS-derived sTEC and space-based FORMOSAT-3/COSMIC derived ED into the NeQuick-2 model.The technique relies upon the estimation of corrections to the ITU-R climatological database coefficients in order to simultaneously minimize the observed minus computed sTEC and ED differences.The first experimental results presented in this paper suggested that the technique is self consistent and able to reduce the observed minus computed differences to ∼ 25 − 30% of the values computed from the ITU-R database for both, ground-and space-based data.It is worth mentioning that ground-based sTEC and space-based ED were derived from different algorithms and models.
In the future we intend to analyse the performance of the technique under different solar and geomagnetic conditions; validate the obtained sTEC and ED by comparing them to different data sources (e.g.TOPEX, Jason, ionospheric sounders, etc.); and validate the f 0 F2 and M 3000 F2 parameters computed with this technique by comparing them to ionospheric sounders and incoherent scatter radar determinations.Besides, we intend to change the parameterization of the NeQuick-2 model using h m F2 instead of M 3000 F2.As it was already mentioned in the second section of this paper, the base functions used by the Jones and Gallet's technique is well suited for mapping the sharp peaks and deep valley of f 0 F2 in the region of the Appleton anomaly, but M 3000 F2 exhibits a different behaviour that could be better mapped using different base functions (e.g.Legendre's associated functions).In addition, the derivation of h m F2 from M 3000 F2 using the Dudeney's formulae (Eqs.2) makes the estimation the M 3000 F2 corrections rather unstable.

Fig. 1
Fig. 1 Percentile value of the corrections to the f 0 F2 (upper panel) and M 3000 F2 (bottom panel) values computed from the ITU-R database

FigFig. 2 Fig. 3
Figure (2)  shows typical examples extracted from the ∼9, 000 radio occultations analysed in this work.The figures show the ED (electron/m 3 ) in the x axe and the height (kilometres) in the y axes.The three panels on the lefthand-side correspond to night-time radio occultations and the three panels in the right-hand-side correspond to daytime radio occultations.From top to bottom the panels correspond to high-, mid-and low-latitude radio occultations.The points represent the FORMOSAT-3/COSMIC derived ED, the dashed line represents the ED from NeQuick before data ingestion, and the solid line represents the ED from NeQuick after data ingestion.In all the cases presented in this figure, the data ingestion technique is able to significantly reduce the deviations between the observed and computed ED.This fact is confirmed by the statistics presented in Fig.(3), which show the distribution of the observed minus computed ED deviations (in electron/m 3 in the x axis) and percentage of samples (in the y axis) for the ∼9000 radio occultations analysed in this work.The goodness of the technique is evident when the distributions after (solid line) and before (dashed line) data ingestion are compared: both, the mean value and the standard deviation are reduced from −2.9×10 10 ± 7.0× 10 10 to 0.1 × 10 10 ± 2.1 × 10 10 electron/m 3 .Figure (4)  shows typical examples extracted from the ∼ 350 ground-based GPS stations considered in this study.Each group of three panels of this figure shows sTEC variations (in the y axis, in TECu) as function of the local time (x axis, in hour) for selected stations located, from top to bottom, in high-, mid-and low-latitude regions, for one of the 10 days analysed in this work.The upper panel of each group shows the sTEC computed from NeQuick-2 after data ingestion, while the middle and bottom panels show the observed minus computed deviations before and after data ingestion.In all the cases presented in this figure, the data ingestion technique is able to significantly reduce the deviations between the observed and computed sTEC.This fact is confirmed by the statistics presented in Fig.(5), which shows

Fig. 4
Fig.4Daily sTEC variation for selected stations at high, mid and low latitude (from top to bottom); the upper panel of each group shows the sTEC from NeQuick-2 after data ingestion, while the middle and bottom panels show the observed minus computed deviations before and after data ingestion (data gaps in the plots are due to limitations of the processing software to handle the transition between 24 to 0 UT)

Fig. 5
Fig. 5 Percentile distribution of the observed minus computed sTEC deviations before (dashed line) and after (solid lines) data ingestion