TY - JOUR
T1 - Transferring spectroscopic stellar labels to 217 million Gaia DR3 XP stars with SHBoost
AU - Khalatyan, A.
AU - Anders, F.
AU - Chiappini, C.
AU - Queiroz, A. B.A.
AU - Nepal, S.
AU - Dal Ponte, M.
AU - Jordi, C.
AU - Guiglion, G.
AU - Valentini, M.
AU - Torralba Elipe, G.
AU - Steinmetz, M.
AU - Pantaleoni-González, M.
AU - Malhotra, S.
AU - Jiménez-Arranz, null
AU - Enke, H.
AU - Casamiquela, L.
AU - Ardèvol, J.
N1 - Publisher Copyright:
© The Authors 2024.
PY - 2024/11/1
Y1 - 2024/11/1
N2 - With Gaia Data Release 3 (DR3), new and improved astrometric, photometric, and spectroscopic measurements for 1.8 billion stars have become available. Alongside this wealth of new data, however, there are challenges in finding efficient and accurate computational methods for their analysis. In this paper, we explore the feasibility of using machine learning regression as a method of extracting basic stellar parameters and line-of-sight extinctions from spectro-photometric data. To this end, we built a stable gradient-boosted random-forest regressor (xgboost), trained on spectroscopic data, capable of producing output parameters with reliable uncertainties from Gaia DR3 data (most notably the low-resolution XP spectra), without ground-based spectroscopic observations. Using Shapley additive explanations, we interpret how the predictions for each star are influenced by each data feature. For the training and testing of the network, we used high-quality parameters obtained from the StarHorse code for a sample of around eight million stars observed by major spectroscopic stellar surveys, complemented by curated samples of hot stars, very metal-poor stars, white dwarfs, and hot sub-dwarfs. The training data cover the whole sky, all Galactic components, and almost the full magnitude range of the Gaia DR3 XP sample of more than 217 million objects that also have reported parallaxes. We have achieved median uncertainties of 0.20 mag in V-band extinction, 0.01 dex in logarithmic effective temperature, 0.20 dex in surface gravity, 0.18 dex in metallicity, and 12% in mass (over the full Gaia DR3 XP sample, with considerable variations in precision as a function of magnitude and stellar type). We succeeded in predicting competitive results based on Gaia DR3 XP spectra compared to classical isochrone or spectral-energy distribution fitting methods we employed in earlier works, especially for parameters AV and Teff, along with the metallicity values. Finally, we showcase some potential applications of this new catalogue, including extinction maps, metallicity trends in the Milky Way, and extended maps of young massive stars, metal-poor stars, and metal-rich stars.
AB - With Gaia Data Release 3 (DR3), new and improved astrometric, photometric, and spectroscopic measurements for 1.8 billion stars have become available. Alongside this wealth of new data, however, there are challenges in finding efficient and accurate computational methods for their analysis. In this paper, we explore the feasibility of using machine learning regression as a method of extracting basic stellar parameters and line-of-sight extinctions from spectro-photometric data. To this end, we built a stable gradient-boosted random-forest regressor (xgboost), trained on spectroscopic data, capable of producing output parameters with reliable uncertainties from Gaia DR3 data (most notably the low-resolution XP spectra), without ground-based spectroscopic observations. Using Shapley additive explanations, we interpret how the predictions for each star are influenced by each data feature. For the training and testing of the network, we used high-quality parameters obtained from the StarHorse code for a sample of around eight million stars observed by major spectroscopic stellar surveys, complemented by curated samples of hot stars, very metal-poor stars, white dwarfs, and hot sub-dwarfs. The training data cover the whole sky, all Galactic components, and almost the full magnitude range of the Gaia DR3 XP sample of more than 217 million objects that also have reported parallaxes. We have achieved median uncertainties of 0.20 mag in V-band extinction, 0.01 dex in logarithmic effective temperature, 0.20 dex in surface gravity, 0.18 dex in metallicity, and 12% in mass (over the full Gaia DR3 XP sample, with considerable variations in precision as a function of magnitude and stellar type). We succeeded in predicting competitive results based on Gaia DR3 XP spectra compared to classical isochrone or spectral-energy distribution fitting methods we employed in earlier works, especially for parameters AV and Teff, along with the metallicity values. Finally, we showcase some potential applications of this new catalogue, including extinction maps, metallicity trends in the Milky Way, and extended maps of young massive stars, metal-poor stars, and metal-rich stars.
KW - Catalogs
KW - Galaxy: general
KW - Galaxy: stellar content
KW - Galaxy: structure
KW - Stars: general
KW - Stars: statistics
UR - http://www.scopus.com/inward/record.url?scp=85208658873&partnerID=8YFLogxK
U2 - 10.1051/0004-6361/202451427
DO - 10.1051/0004-6361/202451427
M3 - Article
AN - SCOPUS:85208658873
SN - 0004-6361
VL - 691
JO - Astronomy and Astrophysics
JF - Astronomy and Astrophysics
M1 - A98
ER -