TY - JOUR
T1 - Characterising dataset search—An analysis of search logs and data requests
AU - Kacprzak, Emilia
AU - Koesten, Laura
AU - Ibáñez, Luis Daniel
AU - Blount, Tom
AU - Tennison, Jeni
AU - Simperl, Elena
N1 - Publisher Copyright:
© 2018 Elsevier B.V.
PY - 2019/3
Y1 - 2019/3
N2 - Large amounts of data are becoming increasingly available online. To benefit from them, we need tools that retrieve the datasets most relevant to one's data needs. Several vocabularies have been developed to describe datasets and increase their discoverability, but annotating datasets with all of them is costly and cumbersome for data publishers, which raises the question of which properties are most important. In this work we contribute a systematic study of the patterns and specific attributes that data consumers use to search for data, and of how these compare with general web search. We performed a query log analysis based on logs from four national open data portals and conducted a qualitative analysis of the user data requests issued to one of them. Search queries issued on data portals differ from those issued to web search engines in their length, topic, and structure. Based on our findings, we hypothesise that portals' search functionalities are currently used in an exploratory manner, rather than to retrieve a specific resource. In our study of data requests, we found that geospatial and temporal attributes, as well as information on the required granularity of the data, are the most common features. The findings of both analyses suggest that these features are more important in dataset retrieval than in general web search, and that dataset publishers should therefore focus their efforts on producing dataset descriptions that include them.
AB - Large amounts of data are becoming increasingly available online. To benefit from them, we need tools that retrieve the datasets most relevant to one's data needs. Several vocabularies have been developed to describe datasets and increase their discoverability, but annotating datasets with all of them is costly and cumbersome for data publishers, which raises the question of which properties are most important. In this work we contribute a systematic study of the patterns and specific attributes that data consumers use to search for data, and of how these compare with general web search. We performed a query log analysis based on logs from four national open data portals and conducted a qualitative analysis of the user data requests issued to one of them. Search queries issued on data portals differ from those issued to web search engines in their length, topic, and structure. Based on our findings, we hypothesise that portals' search functionalities are currently used in an exploratory manner, rather than to retrieve a specific resource. In our study of data requests, we found that geospatial and temporal attributes, as well as information on the required granularity of the data, are the most common features. The findings of both analyses suggest that these features are more important in dataset retrieval than in general web search, and that dataset publishers should therefore focus their efforts on producing dataset descriptions that include them.
KW - Dataset search
KW - Search logs
KW - Vertical search
UR - http://www.scopus.com/inward/record.url?scp=85057246152&partnerID=8YFLogxK
U2 - 10.1016/j.websem.2018.11.003
DO - 10.1016/j.websem.2018.11.003
M3 - Article
AN - SCOPUS:85057246152
VL - 55
SP - 37
EP - 55
JO - Journal of Web Semantics. Science, Services and Agents on the World Wide Web
JF - Journal of Web Semantics. Science, Services and Agents on the World Wide Web
SN - 1570-8268
ER -