<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Heg Genève on SONAR</title>
    <link>/tags/heg-gen%C3%A8ve/</link>
    <description>Recent content in Heg Genève on SONAR</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Thu, 08 Oct 2020 12:00:00 +0200</lastBuildDate>
    
        <atom:link href="/tags/heg-gen%C3%A8ve/index.xml" rel="self" type="application/rss+xml" />
    
    
    <item>
      <title>Acquisition of Swiss-affiliated research records: first evaluation from the field</title>
      <link>/post/acquisition-of-swiss-affiliated-research-records-first-evaluation/</link>
      <pubDate>Thu, 08 Oct 2020 12:00:00 +0200</pubDate>
      
      <guid>/post/acquisition-of-swiss-affiliated-research-records-first-evaluation/</guid>
      <description>&lt;p&gt;One of the tasks of the SONAR project is to assess the feasibility of a pipeline that automatically retrieves, from third-party international databases, the publications of researchers affiliated with Swiss publicly funded institutions. Almost 500,000 candidate bibliographic records were collected from Crossref, MEDLINE and PubMed Central to build the first version of the SONAR dataset.&lt;/p&gt;

&lt;p&gt;See the current results at &lt;a href=&#34;http://candy.hesge.ch/SONAR/&#34;&gt;candy.hesge.ch/SONAR/&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&#34;comparative-analysis&#34;&gt;Comparative analysis&lt;/h3&gt;

&lt;p&gt;In May 2020, we then asked Swiss Institutional Repository (IR) managers to cooperate by comparing the SONAR dataset with their own records and providing qualitative feedback.&lt;/p&gt;

&lt;p&gt;The questions asked were the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How many publications provided by the SONAR retrieval process are already registered in your institutional repository or in your internal publication database? (coverage rate)&lt;/li&gt;
&lt;li&gt;How many new publications does SONAR provide? (potential increase rate)&lt;/li&gt;
&lt;li&gt;How many publications registered in your IR are missing from the SONAR dataset?&lt;/li&gt;
&lt;li&gt;What procedure did you use to perform the comparison?&lt;/li&gt;
&lt;/ul&gt;
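
&lt;p&gt;As an illustration only, the first three questions can be answered from two DOI lists with simple set operations. This is a hypothetical sketch, not the procedure used by SONAR or by the respondents. It assumes that each side is available as a list of DOI strings and normalises them, since DOIs are case-insensitive. Expressing both rates relative to the number of IR records is an assumption of this sketch.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;
```python
# Hypothetical sketch: compare the DOIs retrieved by SONAR for one institution
# with the DOIs already held in its institutional repository (IR).
# Assumption: both rates are expressed relative to the number of IR records.

DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(doi):
    """Lower-case a DOI (DOIs are case-insensitive) and strip common resolver prefixes."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def compare(sonar_dois, ir_dois):
    """Answer the coverage, potential increase and missing-record questions for one IR."""
    sonar = {normalise_doi(d) for d in sonar_dois}
    ir = {normalise_doi(d) for d in ir_dois}
    found = sonar.intersection(ir)  # SONAR records already registered in the IR
    new = sonar - ir                # SONAR records absent from the IR
    missing = ir - sonar            # IR records that SONAR did not retrieve
    size = len(ir)
    return {
        "coverage_rate": len(found) / size if size else 0.0,
        "potential_increase_rate": len(new) / size if size else 0.0,
        "new": sorted(new),
        "missing": sorted(missing),
    }
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For example, suppose SONAR returns three DOIs and the IR holds two, with one DOI in common. &lt;code&gt;compare&lt;/code&gt; then reports a coverage rate of 0.5, a potential increase rate of 1.0 and one missing record. Note that a DOI-based comparison can only account for records that have a DOI.&lt;/p&gt;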

&lt;p&gt;We received answers from 9 Swiss IR managers, 3 of them at universities. We warmly thank all respondents for their highly valuable work.&lt;/p&gt;

&lt;h3 id=&#34;coverage-rate&#34;&gt;Coverage rate&lt;/h3&gt;

&lt;p&gt;We &lt;a href=&#34;http://candy.hesge.ch/SONAR/SONAR%20D5.1.pdf&#34;&gt;previously reported&lt;/a&gt; coverage rates ranging from 51% to 60% for our pipeline, evaluated on two benchmarks linked to the SNSF publication database. However, the coverage rates reported by IR managers are lower (47% on average for universities and 21% for the other institutions). Several factors explain these lower rates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First, we only consider publications (1) that have a DOI registered in Crossref and (2) whose affiliation metadata explicitly mention the institution. Some publications in Swiss IRs are therefore out of reach of this approach.&lt;/li&gt;
&lt;li&gt;For example, some respondents reported missing publications that the authors wrote before joining their institution (and that therefore carry a different affiliation in the metadata) but that are nevertheless deposited in their IR. Such publications are outside the scope of SONAR.&lt;/li&gt;
&lt;li&gt;Another major limitation is the publication type. Particularly with MEDLINE, we focus on articles in peer-reviewed scientific journals, which are only one of the publication types held in Swiss IRs (we do not collect press articles or PhD theses, for instance). We nevertheless consider this the most valuable type and the best starting point for such an evaluation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That the reported coverage falls below our previous estimates is a particular concern for some respondents, in a scenario where Swiss publication output would be monitored through SONAR. However, beyond the reasons above, it is important to keep in mind that SONAR is not meant to replace existing IRs but to help fill their content gaps. Indeed, a SONAR-based publication monitoring process must combine the publications gathered from external sources with those harvested from Swiss IRs.&lt;/p&gt;

&lt;h3 id=&#34;potential-increase-rate&#34;&gt;Potential increase rate&lt;/h3&gt;

&lt;p&gt;Beyond the coverage rate, we were interested in the publications that are present in the SONAR dataset but not in Swiss IRs, since they represent a potential improvement for IRs. We &lt;a href=&#34;http://candy.hesge.ch/SONAR/SONAR%20D5.1.pdf&#34;&gt;previously reported&lt;/a&gt; a potential increase in IR coverage of +30-40% for some universities. Once again, in the answers we collected, this potential increase was larger for universities (+25% on average) than for the other institutions (+8%). Some respondents raised concerns about the quality of these data. For one university, a manual analysis of 100 potential new records (out of 24,000) showed that more than 80% of them were candidates for populating the IR. This ratio can be improved by fixing some of the errors (essentially bugs) that were pointed out.&lt;/p&gt;

&lt;p&gt;We cannot assume or expect an automatic pipeline that collects thousands of records to be perfect. At the same time, it cannot be expected that human validation would provide better precision. In our opinion, interested institutions should weigh the costs and benefits of using our dataset. The fundamental question is: is it acceptable to tolerate a few false positives in order to acquire several thousand valid DOIs (semi-)automatically?&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Short insight into the content tracking results</title>
      <link>/post/content-tracking-first-insight/</link>
      <pubDate>Thu, 31 Oct 2019 08:00:00 +0100</pubDate>
      
      <guid>/post/content-tracking-first-insight/</guid>
      <description>&lt;p&gt;The work done by HEG Genève on content tracking, one of the most exploratory tasks of the SONAR project, has already produced interesting data.&lt;/p&gt;

&lt;p&gt;According to both the literature and our preliminary analyses, more than half of Swiss scientific publications belong to the biomedical domain.&lt;/p&gt;

&lt;h3 id=&#34;medline-and-pubmed-central&#34;&gt;MEDLINE and PubMed Central&lt;/h3&gt;

&lt;p&gt;Since 1879, the US National Library of Medicine has indexed articles published in reference journals in the life sciences and medicine, and since 1971 it has made a bibliographic database publicly available online: the Medical Literature Analysis and Retrieval System Online, or &lt;a href=&#34;https://www.nlm.nih.gov/bsd/medline.html&#34;&gt;MEDLINE&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In 2019, MEDLINE passed the mark of 30 million bibliographic records. In parallel, 2.5 million of these publications are available in Open Access and stored in &lt;a href=&#34;https://www.ncbi.nlm.nih.gov/pmc/&#34;&gt;PubMed Central&lt;/a&gt;, a free full-text archive maintained by the US National Center for Biotechnology Information. These resources are therefore a prime source for recovering a large share of Swiss publications.&lt;/p&gt;

&lt;p&gt;Both MEDLINE and PubMed Central resources are accessible via the &lt;a href=&#34;https://www.ncbi.nlm.nih.gov/pubmed/&#34;&gt;PubMed interface&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&#34;how-the-data-is-processed-by-sonar&#34;&gt;How the data is processed by SONAR&lt;/h3&gt;

&lt;p&gt;For SONAR, HEG Genève uses MEDLINE and PubMed Central to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Gather all bibliographic records that mention a Swiss institution in their affiliation metadata, using a manually designed authority list of 75 institutions covering 523 name variants.&lt;/li&gt;
&lt;li&gt;Gather all Open Access full texts among those records, and thereby measure the percentage of Open Access.&lt;/li&gt;
&lt;/ol&gt;
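
&lt;p&gt;The two steps above can be sketched as follows. This is a hypothetical illustration, not the HEG Genève code. The two-entry &lt;code&gt;AUTHORITY&lt;/code&gt; dictionary stands in for the real list of 75 institutions and 523 variants. Each record is assumed to be a dictionary with an &lt;code&gt;affiliations&lt;/code&gt; list and an &lt;code&gt;is_oa&lt;/code&gt; flag. Accents are folded before matching, so a single variant covers both Genève and Geneve.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;
```python
import re
import unicodedata

# Hypothetical excerpt of an authority list: canonical institution -> name variants.
# The real SONAR list (75 institutions, 523 variants) is not reproduced here.
AUTHORITY = {
    "University of Geneva": ["university of geneva", "universite de geneve", "unige"],
    "ETH Zurich": ["eth zurich", "swiss federal institute of technology zurich"],
}


def fold(text):
    """Lower-case and strip accents so that 'Genève' and 'Geneve' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


# Pre-compile one whole-word pattern per institution, alternating all its variants.
PATTERNS = {
    name: re.compile(r"\b(" + "|".join(re.escape(fold(v)) for v in variants) + r")\b")
    for name, variants in AUTHORITY.items()
}


def match_institutions(affiliation):
    """Return the canonical names of all listed institutions found in an affiliation string."""
    text = fold(affiliation)
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}


def oa_percentage(records):
    """Step 1 then step 2: keep Swiss-affiliated records, return the share (in %) that is OA."""
    swiss = [r for r in records if any(match_institutions(a) for a in r["affiliations"])]
    if not swiss:
        return 0.0
    return 100.0 * sum(1 for r in swiss if r["is_oa"]) / len(swiss)
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Matching whole words on accent-folded text keeps the number of variants down. However, short acronyms can still produce false positives, so each variant has to be chosen with care.&lt;/p&gt;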

&lt;p&gt;The following graph displays the number of MEDLINE records and the percentage of Open Access for the top-publishing Swiss institutions since 2015.&lt;/p&gt;

&lt;p&gt;&lt;img class=&#34;image&#34; src=&#34;/images/20191015_doi_oa_graph.svg&#34; alt=&#34;Number of records and OA rate for Swiss affiliations in MEDLINE since 2015&#34; title=&#34;Number of records and OA rate for Swiss affiliations in MEDLINE since 2015&#34; width=&#34;100%&#34;&gt;&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>First tests to recover full-text files of Swiss publications</title>
      <link>/post/first-tests-to-recover-full-text-files-of-swiss-publications/</link>
      <pubDate>Tue, 09 Apr 2019 11:33:58 +0200</pubDate>
      
      <guid>/post/first-tests-to-recover-full-text-files-of-swiss-publications/</guid>
      <description>&lt;p&gt;On 3 April, the 4 SONAR partners (HES-SO/HEG Genève, HTW Chur, USI and RERO) held a joint working meeting in Bern.&lt;/p&gt;

&lt;p&gt;HEG Genève is conducting a feasibility study on ways to recover full-text files of Swiss publications and has already reached some important conclusions.&lt;/p&gt;

&lt;p&gt;The experiments tested the following process (see the animated GIF below):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Harvest metadata from third parties to cover as many publications as possible (Crossref was tested first)&lt;/li&gt;
&lt;li&gt;Take the subset of records that contain affiliations

&lt;ol&gt;
&lt;li&gt;Among them, select publications related to Swiss research institutions&lt;/li&gt;
&lt;li&gt;Among them, check if they are OA (Unpaywall was tested)&lt;/li&gt;
&lt;li&gt;Recover the full-text file&lt;/li&gt;
&lt;/ol&gt;&lt;/li&gt;
&lt;li&gt;Take the subset of records without affiliations (still work in progress)

&lt;ol&gt;
&lt;li&gt;Among them, search for Swiss affiliations or references to SNSF funding (MEDLINE was tested)&lt;/li&gt;
&lt;li&gt;Among them, check if they are OA (Unpaywall was tested)&lt;/li&gt;
&lt;li&gt;Recover the full-text file&lt;/li&gt;
&lt;/ol&gt;&lt;/li&gt;
&lt;/ol&gt;
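
&lt;p&gt;Step 2 of this process (records with affiliations) can be sketched in Python. The sketch is hypothetical and rests on three assumptions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Records follow the Crossref JSON layout, in which each author carries an &lt;code&gt;affiliation&lt;/code&gt; list of objects with a &lt;code&gt;name&lt;/code&gt; field.&lt;/li&gt;
&lt;li&gt;The Swiss-affiliation test is passed in as a parameter, &lt;code&gt;is_swiss&lt;/code&gt;, which could be an authority-list matcher.&lt;/li&gt;
&lt;li&gt;OA status comes from the Unpaywall REST API. The API requires an e-mail parameter and reports &lt;code&gt;is_oa&lt;/code&gt; and &lt;code&gt;best_oa_location&lt;/code&gt; for each DOI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Step 3 is not covered.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;
```python
import json
import urllib.parse
import urllib.request

UNPAYWALL_API = "https://api.unpaywall.org/v2/"


def fetch_unpaywall(doi, email):
    """Query the Unpaywall REST API for one DOI (the service requires an email parameter)."""
    url = UNPAYWALL_API + urllib.parse.quote(doi) + "?email=" + urllib.parse.quote(email)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def affiliations_of(record):
    """Collect affiliation strings from a Crossref-style record (assumed layout)."""
    return [
        aff["name"]
        for author in record.get("author", [])
        for aff in author.get("affiliation", [])
        if aff.get("name")
    ]


def full_text_url(oa_record):
    """Return the best OA PDF URL (or landing page URL), or None if the work is not OA."""
    if not oa_record or not oa_record.get("is_oa"):
        return None
    location = oa_record.get("best_oa_location") or {}
    return location.get("url_for_pdf") or location.get("url")


def recover_full_texts(records, is_swiss, lookup):
    """Steps 2.1 to 2.3: keep Swiss-affiliated records, check OA status, collect full-text URLs."""
    found = {}
    for record in records:
        affiliations = affiliations_of(record)
        if not affiliations:
            continue  # step 3 (records without affiliations) is not covered here
        if not any(is_swiss(a) for a in affiliations):
            continue  # step 2.1: not related to a Swiss institution
        url = full_text_url(lookup(record["DOI"]))  # step 2.2: OA check
        if url:
            found[record["DOI"]] = url  # step 2.3 would download the file from this URL
    return found
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because &lt;code&gt;lookup&lt;/code&gt; is a parameter, the filtering can be tested offline, for instance with a dictionary of stored Unpaywall responses. For live data, it can be replaced by &lt;code&gt;lambda doi: fetch_unpaywall(doi, email)&lt;/code&gt;.&lt;/p&gt;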

&lt;p&gt;You can find more information directly in the slides: &lt;a href=&#34;/documents/HEG-SONAR-ppt-March-28_v3.pdf&#34;&gt;Download PDF&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;img class=&#34;image fit&#34; src=&#34;/images/gif_recovering_fulltext.gif&#34; alt=&#34;Steps to recover full-text files of Swiss publications&#34; title=&#34;Steps to recover full-text files of Swiss publications&#34; &gt;&lt;/p&gt;</description>
    </item>
    
  </channel>
</rss>