by westurner 2374 days ago
This looks great: I like the search timeline, the ability to easily search for free full-text meta-analyses (a selection bias we should all be aware of), the MeSH term listing in a reasonably-sized font, and that there's schema.org/ImageObject metadata within the page, but there's no [Medical]ScholarlyArticle metadata?

I've worked with Google Scholar (:o) [1], Semantic Scholar (Allen Institute for AI) [2], Meta (Chan Zuckerberg Initiative) [3], Zotero [4], Mendeley [5] and a number of other tools for indexing and extracting metadata and graph relations from https://schema.org/ScholarlyArticle and MedicalScholarlyArticle resources. Without RDFa (or Microdata, or JSON-LD) in the PDF, there's a lot of parsing that has to happen in order to get a graph from the citations in the article. Each service adds value to this graph of resources. Pushing forward on publishing linked research that's reproducible (#LinkedResearch, #LinkedReproducibility) is a worthwhile investment in meta-research that we have barely yet addressed:

> http://Schema.org/NewsArticle .citation: https://schema.org/citation ... Wouldn't it be great if NewsArticles linked to the ScholarlyArticle and/or Notebook CreativeWorks that they're .about (with reified relations)?

> A practical use case: Alice wants to publish a ScholarlyArticle [1] (in HTML with structured data, as a PDF) predicated upon Datasets [2] (as CSV, CSVW JSONLD, XLSX (DataDownload)) with static HTML (and no special HTTP headers). [1] https://schema.org/ScholarlyArticle [2] https://schema.org/Dataset

> B wants to build a meta-analysis: to collect a number of ScholarlyArticles and Dataset DataDownloads; review study controls and data; merge, join, & concatenate Datasets if appropriate; and inductively or deductively infer a conclusion and suggestions for further studies of variance.
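
For illustration, here's a rough sketch of the kind of JSON-LD that Alice could embed in a static HTML page. It assumes schema.org's isBasedOn property is an acceptable way to say "predicated upon", and every example.org URL and title below is made up:

```python
import json


def scholarly_article_jsonld(article_url, title, dataset_url, csv_url):
    """Build a schema.org ScholarlyArticle that points at the Dataset it is based on."""
    return {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "@id": article_url,
        "url": article_url,
        "name": title,
        # isBasedOn is one possible way to express "predicated upon" (an assumption)
        "isBasedOn": {
            "@type": "Dataset",
            "@id": dataset_url,
            "url": dataset_url,
            "distribution": [
                {
                    "@type": "DataDownload",
                    "contentUrl": csv_url,
                    "encodingFormat": "text/csv",
                }
            ],
        },
    }


# Hypothetical URLs and title, for illustration only.
doc = scholarly_article_jsonld(
    "https://example.org/articles/1",
    "An example article",
    "https://example.org/datasets/1",
    "https://example.org/datasets/1/data.csv",
)

# A plain <script> element works with static HTML and no special HTTP headers.
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(doc, indent=2)
    + "\n</script>"
)
print(script_tag)
```

B's tooling could then follow isBasedOn and distribution to find the DataDownload for each article without scraping the PDF.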

The Linked Open Data Cloud shows the edges, the relations, the structured data links between very many (life sciences) datasets: https://lod-cloud.net/ . https://5stardata.info/en/ lists TimBL's suggested 5-star deployment scheme for Open Data, which culminates in publishing linked open data in non-proprietary formats that use URIs to describe and link to things.

Could any of these [1][2][3][4][5] services cross-link the described resources, given a common URI identifier such as https://schema.org/identifier and/or https://schema.org/url ? ORCID is a service for generating stable identifiers for researchers and publishers who have names in common but different emails. W3C DID solves for this need in a different way.
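
A minimal sketch of what that cross-linking could look like, assuming each service can export records carrying a schema.org identifier and/or url. The service names, record shape, and DOI here are hypothetical:

```python
from collections import defaultdict


def cross_link(records):
    """Group records from different services by a shared identifier URI."""
    sources_by_id = defaultdict(set)
    for record in records:
        # Prefer schema.org/identifier, fall back to schema.org/url.
        key = record.get("identifier") or record.get("url")
        if key:
            sources_by_id[key.strip()].add(record["source"])
    # Keep only resources that at least two services describe.
    return {
        key: sorted(sources)
        for key, sources in sources_by_id.items()
        if len(sources) > 1
    }


# Hypothetical records; the field names are assumptions, not any service's API.
records = [
    {"source": "service_a", "identifier": "https://doi.org/10.1234/example"},
    {"source": "service_b", "url": "https://doi.org/10.1234/example"},
    {"source": "service_b", "identifier": "https://doi.org/10.1234/other"},
]

links = cross_link(records)
print(links)
```

Real identifiers would need more normalization than strip() (DOI case, http vs https, trailing slashes), so treat this as the shape of the idea rather than a working matcher.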

When I check an article result page with the OpenLink OSDS extension (or any of a number of other tools for extracting structured data from HTML pages (and documents!) https://github.com/CodeForAntarctica/codeforantarctica.githu... ), there could be quite a bit more data there for search engines, browser extensions, and meta-research tools.
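
As a rough idea of what such a tool does, here's a stdlib-only sketch that pulls JSON-LD blocks out of an HTML page. It only handles JSON-LD (not RDFa or Microdata), it isn't how OSDS works internally, and the sample page is made up:

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page


# Made-up page for illustration.
html = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "ScholarlyArticle", "name": "Example article"}
</script>
<script>var notStructuredData = true;</script>
</head><body><p>Hello</p></body></html>"""

parser = JsonLdExtractor()
parser.feed(html)
types = [d.get("@type") for d in parser.documents if isinstance(d, dict)]
print(types)
```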

Is this something like ElasticSearch on the backend? It is possible to store JSON-LD documents in the search index. I threw together elasticsearchjsonld to "Generate JSON-LD @contexts from ElasticSearch JSON Mappings" for the OpenFDA FAERS data a few years ago. That's not GraphQL or SPARQL, but it's something and it's Linked Data.
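
To show the general idea (this is not the actual elasticsearchjsonld code), here's a sketch that walks an ElasticSearch-style mapping and emits a JSON-LD @context. The vocabulary URL, the field names, and the ES-to-XSD type table are all assumptions, and nested objects use JSON-LD 1.1 scoped contexts:

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"

# Rough, incomplete ES-to-XSD type table (an assumption for this sketch).
ES_TO_XSD = {
    "date": "dateTime",
    "integer": "integer",
    "long": "integer",
    "float": "double",
    "double": "double",
    "boolean": "boolean",
}


def terms_from_properties(properties, vocab, prefix=""):
    """Turn a mapping's "properties" dict into JSON-LD term definitions."""
    terms = {}
    for name, spec in properties.items():
        path = prefix + name
        term = {"@id": vocab + path}
        if "properties" in spec:
            # Object field: define its children in a scoped context.
            term["@context"] = terms_from_properties(
                spec["properties"], vocab, path + "/"
            )
        else:
            xsd_type = ES_TO_XSD.get(spec.get("type"))
            if xsd_type:
                term["@type"] = XSD + xsd_type
        terms[name] = term
    return terms


def context_from_mapping(mapping, vocab):
    terms = terms_from_properties(mapping.get("properties", {}), vocab)
    return {"@context": {"@vocab": vocab, **terms}}


# Illustrative mapping; not the real FAERS mapping.
VOCAB = "https://example.org/vocab/"
mapping = {
    "properties": {
        "receivedate": {"type": "date"},
        "serious": {"type": "keyword"},
        "patient": {"properties": {"age": {"type": "integer"}}},
    }
}

context = context_from_mapping(mapping, VOCAB)
print(json.dumps(context, indent=2))
```

Every field then gets a URI (the "URIs for columns" idea), and documents in the index can be read as Linked Data by attaching the generated @context.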

re: "Canada's Decision To Make Public More Clinical Trial Data Puts Pressure On FDA" https://news.ycombinator.com/item?id=21232183

> We really could get more out of this data through international collaboration and through linked data (e.g. URIs for columns). See: "Open, and Linked, FDA data" https://github.com/FDA/openfda/issues/5#issuecomment-5392966... and "ENH: Adverse Event Count / 'Use' Count Heatmap" https://github.com/FDA/openfda/issues/49 . With sales/usage counts, we'd have a denominator with which we could calculate relative hazard.
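
The denominator point, as arithmetic: a crude reporting rate is adverse event reports divided by uses, and the ratio of two such rates gives a rough relative measure. All of the counts below are invented, and a reporting rate is only a proxy for a true hazard:

```python
def reporting_rate(event_count, use_count):
    """Adverse event reports per use (a crude proxy, not a true hazard rate)."""
    if use_count <= 0:
        raise ValueError("use_count must be positive")
    return event_count / use_count


def rate_ratio(events_a, uses_a, events_b, uses_b):
    """Ratio of drug A's reporting rate to drug B's."""
    if uses_a <= 0 or events_b <= 0:
        raise ValueError("uses_a and events_b must be positive")
    # Cross-multiply to avoid dividing two small floating point rates.
    return (events_a * uses_b) / (events_b * uses_a)


# Invented counts, for illustration only.
ratio = rate_ratio(events_a=30, uses_a=10_000, events_b=10, uses_b=20_000)
print(reporting_rate(30, 10_000), reporting_rate(10, 20_000), ratio)
```

Without the uses column, a drug with more reports can look worse just because it is used more.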

W3C Web Annotations handle threaded comments and highlights; reviewing the reviewers is left as an exercise for the reader. Does Zotero still make it easy to save the bibliographic metadata for one or more ScholarlyArticles from PubMed to a collection in the cloud (and add metadata/annotations)?

Sorry to toot my own horn here. Great job on this. This opens up many new opportunities for research.

[1] https://scholar.google.com

[2] https://www.semanticscholar.org/

[3] https://www.meta.org/

[4] https://zotero.org/

[5] https://mendeley.org/

2 comments

PubMed publishes its dataset for download. It's rather large, but update files come frequently. It's amazing. I believe NIH adds the MeSH terms.

ftp://ftp.ncbi.nlm.nih.gov/pubmed/

We had someone do a project with it: they downloaded the dataset and built a tool for searches that we found useful for finding collaborators (last author, working on a specific gene, paper counts, most recent).
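
Not our actual tool, but a minimal sketch of that kind of search: count papers per last author for a given MeSH term. It assumes the PubmedArticle / MedlineCitation / MeshHeadingList layout of the PubMed XML files, and the tiny sample below is fabricated:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Fabricated sample in the assumed PubMed XML layout.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle><MedlineCitation>
    <PMID>1</PMID>
    <Article><AuthorList>
      <Author><LastName>Alpha</LastName><ForeName>A</ForeName></Author>
      <Author><LastName>Omega</LastName><ForeName>O</ForeName></Author>
    </AuthorList></Article>
    <MeshHeadingList>
      <MeshHeading><DescriptorName>Zebrafish</DescriptorName></MeshHeading>
    </MeshHeadingList>
  </MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation>
    <PMID>2</PMID>
    <Article><AuthorList>
      <Author><LastName>Omega</LastName><ForeName>O</ForeName></Author>
    </AuthorList></Article>
    <MeshHeadingList>
      <MeshHeading><DescriptorName>Zebrafish</DescriptorName></MeshHeading>
    </MeshHeadingList>
  </MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation>
    <PMID>3</PMID>
    <Article><AuthorList>
      <Author><LastName>Beta</LastName><ForeName>B</ForeName></Author>
    </AuthorList></Article>
    <MeshHeadingList>
      <MeshHeading><DescriptorName>Mice</DescriptorName></MeshHeading>
    </MeshHeadingList>
  </MedlineCitation></PubmedArticle>
</PubmedArticleSet>"""


def last_author_counts(xml_text, mesh_term):
    """Count papers per last author among citations tagged with mesh_term."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for citation in root.iter("MedlineCitation"):
        terms = {d.text for d in citation.iter("DescriptorName")}
        if mesh_term not in terms:
            continue
        authors = citation.findall("./Article/AuthorList/Author")
        if not authors:
            continue
        last = authors[-1]
        # Names are matched as plain strings, so namesakes get merged.
        name = f"{last.findtext('LastName', '')} {last.findtext('ForeName', '')}".strip()
        counts[name] += 1
    return counts


counts = last_author_counts(SAMPLE, "Zebrafish")
print(counts.most_common())
```

The real files are much larger, so something like ET.iterparse over the decompressed file would probably be a better fit than loading each one whole.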

Searching by MeSH terms across species, and searching with orthologs.

The dataset sometimes has a hard time disambiguating names (I think the European dataset assigns IDs to names).

To make sure your feedback is heard, please use the "Feedback" button found in the bottom-right corner of https://pubmed.gov/labs