Commit

Change authors to list[dict['name' | 'url' | ..., str]] (#70)
* insert shebang and script check

* pre-commit run

* convert to dict of name and url

* convert more to dict

* convert publications

* remove comma

* Update data/packages.yml

Co-authored-by: Meesum Qazalbash <[email protected]>

* fix __main__ loop and define class Author(TypedDict)

TODO author validation

* make script path absolute

* revert reposition of module level vars

* reapply somehow disappeared abs dir change

* raise ValueError on non-https author URLs

fix yaml whitespace

---------

Co-authored-by: Meesum Qazalbash <[email protected]>
Co-authored-by: Janosh Riebesell <[email protected]>
3 people authored Sep 22, 2024
1 parent 1293330 commit a59d334
Showing 8 changed files with 577 additions and 234 deletions.
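In essence, the commit converts each comma-separated author string into a list of per-author dicts. A minimal before/after sketch of the data shape (entry taken from data/applications.yml below; the optional url, affiliation, github, and orcid keys come from the new Author TypedDict):

# Before #70: all authors in one comma-separated string.
old_entry = {
    "title": "Latent Space Policies for Hierarchical Reinforcement Learning",
    "authors": "Tuomas Haarnoja, Kristian Hartikainen, Pieter Abbeel, Sergey Levine",
}

# After #70: one dict per author. Only "name" is required; "url" must be
# https:// if present, and affiliation/github/orcid remain optional.
new_entry = {
    "title": "Latent Space Policies for Hierarchical Reinforcement Learning",
    "authors": [
        {"name": "Tuomas Haarnoja"},
        {"name": "Kristian Hartikainen"},
        {"name": "Pieter Abbeel"},
        {"name": "Sergey Levine"},
    ],
}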
68 changes: 59 additions & 9 deletions data/applications.yml
@@ -1,47 +1,97 @@
- title: Latent Space Policies for Hierarchical Reinforcement Learning
url: https://arxiv.org/abs/1804.02808
date: 2018-04-09
authors: Tuomas Haarnoja, Kristian Hartikainen, Pieter Abbeel, Sergey Levine
authors:
- name: Tuomas Haarnoja
- name: Kristian Hartikainen
- name: Pieter Abbeel
- name: Sergey Levine
description: Uses normalizing flows, specifically RealNVPs, as policies for reinforcement learning and also applies them for the hierarchical reinforcement learning setting.

- title: Analyzing Inverse Problems with Invertible Neural Networks
url: https://arxiv.org/abs/1808.04730
date: 2018-08-14
authors: Lynton Ardizzone, Jakob Kruse, Sebastian Wirkert, Daniel Rahner, Eric W. Pellegrini, Ralf S. Klessen, Lena Maier-Hein, Carsten Rother, Ullrich Köthe
authors:
- name: Lynton Ardizzone
- name: Jakob Kruse
- name: Sebastian Wirkert
- name: Daniel Rahner
- name: Eric W. Pellegrini
- name: Ralf S. Klessen
- name: Lena Maier-Hein
- name: Carsten Rother
- name: Ullrich Köthe
description: Normalizing flows for inverse problems.

- title: NeuTra-lizing Bad Geometry in Hamiltonian Monte Carlo Using Neural Transport
url: https://arxiv.org/abs/1903.03704
date: 2019-03-09
authors: Matthew Hoffman, Pavel Sountsov, Joshua V. Dillon, Ian Langmore, Dustin Tran, Srinivas Vasudevan
authors:
- name: Matthew Hoffman
- name: Pavel Sountsov
- name: Joshua V. Dillon
- name: Ian Langmore
- name: Dustin Tran
- name: Srinivas Vasudevan
description: Uses normalizing flows in conjunction with Monte Carlo estimation to have more expressive distributions and better posterior estimation.

- title: 'SRFlow: Learning the Super-Resolution Space with Normalizing Flow'
- title: "SRFlow: Learning the Super-Resolution Space with Normalizing Flow"
url: https://arxiv.org/abs/2006.14200
date: 2020-06-25
authors: Andreas Lugmayr, Martin Danelljan, Luc Van Gool, Radu Timofte
authors:
- name: Andreas Lugmayr
- name: Martin Danelljan
- name: Luc Van Gool
- name: Radu Timofte
description: Uses normalizing flows for super-resolution.

- title: Faster Uncertainty Quantification for Inverse Problems with Conditional Normalizing Flows
url: https://arxiv.org/abs/2007.07985
date: 2020-07-15
authors: Ali Siahkoohi, Gabrio Rizzuti, Philipp A. Witte, Felix J. Herrmann
authors:
- name: Ali Siahkoohi
- name: Gabrio Rizzuti
- name: Philipp A. Witte
- name: Felix J. Herrmann
description: Uses conditional normalizing flows for inverse problems. [[Video](https://youtu.be/nPvZIKaRBkI)]

- title: Targeted free energy estimation via learned mappings
url: https://aip.scitation.org/doi/10.1063/5.0018903
date: 2020-10-13
authors: Peter Wirnsberger, Andrew J. Ballard, George Papamakarios, Stuart Abercrombie, Sébastien Racanière, Alexander Pritzel, Danilo Jimenez Rezende, Charles Blundell
authors:
- name: Peter Wirnsberger
- name: Andrew J. Ballard
- name: George Papamakarios
- name: Stuart Abercrombie
- name: Sébastien Racanière
- name: Alexander Pritzel
- name: Danilo Jimenez Rezende
- name: Charles Blundell
description: Normalizing flows used to estimate free energy differences.

- title: On the Sentence Embeddings from Pre-trained Language Models
url: https://aclweb.org/anthology/2020.emnlp-main.733
date: 2020-11-02
authors: Bohan Li, Hao Zhou, Junxian He, Mingxuan Wang, Yiming Yang, Lei Li
authors:
- name: Bohan Li
- name: Hao Zhou
- name: Junxian He
- name: Mingxuan Wang
- name: Yiming Yang
- name: Lei Li
description: Proposes to use flows to transform anisotropic sentence embedding distributions from BERT to a smooth and isotropic Gaussian, learned through unsupervised objective. Demonstrates performance gains over SOTA sentence embeddings on semantic textual similarity tasks. Code available at <https://github.com/bohanli/BERT-flow>.

- title: Normalizing Kalman Filters for Multivariate Time Series Analysis
url: https://assets.amazon.science/ea/0c/88b7bdd54eae8c08983fa4cc3e06/normalizing-kalman-filters-for-multivariate-time-series-analysis.pdf
date: 2020-12-06
authors: Emmanuel de Bézenac, Syama Sundar Rangapuram, Konstantinos Benidis, Michael Bohlke-Schneider, Richard Kurle, Lorenzo Stella, Hilaf Hasson, Patrick Gallinari, Tim Januschowski
authors:
- name: Emmanuel de Bézenac
- name: Syama Sundar Rangapuram
- name: Konstantinos Benidis
- name: Michael Bohlke-Schneider
- name: Richard Kurle
- name: Lorenzo Stella
- name: Hilaf Hasson
- name: Patrick Gallinari
- name: Tim Januschowski
description: Augments state space models with normalizing flows and thereby mitigates imprecisions stemming from idealized assumptions. Aimed at forecasting real-world data and handling varying levels of missing data. (Also available at [Amazon Science](https://amazon.science/publications/normalizing-kalman-filters-for-multivariate-time-series-analysis).)
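To see the new format from the consumer side, here is a standalone sketch (not part of this commit) that parses a truncated entry with PyYAML, the same loader make_readme.py uses:

import yaml

snippet = """
- title: Analyzing Inverse Problems with Invertible Neural Networks
  url: https://arxiv.org/abs/1808.04730
  date: 2018-08-14
  authors:
    - name: Lynton Ardizzone
    - name: Jakob Kruse
  description: Normalizing flows for inverse problems.
"""

(item,) = yaml.safe_load(snippet)  # the snippet holds a one-item list
# Authors are now dicts keyed by "name" (plus optional "url" etc.) rather
# than substrings of one comma-separated string.
print([auth["name"] for auth in item["authors"]])
# -> ['Lynton Ardizzone', 'Jakob Kruse']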
201 changes: 112 additions & 89 deletions data/make_readme.py
100644 → 100755
@@ -1,25 +1,36 @@
#!/usr/bin/env python3

"""Script to generate readme.md from data/*.yml files."""

import datetime
import os
import re
from os.path import dirname
from typing import TypedDict

import yaml

ROOT = dirname(dirname(__file__))
ROOT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))


class Author(TypedDict):
"""An author of a paper or application."""

name: str
url: str | None
affiliation: str | None
github: str | None
orcid: str | None


class Item(TypedDict):
"""An item in a readme section like a paper or package."""

title: str
authors: str
authors: list[Author]
date: datetime.date
lang: str
url: str
description: str
authors_url: str | None
repo: str | None
date_added: datetime.date | None

@@ -44,7 +55,7 @@ class Section(TypedDict):

def load_items(key: str) -> list[Item]:
"""Load list[Item] from YAML file."""
with open(f"{ROOT}/data/{key}.yml", encoding="utf8") as file:
with open(f"{ROOT_DIR}/data/{key}.yml", encoding="utf8") as file:
return yaml.safe_load(file.read())


@@ -53,10 +64,9 @@ def load_items(key: str) -> list[Item]:
for key in titles # markdown is set below
}


seen_titles: set[tuple[str, str]] = set()
required_keys = {"title", "url", "date", "authors", "description"}
optional_keys = {"authors_url", "lang", "repo", "docs", "date_added", "last_updated"}
optional_keys = {"lang", "repo", "docs", "date_added", "last_updated"}
valid_languages = {"PyTorch", "TensorFlow", "JAX", "Julia", "Other"}
et_al_after = 2

@@ -72,7 +82,7 @@ def validate_item(itm: Item, section_title: str) -> None:
else:
seen_titles.add((title, section_title))

if section_title in ("packages", "repos") and itm["lang"] not in valid_languages:
if section_title in {"packages", "repos"} and itm["lang"] not in valid_languages:
errors += [
f"Invalid lang in {title}: {itm['lang']}, must be one of {valid_languages}"
]
@@ -101,87 +111,100 @@ def validate_item(itm: Item, section_title: str) -> None:
raise ValueError("\n".join(errors))


for key, section in sections.items():
# Keep lang_names inside sections loop to refill language subsections for each new
# section. Used by both repos and Packages. Is a list for order and mutability.
lang_names = ["PyTorch", "TensorFlow", "JAX", "Julia", "Other"]

# sort first by language with order determined by lang_names (only applies to
# Package and repos sections), then by date
section["items"].sort(key=lambda x: x["date"], reverse=True)
if key in ("packages", "repos"):
section["items"].sort(key=lambda itm: lang_names.index(itm["lang"]))

# add item count after section title
section["markdown"] += f" <small>({len(section['items'])})</small>\n\n"

for itm in section["items"]:
if (lang := itm.get("lang")) in lang_names:
lang_names.remove(lang)
# print language subsection title if this is the first item with that lang
section["markdown"] += (
f'<br>\n\n### <img src="assets/{lang.lower()}.svg" alt="{lang}" '
f'height="20px"> &nbsp;{lang} {key.title()}\n\n'
if __name__ == "__main__":
for key, section in sections.items():
# Keep lang_names inside sections loop to refill language
# subsections for each new section. Used by both repos and Packages.
# Is a list for order and mutability.
lang_names = ["PyTorch", "TensorFlow", "JAX", "Julia", "Other"]

# sort first by language with order determined by lang_names (only applies to
# Package and repos sections), then by date
section["items"].sort(key=lambda x: x["date"], reverse=True)
if key in ("packages", "repos"):
section["items"].sort(key=lambda itm: lang_names.index(itm["lang"]))

# add item count after section title
section["markdown"] += f" <small>({len(section['items'])})</small>\n\n"

for itm in section["items"]:
if (lang := itm.get("lang")) in lang_names:
lang_names.remove(lang)
# print language subsection title if this is the first item
# with that language
section["markdown"] += (
f'<br>\n\n### <img src="assets/{lang.lower()}.svg" alt="{lang}" '
f'height="20px"> &nbsp;{lang} {key.title()}\n\n'
)

validate_item(itm, section["title"])

authors = itm["authors"]
date = itm["date"]
description = itm["description"]
title = itm["title"]
url = itm["url"]

if key in ("publications", "applications"):
# only show people's last name for papers
authors = [
auth | {"name": auth["name"].split(" ")[-1]} for auth in authors
]

def auth_str(auth: Author) -> str:
"""Return a markdown string for an author."""
auth_str = auth["name"]
if url := auth.get("url"):
if not url.startswith("https://"):
raise ValueError(
f"Invalid author {url=}, must start with https://"
)
auth_str = f"[{auth_str}]({url})"
return auth_str

authors_str = ", ".join(map(auth_str, authors[:et_al_after]))
if len(authors) > et_al_after:
authors_str += " et al."

md_str = f"1. {date} - [{title}]({url}) by {authors_str}"

if key in ("packages", "repos") and url.startswith("https://github.com"):
gh_login, repo_name = url.split("/")[3:5]
md_str += (
f'\n&ensp;\n<img src="https://img.shields.io/github/stars/'
f'{gh_login}/{repo_name}" alt="GitHub repo stars"'
' valign="middle" />'
)

md_str += "<br>\n " + description.removesuffix("\n")
if docs := itm.get("docs"):
md_str += f" [[Docs]({docs})]"
if repo := itm.get("repo"):
md_str += f" [[Code]({repo})]"

section["markdown"] += md_str + "\n\n"

with open(f"{ROOT_DIR}/readme.md", "r+", encoding="utf8") as file:
readme = file.read()

for section in sections.values():
# look ahead without matching
section_start_pat = f"(?<={section['title']})"
# look behind without matching
next_section_pat = "(?=<br>\n\n## )"

# match everything up to next heading
readme = re.sub(
rf"{section_start_pat}[\s\S]+?\n\n{next_section_pat}",
section["markdown"],
readme,
)

validate_item(itm, section["title"])

authors = itm["authors"]
date = itm["date"]
description = itm["description"]
title = itm["title"]
url = itm["url"]

author_list = authors.split(", ")
if key in ("publications", "applications"):
# only show people's last name for papers
author_list = [author.split(" ")[-1] for author in author_list]
authors = ", ".join(author_list[:et_al_after])
if len(author_list) > et_al_after:
authors += " et al."

if authors_url := itm.get("authors_url"):
authors = f"[{authors}]({authors_url})"

md_str = f"1. {date} - [{title}]({url}) by {authors}"

if key in ("packages", "repos") and url.startswith("https://github.com"):
gh_login, repo_name = url.split("/")[3:5]
md_str += (
f'\n&ensp;\n<img src="https://img.shields.io/github/stars/'
f'{gh_login}/{repo_name}" alt="GitHub repo stars" valign="middle" />'
)

md_str += "<br>\n " + description.removesuffix("\n")
if docs := itm.get("docs"):
md_str += f" [[Docs]({docs})]"
if repo := itm.get("repo"):
md_str += f" [[Code]({repo})]"

section["markdown"] += md_str + "\n\n"


with open(f"{ROOT}/readme.md", "r+", encoding="utf8") as file:
readme = file.read()

for section in sections.values():
# look ahead without matching
section_start_pat = f"(?<={section['title']})"
# look behind without matching
next_section_pat = "(?=<br>\n\n## )"

# match everything up to next heading
readme = re.sub(
rf"{section_start_pat}[\s\S]+?\n\n{next_section_pat}",
section["markdown"],
readme,
)

file.seek(0)
file.write(readme)
file.truncate()
file.seek(0)
file.write(readme)
file.truncate()

section_counts = "\n".join(
f"- {key}: {len(sec['items'])}" for key, sec in sections.items()
)
print(f"finished writing {len(seen_titles)} items to readme:\n{section_counts}") # noqa: T201
section_counts = "\n".join(
f"- {key}: {len(sec['items'])}" for key, sec in sections.items()
)
print(f"finished writing {len(seen_titles)} items to readme:\n{section_counts}") # noqa: T201