This repository has been archived by the owner on Aug 15, 2023. It is now read-only.
gwern/archive-text-urls
It occurred to me once that it might be neat to have a CLI tool which would parse a text file for strings like "http://" and find the longest valid URL: e.g. "see http://www.google.com " can easily be turned into the correct URL just by starting at "http" and eating characters until you reach " ", since a literal space is not valid in a URL unless it has been escaped as "%20". So I did a little work on such a tool. It didn't work well. My ultimate solution was to realize that I only cared about the URLs in my Markdown files, and to sit down and write a Pandoc script to parse the Markdown and extract the URLs. See <http://www.gwern.net/haskell/link-extractor.hs>.
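The naive approach described above (start at "http", eat until whitespace) can be sketched in a few lines of Python. This is an illustration of the abandoned heuristic, not the repository's actual code; the trailing-punctuation strip is an assumed extra heuristic, since prose often ends a URL with a period or closing paren:

```python
import re

# Naive extraction: find "http://" or "https://" and consume
# characters up to the next whitespace, as described above.
URL_RE = re.compile(r'https?://\S+')

def extract_urls(text):
    urls = []
    for match in URL_RE.finditer(text):
        # Strip common trailing punctuation from surrounding prose
        # (a rough heuristic; real URLs can legitimately end in these).
        urls.append(match.group().rstrip('.,;:)"\''))
    return urls

print(extract_urls('see http://www.google.com for details.'))
```

As the README notes, this kind of heuristic breaks down quickly on real text, which is why parsing the Markdown structure with Pandoc proved to be the better approach.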
About
Parse freeform text files looking for plausible URLs to archive (abandoned)