GitHub - EthanRosenthal/rec-a-sketch: content discovery... IN 3D

Repo for building Sketchfab recommendations. Collecting data, training algorithms, and serving recommendations on a website will all be here.

This repo will likely not work with Python 2 due to various encoding issues.

Some of the crawling processes use Selenium. For these to work, you must provide a path to your browser driver in config.yml. See here for links to download the driver binary.

Collecting data

crawl.py

Use this script to crawl the Sketchfab site and collect data. It currently supports four processes, selected with the --type argument:

urls - Grab the URL of every Sketchfab model with at least LIKE_LIMIT likes, where LIKE_LIMIT is defined in the config.
likes - Given collected model urls, collect users who have liked those models.
features - Given collected model urls, collect categories and tags associated with those models.
thumbs - Given collected model urls, collect 200x200 pixel thumbnails of each model.

Run it like this:

python crawl.py config.yml --type urls

I ran into lots of timeout issues when crawling features. To pick back up at a particular row of the urls file, pass --start row_number as an optional argument.
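The resume behaviour described above can be sketched as follows. This is an illustrative reconstruction, not the code in crawl.py: the names `build_parser`, `rows_from`, and `crawl_type` are hypothetical, and it assumes `--start` is a zero-based row index, which the README does not specify.

```python
import argparse
import itertools

# The four process names listed in this README.
CRAWL_TYPES = ("urls", "likes", "features", "thumbs")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented command line (hypothetical)."""
    parser = argparse.ArgumentParser(description="Illustrative crawl CLI sketch.")
    parser.add_argument("config", help="Path to the YAML config file")
    parser.add_argument("--type", dest="crawl_type", choices=CRAWL_TYPES, required=True)
    # Assumption: rows are counted from 0; the real script may count differently.
    parser.add_argument("--start", type=int, default=0,
                        help="Row of the urls file to resume from")
    return parser


def rows_from(rows, start):
    """Skip the first `start` rows so an interrupted crawl can pick back up."""
    return itertools.islice(rows, start, None)


# Equivalent to: python crawl.py config.yml --type features --start 2
args = build_parser().parse_args(["config.yml", "--type", "features", "--start", "2"])
remaining = list(rows_from(["url0", "url1", "url2", "url3"], args.start))
```

Here `remaining` holds only the rows from index 2 onward, so rows crawled before the timeout are not fetched again.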

anonymize.py

Used to anonymize the user_id values in the likes data. Granted, one could probably reverse the anonymization, but it serves as a small privacy barrier.

To run it, you must define a secret key for hashing the user_id values:

python anonymize.py unanonymized_likes.csv anonymized_likes.csv "SECRET KEY"
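One way to hash IDs with a secret key is a keyed digest such as HMAC-SHA256. The sketch below shows that idea only. It is an assumption about how such anonymization could work, not the implementation in anonymize.py, and the function names `anonymize_user_id` and `anonymize_rows` are hypothetical.

```python
import hashlib
import hmac


def anonymize_user_id(user_id: str, secret_key: str) -> str:
    """Return a keyed SHA-256 digest of user_id as a hex string."""
    return hmac.new(
        secret_key.encode("utf-8"),
        user_id.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()


def anonymize_rows(rows, secret_key, id_column="user_id"):
    """Yield a copy of each row dict with id_column replaced by its digest."""
    for row in rows:
        out = dict(row)
        out[id_column] = anonymize_user_id(row[id_column], secret_key)
        yield out


# The same user always maps to the same digest, so likes stay linkable per user.
rows = [{"user_id": "alice", "model": "m1"}, {"user_id": "alice", "model": "m2"}]
hashed = list(anonymize_rows(rows, "SECRET KEY"))
```

Because the digest depends on the key, someone without the key cannot simply hash a list of known user IDs and match them against the file. Anyone who holds the key can, which is why the barrier is a small one.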

The data

Model urls, likes, and features are all in the /data directory. These were collected around October 2016.

All data are pipe-separated CSV files with headers. They can be read with pandas read_csv() using the keyword arguments sep='|', quoting=csv.QUOTE_MINIMAL, and escapechar='\\'.
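As a minimal sketch, the files can be loaded with a small helper like the one below. The helper name `load_pipe_csv` and the sample columns `model_id` and `name` are hypothetical and chosen only for illustration; the real column names come from the headers of the files in /data.

```python
import csv
import io

import pandas as pd


def load_pipe_csv(path_or_buffer) -> pd.DataFrame:
    """Read a pipe-separated file using the quoting settings described above."""
    return pd.read_csv(
        path_or_buffer,
        sep="|",
        quoting=csv.QUOTE_MINIMAL,
        escapechar="\\",
    )


# In-memory stand-in for a data file; the backslash escapes a literal pipe.
sample = io.StringIO("model_id|name\nabc123|Robot \\| Droid\n")
df = load_pipe_csv(sample)
```

The escapechar setting matters for free-text fields such as model names, which can contain the pipe character themselves.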
