
VRSE-app/vrse-search


VRSE (Visual Research Search Engine)

VRSE (Visual Research Search Engine) is a visual search engine application that runs in a Docker container on your local machine. Depending on time and financial constraints, the application may later be deployed to a server and made available at vrse.app.

Setup Documentation

This project uses a large amount of data obtained from the Semantic Scholar Open Research Corpus. Running it locally requires downloading roughly 170 GB of zip archives, which are then programmatically unzipped, written to the Elasticsearch database, and deleted one by one.

WARNING: running this code is not recommended unless you are sure you want to dedicate this much disk space.

Steps

Download the data set

Download the full research corpus from Semantic Scholar. This requires the AWS CLI:

aws s3 cp --no-sign-request --recursive s3://ai2-s2-research-public/open-corpus/2020-11-06/ destinationPath

Alternatively, you can download the manifest via HTTP and use it to download all the archive files via HTTP as well. Note that this is noticeably slower and requires wget:

wget https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/2020-11-06/manifest.txt
wget -B https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/2020-11-06/ -i manifest.txt

Configure Server, Client, and ES (Elasticsearch) with Docker

All dependencies related to this project are managed using Docker. You can start the containers for this project by running:

docker-compose up -d --build

This command both builds the images and starts the containers. Once complete, the logs in your Docker Compose application should show that all three components of this application have started and been set up. You should then be able to visit each part of the application on its corresponding port on localhost.

Note that this command may take some time to run, especially on the first build, which can take several minutes.
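The compose file itself is not shown in these notes, but based on the stack described here (Elasticsearch as the database, GatsbyJS as the client, and Express/Node as the server), it likely looks roughly like the following sketch. The image version, service names, build paths, and ports are all assumptions, not the project's actual configuration:

```yaml
version: "3"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.9.3
    environment:
      - discovery.type=single-node   # single node is enough for local development
    ports:
      - "9200:9200"
  server:
    build: ./server                  # Express/Node API
    ports:
      - "4000:4000"
    depends_on:
      - elasticsearch
  client:
    build: ./client                  # GatsbyJS front end
    ports:
      - "8000:8000"
    depends_on:
      - server
```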

Importing the data into Elasticsearch

Given the volume of data imported into Elasticsearch (approximately 170 GB), this process can take a long time to execute and requires some manual adjustment. From the root of the server directory you can run the test.js file using the following command. Should you wish to reset the index in Elasticsearch, there is a function at the top of the file called resetIndex which can be uncommented.

node test.js

Configure the server and client so that Elasticsearch and all related dependencies function inside the Docker containers.

For setup of other dependencies and the project itself, see the relevant README documentation in each folder. (These will eventually be unified into one comprehensive installation README.md.)

Useful Sources

Next to try:

Meeting w/ Bob 1st December

  • Sketches update
  • Update on data access: 175 million papers with links - downloaded 175 GB of zip files
  • Visualised a sampled subset of the dataset, but there were no connections because the 100 papers had no citations in common
  • Meeting with Bibliometric Mapping specialist: discussed Scopus and Web of Science - APIs where we can query data on papers
  • Set up project dependencies with Docker for elasticsearch (database), GatsbyJS (frontend), and ExpressJS/NodeJS (api/server)
  • Started bibliography/references index and set up documents for the interim report and final report
  • Early collection of thoughts for the background section
  • Research into deployment: deploying the tool will require hosting the database and server on a virtual machine. There are college resources that do this; after speaking to ICT, they said renting server space from the college is a typical use case, so I should discuss it with my supervisor (virtual machines are not free). External services could also be effective (some cheaper than others), but I am not sure whether getting funding approved would be easier or harder. Deploying the tool would make testing with users and evaluating its utility much more feasible, because it would actually be possible to run user tests with both known and unknown users to collect and analyse data on user flows and experiences. It will likely take at least until February or early March before all the code is structured and secure enough to deploy.

Plan for next week

  • Create database design for how the data will be stored (this could take some time because there is a lot of data and it would be very wasteful to make a mistake)
  • Load static data (downloaded data) into elasticsearch database
  • Set up endpoints on API to return nodes and links to the front end
  • Return results based on search input from the front end as a list. The visual layer will be the next layer of complexity to set up, and then the plan is to iterate on each aspect of the system: more complete wireframes/mockups, plus a colour palette and design layout system to create both a product and a tool (branding and the look and feel of the interface could significantly affect user experience and conversion)

Core links for the next phase of the project

This Weekend

  • Set up elasticsearch database
  • Import a test dataset of files and then set up the full database

Done

  • Migrate Koa app to express and get it to work exactly the same way

These dependencies seem to be pulled in from the package lock:

npm WARN deprecated @hapi/[email protected]: Switch to 'npm install joi'
npm WARN deprecated @hapi/[email protected]: Moved to 'npm install @sideway/address'
npm WARN deprecated @hapi/[email protected]: This version has been deprecated and is no longer supported or maintained
npm WARN deprecated @hapi/[email protected]: This version has been deprecated and is no longer supported or maintained
npm WARN deprecated @hapi/[email protected]: This version has been deprecated and is no longer supported or maintained
npm WARN deprecated [email protected]: Version no longer supported. Upgrade to @latest
npm WARN deprecated [email protected]: core-js@<3 is no longer maintained and not recommended for usage due to the number of issues. Please, upgrade your dependencies to the actual version of core-js@3.

npm WARN optional SKIPPING OPTIONAL DEPENDENCY: fsevents@~2.1.2 (node_modules/gatsby-cli/node_modules/chokidar/node_modules/fsevents):
npm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for [email protected]: wanted {"os":"darwin","arch":"any"} (current: {"os":"linux","arch":"x64"})

https://opensourceconnections.com/blog/2019/05/29/falsehoods-programmers-believe-about-search/

We are currently having so many problems with Elasticsearch that it may be worth using MongoDB for a smaller MVP, so that something is working and serving queries, although this in a way pushes more work further back in the project.

A MongoDB version of this project would likely also be far easier to deploy and set up, but it would certainly be far slower. To do something truly amazing, this speed decrease would be a bottleneck; but could we get a great grade without making something truly amazing?
