Update ipfsapi dependency to new name, adapt ipwb tests to Py3 #609

Merged
merged 14 commits into from
Jun 14, 2019
6 changes: 3 additions & 3 deletions .travis.yml
@@ -1,8 +1,9 @@
dist: xenial
language: python
python:
- "2.7"
- "3.7"
before_script:
- wget "https://dist.ipfs.io/go-ipfs/v0.4.19/go-ipfs_v0.4.19_linux-amd64.tar.gz" -O /tmp/ipfs.tar.gz
- wget "https://dist.ipfs.io/go-ipfs/v0.4.21/go-ipfs_v0.4.21_linux-amd64.tar.gz" -O /tmp/ipfs.tar.gz
#- mkdir $HOME/bin
- pushd . && cd $HOME/bin && tar -xzvf /tmp/ipfs.tar.gz && popd
- export PATH="$HOME/bin/go-ipfs:$PATH"
@@ -17,6 +18,5 @@ script:
- ipfs init
- ipfs daemon & sleep 10
- py.test --cov=./
dist: trusty
after_success:
- codecov
2 changes: 1 addition & 1 deletion Dockerfile
@@ -22,7 +22,7 @@ RUN mkdir -p /data/{warc,cdxj,ipfs}

# Download and install IPFS
ENV IPFS_PATH=/data/ipfs
ARG IPFS_VERSION=v0.4.19
ARG IPFS_VERSION=v0.4.21
RUN cd /tmp \
&& wget -q https://dist.ipfs.io/go-ipfs/${IPFS_VERSION}/go-ipfs_${IPFS_VERSION}_linux-amd64.tar.gz \
&& tar xvfz go-ipfs*.tar.gz \
2 changes: 1 addition & 1 deletion README.md
@@ -25,7 +25,7 @@ Both *Service Worker* and *Custom Element* APIs are new and only supported in mo

## Installing

InterPlanetary Wayback (ipwb) requires Python 2.7+ though we are working on having it work on Python 3 as well (see [#51](https://github.com/oduwsdl/ipwb/issues/51)). ipwb can also be used with Docker ([see below](#user-content-using-docker)).
InterPlanetary Wayback (ipwb) requires Python 3.6+. ipwb can also be used with Docker ([see below](#user-content-using-docker)).

For conventional usage, the latest release of ipwb can be installed using pip:

12 changes: 9 additions & 3 deletions ipwb/indexer.py
@@ -14,7 +14,7 @@
import sys
import os
import json
import ipfsapi
import ipfshttpclient as ipfsapi
import argparse
import zlib
import surt
@@ -26,7 +26,7 @@
from warcio.recordloader import ArchiveLoadFailed

from requests.packages.urllib3.exceptions import NewConnectionError
from ipfsapi.exceptions import ConnectionError
from ipfshttpclient.exceptions import ConnectionError
# from requests.exceptions import ConnectionError

from six.moves import input
@@ -52,7 +52,7 @@

DEBUG = False

IPFS_API = ipfsapi.Client(IPFSAPI_HOST, IPFSAPI_PORT)
IPFS_API = ipfsapi.Client(f"/dns/{IPFSAPI_HOST}/tcp/{IPFSAPI_PORT}/http")
Member

I think it is time to retire IPFSAPI_HOST and IPFSAPI_PORT and replace them with a single config called IPFSAPI_ADDR that will hold the multiaddr value as described in the new API. The rationale is that the form of the address now depends on the format of the host. This change assumes that IPFSAPI_HOST will be a hostname; if it is an IP address, the resulting address will be invalid. Consolidating the complete address in one string ensures that the user modifies everything accordingly (and any misconfiguration is the user's fault, not the system's).

Member Author

@ibnesayeed I agree that the semantics are off on HOST vs. IP. Let's update that in a separate optimization request with respect to #349, where it will be most useful. Please see #610. Until then, for the sake of integrating with the new version of the module, let's keep HOST and PORT for now, despite the lack of semantics.

Member

I would have made this change in this PR, but if you would prefer doing it in yet another PR, that is alright too.

Member Author

The changes in #610 will require additional parsing logic and ensuring that they have been applied across more files. The ticket ought to be representative of the need to do it and will hopefully allow us to build some test cases around it; doing all of that here seems like scope creep for this PR/issue.
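
To make the hostname-versus-IP concern in this thread concrete, here is a minimal sketch of how a multiaddr string could be chosen from the existing host and port values. The helper name `build_api_multiaddr` is hypothetical and not part of ipwb or ipfshttpclient; it relies only on the standard `ipaddress` module and on the `ip4`, `ip6`, and `dns` multiaddr protocol names.

```python
import ipaddress


def build_api_multiaddr(host: str, port: int) -> str:
    """Build a multiaddr string for an HTTP API endpoint (hypothetical helper).

    IP literals get the ip4/ip6 protocol; anything else is treated as a
    hostname and gets the dns protocol.
    """
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so assume it is a DNS name.
        proto = "dns"
    else:
        proto = "ip4" if ip.version == 4 else "ip6"
    return f"/{proto}/{host}/tcp/{port}/http"


print(build_api_multiaddr("localhost", 5001))
print(build_api_multiaddr("127.0.0.1", 5001))
```

A single address setting, as proposed above, would make this kind of detection unnecessary because the user would supply the complete string.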



def s2b(s): # Convert str to bytes, cross-py
@@ -382,10 +382,16 @@ def pushBytesToIPFS(bytes):
try:
res = IPFS_API.add_bytes(bytes) # bytes)
except TypeError as err:
print('fail')
logError('IPFS_API had an issue pushing the item to IPFS')
logError(sys.exc_info())
logError(len(bytes))
traceback.print_tb(sys.exc_info()[-1])
except ipfsapi.exceptions.ConnectionError as connErr:
print('ConnErr')
logError(sys.exc_info())
traceback.print_tb(sys.exc_info()[-1])
return

# TODO: verify that the add was successful

14 changes: 6 additions & 8 deletions ipwb/replay.py
@@ -12,7 +12,7 @@
from __future__ import print_function
import sys
import os
import ipfsapi
import ipfshttpclient as ipfsapi
import json
import subprocess
import pkg_resources
@@ -28,7 +28,7 @@
from flask import abort
from flask import render_template

from ipfsapi.exceptions import StatusError as hashNotInIPFS
from ipfshttpclient.exceptions import StatusError as hashNotInIPFS
from bisect import bisect_left
from socket import gaierror
from socket import error as socketerror
@@ -68,7 +68,7 @@
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER
app.debug = False

IPFS_API = ipfsapi.Client(IPFSAPI_HOST, IPFSAPI_PORT)
IPFS_API = ipfsapi.Client(f"/dns/{IPFSAPI_HOST}/tcp/{IPFSAPI_PORT}/http")
machawk1 marked this conversation as resolved.


@app.context_processor
@@ -722,13 +722,11 @@ def handler(signum, frame):
k, v = hLine.split(':', 1)

if k.lower() == 'transfer-encoding' and \
re.search(r'\bchunked\b', v, re.I):
re.search(r'\bchunked\b', v, re.I):
try:
unchunkedPayload = extractResponseFromChunkedData(payload)
except Exception as e:
print('Error while dechunking')
print(sys.exc_info()[0])
continue # Data may have no actually been chunked
continue # Data not chunked
resp.set_data(unchunkedPayload)

if k.lower() not in ["content-type", "content-encoding", "location"]:
@@ -909,7 +907,7 @@ def getIndexFileContents(cdxjFilePath=INDEX_FILE):
cdxjFilePath))
return fetchRemoteCDXJFile(cdxjFilePath) or ''

indexFilePath = '/{0}'.format(cdxjFilePath).replace('ipwb.replay', 'ipwb')
indexFilePath = cdxjFilePath.replace('ipwb.replay', 'ipwb')
print('getting index file at {0}'.format(indexFilePath))

indexFileContent = ''
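
For readers unfamiliar with the dechunking step touched in the `handler` hunk above, the following is a rough sketch of decoding an HTTP/1.1 chunked body and falling back to the raw payload when the data turns out not to be chunked. The `dechunk` function is an illustrative stand-in, not the `extractResponseFromChunkedData` implementation from ipwb, and the fallback shown is only one possible choice.

```python
def dechunk(payload: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body; raise ValueError if it is not chunked."""
    body = bytearray()
    pos = 0
    while True:
        eol = payload.find(b"\r\n", pos)
        if eol == -1:
            raise ValueError("missing chunk-size line")
        # Chunk extensions (";name=value") may follow the size; ignore them.
        size_field = payload[pos:eol].split(b";", 1)[0].strip()
        size = int(size_field, 16)  # raises ValueError if not hexadecimal
        if size < 0:
            raise ValueError("negative chunk size")
        start = eol + 2
        if size == 0:
            # Last chunk reached; any trailer fields are ignored in this sketch.
            return bytes(body)
        end = start + size
        if payload[end:end + 2] != b"\r\n":
            raise ValueError("chunk data not terminated by CRLF")
        body += payload[start:end]
        pos = end + 2


raw = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
try:
    decoded = dechunk(raw)
except ValueError:
    decoded = raw  # the data was not actually chunked, so keep it as-is
```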
6 changes: 3 additions & 3 deletions ipwb/util.py
@@ -6,7 +6,7 @@
import os
import sys
import requests
import ipfsapi
import ipfshttpclient as ipfsapi

import re
# Datetime conversion to rfc1123
@@ -22,7 +22,7 @@
from pkg_resources import parse_version

# from requests.exceptions import ConnectionError
from ipfsapi.exceptions import ConnectionError
from ipfshttpclient.exceptions import ConnectionError


IPFSAPI_HOST = 'localhost'
@@ -41,7 +41,7 @@

def isDaemonAlive(hostAndPort="{0}:{1}".format(IPFSAPI_HOST, IPFSAPI_PORT)):
"""Ensure that the IPFS daemon is running via HTTP before proceeding"""
client = ipfsapi.Client(IPFSAPI_HOST, IPFSAPI_PORT)
client = ipfsapi.Client(f"/dns/{IPFSAPI_HOST}/tcp/{IPFSAPI_PORT}/http")

try:
# ConnectionError/AttributeError if IPFS daemon not running
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,5 +1,5 @@
warcio>=1.5.3
ipfsapi>=0.4.2
ipfshttpclient>=0.4.12
Flask==0.12.3
pycryptodome>=3.4.11
requests>=2.19.1
5 changes: 3 additions & 2 deletions setup.py
@@ -24,7 +24,7 @@
],
install_requires=[
'warcio>=1.5.3',
'ipfsapi>=0.4.2',
'ipfshttpclient>=0.4.12',
'Flask==0.12.3',
'pycryptodome>=3.4.11',
'requests>=2.19.1',
@@ -56,7 +56,8 @@

'Environment :: Web Environment',

'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3.6',
'Programming Language :: Python :: 3.7',

'License :: OSI Approved :: MIT License',

10 changes: 6 additions & 4 deletions tests/testUtil.py
@@ -20,17 +20,19 @@ def createUniqueWARC():
warcInPath = os.path.join(os.path.dirname(__file__) +
'/../samples/warcs/' + warcInFilename)

stringToChange = 'abcdefghijklmnopqrstuvwxz'
stringToChange = b'abcdefghijklmnopqrstuvwxz'
randomString = getRandomString(len(stringToChange))
randomBytes = str.encode(randomString)

with open(warcInPath, 'r') as warcFile:
newContent = warcFile.read().replace(stringToChange, randomString)
with open(warcInPath, 'rb') as warcFile:
newContent = warcFile.read().replace(stringToChange, randomBytes)

warcOutFilename = warcInFilename.replace('.warc', '_' +
randomString + '.warc')
warcOutPath = os.path.join(os.path.dirname(__file__) +
'/../samples/warcs/' + warcOutFilename)
with open(warcOutPath, 'w') as warcFile:
print(warcOutPath)
with open(warcOutPath, 'wb') as warcFile:
warcFile.write(newContent)

return warcOutPath
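
The switch to `'rb'`/`'wb'` in this file follows a general Python 3 rule: a file opened in binary mode yields `bytes`, so both the search pattern and its replacement must also be `bytes`. A small self-contained sketch of the same pattern is below; the helper name and the sample content are made up for illustration and do not come from the ipwb test suite.

```python
import os
import tempfile


def replace_in_binary_file(src_path, dst_path, old: bytes, new: bytes) -> int:
    """Copy src_path to dst_path, replacing old with new; return the match count."""
    with open(src_path, "rb") as src:
        content = src.read()  # bytes, no decoding is attempted
    count = content.count(old)
    with open(dst_path, "wb") as dst:
        dst.write(content.replace(old, new))
    return count


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.warc")
    dst = os.path.join(tmp, "out.warc")
    with open(src, "wb") as f:
        # \xff is not valid UTF-8, so text mode with a UTF-8 default would fail here.
        f.write(b"WARC/1.0\r\n\xff payload abc\r\n")
    replaced = replace_in_binary_file(src, dst, b"abc", "xyz".encode())
    with open(dst, "rb") as f:
        result = f.read()
```

Working on bytes also avoids depending on the platform's default text encoding, which matters for WARC files that can carry arbitrary binary payloads.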
2 changes: 1 addition & 1 deletion tests/test_indexing.py
@@ -1,7 +1,7 @@
# Number of entries in CDXJ == number of response records in WARC

import pytest
import testUtil as ipwbTest
from . import testUtil as ipwbTest
Member

This change in the import style seems unrelated to this particular PR as it is more about Py3 support. If this is a dependency, then these changes should be made independently and merged prior to this PR. Consider this comment for the rest of the changes in test files.

Member Author

Yeah, I did not originally intend for these changes to be part of your code review, but because I used f-strings above, which are a Py3 feature, and the tests were still in Py2, we could not have the original changes in this PR without updating the tests to also be Py3.

Member

I am OK with the Py3 changes and understand the dependency situation here, but my primary concern was that the title does not reflect the changes. That's why I suggested that we make these Py3 changes in a separate PR, merge it into this branch, and then test this branch. However, if that is too much work, we can go ahead with this one as it is.

Member Author

@machawk1 machawk1 May 28, 2019

@ibnesayeed I had started to do this...3 months ago in the issue-51-tests branch. The effort was never completed and I ended up redoing some of that here with additional fixes for the test.

I have updated the title of the PR to better reflect the changes. 😃

import os

from ipwb import indexer
38 changes: 19 additions & 19 deletions tests/test_memento.py
@@ -1,13 +1,13 @@
import pytest

import testUtil as ipwbTest
from . import testUtil as ipwbTest
from ipwb import replay
from ipwb import indexer
from ipwb import __file__ as moduleLocation
from time import sleep
import os
import subprocess
import urllib2
from urllib.request import urlopen
import requests
import random
import string
@@ -19,7 +19,7 @@ def getURIMsFromTimeMapInWARC(warcFilename):
ipwbTest.startReplay(warcFilename)

tmURI = 'http://localhost:5000/timemap/link/memento.us/'
tm = urllib2.urlopen(tmURI).read()
tm = urlopen(tmURI).read().decode('utf-8')

urims = []
for line in tm.split('\n'):
@@ -38,7 +38,7 @@ def getRelsFromURIMSinWARC(warc):
# Get Link header values for each memento
linkHeaders = []
for urim in urims:
linkHeaders.append(urllib2.urlopen(urim).info().getheader('Link'))
linkHeaders.append(urlopen(urim).info().get('Link'))
ipwbTest.stopReplay()

relsForURIMs = []
@@ -76,7 +76,7 @@ def test_acceptdatetime_status(warc, lookup, acceptdatetime, status):
def test_mementoRelations_one():
relsForURIMs = getRelsFromURIMSinWARC('1memento.warc')

relsForURIMs = filter(lambda k: 'memento' in k, relsForURIMs[0])
relsForURIMs = list(filter(lambda k: 'memento' in k, relsForURIMs[0]))
m1_m1 = relsForURIMs[0].split(' ')

onlyOneMemento = len(relsForURIMs) == 1
@@ -98,8 +98,8 @@ def test_mementoRelations_two():
cond_firstPrevMemento = False
cond_lastMemento = False

relsForURIMs1of2 = filter(lambda k: 'memento' in k, relsForURIMs[0])
relsForURIMs2of2 = filter(lambda k: 'memento' in k, relsForURIMs[1])
relsForURIMs1of2 = list(filter(lambda k: 'memento' in k, relsForURIMs[0]))
relsForURIMs2of2 = list(filter(lambda k: 'memento' in k, relsForURIMs[1]))

# mX_mY = URI-M requested, Y-th URIM-M in header
m1_m1 = relsForURIMs1of2[0].split(' ')
@@ -132,9 +132,9 @@ def test_mementoRelations_three():
cond_m3m2_prevMemento = False
cond_m3m3_lastMemento = False

relsForURIMs1of3 = filter(lambda k: 'memento' in k, relsForURIMs[0])
relsForURIMs2of3 = filter(lambda k: 'memento' in k, relsForURIMs[1])
relsForURIMs3of3 = filter(lambda k: 'memento' in k, relsForURIMs[2])
relsForURIMs1of3 = list(filter(lambda k: 'memento' in k, relsForURIMs[0]))
relsForURIMs2of3 = list(filter(lambda k: 'memento' in k, relsForURIMs[1]))
relsForURIMs3of3 = list(filter(lambda k: 'memento' in k, relsForURIMs[2]))

# mX_mY = URI-M requested, Y-th URIM-M in header
m1_m1 = relsForURIMs1of3[0].split(' ')
@@ -189,10 +189,10 @@ def test_mementoRelations_four():
cond_m4m3_prevMemento = False
cond_m4m4_lastMemento = False

relsForURIMs1of4 = filter(lambda k: 'memento' in k, relsForURIMs[0])
relsForURIMs2of4 = filter(lambda k: 'memento' in k, relsForURIMs[1])
relsForURIMs3of4 = filter(lambda k: 'memento' in k, relsForURIMs[2])
relsForURIMs4of4 = filter(lambda k: 'memento' in k, relsForURIMs[3])
relsForURIMs1of4 = list(filter(lambda k: 'memento' in k, relsForURIMs[0]))
relsForURIMs2of4 = list(filter(lambda k: 'memento' in k, relsForURIMs[1]))
relsForURIMs3of4 = list(filter(lambda k: 'memento' in k, relsForURIMs[2]))
relsForURIMs4of4 = list(filter(lambda k: 'memento' in k, relsForURIMs[3]))

# mX_mY = URI-M requested, Y-th URIM-M in header
m1_m1 = relsForURIMs1of4[0].split(' ')
@@ -277,11 +277,11 @@ def test_mementoRelations_five():
cond_m5m4_prevMemento = False
cond_m5m5_lastMemento = False

relsForURIMs1of5 = filter(lambda k: 'memento' in k, relsForURIMs[0])
relsForURIMs2of5 = filter(lambda k: 'memento' in k, relsForURIMs[1])
relsForURIMs3of5 = filter(lambda k: 'memento' in k, relsForURIMs[2])
relsForURIMs4of5 = filter(lambda k: 'memento' in k, relsForURIMs[3])
relsForURIMs5of5 = filter(lambda k: 'memento' in k, relsForURIMs[4])
relsForURIMs1of5 = list(filter(lambda k: 'memento' in k, relsForURIMs[0]))
relsForURIMs2of5 = list(filter(lambda k: 'memento' in k, relsForURIMs[1]))
relsForURIMs3of5 = list(filter(lambda k: 'memento' in k, relsForURIMs[2]))
relsForURIMs4of5 = list(filter(lambda k: 'memento' in k, relsForURIMs[3]))
relsForURIMs5of5 = list(filter(lambda k: 'memento' in k, relsForURIMs[4]))

# mX_mY = URI-M requested, Y-th URIM-M in header
m1_m1 = relsForURIMs1of5[0].split(' ')
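
The repeated `list(filter(...))` wrapping in this file is needed because, in Python 3, `filter` returns a lazy iterator that supports neither indexing nor `len()`, both of which these tests use. A short illustration with made-up Link header values (not taken from the sample WARCs):

```python
# Made-up Link header entries, for illustration only.
link_values = [
    '<http://example.com/>; rel="original"',
    '<http://localhost:5000/memento/2013/http://example.com/>; rel="first memento"',
    '<http://localhost:5000/timemap/link/http://example.com/>; rel="timemap"',
]

lazy = filter(lambda v: "memento" in v, link_values)
try:
    lazy[0]  # worked in Python 2, where filter returned a list
    subscriptable = True
except TypeError:
    subscriptable = False

# Materializing the iterator restores indexing and len().
mementos = list(filter(lambda v: "memento" in v, link_values))
first_parts = mementos[0].split(" ")
```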
2 changes: 1 addition & 1 deletion tests/test_randomized_add.py
@@ -5,7 +5,7 @@

from ipwb import indexer

import testUtil as ipwbTest
from . import testUtil as ipwbTest


def isValidSURT(surt):
8 changes: 4 additions & 4 deletions tests/test_replay.py
@@ -1,12 +1,12 @@
import pytest

import testUtil as ipwbTest
from . import testUtil as ipwbTest
from ipwb import replay
from time import sleep

import requests

import urllib2
import urllib

# Successful retrieval
# Accurate retrieval
@@ -162,8 +162,8 @@ def test_unit_commandDaemon():
replay.commandDaemon('start')
sleep(10)
try:
urllib2.urlopen('http://localhost:5001')
except urllib2.HTTPError as e:
urllib.request.urlopen('http://localhost:5001')
except urllib.error.HTTPError as e:
assert e.code == 404
except Exception as e:
assert False
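
One caveat on the `import urllib` line in this file: a bare `import urllib` does not by itself guarantee that the `urllib.request` and `urllib.error` submodules are loaded. It presumably works here because another import (such as `requests`) has already loaded them. A more explicit variant might look like the sketch below; the `http_status` helper and its `opener` parameter are hypothetical and exist only to make the example testable without a running daemon.

```python
import urllib.error
import urllib.request


def http_status(url, opener=urllib.request.urlopen):
    """Return the HTTP status code for url, or None if no HTTP response arrived."""
    try:
        with opener(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # HTTPError subclasses URLError, so it must be caught first.
        return err.code
    except urllib.error.URLError:
        return None
```

With a helper like this, the daemon check in the test could reduce to an assertion such as `http_status('http://localhost:5001') == 404`.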