I want to bring up the idea of replacing pandas with polars. I can think of three reasons why this would be beneficial:
Processing speed
polars is much faster. @khoroshevskyi has been investigating this, and adopting polars could drastically reduce the time it takes to process PEPs on the PEPhub server, enabling real-time edits to PEPs.
It's hard to find unbiased, fair comparisons, especially considering the polars hype, but this post does a pretty good job of highlighting some of the large improvements.
Import speed
From my own experimentation, importing polars is almost 4 times faster than importing pandas. This would help with things like the looper CLI import issues: pepkit/looper#476
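For anyone who wants to check the import numbers on their own machine, here is a minimal sketch of one way to measure it. The helper name `time_import` is made up for illustration, and the timing includes interpreter startup, so the ratio between two modules will look smaller than the ratio of the import costs alone. It assumes both packages are installed in the current environment and reports a failed import otherwise.

```python
import subprocess
import sys
import time


def time_import(module, repeats=3):
    """Best wall-clock time (seconds) to import `module` in a fresh interpreter.

    Hypothetical helper for illustration. A fresh subprocess is used for each
    run so that a previously cached import does not hide the real cost.
    Returns None if the module cannot be imported.
    """
    best = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = subprocess.run(
            [sys.executable, "-c", f"import {module}"],
            capture_output=True,
        )
        elapsed = time.perf_counter() - start
        if result.returncode != 0:
            return None  # module missing or import failed
        best = elapsed if best is None else min(best, elapsed)
    return best


if __name__ == "__main__":
    for name in ("pandas", "polars"):
        seconds = time_import(name)
        if seconds is None:
            print(f"{name}: not importable in this environment")
        else:
            print(f"{name}: {seconds:.3f} s (includes interpreter startup)")
```

For a per-module breakdown, running `python -X importtime -c "import pandas"` is another option.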
Interface with genimtools
Genimtools is native-Rust with pyo3 bindings. polars follows this model as well. Because of this, the integration of peppy objects with genimtools becomes seamless. In fact, there is an entire crate maintained by the polars group dedicated to this interface.
This sets the stage for processing PEPs and their data in genimtools, further improving server speeds for real-time PEP editing. eido comes to mind as a potential bottleneck with real-time PEP editing.
Potential downsides
I think some downsides to such a switch are:
polars is new, and not as "battle-tested" as pandas.
polars is awkward for data visualization, as libraries like matplotlib don't natively support it.
Time would need to be invested in refactoring the sample table in peppy.