Releases: databricks/spark-xml
v0.18.0
What's Changed
- Use defined timezone on write for formats that need TZ info by @srowen in #665
- Add notes about file extensions and _corrupt_record to documentation by @dolfinus in #674
- Fix for xml expression to not parse arbitrary strings by @xanderbailey in #679
- Update for 0.18.0, move CICD configs to supported Spark versions by @srowen in #680
New Contributors
- @dolfinus made their first contribution in #674
- @xanderbailey made their first contribution in #679
Full Changelog: v0.17.0...v0.18.0
Version 0.17.0
- Improve handling of XSD complex type, decimal (#631, #638)
- Restore behavior of ignoreSurroundingSpaces (#637)
- Improve schema inference performance (#660)
- Fix corner case of double/float type inference (#644)
See https://github.com/databricks/spark-xml/milestone/14?closed=1
Note that this is intended to be the final stand-alone release of spark-xml, as it is being incorporated into Apache Spark 4.0.
Version 0.16.0
- Minor bug fixes
- Custom timestamp formats now use session timezone when not specified in the format/input (#621)
- Some "ref" elements work in XSD schemas now (#619)
- 'arrayElementName' can be used to control the schema name used for array elements when writing (#603)
See https://github.com/databricks/spark-xml/milestone/13?closed=1
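As a rough illustration of the writer options mentioned in this release, the sketch below passes arrayElementName and an optional timestampFormat to the XML writer from PySpark. The helper name write_xml, its parameters, and the tag values used as defaults are hypothetical and chosen for illustration only. It assumes spark-xml is on the classpath and that df is a Spark DataFrame.

```python
def write_xml(df, path, row_tag="ROW", root_tag="ROWS",
              array_element_name="item", timestamp_format=None):
    """Write a DataFrame as XML, naming array elements explicitly.

    Hypothetical helper: only the option keys ("rowTag", "rootTag",
    "arrayElementName", "timestampFormat") are spark-xml option names.
    """
    writer = (
        df.write.format("xml")
        .option("rowTag", row_tag)
        .option("rootTag", root_tag)
        # Element name wrapped around each entry of an array-valued column.
        .option("arrayElementName", array_element_name)
    )
    if timestamp_format is not None:
        # Only set a custom pattern when the caller asks for one.
        writer = writer.option("timestampFormat", timestamp_format)
    writer.save(path)
```

A call such as write_xml(df, "/tmp/books.xml", array_element_name="author") would then be expected to wrap each array entry in an author element; the path and element name here are placeholders.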
Version 0.15.0
This is a minor bug fix release, primarily for:
- #582 Fix a Hadoop conf bug that interferes with running multiple separate spark-xml reads/write jobs concurrently
Version 0.14.0
This release is primarily to support Spark 3.2.0 and Scala 2.13. Support for Scala 2.11, previously deprecated, is removed. Spark 2 is not officially supported now, but should continue to work with Scala 2.12 builds.
It also includes one new feature:
- Control XML declaration in XML output (#560)
See https://github.com/databricks/spark-xml/issues?q=is%3Aclosed+milestone%3A0.14.0
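A minimal sketch of how the declaration control might be used from PySpark follows. It assumes the writer option is named declaration and that, as the project README describes, an empty string suppresses the declaration. The helper declaration_options is hypothetical.

```python
def declaration_options(declaration=None):
    """Build writer options for the XML declaration.

    None leaves the option unset so the library default applies.
    An empty string is assumed to suppress the declaration entirely.
    """
    options = {}
    if declaration is not None:
        options["declaration"] = declaration
    return options


# Suppress the declaration (assumed behaviour of the empty string).
suppress = declaration_options("")
# Write custom declaration content instead of the default.
custom = declaration_options('version="1.0" encoding="UTF-8"')
```

The resulting dictionary can be passed to the writer, for example df.write.format("xml").options(**suppress).save(path), where df and path are placeholders.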
Version 0.13.0
This is a minor bug fix release; see https://github.com/databricks/spark-xml/pulls?q=is%3Apr+is%3Aclosed+milestone%3A0.13.0
Version 0.12.0
- Fixed schema inference for date types (#521)
- Fixed some type inferences of primitive types (int vs long) from XSDs (#522)
- Fixed parsing of partial result when a row fails to parse (#518)
- Fixed bug in parsing missing optional child tags in certain situations (#513)
- Fixed parsing of non-UTF-8 XML data (#511)
- Added support for additional timestampFormat and dateFormat patterns when reading and writing timestamps and dates in XML
https://github.com/databricks/spark-xml/milestone/9?closed=1
https://repo1.maven.org/maven2/com/databricks/spark-xml_2.11/0.12.0/
Version 0.11.0
- Reading:
  - Support for 'wildcard' columns (wildcardColName) matching anything, corresponding to XSD xs:any types
  - Can optionally ignore namespace prefixes with ignoreNamespace
  - MapType columns now read attributes correctly
- Writing:
  - Root tag can have attributes
  - Timestamp output format now follows XML standards
- Minor fixes and improvements to XSD schema support
Changes: https://github.com/databricks/spark-xml/milestone/8?closed=1
https://repo1.maven.org/maven2/com/databricks/spark-xml_2.11/0.11.0/
https://repo1.maven.org/maven2/com/databricks/spark-xml_2.12/0.11.0/
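The reading options introduced in this release could be combined from PySpark roughly as follows. This is a sketch under assumptions: read_xml and its parameters are hypothetical names, spark is an existing SparkSession with spark-xml available, and the wildcard column is expected to be present in the supplied schema for it to receive unmatched child elements.

```python
def read_xml(spark, path, row_tag, ignore_namespace=False,
             wildcard_col_name=None, schema=None):
    """Read XML into a DataFrame using the 0.11.0 reader options.

    Hypothetical helper: only the option keys ("rowTag",
    "ignoreNamespace", "wildcardColName") are spark-xml option names.
    """
    reader = (
        spark.read.format("xml")
        .option("rowTag", row_tag)
        # Options are passed as strings to avoid relying on type coercion.
        .option("ignoreNamespace", "true" if ignore_namespace else "false")
    )
    if wildcard_col_name is not None:
        # Column intended to collect elements matching xs:any.
        reader = reader.option("wildcardColName", wildcard_col_name)
    if schema is not None:
        reader = reader.schema(schema)
    return reader.load(path)
```

For instance, read_xml(spark, "books.xml", "book", ignore_namespace=True) would request that namespace prefixes be ignored while reading book rows; the file and tag names are placeholders.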
Version 0.10.0
Highlights:
- Bug fix: in rare cases, parsing an uncompressed XML file could miss a record. (#468)
- Bug fix: parsing XML subtree as string field would lose attributes (#469)
- Feature: experimental support for inferring a Spark schema from an XSD (#457)
- Other minor bug fixes
Changes: https://github.com/databricks/spark-xml/milestone/7?closed=1
https://repo1.maven.org/maven2/com/databricks/spark-xml_2.11/0.10.0/
https://repo1.maven.org/maven2/com/databricks/spark-xml_2.12/0.10.0/
Version 0.9.0
Highlights:
- Support XSD validation in from_xml (#433)
- Don't ignore unclosed tag content (#437)
- Helper functions to support manually using from_xml, etc from Python (#438)
Changes: https://github.com/databricks/spark-xml/milestone/6?closed=1