Version 1.8.0
Koalas 1.8.0 is the last minor release because Koalas will be officially included in PySpark in the upcoming Apache Spark 3.2. In Apache Spark 3.2+, please use Apache Spark directly.
Categorical type and ExtensionDtype
We added support for pandas' categorical type (#2064, #2106):
>>> s = ks.Series(list("abbccc"), dtype="category")
>>> s
0    a
1    b
2    b
3    c
4    c
5    c
dtype: category
Categories (3, object): ['a', 'b', 'c']

>>> s.cat.categories
Index(['a', 'b', 'c'], dtype='object')

>>> s.cat.codes
0    0
1    1
2    1
3    2
4    2
5    2
dtype: int8

>>> idx = ks.CategoricalIndex(list("abbccc"))
>>> idx
CategoricalIndex(['a', 'b', 'b', 'c', 'c', 'c'],
                 categories=['a', 'b', 'c'], ordered=False, dtype='category')

>>> idx.codes
Int64Index([0, 1, 1, 2, 2, 2], dtype='int64')

>>> idx.categories
Index(['a', 'b', 'c'], dtype='object')
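Categorical data created in pandas can also be brought into Koalas. The snippet below is a minimal sketch, assuming the categorical dtype is preserved by ks.from_pandas; the variable names and data are made up for illustration:

import pandas as pd
import databricks.koalas as ks

# Build a categorical Series in pandas first.
pser = pd.Series(pd.Categorical(list("abbccc"), categories=["a", "b", "c"]))

# Convert to Koalas; the categorical dtype is expected to carry over.
kser = ks.from_pandas(pser)
print(kser.dtype)           # category
print(kser.cat.categories)  # Index(['a', 'b', 'c'], dtype='object')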
We also added support for ExtensionDtype as a type argument for annotating return types (#2120, #2123, #2132, #2127, #2126, #2125, #2124):
def func() -> ks.Series[pd.Int32Dtype()]:
    ...
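Such annotations are used when Koalas infers the schema of a user-defined function's output, for example in DataFrame.apply. The sketch below is illustrative only and assumes DataFrame.apply honors the annotated nullable integer dtype; the function name and data are made up:

import pandas as pd
import databricks.koalas as ks

kdf = ks.DataFrame({"a": [1.0, 2.0, None], "b": [4.0, None, 6.0]})

# The annotated return type tells Koalas to produce columns with pandas'
# nullable Int32 extension dtype instead of inferring the schema from data.
def to_nullable_int(col) -> ks.Series[pd.Int32Dtype()]:
    return col.astype("Int32")

kdf.apply(to_nullable_int)

Annotating the return type spares Koalas from computing a sample of the output to infer the schema and pins the exact extension dtype of the result.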
Other new features, improvements and bug fixes
We added the following new features:
DataFrame:
Series:
DatetimeIndex:
Along with the following fixes:
- Support passing a tuple to (DataFrame|Series).replace() (#2095; see the example after this list)
- Check index_dtype and data_dtypes more strictly (#2100)
- Return actual values via toPandas (#2077)
- Add lines and orient to read_json and to_json to improve error messages (#2110)
- Fix isin to accept a numpy array (#2103; also shown in the example below)
- Allow multi-index column names for inferring return type schema with names (#2117)
- Add a short JDBC user guide (#2148)
- Remove the upper bound of pandas 1.2 (#2141)
- Standardize exceptions of arithmetic operations on Datetime-like data (#2101)
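For instance, the replace() fix (#2095) and the isin() fix (#2103) above enable the following; a small sketch with made-up data, showing the expected results in comments:

import numpy as np
import databricks.koalas as ks

kser = ks.Series([1, 2, 3, 4])

# A tuple of values to replace is now handled like a list (#2095).
print(kser.replace((1, 2), 100).to_list())    # [100, 100, 3, 4]

# isin() now accepts a numpy array in addition to a list or set (#2103).
print(kser.isin(np.array([2, 4])).to_list())  # [False, True, False, True]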