tidypolars

Submodules

Package Contents

Classes

Tibble

A data frame object that provides methods familiar to R tidyverse users.

Functions

abs(x)

Absolute value

across(cols[, fn, names_prefix])

Apply a function across a selection of columns

case_when(expr)

Case when

coalesce(*args)

Coalesce missing values

floor(x)

Round numbers down to the lower integer

if_else(condition, true, false)

If Else

lag(x[, n, default])

Get lagging values

lead(x[, n, default])

Get leading values

log(x)

Compute the natural logarithm of a column

log10(x)

Compute the base 10 logarithm of a column

read_csv(file, *args, **kwargs)

Simple wrapper around polars.read_csv

read_parquet(source, *args, **kwargs)

Simple wrapper around polars.read_parquet

rep(x[, times])

Replicate the values in x

replace_null(x[, replace])

Replace null values

round(x[, decimals])

Round a column to a given number of decimals

row_number()

Return row number

sqrt(x)

Get column square root

cor(x, y[, method])

Find the correlation of two columns

cov(x, y)

Find the covariance of two columns

count(x)

Number of observations in each group

first(x)

Get first value

last(x)

Get last value

length(x)

Number of observations in each group

max(x)

Get column max

mean(x)

Get column mean

median(x)

Get column median

min(x)

Get column minimum

n()

Number of observations in each group

n_distinct(x)

Get number of distinct values in a column

quantile(x[, quantile])

Get the specified quantile of a column

sd(x)

Get column standard deviation

sum(x)

Get column sum

var(x)

Get column variance

between(x, left, right)

Test if values of a column are between two values

is_finite(x)

Test if values of a column are finite

is_in(x, y)

Test if values of a column are in a list of values

is_infinite(x)

Test if values of a column are infinite

is_nan(x)

Test if values of a column are nan

is_not(x)

Flip values of a boolean series

is_not_in(x, y)

Test if values of a column are not in a list of values

is_not_null(x)

Test if values of a column are not null

is_null(x)

Test if values of a column are null

as_boolean(x)

Convert to a boolean

as_float(x)

Convert to float. Defaults to Float64.

as_integer(x)

Convert to integer. Defaults to Int64.

as_string(x)

Convert to string. Defaults to Utf8.

cast(x, dtype)

General type conversion.

as_date(x[, fmt])

Convert a string to a Date

as_datetime(x[, fmt])

Convert a string to a Datetime

hour(x)

Extract the hour from a datetime

make_date([year, month, day])

Create a date object

make_datetime([year, month, day, hour, minute, second])

Create a datetime object

mday(x)

Extract the month day from a date from 1 to 31.

minute(x)

Extract the minute from a datetime

month(x)

Extract the month from a date

quarter(x)

Extract the quarter from a date

dt_round(x, rule, n)

Round the datetime

second(x)

Extract the second from a datetime

wday(x)

Extract the weekday from a date, from Sunday = 1 to Saturday = 7.

week(x)

Extract the week from a date

yday(x)

Extract the year day from a date from 1 to 366.

year(x)

Extract the year from a date

paste(*args[, sep])

Concatenate strings together

paste0(*args)

Concatenate strings together with no separator

str_c(*args[, sep])

Concatenate strings together

str_detect(string, pattern[, negate])

Detect the presence or absence of a pattern in a string

str_extract(string, pattern)

Extract the target capture group from provided patterns

str_length(string)

Length of a string

str_remove_all(string, pattern)

Removes all matched patterns in a string

str_remove(string, pattern)

Removes the first matched pattern in a string

str_replace_all(string, pattern, replacement)

Replaces all matched patterns in a string

str_replace(string, pattern, replacement)

Replaces the first matched pattern in a string

str_ends(string, pattern[, negate])

Detect the presence or absence of a pattern at the end of a string.

str_starts(string, pattern[, negate])

Detect the presence or absence of a pattern at the beginning of a string.

str_sub(string[, start, end])

Extract portion of string based on start and end inputs

str_to_lower(string)

Convert a string to lower case

str_to_upper(string)

Convert a string to upper case

str_trim(string[, side])

Trim whitespace

desc(x)

Mark a column to order in descending

from_pandas(df)

Convert from pandas DataFrame to Tibble

from_polars(df)

Convert from polars DataFrame to Tibble

contains(match[, ignore_case])

Contains a literal string

ends_with(match[, ignore_case])

Ends with a suffix

everything()

Selects all columns

starts_with(match[, ignore_case])

Starts with a prefix

Attributes

__version__

col

exclude

lit

Expr

Series

Int8

Int16

Int32

Int64

UInt8

UInt16

UInt32

UInt64

Float32

Float64

Boolean

Utf8

List

Date

Datetime

Object

__all__

abs(x)[source]

Absolute value

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(abs_x = tp.abs('x'))
>>> df.mutate(abs_x = tp.abs(col('x')))
across(cols, fn=lambda x: ..., names_prefix=None)[source]

Apply a function across a selection of columns

Parameters:
  • cols (list) – Columns to operate on

  • fn (lambda) – A function or lambda to apply to each column

  • names_prefix (str, optional) – Prefix to add to the names of changed columns

Examples

>>> df = tp.Tibble(x = ['a', 'a', 'b'], y = range(3), z = range(3))
>>> df.mutate(across(['y', 'z'], lambda x: x * 2))
>>> df.mutate(across(tp.Int64, lambda x: x * 2, names_prefix = "double_"))
>>> df.summarize(across(['y', 'z'], tp.mean), by = 'x')
case_when(expr)[source]

Case when

Parameters:

expr (Expr) – A logical expression

Examples

>>> df = tp.Tibble(x = range(1, 4))
>>> df.mutate(
>>>    case_x = tp.case_when(col('x') < 2).then(1)
>>>             .when(col('x') < 3).then(2)
>>>             .otherwise(0)
>>> )
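
The chain above is evaluated top to bottom, and the first condition that matches decides the value. A plain-Python sketch of that logic follows; the helper name case_x is illustrative only and is not part of tidypolars.

```python
def case_x(x):
    """Mirror case_when(x < 2).then(1).when(x < 3).then(2).otherwise(0)."""
    if x < 2:        # first condition
        return 1
    if x < 3:        # only checked when the first condition failed
        return 2
    return 0         # the otherwise() branch


result = [case_x(x) for x in range(1, 4)]
print(result)  # [1, 2, 0]
```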
coalesce(*args)[source]

Coalesce missing values

Parameters:

args (Expr) – Columns to coalesce

Examples

>>> df.mutate(coalesce_xy = tp.coalesce(col('x'), col('y')))
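
Coalescing picks, for each row, the first value that is not null among the given columns. The sketch below shows the idea with plain Python lists standing in for columns; it is not the tidypolars implementation, and the sample data is made up for illustration.

```python
def coalesce(*columns):
    """Return, per row, the first value that is not None (None if all are)."""
    return [
        next((value for value in row if value is not None), None)
        for row in zip(*columns)
    ]


x = [1, None, None]
y = [10, 20, None]
result = coalesce(x, y)
print(result)  # [1, 20, None]
```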
floor(x)[source]

Round numbers down to the lower integer

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(floor_x = tp.floor(col('x')))
if_else(condition, true, false)[source]

If Else

Parameters:
  • condition (Expr) – A logical expression

  • true – Value if the condition is true

  • false – Value if the condition is false

Examples

>>> df = tp.Tibble(x = range(1, 4))
>>> df.mutate(if_x = tp.if_else(col('x') < 2, 1, 2))
lag(x, n: int = 1, default=None)[source]

Get lagging values

Parameters:
  • x (Expr, Series) – Column to operate on

  • n (int) – Number of positions to lag by

  • default (optional) – Value to fill in missing values

Examples

>>> df.mutate(lag_x = tp.lag(col('x')))
>>> df.mutate(lag_x = tp.lag('x'))
lead(x, n: int = 1, default=None)[source]

Get leading values

Parameters:
  • x (Expr, Series) – Column to operate on

  • n (int) – Number of positions to lead by

  • default (optional) – Value to fill in missing values

Examples

>>> df.mutate(lead_x = tp.lead(col('x')))
>>> df.mutate(lead_x = tp.lead('x'))
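
To make the roles of n and default concrete, here is a plain-Python sketch of lagging and leading a list of values. It assumes n is non-negative and uses lists in place of columns; it is an illustration of the semantics, not the library's code.

```python
def lag(values, n=1, default=None):
    """Shift values down by n positions, padding the start with default."""
    values = list(values)
    if n == 0:
        return values
    return [default] * min(n, len(values)) + values[:-n]


def lead(values, n=1, default=None):
    """Shift values up by n positions, padding the end with default."""
    values = list(values)
    return values[n:] + [default] * min(n, len(values))


x = [1, 2, 3]
print(lag(x))                 # [None, 1, 2]
print(lead(x))                # [2, 3, None]
print(lag(x, n=2, default=0)) # [0, 0, 1]
```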
log(x)[source]

Compute the natural logarithm of a column

Parameters:

x (Expr) – Column to operate on

Examples

>>> df.mutate(log = tp.log('x'))
log10(x)[source]

Compute the base 10 logarithm of a column

Parameters:

x (Expr) – Column to operate on

Examples

>>> df.mutate(log = tp.log10('x'))
read_csv(file: str, *args, **kwargs)[source]

Simple wrapper around polars.read_csv

read_parquet(source: str, *args, **kwargs)[source]

Simple wrapper around polars.read_parquet

rep(x, times=1)[source]

Replicate the values in x

Parameters:
  • x (const, Series) – Value or Series to repeat

  • times (int) – Number of times to repeat

Examples

>>> tp.rep(1, 3)
>>> tp.rep(pl.Series(range(3)), 3)
replace_null(x, replace=None)[source]

Replace null values

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df = tp.Tibble(x = [0, None], y = [None, None])
>>> df.mutate(x = tp.replace_null(col('x'), 1))
round(x, decimals=0)[source]

Round a column to a given number of decimals

Parameters:
  • x (Expr, Series) – Column to operate on

  • decimals (int) – Decimals to round to

Examples

>>> df.mutate(x = tp.round(col('x')))
row_number()[source]

Return row number

Examples

>>> df.mutate(row_num = tp.row_number())
sqrt(x)[source]

Get column square root

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(sqrt_x = tp.sqrt('x'))
cor(x, y, method='pearson')[source]

Find the correlation of two columns

Parameters:
  • x (Expr) – A column

  • y (Expr) – A column

  • method (str) – Type of correlation to find. Either ‘pearson’ or ‘spearman’.

Examples

>>> df.summarize(cor = tp.cor(col('x'), col('y')))
cov(x, y)[source]

Find the covariance of two columns

Parameters:
  • x (Expr) – A column

  • y (Expr) – A column

Examples

>>> df.summarize(cov = tp.cov(col('x'), col('y')))
count(x)[source]

Number of observations in each group

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(count = tp.count(col('x')))
first(x)[source]

Get first value

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(first_x = tp.first('x'))
>>> df.summarize(first_x = tp.first(col('x')))
last(x)[source]

Get last value

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(last_x = tp.last('x'))
>>> df.summarize(last_x = tp.last(col('x')))
length(x)[source]

Number of observations in each group

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(length = tp.length(col('x')))
max(x)[source]

Get column max

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(max_x = tp.max('x'))
>>> df.summarize(max_x = tp.max(col('x')))
mean(x)[source]

Get column mean

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(mean_x = tp.mean('x'))
>>> df.summarize(mean_x = tp.mean(col('x')))
median(x)[source]

Get column median

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(median_x = tp.median('x'))
>>> df.summarize(median_x = tp.median(col('x')))
min(x)[source]

Get column minimum

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(min_x = tp.min('x'))
>>> df.summarize(min_x = tp.min(col('x')))
n()[source]

Number of observations in each group

Examples

>>> df.summarize(count = tp.n())
n_distinct(x)[source]

Get number of distinct values in a column

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(n_distinct_x = tp.n_distinct('x'))
>>> df.summarize(n_distinct_x = tp.n_distinct(col('x')))
quantile(x, quantile=0.5)[source]

Get the specified quantile of a column

Parameters:
  • x (Expr, Series) – Column to operate on

  • quantile (float) – Quantile to return

Examples

>>> df.summarize(quantile_x = tp.quantile('x', .25))
sd(x)[source]

Get column standard deviation

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(sd_x = tp.sd('x'))
>>> df.summarize(sd_x = tp.sd(col('x')))
sum(x)[source]

Get column sum

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(sum_x = tp.sum('x'))
>>> df.summarize(sum_x = tp.sum(col('x')))
var(x)[source]

Get column variance

Parameters:

x (Expr) – Column to operate on

Examples

>>> df.summarize(var_x = tp.var('x'))
>>> df.summarize(var_x = tp.var(col('x')))
between(x, left, right)[source]

Test if values of a column are between two values

Parameters:
  • x (Expr, Series) – Column to operate on

  • left (int) – Value to test if column is greater than or equal to

  • right (int) – Value to test if column is less than or equal to

Examples

>>> df = tp.Tibble(x = range(4))
>>> df.filter(tp.between(col('x'), 1, 3))
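
According to the parameter descriptions, both bounds are inclusive. A small plain-Python sketch of the same test (lists stand in for columns; this is not the tidypolars implementation):

```python
def between(values, left, right):
    """True where left <= value <= right, with both ends inclusive."""
    return [left <= value <= right for value in values]


x = list(range(4))
mask = between(x, 1, 3)
kept = [value for value, keep in zip(x, mask) if keep]
print(mask)  # [False, True, True, True]
print(kept)  # [1, 2, 3]
```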
is_finite(x)[source]

Test if values of a column are finite

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df = tp.Tibble(x = [1.0, float('inf')])
>>> df.filter(tp.is_finite(col('x')))
is_in(x, y)[source]

Test if values of a column are in a list of values

Parameters:
  • x (Expr, Series) – Column to operate on

  • y (list) – List to test against

Examples

>>> df = tp.Tibble(x = range(3))
>>> df.filter(tp.is_in(col('x'), [1, 2]))
is_infinite(x)[source]

Test if values of a column are infinite

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df = tp.Tibble(x = [1.0, float('inf')])
>>> df.filter(tp.is_infinite(col('x')))
is_nan(x)[source]

Test if values of a column are nan

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df = tp.Tibble(x = [1.0, float('nan')])
>>> df.filter(tp.is_nan(col('x')))
is_not(x)[source]

Flip values of a boolean series

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df = tp.Tibble(x = range(3))
>>> df.filter(tp.is_not(col('x') < 2))
is_not_in(x, y)[source]

Test if values of a column are not in a list of values

Parameters:
  • x (Expr, Series) – Column to operate on

  • y (list) – List to test against

Examples

>>> df = tp.Tibble(x = range(3))
>>> df.filter(tp.is_not_in(col('x'), [1, 2]))
is_not_null(x)[source]

Test if values of a column are not null

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df = tp.Tibble(x = range(3))
>>> df.filter(tp.is_not_null(col('x')))
is_null(x)[source]

Test if values of a column are null

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df = tp.Tibble(x = range(3))
>>> df.filter(tp.is_null(col('x')))
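
Null and NaN are easy to confuse: null marks a missing value, while NaN is an ordinary floating-point value. The plain-Python sketch below uses None for null and assumes that a NaN test on a null stays null, as polars does; the variable names are illustrative.

```python
import math

values = [1.0, float('nan'), None]

# Null test: only the missing entry is flagged.
is_null = [value is None for value in values]

# NaN test: missing entries stay missing instead of becoming False.
is_nan = [None if value is None else math.isnan(value) for value in values]

print(is_null)  # [False, False, True]
print(is_nan)   # [False, True, None]
```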
as_boolean(x)[source]

Convert to a boolean

Parameters:

x (Expr) – Column to operate on

Examples

>>> df.mutate(bool_x = tp.as_boolean(col('x')))
as_float(x)[source]

Convert to float. Defaults to Float64.

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(float_x = tp.as_float(col('x')))
as_integer(x)[source]

Convert to integer. Defaults to Int64.

Parameters:

x (Expr) – Column to operate on

Examples

>>> df.mutate(int_x = tp.as_integer(col('x')))
as_string(x)[source]

Convert to string. Defaults to Utf8.

Parameters:

x (Expr) – Column to operate on

Examples

>>> df.mutate(string_x = tp.as_string(col('x')))
cast(x, dtype)[source]

General type conversion.

Parameters:
  • x (Expr, Series) – Column to operate on

  • dtype (DataType) – Type to convert to

Examples

>>> df.mutate(float_x = tp.cast(col('x'), tp.Float64))
as_date(x, fmt=None)[source]

Convert a string to a Date

Parameters:
  • x (Expr, Series) – Column to operate on

  • fmt (str) – “yyyy-mm-dd”

Examples

>>> df = tp.Tibble(x = ['2021-01-01', '2021-10-01'])
>>> df.mutate(date_x = tp.as_date(col('x')))
as_datetime(x, fmt=None)[source]

Convert a string to a Datetime

Parameters:
  • x (Expr, Series) – Column to operate on

  • fmt (str) – “yyyy-mm-dd”

Examples

>>> df = tp.Tibble(x = ['2021-01-01', '2021-10-01'])
>>> df.mutate(date_x = tp.as_datetime(col('x')))
hour(x)[source]

Extract the hour from a datetime

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(hour = tp.hour(col('x')))
make_date(year=1970, month=1, day=1)[source]

Create a date object

Parameters:
  • year (Expr, str, int) – Column or literal

  • month (Expr, str, int) – Column or literal

  • day (Expr, str, int) – Column or literal

Examples

>>> df.mutate(date = tp.make_date(2000, 1, 1))
make_datetime(year=1970, month=1, day=1, hour=0, minute=0, second=0)[source]

Create a datetime object

Parameters:
  • year (Expr, str, int) – Column or literal

  • month (Expr, str, int) – Column or literal

  • day (Expr, str, int) – Column or literal

  • hour (Expr, str, int) – Column or literal

  • minute (Expr, str, int) – Column or literal

  • second (Expr, str, int) – Column or literal

Examples

>>> df.mutate(date = tp.make_datetime(2000, 1, 1))
mday(x)[source]

Extract the month day from a date from 1 to 31.

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(monthday = tp.mday(col('x')))
minute(x)[source]

Extract the minute from a datetime

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(minute = tp.minute(col('x')))
month(x)[source]

Extract the month from a date

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(month = tp.month(col('x')))
quarter(x)[source]

Extract the quarter from a date

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(quarter = tp.quarter(col('x')))
dt_round(x, rule, n)[source]

Round the datetime

Parameters:
  • x (Expr, Series) – Column to operate on

  • rule (str) –

    Units of the downscaling operation. Any of:

    • “month”

    • “week”

    • “day”

    • “hour”

    • “minute”

    • “second”

  • n (int) – Number of units (e.g. 5 “day”, 15 “minute”).

Examples

>>> df.mutate(rounded_x = tp.dt_round(col('x'), 'hour', 1))
second(x)[source]

Extract the second from a datetime

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(second = tp.second(col('x')))
wday(x)[source]

Extract the weekday from a date, from Sunday = 1 to Saturday = 7.

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(weekday = tp.wday(col('x')))
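
The numbering described here (Sunday = 1 through Saturday = 7) differs from Python's own date.isoweekday(), which runs Monday = 1 through Sunday = 7. The sketch below converts between the two with the standard library; the local helper named wday is illustrative and independent of tidypolars.

```python
from datetime import date


def wday(d):
    """Weekday number with Sunday = 1 and Saturday = 7."""
    # isoweekday(): Monday = 1 ... Sunday = 7
    return d.isoweekday() % 7 + 1


# 2021-01-01 was a Friday, followed by Saturday and Sunday.
days = [date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 3)]
result = [wday(d) for d in days]
print(result)  # [6, 7, 1]
```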
week(x)[source]

Extract the week from a date

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(week = tp.week(col('x')))
yday(x)[source]

Extract the year day from a date from 1 to 366.

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(yearday = tp.yday(col('x')))
year(x)[source]

Extract the year from a date

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(year = tp.year(col('x')))
col[source]
exclude[source]
lit[source]
Expr[source]
Series[source]
Int8[source]
Int16[source]
Int32[source]
Int64[source]
UInt8[source]
UInt16[source]
UInt32[source]
UInt64[source]
Float32[source]
Float64[source]
Boolean[source]
Utf8[source]
List[source]
Date[source]
Datetime[source]
Object[source]
paste(*args, sep=' ')[source]

Concatenate strings together

Parameters:

args (Expr, str) – Columns and/or strings to concatenate

Examples

>>> df = tp.Tibble(x = ['a', 'b', 'c'])
>>> df.mutate(x_end = tp.paste(col('x'), 'end', sep = '_'))
paste0(*args)[source]

Concatenate strings together with no separator

Parameters:

args (Expr, str) – Columns and/or strings to concatenate

Examples

>>> df = tp.Tibble(x = ['a', 'b', 'c'])
>>> df.mutate(xend = tp.paste0(col('x'), 'end'))
str_c(*args, sep='')[source]

Concatenate strings together

Parameters:

args (Expr, str) – Columns and/or strings to concatenate

Examples

>>> df = tp.Tibble(x = ['a', 'b', 'c'])
>>> df.mutate(x_end = str_c(col('x'), 'end', sep = '_'))
str_detect(string, pattern, negate=False)[source]

Detect the presence or absence of a pattern in a string

Parameters:
  • string (str) – Input series to operate on

  • pattern (str) – Pattern to look for

  • negate (bool) – If True, return non-matching elements

Examples

>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_detect('name', 'a'))
>>> df.mutate(x = str_detect('name', ['a', 'e']))
str_extract(string, pattern)[source]

Extract the target capture group from provided patterns

Parameters:
  • string (str) – Input series to operate on

  • pattern (str) – Pattern to look for

Examples

>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_extract(col('name'), 'e'))
str_length(string)[source]

Length of a string

Parameters:

string (str) – Input series to operate on

Examples

>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_length(col('name')))
str_remove_all(string, pattern)[source]

Removes all matched patterns in a string

Parameters:
  • string (str) – Input series to operate on

  • pattern (str) – Pattern to look for

Examples

>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_remove_all(col('name'), 'a'))
str_remove(string, pattern)[source]

Removes the first matched pattern in a string

Parameters:
  • string (str) – Input series to operate on

  • pattern (str) – Pattern to look for

Examples

>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_remove(col('name'), 'a'))
str_replace_all(string, pattern, replacement)[source]

Replaces all matched patterns in a string

Parameters:
  • string (str) – Input series to operate on

  • pattern (str) – Pattern to look for

  • replacement (str) – String that replaces anything that matches the pattern

Examples

>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_replace_all(col('name'), 'a', 'A'))
str_replace(string, pattern, replacement)[source]

Replaces the first matched pattern in a string

Parameters:
  • string (str) – Input series to operate on

  • pattern (str) – Pattern to look for

  • replacement (str) – String that replaces anything that matches the pattern

Examples

>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_replace(col('name'), 'a', 'A'))
str_ends(string, pattern, negate=False)[source]

Detect the presence or absence of a pattern at the end of a string.

Parameters:
  • string (Expr) – Column to operate on

  • pattern (str) – Pattern to look for

  • negate (bool) – If True, return non-matching elements

Examples

>>> df = tp.Tibble(words = ['apple', 'bear', 'amazing'])
>>> df.filter(tp.str_ends(col('words'), 'ing'))
str_starts(string, pattern, negate=False)[source]

Detect the presence or absence of a pattern at the beginning of a string.

Parameters:
  • string (Expr) – Column to operate on

  • pattern (str) – Pattern to look for

  • negate (bool) – If True, return non-matching elements

Examples

>>> df = tp.Tibble(words = ['apple', 'bear', 'amazing'])
>>> df.filter(tp.str_starts(col('words'), 'a'))
str_sub(string, start=0, end=None)[source]

Extract portion of string based on start and end inputs

Parameters:
  • string (str) – Input series to operate on

  • start (int) – First position of the character to return

  • end (int) – Last position of the character to return

Examples

>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_sub(col('name'), 0, 3))
str_to_lower(string)[source]

Convert a string to lower case

Parameters:

string (str) – Convert case of this string

Examples

>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_to_lower(col('name')))
str_to_upper(string)[source]

Convert a string to upper case

Parameters:

string (str) – Convert case of this string

Examples

>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_to_upper(col('name')))
str_trim(string, side='both')[source]

Trim whitespace

Parameters:
  • string (Expr, Series) – Column or series to operate on

  • side (str) –

    One of:
    • “both”

    • “left”

    • “right”

Examples

>>> df = tp.Tibble(x = [' a ', ' b ', ' c '])
>>> df.mutate(x = tp.str_trim(col('x')))
class Tibble(_data=None, **kwargs)[source]

Bases: tidypolars.reexports.pl.DataFrame

A data frame object that provides methods familiar to R tidyverse users.

property names

Get column names

Examples

>>> df.names
property ncol

Get number of columns

Examples

>>> df.ncol
property nrow

Get number of rows

Examples

>>> df.nrow
__repr__()[source]

Printing method

_repr_html_()[source]

Printing method for jupyter

Output rows and columns can be modified by setting the following environment variables:

  • POLARS_FMT_MAX_COLS: set the number of columns

  • POLARS_FMT_MAX_ROWS: set the number of rows

__copy__()[source]
__str__()[source]

Printing method

__getattribute__(attr)[source]

Return getattr(self, name).

__dir__()[source]

Default dir() implementation.

arrange(*args)[source]

Arrange/sort rows

Parameters:

*args (str) – Columns to sort by

Examples

>>> df = tp.Tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
>>> # Arrange in ascending order
>>> df.arrange('x', 'y')
...
>>> # Arrange some columns descending
>>> df.arrange(tp.desc('x'), 'y')
bind_cols(*args)[source]

Bind data frames by columns

Parameters:

*args (Tibble) – Data frames to bind

Examples

>>> df1 = tp.Tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
>>> df2 = tp.Tibble({'a': ['c', 'c', 'c'], 'b': range(4, 7)})
>>> df1.bind_cols(df2)
bind_rows(*args)[source]

Bind data frames by row

Parameters:

*args (Tibble, list) – Data frames to bind by row

Examples

>>> df1 = tp.Tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
>>> df2 = tp.Tibble({'x': ['c', 'c', 'c'], 'y': range(4, 7)})
>>> df1.bind_rows(df2)
clone()[source]

Very cheap deep clone

count(*args, sort=False, name='n')[source]

Returns row counts of the dataset. If bare column names are provided, count() returns counts by group.

Parameters:
  • *args (str, Expr) – Columns to group by

  • sort (bool) – Should columns be ordered in descending order by count

  • name (str) – The name of the new column in the output. If omitted, it will default to “n”.

Examples

>>> df = tp.Tibble({'a': range(3), 'b': ['a', 'a', 'b']})
>>> df.count()
>>> df.count('b')
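
As a rough picture of what the two calls above compute, the following plain-Python sketch counts rows overall and per group with collections.Counter. It illustrates the counting only and says nothing about the layout of the returned Tibble.

```python
from collections import Counter

b = ['a', 'a', 'b']

total = len(b)         # like df.count(): one overall row count
by_group = Counter(b)  # like df.count('b'): one count per distinct value

print(total)                    # 3
print(dict(by_group))           # {'a': 2, 'b': 1}
print(by_group.most_common())   # largest count first, as with sort=True
```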
distinct(*args)[source]

Select distinct/unique rows

Parameters:

*args (str, Expr) – Columns to find distinct/unique rows

Examples

>>> df = tp.Tibble({'a': range(3), 'b': ['a', 'a', 'b']})
>>> df.distinct()
>>> df.distinct('b')
drop(*args)[source]

Drop unwanted columns

Parameters:

*args (str) – Columns to drop

Examples

>>> df.drop('x', 'y')
drop_null(*args)[source]

Drop rows containing missing values

Parameters:

*args (str) – Columns to drop nulls from (defaults to all)

Examples

>>> df = tp.Tibble(x = [1, None, 3], y = [None, 'b', 'c'], z = range(3))
>>> df.drop_null()
>>> df.drop_null('x', 'y')
head(n=5, *, by=None)[source]

Alias for .slice_head()

fill(*args, direction='down', by=None)[source]

Fill in missing values with previous or next value

Parameters:
  • *args (str) – Columns to fill

  • direction (str) – Direction to fill. One of [‘down’, ‘up’, ‘downup’, ‘updown’]

  • by (str, list) – Columns to group by

Examples

>>> df = tp.Tibble({'a': [1, None, 3, 4, 5],
...                 'b': [None, 2, None, None, 5],
...                 'groups': ['a', 'a', 'a', 'b', 'b']})
>>> df.fill('a', 'b')
>>> df.fill('a', 'b', by = 'groups')
>>> df.fill('a', 'b', direction = 'downup')
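
The four directions can be read as combinations of two passes. The sketch below is a plain-Python illustration on a single list, assuming that 'downup' means fill down first and then up (and 'updown' the reverse), as in tidyr; it is not the tidypolars implementation.

```python
def fill_down(values):
    """Replace each None with the most recent non-None value above it."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def fill_up(values):
    """Replace each None with the next non-None value below it."""
    return fill_down(values[::-1])[::-1]


def fill(values, direction='down'):
    if direction == 'down':
        return fill_down(values)
    if direction == 'up':
        return fill_up(values)
    if direction == 'downup':
        return fill_up(fill_down(values))
    if direction == 'updown':
        return fill_down(fill_up(values))
    raise ValueError(f"unknown direction: {direction!r}")


b = [None, 2, None, None, 5]
print(fill(b, 'down'))    # [None, 2, 2, 2, 5]
print(fill(b, 'up'))      # [2, 2, 5, 5, 5]
print(fill(b, 'downup'))  # [2, 2, 2, 2, 5]
```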
filter(*args, by=None)[source]

Filter rows on one or more conditions

Parameters:
  • *args (Expr) – Conditions to filter by

  • by (str, list) – Columns to group by

Examples

>>> df = tp.Tibble({'a': range(3), 'b': ['a', 'a', 'b']})
>>> df.filter(col('a') < 2, col('b') == 'a')
>>> df.filter((col('a') < 2) & (col('b') == 'a'))
>>> df.filter(col('a') <= tp.mean(col('a')), by = 'b')
frame_equal(other, null_equal=True)[source]

Check if two Tibbles are equal

inner_join(df, left_on=None, right_on=None, on=None, suffix='_right')[source]

Perform an inner join

Parameters:
  • df (Tibble) – Data frame to join with.

  • left_on (str, list) – Join column(s) of the left DataFrame.

  • right_on (str, list) – Join column(s) of the right DataFrame.

  • on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.

  • suffix (str) – Suffix to append to columns with a duplicate name.

Examples

>>> df1.inner_join(df2)
>>> df1.inner_join(df2, on = 'x')
>>> df1.inner_join(df2, left_on = 'left_x', right_on = 'x')
left_join(df, left_on=None, right_on=None, on=None, suffix='_right')[source]

Perform a left join

Parameters:
  • df (Tibble) – Data frame to join with.

  • left_on (str, list) – Join column(s) of the left DataFrame.

  • right_on (str, list) – Join column(s) of the right DataFrame.

  • on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.

  • suffix (str) – Suffix to append to columns with a duplicate name.

Examples

>>> df1.left_join(df2)
>>> df1.left_join(df2, on = 'x')
>>> df1.left_join(df2, left_on = 'left_x', right_on = 'x')
mutate(*args, by=None, **kwargs)[source]

Add or modify columns

Parameters:
  • *args (Expr) – Column expressions to add or modify

  • by (str, list) – Columns to group by

  • **kwargs (Expr) – Column expressions to add or modify

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.mutate(double_a = col('a') * 2,
...           a_plus_b = col('a') + col('b'))
>>> df.mutate(row_num = row_number(), by = 'c')
full_join(df, left_on=None, right_on=None, on=None, suffix: str = '_right')[source]

Perform a full join

Parameters:
  • df (Tibble) – Data frame to join with.

  • left_on (str, list) – Join column(s) of the left DataFrame.

  • right_on (str, list) – Join column(s) of the right DataFrame.

  • on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.

  • suffix (str) – Suffix to append to columns with a duplicate name.

Examples

>>> df1.full_join(df2)
>>> df1.full_join(df2, on = 'x')
>>> df1.full_join(df2, left_on = 'left_x', right_on = 'x')
pivot_longer(cols=everything(), names_to='name', values_to='value')[source]

Pivot data from wide to long

Parameters:
  • cols (Expr) – List of the columns to pivot. Defaults to all columns.

  • names_to (str) – Name of the new “names” column.

  • values_to (str) – Name of the new “values” column

Examples

>>> df = tp.Tibble({'id': ['id1', 'id2'], 'a': [1, 2], 'b': [1, 2]})
>>> df.pivot_longer(cols = ['a', 'b'])
>>> df.pivot_longer(cols = ['a', 'b'], names_to = 'stuff', values_to = 'things')
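
To show the reshaping itself, here is a plain-Python sketch that pivots a list of row dictionaries from wide to long. The row order of the output (one block per pivoted column) is an assumption of this sketch and may differ from what the library returns.

```python
def pivot_longer(rows, cols, names_to='name', values_to='value'):
    """Turn each (row, pivoted column) pair into its own output row."""
    long_rows = []
    for column in cols:              # one block of rows per pivoted column
        for row in rows:
            kept = {k: v for k, v in row.items() if k not in cols}
            long_rows.append({**kept, names_to: column, values_to: row[column]})
    return long_rows


wide = [
    {'id': 'id1', 'a': 1, 'b': 1},
    {'id': 'id2', 'a': 2, 'b': 2},
]
long = pivot_longer(wide, cols=['a', 'b'])
for row in long:
    print(row)
```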
pivot_wider(names_from='name', values_from='value', id_cols=None, values_fn='first', values_fill=None)[source]

Pivot data from long to wide

Parameters:
  • names_from (str) – Column to get the new column names from.

  • values_from (str) – Column to get the new column values from

  • id_cols (str, list) – A set of columns that uniquely identifies each observation. Defaults to all columns in the data table except for the columns specified in names_from and values_from.

  • values_fn (str) – Function for how multiple entries per group should be dealt with. Any of ‘first’, ‘count’, ‘sum’, ‘max’, ‘min’, ‘mean’, ‘median’, ‘last’

  • values_fill (str) – If values are missing/null, what value should be filled in. Can use: “backward”, “forward”, “mean”, “min”, “max”, “zero”, “one”

Examples

>>> df = tp.Tibble({'id': [1, 1], 'variable': ['a', 'b'], 'value': [1, 2]})
>>> df.pivot_wider(names_from = 'variable', values_from = 'value')
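
The reverse reshaping can be sketched the same way. The plain-Python function below groups rows by their remaining (id) columns and keeps the first value seen per new column, which corresponds to the default values_fn of 'first'; it is a simplified illustration and ignores values_fill.

```python
def pivot_wider(rows, names_from, values_from):
    """Spread names_from/values_from pairs into columns, one row per id."""
    wide = {}
    for row in rows:
        # Everything except the two pivot columns identifies the output row.
        key = tuple(
            (k, v) for k, v in row.items() if k not in (names_from, values_from)
        )
        out = wide.setdefault(key, dict(key))
        out.setdefault(row[names_from], row[values_from])  # keep the first value
    return list(wide.values())


long = [
    {'id': 1, 'variable': 'a', 'value': 1},
    {'id': 1, 'variable': 'b', 'value': 2},
]
wide = pivot_wider(long, names_from='variable', values_from='value')
print(wide)  # [{'id': 1, 'a': 1, 'b': 2}]
```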
pull(var=None)[source]

Extract a column as a series

Parameters:

var (str) – Name of the column to extract. Defaults to the last column.

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3)})
>>> df.pull('a')
relocate(*args, before=None, after=None)[source]

Move a column or columns to a new position

Parameters:

*args (str, Expr) – Columns to move

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.relocate('a', before = 'c')
>>> df.relocate('b', after = 'c')
rename(*args, **kwargs)[source]

Rename columns

Parameters:
  • *args (dict) – Dictionary mapping of new names

  • **kwargs (str) – key-value pair of new name from old name

Examples

>>> df = tp.Tibble({'x': range(3), 't': range(3), 'z': ['a', 'a', 'b']})
>>> df.rename(new_x = 'x') # dplyr interface
>>> df.rename({'x': 'new_x'}) # pandas interface
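
Note that the two interfaces point in opposite directions: the keyword form reads new = old, while the dictionary form maps old to new. A plain-Python sketch that normalises both into one old-to-new mapping (the function operates on a list of column names and is illustrative only):

```python
def rename(names, mapping=None, **kwargs):
    """Rename entries of names using {old: new} and/or new='old' pairs."""
    old_to_new = dict(mapping or {})
    # Keyword arguments arrive as new_name='old_name', so flip them.
    old_to_new.update({old: new for new, old in kwargs.items()})
    return [old_to_new.get(name, name) for name in names]


names = ['x', 't', 'z']
print(rename(names, new_x='x'))       # ['new_x', 't', 'z']
print(rename(names, {'x': 'new_x'}))  # ['new_x', 't', 'z']
```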
replace_null(replace=None)[source]

Replace null values

Parameters:

replace (dict) – Dictionary of column/replacement pairs

Examples

>>> df = tp.Tibble(x = [0, None], y = [None, None])
>>> df.replace_null(dict(x = 1, y = 2))
separate(sep_col, into, sep='_', remove=True)[source]

Separate a character column into multiple columns

Parameters:
  • sep_col (str) – Column to split into multiple columns

  • into (list) – List of new column names

  • sep (str) – Separator to split on. Defaults to ‘_’

  • remove (bool) – If True removes the input column from the output data frame

Examples

>>> df = tp.Tibble(x = ['a_a', 'b_b', 'c_c'])
>>> df.separate('x', into = ['left', 'right'])
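
How sep, into and remove interact can be sketched on a dict of columns. `separate_sketch` is a hypothetical helper; the handling of rows with too few or too many pieces is an assumption made for the sketch, not documented behavior.

```python
def separate_sketch(data, sep_col, into, sep="_", remove=True):
    """Split one string column into several new columns (illustration only)."""
    parts = [value.split(sep) for value in data[sep_col]]
    # Keep every column, dropping the input column only when remove is True.
    out = {k: list(v) for k, v in data.items() if not (remove and k == sep_col)}
    for j, new_name in enumerate(into):
        # Assumption: extra pieces are dropped and short rows are padded with None.
        out[new_name] = [p[j] if j < len(p) else None for p in parts]
    return out

data = {"x": ["a_a", "b_b", "c_c"]}
split = separate_sketch(data, "x", into=["left", "right"])
kept = separate_sketch(data, "x", into=["left", "right"], remove=False)
print(split)
```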
set_names(nm=None)

Change the column names of the data frame

Parameters:

nm (list) – A list of new names for the data frame

Examples

>>> df = tp.Tibble(x = range(3), y = range(3))
>>> df.set_names(['a', 'b'])
select(*args)

Select or drop columns

Parameters:

*args (str, Expr) – Columns to select

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.select('a', 'b')
>>> df.select(col('a'), col('b'))
slice(*args, by=None)

Grab rows from a data frame

Parameters:
  • *args (int, list) – Rows to grab

  • by (str, list) – Columns to group by

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.slice(0, 1)
>>> df.slice(0, by = 'c')
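
With by, the row positions are interpreted within each group rather than across the whole data frame. A plain-Python sketch of that idea, where `slice_sketch` is a hypothetical helper and the output row order (groups in order of first appearance) is an assumption:

```python
def slice_sketch(data, *rows, by=None):
    """Pick rows by position, optionally per group (illustration only)."""
    n_rows = len(next(iter(data.values())))
    if by is None:
        keep = [i for i in rows if i < n_rows]
    else:
        by_cols = [by] if isinstance(by, str) else list(by)
        groups = {}
        for i in range(n_rows):
            groups.setdefault(tuple(data[c][i] for c in by_cols), []).append(i)
        # Position r refers to the r-th row of each group.
        keep = [idx[r] for idx in groups.values() for r in rows if r < len(idx)]
    return {name: [values[i] for i in keep] for name, values in data.items()}

data = {"a": [0, 1, 2], "b": [0, 1, 2], "c": ["a", "a", "b"]}
first_two = slice_sketch(data, 0, 1)
first_per_group = slice_sketch(data, 0, by="c")
print(first_two, first_per_group)
```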
slice_head(n=5, *, by=None)

Grab top rows from a data frame

Parameters:
  • n (int) – Number of rows to grab

  • by (str, list) – Columns to group by

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.slice_head(2)
>>> df.slice_head(1, by = 'c')
slice_tail(n=5, *, by=None)

Grab bottom rows from a data frame

Parameters:
  • n (int) – Number of rows to grab

  • by (str, list) – Columns to group by

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.slice_tail(2)
>>> df.slice_tail(1, by = 'c')
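
For slice_head and slice_tail, n counts rows from the start or end of the data frame, or of each group when by is given. A rough sketch of both on a dict of columns; `slice_ends_sketch` and its `tail` flag are hypothetical, and the order of rows in grouped output is an assumption.

```python
def slice_ends_sketch(data, n=5, by=None, tail=False):
    """Take the first or last n rows, optionally per group (illustration only)."""
    n_rows = len(next(iter(data.values())))
    if by is None:
        groups = [list(range(n_rows))]
    else:
        by_cols = [by] if isinstance(by, str) else list(by)
        grouped = {}
        for i in range(n_rows):
            grouped.setdefault(tuple(data[c][i] for c in by_cols), []).append(i)
        groups = list(grouped.values())
    keep = []
    for idx in groups:
        # Last n rows of the group when tail is True, first n otherwise.
        keep.extend(idx[max(len(idx) - n, 0):] if tail else idx[:n])
    return {name: [values[i] for i in keep] for name, values in data.items()}

data = {"a": [0, 1, 2, 3], "c": ["x", "x", "y", "y"]}
head_2 = slice_ends_sketch(data, 2)
tail_1_by_c = slice_ends_sketch(data, 1, by="c", tail=True)
print(head_2, tail_1_by_c)
```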
summarise(*args, by=None, **kwargs)

Alias for .summarize()

summarize(*args, by=None, **kwargs)

Aggregate data with summary statistics

Parameters:
  • *args (Expr) – Column expressions to add or modify

  • by (str, list) – Columns to group by

  • **kwargs (Expr) – Column expressions to add or modify

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.summarize(avg_a = tp.mean(col('a')))
>>> df.summarize(avg_a = tp.mean(col('a')),
...              by = 'c')
>>> df.summarize(avg_a = tp.mean(col('a')),
...              max_b = tp.max(col('b')))
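
Without by, each summary collapses the whole data frame to one row; with by, there is one row per group. A plain-Python sketch of that grouping, assuming a hypothetical `summarize_sketch` in which each keyword is a (column, function) pair standing in for a column expression:

```python
from statistics import mean

def summarize_sketch(data, by=None, **aggs):
    """Aggregate columns, optionally per group (illustration only)."""
    n_rows = len(next(iter(data.values())))
    by_cols = [] if by is None else ([by] if isinstance(by, str) else list(by))
    groups = {}
    for i in range(n_rows):
        # With no grouping columns every row shares the empty key ().
        groups.setdefault(tuple(data[c][i] for c in by_cols), []).append(i)
    out = {c: [key[j] for key in groups] for j, c in enumerate(by_cols)}
    for name, (col_name, fn) in aggs.items():
        out[name] = [fn([data[col_name][i] for i in idx]) for idx in groups.values()]
    return out

data = {"a": [0, 1, 2], "b": [0, 1, 2], "c": ["a", "a", "b"]}
overall = summarize_sketch(data, avg_a=("a", mean), max_b=("b", max))
by_c = summarize_sketch(data, by="c", avg_a=("a", mean))
print(overall, by_c)
```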
tail(n=5, *, by=None)

Alias for .slice_tail()

to_dict(as_series=True)

Convert the data frame to a dictionary

Parameters:

as_series (bool) – If True, returns the dict values as Series. If False, returns the dict values as lists.

Examples

>>> df.to_dict()
>>> df.to_dict(as_series = False)
to_pandas()

Convert to a pandas DataFrame

Examples

>>> df.to_pandas()
to_polars()

Convert to a polars DataFrame

Examples

>>> df.to_polars()
unite(col='_united', unite_cols=[], sep='_', remove=True)

Unite multiple columns by pasting strings together

Parameters:
  • col (str) – Name of the new column

  • unite_cols (list) – List of columns to unite

  • sep (str) – Separator to use between values

  • remove (bool) – If True, removes the input columns from the data frame

Examples

>>> df = tp.Tibble(a = ["a", "a", "a"], b = ["b", "b", "b"], c = range(3))
>>> df.unite("united_col", unite_cols = ["a", "b"])
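
The pasting step can be sketched on a dict of columns. `unite_sketch` is a hypothetical helper; treating an empty unite_cols as "all columns" and appending the new column at the end are assumptions of the sketch, not documented behavior.

```python
def unite_sketch(data, col="_united", unite_cols=(), sep="_", remove=True):
    """Paste several columns together row by row (illustration only)."""
    cols = list(unite_cols) or list(data)  # assumption: empty means every column
    n_rows = len(next(iter(data.values())))
    united = [sep.join(str(data[c][i]) for c in cols) for i in range(n_rows)]
    # Drop the input columns only when remove is True.
    out = {k: list(v) for k, v in data.items() if not (remove and k in cols)}
    out[col] = united  # column placement is not modeled here
    return out

data = {"a": ["a", "a", "a"], "b": ["b", "b", "b"], "c": [0, 1, 2]}
united = unite_sketch(data, "united_col", unite_cols=["a", "b"])
print(united)
```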
write_csv(file=None, has_headers=True, sep=',')

Write a data frame to a CSV file

write_parquet(file=str, compression='snappy', use_pyarrow=False, **kwargs)

Write a data frame to a Parquet file

desc(x)

Mark a column to be sorted in descending order

from_pandas(df)

Convert from pandas DataFrame to Tibble

Parameters:

df (DataFrame) – pd.DataFrame to convert to a Tibble

Examples

>>> tp.from_pandas(df)
from_polars(df)

Convert from polars DataFrame to Tibble

Parameters:

df (DataFrame) – pl.DataFrame to convert to a Tibble

Examples

>>> tp.from_polars(df)
contains(match, ignore_case=True)

Select columns whose names contain a literal string

Parameters:
  • match (str) – String to match columns

  • ignore_case (bool) – If True (the default), ignores case when matching names.

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.select(contains('c'))
ends_with(match, ignore_case=True)

Select columns whose names end with a suffix

Parameters:
  • match (str) – String to match columns

  • ignore_case (bool) – If True (the default), ignores case when matching names.

Examples

>>> df = tp.Tibble({'a': range(3), 'b_code': range(3), 'c_code': ['a', 'a', 'b']})
>>> df.select(ends_with('code'))
everything()

Selects all columns

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.select(everything())
starts_with(match, ignore_case=True)

Select columns whose names start with a prefix

Parameters:
  • match (str) – String to match columns

  • ignore_case (bool) – If True (the default), ignores case when matching names.

Examples

>>> df = tp.Tibble({'a': range(3), 'add': range(3), 'sub': ['a', 'a', 'b']})
>>> df.select(starts_with('a'))
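
The three name-matching helpers (contains, ends_with, starts_with) differ only in where the match must occur, and ignore_case controls whether 'A' matches 'a'. A minimal sketch of that matching on a list of column names; `match_columns` and its `how` argument are hypothetical and do not reflect how tidypolars builds its selectors.

```python
def match_columns(columns, match, how, ignore_case=True):
    """Return the column names that match under the given rule."""
    tests = {
        "contains": lambda name, m: m in name,
        "starts_with": str.startswith,
        "ends_with": str.endswith,
    }
    test = tests[how]
    if ignore_case:
        # Compare lowercased copies so the original names are returned unchanged.
        return [c for c in columns if test(c.lower(), match.lower())]
    return [c for c in columns if test(c, match)]

starts = match_columns(["a", "add", "sub"], "a", "starts_with")
ends = match_columns(["a", "b_code", "c_code"], "code", "ends_with")
has_c = match_columns(["a", "b", "c"], "c", "contains")
strict = match_columns(["Add", "sub"], "a", "starts_with", ignore_case=False)
print(starts, ends, has_c, strict)
```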
__all__