`tidypolars`¶

Submodules¶

Package Contents¶

Classes¶

Tibble

A data frame object that provides methods familiar to R tidyverse users.

Functions¶

`abs`(x)	Absolute value
`across`(cols[, fn, names_prefix])	Apply a function across a selection of columns
`case_when`(expr)	Conditional expression returning the value of the first true condition
`coalesce`(*args)	Coalesce missing values
`floor`(x)	Round numbers down to the lower integer
`if_else`(condition, true, false)	Choose between two values based on a condition
`lag`(x[, n, default])	Get lagging values
`lead`(x[, n, default])	Get leading values
`log`(x)	Compute the natural logarithm of a column
`log10`(x)	Compute the base 10 logarithm of a column
`read_csv`(file, *args, **kwargs)	Simple wrapper around polars.read_csv
`read_parquet`(source, *args, **kwargs)	Simple wrapper around polars.read_parquet
`rep`(x[, times])	Replicate the values in x
`replace_null`(x[, replace])	Replace null values
`round`(x[, decimals])	Round numbers to a given number of decimals
`row_number`()	Return row number
`sqrt`(x)	Get column square root
`cor`(x, y[, method])	Find the correlation of two columns
`cov`(x, y)	Find the covariance of two columns
`count`(x)	Number of observations in each group
`first`(x)	Get first value
`last`(x)	Get last value
`length`(x)	Number of observations in each group
`max`(x)	Get column max
`mean`(x)	Get column mean
`median`(x)	Get column median
`min`(x)	Get column minimum
`n`()	Number of observations in each group
`n_distinct`(x)	Get number of distinct values in a column
`quantile`(x[, quantile])	Get column quantile
`sd`(x)	Get column standard deviation
`sum`(x)	Get column sum
`var`(x)	Get column variance
`between`(x, left, right)	Test if values of a column are between two values
`is_finite`(x)	Test if values of a column are finite
`is_in`(x, y)	Test if values of a column are in a list of values
`is_infinite`(x)	Test if values of a column are infinite
`is_nan`(x)	Test if values of a column are nan
`is_not`(x)	Flip values of a boolean series
`is_not_in`(x, y)	Test if values of a column are not in a list of values
`is_not_null`(x)	Test if values of a column are not null
`is_null`(x)	Test if values of a column are null
`as_boolean`(x)	Convert to a boolean
`as_float`(x)	Convert to float. Defaults to Float64.
`as_integer`(x)	Convert to integer. Defaults to Int64.
`as_string`(x)	Convert to string. Defaults to Utf8.
`cast`(x, dtype)	General type conversion.
`as_date`(x[, fmt])	Convert a string to a Date
`as_datetime`(x[, fmt])	Convert a string to a Datetime
`hour`(x)	Extract the hour from a datetime
`make_date`([year, month, day])	Create a date object
`make_datetime`([year, month, day, hour, minute, second])	Create a datetime object
`mday`(x)	Extract the month day from a date from 1 to 31.
`minute`(x)	Extract the minute from a datetime
`month`(x)	Extract the month from a date
`quarter`(x)	Extract the quarter from a date
`dt_round`(x, rule, n)	Round the datetime
`second`(x)	Extract the second from a datetime
`wday`(x)	Extract the weekday from a date, from Sunday = 1 to Saturday = 7.
`week`(x)	Extract the week from a date
`yday`(x)	Extract the year day from a date from 1 to 366.
`year`(x)	Extract the year from a date
`paste`(*args[, sep])	Concatenate strings together
`paste0`(*args)	Concatenate strings together with no separator
`str_c`(*args[, sep])	Concatenate strings together
`str_detect`(string, pattern[, negate])	Detect the presence or absence of a pattern in a string
`str_extract`(string, pattern)	Extract the target capture group from provided patterns
`str_length`(string)	Length of a string
`str_remove_all`(string, pattern)	Removes all matched patterns in a string
`str_remove`(string, pattern)	Removes the first match of a pattern in a string
`str_replace_all`(string, pattern, replacement)	Replaces all matched patterns in a string
`str_replace`(string, pattern, replacement)	Replaces the first match of a pattern in a string
`str_ends`(string, pattern[, negate])	Detect the presence or absence of a pattern at the end of a string.
`str_starts`(string, pattern[, negate])	Detect the presence or absence of a pattern at the beginning of a string.
`str_sub`(string[, start, end])	Extract portion of string based on start and end inputs
`str_to_lower`(string)	Convert a string to lowercase
`str_to_upper`(string)	Convert a string to uppercase
`str_trim`(string[, side])	Trim whitespace
`desc`(x)	Mark a column to order in descending
`from_pandas`(df)	Convert from pandas DataFrame to Tibble
`from_polars`(df)	Convert from polars DataFrame to Tibble
`contains`(match[, ignore_case])	Contains a literal string
`ends_with`(match[, ignore_case])	Ends with a suffix
`everything`()	Selects all columns
`starts_with`(match[, ignore_case])	Starts with a prefix

Attributes¶

`__version__`
`col`
`exclude`
`lit`
`Expr`
`Series`
`Int8`
`Int16`
`Int32`
`Int64`
`UInt8`
`UInt16`
`UInt32`
`UInt64`
`Float32`
`Float64`
`Boolean`
`Utf8`
`List`
`Date`
`Datetime`
`Object`
`__all__`

__version__¶

abs(x)[source]¶

Absolute value

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(abs_x = tp.abs('x'))
>>> df.mutate(abs_x = tp.abs(col('x')))

across(cols, fn=lambda x: ..., names_prefix=None)[source]¶

Apply a function across a selection of columns

Parameters:

cols (list) – Columns to operate on
fn (lambda) – A function or lambda to apply to each column
names_prefix (Optional - str) – Prefix to add to the names of the changed columns

Examples

>>> df = tp.Tibble(x = ['a', 'a', 'b'], y = range(3), z = range(3))
>>> df.mutate(tp.across(['y', 'z'], lambda x: x * 2))
>>> df.mutate(tp.across(tp.Int64, lambda x: x * 2, names_prefix = "double_"))
>>> df.summarize(tp.across(['y', 'z'], tp.mean), by = 'x')
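As a rough mental model, `across` applies one function to every selected column. The plain-Python sketch below is not tidypolars' implementation: it stores a table as a dict of lists, handles element-wise functions only (not aggregations such as `tp.mean`), and assumes, as in dplyr, that a `names_prefix` writes results to new prefixed columns while leaving the originals in place. The helper name `across_sketch` is hypothetical.

```python
def across_sketch(table: dict, cols: list, fn, names_prefix=None) -> dict:
    """Apply fn element-wise to each selected column (hypothetical helper)."""
    out = {name: list(values) for name, values in table.items()}  # copy the input table
    for name in cols:
        # Without a prefix the column is overwritten; with one, a new column is added
        new_name = name if names_prefix is None else names_prefix + name
        out[new_name] = [fn(v) for v in table[name]]
    return out

df = {"x": ["a", "a", "b"], "y": [0, 1, 2], "z": [0, 1, 2]}
doubled = across_sketch(df, ["y", "z"], lambda v: v * 2, names_prefix="double_")
print(doubled["double_y"])  # [0, 2, 4]
```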

case_when(expr)[source]¶

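Create a conditional expression that returns the value paired with the first true condition, similar to dplyr's case_when()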

Parameters:: expr (Expr) – A logical expression

Examples

>>> df = tp.Tibble(x = range(1, 4))
>>> df.mutate(
...    case_x = tp.case_when(col('x') < 2).then(1)
...             .when(col('x') < 3).then(2)
...             .otherwise(0)
... )
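The chain is evaluated row by row, and the first condition that holds decides the value; later conditions are not consulted for that row. This plain-Python sketch (the helper `case_when_sketch` is hypothetical, not part of tidypolars) reproduces that first-match rule for the example above, so `x = 2` gets `2` even though it also fails `x < 2`.

```python
def case_when_sketch(values, branches, otherwise):
    """For each value, return the result of the first branch whose condition holds."""
    out = []
    for v in values:
        for condition, result in branches:
            if condition(v):
                out.append(result)
                break
        else:  # no condition matched this value
            out.append(otherwise)
    return out

x = [1, 2, 3]
case_x = case_when_sketch(x, [(lambda v: v < 2, 1), (lambda v: v < 3, 2)], otherwise=0)
print(case_x)  # [1, 2, 0]
```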

coalesce(*args)[source]¶

Coalesce missing values: for each row, return the first non-null value among the given columns

Parameters:: args (Expr) – Columns to coalesce

Examples

>>> df.mutate(x = tp.coalesce(col('x'), col('y')))
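Row by row, coalescing scans the columns left to right and keeps the first value that is not null; if every column is null in that row, the result stays null. A minimal plain-Python sketch of that rule, using `None` for null (the helper name is hypothetical):

```python
def coalesce_sketch(*columns):
    """Row-wise: first non-None value across columns, else None."""
    return [next((v for v in row if v is not None), None) for row in zip(*columns)]

x = [None, 2, None]
y = [10, 20, None]
print(coalesce_sketch(x, y))  # [10, 2, None]
```

Note that falsy values such as `0` are kept, because only `None` counts as missing.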

floor(x)[source]¶

Round numbers down to the lower integer

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(floor_x = tp.floor(col('x')))

if_else(condition, true, false)[source]¶

Return one value where the condition is true and another where it is false

Parameters:

condition (Expr) – A logical expression
true – Value if the condition is true
false – Value if the condition is false

Examples

>>> df = tp.Tibble(x = range(1, 4))
>>> df.mutate(if_x = tp.if_else(col('x') < 2, 1, 2))

lag(x, n: int = 1, default=None)[source]¶

Get lagging values

Parameters:

x (Expr, Series) – Column to operate on
n (int) – Number of positions to lag by
default (optional) – Value to fill in missing values

Examples

>>> df.mutate(lag_x = tp.lag(col('x')))
>>> df.mutate(lag_x = tp.lag('x'))

lead(x, n: int = 1, default=None)[source]¶

Get leading values

Parameters:

x (Expr, Series) – Column to operate on
n (int) – Number of positions to lead by
default (optional) – Value to fill in missing values

Examples

>>> df.mutate(lead_x = tp.lead(col('x')))
>>> df.mutate(lead_x = tp.lead('x'))
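`lag` and `lead` shift a column by `n` positions in opposite directions, and the positions left empty by the shift receive `default`. The sketch below shows that behaviour on plain Python lists; it is an illustration only, and the helper names are hypothetical.

```python
def lag_sketch(values, n=1, default=None):
    """Shift values down by n positions; the first n slots get `default`."""
    n = min(n, len(values))
    return [default] * n + values[:len(values) - n]

def lead_sketch(values, n=1, default=None):
    """Shift values up by n positions; the last n slots get `default`."""
    n = min(n, len(values))
    return values[n:] + [default] * n

x = [1, 2, 3]
print(lag_sketch(x))   # [None, 1, 2]
print(lead_sketch(x))  # [2, 3, None]
```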

log(x)[source]¶

Compute the natural logarithm of a column

Parameters:: x (Expr) – Column to operate on

Examples

>>> df.mutate(log = tp.log('x'))

log10(x)[source]¶

Compute the base 10 logarithm of a column

Parameters:: x (Expr) – Column to operate on

Examples

>>> df.mutate(log = tp.log10('x'))

read_csv(file: str, *args, **kwargs)[source]¶: Simple wrapper around polars.read_csv

read_parquet(source: str, *args, **kwargs)[source]¶: Simple wrapper around polars.read_parquet

rep(x, times=1)[source]¶

Replicate the values in x

Parameters:

x (const, Series) – Value or Series to repeat
times (int) – Number of times to repeat

Examples

>>> tp.rep(1, 3)
>>> tp.rep(pl.Series(range(3)), 3)

replace_null(x, replace=None)[source]¶

Replace null values

Parameters:

x (Expr, Series) – Column to operate on
replace – Value to replace null values with

Examples

>>> df = tp.Tibble(x = [0, None], y = [None, None])
>>> df.mutate(x = tp.replace_null(col('x'), 1))

round(x, decimals=0)[source]¶

Round numbers to a given number of decimals

Parameters:

x (Expr, Series) – Column to operate on
decimals (int) – Decimals to round to

Examples

>>> df.mutate(x = tp.round(col('x')))

row_number()[source]¶

Return row number

Examples

>>> df.mutate(row_num = tp.row_number())

sqrt(x)[source]¶

Get column square root

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(sqrt_x = tp.sqrt('x'))

cor(x, y, method='pearson')[source]¶

Find the correlation of two columns

Parameters:

x (Expr) – A column
y (Expr) – A column
method (str) – Type of correlation to find. Either ‘pearson’ or ‘spearman’.

Examples

>>> df.summarize(cor = tp.cor(col('x'), col('y')))
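The two `method` values measure different things: Pearson measures linear association on the raw values, while Spearman is the Pearson correlation of the ranks, so any strictly increasing relationship scores 1. The plain-Python sketch below computes both; it assumes average ranks for ties (the usual textbook convention), and polars' exact tie handling may differ.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]  # monotonic but not linear in x
print(pearson(x, y), spearman(x, y))
```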

cov(x, y)[source]¶

Find the covariance of two columns

Parameters:

x (Expr) – A column
y (Expr) – A column

Examples

>>> df.summarize(cov = tp.cov(col('x'), col('y')))

count(x)[source]¶

Number of observations in each group

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(count = tp.count(col('x')))

first(x)[source]¶

Get first value

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(first_x = tp.first('x'))
>>> df.summarize(first_x = tp.first(col('x')))

last(x)[source]¶

Get last value

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(last_x = tp.last('x'))
>>> df.summarize(last_x = tp.last(col('x')))

length(x)[source]¶

Number of observations in each group

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(length = tp.length(col('x')))

max(x)[source]¶

Get column max

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(max_x = tp.max('x'))
>>> df.summarize(max_x = tp.max(col('x')))

mean(x)[source]¶

Get column mean

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(mean_x = tp.mean('x'))
>>> df.summarize(mean_x = tp.mean(col('x')))

median(x)[source]¶

Get column median

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(median_x = tp.median('x'))
>>> df.summarize(median_x = tp.median(col('x')))

min(x)[source]¶

Get column minimum

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(min_x = tp.min('x'))
>>> df.summarize(min_x = tp.min(col('x')))

n()[source]¶

Number of observations in each group

Examples

>>> df.summarize(count = tp.n())

n_distinct(x)[source]¶

Get number of distinct values in a column

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(n_distinct_x = tp.n_distinct('x'))
>>> df.summarize(n_distinct_x = tp.n_distinct(col('x')))

quantile(x, quantile=0.5)[source]¶

Get column quantile

Parameters:

x (Expr, Series) – Column to operate on
quantile (float) – Quantile to return, between 0 and 1

Examples

>>> df.summarize(quantile_x = tp.quantile('x', .25))

sd(x)[source]¶

Get column standard deviation

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(sd_x = tp.sd('x'))
>>> df.summarize(sd_x = tp.sd(col('x')))

sum(x)[source]¶

Get column sum

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(sum_x = tp.sum('x'))
>>> df.summarize(sum_x = tp.sum(col('x')))

var(x)[source]¶

Get column variance

Parameters:: x (Expr) – Column to operate on

Examples

>>> df.summarize(var_x = tp.var('x'))
>>> df.summarize(var_x = tp.var(col('x')))

between(x, left, right)[source]¶

Test if values of a column are between two values

Parameters:

x (Expr, Series) – Column to operate on
left (int) – Value to test if column is greater than or equal to
right (int) – Value to test if column is less than or equal to

Examples

>>> df = tp.Tibble(x = range(4))
>>> df.filter(tp.between(col('x'), 1, 3))

is_finite(x)[source]¶

Test if values of a column are finite

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df = tp.Tibble(x = [1.0, float('inf')])
>>> df.filter(tp.is_finite(col('x')))

is_in(x, y)[source]¶

Test if values of a column are in a list of values

Parameters:

x (Expr, Series) – Column to operate on
y (list) – List to test against

Examples

>>> df = tp.Tibble(x = range(3))
>>> df.filter(tp.is_in(col('x'), [1, 2]))

is_infinite(x)[source]¶

Test if values of a column are infinite

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df = tp.Tibble(x = [1.0, float('inf')])
>>> df.filter(tp.is_infinite(col('x')))

is_nan(x)[source]¶

Test if values of a column are NaN

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df = tp.Tibble(x = [1.0, float('nan')])
>>> df.filter(tp.is_nan(col('x')))

is_not(x)[source]¶

Flip values of a boolean series

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df = tp.Tibble(x = range(3))
>>> df.filter(tp.is_not(col('x') < 2))

is_not_in(x, y)[source]¶

Test if values of a column are not in a list of values

Parameters:

x (Expr, Series) – Column to operate on
y (list) – List to test against

Examples

>>> df = tp.Tibble(x = range(3))
>>> df.filter(tp.is_not_in(col('x'), [1, 2]))

is_not_null(x)[source]¶

Test if values of a column are not null

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df = tp.Tibble(x = [1, None, 3])
>>> df.filter(tp.is_not_null(col('x')))

is_null(x)[source]¶

Test if values of a column are null

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df = tp.Tibble(x = [1, None, 3])
>>> df.filter(tp.is_null(col('x')))

as_boolean(x)[source]¶

Convert to a boolean

Parameters:: x (Expr) – Column to operate on

Examples

>>> df.mutate(bool_x = tp.as_boolean(col('x')))

as_float(x)[source]¶

Convert to float. Defaults to Float64.

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(float_x = tp.as_float(col('x')))

as_integer(x)[source]¶

Convert to integer. Defaults to Int64.

Parameters:: x (Expr) – Column to operate on

Examples

>>> df.mutate(int_x = tp.as_integer(col('x')))

as_string(x)[source]¶

Convert to string. Defaults to Utf8.

Parameters:: x (Expr) – Column to operate on

Examples

>>> df.mutate(string_x = tp.as_string(col('x')))

cast(x, dtype)[source]¶

General type conversion.

Parameters:

x (Expr, Series) – Column to operate on
dtype (DataType) – Type to convert to

Examples

>>> df.mutate(float_x = tp.cast(col('x'), tp.Float64))

as_date(x, fmt=None)[source]¶

Convert a string to a Date

Parameters:

x (Expr, Series) – Column to operate on
fmt (str) – Format of the input strings, e.g. "%Y-%m-%d" for “yyyy-mm-dd”

Examples

>>> df = tp.Tibble(x = ['2021-01-01', '2021-10-01'])
>>> df.mutate(date_x = tp.as_date(col('x')))

as_datetime(x, fmt=None)[source]¶

Convert a string to a Datetime

Parameters:

x (Expr, Series) – Column to operate on
fmt (str) – Format of the input strings, e.g. "%Y-%m-%d" for “yyyy-mm-dd”

Examples

>>> df = tp.Tibble(x = ['2021-01-01', '2021-10-01'])
>>> df.mutate(date_x = tp.as_datetime(col('x')))

hour(x)[source]¶

Extract the hour from a datetime

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(hour = tp.hour(col('x')))

make_date(year=1970, month=1, day=1)[source]¶

Create a date object

Parameters:

year (Expr, str, int) – Column or literal
month (Expr, str, int) – Column or literal
day (Expr, str, int) – Column or literal

Examples

>>> df.mutate(date = tp.make_date(2000, 1, 1))

make_datetime(year=1970, month=1, day=1, hour=0, minute=0, second=0)[source]¶

Create a datetime object

Parameters:

year (Expr, str, int) – Column or literal
month (Expr, str, int) – Column or literal
day (Expr, str, int) – Column or literal
hour (Expr, str, int) – Column or literal
minute (Expr, str, int) – Column or literal
second (Expr, str, int) – Column or literal

Examples

>>> df.mutate(date = tp.make_datetime(2000, 1, 1))

mday(x)[source]¶

Extract the month day from a date from 1 to 31.

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(monthday = tp.mday(col('x')))

minute(x)[source]¶

Extract the minute from a datetime

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(minute = tp.minute(col('x')))

month(x)[source]¶

Extract the month from a date

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(month = tp.month(col('x')))

quarter(x)[source]¶

Extract the quarter from a date

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(quarter = tp.quarter(col('x')))

dt_round(x, rule, n)[source]¶

Round the datetime

Parameters:

x (Expr, Series) – Column to operate on
rule (str) –
Units of the downscaling operation. Any of:
- “month”
- “week”
- “day”
- “hour”
- “minute”
- “second”
n (int) – Number of units (e.g. 5 “day”, 15 “minute”).

Examples

>>> df.mutate(rounded = tp.dt_round(col('x'), 'minute', 15))
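Rounding to `n` units means snapping each timestamp to the nearest boundary of a grid whose spacing is `n` units. The plain-Python sketch below assumes the grid is anchored at the Unix epoch, supports only fixed-length units (not “month” or “week”), and breaks ties upward; these are assumptions of the sketch, and the library's own boundary and tie rules may differ.

```python
from datetime import datetime, timedelta

UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}
EPOCH = datetime(1970, 1, 1)

def dt_round_sketch(ts: datetime, rule: str, n: int) -> datetime:
    """Round a naive datetime to the nearest multiple of n `rule` units since the epoch."""
    step = UNIT_SECONDS[rule] * n
    elapsed = (ts - EPOCH).total_seconds()
    buckets = int(elapsed // step)            # floor to the start of the bucket
    if elapsed - buckets * step >= step / 2:  # past the midpoint: round up (ties up)
        buckets += 1
    return EPOCH + timedelta(seconds=buckets * step)

print(dt_round_sketch(datetime(2021, 1, 1, 10, 8), "minute", 15))  # 2021-01-01 10:15:00
```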

second(x)[source]¶

Extract the second from a datetime

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(second = tp.second(col('x')))

wday(x)[source]¶

Extract the weekday from a date, from Sunday = 1 to Saturday = 7.

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(weekday = tp.wday(col('x')))
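This Sunday-first numbering follows R's lubridate convention rather than ISO 8601, where Monday is 1 and Sunday is 7. The conversion between the two is a single modular shift, shown here with Python's standard `date.isoweekday()` (the helper name is hypothetical):

```python
from datetime import date

def wday_sketch(d: date) -> int:
    """Weekday numbered Sunday = 1 ... Saturday = 7."""
    # isoweekday() is Monday = 1 ... Sunday = 7; shift so that Sunday becomes 1
    return d.isoweekday() % 7 + 1

print(wday_sketch(date(2021, 1, 3)))  # 1 (2021-01-03 was a Sunday)
```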

week(x)[source]¶

Extract the week from a date

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(week = tp.week(col('x')))

yday(x)[source]¶

Extract the year day from a date from 1 to 366.

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(yearday = tp.yday(col('x')))

year(x)[source]¶

Extract the year from a date

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(year = tp.year(col('x')))

col[source]¶

exclude[source]¶

lit[source]¶

Expr[source]¶

Series[source]¶

Int8[source]¶

Int16[source]¶

Int32[source]¶

Int64[source]¶

UInt8[source]¶

UInt16[source]¶

UInt32[source]¶

UInt64[source]¶

Float32[source]¶

Float64[source]¶

Boolean[source]¶

Utf8[source]¶

List[source]¶

Date[source]¶

Datetime[source]¶

Object[source]¶

paste(*args, sep=' ')[source]¶

Concatenate strings together

Parameters:: args (Expr, str) – Columns and/or strings to concatenate

Examples

>>> df = tp.Tibble(x = ['a', 'b', 'c'])
>>> df.mutate(x_end = tp.paste(col('x'), 'end', sep = '_'))

paste0(*args)[source]¶

Concatenate strings together with no separator

Parameters:: args (Expr, str) – Columns and/or strings to concatenate

Examples

>>> df = tp.Tibble(x = ['a', 'b', 'c'])
>>> df.mutate(xend = tp.paste0(col('x'), 'end'))

str_c(*args, sep='')[source]¶

Concatenate strings together

Parameters:: args (Expr, str) – Columns and/or strings to concatenate

Examples

>>> df = tp.Tibble(x = ['a', 'b', 'c'])
>>> df.mutate(x_end = tp.str_c(col('x'), 'end', sep = '_'))
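`paste`, `paste0` and `str_c` all concatenate element-wise; per the signatures above they differ only in the default separator (`' '` for `paste`, none for `paste0`, `''` for `str_c`). In the examples a plain string such as `'end'` is combined with every row of the column. A plain-Python sketch of that recycling behaviour (hypothetical helper, not the library code):

```python
def paste_sketch(*args, sep=" "):
    """Element-wise concatenation; plain strings are recycled across every row."""
    length = max((len(a) for a in args if isinstance(a, list)), default=1)
    columns = [a if isinstance(a, list) else [a] * length for a in args]
    return [sep.join(parts) for parts in zip(*columns)]

x = ["a", "b", "c"]
print(paste_sketch(x, "end", sep="_"))  # ['a_end', 'b_end', 'c_end']
print(paste_sketch(x, "end", sep=""))   # ['aend', 'bend', 'cend'], like paste0 / str_c
```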

str_detect(string, pattern, negate=False)[source]¶

Detect the presence or absence of a pattern in a string

Parameters:

string (str) – Input series to operate on
pattern (str) – Pattern to look for
negate (bool) – If True, return non-matching elements

Examples

>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = tp.str_detect('name', 'a'))
>>> df.mutate(x = tp.str_detect('name', ['a', 'e']))

str_extract(string, pattern)[source]¶

Extract the target capture group from provided patterns

Parameters:

string (str) – Input series to operate on
pattern (str) – Pattern to look for

Examples

>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = tp.str_extract(col('name'), 'e'))

str_length(string)[source]¶

Length of a string

Parameters:: string (str) – Input series to operate on

Examples

>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = tp.str_length(col('name')))

str_remove_all(string, pattern)[source]¶

Removes all matched patterns in a string

Parameters:

string (str) – Input series to operate on
pattern (str) – Pattern to look for

Examples

>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = tp.str_remove_all(col('name'), 'a'))

str_remove(string, pattern)[source]¶

Removes the first match of a pattern in a string

Parameters:

string (str) – Input series to operate on
pattern (str) – Pattern to look for

Examples

>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = tp.str_remove(col('name'), 'a'))

str_replace_all(string, pattern, replacement)[source]¶

Replaces all matched patterns in a string

Parameters:

string (str) – Input series to operate on
pattern (str) – Pattern to look for
replacement (str) – String that replaces anything that matches the pattern

Examples

>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = tp.str_replace_all(col('name'), 'a', 'A'))

str_replace(string, pattern, replacement)[source]¶

Replaces the first match of a pattern in a string

Parameters:

string (str) – Input series to operate on
pattern (str) – Pattern to look for
replacement (str) – String that replaces anything that matches the pattern

Examples

>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = tp.str_replace(col('name'), 'a', 'A'))

str_ends(string, pattern, negate=False)[source]¶

Detect the presence or absence of a pattern at the end of a string.

Parameters:

string (Expr) – Column to operate on
pattern (str) – Pattern to look for
negate (bool) – If True, return non-matching elements

Examples

>>> df = tp.Tibble(words = ['apple', 'bear', 'amazing'])
>>> df.filter(tp.str_ends(col('words'), 'ing'))

str_starts(string, pattern, negate=False)[source]¶

Detect the presence or absence of a pattern at the beginning of a string.

Parameters:

string (Expr) – Column to operate on
pattern (str) – Pattern to look for
negate (bool) – If True, return non-matching elements

Examples

>>> df = tp.Tibble(words = ['apple', 'bear', 'amazing'])
>>> df.filter(tp.str_starts(col('words'), 'a'))

str_sub(string, start=0, end=None)[source]¶

Extract portion of string based on start and end inputs

Parameters:

string (str) – Input series to operate on
start (int) – First position of the character to return
end (int) – Last position of the character to return

Examples

>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = tp.str_sub(col('name'), 0, 3))

str_to_lower(string)[source]¶

Convert a string to lowercase

Parameters:: string (str) – Convert case of this string

Examples

>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = tp.str_to_lower(col('name')))

str_to_upper(string)[source]¶

Convert a string to uppercase

Parameters:: string (str) – Convert case of this string

Examples

>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = tp.str_to_upper(col('name')))

str_trim(string, side='both')[source]¶

Trim whitespace

Parameters:

string (Expr, Series) – Column or series to operate on
side (str) –
One of:
- “both”
- “left”
- “right”

Examples

>>> df = tp.Tibble(x = [' a ', ' b ', ' c '])
>>> df.mutate(x = tp.str_trim(col('x')))

class Tibble(_data=None, **kwargs)[source]¶

Bases: tidypolars.reexports.pl.DataFrame

A data frame object that provides methods familiar to R tidyverse users.

property names¶

Get column names

Examples

>>> df.names

property ncol¶

Get number of columns

Examples

>>> df.ncol

property nrow¶

Get number of rows

Examples

>>> df.nrow

__repr__()[source]¶: Printing method

_repr_html_()[source]¶

Printing method for jupyter

Output rows and columns can be modified by setting the following environment variables:

POLARS_FMT_MAX_COLS: set the number of columns
POLARS_FMT_MAX_ROWS: set the number of rows

__copy__()[source]¶

__str__()[source]¶: Printing method

__getattribute__(attr)[source]¶: Return getattr(self, name).

__dir__()[source]¶: Default dir() implementation.

arrange(*args)[source]¶

Arrange/sort rows

Parameters:: *args (str) – Columns to sort by

Examples

>>> df = tp.Tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
>>> # Arrange in ascending order
>>> df.arrange('x', 'y')
...
>>> # Arrange some columns descending
>>> df.arrange(tp.desc('x'), 'y')

bind_cols(*args)[source]¶

Bind data frames by columns

Parameters:: *args (Tibble, list) – Data frames to bind by column

Examples

>>> df1 = tp.Tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
>>> df2 = tp.Tibble({'a': ['c', 'c', 'c'], 'b': range(4, 7)})
>>> df1.bind_cols(df2)

bind_rows(*args)[source]¶

Bind data frames by row

Parameters:: *args (Tibble, list) – Data frames to bind by row

Examples

>>> df1 = tp.Tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
>>> df2 = tp.Tibble({'x': ['c', 'c', 'c'], 'y': range(4, 7)})
>>> df1.bind_rows(df2)

clone()[source]¶: Very cheap deep clone

count(*args, sort=False, name='n')[source]¶

Returns row counts of the dataset. If column names are provided, count() returns counts by group.

Parameters:

*args (str, Expr) – Columns to group by
sort (bool) – If True, sort the output rows in descending order of count
name (str) – The name of the new column in the output. If omitted, it will default to “n”.

Examples

>>> df = tp.Tibble({'a': range(3), 'b': ['a', 'a', 'b']})
>>> df.count()
>>> df.count('b')
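Conceptually, `count` groups rows by the given columns and reports how many rows fall into each group, in a column named by `name`; with no columns there is a single group holding every row. The plain-Python sketch below mirrors that on a list of dicts; it keeps groups in order of first appearance unless `sort=True`, which is a choice of the sketch rather than documented tidypolars ordering. The helper name is hypothetical.

```python
from collections import Counter

def count_sketch(rows, *by, sort=False, name="n"):
    """Count rows per combination of the `by` columns (all rows if none are given)."""
    counts = Counter(tuple(row[col] for col in by) for row in rows)
    out = [dict(zip(by, key), **{name: total}) for key, total in counts.items()]
    if sort:
        out.sort(key=lambda r: r[name], reverse=True)  # largest groups first
    return out

rows = [{"a": 0, "b": "a"}, {"a": 1, "b": "a"}, {"a": 2, "b": "b"}]
print(count_sketch(rows))       # [{'n': 3}]
print(count_sketch(rows, "b"))  # [{'b': 'a', 'n': 2}, {'b': 'b', 'n': 1}]
```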

distinct(*args)[source]¶

Select distinct/unique rows

Parameters:: *args (str, Expr) – Columns to find distinct/unique rows

Examples

>>> df = tp.Tibble({'a': range(3), 'b': ['a', 'a', 'b']})
>>> df.distinct()
>>> df.distinct('b')

drop(*args)[source]¶

Drop unwanted columns

Parameters:: *args (str) – Columns to drop

Examples

>>> df.drop('x', 'y')

drop_null(*args)[source]¶

Drop rows containing missing values

Parameters:: *args (str) – Columns to drop nulls from (defaults to all)

Examples

>>> df = tp.Tibble(x = [1, None, 3], y = [None, 'b', 'c'], z = range(3))
>>> df.drop_null()
>>> df.drop_null('x', 'y')

head(n=5, *, by=None)[source]¶: Alias for .slice_head()

fill(*args, direction='down', by=None)[source]¶

Fill in missing values with previous or next value

Parameters:

*args (str) – Columns to fill
direction (str) – Direction to fill. One of [‘down’, ‘up’, ‘downup’, ‘updown’]
by (str, list) – Columns to group by

Examples

>>> df = tp.Tibble({'a': [1, None, 3, 4, 5],
...                 'b': [None, 2, None, None, 5],
...                 'groups': ['a', 'a', 'a', 'b', 'b']})
>>> df.fill('a', 'b')
>>> df.fill('a', 'b', by = 'groups')
>>> df.fill('a', 'b', direction = 'downup')
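The four `direction` values can be read as compositions of two basic passes, mirroring tidyr's fill(): “down” carries the last non-null value forward, “up” carries the next non-null value backward, and “downup”/“updown” run one pass and then the other to cover leading or trailing gaps. A plain-Python sketch of one column (hypothetical helpers, no grouping):

```python
def fill_down(values):
    """Replace each None with the most recent non-None value above it."""
    out, last = [], None
    for v in values:
        last = v if v is not None else last
        out.append(last)
    return out

def fill_up(values):
    """Replace each None with the next non-None value below it."""
    return fill_down(values[::-1])[::-1]

def fill_sketch(values, direction="down"):
    if direction == "down":
        return fill_down(values)
    if direction == "up":
        return fill_up(values)
    if direction == "downup":
        return fill_up(fill_down(values))
    if direction == "updown":
        return fill_down(fill_up(values))
    raise ValueError(f"unknown direction: {direction!r}")

b = [None, 2, None, None, 5]
print(fill_sketch(b, "down"))    # [None, 2, 2, 2, 5]
print(fill_sketch(b, "downup"))  # [2, 2, 2, 2, 5]
```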

filter(*args, by=None)[source]¶

Filter rows on one or more conditions

Parameters:

*args (Expr) – Conditions to filter by
by (str, list) – Columns to group by

Examples

>>> df = tp.Tibble({'a': range(3), 'b': ['a', 'a', 'b']})
>>> df.filter(col('a') < 2, col('b') == 'a')
>>> df.filter((col('a') < 2) & (col('b') == 'a'))
>>> df.filter(col('a') <= tp.mean(col('a')), by = 'b')
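With `by`, aggregates inside the condition are computed per group, so the last example keeps each row whose `a` is at most the mean of `a` within its own `b` group (group "a" has mean 0.5, group "b" has mean 2). A plain-Python sketch of that grouped evaluation; the helper name is hypothetical, and the sketch preserves the input row order:

```python
from collections import defaultdict
from statistics import mean

def grouped_filter_sketch(rows, by, keep):
    """Keep rows for which keep(row, rows_in_same_group) is True."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)
    return [row for row in rows if keep(row, groups[row[by]])]

rows = [{"a": 0, "b": "a"}, {"a": 1, "b": "a"}, {"a": 2, "b": "b"}]
kept = grouped_filter_sketch(rows, "b", lambda row, grp: row["a"] <= mean(r["a"] for r in grp))
print(kept)  # [{'a': 0, 'b': 'a'}, {'a': 2, 'b': 'b'}]
```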

frame_equal(other, null_equal=True)[source]¶: Check if two Tibbles are equal

inner_join(df, left_on=None, right_on=None, on=None, suffix='_right')[source]¶

Perform an inner join

Parameters:

df (Tibble) – Data frame to join with.
left_on (str, list) – Join column(s) of the left DataFrame.
right_on (str, list) – Join column(s) of the right DataFrame.
on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.
suffix (str) – Suffix to append to columns with a duplicate name.

Examples

>>> df1.inner_join(df2)
>>> df1.inner_join(df2, on = 'x')
>>> df1.inner_join(df2, left_on = 'left_x', right_on = 'x')

left_join(df, left_on=None, right_on=None, on=None, suffix='_right')[source]¶

Perform a left join

Parameters:

df (Tibble) – Data frame to join with.
left_on (str, list) – Join column(s) of the left DataFrame.
right_on (str, list) – Join column(s) of the right DataFrame.
on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.
suffix (str) – Suffix to append to columns with a duplicate name.

Examples

>>> df1.left_join(df2)
>>> df1.left_join(df2, on = 'x')
>>> df1.left_join(df2, left_on = 'left_x', right_on = 'x')

mutate(*args, by=None, **kwargs)[source]¶

Add or modify columns

Parameters:

*args (Expr) – Column expressions to add or modify
by (str, list) – Columns to group by
**kwargs (Expr) – Column expressions to add or modify

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.mutate(double_a = col('a') * 2,
...           a_plus_b = col('a') + col('b'))
>>> df.mutate(row_num = tp.row_number(), by = 'c')

full_join(df, left_on=None, right_on=None, on=None, suffix: str = '_right')[source]¶

Perform a full join

Parameters:

df (Tibble) – Data frame to join with.
left_on (str, list) – Join column(s) of the left DataFrame.
right_on (str, list) – Join column(s) of the right DataFrame.
on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.
suffix (str) – Suffix to append to columns with a duplicate name.

Examples

>>> df1.full_join(df2)
>>> df1.full_join(df2, on = 'x')
>>> df1.full_join(df2, left_on = 'left_x', right_on = 'x')

pivot_longer(cols=everything(), names_to='name', values_to='value')[source]¶

Pivot data from wide to long

Parameters:

cols (Expr) – List of the columns to pivot. Defaults to all columns.
names_to (str) – Name of the new “names” column.
values_to (str) – Name of the new “values” column

Examples

>>> df = tp.Tibble({'id': ['id1', 'id2'], 'a': [1, 2], 'b': [1, 2]})
>>> df.pivot_longer(cols = ['a', 'b'])
>>> df.pivot_longer(cols = ['a', 'b'], names_to = 'stuff', values_to = 'things')
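Pivoting longer turns every selected cell into its own row: the non-pivoted columns (here `id`) are repeated, the source column name goes into `names_to`, and the cell goes into `values_to`. The plain-Python sketch below stacks one pivoted column at a time; it illustrates the reshaping only and makes no claim about the row order tidypolars returns. The helper name is hypothetical.

```python
def pivot_longer_sketch(rows, cols, names_to="name", values_to="value"):
    """Stack the selected columns into (name, value) rows, repeating the other columns."""
    out = []
    for c in cols:
        for row in rows:
            ids = {k: v for k, v in row.items() if k not in cols}
            out.append({**ids, names_to: c, values_to: row[c]})
    return out

rows = [{"id": "id1", "a": 1, "b": 1}, {"id": "id2", "a": 2, "b": 2}]
long = pivot_longer_sketch(rows, ["a", "b"])
print(len(long))  # 4: two ids times two pivoted columns
```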

pivot_wider(names_from='name', values_from='value', id_cols=None, values_fn='first', values_fill=None)[source]¶

Pivot data from long to wide

Parameters:

names_from (str) – Column to get the new column names from.
values_from (str) – Column to get the new column values from
id_cols (str, list) – A set of columns that uniquely identifies each observation. Defaults to all columns in the data table except for the columns specified in names_from and values_from.
values_fn (str) – Function for how multiple entries per group should be dealt with. Any of ‘first’, ‘count’, ‘sum’, ‘max’, ‘min’, ‘mean’, ‘median’, ‘last’
values_fill (str) – If values are missing/null, what value should be filled in. Can use: “backward”, “forward”, “mean”, “min”, “max”, “zero”, “one”

Examples

>>> df = tp.Tibble({'id': [1, 1], 'variable': ['a', 'b'], 'value': [1, 2]})
>>> df.pivot_wider(names_from = 'variable', values_from = 'value')
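Pivoting wider is the inverse: rows sharing the same `id_cols` values collapse into one row, each distinct `names_from` value becomes a column, and `values_fn` decides what happens when a cell receives several values. The plain-Python sketch below implements only the default `values_fn='first'` and leaves absent cells out rather than applying `values_fill`; the helper name is hypothetical.

```python
def pivot_wider_sketch(rows, names_from="name", values_from="value", id_cols=None):
    """One output row per id; duplicate cells keep their first value."""
    if id_cols is None:  # default: every column except names_from and values_from
        id_cols = [k for k in rows[0] if k not in (names_from, values_from)]
    wide = {}
    for row in rows:
        key = tuple(row[c] for c in id_cols)
        record = wide.setdefault(key, dict(zip(id_cols, key)))
        record.setdefault(row[names_from], row[values_from])  # first value wins
    return list(wide.values())

rows = [{"id": 1, "variable": "a", "value": 1}, {"id": 1, "variable": "b", "value": 2}]
print(pivot_wider_sketch(rows, "variable", "value"))  # [{'id': 1, 'a': 1, 'b': 2}]
```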

pull(var=None)[source]¶

Extract a column as a series

Parameters:: var (str) – Name of the column to extract. Defaults to the last column.

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3)})
>>> df.pull('a')

relocate(*args, before=None, after=None)[source]¶

Move a column or columns to a new position

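Parameters:

*args (str, Expr) – Columns to move
before (str) – Column to place the moved columns before
after (str) – Column to place the moved columns after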

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.relocate('a', before = 'c')
>>> df.relocate('b', after = 'c')

rename(*args, **kwargs)[source]¶

Rename columns

Parameters:

*args (dict) – Dictionary mapping old column names to new names
**kwargs (str) – Pairs of the form new_name = 'old_name'

Examples

>>> df = tp.Tibble({'x': range(3), 't': range(3), 'z': ['a', 'a', 'b']})
>>> df.rename(new_x = 'x') # dplyr interface
>>> df.rename({'x': 'new_x'}) # pandas interface

replace_null(replace=None)[source]¶

Replace null values

Parameters:: replace (dict) – Dictionary of column/replacement pairs

Examples

>>> df = tp.Tibble(x = [0, None], y = [None, None])
>>> df.replace_null(dict(x = 1, y = 2))

separate(sep_col, into, sep='_', remove=True)[source]¶

Separate a character column into multiple columns

Parameters:

sep_col (str) – Column to split into multiple columns
into (list) – List of new column names
sep (str) – Separator to split on. Defaults to ‘_’
remove (bool) – If True removes the input column from the output data frame

Examples

>>> df = tp.Tibble(x = ['a_a', 'b_b', 'c_c'])
>>> df.separate('x', into = ['left', 'right'])
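Separating splits each string on `sep` and assigns the pieces, in order, to the names listed in `into`; with `remove=True` the source column is dropped. The plain-Python sketch below shows that mapping; padding missing pieces with `None` and dropping surplus pieces are choices of the sketch, not documented tidypolars behaviour, and the helper name is hypothetical.

```python
def separate_sketch(rows, sep_col, into, sep="_", remove=True):
    """Split row[sep_col] on sep into the columns named in `into`."""
    out = []
    for row in rows:
        parts = row[sep_col].split(sep)
        new = {k: v for k, v in row.items() if not (remove and k == sep_col)}
        for i, name in enumerate(into):
            new[name] = parts[i] if i < len(parts) else None  # pad missing pieces
        out.append(new)
    return out

rows = [{"x": "a_a"}, {"x": "b_b"}, {"x": "c_c"}]
print(separate_sketch(rows, "x", ["left", "right"])[0])  # {'left': 'a', 'right': 'a'}
```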

set_names(nm=None)[source]¶

Change the column names of the data frame

Parameters:: nm (list) – A list of new names for the data frame

Examples

>>> df = tp.Tibble(x = range(3), y = range(3))
>>> df.set_names(['a', 'b'])

select(*args)

Select or drop columns

Parameters:

*args (str, Expr) – Columns to select

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.select('a', 'b')
>>> df.select(col('a'), col('b'))

slice(*args, by=None)

Grab rows from a data frame

Parameters:

*args (int, list) – Rows to grab
by (str, list) – Columns to group by

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.slice(0, 1)
>>> df.slice(0, by = 'c')
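Grouped slicing counts row positions within each group rather than across the whole frame. The hypothetical `slice_sketch` below shows that rule on a dict of lists; it keeps rows in their original order, whereas the real method's output order may differ.

```python
def slice_sketch(data, *rows, by=None):
    """Keep 0-based row positions, counted within each group of `by` if given."""
    nrows = len(next(iter(data.values())))
    if by is None:
        keep = [i for i in rows if 0 <= i < nrows]
    else:
        seen = {}  # group value -> rows of that group seen so far
        keep = []
        for i in range(nrows):
            pos = seen.get(data[by][i], 0)
            seen[data[by][i]] = pos + 1
            if pos in rows:
                keep.append(i)
    return {name: [values[i] for i in keep] for name, values in data.items()}

df = {'a': [0, 1, 2], 'b': [0, 1, 2], 'c': ['a', 'a', 'b']}
print(slice_sketch(df, 0, 1))         # {'a': [0, 1], 'b': [0, 1], 'c': ['a', 'a']}
print(slice_sketch(df, 0, by='c'))    # {'a': [0, 2], 'b': [0, 2], 'c': ['a', 'b']}
```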

slice_head(n=5, *, by=None)

Grab top rows from a data frame

Parameters:

n (int) – Number of rows to grab
by (str, list) – Columns to group by

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.slice_head(2)
>>> df.slice_head(1, by = 'c')

slice_tail(n=5, *, by=None)

Grab bottom rows from a data frame

Parameters:

n (int) – Number of rows to grab
by (str, list) – Columns to group by

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.slice_tail(2)
>>> df.slice_tail(1, by = 'c')
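`slice_head()` and `slice_tail()` follow the same grouped logic with a count instead of explicit positions. The hypothetical `slice_rows_sketch` covers both through a `from_end` flag; the dict-of-lists representation and the original-order output are assumptions of the sketch.

```python
def slice_rows_sketch(data, n=5, by=None, from_end=False):
    """First n rows (last n if from_end) overall, or within each group of `by`."""
    nrows = len(next(iter(data.values())))
    groups = {}  # group value -> row positions, in original order
    for i in range(nrows):
        key = data[by][i] if by is not None else None
        groups.setdefault(key, []).append(i)
    keep = []
    for positions in groups.values():
        if from_end:
            keep += positions[max(len(positions) - n, 0):]
        else:
            keep += positions[:n]
    keep.sort()  # report rows in their original order
    return {name: [values[i] for i in keep] for name, values in data.items()}

df = {'a': [0, 1, 2], 'b': [0, 1, 2], 'c': ['a', 'a', 'b']}
print(slice_rows_sketch(df, 2))                       # like slice_head(2)
print(slice_rows_sketch(df, 1, by='c', from_end=True))  # like slice_tail(1, by = 'c')
```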

summarise(*args, by=None, **kwargs)

Alias for .summarize()

summarize(*args, by=None, **kwargs)

Aggregate data with summary statistics

Parameters:

*args (Expr) – Summary expressions to compute
by (str, list) – Columns to group by
**kwargs (Expr) – Named summary expressions; each keyword becomes the output column name

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.summarize(avg_a = tp.mean(col('a')))
>>> df.summarize(avg_a = tp.mean(col('a')),
...              by = 'c')
>>> df.summarize(avg_a = tp.mean(col('a')),
...              max_b = tp.max(col('b')))
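A grouped summary collapses each group to one row. The sketch below hard-codes the mean for a single column, names the result `avg_<col>` like the examples, and returns groups in first-seen order; the helper `summarize_mean_sketch` and the dict-of-lists layout are hypothetical, not tidypolars' implementation.

```python
from statistics import mean

def summarize_mean_sketch(data, col, by=None):
    """avg_<col> overall, or one row per distinct value of `by`."""
    groups = {}  # group value (None when ungrouped) -> values of `col`
    for i, value in enumerate(data[col]):
        key = data[by][i] if by is not None else None
        groups.setdefault(key, []).append(value)
    averages = [float(mean(values)) for values in groups.values()]
    if by is None:
        return {f'avg_{col}': averages}
    return {by: list(groups), f'avg_{col}': averages}

df = {'a': [0, 1, 2], 'b': [0, 1, 2], 'c': ['a', 'a', 'b']}
print(summarize_mean_sketch(df, 'a'))          # {'avg_a': [1.0]}
print(summarize_mean_sketch(df, 'a', by='c'))  # {'c': ['a', 'b'], 'avg_a': [0.5, 2.0]}
```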

tail(n=5, *, by=None)

Alias for .slice_tail()

to_dict(as_series=True)

Convert the data frame to a dictionary of columns

Parameters:

as_series (bool) – If True, returns the dict values as Series; if False, returns them as lists

Examples

>>> df.to_dict()
>>> df.to_dict(as_series = False)

to_pandas()

Convert to a pandas DataFrame

Examples

>>> df.to_pandas()

to_polars()

Convert to a polars DataFrame

Examples

>>> df.to_polars()

unite(col='_united', unite_cols=[], sep='_', remove=True)

Unite multiple columns by pasting strings together

Parameters:

col (str) – Name of the new column
unite_cols (list) – List of columns to unite
sep (str) – Separator to use between values
remove (bool) – If True, removes the input columns from the data frame

Examples

>>> df = tp.Tibble(a = ["a", "a", "a"], b = ["b", "b", "b"], c = range(3))
>>> df.unite("united_col", unite_cols = ["a", "b"])
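Uniting is the inverse of `separate()`: values are converted to strings and joined row by row. In this hypothetical sketch the united column is appended at the end; where tidypolars places it is not stated here, so treat that as an assumption.

```python
def unite_sketch(data, col='_united', unite_cols=(), sep='_', remove=True):
    """Paste `unite_cols` row-wise with `sep` into a new column `col`."""
    nrows = len(next(iter(data.values())))
    pasted = [sep.join(str(data[c][i]) for c in unite_cols) for i in range(nrows)]
    out = {name: values for name, values in data.items()
           if not (remove and name in unite_cols)}
    out[col] = pasted  # appended at the end: placement is an assumption
    return out

df = {'a': ['a', 'a', 'a'], 'b': ['b', 'b', 'b'], 'c': [0, 1, 2]}
print(unite_sketch(df, 'united_col', unite_cols=['a', 'b']))
# {'c': [0, 1, 2], 'united_col': ['a_b', 'a_b', 'a_b']}
```

Using an immutable tuple as the default for `unite_cols` sidesteps Python's shared-mutable-default pitfall that a `[]` default invites.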

write_csv(file=None, has_headers=True, sep=',')

Write a data frame to a CSV file

write_parquet(file=str, compression='snappy', use_pyarrow=False, **kwargs)

Write a data frame to a Parquet file

desc(x)

Mark a column to be sorted in descending order

from_pandas(df)

Convert from pandas DataFrame to Tibble

Parameters:

df (DataFrame) – pd.DataFrame to convert to a Tibble

Examples

>>> tp.from_pandas(df)

from_polars(df)

Convert from polars DataFrame to Tibble

Parameters:

df (DataFrame) – pl.DataFrame to convert to a Tibble

Examples

>>> tp.from_polars(df)

contains(match, ignore_case=True)

Select columns whose names contain a literal string

Parameters:

match (str) – String to match against column names
ignore_case (bool) – If True (the default), ignores case when matching names.

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.select(contains('c'))

ends_with(match, ignore_case=True)

Select columns whose names end with a suffix

Parameters:

match (str) – Suffix to match against column names
ignore_case (bool) – If True (the default), ignores case when matching names.

Examples

>>> df = tp.Tibble({'a': range(3), 'b_code': range(3), 'c_code': ['a', 'a', 'b']})
>>> df.select(ends_with('code'))

everything()

Select all columns

Examples

>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.select(everything())

starts_with(match, ignore_case=True)

Select columns whose names start with a prefix

Parameters:

match (str) – Prefix to match against column names
ignore_case (bool) – If True (the default), ignores case when matching names.

Examples

>>> df = tp.Tibble({'a': range(3), 'add': range(3), 'sub': ['a', 'a', 'b']})
>>> df.select(starts_with('a'))
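The three name selectors differ only in where the match must occur. The hypothetical `select_names_sketch` below applies the same rules to a plain list of column names, using `str.lower()` for case-insensitive matching; the real helpers are passed to `select()` rather than returning lists.

```python
def select_names_sketch(columns, match, kind, ignore_case=True):
    """Names picked by a contains/starts_with/ends_with-style rule."""
    def norm(s):
        return s.lower() if ignore_case else s
    m = norm(match)
    rules = {
        'contains': lambda name: m in norm(name),          # literal substring
        'starts_with': lambda name: norm(name).startswith(m),
        'ends_with': lambda name: norm(name).endswith(m),
    }
    return [c for c in columns if rules[kind](c)]

print(select_names_sketch(['a', 'add', 'sub'], 'a', 'starts_with'))       # ['a', 'add']
print(select_names_sketch(['a', 'b_code', 'c_code'], 'CODE', 'ends_with'))  # ['b_code', 'c_code']
```

With `ignore_case=False`, the uppercase `'CODE'` in the second call would match nothing.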

__all__
