tidypolars¶

Submodules¶

Attributes¶

`__version__`
`col`
`exclude`
`lit`
`Expr`
`Series`
`Int8`
`Int16`
`Int32`
`Int64`
`UInt8`
`UInt16`
`UInt32`
`UInt64`
`Float32`
`Float64`
`Boolean`
`Utf8`
`List`
`Date`
`Datetime`
`Object`
`__all__`

Classes¶

tibble

A data frame object that provides methods familiar to R tidyverse users.

Functions¶

`abs`(x)	Absolute value
`across`(cols[, fn, names_prefix])	Apply a function across a selection of columns
`case_when`(*args[, _default])	Case when: return the value paired with the first true condition
`coalesce`(*args)	Coalesce missing values
`floor`(x)	Round numbers down to the lower integer
`if_else`(condition, true, false)	If else: one value where a condition is true, another where it is false
`lag`(x[, n, default])	Get lagging values
`lead`(x[, n, default])	Get leading values
`log`(x)	Compute the natural logarithm of a column
`log10`(x)	Compute the base 10 logarithm of a column
`read_csv`(file, *args, **kwargs)	Simple wrapper around polars.read_csv
`read_parquet`(source, *args, **kwargs)	Simple wrapper around polars.read_parquet
`rep`(x[, times])	Replicate the values in x
`replace_null`(x[, replace])	Replace null values
`round`(x[, decimals])	Round numbers to a given number of decimal places
`row_number`()	Return row number
`sqrt`(x)	Get column square root
`cor`(x, y[, method])	Find the correlation of two columns
`cov`(x, y)	Find the covariance of two columns
`count`(x)	Number of observations in each group
`first`(x)	Get first value
`last`(x)	Get last value
`length`(x)	Number of observations in each group
`max`(x)	Get column max
`mean`(x)	Get column mean
`median`(x)	Get column median
`min`(x)	Get column minimum
`n`()	Number of observations in each group
`n_distinct`(x)	Get number of distinct values in a column
`quantile`(x[, quantile])	Get a quantile of a column
`sd`(x)	Get column standard deviation
`sum`(x)	Get column sum
`var`(x)	Get column variance
`between`(x, left, right)	Test if values of a column are between two values
`is_finite`(x)	Test if values of a column are finite
`is_in`(x, y)	Test if values of a column are in a list of values
`is_infinite`(x)	Test if values of a column are infinite
`is_nan`(x)	Test if values of a column are nan
`is_not`(x)	Flip values of a boolean series
`is_not_in`(x, y)	Test if values of a column are not in a list of values
`is_not_null`(x)	Test if values of a column are not null
`is_null`(x)	Test if values of a column are null
`as_boolean`(x)	Convert to a boolean
`as_float`(x)	Convert to float. Defaults to Float64.
`as_integer`(x)	Convert to integer. Defaults to Int64.
`as_string`(x)	Convert to string. Defaults to Utf8.
`cast`(x, dtype)	General type conversion.
`as_date`(x[, format])	Convert a string to a Date
`as_datetime`(x[, format])	Convert a string to a Datetime
`hour`(x)	Extract the hour from a datetime
`make_date`([year, month, day])	Create a date object
`make_datetime`([year, month, day, hour, minute, second])	Create a datetime object
`mday`(x)	Extract the month day from a date from 1 to 31.
`minute`(x)	Extract the minute from a datetime
`month`(x)	Extract the month from a date
`quarter`(x)	Extract the quarter from a date
`dt_round`(x, rule, n)	Round the datetime
`second`(x)	Extract the second from a datetime
`wday`(x)	Extract the weekday from a date, from Sunday = 1 to Saturday = 7.
`week`(x)	Extract the week from a date
`yday`(x)	Extract the year day from a date from 1 to 366.
`year`(x)	Extract the year from a date
`paste`(*args[, sep])	Concatenate strings together
`paste0`(*args)	Concatenate strings together with no separator
`str_c`(*args[, sep])	Concatenate strings together
`str_detect`(string, pattern[, negate])	Detect the presence or absence of a pattern in a string
`str_extract`(string, pattern)	Extract the target capture group from provided patterns
`str_length`(string)	Length of a string
`str_remove_all`(string, pattern)	Removes all matched patterns in a string
`str_remove`(string, pattern)	Removes the first matched pattern in a string
`str_replace_all`(string, pattern, replacement)	Replaces all matched patterns in a string
`str_replace`(string, pattern, replacement)	Replaces the first matched pattern in a string
`str_ends`(string, pattern[, negate])	Detect the presence or absence of a pattern at the end of a string.
`str_starts`(string, pattern[, negate])	Detect the presence or absence of a pattern at the beginning of a string.
`str_sub`(string[, start, end])	Extract portion of string based on start and end inputs
`str_to_lower`(string)	Convert a string to lowercase
`str_to_upper`(string)	Convert a string to uppercase
`str_trim`(string[, side])	Trim whitespace
`as_tibble`(x)	Convert an object to a tibble
`is_tibble`(x)	Test whether an object is a tibble
`desc`(x)	Mark a column to sort in descending order
`from_pandas`(df)	Convert from pandas DataFrame to tibble
`from_polars`(df)	Convert from polars DataFrame to tibble
`contains`(match[, ignore_case])	Contains a literal string
`ends_with`(match[, ignore_case])	Ends with a suffix
`everything`()	Selects all columns
`starts_with`(match[, ignore_case])	Starts with a prefix
`where`(col_type)	Select columns by type using a string

Package Contents¶

__version__¶

abs(x)[source]¶

Absolute value

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(abs_x = tp.abs('x'))
>>> df.mutate(abs_x = tp.abs(col('x')))

across(cols, fn=lambda x: ..., names_prefix=None)[source]¶

Apply a function across a selection of columns

Parameters:

cols (list, DataType) – Columns to operate on, given as a list of names or as a data type such as tp.Int64
fn (lambda) – A function or lambda to apply to each column
names_prefix (Optional - str) – Prefix to add to the names of the output columns

Examples

>>> df = tp.tibble(x = ['a', 'a', 'b'], y = range(3), z = range(3))
>>> df.mutate(tp.across(['y', 'z'], lambda x: x * 2))
>>> df.mutate(tp.across(tp.Int64, lambda x: x * 2, names_prefix = "double_"))
>>> df.summarize(tp.across(['y', 'z'], tp.mean), by = 'x')
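To make the `names_prefix` behaviour concrete, here is a plain-Python sketch over a dict of lists. `across_sketch` is a hypothetical helper, not part of tidypolars; it applies `fn` to individual values (in tidypolars `fn` receives a column expression) and assumes that, with a prefix, the results are added next to the original columns.

```python
def across_sketch(data, cols, fn, names_prefix=None):
    """Return a copy of data with fn applied to every value of each column in cols.

    With names_prefix, results go into new prefixed columns and the originals
    are kept; without it, the selected columns are overwritten.
    """
    out = dict(data)
    for name in cols:
        new_name = name if names_prefix is None else names_prefix + name
        out[new_name] = [fn(v) for v in data[name]]
    return out


data = {"x": ["a", "a", "b"], "y": [0, 1, 2], "z": [0, 1, 2]}
doubled = across_sketch(data, ["y", "z"], lambda v: v * 2, names_prefix="double_")
print(doubled["double_y"])  # [0, 2, 4]
```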

case_when(*args, _default=pl.Null)[source]¶

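Case when: return the value paired with the first condition that is true

Parameters:

*args (Expr) – Alternating conditions (logical expressions) and the values to return when they are true
_default – Value to return when no condition is true (null by default)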

Examples

>>> df = tp.tibble(x = range(1, 4))
>>> df.mutate(
...    case_x = tp.case_when(col('x') < 2, 1,
...                          col('x') < 3, 2,
...                          _default = 0)
... )
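The conditions are checked in order and the first one that holds decides the value, so a later condition never overrides an earlier match. A minimal plain-Python sketch of that rule, applied to the example above; `case_when_sketch` is a hypothetical helper that takes predicates on single values rather than expressions:

```python
def case_when_sketch(value, *args, _default=None):
    """Return the output paired with the first condition that holds for value.

    args alternates condition, output, condition, output, ...
    """
    for condition, output in zip(args[0::2], args[1::2]):
        if condition(value):
            return output
    return _default


case_x = [
    case_when_sketch(x, lambda v: v < 2, 1, lambda v: v < 3, 2, _default=0)
    for x in range(1, 4)
]
print(case_x)  # [1, 2, 0]
```

It reproduces the example: x = 1 matches the first condition, x = 2 only the second, and x = 3 falls through to `_default`.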

coalesce(*args)[source]¶

Coalesce missing values

Parameters:: args (Expr) – Columns to coalesce

Examples

>>> df.mutate(coalesce_xy = tp.coalesce(col('x'), col('y')))
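Coalescing works row by row: each output value is the first non-missing value among the inputs, or missing if all of them are missing. A plain-Python sketch, with `None` standing in for null and `coalesce_sketch` as a hypothetical helper:

```python
def coalesce_sketch(*columns):
    """Row-wise first non-None value across equally long columns."""
    return [next((v for v in row if v is not None), None) for row in zip(*columns)]


x = [None, 2, None]
y = [10, 20, None]
print(coalesce_sketch(x, y))  # [10, 2, None]
```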

floor(x)[source]¶

Round numbers down to the lower integer

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(floor_x = tp.floor(col('x')))

if_else(condition, true, false)[source]¶

If else: one value where the condition is true, another where it is false

Parameters:

condition (Expr) – A logical expression
true – Value if the condition is true
false – Value if the condition is false

Examples

>>> df = tp.tibble(x = range(1, 4))
>>> df.mutate(if_x = tp.if_else(col('x') < 2, 1, 2))

lag(x, n: int = 1, default=None)[source]¶

Get lagging values

Parameters:

x (Expr, Series) – Column to operate on
n (int) – Number of positions to lag by
default (optional) – Value used to fill the first n positions, which have no lagged value

Examples

>>> df.mutate(lag_x = tp.lag(col('x')))
>>> df.mutate(lag_x = tp.lag('x'))

lead(x, n: int = 1, default=None)[source]¶

Get leading values

Parameters:

x (Expr, Series) – Column to operate on
n (int) – Number of positions to lead by
default (optional) – Value used to fill the last n positions, which have no leading value

Examples

>>> df.mutate(lead_x = tp.lead(col('x')))
>>> df.mutate(lead_x = tp.lead('x'))
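`lag` and `lead` shift values by `n` positions, and the positions left without a value receive `default`. The hypothetical helpers below sketch this on plain Python lists, assuming the output keeps the input length, as a column operation must:

```python
def lag_sketch(values, n=1, default=None):
    """Shift values down by n positions; the first n slots get default."""
    n = min(n, len(values))
    return [default] * n + values[: len(values) - n]


def lead_sketch(values, n=1, default=None):
    """Shift values up by n positions; the last n slots get default."""
    n = min(n, len(values))
    return values[n:] + [default] * n


x = [1, 2, 3]
print(lag_sketch(x))                # [None, 1, 2]
print(lead_sketch(x))               # [2, 3, None]
print(lag_sketch(x, 2, default=0))  # [0, 0, 1]
```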

log(x)[source]¶

Compute the natural logarithm of a column

Parameters:: x (Expr) – Column to operate on

Examples

>>> df.mutate(log = tp.log('x'))

log10(x)[source]¶

Compute the base 10 logarithm of a column

Parameters:: x (Expr) – Column to operate on

Examples

>>> df.mutate(log10_x = tp.log10('x'))

read_csv(file: str, *args, **kwargs)[source]¶

Simple wrapper around polars.read_csv

read_parquet(source: str, *args, **kwargs)[source]¶

Simple wrapper around polars.read_parquet

rep(x, times=1)[source]¶

Replicate the values in x

Parameters:

x (const, Series) – Value or Series to repeat
times (int) – Number of times to repeat

Examples

>>> tp.rep(1, 3)
>>> tp.rep(pl.Series(range(3)), 3)

replace_null(x, replace=None)[source]¶

Replace null values

Parameters:

x (Expr, Series) – Column to operate on
replace – Value to replace nulls with

Examples

>>> df = tp.tibble(x = [0, None], y = [None, None])
>>> df.mutate(x = tp.replace_null(col('x'), 1))

round(x, decimals=0)[source]¶

Round numbers to a given number of decimal places

Parameters:

x (Expr, Series) – Column to operate on
decimals (int) – Decimals to round to

Examples

>>> df.mutate(x = tp.round(col('x')))

row_number()[source]¶

Return row number

Examples

>>> df.mutate(row_num = tp.row_number())

sqrt(x)[source]¶

Get column square root

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(sqrt_x = tp.sqrt('x'))

cor(x, y, method='pearson')[source]¶

Find the correlation of two columns

Parameters:

x (Expr) – A column
y (Expr) – A column
method (str) – Type of correlation to find. Either ‘pearson’ or ‘spearman’.

Examples

>>> df.summarize(cor = tp.cor(col('x'), col('y')))

cov(x, y)[source]¶

Find the covariance of two columns

Parameters:

x (Expr) – A column
y (Expr) – A column

Examples

>>> df.summarize(cov = tp.cov(col('x'), col('y')))

count(x)[source]¶

Number of observations in each group

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(count = tp.count(col('x')))

first(x)[source]¶

Get first value

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(first_x = tp.first('x'))
>>> df.summarize(first_x = tp.first(col('x')))

last(x)[source]¶

Get last value

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(last_x = tp.last('x'))
>>> df.summarize(last_x = tp.last(col('x')))

length(x)[source]¶

Number of observations in each group

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(length = tp.length(col('x')))

max(x)[source]¶

Get column max

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(max_x = tp.max('x'))
>>> df.summarize(max_x = tp.max(col('x')))

mean(x)[source]¶

Get column mean

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(mean_x = tp.mean('x'))
>>> df.summarize(mean_x = tp.mean(col('x')))

median(x)[source]¶

Get column median

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(median_x = tp.median('x'))
>>> df.summarize(median_x = tp.median(col('x')))

min(x)[source]¶

Get column minimum

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(min_x = tp.min('x'))
>>> df.summarize(min_x = tp.min(col('x')))

n()[source]¶

Number of observations in each group

Examples

>>> df.summarize(count = tp.n())

n_distinct(x)[source]¶

Get number of distinct values in a column

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(n_distinct_x = tp.n_distinct('x'))
>>> df.summarize(n_distinct_x = tp.n_distinct(col('x')))

quantile(x, quantile=0.5)[source]¶

Get a quantile of a column

Parameters:

x (Expr, Series) – Column to operate on
quantile (float) – Quantile to return, between 0 and 1

Examples

>>> df.summarize(quantile_x = tp.quantile('x', .25))

sd(x)[source]¶

Get column standard deviation

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(sd_x = tp.sd('x'))
>>> df.summarize(sd_x = tp.sd(col('x')))

sum(x)[source]¶

Get column sum

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(sum_x = tp.sum('x'))
>>> df.summarize(sum_x = tp.sum(col('x')))

var(x)[source]¶

Get column variance

Parameters:: x (Expr) – Column to operate on

Examples

>>> df.summarize(var_x = tp.var('x'))
>>> df.summarize(var_x = tp.var(col('x')))

between(x, left, right)[source]¶

Test if values of a column are between two values

Parameters:

x (Expr, Series) – Column to operate on
left (int) – Value to test if column is greater than or equal to
right (int) – Value to test if column is less than or equal to

Examples

>>> df = tp.tibble(x = range(4))
>>> df.filter(tp.between(col('x'), 1, 3))
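Per the parameter descriptions, both bounds are inclusive, so the example keeps 1, 2 and 3. The same test in plain Python, using the hypothetical helper `between_sketch`:

```python
def between_sketch(values, left, right):
    """Mask that is True where left <= value <= right (both bounds inclusive)."""
    return [left <= v <= right for v in values]


x = list(range(4))
mask = between_sketch(x, 1, 3)
kept = [v for v, keep in zip(x, mask) if keep]
print(kept)  # [1, 2, 3]
```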

is_finite(x)[source]¶

Test if values of a column are finite

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df = tp.tibble(x = [1.0, float('inf')])
>>> df.filter(tp.is_finite(col('x')))

is_in(x, y)[source]¶

Test if values of a column are in a list of values

Parameters:

x (Expr, Series) – Column to operate on
y (list) – List to test against

Examples

>>> df = tp.tibble(x = range(3))
>>> df.filter(tp.is_in(col('x'), [1, 2]))

is_infinite(x)[source]¶

Test if values of a column are infinite

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df = tp.tibble(x = [1.0, float('inf')])
>>> df.filter(tp.is_infinite(col('x')))

is_nan(x)[source]¶

Test if values of a column are nan

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df = tp.tibble(x = [1.0, float('nan')])
>>> df.filter(tp.is_nan(col('x')))
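NaN is a floating-point value, distinct from the null (missing) value tested by `is_null`, and it counts as neither finite nor infinite. Python's `math` module follows the IEEE 754 rules, so the sketch below shows what `is_finite`, `is_infinite` and `is_nan` should return for each kind of float, assuming the tidypolars predicates follow the same rules:

```python
import math

values = [1.0, float("inf"), float("nan")]
finite = [math.isfinite(v) for v in values]   # like tp.is_finite
infinite = [math.isinf(v) for v in values]    # like tp.is_infinite
nan = [math.isnan(v) for v in values]         # like tp.is_nan
print(finite)    # [True, False, False]
print(infinite)  # [False, True, False]
print(nan)       # [False, False, True]
```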

is_not(x)[source]¶

Flip values of a boolean series

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df = tp.tibble(x = range(3))
>>> df.filter(tp.is_not(col('x') < 2))

is_not_in(x, y)[source]¶

Test if values of a column are not in a list of values

Parameters:

x (Expr, Series) – Column to operate on
y (list) – List to test against

Examples

>>> df = tp.tibble(x = range(3))
>>> df.filter(tp.is_not_in(col('x'), [1, 2]))

is_not_null(x)[source]¶

Test if values of a column are not null

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df = tp.tibble(x = [0, None, 2])
>>> df.filter(tp.is_not_null(col('x')))

is_null(x)[source]¶

Test if values of a column are null

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df = tp.tibble(x = [0, None, 2])
>>> df.filter(tp.is_null(col('x')))

as_boolean(x)[source]¶

Convert to a boolean

Parameters:: x (Expr) – Column to operate on

Examples

>>> df.mutate(bool_x = tp.as_boolean(col('x')))

as_float(x)[source]¶

Convert to float. Defaults to Float64.

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(float_x = tp.as_float(col('x')))

as_integer(x)[source]¶

Convert to integer. Defaults to Int64.

Parameters:: x (Expr) – Column to operate on

Examples

>>> df.mutate(int_x = tp.as_integer(col('x')))

as_string(x)[source]¶

Convert to string. Defaults to Utf8.

Parameters:: x (Expr) – Column to operate on

Examples

>>> df.mutate(string_x = tp.as_string(col('x')))

cast(x, dtype)[source]¶

General type conversion.

Parameters:

x (Expr, Series) – Column to operate on
dtype (DataType) – Type to convert to

Examples

>>> df.mutate(float_x = tp.cast(col('x'), tp.Float64))

as_date(x, format=None)[source]¶

Convert a string to a Date

Parameters:

x (Expr, Series) – Column to operate on
format (str) – Format of the input strings, e.g. “yyyy-mm-dd”

Examples

>>> df = tp.tibble(x = ['2021-01-01', '2021-10-01'])
>>> df.mutate(date_x = tp.as_date(col('x')))

as_datetime(x, format=None)[source]¶

Convert a string to a Datetime

Parameters:

x (Expr, Series) – Column to operate on
format (str) – Format of the input strings, e.g. “yyyy-mm-dd”

Examples

>>> df = tp.tibble(x = ['2021-01-01', '2021-10-01'])
>>> df.mutate(datetime_x = tp.as_datetime(col('x')))

hour(x)[source]¶

Extract the hour from a datetime

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(hour = tp.hour(col('x')))

make_date(year=1970, month=1, day=1)[source]¶

Create a date object

Parameters:

year (Expr, str, int) – Column or literal
month (Expr, str, int) – Column or literal
day (Expr, str, int) – Column or literal

Examples

>>> df.mutate(date = tp.make_date(2000, 1, 1))

make_datetime(year=1970, month=1, day=1, hour=0, minute=0, second=0)[source]¶

Create a datetime object

Parameters:

year (Expr, str, int) – Column or literal
month (Expr, str, int) – Column or literal
day (Expr, str, int) – Column or literal
hour (Expr, str, int) – Column or literal
minute (Expr, str, int) – Column or literal
second (Expr, str, int) – Column or literal

Examples

>>> df.mutate(date = tp.make_datetime(2000, 1, 1))

mday(x)[source]¶

Extract the month day from a date from 1 to 31.

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(monthday = tp.mday(col('x')))

minute(x)[source]¶

Extract the minute from a datetime

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(minute = tp.minute(col('x')))

month(x)[source]¶

Extract the month from a date

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(month = tp.month(col('x')))

quarter(x)[source]¶

Extract the quarter from a date

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(quarter = tp.quarter(col('x')))

dt_round(x, rule, n)[source]¶

Round the datetime

Parameters:

x (Expr, Series) – Column to operate on
rule (str) –
Units of the downscaling operation. Any of:
- "month"
- "week"
- "day"
- "hour"
- "minute"
- "second"
n (int) – Number of units (e.g. 5 "day", 15 "minute")

Examples

>>> df.mutate(x_rounded = tp.dt_round(col('x'), 'minute', 15))

second(x)[source]¶

Extract the second from a datetime

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(second = tp.second(col('x')))

wday(x)[source]¶

Extract the weekday from a date, from Sunday = 1 to Saturday = 7.

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(weekday = tp.wday(col('x')))
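The Sunday-first numbering differs from the ISO convention (Monday = 1 through Sunday = 7) used by Python's `date.isoweekday()`, so it is easy to be off by one when comparing results. A sketch of the conversion, with the hypothetical helper `wday_sketch`:

```python
from datetime import date


def wday_sketch(d):
    """Weekday numbered from Sunday = 1 to Saturday = 7."""
    # isoweekday() numbers Monday = 1 ... Sunday = 7; shift so Sunday comes first.
    return d.isoweekday() % 7 + 1


# 2021-01-01 was a Friday, 2021-01-02 a Saturday, 2021-01-03 a Sunday.
print([wday_sketch(date(2021, 1, day)) for day in range(1, 4)])  # [6, 7, 1]
```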

week(x)[source]¶

Extract the week from a date

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(week = tp.week(col('x')))

yday(x)[source]¶

Extract the year day from a date from 1 to 366.

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(yearday = tp.yday(col('x')))

year(x)[source]¶

Extract the year from a date

Parameters:: x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(year = tp.year(col('x')))

col¶

exclude[source]¶

lit[source]¶

Expr[source]¶

Series[source]¶

Int8[source]¶

Int16[source]¶

Int32[source]¶

Int64[source]¶

UInt8[source]¶

UInt16[source]¶

UInt32[source]¶

UInt64[source]¶

Float32[source]¶

Float64[source]¶

Boolean[source]¶

Utf8¶

List[source]¶

Date[source]¶

Datetime[source]¶

Object[source]¶

paste(*args, sep=' ')[source]¶

Concatenate strings together

Parameters:: args (Expr, str) – Columns and/or strings to concatenate

Examples

>>> df = tp.tibble(x = ['a', 'b', 'c'])
>>> df.mutate(x_end = tp.paste(col('x'), 'end', sep = '_'))

paste0(*args)[source]¶

Concatenate strings together with no separator

Parameters:: args (Expr, str) – Columns and/or strings to concatenate

Examples

>>> df = tp.tibble(x = ['a', 'b', 'c'])
>>> df.mutate(xend = tp.paste0(col('x'), 'end'))

str_c(*args, sep='')[source]¶

Concatenate strings together

Parameters:: args (Expr, str) – Columns and/or strings to concatenate

Examples

>>> df = tp.tibble(x = ['a', 'b', 'c'])
>>> df.mutate(x_end = tp.str_c(col('x'), 'end', sep = '_'))
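`paste`, `paste0` and `str_c` differ only in their default separator (a space, none, and an empty string respectively, per the signatures above). The hypothetical `paste_sketch` below shows the element-wise join on plain lists, assuming single strings are repeated to the length of the columns as in the examples:

```python
def paste_sketch(*args, sep=" "):
    """Join columns (lists) and scalar strings element-wise with sep.

    Scalars are recycled to the column length.
    """
    length = max((len(a) for a in args if isinstance(a, list)), default=1)
    columns = [a if isinstance(a, list) else [a] * length for a in args]
    return [sep.join(parts) for parts in zip(*columns)]


x = ["a", "b", "c"]
print(paste_sketch(x, "end", sep="_"))  # ['a_end', 'b_end', 'c_end']
print(paste_sketch(x, "end", sep=""))   # ['aend', 'bend', 'cend'], as with paste0
```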

str_detect(string, pattern, negate=False)[source]¶

Detect the presence or absence of a pattern in a string

Parameters:

string (str) – Input series to operate on
pattern (str) – Pattern to look for
negate (bool) – If True, return non-matching elements

Examples

>>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = tp.str_detect('name', 'a'))
>>> df.mutate(x = tp.str_detect('name', ['a', 'e']))
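Assuming `pattern` is treated as a regular expression (the default for polars string matching, mimicked here with Python's `re.search`), matching and the `negate` flag behave like the hypothetical single-pattern helper below:

```python
import re


def str_detect_sketch(strings, pattern, negate=False):
    """True where the regex pattern is found anywhere in the string.

    negate=True flips the result, marking strings that do NOT match.
    """
    return [(re.search(pattern, s) is not None) != negate for s in strings]


names = ["apple", "banana", "pear", "grape"]
print(str_detect_sketch(names, "an"))               # [False, True, False, False]
print(str_detect_sketch(names, "an", negate=True))  # [True, False, True, True]
```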

str_extract(string, pattern)[source]¶

Extract the target capture group from provided patterns

Parameters:

string (str) – Input series to operate on
pattern (str) – Pattern to look for

Examples

>>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = tp.str_extract(col('name'), 'e'))

str_length(string)[source]¶

Length of a string

Parameters:: string (str) – Input series to operate on

Examples

>>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = tp.str_length(col('name')))

str_remove_all(string, pattern)[source]¶

Removes all matched patterns in a string

Parameters:

string (str) – Input series to operate on
pattern (str) – Pattern to look for

Examples

>>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = tp.str_remove_all(col('name'), 'a'))

str_remove(string, pattern)[source]¶

Removes the first matched pattern in a string

Parameters:

string (str) – Input series to operate on
pattern (str) – Pattern to look for

Examples

>>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = tp.str_remove(col('name'), 'a'))

str_replace_all(string, pattern, replacement)[source]¶

Replaces all matched patterns in a string

Parameters:

string (str) – Input series to operate on
pattern (str) – Pattern to look for
replacement (str) – String that replaces anything that matches the pattern

Examples

>>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = tp.str_replace_all(col('name'), 'a', 'A'))

str_replace(string, pattern, replacement)[source]¶

Replaces the first matched pattern in a string

Parameters:

string (str) – Input series to operate on
pattern (str) – Pattern to look for
replacement (str) – String that replaces anything that matches the pattern

Examples

>>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = tp.str_replace(col('name'), 'a', 'A'))
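The `_all` variants act on every match, while `str_replace` and `str_remove` act on the first match only. Python's `re.sub` with `count=1` shows the same contrast on a single string; this is an analogy that assumes the patterns are regular expressions, not the tidypolars implementation:

```python
import re

name = "banana"
replaced_first = re.sub("a", "A", name, count=1)  # like str_replace
replaced_all = re.sub("a", "A", name)             # like str_replace_all
removed_first = re.sub("a", "", name, count=1)    # like str_remove
removed_all = re.sub("a", "", name)               # like str_remove_all
print(replaced_first, replaced_all, removed_first, removed_all)  # bAnana bAnAnA bnana bnn
```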

str_ends(string, pattern, negate=False)[source]¶

Detect the presence or absence of a pattern at the end of a string.

Parameters:

string (Expr) – Column to operate on
pattern (str) – Pattern to look for
negate (bool) – If True, return non-matching elements

Examples

>>> df = tp.tibble(words = ['apple', 'bear', 'amazing'])
>>> df.filter(tp.str_ends(col('words'), 'ing'))

str_starts(string, pattern, negate=False)[source]¶

Detect the presence or absence of a pattern at the beginning of a string.

Parameters:

string (Expr) – Column to operate on
pattern (str) – Pattern to look for
negate (bool) – If True, return non-matching elements

Examples

>>> df = tp.tibble(words = ['apple', 'bear', 'amazing'])
>>> df.filter(tp.str_starts(col('words'), 'a'))

str_sub(string, start=0, end=None)[source]¶

Extract portion of string based on start and end inputs

Parameters:

string (str) – Input series to operate on
start (int) – Position of the first character to return
end (int) – Position of the last character to return

Examples

>>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = tp.str_sub(col('name'), 0, 3))

str_to_lower(string)[source]¶

Convert a string to lowercase

Parameters:: string (str) – Convert case of this string

Examples

>>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = tp.str_to_lower(col('name')))

str_to_upper(string)[source]¶

Convert a string to uppercase

Parameters:: string (str) – Convert case of this string

Examples

>>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = tp.str_to_upper(col('name')))

str_trim(string, side='both')[source]¶

Trim whitespace

Parameters:

string (Expr, Series) – Column or series to operate on
side (str) –
One of:
- "both"
- "left"
- "right"

Examples

>>> df = tp.tibble(x = [' a ', ' b ', ' c '])
>>> df.mutate(x = tp.str_trim(col('x')))
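The three `side` options map onto Python's `str.strip`, `str.lstrip` and `str.rstrip`. The hypothetical `str_trim_sketch` below shows the effect on a single string, assuming "whitespace" means the same characters Python strips by default:

```python
def str_trim_sketch(s, side="both"):
    """Strip whitespace from one or both ends of s."""
    if side == "both":
        return s.strip()
    if side == "left":
        return s.lstrip()
    if side == "right":
        return s.rstrip()
    raise ValueError("side must be 'both', 'left' or 'right'")


print(repr(str_trim_sketch(" a ")))           # 'a'
print(repr(str_trim_sketch(" a ", "left")))   # 'a '
print(repr(str_trim_sketch(" a ", "right")))  # ' a'
```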

as_tibble(x)[source]¶

Convert an object to a tibble

Parameters:: x ([pl.DataFrame, pd.DataFrame, dict]) – Object to convert to a tibble

Examples

>>> tp.as_tibble(polars_df)

is_tibble(x)[source]¶

Test whether an object is a tibble

Parameters:: x (object) – Object to test

Examples

>>> tp.is_tibble(df)

class tibble(_data=None, **kwargs)[source]¶

Bases: tidypolars.reexports.pl.DataFrame

A data frame object that provides methods familiar to R tidyverse users.

__dir__()[source]¶

__repr__()[source]¶

Printing method

_repr_html_()[source]¶

Printing method for jupyter

Output rows and columns can be modified by setting the following environment variables:

POLARS_FMT_MAX_COLS: set the number of columns
POLARS_FMT_MAX_ROWS: set the number of rows

__copy__()[source]¶

__str__()[source]¶

Printing method

__getattribute__(attr)[source]¶

__getitem__(col)[source]¶

Get part of the DataFrame as a new DataFrame, Series, or scalar.

Parameters:

col –

Rows / columns to select. This is easiest to explain by example. Suppose we have a DataFrame with columns 'a', 'd', 'c', 'b'. Here is what various types of key would do:

df[0, 'a'] extracts the first element of column 'a' and returns a scalar.
df[0] extracts the first row and returns a DataFrame.
df['a'] extracts column 'a' and returns a Series.
df[0:2] extracts the first two rows and returns a DataFrame.
df[0:2, 'a'] extracts the first two rows from column 'a' and returns a Series.
df[0:2, 0] extracts the first two rows from the first column and returns a Series.
df[[0, 1], [0, 1, 2]] extracts the first two rows and the first three columns and returns a DataFrame.
df[0:2, ['a', 'c']] extracts the first two rows from columns 'a' and 'c' and returns a DataFrame.
df[:, 0:2] extracts all rows from the first two columns and returns a DataFrame.
df[:, 'a':'c'] extracts all rows and all columns positioned between 'a' and 'c' inclusive and returns a DataFrame. In our example, that would extract columns 'a', 'd', and 'c'.

Return type:

DataFrame, Series, or scalar, depending on key.

Examples

>>> df = pl.DataFrame(
...     {"a": [1, 2, 3], "d": [4, 5, 6], "c": [1, 3, 2], "b": [7, 8, 9]}
... )
>>> df[0]
shape: (1, 4)
┌─────┬─────┬─────┬─────┐
│ a   ┆ d   ┆ c   ┆ b   │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╡
│ 1   ┆ 4   ┆ 1   ┆ 7   │
└─────┴─────┴─────┴─────┘
>>> df[0, "a"]
1
>>> df["a"]
shape: (3,)
Series: 'a' [i64]
[
    1
    2
    3
]
>>> df[0:2]
shape: (2, 4)
┌─────┬─────┬─────┬─────┐
│ a   ┆ d   ┆ c   ┆ b   │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╡
│ 1   ┆ 4   ┆ 1   ┆ 7   │
│ 2   ┆ 5   ┆ 3   ┆ 8   │
└─────┴─────┴─────┴─────┘
>>> df[0:2, "a"]
shape: (2,)
Series: 'a' [i64]
[
    1
    2
]
>>> df[0:2, 0]
shape: (2,)
Series: 'a' [i64]
[
    1
    2
]
>>> df[[0, 1], [0, 1, 2]]
shape: (2, 3)
┌─────┬─────┬─────┐
│ a   ┆ d   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 4   ┆ 1   │
│ 2   ┆ 5   ┆ 3   │
└─────┴─────┴─────┘
>>> df[0:2, ["a", "c"]]
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ c   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 1   │
│ 2   ┆ 3   │
└─────┴─────┘
>>> df[:, 0:2]
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ d   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 4   │
│ 2   ┆ 5   │
│ 3   ┆ 6   │
└─────┴─────┘
>>> df[:, "a":"c"]
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ d   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 4   ┆ 1   │
│ 2   ┆ 5   ┆ 3   │
│ 3   ┆ 6   ┆ 2   │
└─────┴─────┴─────┘

arrange(*args)[source]¶

Arrange/sort rows

Parameters:: *args (str) – Columns to sort by

Examples

>>> df = tp.tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
>>> # Arrange in ascending order
>>> df.arrange('x', 'y')
...
>>> # Arrange some columns descending
>>> df.arrange(tp.desc('x'), 'y')
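Sorting by several columns orders rows by the first column and breaks ties with the next. For a plain list of `(x, y)` tuples, `df.arrange(tp.desc('x'), 'y')` corresponds to the stable two-pass sort below (a sketch of the ordering, not how tidypolars sorts internally):

```python
rows = [("a", 0), ("a", 1), ("b", 2)]  # (x, y) pairs from the example tibble

# Sort by the secondary key first, then by the primary key. Python's sort is
# stable (also with reverse=True), so ties on x keep their ascending y order.
by_y = sorted(rows, key=lambda r: r[1])
result = sorted(by_y, key=lambda r: r[0], reverse=True)
print(result)  # [('b', 2), ('a', 0), ('a', 1)]
```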

as_dict(*, as_series=True)[source]¶

Convert the tibble to a dictionary

Parameters:: as_series (bool) – If True, the dict values are Series; if False, they are lists

Examples

>>> df.as_dict()
>>> df.as_dict(as_series = False)

as_pandas()[source]¶

Convert to a pandas DataFrame

Examples

>>> df.as_pandas()

as_polars()[source]¶

Convert to a polars DataFrame

Examples

>>> df.as_polars()

bind_cols(*args)[source]¶

Bind data frames by columns

Parameters:: *args (tibble) – Data frames to bind by column

Examples

>>> df1 = tp.tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
>>> df2 = tp.tibble({'a': ['c', 'c', 'c'], 'b': range(4, 7)})
>>> df1.bind_cols(df2)

bind_rows(*args)[source]¶

Bind data frames by row

Parameters:: *args (tibble, list) – Data frames to bind by row

Examples

>>> df1 = tp.tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
>>> df2 = tp.tibble({'x': ['c', 'c', 'c'], 'y': range(4, 7)})
>>> df1.bind_rows(df2)

clone()[source]¶

Very cheap deep clone

count(*args, sort=False, name='n')[source]¶

Returns row counts of the dataset. If column names are provided, count() returns counts by group.

Parameters:

*args (str, Expr) – Columns to group by
sort (bool) – If True, sort the output in descending order of count
name (str) – The name of the new column in the output. If omitted, it will default to “n”.

Examples

>>> df = tp.tibble({'a': range(3), 'b': ['a', 'a', 'b']})
>>> df.count()
>>> df.count('b')
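Without arguments `count` returns the number of rows; with columns it counts rows per group. Python's `collections.Counter` gives the same tallies for the example's column `b` (a plain-Python analogy, not the tidypolars implementation):

```python
from collections import Counter

# Column 'b' from the example tibble above.
b = ["a", "a", "b"]

n_total = len(b)                    # like df.count(): one row, n = 3
by_b = Counter(b)                   # like df.count('b'): a -> 2, b -> 1
sorted_counts = by_b.most_common()  # like sort=True: descending by count
print(n_total, dict(by_b), sorted_counts)  # 3 {'a': 2, 'b': 1} [('a', 2), ('b', 1)]
```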

distinct(*args)[source]¶

Select distinct/unique rows

Parameters:: *args (str, Expr) – Columns used to determine distinct rows (defaults to all columns)

Examples

>>> df = tp.tibble({'a': range(3), 'b': ['a', 'a', 'b']})
>>> df.distinct()
>>> df.distinct('b')

drop(*args)[source]¶

Drop unwanted columns

Parameters:: *args (str) – Columns to drop

Examples

>>> df.drop('x', 'y')

drop_null(*args)[source]¶

Drop rows containing missing values

Parameters:: *args (str) – Columns to drop nulls from (defaults to all)

Examples

>>> df = tp.tibble(x = [1, None, 3], y = [None, 'b', 'c'], z = range(3))
>>> df.drop_null()
>>> df.drop_null('x', 'y')

equals(other, null_equal=True)[source]¶

Check if two tibbles are equal

glimpse()[source]¶

Return a dense preview of the DataFrame.

The formatting shows one line per column so that wide dataframes display cleanly. Each line shows the column name, the data type, and the first few values.

fill(*args, direction='down', _by=None)[source]¶

Fill in missing values with previous or next value

Parameters:

*args (str) – Columns to fill
direction (str) – Direction to fill. One of [‘down’, ‘up’, ‘downup’, ‘updown’]
_by (str, list) – Columns to group by

Examples

>>> df = tp.tibble({'a': [1, None, 3, 4, 5],
...                 'b': [None, 2, None, None, 5],
...                 'groups': ['a', 'a', 'a', 'b', 'b']})
>>> df.fill('a', 'b')
>>> df.fill('a', 'b', _by = 'groups')
>>> df.fill('a', 'b', direction = 'downup')
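The four directions combine two passes: "down" carries the last seen value forward, "up" carries the next value backward, and "downup"/"updown" run one pass and then the other to catch leading or trailing gaps. A plain-Python sketch on column `b` from the example, with hypothetical helpers and no grouping (with `_by`, the same passes would run within each group):

```python
def fill_down(values):
    """Replace each None with the most recent non-None value above it."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def fill_up(values):
    """Replace each None with the next non-None value below it."""
    return fill_down(values[::-1])[::-1]


def fill_sketch(values, direction="down"):
    if direction == "down":
        return fill_down(values)
    if direction == "up":
        return fill_up(values)
    if direction == "downup":
        return fill_up(fill_down(values))
    if direction == "updown":
        return fill_down(fill_up(values))
    raise ValueError(f"unknown direction: {direction!r}")


b = [None, 2, None, None, 5]
print(fill_sketch(b, "down"))    # [None, 2, 2, 2, 5]
print(fill_sketch(b, "up"))      # [2, 2, 5, 5, 5]
print(fill_sketch(b, "downup"))  # [2, 2, 2, 2, 5]
```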

filter(*args, _by=None)[source]¶

Filter rows on one or more conditions

Parameters:

*args (Expr) – Conditions to filter by
_by (str, list) – Columns to group by

Examples

>>> df = tp.tibble({'a': range(3), 'b': ['a', 'a', 'b']})
>>> df.filter(col('a') < 2, col('b') == 'a')
>>> df.filter((col('a') < 2) & (col('b') == 'a'))
>>> df.filter(col('a') <= tp.mean(col('a')), _by = 'b')

full_join(df, left_on=None, right_on=None, on=None, suffix: str = '_right')[source]¶

Perform a full join

Parameters:

df (tibble) – Tibble to join with.
left_on (str, list) – Join column(s) of the left DataFrame.
right_on (str, list) – Join column(s) of the right DataFrame.
on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.
suffix (str) – Suffix to append to columns with a duplicate name.

Examples

>>> df1.full_join(df2)
>>> df1.full_join(df2, on = 'x')
>>> df1.full_join(df2, left_on = 'left_x', right_on = 'x')

head(n=5, *, _by=None)[source]¶

Alias for .slice_head()

inner_join(df, left_on=None, right_on=None, on=None, suffix='_right')[source]¶

Perform an inner join

Parameters:

df (tibble) – Tibble to join with.
left_on (str, list) – Join column(s) of the left DataFrame.
right_on (str, list) – Join column(s) of the right DataFrame.
on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.
suffix (str) – Suffix to append to columns with a duplicate name.

Examples

>>> df1.inner_join(df2)
>>> df1.inner_join(df2, on = 'x')
>>> df1.inner_join(df2, left_on = 'left_x', right_on = 'x')

left_join(df, left_on=None, right_on=None, on=None, suffix='_right')[source]¶

Perform a left join

Parameters:

df (tibble) – Tibble to join with.
left_on (str, list) – Join column(s) of the left DataFrame.
right_on (str, list) – Join column(s) of the right DataFrame.
on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.
suffix (str) – Suffix to append to columns with a duplicate name.

Examples

>>> df1.left_join(df2)
>>> df1.left_join(df2, on = 'x')
>>> df1.left_join(df2, left_on = 'left_x', right_on = 'x')
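The three joins differ in which keys survive: an inner join keeps keys present on both sides, a left join keeps every left key, and a full join keeps keys from either side, filling the unmatched side with nulls. The hypothetical `join_sketch` below shows this on `(key, value)` rows with unique keys; it merges the key into one column and ignores duplicate keys, so it illustrates which rows survive rather than the exact output layout or row order:

```python
def join_sketch(left, right, how):
    """Join two lists of (key, value) rows whose keys are unique on each side.

    Returns (key, left_value, right_value) tuples; None marks a missing match.
    """
    left_map, right_map = dict(left), dict(right)
    if how == "inner":
        keys = [k for k, _ in left if k in right_map]
    elif how == "left":
        keys = [k for k, _ in left]
    elif how == "full":
        keys = [k for k, _ in left] + [k for k, _ in right if k not in left_map]
    else:
        raise ValueError(f"unknown join type: {how!r}")
    return [(k, left_map.get(k), right_map.get(k)) for k in keys]


left = [("a", 1), ("b", 2)]
right = [("b", 20), ("c", 30)]
print(join_sketch(left, right, "inner"))  # [('b', 2, 20)]
print(join_sketch(left, right, "left"))   # [('a', 1, None), ('b', 2, 20)]
print(join_sketch(left, right, "full"))   # [('a', 1, None), ('b', 2, 20), ('c', None, 30)]
```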

mutate(*args, _by=None, **kwargs)[source]¶

Add or modify columns

Parameters:

*args (Expr) – Column expressions to add or modify
_by (str, list) – Columns to group by
**kwargs (Expr) – Column expressions to add or modify

Examples

>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.mutate(double_a = col('a') * 2,
...           a_plus_b = col('a') + col('b'))
>>> df.mutate(row_num = tp.row_number(), _by = 'c')
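In a grouped mutate, row_number() counts rows separately inside each group. A plain-Python sketch of the three example columns, assuming 1-based numbering as in dplyr's row_number(). The helper `grouped_row_number` is hypothetical, and whether tidypolars' row_number() starts at 1 should be checked against the installed version.

```python
from collections import defaultdict


def grouped_row_number(groups):
    """Hypothetical helper: number rows 1, 2, 3, ... separately within each group."""
    counts = defaultdict(int)
    numbers = []
    for g in groups:
        counts[g] += 1  # position of this row within its group
        numbers.append(counts[g])
    return numbers


df = {'a': [0, 1, 2], 'b': [0, 1, 2], 'c': ['a', 'a', 'b']}
df['double_a'] = [a * 2 for a in df['a']]
df['a_plus_b'] = [a + b for a, b in zip(df['a'], df['b'])]
df['row_num'] = grouped_row_number(df['c'])
print(df)
```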

pivot_longer(cols=everything(), names_to='name', values_to='value')

Pivot data from wide to long

Parameters:

cols (Expr) – List of the columns to pivot. Defaults to all columns.
names_to (str) – Name of the new “names” column.
values_to (str) – Name of the new “values” column.

Examples

>>> df = tp.tibble({'id': ['id1', 'id2'], 'a': [1, 2], 'b': [1, 2]})
>>> df.pivot_longer(cols = ['a', 'b'])
>>> df.pivot_longer(cols = ['a', 'b'], names_to = 'stuff', values_to = 'things')
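pivot_longer turns each pivoted column into a block of rows: the remaining columns are repeated, the old column name goes to names_to and its value to values_to. A plain-Python sketch of that reshaping on a dict-of-lists layout. The helper is hypothetical, and the column-by-column row order it produces is an assumption that may not match tidypolars exactly.

```python
def pivot_longer(data, cols, names_to="name", values_to="value"):
    """Hypothetical helper: stack `cols` into one names column and one values column."""
    id_cols = [c for c in data if c not in cols]
    out = {c: [] for c in [*id_cols, names_to, values_to]}
    for c in cols:  # one block of rows per pivoted column
        for i, value in enumerate(data[c]):
            for id_col in id_cols:
                out[id_col].append(data[id_col][i])  # repeat the identifiers
            out[names_to].append(c)
            out[values_to].append(value)
    return out


df = {'id': ['id1', 'id2'], 'a': [1, 2], 'b': [1, 2]}
print(pivot_longer(df, cols=['a', 'b'], names_to='stuff', values_to='things'))
# {'id': ['id1', 'id2', 'id1', 'id2'], 'stuff': ['a', 'a', 'b', 'b'], 'things': [1, 2, 1, 2]}
```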

pivot_wider(names_from='name', values_from='value', id_cols=None, values_fn='first', values_fill=None)

Pivot data from long to wide

Parameters:

names_from (str) – Column to get the new column names from.
values_from (str) – Column to get the new column values from.
id_cols (str, list) – A set of columns that uniquely identifies each observation. Defaults to all columns in the data frame except for the columns specified in names_from and values_from.
values_fn (str) – How to combine multiple values that fall into the same cell. One of ‘first’, ‘count’, ‘sum’, ‘max’, ‘min’, ‘mean’, ‘median’, ‘last’.
values_fill (str) – How to fill missing/null cells. One of “backward”, “forward”, “mean”, “min”, “max”, “zero”, “one”.

Examples

>>> df = tp.tibble({'id': [1, 1], 'variable': ['a', 'b'], 'value': [1, 2]})
>>> df.pivot_wider(names_from = 'variable', values_from = 'value')
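pivot_wider does the reverse of pivot_longer: each distinct value of names_from becomes a column, with one row per combination of id_cols. The plain-Python sketch below covers only values_fn='first' with no values_fill (missing cells stay None). The helper and the dict-of-lists layout are illustrative assumptions, not tidypolars' implementation.

```python
def pivot_wider(data, names_from="name", values_from="value", id_cols=None):
    """Hypothetical helper: spread long data to wide, keeping the first value per cell."""
    if id_cols is None:
        id_cols = [c for c in data if c not in (names_from, values_from)]
    elif isinstance(id_cols, str):
        id_cols = [id_cols]
    new_names = list(dict.fromkeys(data[names_from]))  # first-seen order
    cells = {}  # id tuple -> {new column name: value}
    for i, name in enumerate(data[names_from]):
        key = tuple(data[c][i] for c in id_cols)
        # setdefault keeps the first value seen for each cell (values_fn='first')
        cells.setdefault(key, {}).setdefault(name, data[values_from][i])
    out = {c: [] for c in [*id_cols, *new_names]}
    for key, row in cells.items():
        for c, v in zip(id_cols, key):
            out[c].append(v)
        for name in new_names:
            out[name].append(row.get(name))  # None where the cell is missing
    return out


df = {'id': [1, 1], 'variable': ['a', 'b'], 'value': [1, 2]}
print(pivot_wider(df, names_from='variable', values_from='value'))
# {'id': [1], 'a': [1], 'b': [2]}
```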

print()

pull(var=None)

Extract a column as a series

Parameters:

var (str) – Name of the column to extract. Defaults to the last column.

Examples

>>> df = tp.tibble({'a': range(3), 'b': range(3)})
>>> df.pull('a')

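relocate(*args, _before=None, _after=None)

Move a column or columns to a new position

Parameters:

*args (str, Expr) – Columns to move
_before (str) – Column to place the moved columns before
_after (str) – Column to place the moved columns after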

Examples

>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.relocate('a', _before = 'c')
>>> df.relocate('b', _after = 'c')
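relocate only reorders names: the moved columns are taken out and reinserted before or after an anchor column. A plain-Python sketch working on a list of column names. The `relocate` helper and its `before`/`after` arguments are hypothetical stand-ins for tidypolars' `_before`/`_after`, and moving columns to the front when neither is given mirrors dplyr's relocate() and is an assumption for tidypolars.

```python
def relocate(names, *cols, before=None, after=None):
    """Hypothetical helper: move `cols` before or after an anchor column."""
    if before is not None and after is not None:
        raise ValueError("give at most one of before/after")
    rest = [n for n in names if n not in cols]
    if before is not None:
        pos = rest.index(before)
    elif after is not None:
        pos = rest.index(after) + 1
    else:
        pos = 0  # assumption: no anchor means "move to the front", as in dplyr
    return rest[:pos] + list(cols) + rest[pos:]


names = ['a', 'b', 'c']
print(relocate(names, 'a', before='c'))  # ['b', 'a', 'c']
print(relocate(names, 'b', after='c'))   # ['a', 'c', 'b']
```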

rename(_mapping=None, **kwargs)

Rename columns

Parameters:

_mapping (dict) – Dictionary mapping old names to new names (pandas style)
**kwargs (str) – Pairs of new_name = 'old_name' (dplyr style)

Examples

>>> df = tp.tibble({'x': range(3), 't': range(3), 'z': ['a', 'a', 'b']})
>>> df.rename(new_x = 'x') # dplyr interface
>>> df.rename({'x': 'new_x'}) # pandas interface
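The two interfaces point in opposite directions: keyword arguments read new_name = 'old_name' (dplyr), while the dictionary maps old name to new name (pandas). A plain-Python sketch that normalizes both to one old-to-new mapping. The helper is hypothetical, and raising KeyError for unknown columns is a choice of this sketch, not documented tidypolars behavior.

```python
def rename(names, mapping=None, **kwargs):
    """Hypothetical helper: rename a list of column names.

    mapping is {old: new} (pandas style); kwargs are new=old (dplyr style).
    """
    old_to_new = dict(mapping or {})
    old_to_new.update({old: new for new, old in kwargs.items()})
    unknown = set(old_to_new) - set(names)
    if unknown:  # this sketch rejects names that are not columns
        raise KeyError(f"unknown columns: {sorted(unknown)}")
    return [old_to_new.get(n, n) for n in names]


names = ['x', 't', 'z']
print(rename(names, new_x='x'))       # ['new_x', 't', 'z']
print(rename(names, {'x': 'new_x'}))  # ['new_x', 't', 'z']
```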

replace_null(replace=None)

Replace null values

Parameters:

replace (dict) – Dictionary of column/replacement pairs

Examples

>>> df = tp.tibble(x = [0, None], y = [None, None])
>>> df.replace_null(dict(x = 1, y = 2))
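Only null cells are replaced; existing values, including 0, are left untouched, and columns missing from the dictionary keep their nulls. A plain-Python sketch with None standing in for null; the helper and the dict-of-lists layout are assumptions.

```python
def replace_null(data, replace):
    """Hypothetical helper: fill None cells column by column from `replace`."""
    return {
        name: [replace[name] if v is None and name in replace else v for v in values]
        for name, values in data.items()
    }


df = {'x': [0, None], 'y': [None, None]}
print(replace_null(df, dict(x=1, y=2)))  # {'x': [0, 1], 'y': [2, 2]}
```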

separate(sep_col, into, sep='_', remove=True)

Separate a character column into multiple columns

Parameters:

sep_col (str) – Column to split into multiple columns
into (list) – List of new column names
sep (str) – Separator to split on. Defaults to ‘_’.
remove (bool) – If True, removes the input column from the output data frame

Examples

>>> df = tp.tibble(x = ['a_a', 'b_b', 'c_c'])
>>> df.separate('x', into = ['left', 'right'])
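separate splits each string on sep and spreads the pieces over the into columns. A plain-Python sketch follows. How tidypolars handles strings with more or fewer pieces than columns is not documented here, so keeping the remainder in the last column and padding with None are choices of this hypothetical helper.

```python
def separate(data, sep_col, into, sep="_", remove=True):
    """Hypothetical helper: split the strings in `sep_col` into the columns in `into`."""
    out = {k: v for k, v in data.items() if not (remove and k == sep_col)}
    pieces = {name: [] for name in into}
    for s in data[sep_col]:
        parts = s.split(sep, len(into) - 1)  # at most len(into) pieces
        parts += [None] * (len(into) - len(parts))  # pad short splits with None
        for name, part in zip(into, parts):
            pieces[name].append(part)
    out.update(pieces)
    return out


df = {'x': ['a_a', 'b_b', 'c_c']}
print(separate(df, 'x', into=['left', 'right']))
# {'left': ['a', 'b', 'c'], 'right': ['a', 'b', 'c']}
```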

set_names(nm=None)

Change the column names of the data frame

Parameters:

nm (list) – A list of new names for the data frame

Examples

>>> df = tp.tibble(x = range(3), y = range(3))
>>> df.set_names(['a', 'b'])

select(*args)

Select or drop columns

Parameters:

*args (str, Expr) – Columns to select

Examples

>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.select('a', 'b')
>>> df.select(col('a'), col('b'))

slice(*args, _by=None)

Grab rows from a data frame

Parameters:

*args (int, list) – Rows to grab
_by (str, list) – Columns to group by

Examples

>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.slice(0, 1)
>>> df.slice(0, _by = 'c')

slice_head(n=5, *, _by=None)

Grab top rows from a data frame

Parameters:

n (int) – Number of rows to grab
_by (str, list) – Columns to group by

Examples

>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.slice_head(2)
>>> df.slice_head(1, _by = 'c')

slice_tail(n=5, *, _by=None)

Grab bottom rows from a data frame

Parameters:

n (int) – Number of rows to grab
_by (str, list) – Columns to group by

Examples

>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.slice_tail(2)
>>> df.slice_tail(1, _by = 'c')
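With _by, slice_head and slice_tail take n rows from each group rather than from the whole data frame. A plain-Python sketch of that per-group slicing. The helper is hypothetical, and it keeps rows in their original order, which tidypolars may not guarantee.

```python
from collections import defaultdict


def slice_by_group(data, n, by, tail=False):
    """Hypothetical helper: keep the first (or, with tail=True, last) n rows of each group."""
    rows_per_group = defaultdict(list)
    for i, g in enumerate(data[by]):
        rows_per_group[g].append(i)
    keep = sorted(
        i
        for rows in rows_per_group.values()
        # max(..., 0) keeps n=0 and n > group size well-behaved for the tail case
        for i in (rows[max(len(rows) - n, 0):] if tail else rows[:n])
    )
    return {name: [values[i] for i in keep] for name, values in data.items()}


df = {'a': [0, 1, 2], 'b': [0, 1, 2], 'c': ['a', 'a', 'b']}
print(slice_by_group(df, 1, by='c'))             # first row of each group
print(slice_by_group(df, 1, by='c', tail=True))  # last row of each group
```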

summarise(*args, _by=None, **kwargs)

Alias for .summarize()

summarize(*args, _by=None, **kwargs)

Aggregate data with summary statistics

Parameters:

*args (Expr) – Summary expressions to compute
_by (str, list) – Columns to group by
**kwargs (Expr) – Named summary expressions to compute

Examples

>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.summarize(avg_a = tp.mean(col('a')))
>>> df.summarize(avg_a = tp.mean(col('a')),
...              _by = 'c')
>>> df.summarize(avg_a = tp.mean(col('a')),
...              max_b = tp.max(col('b')))
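A grouped summarize reduces each group to one row, with one column per summary expression. The plain-Python sketch below computes the grouped examples with the standard library. The `summarize_by` helper and its (column, function) pairs are hypothetical, and groups appear in first-seen order here, whereas polars does not guarantee group order unless asked to maintain it.

```python
from collections import defaultdict
from statistics import mean


def summarize_by(data, by, **aggs):
    """Hypothetical helper: reduce each group of `by` to one row.

    Each keyword maps an output name to a (column, function) pair,
    e.g. avg_a=('a', mean).
    """
    groups = defaultdict(list)  # group value -> row indices, first-seen order
    for i, g in enumerate(data[by]):
        groups[g].append(i)
    out = {by: list(groups)}
    for new_name, (col_name, fn) in aggs.items():
        out[new_name] = [fn([data[col_name][i] for i in rows]) for rows in groups.values()]
    return out


df = {'a': [0, 1, 2], 'b': [0, 1, 2], 'c': ['a', 'a', 'b']}
print(summarize_by(df, 'c', avg_a=('a', mean)))
print(summarize_by(df, 'c', avg_a=('a', mean), max_b=('b', max)))
```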

tail(n=5, *, _by=None)

Alias for .slice_tail()

unite(col='_united', unite_cols=[], sep='_', remove=True)

Unite multiple columns by pasting strings together

Parameters:

col (str) – Name of the new column
unite_cols (list) – List of columns to unite
sep (str) – Separator to use between values.
remove (bool) – If True, removes the input columns from the data frame

Examples

>>> df = tp.tibble(a = ["a", "a", "a"], b = ["b", "b", "b"], c = range(3))
>>> df.unite("united_col", unite_cols = ["a", "b"])
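unite is roughly the inverse of separate: it pastes the selected columns together row by row with sep between them. A plain-Python sketch follows. The helper is hypothetical, and appending the new column at the end (and rendering None as the text 'None') are choices of the sketch, not documented tidypolars behavior.

```python
def unite(data, col="_united", unite_cols=(), sep="_", remove=True):
    """Hypothetical helper: join the values of `unite_cols` row by row with `sep`."""
    united = [sep.join(str(v) for v in row) for row in zip(*(data[c] for c in unite_cols))]
    out = {k: v for k, v in data.items() if not (remove and k in unite_cols)}
    out[col] = united  # the sketch appends the new column at the end
    return out


df = {'a': ['a', 'a', 'a'], 'b': ['b', 'b', 'b'], 'c': [0, 1, 2]}
print(unite(df, 'united_col', unite_cols=['a', 'b']))
# {'c': [0, 1, 2], 'united_col': ['a_b', 'a_b', 'a_b']}
```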

write_csv(file=None, has_headers=True, sep=',')

Write a data frame to a CSV file

write_parquet(file=str, compression='snappy', use_pyarrow=False, **kwargs)

Write a data frame to a Parquet file

property names

Get column names

Examples

>>> df.names

property ncol

Get number of columns

Examples

>>> df.ncol

property nrow

Get number of rows

Examples

>>> df.nrow

property plot

Access to polars plotting

Examples

>>> df.plot

desc(x)

Mark a column to be sorted in descending order

from_pandas(df)

Convert from pandas DataFrame to tibble

Parameters:

df (DataFrame) – pd.DataFrame to convert to a tibble

Examples

>>> tp.from_pandas(df)

from_polars(df)

Convert from polars DataFrame to tibble

Parameters:

df (DataFrame) – pl.DataFrame to convert to a tibble

Examples

>>> tp.from_polars(df)

contains(match, ignore_case=True)

Select columns whose names contain a literal string

Parameters:

match (str) – String to match columns
ignore_case (bool) – If True (the default), ignores case when matching names.

Examples

>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.select(contains('c'))

ends_with(match, ignore_case=True)

Select columns whose names end with a suffix

Parameters:

match (str) – String to match columns
ignore_case (bool) – If True (the default), ignores case when matching names.

Examples

>>> df = tp.tibble({'a': range(3), 'b_code': range(3), 'c_code': ['a', 'a', 'b']})
>>> df.select(ends_with('code'))

everything()

Select all columns

Examples

>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.select(everything())

starts_with(match, ignore_case=True)

Select columns whose names start with a prefix

Parameters:

match (str) – String to match columns
ignore_case (bool) – If True (the default), ignores case when matching names.

Examples

>>> df = tp.tibble({'a': range(3), 'add': range(3), 'sub': ['a', 'a', 'b']})
>>> df.select(starts_with('a'))
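starts_with, ends_with and contains all compare column names against a string, case-insensitively unless ignore_case=False. A plain-Python sketch of that matching on a list of names. In tidypolars the helpers are passed to select() instead of being called on names directly, and `_matcher` is a hypothetical helper.

```python
def _matcher(test):
    """Hypothetical helper: build a name selector from a string predicate."""
    def select(names, match, ignore_case=True):
        fold = str.lower if ignore_case else (lambda s: s)
        return [n for n in names if test(fold(n), fold(match))]
    return select


starts_with = _matcher(str.startswith)
ends_with = _matcher(str.endswith)
contains = _matcher(lambda name, m: m in name)

print(starts_with(['a', 'add', 'sub'], 'a'))                # ['a', 'add']
print(ends_with(['a', 'b_code', 'c_code'], 'code'))         # ['b_code', 'c_code']
print(contains(['a', 'b', 'c'], 'c'))                       # ['c']
print(starts_with(['Add', 'sub'], 'a', ignore_case=False))  # []
```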

where(col_type)

Select columns by type using a string

Options:

date, datetime, float, integer, numeric, string

Examples

>>> df.select(tp.where("integer"))

__all__