tidypolars¶
Submodules¶
Package Contents¶
Classes¶
Tibble | A data frame object that provides methods familiar to R tidyverse users.
Functions¶
abs | Absolute value
across | Apply a function across a selection of columns
case_when | Case when
coalesce | Coalesce missing values
floor | Round numbers down to the lower integer
if_else | If Else
lag | Get lagging values
lead | Get leading values
log | Compute the natural logarithm of a column
log10 | Compute the base 10 logarithm of a column
read_csv | Simple wrapper around polars.read_csv
read_parquet | Simple wrapper around polars.read_parquet
rep | Replicate the values in x
replace_null | Replace null values
round | Round numbers to the given number of decimals
row_number | Return row number
sqrt | Get column square root
cor | Find the correlation of two columns
cov | Find the covariance of two columns
count | Number of observations in each group
first | Get first value
last | Get last value
length | Number of observations in each group
max | Get column max
mean | Get column mean
median | Get column median
min | Get column minimum
n | Number of observations in each group
n_distinct | Get number of distinct values in a column
quantile | Get the specified quantile of a column
sd | Get column standard deviation
sum | Get column sum
var | Get column variance
between | Test if values of a column are between two values
is_finite | Test if values of a column are finite
is_in | Test if values of a column are in a list of values
is_infinite | Test if values of a column are infinite
is_nan | Test if values of a column are nan
is_not | Flip values of a boolean series
is_not_in | Test if values of a column are not in a list of values
is_not_null | Test if values of a column are not null
is_null | Test if values of a column are null
as_boolean | Convert to a boolean
as_float | Convert to float. Defaults to Float64.
as_integer | Convert to integer. Defaults to Int64.
as_string | Convert to string. Defaults to Utf8.
cast | General type conversion.
as_date | Convert a string to a Date
as_datetime | Convert a string to a Datetime
hour | Extract the hour from a datetime
make_date | Create a date object
make_datetime | Create a datetime object
mday | Extract the month day from a date from 1 to 31.
minute | Extract the minute from a datetime
month | Extract the month from a date
quarter | Extract the quarter from a date
dt_round | Round the datetime
second | Extract the second from a datetime
wday | Extract the weekday from a date from Sunday = 1 to Saturday = 7.
week | Extract the week from a date
yday | Extract the year day from a date from 1 to 366.
year | Extract the year from a date
paste | Concatenate strings together
paste0 | Concatenate strings together with no separator
str_c | Concatenate strings together
str_detect | Detect the presence or absence of a pattern in a string
str_extract | Extract the target capture group from provided patterns
str_length | Length of a string
str_remove_all | Removes all matched patterns in a string
str_remove | Removes the first matched pattern in a string
str_replace_all | Replaces all matched patterns in a string
str_replace | Replaces the first matched pattern in a string
str_ends | Detect the presence or absence of a pattern at the end of a string.
str_starts | Detect the presence or absence of a pattern at the beginning of a string.
str_sub | Extract portion of string based on start and end inputs
str_to_lower | Convert case of a string
str_to_upper | Convert case of a string
str_trim | Trim whitespace
desc | Mark a column to order in descending
from_pandas | Convert from pandas DataFrame to Tibble
from_polars | Convert from polars DataFrame to Tibble
contains | Contains a literal string
ends_with | Ends with a suffix
everything | Selects all columns
starts_with | Starts with a prefix
Attributes¶
- __version__¶
- abs(x)[source]¶
Absolute value
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(abs_x = tp.abs('x'))
>>> df.mutate(abs_x = tp.abs(col('x')))
- across(cols, fn=lambda x: ..., names_prefix=None)[source]¶
Apply a function across a selection of columns
- Parameters:
cols (list) – Columns to operate on
fn (lambda) – A function or lambda to apply to each column
names_prefix (Optional - str) – Prefix to add to the names of changed columns
Examples
>>> df = tp.Tibble(x = ['a', 'a', 'b'], y = range(3), z = range(3))
>>> df.mutate(across(['y', 'z'], lambda x: x * 2))
>>> df.mutate(across(tp.Int64, lambda x: x * 2, names_prefix = "double_"))
>>> df.summarize(across(['y', 'z'], tp.mean), by = 'x')
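To illustrate the idea behind across, here is a minimal pure-Python sketch that applies one function to several columns of a plain dict of lists. The helper name across_sketch, the dict-based frame, and the choice to write prefixed results under new keys are assumptions made for illustration; this is not tidypolars code.

```python
def across_sketch(frame, cols, fn, names_prefix=None):
    """Apply fn to each listed column of a dict-of-lists frame (hypothetical helper)."""
    out = dict(frame)  # shallow copy so the input frame is left untouched
    for name in cols:
        # Without a prefix the column is overwritten; with one, a new key is written
        new_name = name if names_prefix is None else names_prefix + name
        out[new_name] = [fn(v) for v in frame[name]]
    return out


frame = {'x': ['a', 'a', 'b'], 'y': [0, 1, 2], 'z': [0, 1, 2]}
doubled = across_sketch(frame, ['y', 'z'], lambda v: v * 2)
prefixed = across_sketch(frame, ['y', 'z'], lambda v: v * 2, names_prefix='double_')
```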
- case_when(expr)[source]¶
Case when
- Parameters:
expr (Expr) – A logical expression
Examples
>>> df = tp.Tibble(x = range(1, 4))
>>> df.mutate(
...     case_x = tp.case_when(col('x') < 2).then(1)
...                .when(col('x') < 3).then(2)
...                .otherwise(0)
... )
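The when/then/otherwise chain above is evaluated branch by branch, with the first matching condition winning. A rough pure-Python sketch of that behaviour follows; case_when_sketch is a hypothetical helper over plain values, not the library's implementation.

```python
def case_when_sketch(values, branches, otherwise=None):
    """For each value, return the outcome of the first branch whose predicate holds."""
    result = []
    for v in values:
        for predicate, outcome in branches:
            if predicate(v):
                result.append(outcome)
                break
        else:
            # No predicate matched, so fall back to the default
            result.append(otherwise)
    return result


case_x = case_when_sketch(
    range(1, 4),
    [(lambda v: v < 2, 1), (lambda v: v < 3, 2)],
    otherwise=0,
)
print(case_x)  # [1, 2, 0]
```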
- coalesce(*args)[source]¶
Coalesce missing values
- Parameters:
args (Expr) – Columns to coalesce
Examples
>>> df.mutate(coalesce_xy = tp.coalesce(col('x'), col('y')))
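Coalescing means taking, row by row, the first value that is not missing. A minimal sketch with plain lists, where None stands in for a null; coalesce_sketch is a hypothetical helper, not part of tidypolars.

```python
def coalesce_sketch(*columns):
    """Row-wise, return the first non-None value across the given columns."""
    return [
        next((v for v in row if v is not None), None)  # None if every value is missing
        for row in zip(*columns)
    ]


x = [1, None, None]
y = [10, 20, None]
print(coalesce_sketch(x, y))  # [1, 20, None]
```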
- floor(x)[source]¶
Round numbers down to the lower integer
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(floor_x = tp.floor(col('x')))
- if_else(condition, true, false)[source]¶
If Else
- Parameters:
condition (Expr) – A logical expression
true – Value if the condition is true
false – Value if the condition is false
Examples
>>> df = tp.Tibble(x = range(1, 4))
>>> df.mutate(if_x = tp.if_else(col('x') < 2, 1, 2))
- lag(x, n: int = 1, default=None)[source]¶
Get lagging values
- Parameters:
x (Expr, Series) – Column to operate on
n (int) – Number of positions to lag by
default (optional) – Value to fill in missing values
Examples
>>> df.mutate(lag_x = tp.lag(col('x')))
>>> df.mutate(lag_x = tp.lag('x'))
- lead(x, n: int = 1, default=None)[source]¶
Get leading values
- Parameters:
x (Expr, Series) – Column to operate on
n (int) – Number of positions to lead by
default (optional) – Value to fill in missing values
Examples
>>> df.mutate(lead_x = tp.lead(col('x')))
>>> df.mutate(lead_x = tp.lead('x'))
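lag and lead both shift a column by n positions and fill the vacated slots with default. The sketch below shows that shifting on plain lists; lag_sketch and lead_sketch are hypothetical names, and the code only mirrors the documented parameters rather than the library's internals.

```python
def lag_sketch(values, n=1, default=None):
    """Shift values forward by n positions, padding the front with default."""
    values = list(values)
    return ([default] * n + values)[:len(values)]


def lead_sketch(values, n=1, default=None):
    """Shift values backward by n positions, padding the end with default."""
    values = list(values)
    return (values + [default] * n)[n:]


print(lag_sketch([1, 2, 3]))              # [None, 1, 2]
print(lead_sketch([1, 2, 3]))             # [2, 3, None]
print(lag_sketch([1, 2, 3], n=2, default=0))  # [0, 0, 1]
```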
- log(x)[source]¶
Compute the natural logarithm of a column
- Parameters:
x (Expr) – Column to operate on
Examples
>>> df.mutate(log = tp.log('x'))
- log10(x)[source]¶
Compute the base 10 logarithm of a column
- Parameters:
x (Expr) – Column to operate on
Examples
>>> df.mutate(log = tp.log10('x'))
- rep(x, times=1)[source]¶
Replicate the values in x
- Parameters:
x (const, Series) – Value or Series to repeat
times (int) – Number of times to repeat
Examples
>>> tp.rep(1, 3)
>>> tp.rep(pl.Series(range(3)), 3)
- replace_null(x, replace=None)[source]¶
Replace null values
- Parameters:
x (Expr, Series) – Column to operate on
replace – Value to replace nulls with
Examples
>>> df = tp.Tibble(x = [0, None], y = [None, None])
>>> df.mutate(x = tp.replace_null(col('x'), 1))
- round(x, decimals=0)[source]¶
Round numbers to the given number of decimals
- Parameters:
x (Expr, Series) – Column to operate on
decimals (int) – Decimals to round to
Examples
>>> df.mutate(x = tp.round(col('x')))
- sqrt(x)[source]¶
Get column square root
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(sqrt_x = tp.sqrt('x'))
- cor(x, y, method='pearson')[source]¶
Find the correlation of two columns
- Parameters:
x (Expr) – A column
y (Expr) – A column
method (str) – Type of correlation to find. Either ‘pearson’ or ‘spearman’.
Examples
>>> df.summarize(cor = tp.cor(col('x'), col('y')))
- cov(x, y)[source]¶
Find the covariance of two columns
- Parameters:
x (Expr) – A column
y (Expr) – A column
Examples
>>> df.summarize(cov = tp.cov(col('x'), col('y')))
- count(x)[source]¶
Number of observations in each group
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(count = tp.count(col('x')))
- first(x)[source]¶
Get first value
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(first_x = tp.first('x'))
>>> df.summarize(first_x = tp.first(col('x')))
- last(x)[source]¶
Get last value
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(last_x = tp.last('x'))
>>> df.summarize(last_x = tp.last(col('x')))
- length(x)[source]¶
Number of observations in each group
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(length = tp.length(col('x')))
- max(x)[source]¶
Get column max
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(max_x = tp.max('x'))
>>> df.summarize(max_x = tp.max(col('x')))
- mean(x)[source]¶
Get column mean
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(mean_x = tp.mean('x'))
>>> df.summarize(mean_x = tp.mean(col('x')))
- median(x)[source]¶
Get column median
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(median_x = tp.median('x'))
>>> df.summarize(median_x = tp.median(col('x')))
- min(x)[source]¶
Get column minimum
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(min_x = tp.min('x'))
>>> df.summarize(min_x = tp.min(col('x')))
- n_distinct(x)[source]¶
Get number of distinct values in a column
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(n_distinct_x = tp.n_distinct('x'))
>>> df.summarize(n_distinct_x = tp.n_distinct(col('x')))
- quantile(x, quantile=0.5)[source]¶
Get the specified quantile of a column
- Parameters:
x (Expr, Series) – Column to operate on
quantile (float) – Quantile to return
Examples
>>> df.summarize(quantile_x = tp.quantile('x', .25))
- sd(x)[source]¶
Get column standard deviation
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(sd_x = tp.sd('x'))
>>> df.summarize(sd_x = tp.sd(col('x')))
- sum(x)[source]¶
Get column sum
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(sum_x = tp.sum('x'))
>>> df.summarize(sum_x = tp.sum(col('x')))
- var(x)[source]¶
Get column variance
- Parameters:
x (Expr) – Column to operate on
Examples
>>> df.summarize(var_x = tp.var('x'))
>>> df.summarize(var_x = tp.var(col('x')))
- between(x, left, right)[source]¶
Test if values of a column are between two values
- Parameters:
x (Expr, Series) – Column to operate on
left (int) – Value to test if column is greater than or equal to
right (int) – Value to test if column is less than or equal to
Examples
>>> df = tp.Tibble(x = range(4))
>>> df.filter(tp.between(col('x'), 1, 3))
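Per the parameter descriptions, both bounds are inclusive. A short plain-Python sketch of the same test on the example data; between_sketch is a hypothetical helper used only to show the comparison.

```python
def between_sketch(values, left, right):
    """Return a boolean per value: True when left <= value <= right (both inclusive)."""
    return [left <= v <= right for v in values]


x = list(range(4))
mask = between_sketch(x, 1, 3)
kept = [v for v, keep in zip(x, mask) if keep]
print(kept)  # [1, 2, 3]
```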
- is_finite(x)[source]¶
Test if values of a column are finite
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df = tp.Tibble(x = [1.0, float('inf')])
>>> df.filter(tp.is_finite(col('x')))
- is_in(x, y)[source]¶
Test if values of a column are in a list of values
- Parameters:
x (Expr, Series) – Column to operate on
y (list) – List to test against
Examples
>>> df = tp.Tibble(x = range(3))
>>> df.filter(tp.is_in(col('x'), [1, 2]))
- is_infinite(x)[source]¶
Test if values of a column are infinite
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df = tp.Tibble(x = [1.0, float('inf')])
>>> df.filter(tp.is_infinite(col('x')))
- is_nan(x)[source]¶
Test if values of a column are nan
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df = tp.Tibble(x = range(3))
>>> df.filter(tp.is_nan(col('x')))
- is_not(x)[source]¶
Flip values of a boolean series
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df = tp.Tibble(x = range(3))
>>> df.filter(tp.is_not(col('x') < 2))
- is_not_in(x, y)[source]¶
Test if values of a column are not in a list of values
- Parameters:
x (Expr, Series) – Column to operate on
y (list) – List to test against
Examples
>>> df = tp.Tibble(x = range(3))
>>> df.filter(tp.is_not_in(col('x'), [1, 2]))
- is_not_null(x)[source]¶
Test if values of a column are not null
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df = tp.Tibble(x = range(3))
>>> df.filter(tp.is_not_null(col('x')))
- is_null(x)[source]¶
Test if values of a column are null
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df = tp.Tibble(x = range(3))
>>> df.filter(tp.is_null(col('x')))
- as_boolean(x)[source]¶
Convert to a boolean
- Parameters:
x (Expr) – Column to operate on
Examples
>>> df.mutate(bool_x = tp.as_boolean(col('x')))
- as_float(x)[source]¶
Convert to float. Defaults to Float64.
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(float_x = tp.as_float(col('x')))
- as_integer(x)[source]¶
Convert to integer. Defaults to Int64.
- Parameters:
x (Expr) – Column to operate on
Examples
>>> df.mutate(int_x = tp.as_integer(col('x')))
- as_string(x)[source]¶
Convert to string. Defaults to Utf8.
- Parameters:
x (Expr) – Column to operate on
Examples
>>> df.mutate(string_x = tp.as_string(col('x')))
- cast(x, dtype)[source]¶
General type conversion.
- Parameters:
x (Expr, Series) – Column to operate on
dtype (DataType) – Type to convert to
Examples
>>> df.mutate(float_x = tp.cast(col('x'), tp.Float64))
- as_date(x, fmt=None)[source]¶
Convert a string to a Date
- Parameters:
x (Expr, Series) – Column to operate on
fmt (str) – “yyyy-mm-dd”
Examples
>>> df = tp.Tibble(x = ['2021-01-01', '2021-10-01'])
>>> df.mutate(date_x = tp.as_date(col('x')))
- as_datetime(x, fmt=None)[source]¶
Convert a string to a Datetime
- Parameters:
x (Expr, Series) – Column to operate on
fmt (str) – “yyyy-mm-dd”
Examples
>>> df = tp.Tibble(x = ['2021-01-01', '2021-10-01'])
>>> df.mutate(date_x = tp.as_datetime(col('x')))
- hour(x)[source]¶
Extract the hour from a datetime
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(hour = tp.hour(col('x')))
- make_date(year=1970, month=1, day=1)[source]¶
Create a date object
- Parameters:
year (Expr, str, int) – Column or literal
month (Expr, str, int) – Column or literal
day (Expr, str, int) – Column or literal
Examples
>>> df.mutate(date = tp.make_date(2000, 1, 1))
- make_datetime(year=1970, month=1, day=1, hour=0, minute=0, second=0)[source]¶
Create a datetime object
- Parameters:
year (Expr, str, int) – Column or literal
month (Expr, str, int) – Column or literal
day (Expr, str, int) – Column or literal
hour (Expr, str, int) – Column or literal
minute (Expr, str, int) – Column or literal
second (Expr, str, int) – Column or literal
Examples
>>> df.mutate(date = tp.make_datetime(2000, 1, 1))
- mday(x)[source]¶
Extract the month day from a date from 1 to 31.
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(monthday = tp.mday(col('x')))
- minute(x)[source]¶
Extract the minute from a datetime
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(minute = tp.minute(col('x')))
- month(x)[source]¶
Extract the month from a date
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(month = tp.month(col('x')))
- quarter(x)[source]¶
Extract the quarter from a date
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(quarter = tp.quarter(col('x')))
- dt_round(x, rule, n)[source]¶
Round the datetime
- Parameters:
x (Expr, Series) – Column to operate on
rule (str) –
Units of the downscaling operation. Any of:
“month”
“week”
“day”
“hour”
“minute”
“second”
n (int) – Number of units (e.g. 5 “day”, 15 “minute”)
Examples
>>> df.mutate(rounded_x = tp.dt_round(col('x'), 'minute', 15))
- second(x)[source]¶
Extract the second from a datetime
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(second = tp.second(col('x')))
- wday(x)[source]¶
Extract the weekday from a date from Sunday = 1 to Saturday = 7.
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(weekday = tp.wday(col('x')))
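The Sunday = 1 to Saturday = 7 numbering differs from Python's own date.weekday(), which counts Monday = 0 to Sunday = 6. The sketch below shows the conversion between the two conventions using only the standard library; wday_sketch is a hypothetical helper, not the library's implementation.

```python
from datetime import date


def wday_sketch(d):
    """Map a date to 1..7 with Sunday = 1 and Saturday = 7."""
    # date.weekday() uses Monday = 0 .. Sunday = 6, so shift by one day and wrap
    return (d.weekday() + 1) % 7 + 1


friday = wday_sketch(date(2021, 1, 1))    # 2021-01-01 was a Friday
saturday = wday_sketch(date(2021, 1, 2))
sunday = wday_sketch(date(2021, 1, 3))
print(friday, saturday, sunday)  # 6 7 1
```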
- week(x)[source]¶
Extract the week from a date
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(week = tp.week(col('x')))
- yday(x)[source]¶
Extract the year day from a date from 1 to 366.
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(yearday = tp.yday(col('x')))
- year(x)[source]¶
Extract the year from a date
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(year = tp.year(col('x')))
- paste(*args, sep=' ')[source]¶
Concatenate strings together
- Parameters:
args (Expr, str) – Columns and/or strings to concatenate
Examples
>>> df = tp.Tibble(x = ['a', 'b', 'c'])
>>> df.mutate(x_end = tp.paste(col('x'), 'end', sep = '_'))
- paste0(*args)[source]¶
Concatenate strings together with no separator
- Parameters:
args (Expr, str) – Columns and/or strings to concatenate
Examples
>>> df = tp.Tibble(x = ['a', 'b', 'c'])
>>> df.mutate(xend = tp.paste0(col('x'), 'end'))
- str_c(*args, sep='')[source]¶
Concatenate strings together
- Parameters:
args (Expr, str) – Columns and/or strings to concatenate
Examples
>>> df = tp.Tibble(x = ['a', 'b', 'c'])
>>> df.mutate(x_end = str_c(col('x'), 'end', sep = '_'))
- str_detect(string, pattern, negate=False)[source]¶
Detect the presence or absence of a pattern in a string
- Parameters:
string (str) – Input series to operate on
pattern (str) – Pattern to look for
negate (bool) – If True, return non-matching elements
Examples
>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_detect('name', 'a'))
>>> df.mutate(x = str_detect('name', ['a', 'e']))
- str_extract(string, pattern)[source]¶
Extract the target capture group from provided patterns
- Parameters:
string (str) – Input series to operate on
pattern (str) – Pattern to look for
Examples
>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_extract(col('name'), 'e'))
- str_length(string)[source]¶
Length of a string
- Parameters:
string (str) – Input series to operate on
Examples
>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_length(col('name')))
- str_remove_all(string, pattern)[source]¶
Removes all matched patterns in a string
- Parameters:
string (str) – Input series to operate on
pattern (str) – Pattern to look for
Examples
>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_remove_all(col('name'), 'a'))
- str_remove(string, pattern)[source]¶
Removes the first matched pattern in a string
- Parameters:
string (str) – Input series to operate on
pattern (str) – Pattern to look for
Examples
>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_remove(col('name'), 'a'))
- str_replace_all(string, pattern, replacement)[source]¶
Replaces all matched patterns in a string
- Parameters:
string (str) – Input series to operate on
pattern (str) – Pattern to look for
replacement (str) – String that replaces anything that matches the pattern
Examples
>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_replace_all(col('name'), 'a', 'A'))
- str_replace(string, pattern, replacement)[source]¶
Replaces the first matched pattern in a string
- Parameters:
string (str) – Input series to operate on
pattern (str) – Pattern to look for
replacement (str) – String that replaces anything that matches the pattern
Examples
>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_replace(col('name'), 'a', 'A'))
- str_ends(string, pattern, negate=False)[source]¶
Detect the presence or absence of a pattern at the end of a string.
- Parameters:
string (Expr) – Column to operate on
pattern (str) – Pattern to look for
negate (bool) – If True, return non-matching elements
Examples
>>> df = tp.Tibble(words = ['apple', 'bear', 'amazing'])
>>> df.filter(tp.str_ends(col('words'), 'ing'))
- str_starts(string, pattern, negate=False)[source]¶
Detect the presence or absence of a pattern at the beginning of a string.
- Parameters:
string (Expr) – Column to operate on
pattern (str) – Pattern to look for
negate (bool) – If True, return non-matching elements
Examples
>>> df = tp.Tibble(words = ['apple', 'bear', 'amazing'])
>>> df.filter(tp.str_starts(col('words'), 'a'))
- str_sub(string, start=0, end=None)[source]¶
Extract portion of string based on start and end inputs
- Parameters:
string (str) – Input series to operate on
start (int) – First position of the character to return
end (int) – Last position of the character to return
Examples
>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_sub(col('name'), 0, 3))
- str_to_lower(string)[source]¶
Convert case of a string
- Parameters:
string (str) – Convert case of this string
Examples
>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_to_lower(col('name')))
- str_to_upper(string)[source]¶
Convert case of a string
- Parameters:
string (str) – Convert case of this string
Examples
>>> df = tp.Tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_to_upper(col('name')))
- str_trim(string, side='both')[source]¶
Trim whitespace
- Parameters:
string (Expr, Series) – Column or series to operate on
side (str) –
- One of:
“both”
“left”
“right”
Examples
>>> df = tp.Tibble(x = [' a ', ' b ', ' c '])
>>> df.mutate(x = tp.str_trim(col('x')))
- class Tibble(_data=None, **kwargs)[source]¶
Bases: tidypolars.reexports.pl.DataFrame
A data frame object that provides methods familiar to R tidyverse users.
- property names¶
Get column names
Examples
>>> df.names
- property ncol¶
Get number of columns
Examples
>>> df.ncol
- property nrow¶
Get number of rows
Examples
>>> df.nrow
- _repr_html_()[source]¶
Printing method for jupyter
Output rows and columns can be modified by setting the following environment variables:
POLARS_FMT_MAX_COLS: set the number of columns
POLARS_FMT_MAX_ROWS: set the number of rows
- arrange(*args)[source]¶
Arrange/sort rows
- Parameters:
*args (str) – Columns to sort by
Examples
>>> df = tp.Tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
>>> # Arrange in ascending order
>>> df.arrange('x', 'y')
...
>>> # Arrange some columns descending
>>> df.arrange(tp.desc('x'), 'y')
- bind_cols(*args)[source]¶
Bind data frames by columns
- Parameters:
df (Tibble) – Data frame to bind
Examples
>>> df1 = tp.Tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
>>> df2 = tp.Tibble({'a': ['c', 'c', 'c'], 'b': range(4, 7)})
>>> df1.bind_cols(df2)
- bind_rows(*args)[source]¶
Bind data frames by row
- Parameters:
*args (Tibble, list) – Data frames to bind by row
Examples
>>> df1 = tp.Tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
>>> df2 = tp.Tibble({'x': ['c', 'c', 'c'], 'y': range(4, 7)})
>>> df1.bind_rows(df2)
- count(*args, sort=False, name='n')[source]¶
Returns row counts of the dataset. If bare column names are provided, count() returns counts by group.
- Parameters:
*args (str, Expr) – Columns to group by
sort (bool) – Should columns be ordered in descending order by count
name (str) – The name of the new column in the output. If omitted, it will default to “n”.
Examples
>>> df = tp.Tibble({'a': range(3), 'b': ['a', 'a', 'b']})
>>> df.count()
>>> df.count('b')
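As a rough illustration of the two modes described above (an overall row count versus counts by group, optionally sorted in descending order), here is a standard-library sketch on the example's b column. It uses collections.Counter on a plain list and is not how the library computes its result.

```python
from collections import Counter

b = ['a', 'a', 'b']

# No grouping columns: a single overall row count
total = len(b)

# Grouping by 'b': one count per distinct value
by_group = Counter(b)

# sort = True corresponds to ordering groups by descending count
sorted_groups = by_group.most_common()

print(total)          # 3
print(sorted_groups)  # [('a', 2), ('b', 1)]
```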
- distinct(*args)[source]¶
Select distinct/unique rows
- Parameters:
*args (str, Expr) – Columns to find distinct/unique rows
Examples
>>> df = tp.Tibble({'a': range(3), 'b': ['a', 'a', 'b']})
>>> df.distinct()
>>> df.distinct('b')
- drop(*args)[source]¶
Drop unwanted columns
- Parameters:
*args (str) – Columns to drop
Examples
>>> df.drop('x', 'y')
- drop_null(*args)[source]¶
Drop rows containing missing values
- Parameters:
*args (str) – Columns to drop nulls from (defaults to all)
Examples
>>> df = tp.Tibble(x = [1, None, 3], y = [None, 'b', 'c'], z = range(3))
>>> df.drop_null()
>>> df.drop_null('x', 'y')
- fill(*args, direction='down', by=None)[source]¶
Fill in missing values with previous or next value
- Parameters:
*args (str) – Columns to fill
direction (str) – Direction to fill. One of [‘down’, ‘up’, ‘downup’, ‘updown’]
by (str, list) – Columns to group by
Examples
>>> df = tp.Tibble({'a': [1, None, 3, 4, 5],
...                 'b': [None, 2, None, None, 5],
...                 'groups': ['a', 'a', 'a', 'b', 'b']})
>>> df.fill('a', 'b')
>>> df.fill('a', 'b', by = 'groups')
>>> df.fill('a', 'b', direction = 'downup')
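The four direction values can be pictured with a small pure-Python sketch on a single column. It assumes that 'downup' means filling down first and then up (and 'updown' the reverse), as in tidyr; the helper names are hypothetical and grouping with by is left out.

```python
def fill_down(values):
    """Replace each None with the most recent non-None value above it."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def fill_up(values):
    """Replace each None with the next non-None value below it."""
    return fill_down(values[::-1])[::-1]


def fill_sketch(values, direction='down'):
    """Dispatch on direction; combined modes apply the two passes in order (assumption)."""
    if direction == 'down':
        return fill_down(values)
    if direction == 'up':
        return fill_up(values)
    if direction == 'downup':
        return fill_up(fill_down(values))
    if direction == 'updown':
        return fill_down(fill_up(values))
    raise ValueError(f"unknown direction: {direction!r}")


b = [None, 2, None, None, 5]
print(fill_sketch(b, 'down'))    # [None, 2, 2, 2, 5]
print(fill_sketch(b, 'up'))      # [2, 2, 5, 5, 5]
print(fill_sketch(b, 'downup'))  # [2, 2, 2, 2, 5]
```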
- filter(*args, by=None)[source]¶
Filter rows on one or more conditions
- Parameters:
*args (Expr) – Conditions to filter by
by (str, list) – Columns to group by
Examples
>>> df = tp.Tibble({'a': range(3), 'b': ['a', 'a', 'b']})
>>> df.filter(col('a') < 2, col('b') == 'a')
>>> df.filter((col('a') < 2) & (col('b') == 'a'))
>>> df.filter(col('a') <= tp.mean(col('a')), by = 'b')
- inner_join(df, left_on=None, right_on=None, on=None, suffix='_right')[source]¶
Perform an inner join
- Parameters:
df (Tibble) – Data frame to join with.
left_on (str, list) – Join column(s) of the left DataFrame.
right_on (str, list) – Join column(s) of the right DataFrame.
on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.
suffix (str) – Suffix to append to columns with a duplicate name.
Examples
>>> df1.inner_join(df2)
>>> df1.inner_join(df2, on = 'x')
>>> df1.inner_join(df2, left_on = 'left_x', right_on = 'x')
- left_join(df, left_on=None, right_on=None, on=None, suffix='_right')[source]¶
Perform a left join
- Parameters:
df (Tibble) – Data frame to join with.
left_on (str, list) – Join column(s) of the left DataFrame.
right_on (str, list) – Join column(s) of the right DataFrame.
on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.
suffix (str) – Suffix to append to columns with a duplicate name.
Examples
>>> df1.left_join(df2)
>>> df1.left_join(df2, on = 'x')
>>> df1.left_join(df2, left_on = 'left_x', right_on = 'x')
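The difference between an inner join and a left join is which unmatched rows survive. The following sketch shows that on lists of row dicts joined on a single shared key; join_sketch and the sample data are hypothetical, and the handling of duplicate column names via suffix is deliberately left out.

```python
def join_sketch(left, right, on, how='inner'):
    """Join two lists of row dicts on one key column (hypothetical helper)."""
    # Right-hand columns other than the key, used to pad unmatched left rows
    right_cols = {k: None for r in right for k in r if k != on}
    joined = []
    for lrow in left:
        partners = [r for r in right if r[on] == lrow[on]]
        if partners:
            for r in partners:
                joined.append({**lrow, **r})
        elif how == 'left':
            # A left join keeps the unmatched left row with missing right values
            joined.append({**lrow, **right_cols})
    return joined


df1 = [{'x': 'a', 'y': 0}, {'x': 'b', 'y': 1}]
df2 = [{'x': 'a', 'z': 10}]

inner = join_sketch(df1, df2, on='x')
left = join_sketch(df1, df2, on='x', how='left')
print(inner)  # [{'x': 'a', 'y': 0, 'z': 10}]
print(left)   # [{'x': 'a', 'y': 0, 'z': 10}, {'x': 'b', 'y': 1, 'z': None}]
```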
- mutate(*args, by=None, **kwargs)[source]¶
Add or modify columns
- Parameters:
*args (Expr) – Column expressions to add or modify
by (str, list) – Columns to group by
**kwargs (Expr) – Column expressions to add or modify
Examples
>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.mutate(double_a = col('a') * 2,
...           a_plus_b = col('a') + col('b'))
>>> df.mutate(row_num = row_number(), by = 'c')
- full_join(df, left_on=None, right_on=None, on=None, suffix: str = '_right')[source]¶
Perform a full join
- Parameters:
df (Tibble) – Data frame to join with.
left_on (str, list) – Join column(s) of the left DataFrame.
right_on (str, list) – Join column(s) of the right DataFrame.
on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.
suffix (str) – Suffix to append to columns with a duplicate name.
Examples
>>> df1.full_join(df2)
>>> df1.full_join(df2, on = 'x')
>>> df1.full_join(df2, left_on = 'left_x', right_on = 'x')
- pivot_longer(cols=everything(), names_to='name', values_to='value')[source]¶
Pivot data from wide to long
- Parameters:
cols (Expr) – List of the columns to pivot. Defaults to all columns.
names_to (str) – Name of the new “names” column.
values_to (str) – Name of the new “values” column
Examples
>>> df = tp.Tibble({'id': ['id1', 'id2'], 'a': [1, 2], 'b': [1, 2]})
>>> df.pivot_longer(cols = ['a', 'b'])
>>> df.pivot_longer(cols = ['a', 'b'], names_to = 'stuff', values_to = 'things')
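Pivoting longer turns each selected column into a pair of (name, value) rows. Here is a minimal sketch on a list of row dicts using the same data as the example; pivot_longer_sketch is a hypothetical helper, and the output row order (all rows for 'a', then all rows for 'b') is an assumption rather than documented behaviour.

```python
def pivot_longer_sketch(rows, cols, names_to='name', values_to='value'):
    """Stack the given columns into name/value pairs, keeping the other columns."""
    long_rows = []
    for col_name in cols:
        for row in rows:
            kept = {k: v for k, v in row.items() if k not in cols}
            long_rows.append({**kept, names_to: col_name, values_to: row[col_name]})
    return long_rows


wide = [{'id': 'id1', 'a': 1, 'b': 1}, {'id': 'id2', 'a': 2, 'b': 2}]
long = pivot_longer_sketch(wide, ['a', 'b'])
for row in long:
    print(row)
```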
- pivot_wider(names_from='name', values_from='value', id_cols=None, values_fn='first', values_fill=None)[source]¶
Pivot data from long to wide
- Parameters:
names_from (str) – Column to get the new column names from.
values_from (str) – Column to get the new column values from
id_cols (str, list) – A set of columns that uniquely identifies each observation. Defaults to all columns in the data table except for the columns specified in names_from and values_from.
values_fn (str) – Function for how multiple entries per group should be dealt with. Any of ‘first’, ‘count’, ‘sum’, ‘max’, ‘min’, ‘mean’, ‘median’, ‘last’
values_fill (str) – If values are missing/null, what value should be filled in. Can use: “backward”, “forward”, “mean”, “min”, “max”, “zero”, “one”
Examples
>>> df = tp.Tibble({'id': [1, 1], 'variable': ['a', 'b'], 'value': [1, 2]})
>>> df.pivot_wider(names_from = 'variable', values_from = 'value')
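Pivoting wider is the reverse operation: rows sharing the same id columns collapse into one row, with new columns named after the values in names_from. The sketch below works on row dicts and keeps the first value seen per cell, which corresponds to the default values_fn = 'first'; pivot_wider_sketch is a hypothetical helper and requires id_cols to be given explicitly.

```python
def pivot_wider_sketch(rows, names_from, values_from, id_cols):
    """Spread name/value pairs into columns, one output row per id combination."""
    wide = {}
    for row in rows:
        key = tuple(row[c] for c in id_cols)
        record = wide.setdefault(key, {c: row[c] for c in id_cols})
        # setdefault keeps the first value seen, mirroring values_fn = 'first'
        record.setdefault(row[names_from], row[values_from])
    return list(wide.values())


long = [
    {'id': 1, 'variable': 'a', 'value': 1},
    {'id': 1, 'variable': 'b', 'value': 2},
]
wide = pivot_wider_sketch(long, names_from='variable', values_from='value', id_cols=['id'])
print(wide)  # [{'id': 1, 'a': 1, 'b': 2}]
```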
- pull(var=None)[source]¶
Extract a column as a series
- Parameters:
var (str) – Name of the column to extract. Defaults to the last column.
Examples
>>> df = tp.Tibble({'a': range(3), 'b': range(3)})
>>> df.pull('a')
- relocate(*args, before=None, after=None)[source]¶
Move a column or columns to a new position
- Parameters:
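*args (str, Expr) – Columns to move
before (str) – Column to place the moved columns before
after (str) – Column to place the moved columns after
Examples
>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.relocate('a', before = 'c')
>>> df.relocate('b', after = 'c')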
- rename(*args, **kwargs)[source]¶
Rename columns
- Parameters:
*args (dict) – Dictionary mapping old names to new names
**kwargs (str) – Key-value pairs of the form new_name = 'old_name'
Examples
>>> df = tp.Tibble({'x': range(3), 't': range(3), 'z': ['a', 'a', 'b']})
>>> df.rename(new_x = 'x') # dplyr interface
>>> df.rename({'x': 'new_x'}) # pandas interface
- replace_null(replace=None)[source]¶
Replace null values
- Parameters:
replace (dict) – Dictionary of column/replacement pairs
Examples
>>> df = tp.Tibble(x = [0, None], y = [None, None])
>>> df.replace_null(dict(x = 1, y = 2))
- separate(sep_col, into, sep='_', remove=True)[source]¶
Separate a character column into multiple columns
- Parameters:
sep_col (str) – Column to split into multiple columns
into (list) – List of new column names
sep (str) – Separator to split on. Defaults to ‘_’
remove (bool) – If True removes the input column from the output data frame
Examples
>>> df = tp.Tibble(x = ['a_a', 'b_b', 'c_c'])
>>> df.separate('x', into = ['left', 'right'])
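A minimal sketch of what separating a column amounts to, using str.split on a list of row dicts. separate_sketch is a hypothetical helper that mirrors the documented parameters; it is not the library's implementation.

```python
def separate_sketch(rows, sep_col, into, sep='_', remove=True):
    """Split sep_col on sep and store the pieces under the names given in into."""
    out = []
    for row in rows:
        parts = row[sep_col].split(sep)
        # Drop the input column only when remove is True
        new_row = {k: v for k, v in row.items() if not (remove and k == sep_col)}
        new_row.update(zip(into, parts))
        out.append(new_row)
    return out


rows = [{'x': 'a_a'}, {'x': 'b_b'}, {'x': 'c_c'}]
print(separate_sketch(rows, 'x', into=['left', 'right']))
# [{'left': 'a', 'right': 'a'}, {'left': 'b', 'right': 'b'}, {'left': 'c', 'right': 'c'}]
```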
- set_names(nm=None)[source]¶
Change the column names of the data frame
- Parameters:
nm (list) – A list of new names for the data frame
Examples
>>> df = tp.Tibble(x = range(3), y = range(3))
>>> df.set_names(['a', 'b'])
- select(*args)[source]¶
Select or drop columns
- Parameters:
*args (str, Expr) – Columns to select
Examples
>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.select('a', 'b')
>>> df.select(col('a'), col('b'))
- slice(*args, by=None)[source]¶
Grab rows from a data frame
- Parameters:
*args (int, list) – Rows to grab
by (str, list) – Columns to group by
Examples
>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.slice(0, 1)
>>> df.slice(0, by = 'c')
- slice_head(n=5, *, by=None)[source]¶
Grab top rows from a data frame
- Parameters:
n (int) – Number of rows to grab
by (str, list) – Columns to group by
Examples
>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.slice_head(2)
>>> df.slice_head(1, by = 'c')
- slice_tail(n=5, *, by=None)[source]¶
Grab bottom rows from a data frame
- Parameters:
n (int) – Number of rows to grab
by (str, list) – Columns to group by
Examples
>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.slice_tail(2)
>>> df.slice_tail(1, by = 'c')
- summarize(*args, by=None, **kwargs)[source]¶
Aggregate data with summary statistics
- Parameters:
*args (Expr) – Column expressions to add or modify
by (str, list) – Columns to group by
**kwargs (Expr) – Column expressions to add or modify
Examples
>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.summarize(avg_a = tp.mean(col('a')))
>>> df.summarize(avg_a = tp.mean(col('a')),
...              by = 'c')
>>> df.summarize(avg_a = tp.mean(col('a')),
...              max_b = tp.max(col('b')))
- to_dict(as_series=True)[source]¶
Convert the data frame to a dictionary
- Parameters:
as_series (bool) – If True, returns the dict values as Series. If False, returns the dict values as lists.
Examples
>>> df.to_dict()
>>> df.to_dict(as_series = False)
- unite(col='_united', unite_cols=[], sep='_', remove=True)[source]¶
Unite multiple columns by pasting strings together
- Parameters:
col (str) – Name of the new column
unite_cols (list) – List of columns to unite
sep (str) – Separator to use between values
remove (bool) – If True removes input columns from the data frame
Examples
>>> df = tp.Tibble(a = ["a", "a", "a"], b = ["b", "b", "b"], c = range(3))
>>> df.unite("united_col", unite_cols = ["a", "b"])
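Uniting is the counterpart of separating: the values of several columns are pasted together with sep. A rough sketch on row dicts follows; unite_sketch is a hypothetical helper, and placing the new column at the end of each row is an assumption.

```python
def unite_sketch(rows, col='_united', unite_cols=(), sep='_', remove=True):
    """Join the string values of unite_cols with sep into a new column named col."""
    out = []
    for row in rows:
        united = sep.join(str(row[c]) for c in unite_cols)
        # Drop the input columns only when remove is True
        new_row = {k: v for k, v in row.items() if not (remove and k in unite_cols)}
        new_row[col] = united
        out.append(new_row)
    return out


rows = [{'a': 'a', 'b': 'b', 'c': i} for i in range(3)]
united = unite_sketch(rows, 'united_col', unite_cols=['a', 'b'])
print(united[0])  # {'c': 0, 'united_col': 'a_b'}
```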
- from_pandas(df)[source]¶
Convert from pandas DataFrame to Tibble
- Parameters:
df (DataFrame) – pd.DataFrame to convert to a Tibble
Examples
>>> tp.from_pandas(df)
- from_polars(df)[source]¶
Convert from polars DataFrame to Tibble
- Parameters:
df (DataFrame) – pl.DataFrame to convert to a Tibble
Examples
>>> tp.from_polars(df)
- contains(match, ignore_case=True)[source]¶
Contains a literal string
- Parameters:
match (str) – String to match columns
ignore_case (bool) – If True (the default), ignores case when matching names.
Examples
>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.select(contains('c'))
- ends_with(match, ignore_case=True)[source]¶
Ends with a suffix
- Parameters:
match (str) – String to match columns
ignore_case (bool) – If True (the default), ignores case when matching names.
Examples
>>> df = tp.Tibble({'a': range(3), 'b_code': range(3), 'c_code': ['a', 'a', 'b']})
>>> df.select(ends_with('code'))
- everything()[source]¶
Selects all columns
Examples
>>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.select(everything())
- starts_with(match, ignore_case=True)[source]¶
Starts with a prefix
- Parameters:
match (str) – String to match columns
ignore_case (bool) – If True (the default), ignores case when matching names.
Examples
>>> df = tp.Tibble({'a': range(3), 'add': range(3), 'sub': ['a', 'a', 'b']})
>>> df.select(starts_with('a'))
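The selection helpers contains, ends_with and starts_with all match against column names, and ignore_case controls whether the comparison is case sensitive. The sketch below shows that matching on a plain list of names; the helper names and the extra 'Alpha' column are hypothetical, and the library itself may implement the matching differently.

```python
def _normalise(text, ignore_case):
    """Lower-case the text when case should be ignored."""
    return text.lower() if ignore_case else text


def starts_with_sketch(names, match, ignore_case=True):
    """Return the names that begin with match."""
    m = _normalise(match, ignore_case)
    return [n for n in names if _normalise(n, ignore_case).startswith(m)]


def ends_with_sketch(names, match, ignore_case=True):
    """Return the names that end with match."""
    m = _normalise(match, ignore_case)
    return [n for n in names if _normalise(n, ignore_case).endswith(m)]


def contains_sketch(names, match, ignore_case=True):
    """Return the names that contain match as a literal substring."""
    m = _normalise(match, ignore_case)
    return [n for n in names if m in _normalise(n, ignore_case)]


names = ['a', 'add', 'sub', 'Alpha']
print(starts_with_sketch(names, 'a'))                     # ['a', 'add', 'Alpha']
print(starts_with_sketch(names, 'a', ignore_case=False))  # ['a', 'add']
```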
- __all__¶