tidypolars¶
Submodules¶
Attributes¶
Classes¶
A data frame object that provides methods familiar to R tidyverse users. |
Functions¶
|
Absolute value |
|
Apply a function across a selection of columns |
|
Case when |
|
Coalesce missing values |
|
Round numbers down to the lower integer |
|
If Else |
|
Get lagging values |
|
Get leading values |
|
Compute the natural logarithm of a column |
|
Compute the base 10 logarithm of a column |
|
Simple wrapper around polars.read_csv |
|
Simple wrapper around polars.read_parquet |
|
Replicate the values in x |
|
Replace null values |
|
Round values to a given number of decimals
Return row number |
|
|
Get column square root |
|
Find the correlation of two columns |
|
Find the covariance of two columns |
|
Number of observations in each group |
|
Get first value |
|
Get last value |
|
Number of observations in each group |
|
Get column max |
|
Get column mean |
|
Get column median |
|
Get column minimum |
|
Number of observations in each group |
|
Get number of distinct values in a column |
|
Get a quantile of a column
|
Get column standard deviation |
|
Get column sum |
|
Get column variance |
|
Test if values of a column are between two values |
|
Test if values of a column are finite |
|
Test if values of a column are in a list of values |
|
Test if values of a column are infinite |
|
Test if values of a column are nan |
|
Flip values of a boolean series |
|
Test if values of a column are not in a list of values |
|
Test if values of a column are not null |
|
Test if values of a column are null |
|
Convert to a boolean |
|
Convert to float. Defaults to Float64. |
|
Convert to integer. Defaults to Int64. |
|
Convert to string. Defaults to Utf8. |
|
General type conversion. |
|
Convert a string to a Date |
|
Convert a string to a Datetime |
|
Extract the hour from a datetime |
|
Create a date object |
|
Create a datetime object |
|
Extract the month day from a date from 1 to 31. |
|
Extract the minute from a datetime |
|
Extract the month from a date |
|
Extract the quarter from a date |
|
Round the datetime |
|
Extract the second from a datetime |
|
Extract the weekday from a date from sunday = 1 to saturday = 7. |
|
Extract the week from a date |
|
Extract the year day from a date from 1 to 366. |
|
Extract the year from a date |
|
Concatenate strings together |
|
Concatenate strings together with no separator |
|
Concatenate strings together |
|
Detect the presence or absence of a pattern in a string |
|
Extract the target capture group from provided patterns |
|
Length of a string |
|
Removes all matched patterns in a string |
|
Removes the first matched pattern in a string
|
Replaces all matched patterns in a string |
|
Replaces the first matched pattern in a string
|
Detect the presence or absence of a pattern at the end of a string. |
|
Detect the presence or absence of a pattern at the beginning of a string. |
|
Extract portion of string based on start and end inputs |
|
Convert a string to lowercase
|
Convert a string to uppercase
|
Trim whitespace |
|
Convert an object to a tibble |
|
Test whether an object is a tibble
|
Mark a column to order in descending |
|
Convert from pandas DataFrame to tibble |
|
Convert from polars DataFrame to tibble |
|
Contains a literal string |
|
Ends with a suffix |
Selects all columns |
|
|
Starts with a prefix |
|
Select columns by type using a string |
Package Contents¶
- __version__¶
- abs(x)[source]¶
Absolute value
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(abs_x = tp.abs('x'))
>>> df.mutate(abs_x = tp.abs(col('x')))
- across(cols, fn=lambda x: ..., names_prefix=None)[source]¶
Apply a function across a selection of columns
- Parameters:
cols (list) – Columns to operate on
fn (lambda) – A function or lambda to apply to each column
names_prefix (Optional - str) – Prefix to prepend to the names of changed columns
Examples
>>> df = tp.tibble(x = ['a', 'a', 'b'], y = range(3), z = range(3))
>>> df.mutate(across(['y', 'z'], lambda x: x * 2))
>>> df.mutate(across(tp.Int64, lambda x: x * 2, names_prefix = "double_"))
>>> df.summarize(across(['y', 'z'], tp.mean), _by = 'x')
- case_when(*args, _default=pl.Null)[source]¶
Case when
- Parameters:
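*args (Expr) – Alternating logical expressions and the values to return where each expression is true
_default – Value to return where no expression is true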
Examples
>>> df = tp.tibble(x = range(1, 4))
>>> df.mutate(
...     case_x = tp.case_when(col('x') < 2, 1,
...                           col('x') < 3, 2,
...                           _default = 0)
... )
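The way case_when pairs conditions with values can be illustrated with a plain-Python sketch. This is not tidypolars code: `case_when_scalar` is a hypothetical helper that works on one value at a time, and it assumes that conditions are checked in order and the first true one wins, as in dplyr's `case_when`.

```python
def case_when_scalar(*args, default=None):
    """Hypothetical scalar model: args alternate condition, value, condition, value, ..."""
    if len(args) % 2 != 0:
        raise ValueError("expected condition/value pairs")
    # Walk the pairs in order; the first true condition decides the result.
    for condition, value in zip(args[0::2], args[1::2]):
        if condition:
            return value
    # No condition matched, so fall back to the default.
    return default


xs = [1, 2, 3]
case_x = [case_when_scalar(x < 2, 1, x < 3, 2, default=0) for x in xs]
print(case_x)
```

Under that assumption the three rows of the example above would map to 1, 2 and the default 0.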
- coalesce(*args)[source]¶
Coalesce missing values
- Parameters:
args (Expr) – Columns to coalesce
Examples
>>> df.mutate(coalesce_xy = tp.coalesce(col('x'), col('y')))
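As a rough illustration of what coalescing means, the following plain-Python sketch takes, row by row, the first value that is not missing. `coalesce_rows` is a hypothetical helper operating on lists, with `None` standing in for a null; it is not how tidypolars implements the function.

```python
def coalesce_rows(*columns):
    """Hypothetical list model: per row, take the first value that is not None."""
    return [
        next((value for value in row if value is not None), None)
        for row in zip(*columns)
    ]


x = [1, None, None]
y = [10, 20, None]
coalesce_xy = coalesce_rows(x, y)
print(coalesce_xy)
```

A row stays missing only when every input column is missing in that row.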
- floor(x)[source]¶
Round numbers down to the lower integer
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(floor_x = tp.floor(col('x')))
- if_else(condition, true, false)[source]¶
If Else
- Parameters:
condition (Expr) – A logical expression
true – Value if the condition is true
false – Value if the condition is false
Examples
>>> df = tp.tibble(x = range(1, 4))
>>> df.mutate(if_x = tp.if_else(col('x') < 2, 1, 2))
- lag(x, n: int = 1, default=None)[source]¶
Get lagging values
- Parameters:
x (Expr, Series) – Column to operate on
n (int) – Number of positions to lag by
default (optional) – Value to fill in missing values
Examples
>>> df.mutate(lag_x = tp.lag(col('x')))
>>> df.mutate(lag_x = tp.lag('x'))
- lead(x, n: int = 1, default=None)[source]¶
Get leading values
- Parameters:
x (Expr, Series) – Column to operate on
n (int) – Number of positions to lead by
default (optional) – Value to fill in missing values
Examples
>>> df.mutate(lead_x = tp.lead(col('x')))
>>> df.mutate(lead_x = tp.lead('x'))
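The difference between lagging and leading can be sketched on plain lists. The helpers below are hypothetical stand-ins, not the tidypolars functions; they assume `n` is non-negative and that positions with no source value receive `default`.

```python
def lag(values, n=1, default=None):
    """Hypothetical list model: each position takes the value n positions earlier."""
    n = min(n, len(values))
    return [default] * n + list(values[: len(values) - n])


def lead(values, n=1, default=None):
    """Hypothetical list model: each position takes the value n positions later."""
    n = min(n, len(values))
    return list(values[n:]) + [default] * n


x = [1, 2, 3]
print(lag(x))
print(lead(x, n=2, default=0))
```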
- log(x)[source]¶
Compute the natural logarithm of a column
- Parameters:
x (Expr) – Column to operate on
Examples
>>> df.mutate(log = tp.log('x'))
- log10(x)[source]¶
Compute the base 10 logarithm of a column
- Parameters:
x (Expr) – Column to operate on
Examples
>>> df.mutate(log = tp.log10('x'))
- rep(x, times=1)[source]¶
Replicate the values in x
- Parameters:
x (const, Series) – Value or Series to repeat
times (int) – Number of times to repeat
Examples
>>> tp.rep(1, 3)
>>> tp.rep(pl.Series(range(3)), 3)
- replace_null(x, replace=None)[source]¶
Replace null values
- Parameters:
x (Expr, Series) – Column to operate on
replace – Value to replace null values with
Examples
>>> df = tp.tibble(x = [0, None], y = [None, None])
>>> df.mutate(x = tp.replace_null(col('x'), 1))
- round(x, decimals=0)[source]¶
Round values to a given number of decimals
- Parameters:
x (Expr, Series) – Column to operate on
decimals (int) – Decimals to round to
Examples
>>> df.mutate(x = tp.round(col('x')))
- sqrt(x)[source]¶
Get column square root
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(sqrt_x = tp.sqrt('x'))
- cor(x, y, method='pearson')[source]¶
Find the correlation of two columns
- Parameters:
x (Expr) – A column
y (Expr) – A column
method (str) – Type of correlation to find. Either ‘pearson’ or ‘spearman’.
Examples
>>> df.summarize(cor = tp.cor(col('x'), col('y')))
- cov(x, y)[source]¶
Find the covariance of two columns
- Parameters:
x (Expr) – A column
y (Expr) – A column
Examples
>>> df.summarize(cov = tp.cov(col('x'), col('y')))
- count(x)[source]¶
Number of observations in each group
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(count = tp.count(col('x')))
- first(x)[source]¶
Get first value
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(first_x = tp.first('x'))
>>> df.summarize(first_x = tp.first(col('x')))
- last(x)[source]¶
Get last value
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(last_x = tp.last('x'))
>>> df.summarize(last_x = tp.last(col('x')))
- length(x)[source]¶
Number of observations in each group
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(length = tp.length(col('x')))
- max(x)[source]¶
Get column max
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(max_x = tp.max('x'))
>>> df.summarize(max_x = tp.max(col('x')))
- mean(x)[source]¶
Get column mean
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(mean_x = tp.mean('x'))
>>> df.summarize(mean_x = tp.mean(col('x')))
- median(x)[source]¶
Get column median
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(median_x = tp.median('x'))
>>> df.summarize(median_x = tp.median(col('x')))
- min(x)[source]¶
Get column minimum
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(min_x = tp.min('x'))
>>> df.summarize(min_x = tp.min(col('x')))
- n_distinct(x)[source]¶
Get number of distinct values in a column
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(n_distinct_x = tp.n_distinct('x'))
>>> df.summarize(n_distinct_x = tp.n_distinct(col('x')))
- quantile(x, quantile=0.5)[source]¶
Get a quantile of a column
- Parameters:
x (Expr, Series) – Column to operate on
quantile (float) – Quantile to return
Examples
>>> df.summarize(quantile_x = tp.quantile('x', .25))
- sd(x)[source]¶
Get column standard deviation
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(sd_x = tp.sd('x'))
>>> df.summarize(sd_x = tp.sd(col('x')))
- sum(x)[source]¶
Get column sum
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(sum_x = tp.sum('x'))
>>> df.summarize(sum_x = tp.sum(col('x')))
- var(x)[source]¶
Get column variance
- Parameters:
x (Expr) – Column to operate on
Examples
>>> df.summarize(var_x = tp.var('x'))
>>> df.summarize(var_x = tp.var(col('x')))
- between(x, left, right)[source]¶
Test if values of a column are between two values
- Parameters:
x (Expr, Series) – Column to operate on
left (int) – Value to test if column is greater than or equal to
right (int) – Value to test if column is less than or equal to
Examples
>>> df = tp.tibble(x = range(4))
>>> df.filter(tp.between(col('x'), 1, 3))
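The parameter descriptions above say both bounds are inclusive. A plain-Python sketch of that test, using a hypothetical list-based `between` rather than the tidypolars function:

```python
def between(values, left, right):
    """Hypothetical list model: True where left <= value <= right (both ends inclusive)."""
    return [left <= value <= right for value in values]


xs = list(range(4))
mask = between(xs, 1, 3)
kept = [x for x, keep in zip(xs, mask) if keep]
print(kept)
```

With inclusive bounds, the filter in the example above would keep the rows where x is 1, 2 or 3.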
- is_finite(x)[source]¶
Test if values of a column are finite
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df = tp.tibble(x = [1.0, float('inf')])
>>> df.filter(tp.is_finite(col('x')))
- is_in(x, y)[source]¶
Test if values of a column are in a list of values
- Parameters:
x (Expr, Series) – Column to operate on
y (list) – List to test against
Examples
>>> df = tp.tibble(x = range(3))
>>> df.filter(tp.is_in(col('x'), [1, 2]))
- is_infinite(x)[source]¶
Test if values of a column are infinite
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df = tp.tibble(x = [1.0, float('inf')])
>>> df.filter(tp.is_infinite(col('x')))
- is_nan(x)[source]¶
Test if values of a column are nan
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df = tp.tibble(x = [1.0, float('nan')])
>>> df.filter(tp.is_nan(col('x')))
- is_not(x)[source]¶
Flip values of a boolean series
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df = tp.tibble(x = range(3))
>>> df.filter(tp.is_not(col('x') < 2))
- is_not_in(x, y)[source]¶
Test if values of a column are not in a list of values
- Parameters:
x (Expr, Series) – Column to operate on
y (list) – List to test against
Examples
>>> df = tp.tibble(x = range(3))
>>> df.filter(tp.is_not_in(col('x'), [1, 2]))
- is_not_null(x)[source]¶
Test if values of a column are not null
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df = tp.tibble(x = range(3))
>>> df.filter(tp.is_not_null(col('x')))
- is_null(x)[source]¶
Test if values of a column are null
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df = tp.tibble(x = range(3))
>>> df.filter(tp.is_null(col('x')))
- as_boolean(x)[source]¶
Convert to a boolean
- Parameters:
x (Expr) – Column to operate on
Examples
>>> df.mutate(bool_x = tp.as_boolean(col('x')))
- as_float(x)[source]¶
Convert to float. Defaults to Float64.
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(float_x = tp.as_float(col('x')))
- as_integer(x)[source]¶
Convert to integer. Defaults to Int64.
- Parameters:
x (Expr) – Column to operate on
Examples
>>> df.mutate(int_x = tp.as_integer(col('x')))
- as_string(x)[source]¶
Convert to string. Defaults to Utf8.
- Parameters:
x (Expr) – Column to operate on
Examples
>>> df.mutate(string_x = tp.as_string(col('x')))
- cast(x, dtype)[source]¶
General type conversion.
- Parameters:
x (Expr, Series) – Column to operate on
dtype (DataType) – Type to convert to
Examples
>>> df.mutate(float_x = tp.cast(col('x'), tp.Float64))
- as_date(x, format=None)[source]¶
Convert a string to a Date
- Parameters:
x (Expr, Series) – Column to operate on
format (str) – “yyyy-mm-dd”
Examples
>>> df = tp.tibble(x = ['2021-01-01', '2021-10-01'])
>>> df.mutate(date_x = tp.as_date(col('x')))
- as_datetime(x, format=None)[source]¶
Convert a string to a Datetime
- Parameters:
x (Expr, Series) – Column to operate on
format (str) – “yyyy-mm-dd”
Examples
>>> df = tp.tibble(x = ['2021-01-01', '2021-10-01'])
>>> df.mutate(date_x = tp.as_datetime(col('x')))
- hour(x)[source]¶
Extract the hour from a datetime
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(hour = tp.hour(col('x')))
- make_date(year=1970, month=1, day=1)[source]¶
Create a date object
- Parameters:
year (Expr, str, int) – Column or literal
month (Expr, str, int) – Column or literal
day (Expr, str, int) – Column or literal
Examples
>>> df.mutate(date = tp.make_date(2000, 1, 1))
- make_datetime(year=1970, month=1, day=1, hour=0, minute=0, second=0)[source]¶
Create a datetime object
- Parameters:
year (Expr, str, int) – Column or literal
month (Expr, str, int) – Column or literal
day (Expr, str, int) – Column or literal
hour (Expr, str, int) – Column or literal
minute (Expr, str, int) – Column or literal
second (Expr, str, int) – Column or literal
Examples
>>> df.mutate(date = tp.make_datetime(2000, 1, 1))
- mday(x)[source]¶
Extract the month day from a date from 1 to 31.
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(monthday = tp.mday(col('x')))
- minute(x)[source]¶
Extract the minute from a datetime
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(minute = tp.minute(col('x')))
- month(x)[source]¶
Extract the month from a date
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(month = tp.month(col('x')))
- quarter(x)[source]¶
Extract the quarter from a date
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(quarter = tp.quarter(col('x')))
- dt_round(x, rule, n)[source]¶
Round the datetime
- Parameters:
x (Expr, Series) – Column to operate on
rule (str) –
Units of the downscaling operation. Any of:
”month”
”week”
”day”
”hour”
”minute”
”second”
n (int) – Number of units (e.g. 5 “day”, 15 “minute”).
Examples
>>> df.mutate(rounded_x = tp.dt_round(col('x'), 'hour', 1))
- second(x)[source]¶
Extract the second from a datetime
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(second = tp.second(col('x')))
- wday(x)[source]¶
Extract the weekday from a date from sunday = 1 to saturday = 7.
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(weekday = tp.wday(col('x')))
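The Sunday = 1 to Saturday = 7 numbering differs from Python's own weekday conventions, so a small standard-library sketch may help when checking results. The `wday` helper below is a hypothetical scalar stand-in for illustration, not the tidypolars function.

```python
from datetime import date


def wday(d):
    """Hypothetical helper: weekday number with Sunday = 1 ... Saturday = 7."""
    # date.isoweekday() returns Monday = 1 ... Sunday = 7, so rotate by one day.
    return d.isoweekday() % 7 + 1


sunday = date(2021, 1, 3)
saturday = date(2021, 1, 2)
print(wday(sunday), wday(saturday))
```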
- week(x)[source]¶
Extract the week from a date
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(week = tp.week(col('x')))
- yday(x)[source]¶
Extract the year day from a date from 1 to 366.
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(yearday = tp.yday(col('x')))
- year(x)[source]¶
Extract the year from a date
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(year = tp.year(col('x')))
- col¶
- Utf8¶
- paste(*args, sep=' ')[source]¶
Concatenate strings together
- Parameters:
args (Expr, str) – Columns and or strings to concatenate
Examples
>>> df = tp.tibble(x = ['a', 'b', 'c'])
>>> df.mutate(x_end = tp.paste(col('x'), 'end', sep = '_'))
- paste0(*args)[source]¶
Concatenate strings together with no separator
- Parameters:
args (Expr, str) – Columns and or strings to concatenate
Examples
>>> df = tp.tibble(x = ['a', 'b', 'c'])
>>> df.mutate(xend = tp.paste0(col('x'), 'end'))
- str_c(*args, sep='')[source]¶
Concatenate strings together
- Parameters:
args (Expr, str) – Columns and/or strings to concatenate
Examples
>>> df = tp.tibble(x = ['a', 'b', 'c'])
>>> df.mutate(x_end = str_c(col('x'), 'end', sep = '_'))
- str_detect(string, pattern, negate=False)[source]¶
Detect the presence or absence of a pattern in a string
- Parameters:
string (str) – Input series to operate on
pattern (str) – Pattern to look for
negate (bool) – If True, return non-matching elements
Examples
>>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_detect('name', 'a'))
>>> df.mutate(x = str_detect('name', ['a', 'e']))
- str_extract(string, pattern)[source]¶
Extract the target capture group from provided patterns
- Parameters:
string (str) – Input series to operate on
pattern (str) – Pattern to look for
Examples
>>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_extract(col('name'), 'e'))
- str_length(string)[source]¶
Length of a string
- Parameters:
string (str) – Input series to operate on
Examples
>>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_length(col('name')))
- str_remove_all(string, pattern)[source]¶
Removes all matched patterns in a string
- Parameters:
string (str) – Input series to operate on
pattern (str) – Pattern to look for
Examples
>>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_remove_all(col('name'), 'a'))
- str_remove(string, pattern)[source]¶
Removes the first matched pattern in a string
- Parameters:
string (str) – Input series to operate on
pattern (str) – Pattern to look for
Examples
>>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_remove(col('name'), 'a'))
- str_replace_all(string, pattern, replacement)[source]¶
Replaces all matched patterns in a string
- Parameters:
string (str) – Input series to operate on
pattern (str) – Pattern to look for
replacement (str) – String that replaces anything that matches the pattern
Examples
>>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_replace_all(col('name'), 'a', 'A'))
- str_replace(string, pattern, replacement)[source]¶
Replaces the first matched pattern in a string
- Parameters:
string (str) – Input series to operate on
pattern (str) – Pattern to look for
replacement (str) – String that replaces anything that matches the pattern
Examples
>>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_replace(col('name'), 'a', 'A'))
- str_ends(string, pattern, negate=False)[source]¶
Detect the presence or absence of a pattern at the end of a string.
- Parameters:
string (Expr) – Column to operate on
pattern (str) – Pattern to look for
negate (bool) – If True, return non-matching elements
Examples
>>> df = tp.tibble(words = ['apple', 'bear', 'amazing'])
>>> df.filter(tp.str_ends(col('words'), 'ing'))
- str_starts(string, pattern, negate=False)[source]¶
Detect the presence or absence of a pattern at the beginning of a string.
- Parameters:
string (Expr) – Column to operate on
pattern (str) – Pattern to look for
negate (bool) – If True, return non-matching elements
Examples
>>> df = tp.tibble(words = ['apple', 'bear', 'amazing'])
>>> df.filter(tp.str_starts(col('words'), 'a'))
- str_sub(string, start=0, end=None)[source]¶
Extract portion of string based on start and end inputs
- Parameters:
string (str) – Input series to operate on
start (int) – First position of the character to return
end (int) – Last position of the character to return
Examples
>>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_sub(col('name'), 0, 3))
- str_to_lower(string)[source]¶
Convert a string to lowercase
- Parameters:
string (str) – Convert case of this string
Examples
>>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_to_lower(col('name')))
- str_to_upper(string)[source]¶
Convert a string to uppercase
- Parameters:
string (str) – Convert case of this string
Examples
>>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape'])
>>> df.mutate(x = str_to_upper(col('name')))
- str_trim(string, side='both')[source]¶
Trim whitespace
- Parameters:
string (Expr, Series) – Column or series to operate on
side (str) –
- One of:
”both”
”left”
”right”
Examples
>>> df = tp.tibble(x = [' a ', ' b ', ' c '])
>>> df.mutate(x = tp.str_trim(col('x')))
- as_tibble(x)[source]¶
Convert an object to a tibble
- Parameters:
x ([pl.DataFrame, pd.DataFrame, dict]) – Object to convert to a tibble
Examples
>>> tp.as_tibble(polars_df)
- class tibble(_data=None, **kwargs)[source]¶
Bases:
tidypolars.reexports.pl.DataFrame
A data frame object that provides methods familiar to R tidyverse users.
- _repr_html_()[source]¶
Printing method for jupyter
Output rows and columns can be modified by setting the following ENVIRONMENT variables:
POLARS_FMT_MAX_COLS: set the number of columns
POLARS_FMT_MAX_ROWS: set the number of rows
- __getitem__(col)[source]¶
Get part of the DataFrame as a new DataFrame, Series, or scalar.
- Parameters:
key –
Rows / columns to select. This is easiest to explain via example. Suppose we have a DataFrame with columns ‘a’, ‘d’, ‘c’, ‘b’. Here is what various types of key would do:
df[0, ‘a’] extracts the first element of column ‘a’ and returns a scalar.
df[0] extracts the first row and returns a Dataframe.
df[‘a’] extracts column ‘a’ and returns a Series.
df[0:2] extracts the first two rows and returns a Dataframe.
df[0:2, ‘a’] extracts the first two rows from column ‘a’ and returns a Series.
df[0:2, 0] extracts the first two rows from the first column and returns a Series.
df[[0, 1], [0, 1, 2]] extracts the first two rows and the first three columns and returns a Dataframe.
df[0: 2, [‘a’, ‘c’]] extracts the first two rows from columns ‘a’ and ‘c’ and returns a Dataframe.
df[:, 0: 2] extracts all rows from the first two columns and returns a Dataframe.
df[:, ‘a’: ‘c’] extracts all rows and all columns positioned between ‘a’ and ‘c’ inclusive and returns a Dataframe. In our example, that would extract columns ‘a’, ‘d’, and ‘c’.
- Return type:
DataFrame, Series, or scalar, depending on key.
Examples
>>> df = pl.DataFrame(
...     {"a": [1, 2, 3], "d": [4, 5, 6], "c": [1, 3, 2], "b": [7, 8, 9]}
... )
>>> df[0]
shape: (1, 4)
┌─────┬─────┬─────┬─────┐
│ a   ┆ d   ┆ c   ┆ b   │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╡
│ 1   ┆ 4   ┆ 1   ┆ 7   │
└─────┴─────┴─────┴─────┘
>>> df[0, "a"]
1
>>> df["a"]
shape: (3,)
Series: 'a' [i64]
[
    1
    2
    3
]
>>> df[0:2]
shape: (2, 4)
┌─────┬─────┬─────┬─────┐
│ a   ┆ d   ┆ c   ┆ b   │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╡
│ 1   ┆ 4   ┆ 1   ┆ 7   │
│ 2   ┆ 5   ┆ 3   ┆ 8   │
└─────┴─────┴─────┴─────┘
>>> df[0:2, "a"]
shape: (2,)
Series: 'a' [i64]
[
    1
    2
]
>>> df[0:2, 0]
shape: (2,)
Series: 'a' [i64]
[
    1
    2
]
>>> df[[0, 1], [0, 1, 2]]
shape: (2, 3)
┌─────┬─────┬─────┐
│ a   ┆ d   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 4   ┆ 1   │
│ 2   ┆ 5   ┆ 3   │
└─────┴─────┴─────┘
>>> df[0:2, ["a", "c"]]
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ c   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 1   │
│ 2   ┆ 3   │
└─────┴─────┘
>>> df[:, 0:2]
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ d   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 4   │
│ 2   ┆ 5   │
│ 3   ┆ 6   │
└─────┴─────┘
>>> df[:, "a":"c"]
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ d   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 4   ┆ 1   │
│ 2   ┆ 5   ┆ 3   │
│ 3   ┆ 6   ┆ 2   │
└─────┴─────┴─────┘
- arrange(*args)[source]¶
Arrange/sort rows
- Parameters:
*args (str) – Columns to sort by
Examples
>>> df = tp.tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
>>> # Arrange in ascending order
>>> df.arrange('x', 'y')
...
>>> # Arrange some columns descending
>>> df.arrange(tp.desc('x'), 'y')
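To make the mixed-direction ordering concrete, here is a plain-Python sketch of sorting by x descending and then y ascending. It models the rows as a list of dicts and relies on Python's stable sort; it only illustrates the expected ordering and says nothing about how tidypolars sorts internally.

```python
rows = [
    {"x": "a", "y": 0},
    {"x": "a", "y": 1},
    {"x": "b", "y": 2},
]

# Stable sorts are applied from the least significant key to the most significant one.
ordered = sorted(rows, key=lambda row: row["y"])                 # y ascending
ordered = sorted(ordered, key=lambda row: row["x"], reverse=True)  # x descending

print(ordered)
```

Because the sort is stable, rows that tie on x keep their ascending y order.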
- as_dict(*, as_series=True)[source]¶
Convert the data frame to a dictionary
- Parameters:
as_series (bool) – If True, returns the dict values as Series. If False, returns the dict values as lists.
Examples
>>> df.as_dict()
>>> df.as_dict(as_series = False)
- bind_cols(*args)[source]¶
Bind data frames by columns
- Parameters:
*args (tibble) – Data frames to bind by column
Examples
>>> df1 = tp.tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
>>> df2 = tp.tibble({'a': ['c', 'c', 'c'], 'b': range(4, 7)})
>>> df1.bind_cols(df2)
- bind_rows(*args)[source]¶
Bind data frames by row
- Parameters:
*args (tibble, list) – Data frames to bind by row
Examples
>>> df1 = tp.tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
>>> df2 = tp.tibble({'x': ['c', 'c', 'c'], 'y': range(4, 7)})
>>> df1.bind_rows(df2)
- count(*args, sort=False, name='n')[source]¶
Returns row counts of the dataset. If bare column names are provided, count() returns counts by group.
- Parameters:
*args (str, Expr) – Columns to group by
sort (bool) – Should columns be ordered in descending order by count
name (str) – The name of the new column in the output. If omitted, it will default to “n”.
Examples
>>> df = tp.tibble({'a': range(3), 'b': ['a', 'a', 'b']})
>>> df.count()
>>> df.count('b')
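The two modes described above (a total row count, or counts per group) can be sketched with the standard library. The `count` function here is a hypothetical model over a list of dicts, not the tibble method itself, and the order of groups in the real output is not something this sketch tries to reproduce.

```python
from collections import Counter


def count(rows, *cols, sort=False, name="n"):
    """Hypothetical model: total row count, or counts per group when cols are given."""
    if not cols:
        return [{name: len(rows)}]
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    # most_common() orders groups by descending count, mirroring sort=True.
    items = counts.most_common() if sort else list(counts.items())
    return [{**dict(zip(cols, key)), name: n} for key, n in items]


rows = [{"a": 0, "b": "a"}, {"a": 1, "b": "a"}, {"a": 2, "b": "b"}]
print(count(rows))
print(count(rows, "b"))
```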
- distinct(*args)[source]¶
Select distinct/unique rows
- Parameters:
*args (str, Expr) – Columns to find distinct/unique rows
Examples
>>> df = tp.tibble({'a': range(3), 'b': ['a', 'a', 'b']})
>>> df.distinct()
>>> df.distinct('b')
- drop(*args)[source]¶
Drop unwanted columns
- Parameters:
*args (str) – Columns to drop
Examples
>>> df.drop('x', 'y')
- drop_null(*args)[source]¶
Drop rows containing missing values
- Parameters:
*args (str) – Columns to drop nulls from (defaults to all)
Examples
>>> df = tp.tibble(x = [1, None, 3], y = [None, 'b', 'c'], z = range(3))
>>> df.drop_null()
>>> df.drop_null('x', 'y')
- glimpse()[source]¶
Return a dense preview of the DataFrame.
The formatting shows one line per column so that wide dataframes display cleanly. Each line shows the column name, the data type, and the first few values.
- fill(*args, direction='down', _by=None)[source]¶
Fill in missing values with previous or next value
- Parameters:
*args (str) – Columns to fill
direction (str) – Direction to fill. One of [‘down’, ‘up’, ‘downup’, ‘updown’]
_by (str, list) – Columns to group by
Examples
>>> df = tp.tibble({'a': [1, None, 3, 4, 5],
...                 'b': [None, 2, None, None, 5],
...                 'groups': ['a', 'a', 'a', 'b', 'b']})
>>> df.fill('a', 'b')
>>> df.fill('a', 'b', _by = 'groups')
>>> df.fill('a', 'b', direction = 'downup')
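The four directions can be compared on a single ungrouped column with a plain-Python sketch. The helpers are hypothetical and work on lists with `None` as the missing value; they assume that 'downup' means fill down first and then up, and 'updown' the reverse, as in tidyr's `fill`.

```python
def fill_down(values):
    """Carry the last seen non-missing value forward."""
    out, last = [], None
    for value in values:
        if value is not None:
            last = value
        out.append(last)
    return out


def fill_up(values):
    """Carry the next non-missing value backward."""
    return fill_down(values[::-1])[::-1]


def fill(values, direction="down"):
    """Hypothetical list model of the four fill directions."""
    if direction == "down":
        return fill_down(values)
    if direction == "up":
        return fill_up(values)
    if direction == "downup":
        return fill_up(fill_down(values))
    if direction == "updown":
        return fill_down(fill_up(values))
    raise ValueError(f"unknown direction: {direction!r}")


b = [None, 2, None, None, 5]
for direction in ("down", "up", "downup", "updown"):
    print(direction, fill(b, direction))
```

Note that 'down' alone leaves a leading missing value in place, which is what the combined directions are for.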
- filter(*args, _by=None)[source]¶
Filter rows on one or more conditions
- Parameters:
*args (Expr) – Conditions to filter by
_by (str, list) – Columns to group by
Examples
>>> df = tp.tibble({'a': range(3), 'b': ['a', 'a', 'b']})
>>> df.filter(col('a') < 2, col('b') == 'a')
>>> df.filter((col('a') < 2) & (col('b') == 'a'))
>>> df.filter(col('a') <= tp.mean(col('a')), _by = 'b')
- full_join(df, left_on=None, right_on=None, on=None, suffix: str = '_right')[source]¶
Perform a full join
- Parameters:
df (tibble) – Data frame to join with.
left_on (str, list) – Join column(s) of the left DataFrame.
right_on (str, list) – Join column(s) of the right DataFrame.
on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.
suffix (str) – Suffix to append to columns with a duplicate name.
Examples
>>> df1.full_join(df2)
>>> df1.full_join(df2, on = 'x')
>>> df1.full_join(df2, left_on = 'left_x', right_on = 'x')
- inner_join(df, left_on=None, right_on=None, on=None, suffix='_right')[source]¶
Perform an inner join
- Parameters:
df (tibble) – Data frame to join with.
left_on (str, list) – Join column(s) of the left DataFrame.
right_on (str, list) – Join column(s) of the right DataFrame.
on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.
suffix (str) – Suffix to append to columns with a duplicate name.
Examples
>>> df1.inner_join(df2)
>>> df1.inner_join(df2, on = 'x')
>>> df1.inner_join(df2, left_on = 'left_x', right_on = 'x')
- left_join(df, left_on=None, right_on=None, on=None, suffix='_right')[source]¶
Perform a left join
- Parameters:
df (tibble) – Data frame to join with.
left_on (str, list) – Join column(s) of the left DataFrame.
right_on (str, list) – Join column(s) of the right DataFrame.
on (str, list) – Join column(s) of both DataFrames. If set, left_on and right_on should be None.
suffix (str) – Suffix to append to columns with a duplicate name.
Examples
>>> df1.left_join(df2)
>>> df1.left_join(df2, on = 'x')
>>> df1.left_join(df2, left_on = 'left_x', right_on = 'x')
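The three join methods above differ only in which unmatched rows they keep. The sketch below models that on lists of dicts with a single shared key column; `join` is a hypothetical helper, it assumes the two inputs share no column names other than the key (so no suffix is needed), and it uses `None` where a real result would hold nulls.

```python
def join(left, right, on, how="inner"):
    """Hypothetical model of inner/left/full joins on one key column."""
    left_only = [c for c in left[0] if c != on]
    right_only = [c for c in right[0] if c != on]
    out, matched = [], set()
    for lrow in left:
        hits = [i for i, rrow in enumerate(right) if rrow[on] == lrow[on]]
        matched.update(hits)
        for i in hits:
            out.append({**lrow, **{c: right[i][c] for c in right_only}})
        if not hits and how in ("left", "full"):
            # Unmatched left rows are kept, with nulls for the right-hand columns.
            out.append({**lrow, **{c: None for c in right_only}})
    if how == "full":
        for i, rrow in enumerate(right):
            if i not in matched:
                # Unmatched right rows are kept, with nulls for the left-hand columns.
                out.append({on: rrow[on],
                            **{c: None for c in left_only},
                            **{c: rrow[c] for c in right_only}})
    return out


df1 = [{"x": "a", "y": 1}, {"x": "b", "y": 2}]
df2 = [{"x": "b", "z": 10}, {"x": "c", "z": 20}]
for how in ("inner", "left", "full"):
    print(how, join(df1, df2, on="x", how=how))
```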
- mutate(*args, _by=None, **kwargs)[source]¶
Add or modify columns
- Parameters:
*args (Expr) – Column expressions to add or modify
_by (str, list) – Columns to group by
**kwargs (Expr) – Column expressions to add or modify
Examples
>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.mutate(double_a = col('a') * 2,
...           a_plus_b = col('a') + col('b'))
>>> df.mutate(row_num = row_number(), _by = 'c')
- pivot_longer(cols=everything(), names_to='name', values_to='value')[source]¶
Pivot data from wide to long
- Parameters:
cols (Expr) – List of the columns to pivot. Defaults to all columns.
names_to (str) – Name of the new “names” column.
values_to (str) – Name of the new “values” column
Examples
>>> df = tp.tibble({'id': ['id1', 'id2'], 'a': [1, 2], 'b': [1, 2]})
>>> df.pivot_longer(cols = ['a', 'b'])
>>> df.pivot_longer(cols = ['a', 'b'], names_to = 'stuff', values_to = 'things')
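What "wide to long" produces can be shown with a plain-Python sketch over a list of dicts. `pivot_longer` below is a hypothetical model rather than the tibble method, and the row order it produces (row by row) may differ from the order of the real output.

```python
def pivot_longer(rows, cols, names_to="name", values_to="value"):
    """Hypothetical model: one output row per (input row, pivoted column) pair."""
    long_rows = []
    for row in rows:
        # Columns that are not pivoted are repeated on every output row.
        kept = {k: v for k, v in row.items() if k not in cols}
        for col in cols:
            long_rows.append({**kept, names_to: col, values_to: row[col]})
    return long_rows


rows = [{"id": "id1", "a": 1, "b": 1}, {"id": "id2", "a": 2, "b": 2}]
long_rows = pivot_longer(rows, cols=["a", "b"])
for row in long_rows:
    print(row)
```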
- pivot_wider(names_from='name', values_from='value', id_cols=None, values_fn='first', values_fill=None)[source]¶
Pivot data from long to wide
- Parameters:
names_from (str) – Column to get the new column names from.
values_from (str) – Column to get the new column values from
id_cols (str, list) – A set of columns that uniquely identifies each observation. Defaults to all columns in the data table except for the columns specified in names_from and values_from.
values_fn (str) – Function for how multiple entries per group should be dealt with. Any of ‘first’, ‘count’, ‘sum’, ‘max’, ‘min’, ‘mean’, ‘median’, ‘last’
values_fill (str) – If values are missing/null, what value should be filled in. Can use: “backward”, “forward”, “mean”, “min”, “max”, “zero”, “one”
Examples
>>> df = tp.tibble({'id': [1, 1], 'variable': ['a', 'b'], 'value': [1, 2]})
>>> df.pivot_wider(names_from = 'variable', values_from = 'value')
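The reverse reshaping can be sketched the same way. This hypothetical `pivot_wider` works on a list of dicts, treats every column other than `names_from` and `values_from` as an id column (the documented default), and keeps the first value when a group has several entries, mirroring `values_fn='first'`. It does not model `values_fill`.

```python
def pivot_wider(rows, names_from="name", values_from="value"):
    """Hypothetical model: one output row per distinct combination of id columns."""
    wide = {}
    for row in rows:
        key = tuple((k, v) for k, v in row.items() if k not in (names_from, values_from))
        out_row = wide.setdefault(key, dict(key))
        # Keep the first value seen for each new column name.
        out_row.setdefault(row[names_from], row[values_from])
    return list(wide.values())


rows = [
    {"id": 1, "variable": "a", "value": 1},
    {"id": 1, "variable": "b", "value": 2},
]
wide_rows = pivot_wider(rows, names_from="variable", values_from="value")
print(wide_rows)
```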
- pull(var=None)[source]¶
Extract a column as a series
- Parameters:
var (str) – Name of the column to extract. Defaults to the last column.
Examples
>>> df = tp.tibble({'a': range(3), 'b': range(3)})
>>> df.pull('a')
- relocate(*args, _before=None, _after=None)[source]¶
Move a column or columns to a new position
- Parameters:
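*args (str, Expr) – Columns to move
_before (str) – Column to move the selected columns before
_after (str) – Column to move the selected columns after
Examples
>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.relocate('a', _before = 'c')
>>> df.relocate('b', _after = 'c')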
- rename(_mapping=None, **kwargs)[source]¶
Rename columns
- Parameters:
_mapping (dict) – Dictionary mapping of new names
**kwargs (str) – key-value pair of new name from old name
Examples
>>> df = tp.tibble({'x': range(3), 't': range(3), 'z': ['a', 'a', 'b']})
>>> df.rename(new_x = 'x') # dplyr interface
>>> df.rename({'x': 'new_x'}) # pandas interface
- replace_null(replace=None)[source]¶
Replace null values
- Parameters:
replace (dict) – Dictionary of column/replacement pairs
Examples
>>> df = tp.tibble(x = [0, None], y = [None, None])
>>> df.replace_null(dict(x = 1, y = 2))
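The dictionary of column/replacement pairs can be read as "for each named column, swap its nulls for this value". A plain-Python sketch of that reading, with a hypothetical `replace_null` over a dict of lists and `None` as the null; columns not named in the dictionary are assumed to be left unchanged.

```python
def replace_null(columns, replace):
    """Hypothetical model: per-column replacement of None values."""
    return {
        name: [replace[name] if value is None and name in replace else value
               for value in values]
        for name, values in columns.items()
    }


df = {"x": [0, None], "y": [None, None]}
result = replace_null(df, dict(x=1, y=2))
print(result)
```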
- separate(sep_col, into, sep='_', remove=True)[source]¶
Separate a character column into multiple columns
- Parameters:
sep_col (str) – Column to split into multiple columns
into (list) – List of new column names
sep (str) – Separator to split on. Default to ‘_’
remove (bool) – If True removes the input column from the output data frame
Examples
>>> df = tp.tibble(x = ['a_a', 'b_b', 'c_c'])
>>> df.separate('x', into = ['left', 'right'])
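A plain-Python sketch of the splitting step, using a hypothetical `separate` over a list of dicts. It assumes every value splits into exactly as many pieces as there are names in `into`; how the real method handles other cases is not covered here.

```python
def separate(rows, sep_col, into, sep="_", remove=True):
    """Hypothetical model: split one string column into several named columns."""
    out = []
    for row in rows:
        parts = row[sep_col].split(sep)
        # Drop the input column only when remove is True.
        new_row = {k: v for k, v in row.items() if not (remove and k == sep_col)}
        new_row.update(zip(into, parts))
        out.append(new_row)
    return out


rows = [{"x": "a_a"}, {"x": "b_b"}, {"x": "c_c"}]
separated = separate(rows, "x", into=["left", "right"])
print(separated)
```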
- set_names(nm=None)[source]¶
Change the column names of the data frame
- Parameters:
nm (list) – A list of new names for the data frame
Examples
>>> df = tp.tibble(x = range(3), y = range(3))
>>> df.set_names(['a', 'b'])
- select(*args)[source]¶
Select or drop columns
- Parameters:
*args (str, Expr) – Columns to select
Examples
>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.select('a', 'b')
>>> df.select(col('a'), col('b'))
- slice(*args, _by=None)[source]¶
Grab rows from a data frame
- Parameters:
*args (int, list) – Rows to grab
_by (str, list) – Columns to group by
Examples
>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.slice(0, 1)
>>> df.slice(0, _by = 'c')
- slice_head(n=5, *, _by=None)[source]¶
Grab top rows from a data frame
- Parameters:
n (int) – Number of rows to grab
_by (str, list) – Columns to group by
Examples
>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.slice_head(2)
>>> df.slice_head(1, _by = 'c')
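Grouped slicing takes the first n rows within each group rather than the first n rows overall. The sketch below shows that idea on a list of dicts. `slice_head_sketch` is a hypothetical helper, and the row order it returns is an assumption that may not match the library's output order.

```python
def slice_head_sketch(rows, n=5, _by=None):
    # Hypothetical helper: keep the first n rows, optionally per group.
    if _by is None:
        return rows[:n]
    by = [_by] if isinstance(_by, str) else list(_by)
    seen = {}  # group key -> number of rows kept so far
    kept = []
    for row in rows:
        key = tuple(row[c] for c in by)
        seen[key] = seen.get(key, 0) + 1
        if seen[key] <= n:
            kept.append(row)
    return kept


rows = [
    {"a": 0, "b": 0, "c": "a"},
    {"a": 1, "b": 1, "c": "a"},
    {"a": 2, "b": 2, "c": "b"},
]
print(slice_head_sketch(rows, 2))           # first two rows overall
print(slice_head_sketch(rows, 1, _by="c"))  # first row of each 'c' group
```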
- slice_tail(n=5, *, _by=None)[source]¶
Grab bottom rows from a data frame
- Parameters:
n (int) – Number of rows to grab
_by (str, list) – Columns to group by
Examples
>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.slice_tail(2)
>>> df.slice_tail(1, _by = 'c')
- summarize(*args, _by=None, **kwargs)[source]¶
Aggregate data with summary statistics
- Parameters:
*args (Expr) – Column expressions to add or modify
_by (str, list) – Columns to group by
**kwargs (Expr) – Column expressions to add or modify
Examples
>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.summarize(avg_a = tp.mean(col('a')))
>>> df.summarize(avg_a = tp.mean(col('a')),
...              _by = 'c')
>>> df.summarize(avg_a = tp.mean(col('a')),
...              max_b = tp.max(col('b')))
- unite(col='_united', unite_cols=[], sep='_', remove=True)[source]¶
Unite multiple columns by pasting strings together
- Parameters:
col (str) – Name of the new column
unite_cols (list) – List of columns to unite
sep (str) – Separator to use between values
remove (bool) – If True, removes the input columns from the data frame
Examples
>>> df = tp.tibble(a = ["a", "a", "a"], b = ["b", "b", "b"], c = range(3))
>>> df.unite("united_col", unite_cols = ["a", "b"])
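The pasting step can be sketched row by row on a dict of lists. `unite_sketch` is a hypothetical helper that requires `unite_cols` explicitly and appends the new column at the end; the position of the united column in the library's output may differ.

```python
def unite_sketch(columns, col, unite_cols, sep="_", remove=True):
    # Hypothetical helper: paste several columns together row by row.
    united = [
        sep.join(str(value) for value in row)
        for row in zip(*(columns[name] for name in unite_cols))
    ]
    out = {
        name: list(values)
        for name, values in columns.items()
        if not (remove and name in unite_cols)
    }
    out[col] = united
    return out


columns = {"a": ["a", "a", "a"], "b": ["b", "b", "b"], "c": [0, 1, 2]}
print(unite_sketch(columns, "united_col", unite_cols=["a", "b"]))
# {'c': [0, 1, 2], 'united_col': ['a_b', 'a_b', 'a_b']}
```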
- write_parquet(file=str, compression='snappy', use_pyarrow=False, **kwargs)[source]¶
Write a data frame to a parquet file
- property names¶
Get column names
Examples
>>> df.names
- property ncol¶
Get number of columns
Examples
>>> df.ncol
- property nrow¶
Get number of rows
Examples
>>> df.nrow
- property plot¶
Access to polars plotting
Examples
>>> df.plot
- from_pandas(df)[source]¶
Convert from pandas DataFrame to tibble
- Parameters:
df (DataFrame) – pd.DataFrame to convert to a tibble
Examples
>>> tp.from_pandas(df)
- from_polars(df)[source]¶
Convert from polars DataFrame to tibble
- Parameters:
df (DataFrame) – pl.DataFrame to convert to a tibble
Examples
>>> tp.from_polars(df)
- contains(match, ignore_case=True)[source]¶
Contains a literal string
- Parameters:
match (str) – String to match columns
ignore_case (bool) – If True (the default), ignores case when matching names.
Examples
>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.select(contains('c'))
- ends_with(match, ignore_case=True)[source]¶
Ends with a suffix
- Parameters:
match (str) – String to match columns
ignore_case (bool) – If True (the default), ignores case when matching names.
Examples
>>> df = tp.tibble({'a': range(3), 'b_code': range(3), 'c_code': ['a', 'a', 'b']})
>>> df.select(ends_with('code'))
- everything()[source]¶
Selects all columns
Examples
>>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
>>> df.select(everything())
- starts_with(match, ignore_case=True)[source]¶
Starts with a prefix
- Parameters:
match (str) – String to match columns
ignore_case (bool) – If True (the default), ignores case when matching names.
Examples
>>> df = tp.tibble({'a': range(3), 'add': range(3), 'sub': ['a', 'a', 'b']})
>>> df.select(starts_with('a'))
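The name-matching rules behind contains, ends_with and starts_with can be sketched on a plain list of column names. `match_columns` and its `how` argument are hypothetical and only illustrate literal matching with the `ignore_case` switch; they are not how the library resolves selections internally.

```python
def match_columns(names, match, how, ignore_case=True):
    # Hypothetical helper; `how` is "contains", "starts_with" or "ends_with".
    tests = {
        "contains": lambda name, m: m in name,
        "starts_with": lambda name, m: name.startswith(m),
        "ends_with": lambda name, m: name.endswith(m),
    }
    if ignore_case:
        match = match.lower()
    return [
        name
        for name in names
        if tests[how](name.lower() if ignore_case else name, match)
    ]


print(match_columns(["a", "add", "sub"], "a", "starts_with"))
# ['a', 'add']
print(match_columns(["a", "b_code", "c_code"], "code", "ends_with"))
# ['b_code', 'c_code']
print(match_columns(["a", "b", "c"], "C", "contains", ignore_case=False))
# []
```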
- where(col_type)[source]¶
Select columns by type using a string
- Options:
date, datetime, float, integer, numeric, string
Examples
>>> df.select(tp.where("integer"))
- __all__¶