tidypolars ========== .. py:module:: tidypolars Submodules ---------- .. toctree:: :maxdepth: 1 /autoapi/tidypolars/funs/index /autoapi/tidypolars/lubridate/index /autoapi/tidypolars/reexports/index /autoapi/tidypolars/stringr/index /autoapi/tidypolars/tibble_df/index /autoapi/tidypolars/tidyselect/index Attributes ---------- .. autoapisummary:: tidypolars.__version__ tidypolars.col tidypolars.exclude tidypolars.lit tidypolars.Expr tidypolars.Series tidypolars.Int8 tidypolars.Int16 tidypolars.Int32 tidypolars.Int64 tidypolars.UInt8 tidypolars.UInt16 tidypolars.UInt32 tidypolars.UInt64 tidypolars.Float32 tidypolars.Float64 tidypolars.Boolean tidypolars.Utf8 tidypolars.List tidypolars.Date tidypolars.Datetime tidypolars.Object tidypolars.__all__ Classes ------- .. autoapisummary:: tidypolars.tibble Functions --------- .. autoapisummary:: tidypolars.abs tidypolars.across tidypolars.case_when tidypolars.coalesce tidypolars.floor tidypolars.if_else tidypolars.lag tidypolars.lead tidypolars.log tidypolars.log10 tidypolars.read_csv tidypolars.read_parquet tidypolars.rep tidypolars.replace_null tidypolars.round tidypolars.row_number tidypolars.sqrt tidypolars.cor tidypolars.cov tidypolars.count tidypolars.first tidypolars.last tidypolars.length tidypolars.max tidypolars.mean tidypolars.median tidypolars.min tidypolars.n tidypolars.n_distinct tidypolars.quantile tidypolars.sd tidypolars.sum tidypolars.var tidypolars.between tidypolars.is_finite tidypolars.is_in tidypolars.is_infinite tidypolars.is_nan tidypolars.is_not tidypolars.is_not_in tidypolars.is_not_null tidypolars.is_null tidypolars.as_boolean tidypolars.as_float tidypolars.as_integer tidypolars.as_string tidypolars.cast tidypolars.as_date tidypolars.as_datetime tidypolars.hour tidypolars.make_date tidypolars.make_datetime tidypolars.mday tidypolars.minute tidypolars.month tidypolars.quarter tidypolars.dt_round tidypolars.second tidypolars.wday tidypolars.week tidypolars.yday tidypolars.year tidypolars.paste 
tidypolars.paste0 tidypolars.str_c tidypolars.str_detect tidypolars.str_extract tidypolars.str_length tidypolars.str_remove_all tidypolars.str_remove tidypolars.str_replace_all tidypolars.str_replace tidypolars.str_ends tidypolars.str_starts tidypolars.str_sub tidypolars.str_to_lower tidypolars.str_to_upper tidypolars.str_trim tidypolars.as_tibble tidypolars.is_tibble tidypolars.desc tidypolars.from_pandas tidypolars.from_polars tidypolars.contains tidypolars.ends_with tidypolars.everything tidypolars.starts_with tidypolars.where Package Contents ---------------- .. py:data:: __version__ .. py:function:: abs(x) Absolute value :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df.mutate(abs_x = tp.abs('x')) >>> df.mutate(abs_x = tp.abs(col('x'))) .. py:function:: across(cols, fn=lambda x: x, names_prefix=None) Apply a function across a selection of columns :param cols: Columns to operate on :type cols: list :param fn: A function or lambda to apply to each column :type fn: lambda :param names_prefix: Prefix to append to changed columns :type names_prefix: Optional - str .. rubric:: Examples >>> df = tp.tibble(x = ['a', 'a', 'b'], y = range(3), z = range(3)) >>> df.mutate(across(['y', 'z'], lambda x: x * 2)) >>> df.mutate(across(tp.Int64, lambda x: x * 2, names_prefix = "double_")) >>> df.summarize(across(['y', 'z'], tp.mean), by = 'x') .. py:function:: case_when(*args, _default=pl.Null) Case when :param expr: A logical expression :type expr: Expr .. rubric:: Examples >>> df = tp.tibble(x = range(1, 4)) >>> df.mutate( >>> case_x = tp.case_when(col('x') < 2, 1, >>> col('x') < 3, 2, >>> _default = 0) >>> ) .. py:function:: coalesce(*args) Coalesce missing values :param args: Columns to coalesce :type args: Expr .. rubric:: Examples >>> df.mutate(coalesce_xy = tp.coalesce(col('x'), col('y'))) .. py:function:: floor(x) Round numbers down to the lower integer :param x: Column to operate on :type x: Expr, Series .. 
rubric:: Examples >>> df.mutate(floor_x = tp.floor(col('x'))) .. py:function:: if_else(condition, true, false) If Else :param condition: A logical expression :type condition: Expr :param true: Value if the condition is true :param false: Value if the condition is false .. rubric:: Examples >>> df = tp.tibble(x = range(1, 4)) >>> df.mutate(if_x = tp.if_else(col('x') < 2, 1, 2)) .. py:function:: lag(x, n: int = 1, default=None) Get lagging values :param x: Column to operate on :type x: Expr, Series :param n: Number of positions to lag by :type n: int :param default: Value to fill in missing values :type default: optional .. rubric:: Examples >>> df.mutate(lag_x = tp.lag(col('x'))) >>> df.mutate(lag_x = tp.lag('x')) .. py:function:: lead(x, n: int = 1, default=None) Get leading values :param x: Column to operate on :type x: Expr, Series :param n: Number of positions to lead by :type n: int :param default: Value to fill in missing values :type default: optional .. rubric:: Examples >>> df.mutate(lead_x = tp.lead(col('x'))) >>> df.mutate(lead_x = col('x').lead()) .. py:function:: log(x) Compute the natural logarithm of a column :param x: Column to operate on :type x: Expr .. rubric:: Examples >>> df.mutate(log = tp.log('x')) .. py:function:: log10(x) Compute the base 10 logarithm of a column :param x: Column to operate on :type x: Expr .. rubric:: Examples >>> df.mutate(log = tp.log10('x')) .. py:function:: read_csv(file: str, *args, **kwargs) Simple wrapper around polars.read_csv .. py:function:: read_parquet(source: str, *args, **kwargs) Simple wrapper around polars.read_parquet .. py:function:: rep(x, times=1) Replicate the values in x :param x: Value or Series to repeat :type x: const, Series :param times: Number of times to repeat :type times: int .. rubric:: Examples >>> tp.rep(1, 3) >>> tp.rep(pl.Series(range(3)), 3) .. py:function:: replace_null(x, replace=None) Replace null values :param x: Column to operate on :type x: Expr, Series .. 
rubric:: Examples >>> df = tp.tibble(x = [0, None], y = [None, None]) >>> df.mutate(x = tp.replace_null(col('x'), 1)) .. py:function:: round(x, decimals=0) Round column values to a given number of decimals :param x: Column to operate on :type x: Expr, Series :param decimals: Number of decimal places to round to :type decimals: int .. rubric:: Examples >>> df.mutate(x = tp.round(col('x'))) .. py:function:: row_number() Return row number .. rubric:: Examples >>> df.mutate(row_num = tp.row_number()) .. py:function:: sqrt(x) Get column square root :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df.mutate(sqrt_x = tp.sqrt('x')) .. py:function:: cor(x, y, method='pearson') Find the correlation of two columns :param x: A column :type x: Expr :param y: A column :type y: Expr :param method: Type of correlation to find. Either 'pearson' or 'spearman'. :type method: str .. rubric:: Examples >>> df.summarize(cor = tp.cor(col('x'), col('y'))) .. py:function:: cov(x, y) Find the covariance of two columns :param x: A column :type x: Expr :param y: A column :type y: Expr .. rubric:: Examples >>> df.summarize(cov = tp.cov(col('x'), col('y'))) .. py:function:: count(x) Number of observations in each group :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df.summarize(count = tp.count(col('x'))) .. py:function:: first(x) Get first value :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df.summarize(first_x = tp.first('x')) >>> df.summarize(first_x = tp.first(col('x'))) .. py:function:: last(x) Get last value :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df.summarize(last_x = tp.last('x')) >>> df.summarize(last_x = tp.last(col('x'))) .. py:function:: length(x) Number of observations in each group :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df.summarize(length = tp.length(col('x'))) ..
py:function:: max(x) Get column maximum :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df.summarize(max_x = tp.max('x')) >>> df.summarize(max_x = tp.max(col('x'))) .. py:function:: mean(x) Get column mean :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df.summarize(mean_x = tp.mean('x')) >>> df.summarize(mean_x = tp.mean(col('x'))) .. py:function:: median(x) Get column median :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df.summarize(median_x = tp.median('x')) >>> df.summarize(median_x = tp.median(col('x'))) .. py:function:: min(x) Get column minimum :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df.summarize(min_x = tp.min('x')) >>> df.summarize(min_x = tp.min(col('x'))) .. py:function:: n() Number of observations in each group .. rubric:: Examples >>> df.summarize(count = tp.n()) .. py:function:: n_distinct(x) Get number of distinct values in a column :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df.summarize(n_distinct_x = tp.n_distinct('x')) >>> df.summarize(n_distinct_x = tp.n_distinct(col('x'))) .. py:function:: quantile(x, quantile=0.5) Get a quantile of a column :param x: Column to operate on :type x: Expr, Series :param quantile: Quantile to return :type quantile: float .. rubric:: Examples >>> df.summarize(quantile_x = tp.quantile('x', .25)) .. py:function:: sd(x) Get column standard deviation :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df.summarize(sd_x = tp.sd('x')) >>> df.summarize(sd_x = tp.sd(col('x'))) .. py:function:: sum(x) Get column sum :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df.summarize(sum_x = tp.sum('x')) >>> df.summarize(sum_x = tp.sum(col('x'))) .. py:function:: var(x) Get column variance :param x: Column to operate on :type x: Expr ..
rubric:: Examples >>> df.summarize(var_x = tp.var('x')) >>> df.summarize(var_x = tp.var(col('x'))) .. py:function:: between(x, left, right) Test if values of a column are between two values :param x: Column to operate on :type x: Expr, Series :param left: Value to test if column is greater than or equal to :type left: int :param right: Value to test if column is less than or equal to :type right: int .. rubric:: Examples >>> df = tp.tibble(x = range(4)) >>> df.filter(tp.between(col('x'), 1, 3)) .. py:function:: is_finite(x) Test if values of a column are finite :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df = tp.tibble(x = [1.0, float('inf')]) >>> df.filter(tp.is_finite(col('x'))) .. py:function:: is_in(x, y) Test if values of a column are in a list of values :param x: Column to operate on :type x: Expr, Series :param y: List to test against :type y: list .. rubric:: Examples >>> df = tp.tibble(x = range(3)) >>> df.filter(tp.is_in(col('x'), [1, 2])) .. py:function:: is_infinite(x) Test if values of a column are infinite :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df = tp.tibble(x = [1.0, float('inf')]) >>> df.filter(tp.is_infinite(col('x'))) .. py:function:: is_nan(x) Test if values of a column are NaN :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df = tp.tibble(x = range(3)) >>> df.filter(tp.is_nan(col('x'))) .. py:function:: is_not(x) Flip values of a boolean series :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df = tp.tibble(x = range(3)) >>> df.filter(tp.is_not(col('x') < 2)) .. py:function:: is_not_in(x, y) Test if values of a column are not in a list of values :param x: Column to operate on :type x: Expr, Series :param y: List to test against :type y: list .. rubric:: Examples >>> df = tp.tibble(x = range(3)) >>> df.filter(tp.is_not_in(col('x'), [1, 2])) ..
py:function:: is_not_null(x) Test if values of a column are not null :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df = tp.tibble(x = range(3)) >>> df.filter(tp.is_not_null(col('x'))) .. py:function:: is_null(x) Test if values of a column are null :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df = tp.tibble(x = range(3)) >>> df.filter(tp.is_null(col('x'))) .. py:function:: as_boolean(x) Convert to a boolean :param x: Column to operate on :type x: Expr .. rubric:: Examples >>> df.mutate(bool_x = tp.as_boolean(col('x'))) .. py:function:: as_float(x) Convert to float. Defaults to Float64. :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df.mutate(float_x = tp.as_float(col('x'))) .. py:function:: as_integer(x) Convert to integer. Defaults to Int64. :param x: Column to operate on :type x: Expr .. rubric:: Examples >>> df.mutate(int_x = tp.as_integer(col('x'))) .. py:function:: as_string(x) Convert to string. Defaults to Utf8. :param x: Column to operate on :type x: Expr .. rubric:: Examples >>> df.mutate(string_x = tp.as_string(col('x'))) .. py:function:: cast(x, dtype) General type conversion. :param x: Column to operate on :type x: Expr, Series :param dtype: Type to convert to :type dtype: DataType .. rubric:: Examples >>> df.mutate(float_x = tp.cast(col('x'), tp.Float64)) .. py:function:: as_date(x, format=None) Convert a string to a Date :param x: Column to operate on :type x: Expr, Series :param format: "yyyy-mm-dd" :type format: str .. rubric:: Examples >>> df = tp.tibble(x = ['2021-01-01', '2021-10-01']) >>> df.mutate(date_x = tp.as_date(col('x'))) .. py:function:: as_datetime(x, format=None) Convert a string to a Datetime :param x: Column to operate on :type x: Expr, Series :param format: "yyyy-mm-dd" :type format: str .. rubric:: Examples >>> df = tp.tibble(x = ['2021-01-01', '2021-10-01']) >>> df.mutate(datetime_x = tp.as_datetime(col('x'))) ..
py:function:: hour(x) Extract the hour from a datetime :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df.mutate(hour = tp.hour(col('x'))) .. py:function:: make_date(year=1970, month=1, day=1) Create a date object :param year: Column or literal :type year: Expr, str, int :param month: Column or literal :type month: Expr, str, int :param day: Column or literal :type day: Expr, str, int .. rubric:: Examples >>> df.mutate(date = tp.make_date(2000, 1, 1)) .. py:function:: make_datetime(year=1970, month=1, day=1, hour=0, minute=0, second=0) Create a datetime object :param year: Column or literal :type year: Expr, str, int :param month: Column or literal :type month: Expr, str, int :param day: Column or literal :type day: Expr, str, int :param hour: Column or literal :type hour: Expr, str, int :param minute: Column or literal :type minute: Expr, str, int :param second: Column or literal :type second: Expr, str, int .. rubric:: Examples >>> df.mutate(datetime = tp.make_datetime(2000, 1, 1)) .. py:function:: mday(x) Extract the day of the month from a date, from 1 to 31. :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df.mutate(monthday = tp.mday(col('x'))) .. py:function:: minute(x) Extract the minute from a datetime :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df.mutate(minute = tp.minute(col('x'))) .. py:function:: month(x) Extract the month from a date :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df.mutate(month = tp.month(col('x'))) .. py:function:: quarter(x) Extract the quarter from a date :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df.mutate(quarter = tp.quarter(col('x'))) .. py:function:: dt_round(x, rule, n) Round the datetime :param x: Column to operate on :type x: Expr, Series :param rule: Units of the downscaling operation.
Any of: - "month" - "week" - "day" - "hour" - "minute" - "second" :type rule: str :param n: Number of units (e.g. 5 "day", 15 "minute"). :type n: int .. rubric:: Examples >>> df.mutate(rounded = tp.dt_round(col('x'), 'hour', 1)) .. py:function:: second(x) Extract the second from a datetime :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df.mutate(second = tp.second(col('x'))) .. py:function:: wday(x) Extract the weekday from a date, from Sunday = 1 to Saturday = 7. :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df.mutate(weekday = tp.wday(col('x'))) .. py:function:: week(x) Extract the week from a date :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df.mutate(week = tp.week(col('x'))) .. py:function:: yday(x) Extract the day of the year from a date, from 1 to 366. :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df.mutate(yearday = tp.yday(col('x'))) .. py:function:: year(x) Extract the year from a date :param x: Column to operate on :type x: Expr, Series .. rubric:: Examples >>> df.mutate(year = tp.year(col('x'))) .. py:data:: col .. py:data:: exclude .. py:data:: lit .. py:data:: Expr .. py:data:: Series .. py:data:: Int8 .. py:data:: Int16 .. py:data:: Int32 .. py:data:: Int64 .. py:data:: UInt8 .. py:data:: UInt16 .. py:data:: UInt32 .. py:data:: UInt64 .. py:data:: Float32 .. py:data:: Float64 .. py:data:: Boolean .. py:data:: Utf8 .. py:data:: List .. py:data:: Date .. py:data:: Datetime .. py:data:: Object .. py:function:: paste(*args, sep=' ') Concatenate strings together :param args: Columns and/or strings to concatenate :type args: Expr, str .. rubric:: Examples >>> df = tp.tibble(x = ['a', 'b', 'c']) >>> df.mutate(x_end = tp.paste(col('x'), 'end', sep = '_')) .. py:function:: paste0(*args) Concatenate strings together with no separator :param args: Columns and/or strings to concatenate :type args: Expr, str ..
rubric:: Examples >>> df = tp.tibble(x = ['a', 'b', 'c']) >>> df.mutate(xend = tp.paste0(col('x'), 'end')) .. py:function:: str_c(*args, sep='') Concatenate strings together :param args: Columns and/or strings to concatenate :type args: Expr, str .. rubric:: Examples >>> df = tp.tibble(x = ['a', 'b', 'c']) >>> df.mutate(x_end = str_c(col('x'), 'end', sep = '_')) .. py:function:: str_detect(string, pattern, negate=False) Detect the presence or absence of a pattern in a string :param string: Input series to operate on :type string: str :param pattern: Pattern to look for :type pattern: str :param negate: If True, return non-matching elements :type negate: bool .. rubric:: Examples >>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape']) >>> df.mutate(x = str_detect('name', 'a')) >>> df.mutate(x = str_detect('name', ['a', 'e'])) .. py:function:: str_extract(string, pattern) Extract the target capture group from provided patterns :param string: Input series to operate on :type string: str :param pattern: Pattern to look for :type pattern: str .. rubric:: Examples >>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape']) >>> df.mutate(x = str_extract(col('name'), 'e')) .. py:function:: str_length(string) Length of a string :param string: Input series to operate on :type string: str .. rubric:: Examples >>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape']) >>> df.mutate(x = str_length(col('name'))) .. py:function:: str_remove_all(string, pattern) Removes all matched patterns in a string :param string: Input series to operate on :type string: str :param pattern: Pattern to look for :type pattern: str .. rubric:: Examples >>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape']) >>> df.mutate(x = str_remove_all(col('name'), 'a')) .. py:function:: str_remove(string, pattern) Removes the first matched pattern in a string :param string: Input series to operate on :type string: str :param pattern: Pattern to look for :type pattern: str ..
rubric:: Examples >>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape']) >>> df.mutate(x = str_remove(col('name'), 'a')) .. py:function:: str_replace_all(string, pattern, replacement) Replaces all matched patterns in a string :param string: Input series to operate on :type string: str :param pattern: Pattern to look for :type pattern: str :param replacement: String that replaces anything that matches the pattern :type replacement: str .. rubric:: Examples >>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape']) >>> df.mutate(x = str_replace_all(col('name'), 'a', 'A')) .. py:function:: str_replace(string, pattern, replacement) Replaces the first matched pattern in a string :param string: Input series to operate on :type string: str :param pattern: Pattern to look for :type pattern: str :param replacement: String that replaces anything that matches the pattern :type replacement: str .. rubric:: Examples >>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape']) >>> df.mutate(x = str_replace(col('name'), 'a', 'A')) .. py:function:: str_ends(string, pattern, negate=False) Detect the presence or absence of a pattern at the end of a string. :param string: Column to operate on :type string: Expr :param pattern: Pattern to look for :type pattern: str :param negate: If True, return non-matching elements :type negate: bool .. rubric:: Examples >>> df = tp.tibble(words = ['apple', 'bear', 'amazing']) >>> df.filter(tp.str_ends(col('words'), 'ing')) .. py:function:: str_starts(string, pattern, negate=False) Detect the presence or absence of a pattern at the beginning of a string. :param string: Column to operate on :type string: Expr :param pattern: Pattern to look for :type pattern: str :param negate: If True, return non-matching elements :type negate: bool .. rubric:: Examples >>> df = tp.tibble(words = ['apple', 'bear', 'amazing']) >>> df.filter(tp.str_starts(col('words'), 'a')) ..
py:function:: str_sub(string, start=0, end=None) Extract portion of string based on start and end inputs :param string: Input series to operate on :type string: str :param start: First position of the character to return :type start: int :param end: Last position of the character to return :type end: int .. rubric:: Examples >>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape']) >>> df.mutate(x = str_sub(col('name'), 0, 3)) .. py:function:: str_to_lower(string) Convert a string to lowercase :param string: String to convert :type string: str .. rubric:: Examples >>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape']) >>> df.mutate(x = str_to_lower(col('name'))) .. py:function:: str_to_upper(string) Convert a string to uppercase :param string: String to convert :type string: str .. rubric:: Examples >>> df = tp.tibble(name = ['apple', 'banana', 'pear', 'grape']) >>> df.mutate(x = str_to_upper(col('name'))) .. py:function:: str_trim(string, side='both') Trim whitespace :param string: Column or series to operate on :type string: Expr, Series :param side: One of: * "both" * "left" * "right" :type side: str .. rubric:: Examples >>> df = tp.tibble(x = [' a ', ' b ', ' c ']) >>> df.mutate(x = tp.str_trim(col('x'))) .. py:function:: as_tibble(x) Convert an object to a tibble :param x: Object to convert to a tibble :type x: [pl.DataFrame, pd.DataFrame, dict] .. rubric:: Examples >>> tp.as_tibble(polars_df) .. py:function:: is_tibble(x) Test if an object is a tibble :param x: Object to test :type x: object .. rubric:: Examples >>> tp.is_tibble(df) .. py:class:: tibble(_data=None, **kwargs) Bases: :py:obj:`tidypolars.reexports.pl.DataFrame` A data frame object that provides methods familiar to R tidyverse users. .. py:method:: __dir__() .. py:method:: __repr__() Printing method ..
py:method:: _repr_html_() Printing method for Jupyter. Output rows and columns can be modified by setting the following environment variables: * POLARS_FMT_MAX_COLS: set the number of columns * POLARS_FMT_MAX_ROWS: set the number of rows .. py:method:: __copy__() .. py:method:: __str__() Printing method .. py:method:: __getattribute__(attr) .. py:method:: __getitem__(col) Get part of the DataFrame as a new DataFrame, Series, or scalar. :param key: Rows / columns to select. This is easiest to explain via example. Suppose we have a DataFrame with columns `'a'`, `'d'`, `'c'`, `'b'`. Here is what various types of `key` would do: - `df[0, 'a']` extracts the first element of column `'a'` and returns a scalar. - `df[0]` extracts the first row and returns a DataFrame. - `df['a']` extracts column `'a'` and returns a Series. - `df[0:2]` extracts the first two rows and returns a DataFrame. - `df[0:2, 'a']` extracts the first two rows from column `'a'` and returns a Series. - `df[0:2, 0]` extracts the first two rows from the first column and returns a Series. - `df[[0, 1], [0, 1, 2]]` extracts the first two rows and the first three columns and returns a DataFrame. - `df[0: 2, ['a', 'c']]` extracts the first two rows from columns `'a'` and `'c'` and returns a DataFrame. - `df[:, 0: 2]` extracts all rows from the first two columns and returns a DataFrame. - `df[:, 'a': 'c']` extracts all rows and all columns positioned between `'a'` and `'c'` *inclusive* and returns a DataFrame. In our example, that would extract columns `'a'`, `'d'`, and `'c'`. :rtype: DataFrame, Series, or scalar, depending on `key`. .. rubric:: Examples >>> df = pl.DataFrame( ... {"a": [1, 2, 3], "d": [4, 5, 6], "c": [1, 3, 2], "b": [7, 8, 9]} ...
) >>> df[0] shape: (1, 4) ┌─────┬─────┬─────┬─────┐ │ a ┆ d ┆ c ┆ b │ │ --- ┆ --- ┆ --- ┆ --- │ │ i64 ┆ i64 ┆ i64 ┆ i64 │ ╞═════╪═════╪═════╪═════╡ │ 1 ┆ 4 ┆ 1 ┆ 7 │ └─────┴─────┴─────┴─────┘ >>> df[0, "a"] 1 >>> df["a"] shape: (3,) Series: 'a' [i64] [ 1 2 3 ] >>> df[0:2] shape: (2, 4) ┌─────┬─────┬─────┬─────┐ │ a ┆ d ┆ c ┆ b │ │ --- ┆ --- ┆ --- ┆ --- │ │ i64 ┆ i64 ┆ i64 ┆ i64 │ ╞═════╪═════╪═════╪═════╡ │ 1 ┆ 4 ┆ 1 ┆ 7 │ │ 2 ┆ 5 ┆ 3 ┆ 8 │ └─────┴─────┴─────┴─────┘ >>> df[0:2, "a"] shape: (2,) Series: 'a' [i64] [ 1 2 ] >>> df[0:2, 0] shape: (2,) Series: 'a' [i64] [ 1 2 ] >>> df[[0, 1], [0, 1, 2]] shape: (2, 3) ┌─────┬─────┬─────┐ │ a ┆ d ┆ c │ │ --- ┆ --- ┆ --- │ │ i64 ┆ i64 ┆ i64 │ ╞═════╪═════╪═════╡ │ 1 ┆ 4 ┆ 1 │ │ 2 ┆ 5 ┆ 3 │ └─────┴─────┴─────┘ >>> df[0:2, ["a", "c"]] shape: (2, 2) ┌─────┬─────┐ │ a ┆ c │ │ --- ┆ --- │ │ i64 ┆ i64 │ ╞═════╪═════╡ │ 1 ┆ 1 │ │ 2 ┆ 3 │ └─────┴─────┘ >>> df[:, 0:2] shape: (3, 2) ┌─────┬─────┐ │ a ┆ d │ │ --- ┆ --- │ │ i64 ┆ i64 │ ╞═════╪═════╡ │ 1 ┆ 4 │ │ 2 ┆ 5 │ │ 3 ┆ 6 │ └─────┴─────┘ >>> df[:, "a":"c"] shape: (3, 3) ┌─────┬─────┬─────┐ │ a ┆ d ┆ c │ │ --- ┆ --- ┆ --- │ │ i64 ┆ i64 ┆ i64 │ ╞═════╪═════╪═════╡ │ 1 ┆ 4 ┆ 1 │ │ 2 ┆ 5 ┆ 3 │ │ 3 ┆ 6 ┆ 2 │ └─────┴─────┴─────┘ .. py:method:: arrange(*args) Arrange/sort rows :param \*args: Columns to sort by :type \*args: str .. rubric:: Examples >>> df = tp.tibble({'x': ['a', 'a', 'b'], 'y': range(3)}) >>> # Arrange in ascending order >>> df.arrange('x', 'y') ... >>> # Arrange some columns descending >>> df.arrange(tp.desc('x'), 'y') .. py:method:: as_dict(*, as_series=True) Convert to a dictionary :param as_series: If True, returns the dict values as Series. If False, returns the dict values as lists. :type as_series: bool .. rubric:: Examples >>> df.as_dict() >>> df.as_dict(as_series = False) .. py:method:: as_pandas() Convert to a pandas DataFrame .. rubric:: Examples >>> df.as_pandas() .. py:method:: as_polars() Convert to a polars DataFrame ..
rubric:: Examples >>> df.as_polars() .. py:method:: bind_cols(*args) Bind data frames by columns :param \*args: Data frames to bind by column :type \*args: tibble .. rubric:: Examples >>> df1 = tp.tibble({'x': ['a', 'a', 'b'], 'y': range(3)}) >>> df2 = tp.tibble({'a': ['c', 'c', 'c'], 'b': range(4, 7)}) >>> df1.bind_cols(df2) .. py:method:: bind_rows(*args) Bind data frames by row :param \*args: Data frames to bind by row :type \*args: tibble, list .. rubric:: Examples >>> df1 = tp.tibble({'x': ['a', 'a', 'b'], 'y': range(3)}) >>> df2 = tp.tibble({'x': ['c', 'c', 'c'], 'y': range(4, 7)}) >>> df1.bind_rows(df2) .. py:method:: clone() Very cheap deep clone .. py:method:: count(*args, sort=False, name='n') Returns row counts of the dataset. If bare column names are provided, count() returns counts by group. :param \*args: Columns to group by :type \*args: str, Expr :param sort: Should rows be ordered in descending order by count :type sort: bool :param name: The name of the new column in the output. If omitted, it will default to "n". :type name: str .. rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b': ['a', 'a', 'b']}) >>> df.count() >>> df.count('b') .. py:method:: distinct(*args) Select distinct/unique rows :param \*args: Columns to find distinct/unique rows :type \*args: str, Expr .. rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b': ['a', 'a', 'b']}) >>> df.distinct() >>> df.distinct('b') .. py:method:: drop(*args) Drop unwanted columns :param \*args: Columns to drop :type \*args: str .. rubric:: Examples >>> df.drop('x', 'y') .. py:method:: drop_null(*args) Drop rows containing missing values :param \*args: Columns to drop nulls from (defaults to all) :type \*args: str .. rubric:: Examples >>> df = tp.tibble(x = [1, None, 3], y = [None, 'b', 'c'], z = range(3)) >>> df.drop_null() >>> df.drop_null('x', 'y') .. py:method:: equals(other, null_equal=True) Check if two tibbles are equal .. py:method:: glimpse() Return a dense preview of the DataFrame.
The formatting shows one line per column so that wide dataframes display cleanly. Each line shows the column name, the data type, and the first few values. .. py:method:: fill(*args, direction='down', _by=None) Fill in missing values with previous or next value :param \*args: Columns to fill :type \*args: str :param direction: Direction to fill. One of ['down', 'up', 'downup', 'updown'] :type direction: str :param _by: Columns to group by :type _by: str, list .. rubric:: Examples >>> df = tp.tibble({'a': [1, None, 3, 4, 5], ... 'b': [None, 2, None, None, 5], ... 'groups': ['a', 'a', 'a', 'b', 'b']}) >>> df.fill('a', 'b') >>> df.fill('a', 'b', _by = 'groups') >>> df.fill('a', 'b', direction = 'downup') .. py:method:: filter(*args, _by=None) Filter rows on one or more conditions :param \*args: Conditions to filter by :type \*args: Expr :param _by: Columns to group by :type _by: str, list .. rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b': ['a', 'a', 'b']}) >>> df.filter(col('a') < 2, col('b') == 'a') >>> df.filter((col('a') < 2) & (col('b') == 'a')) >>> df.filter(col('a') <= tp.mean(col('a')), _by = 'b') .. py:method:: full_join(df, left_on=None, right_on=None, on=None, suffix: str = '_right') Perform a full join :param df: Data frame to join with. :type df: tibble :param left_on: Join column(s) of the left DataFrame. :type left_on: str, list :param right_on: Join column(s) of the right DataFrame. :type right_on: str, list :param on: Join column(s) of both DataFrames. If set, `left_on` and `right_on` should be None. :type on: str, list :param suffix: Suffix to append to columns with a duplicate name. :type suffix: str .. rubric:: Examples >>> df1.full_join(df2) >>> df1.full_join(df2, on = 'x') >>> df1.full_join(df2, left_on = 'left_x', right_on = 'x') .. py:method:: head(n=5, *, _by=None) Alias for `.slice_head()` .. py:method:: inner_join(df, left_on=None, right_on=None, on=None, suffix='_right') Perform an inner join :param df: Data frame to join with.
:type df: tibble :param left_on: Join column(s) of the left DataFrame. :type left_on: str, list :param right_on: Join column(s) of the right DataFrame. :type right_on: str, list :param on: Join column(s) of both DataFrames. If set, `left_on` and `right_on` should be None. :type on: str, list :param suffix: Suffix to append to columns with a duplicate name. :type suffix: str .. rubric:: Examples >>> df1.inner_join(df2) >>> df1.inner_join(df2, on = 'x') >>> df1.inner_join(df2, left_on = 'left_x', right_on = 'x') .. py:method:: left_join(df, left_on=None, right_on=None, on=None, suffix='_right') Perform a left join :param df: Data frame to join with. :type df: tibble :param left_on: Join column(s) of the left DataFrame. :type left_on: str, list :param right_on: Join column(s) of the right DataFrame. :type right_on: str, list :param on: Join column(s) of both DataFrames. If set, `left_on` and `right_on` should be None. :type on: str, list :param suffix: Suffix to append to columns with a duplicate name. :type suffix: str .. rubric:: Examples >>> df1.left_join(df2) >>> df1.left_join(df2, on = 'x') >>> df1.left_join(df2, left_on = 'left_x', right_on = 'x') .. py:method:: mutate(*args, _by=None, **kwargs) Add or modify columns :param \*args: Column expressions to add or modify :type \*args: Expr :param _by: Columns to group by :type _by: str, list :param \*\*kwargs: Column expressions to add or modify :type \*\*kwargs: Expr .. rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']}) >>> df.mutate(double_a = col('a') * 2, ... a_plus_b = col('a') + col('b')) >>> df.mutate(row_num = tp.row_number(), _by = 'c') .. py:method:: pivot_longer(cols=everything(), names_to='name', values_to='value') Pivot data from wide to long :param cols: List of the columns to pivot. Defaults to all columns. :type cols: Expr :param names_to: Name of the new "names" column. :type names_to: str :param values_to: Name of the new "values" column :type values_to: str ..
rubric:: Examples >>> df = tp.tibble({'id': ['id1', 'id2'], 'a': [1, 2], 'b': [1, 2]}) >>> df.pivot_longer(cols = ['a', 'b']) >>> df.pivot_longer(cols = ['a', 'b'], names_to = 'stuff', values_to = 'things') .. py:method:: pivot_wider(names_from='name', values_from='value', id_cols=None, values_fn='first', values_fill=None) Pivot data from long to wide :param names_from: Column to get the new column names from. :type names_from: str :param values_from: Column to get the new column values from :type values_from: str :param id_cols: A set of columns that uniquely identifies each observation. Defaults to all columns in the data table except for the columns specified in `names_from` and `values_from`. :type id_cols: str, list :param values_fn: Function for how multiple entries per group should be dealt with. Any of 'first', 'count', 'sum', 'max', 'min', 'mean', 'median', 'last' :type values_fn: str :param values_fill: If values are missing/null, what value should be filled in. Can use: "backward", "forward", "mean", "min", "max", "zero", "one" :type values_fill: str .. rubric:: Examples >>> df = tp.tibble({'id': [1, 1], 'variable': ['a', 'b'], 'value': [1, 2]}) >>> df.pivot_wider(names_from = 'variable', values_from = 'value') .. py:method:: print() .. py:method:: pull(var=None) Extract a column as a series :param var: Name of the column to extract. Defaults to the last column. :type var: str .. rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b': range(3)) >>> df.pull('a') .. py:method:: relocate(*args, _before=None, _after=None) Move a column or columns to a new position :param \*args: Columns to move :type \*args: str, Expr .. rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']}) >>> df.relocate('a', before = 'c') >>> df.relocate('b', after = 'c') .. 
py:method:: rename(_mapping=None, **kwargs) Rename columns :param _mapping: Dictionary mapping of new names :type _mapping: dict :param \*\*kwargs: key-value pair of new name from old name :type \*\*kwargs: str .. rubric:: Examples >>> df = tp.tibble({'x': range(3), 't': range(3), 'z': ['a', 'a', 'b']}) >>> df.rename(new_x = 'x') # dplyr interface >>> df.rename({'x': 'new_x'}) # pandas interface .. py:method:: replace_null(replace=None) Replace null values :param replace: Dictionary of column/replacement pairs :type replace: dict .. rubric:: Examples >>> df = tp.tibble(x = [0, None], y = [None, None]) >>> df.replace_null(dict(x = 1, y = 2)) .. py:method:: separate(sep_col, into, sep='_', remove=True) Separate a character column into multiple columns :param sep_col: Column to split into multiple columns :type sep_col: str :param into: List of new column names :type into: list :param sep: Separator to split on. Default to '_' :type sep: str :param remove: If True removes the input column from the output data frame :type remove: bool .. rubric:: Examples >>> df = tp.tibble(x = ['a_a', 'b_b', 'c_c']) >>> df.separate('x', into = ['left', 'right']) .. py:method:: set_names(nm=None) Change the column names of the data frame :param nm: A list of new names for the data frame :type nm: list .. rubric:: Examples >>> df = tp.tibble(x = range(3), y = range(3)) >>> df.set_names(['a', 'b']) .. py:method:: select(*args) Select or drop columns :param \*args: Columns to select :type \*args: str, Expr .. rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']}) >>> df.select('a', 'b') >>> df.select(col('a'), col('b')) .. py:method:: slice(*args, _by=None) Grab rows from a data frame :param \*args: Rows to grab :type \*args: int, list :param by: Columns to group by :type by: str, list .. rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']}) >>> df.slice(0, 1) >>> df.slice(0, by = 'c') .. 
py:method:: slice_head(n=5, *, _by=None) Grab top rows from a data frame :param n: Number of rows to grab :type n: int :param by: Columns to group by :type by: str, list .. rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']}) >>> df.slice_head(2) >>> df.slice_head(1, by = 'c') .. py:method:: slice_tail(n=5, *, _by=None) Grab bottom rows from a data frame :param n: Number of rows to grab :type n: int :param by: Columns to group by :type by: str, list .. rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']}) >>> df.slice_tail(2) >>> df.slice_tail(1, by = 'c') .. py:method:: summarise(*args, _by=None, **kwargs) Alias for `.summarize()` .. py:method:: summarize(*args, _by=None, **kwargs) Aggregate data with summary statistics :param \*args: Column expressions to add or modify :type \*args: Expr :param by: Columns to group by :type by: str, list :param \*\*kwargs: Column expressions to add or modify :type \*\*kwargs: Expr .. rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']}) >>> df.summarize(avg_a = tp.mean(col('a'))) >>> df.summarize(avg_a = tp.mean(col('a')), ... by = 'c') >>> df.summarize(avg_a = tp.mean(col('a')), ... max_b = tp.max(col('b'))) .. py:method:: tail(n=5, *, _by=None) Alias for `.slice_tail()` .. py:method:: unite(col='_united', unite_cols=[], sep='_', remove=True) Unite multiple columns by pasting strings together :param col: Name of the new column :type col: str :param unite_cols: List of columns to unite :type unite_cols: list :param sep: Separator to use between values :type sep: str :param remove: If True removes input columns from the data frame :type remove: bool .. rubric:: Examples >>> df = tp.tibble(a = ["a", "a", "a"], b = ["b", "b", "b"], c = range(3)) >>> df.unite("united_col", unite_cols = ["a", "b"]) .. py:method:: write_csv(file=None, has_headers=True, sep=',') Write a data frame to a csv .. 
py:method:: write_parquet(file=str, compression='snappy', use_pyarrow=False, **kwargs) Write a data frame to a parquet .. py:property:: names Get column names .. rubric:: Examples >>> df.names .. py:property:: ncol Get number of columns .. rubric:: Examples >>> df.ncol .. py:property:: nrow Get number of rows .. rubric:: Examples >>> df.nrow .. py:property:: plot Access to polars plotting .. rubric:: Examples >>> df.plot .. py:function:: desc(x) Mark a column to order in descending .. py:function:: from_pandas(df) Convert from pandas DataFrame to tibble :param df: pd.DataFrame to convert to a tibble :type df: DataFrame .. rubric:: Examples >>> tp.from_pandas(df) .. py:function:: from_polars(df) Convert from polars DataFrame to tibble :param df: pl.DataFrame to convert to a tibble :type df: DataFrame .. rubric:: Examples >>> tp.from_polars(df) .. py:function:: contains(match, ignore_case=True) Contains a literal string :param match: String to match columns :type match: str :param ignore_case: If TRUE, the default, ignores case when matching names. :type ignore_case: bool .. rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']}) >>> df.select(contains('c')) .. py:function:: ends_with(match, ignore_case=True) Ends with a suffix :param match: String to match columns :type match: str :param ignore_case: If TRUE, the default, ignores case when matching names. :type ignore_case: bool .. rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b_code': range(3), 'c_code': ['a', 'a', 'b']}) >>> df.select(ends_with('code')) .. py:function:: everything() Selects all columns .. rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']}) >>> df.select(everything()) .. py:function:: starts_with(match, ignore_case=True) Starts with a prefix :param match: String to match columns :type match: str :param ignore_case: If TRUE, the default, ignores case when matching names. :type ignore_case: bool .. 
rubric:: Examples >>> df = tp.tibble({'a': range(3), 'add': range(3), 'sub': ['a', 'a', 'b']}) >>> df.select(starts_with('a')) .. py:function:: where(col_type) Select columns by type using a string Options: date, datetime, float, integer, numeric, string .. rubric:: Examples >>> df.select(tp.where("integer")) .. py:data:: __all__
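
The name-matching helpers ``contains``, ``ends_with`` and ``starts_with`` all take an ``ignore_case`` flag that defaults to True. The sketch below illustrates that documented matching rule in plain Python. The function ``select_names`` and its ``mode`` argument are hypothetical, introduced only for this illustration; they are not part of tidypolars, and the library's own implementation may differ.

```python
def select_names(names, match, mode="starts", ignore_case=True):
    """Hypothetical helper (not part of tidypolars): return the column
    names selected by a prefix, suffix, or substring match."""
    tests = {
        "starts": str.startswith,      # prefix match, like starts_with
        "ends": str.endswith,          # suffix match, like ends_with
        "contains": str.__contains__,  # substring match, like contains
    }
    test = tests[mode]
    if ignore_case:
        # Compare lower-cased copies so 'CODE' matches 'b_code'.
        match = match.lower()
        return [n for n in names if test(n.lower(), match)]
    return [n for n in names if test(n, match)]


columns = ["a", "add", "sub", "B_code"]

prefix_hits = select_names(columns, "a")
suffix_hits = select_names(columns, "CODE", mode="ends")
strict_hits = select_names(columns, "CODE", mode="ends", ignore_case=False)

print(prefix_hits)
print(suffix_hits)
print(strict_hits)
```

With ``ignore_case`` left at its default, the suffix ``'CODE'`` still selects ``'B_code'``; passing ``ignore_case=False`` makes the same call select nothing.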