:py:mod:`tidypolars.tibble`
===========================

.. py:module:: tidypolars.tibble


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   tidypolars.tibble.Tibble


Functions
~~~~~~~~~

.. autoapisummary::

   tidypolars.tibble.desc
   tidypolars.tibble.from_polars
   tidypolars.tibble.from_pandas


.. py:class:: Tibble(_data=None, **kwargs)

   Bases: :py:obj:`tidypolars.reexports.pl.DataFrame`

   A data frame object that provides methods familiar to R tidyverse users.

   .. py:property:: names

      Get column names

      .. rubric:: Examples

      >>> df.names

   .. py:property:: ncol

      Get number of columns

      .. rubric:: Examples

      >>> df.ncol

   .. py:property:: nrow

      Get number of rows

      .. rubric:: Examples

      >>> df.nrow

   .. py:method:: __repr__()

      Printing method


   .. py:method:: _repr_html_()

      Printing method for jupyter

      Output rows and columns can be modified by setting the following ENVIRONMENT variables:

      * POLARS_FMT_MAX_COLS: set the number of columns

      * POLARS_FMT_MAX_ROWS: set the number of rows


   .. py:method:: __copy__()


   .. py:method:: __str__()

      Printing method


   .. py:method:: __getattribute__(attr)

      Return getattr(self, name).


   .. py:method:: __dir__()

      Default dir() implementation.


   .. py:method:: arrange(*args)

      Arrange/sort rows

      :param \*args: Columns to sort by
      :type \*args: str

      .. rubric:: Examples

      >>> df = tp.Tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
      >>> # Arrange in ascending order
      >>> df.arrange('x', 'y')
      ...
      >>> # Arrange some columns descending
      >>> df.arrange(tp.desc('x'), 'y')


   .. py:method:: bind_cols(*args)

      Bind data frames by columns

      :param df: Data frame to bind
      :type df: Tibble

      .. rubric:: Examples

      >>> df1 = tp.Tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
      >>> df2 = tp.Tibble({'a': ['c', 'c', 'c'], 'b': range(4, 7)})
      >>> df1.bind_cols(df2)


   .. py:method:: bind_rows(*args)

      Bind data frames by row

      :param \*args: Data frames to bind by row
      :type \*args: Tibble, list

      .. rubric:: Examples

      >>> df1 = tp.Tibble({'x': ['a', 'a', 'b'], 'y': range(3)})
      >>> df2 = tp.Tibble({'x': ['c', 'c', 'c'], 'y': range(4, 7)})
      >>> df1.bind_rows(df2)


   .. py:method:: clone()

      Very cheap deep clone


   .. py:method:: count(*args, sort=False, name='n')

      Returns row counts of the dataset.
      If bare column names are provided, count() returns counts by group.

      :param \*args: Columns to group by
      :type \*args: str, Expr
      :param sort: Should columns be ordered in descending order by count
      :type sort: bool
      :param name: The name of the new column in the output. If omitted, it will default to "n".
      :type name: str

      .. rubric:: Examples

      >>> df = tp.Tibble({'a': range(3), 'b': ['a', 'a', 'b']})
      >>> df.count()
      >>> df.count('b')


   .. py:method:: distinct(*args)

      Select distinct/unique rows

      :param \*args: Columns to find distinct/unique rows
      :type \*args: str, Expr

      .. rubric:: Examples

      >>> df = tp.Tibble({'a': range(3), 'b': ['a', 'a', 'b']})
      >>> df.distinct()
      >>> df.distinct('b')


   .. py:method:: drop(*args)

      Drop unwanted columns

      :param \*args: Columns to drop
      :type \*args: str

      .. rubric:: Examples

      >>> df.drop('x', 'y')


   .. py:method:: drop_null(*args)

      Drop rows containing missing values

      :param \*args: Columns to drop nulls from (defaults to all)
      :type \*args: str

      .. rubric:: Examples

      >>> df = tp.Tibble(x = [1, None, 3], y = [None, 'b', 'c'], z = range(3)}
      >>> df.drop_null()
      >>> df.drop_null('x', 'y')


   .. py:method:: head(n=5, *, by=None)

      Alias for `.slice_head()`


   .. py:method:: fill(*args, direction='down', by=None)

      Fill in missing values with previous or next value

      :param \*args: Columns to fill
      :type \*args: str
      :param direction: Direction to fill. One of ['down', 'up', 'downup', 'updown']
      :type direction: str
      :param by: Columns to group by
      :type by: str, list

      .. rubric:: Examples

      >>> df = tp.Tibble({'a': [1, None, 3, 4, 5],
      ...                 'b': [None, 2, None, None, 5],
      ...                 'groups': ['a', 'a', 'a', 'b', 'b']})
      >>> df.fill('a', 'b')
      >>> df.fill('a', 'b', by = 'groups')
      >>> df.fill('a', 'b', direction = 'downup')


   .. py:method:: filter(*args, by=None)

      Filter rows on one or more conditions

      :param \*args: Conditions to filter by
      :type \*args: Expr
      :param by: Columns to group by
      :type by: str, list

      .. rubric:: Examples

      >>> df = tp.Tibble({'a': range(3), 'b': ['a', 'a', 'b']})
      >>> df.filter(col('a') < 2, col('b') == 'a')
      >>> df.filter((col('a') < 2) & (col('b') == 'a'))
      >>> df.filter(col('a') <= tp.mean(col('a')), by = 'b')


   .. py:method:: frame_equal(other, null_equal=True)

      Check if two Tibbles are equal


   .. py:method:: inner_join(df, left_on=None, right_on=None, on=None, suffix='_right')

      Perform an inner join

      :param df: Lazy DataFrame to join with.
      :type df: Tibble
      :param left_on: Join column(s) of the left DataFrame.
      :type left_on: str, list
      :param right_on: Join column(s) of the right DataFrame.
      :type right_on: str, list
      :param on: Join column(s) of both DataFrames. If set, `left_on` and `right_on` should be None.
      :type on: str, list
      :param suffix: Suffix to append to columns with a duplicate name.
      :type suffix: str

      .. rubric:: Examples

      >>> df1.inner_join(df2)
      >>> df1.inner_join(df2, on = 'x')
      >>> df1.inner_join(df2, left_on = 'left_x', right_on = 'x')


   .. py:method:: left_join(df, left_on=None, right_on=None, on=None, suffix='_right')

      Perform a left join

      :param df: Lazy DataFrame to join with.
      :type df: Tibble
      :param left_on: Join column(s) of the left DataFrame.
      :type left_on: str, list
      :param right_on: Join column(s) of the right DataFrame.
      :type right_on: str, list
      :param on: Join column(s) of both DataFrames. If set, `left_on` and `right_on` should be None.
      :type on: str, list
      :param suffix: Suffix to append to columns with a duplicate name.
      :type suffix: str

      .. rubric:: Examples

      >>> df1.left_join(df2)
      >>> df1.left_join(df2, on = 'x')
      >>> df1.left_join(df2, left_on = 'left_x', right_on = 'x')


   .. py:method:: mutate(*args, by=None, **kwargs)

      Add or modify columns

      :param \*args: Column expressions to add or modify
      :type \*args: Expr
      :param by: Columns to group by
      :type by: str, list
      :param \*\*kwargs: Column expressions to add or modify
      :type \*\*kwargs: Expr

      .. rubric:: Examples

      >>> df = tp.Tibble({'a': range(3), 'b': range(3), c = ['a', 'a', 'b']})
      >>> df.mutate(double_a = col('a') * 2,
      ...           a_plus_b = col('a') + col('b'))
      >>> df.mutate(row_num = row_number(), by = 'c')


   .. py:method:: full_join(df, left_on=None, right_on=None, on=None, suffix: str = '_right')

      Perform an full join

      :param df: Lazy DataFrame to join with.
      :type df: Tibble
      :param left_on: Join column(s) of the left DataFrame.
      :type left_on: str, list
      :param right_on: Join column(s) of the right DataFrame.
      :type right_on: str, list
      :param on: Join column(s) of both DataFrames. If set, `left_on` and `right_on` should be None.
      :type on: str, list
      :param suffix: Suffix to append to columns with a duplicate name.
      :type suffix: str

      .. rubric:: Examples

      >>> df1.full_join(df2)
      >>> df1.full_join(df2, on = 'x')
      >>> df1.full_join(df2, left_on = 'left_x', right_on = 'x')


   .. py:method:: pivot_longer(cols=everything(), names_to='name', values_to='value')

      Pivot data from wide to long

      :param cols: List of the columns to pivot. Defaults to all columns.
      :type cols: Expr
      :param names_to: Name of the new "names" column.
      :type names_to: str
      :param values_to: Name of the new "values" column
      :type values_to: str

      .. rubric:: Examples

      >>> df = tp.Tibble({'id': ['id1', 'id2'], 'a': [1, 2], 'b': [1, 2]})
      >>> df.pivot_longer(cols = ['a', 'b'])
      >>> df.pivot_longer(cols = ['a', 'b'], names_to = 'stuff', values_to = 'things')


   .. py:method:: pivot_wider(names_from='name', values_from='value', id_cols=None, values_fn='first', values_fill=None)

      Pivot data from long to wide

      :param names_from: Column to get the new column names from.
      :type names_from: str
      :param values_from: Column to get the new column values from
      :type values_from: str
      :param id_cols: A set of columns that uniquely identifies each observation.
                      Defaults to all columns in the data table except for the columns specified in
                      `names_from` and `values_from`.
      :type id_cols: str, list
      :param values_fn: Function for how multiple entries per group should be dealt with.
                        Any of 'first', 'count', 'sum', 'max', 'min', 'mean', 'median', 'last'
      :type values_fn: str
      :param values_fill: If values are missing/null, what value should be filled in.
                          Can use: "backward", "forward", "mean", "min", "max", "zero", "one"
      :type values_fill: str

      .. rubric:: Examples

      >>> df = tp.Tibble({'id': [1, 1], 'variable': ['a', 'b'], 'value': [1, 2]})
      >>> df.pivot_wider(names_from = 'variable', values_from = 'value')


   .. py:method:: pull(var=None)

      Extract a column as a series

      :param var: Name of the column to extract. Defaults to the last column.
      :type var: str

      .. rubric:: Examples

      >>> df = tp.Tibble({'a': range(3), 'b': range(3))
      >>> df.pull('a')


   .. py:method:: relocate(*args, before=None, after=None)

      Move a column or columns to a new position

      :param \*args: Columns to move
      :type \*args: str, Expr

      .. rubric:: Examples

      >>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
      >>> df.relocate('a', before = 'c')
      >>> df.relocate('b', after = 'c')


   .. py:method:: rename(*args, **kwargs)

      Rename columns

      :param \*args: Dictionary mapping of new names
      :type \*args: dict
      :param \*\*kwargs: key-value pair of new name from old name
      :type \*\*kwargs: str

      .. rubric:: Examples

      >>> df = tp.Tibble({'x': range(3), 't': range(3), 'z': ['a', 'a', 'b']})
      >>> df.rename(new_x = 'x') # dplyr interface
      >>> df.rename({'x': 'new_x'}) # pandas interface


   .. py:method:: replace_null(replace=None)

      Replace null values

      :param replace: Dictionary of column/replacement pairs
      :type replace: dict

      .. rubric:: Examples

      >>> df = tp.Tibble(x = [0, None], y = [None, None])
      >>> df.replace_null(dict(x = 1, y = 2))


   .. py:method:: separate(sep_col, into, sep='_', remove=True)

      Separate a character column into multiple columns

      :param sep_col: Column to split into multiple columns
      :type sep_col: str
      :param into: List of new column names
      :type into: list
      :param sep: Separator to split on. Default to '_'
      :type sep: str
      :param remove: If True removes the input column from the output data frame
      :type remove: bool

      .. rubric:: Examples

      >>> df = tp.Tibble(x = ['a_a', 'b_b', 'c_c'])
      >>> df.separate('x', into = ['left', 'right'])


   .. py:method:: set_names(nm=None)

      Change the column names of the data frame

      :param nm: A list of new names for the data frame
      :type nm: list

      .. rubric:: Examples

      >>> df = tp.Tibble(x = range(3), y = range(3))
      >>> df.set_names(['a', 'b'])


   .. py:method:: select(*args)

      Select or drop columns

      :param \*args: Columns to select
      :type \*args: str, Expr

      .. rubric:: Examples

      >>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
      >>> df.select('a', 'b')
      >>> df.select(col('a'), col('b'))


   .. py:method:: slice(*args, by=None)

      Grab rows from a data frame

      :param \*args: Rows to grab
      :type \*args: int, list
      :param by: Columns to group by
      :type by: str, list

      .. rubric:: Examples

      >>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
      >>> df.slice(0, 1)
      >>> df.slice(0, by = 'c')


   .. py:method:: slice_head(n=5, *, by=None)

      Grab top rows from a data frame

      :param n: Number of rows to grab
      :type n: int
      :param by: Columns to group by
      :type by: str, list

      .. rubric:: Examples

      >>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
      >>> df.slice_head(2)
      >>> df.slice_head(1, by = 'c')


   .. py:method:: slice_tail(n=5, *, by=None)

      Grab bottom rows from a data frame

      :param n: Number of rows to grab
      :type n: int
      :param by: Columns to group by
      :type by: str, list

      .. rubric:: Examples

      >>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
      >>> df.slice_tail(2)
      >>> df.slice_tail(1, by = 'c')


   .. py:method:: summarise(*args, by=None, **kwargs)

      Alias for `.summarize()`


   .. py:method:: summarize(*args, by=None, **kwargs)

      Aggregate data with summary statistics

      :param \*args: Column expressions to add or modify
      :type \*args: Expr
      :param by: Columns to group by
      :type by: str, list
      :param \*\*kwargs: Column expressions to add or modify
      :type \*\*kwargs: Expr

      .. rubric:: Examples

      >>> df = tp.Tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']})
      >>> df.summarize(avg_a = tp.mean(col('a')))
      >>> df.summarize(avg_a = tp.mean(col('a')),
      ...              by = 'c')
      >>> df.summarize(avg_a = tp.mean(col('a')),
      ...              max_b = tp.max(col('b')))


   .. py:method:: tail(n=5, *, by=None)

      Alias for `.slice_tail()`


   .. py:method:: to_dict(as_series=True)

      Aggregate data with summary statistics

      :param as_series: If True - returns the dict values as Series
                        If False - returns the dict values as lists
      :type as_series: bool

      .. rubric:: Examples

      >>> df.to_dict()
      >>> df.to_dict(as_series = False)


   .. py:method:: to_pandas()

      Convert to a pandas DataFrame

      .. rubric:: Examples

      >>> df.to_pandas()


   .. py:method:: to_polars()

      Convert to a polars DataFrame

      .. rubric:: Examples

      >>> df.to_polars()


   .. py:method:: unite(col='_united', unite_cols=[], sep='_', remove=True)

      Unite multiple columns by pasting strings together

      :param col: Name of the new column
      :type col: str
      :param unite_cols: List of columns to unite
      :type unite_cols: list
      :param sep: Separator to use between values
      :type sep: str
      :param remove: If True removes input columns from the data frame
      :type remove: bool

      .. rubric:: Examples

      >>> df = tp.Tibble(a = ["a", "a", "a"], b = ["b", "b", "b"], c = range(3))
      >>> df.unite("united_col", unite_cols = ["a", "b"])


   .. py:method:: write_csv(file=None, has_headers=True, sep=',')

      Write a data frame to a csv


   .. py:method:: write_parquet(file=str, compression='snappy', use_pyarrow=False, **kwargs)

      Write a data frame to a parquet


.. py:function:: desc(x)

   Mark a column to order in descending


.. py:function:: from_polars(df)

   Convert from polars DataFrame to Tibble

   :param df: pl.DataFrame to convert to a Tibble
   :type df: DataFrame

   .. rubric:: Examples

   >>> tp.from_polars(df)


.. py:function:: from_pandas(df)

   Convert from pandas DataFrame to Tibble

   :param df: pd.DataFrame to convert to a Tibble
   :type df: DataFrame

   .. rubric:: Examples

   >>> tp.from_pandas(df)