pdmongo

pdmongo.read_mongo(collection: str, query: List[Dict[str, Any]], db: Union[str, pymongo.database.Database], index_col: Union[str, List[str], None] = None, extra: Optional[Dict[str, Any]] = None, chunksize: Optional[int] = None) → pandas.core.frame.DataFrame

Read MongoDB query into a DataFrame.

Returns a DataFrame corresponding to the result set of the query. Optionally provide an index_col parameter to use one or more of the columns as the index; otherwise, the default integer index is used.

Parameters:
  • collection (str) – Name of the MongoDB collection to query.
  • query (list) – Must be an aggregation pipeline, i.e. a list of stage documents. It is passed to pymongo's Collection.aggregate().
  • db (pymongo.database.Database or str) – The database to use, given as a Database object or as a connection string URI.
  • index_col (str or list of str, optional, default: None) – Column(s) to set as the index; a list produces a MultiIndex.
  • extra (dict, optional, default: None) – Additional parameters to pass to the aggregate method.
  • chunksize (int, optional, default: None) – If specified, return an iterator where chunksize is the number of documents to include in each chunk.
Returns:

DataFrame
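
A minimal usage sketch of read_mongo. The collection name `orders`, the fields `status`, `customer` and `total`, and the connection string are hypothetical, and it assumes `extra` is forwarded as keyword arguments to pymongo's Collection.aggregate() (where `allowDiskUse` is a documented option). Because the real call needs a reachable MongoDB server, the sketch also emulates the resulting DataFrame locally: `docs` stands in for the documents the pipeline would return, and `DataFrame.from_records(...).set_index(...)` mimics the index_col behaviour described above. That emulation is an assumption about the output shape, not pdmongo's implementation.

```python
import pandas as pd

# Hypothetical aggregation pipeline: keep paid orders, then project the
# fields of interest (dropping MongoDB's _id field).
pipeline = [
    {"$match": {"status": "paid"}},
    {"$project": {"_id": 0, "customer": 1, "total": 1}},
]


def load_orders(uri: str) -> pd.DataFrame:
    """Run the pipeline through pdmongo (requires a reachable MongoDB server)."""
    import pdmongo  # imported lazily so the local emulation below runs without it

    return pdmongo.read_mongo(
        "orders",                      # hypothetical collection name
        pipeline,                      # aggregation pipeline (list of stages)
        uri,                           # hypothetical URI, e.g. "mongodb://localhost:27017/mydb"
        index_col="customer",          # use the "customer" field as the index
        extra={"allowDiskUse": True},  # assumed to be forwarded to aggregate()
    )


# Local emulation of the result shape: these dicts stand in for the
# documents the pipeline above would yield.
docs = [
    {"customer": "alice", "total": 30.0},
    {"customer": "bob", "total": 12.5},
]
frame = pd.DataFrame.from_records(docs).set_index("customer")
print(frame)
```

Passing a list such as index_col=["customer", "status"] would instead give a MultiIndex, matching the index_col description.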

pdmongo.to_mongo(frame: pandas.core.frame.DataFrame, name: str, db: Union[str, pymongo.database.Database], if_exists: Optional[str] = 'fail', index: Optional[bool] = True, index_label: Union[str, Sequence[str], None] = None, chunksize: Optional[int] = None) → Union[List[pymongo.results.InsertManyResult], pymongo.results.InsertManyResult]

Write records stored in a DataFrame to a MongoDB collection.

Parameters:
  • frame (DataFrame, Series) – The data to write.
  • name (str) – Name of the collection.
  • db (pymongo.database.Database or str) – The database to write to, given as a Database object or as a connection string URI.
  • if_exists ({‘fail’, ‘replace’, ‘append’}, default ‘fail’) – How to behave if the collection already exists.
    • fail: If the collection exists, do nothing.
    • replace: If the collection exists, drop it, recreate it, and insert the data.
    • append: If the collection exists, insert the data into it; otherwise, create it.
  • index (boolean, default True) – Write the DataFrame index as a column.
  • index_label (str or sequence, optional) – Column label for index column(s). If None is given (default) and index is True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
  • chunksize (int, optional) – Number of rows to write in each batch. By default, all rows are written at once.
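
A minimal usage sketch of to_mongo. The data, the collection name `order_totals` and the connection string are hypothetical. Since the real call needs a reachable MongoDB server, the sketch also emulates locally, with plain pandas, the documents one would expect index=True and index_label="customer" to produce, and how chunksize splits them into batches. Both emulations are assumptions about behaviour, not pdmongo's code.

```python
import pandas as pd

# Hypothetical data: order totals per customer, with an unnamed index.
frame = pd.DataFrame({"total": [30.0, 12.5]}, index=["alice", "bob"])


def save_orders(uri: str):
    """Write `frame` with pdmongo (requires a reachable MongoDB server)."""
    import pdmongo  # imported lazily so the emulation below runs without it

    return pdmongo.to_mongo(
        frame,
        "order_totals",          # hypothetical collection name
        uri,                     # hypothetical connection string URI
        if_exists="replace",     # drop an existing collection, then insert
        index=True,              # write the index as a field...
        index_label="customer",  # ...named "customer"
        chunksize=1000,          # insert at most 1000 rows per batch
    )


# Local emulation (an assumption, not pdmongo's code) of the documents
# written with index=True and index_label="customer": the index becomes an
# ordinary field alongside the DataFrame's columns.
docs = frame.rename_axis("customer").reset_index().to_dict("records")

# Emulated batching: with a chunk size of 1, each row forms its own batch,
# mirroring how chunksize splits one large insert into several writes.
chunk = 1
batches = [docs[i:i + chunk] for i in range(0, len(docs), chunk)]
print(docs)
print(batches)
```

Per the signature, the call returns either a single InsertManyResult or a list of them; the list form presumably corresponds to batched writes, but that is an inference from the type annotation only.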