How to see all the databases and Tables in Databricks

2024/9/19 9:34:34

I want to list all the tables in every database in Azure Databricks.

I want the output to look something like this:

Database | Table_name
Database1 | Table_1
Database1 | Table_2
Database1 | Table_3
Database2 | Table_1
etc..

This is what i have at the moment:

from pyspark.sql.types import *

DatabaseDF = spark.sql(f"show databases")
df = spark.sql(f"show Tables FROM {DatabaseDF}")
#df = df.select("databaseName")
#list = [x["databaseName"] for x in df.collect()]
print(DatabaseDF)
display(DatabaseDF)

df = spark.sql(f"show Tables FROM {schemaName}")
df = df.select("TableName")
list = [x["TableName"] for x in df.collect()]

## Iterate through list of schema
for x in list:
    ###  INPUT Required: Change for target table
    tempTable = x
    df2 = spark.sql(f"SELECT COUNT(*) FROM {schemaName}.{tempTable}").collect()
    for x in df2:
        rowCount = x[0]
        if rowCount == 0:
            print(schemaName + "." + tempTable + " has 0 rows")

But I'm not quite getting the results I expect.

Answer

The Spark session has a catalog property, which is probably what you are looking for:

spark.catalog.listDatabases()
spark.catalog.listTables("database_name")

listDatabases returns the list of databases you have.
listTables returns the list of tables in a given database.

For example, you can do something like this:

[
    (table.database, table.name)
    for database in spark.catalog.listDatabases()
    for table in spark.catalog.listTables(database.name)
]

This gives you a list of (database, table) pairs.
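To get from those pairs to the two-column layout asked for in the question, something along these lines might work. It is only a sketch: the function name catalog_table_pairs is made up for illustration, spark is assumed to be the session a Databricks notebook already provides, and skipping temporary views is my own choice (listTables returns them for every database, so they would otherwise repeat).

```python
def catalog_table_pairs(spark):
    """Collect (database, table) pairs through the catalog API.

    `spark` is assumed to be an existing SparkSession. Temporary views
    are skipped because listTables() reports them under every database.
    """
    return [
        (database.name, table.name)
        for database in spark.catalog.listDatabases()
        for table in spark.catalog.listTables(database.name)
        if not table.isTemporary
    ]


# Usage in a notebook where `spark` is already defined (not run here):
# pairs = catalog_table_pairs(spark)
# result = spark.createDataFrame(pairs, "Database string, Table_name string")
# display(result)
```

Passing the schema as a string keeps createDataFrame from having to infer column types, which would fail if the list of pairs happened to be empty.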


EDIT (thanks @Alex Ott): although this solution works fine, it is quite slow. Running SQL commands such as show databases or show tables in ... directly should do the job faster.
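A rough sketch of that SQL-based variant is below. The function name sql_table_pairs is hypothetical, spark is again assumed to be the notebook's existing session, and the code assumes that the first column of SHOW DATABASES holds the database name (its column name differs between Spark versions, so it is read by position) and that SHOW TABLES returns tableName and isTemporary columns.

```python
def sql_table_pairs(spark):
    """Collect (database, table) pairs with SHOW DATABASES / SHOW TABLES.

    `spark` is assumed to be an existing SparkSession.
    """
    pairs = []
    for db_row in spark.sql("SHOW DATABASES").collect():
        # Read by position: the column is called databaseName or
        # namespace depending on the Spark version.
        db_name = db_row[0]
        for tbl_row in spark.sql(f"SHOW TABLES IN `{db_name}`").collect():
            # Temporary views show up for every database, so skip them.
            if not tbl_row["isTemporary"]:
                pairs.append((db_name, tbl_row["tableName"]))
    return pairs


# Usage in a notebook where `spark` is already defined (not run here):
# pairs = sql_table_pairs(spark)
# display(spark.createDataFrame(pairs, "Database string, Table_name string"))
```

This still issues one query per database, so how much faster it is than the catalog calls will depend on the workspace.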

