Use Apache Airflow with Teradata Vantage
Overview
This document provides detailed instructions and guidance on using Apache Airflow with Teradata Vantage to create a table using the TeradataOperator. It covers the setup, configuration, and execution steps required to create a table within Teradata Vantage. The TeradataOperator is specifically designed for executing queries on Teradata databases.
Use the Windows Subsystem for Linux (WSL) on Windows to try this quickstart example.
Prerequisites
- Access to a Teradata Vantage instance, version 17.10 or higher.
  If you need a test instance of Vantage, you can provision one for free at https://clearscape.teradata.com.
- Python 3.8, 3.9, 3.10, or 3.11, with python3-venv and python3-pip installed.

  On Linux or WSL:

  sudo apt install -y python3-venv python3-pip

  On macOS:

  brew install python

  Refer to the Installation Guide if you face any issues.
Install Apache Airflow
-
Create a new Python environment to manage Airflow and its dependencies, then activate it.

python3 -m venv airflow_env
source airflow_env/bin/activate
AIRFLOW_VERSION=2.9.3
PYTHON_VERSION="$(python3 --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
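The constraint file URL must match the minor version of the Python interpreter in use; the same derivation as the shell pipeline above can be sketched in Python (an illustration only, not part of the install steps):

```python
import sys

AIRFLOW_VERSION = "2.9.3"
# Equivalent of: python3 --version | cut -d " " -f 2 | cut -d "." -f 1-2
python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
constraint_url = (
    "https://raw.githubusercontent.com/apache/airflow/"
    f"constraints-{AIRFLOW_VERSION}/constraints-{python_version}.txt"
)
print(constraint_url)
```

Pinning the install to this constraint file ensures pip resolves every Airflow dependency to a combination tested for that exact Airflow and Python version pair.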
-
Install the Apache Airflow Teradata provider package.
pip install "apache-airflow-providers-teradata"
-
Set the AIRFLOW_HOME environment variable.
export AIRFLOW_HOME=~/airflow
Configure Apache Airflow
-
Switch to the virtual environment where Apache Airflow was installed in the Install Apache Airflow step.
source airflow_env/bin/activate
-
Set the following environment variables to enable the Test connection button and to prevent sample DAGs and default connections from loading in the Airflow UI.

export AIRFLOW__CORE__TEST_CONNECTION=Enabled
export AIRFLOW__CORE__LOAD_EXAMPLES=false
export AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS=false
Start the Apache Airflow web server
-
Run Airflow's web server.
airflow standalone
-
Access the Airflow UI. Visit http://localhost:8080 in the browser and log in with the admin account details shown in the terminal.
Define the Apache Airflow connection to Vantage
The Teradata connection in Airflow can be defined in either of the following ways:
-
Using the Airflow Web UI
-
Using Environment Variables
Define a Teradata connection in Apache Airflow Web UI
-
Open the Admin → Connections section of the UI. Click the Create link to create a new connection.
-
Fill in input details in New Connection Page.
-
Connection Id: Unique ID of Teradata Connection.
-
Connection Type: Type of the system. Select Teradata.
-
Database Server URL (required): Teradata instance hostname to connect to.
-
Database (optional): Specify the name of the database to connect to.
-
Login (required): Specify the user name to connect.
-
Password (required): Specify the password to connect.
-
Click Test, then Save.
-
Refer to the Teradata Connection documentation for more details.
Define a Teradata connection as Environment Variable
The Teradata connection can be defined as environment variables in one of the following formats.
-
JSON format
-
URI format
The naming convention for environment variables is AIRFLOW_CONN_{CONN_ID} with all uppercase letters (note the single underscores surrounding CONN). For example, if your connection ID is teradata_default, the environment variable should be named AIRFLOW_CONN_TERADATA_DEFAULT.
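The naming convention above can be sketched in a couple of lines of Python (illustrative only; `conn_id` is any Airflow connection ID):

```python
def conn_env_var(conn_id: str) -> str:
    """Build the environment variable name Airflow looks up for a connection ID."""
    return f"AIRFLOW_CONN_{conn_id.upper()}"

print(conn_env_var("teradata_default"))  # AIRFLOW_CONN_TERADATA_DEFAULT
```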
JSON format example
export AIRFLOW_CONN_TERADATA_DEFAULT='{
"conn_type": "teradata",
"login": "teradata_user",
"password": "my-password",
"host": "my-host",
"schema": "my-schema",
"extra": {
"tmode": "TERA",
"sslmode": "verify-ca"
}
}'
URI format example
export AIRFLOW_CONN_TERADATA_DEFAULT='teradata://teradata_user:my-password@my-host/my-schema?tmode=TERA&sslmode=verify-ca'
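Before exporting either form, it can be useful to sanity-check it locally. The sketch below parses both representations with the standard library only (no Airflow required), using the placeholder values from the examples above:

```python
import json
from urllib.parse import urlsplit, parse_qs

# JSON form: must be valid JSON with the Teradata connection type.
json_conn = json.loads(
    '{"conn_type": "teradata", "login": "teradata_user", '
    '"password": "my-password", "host": "my-host", "schema": "my-schema", '
    '"extra": {"tmode": "TERA", "sslmode": "verify-ca"}}'
)
assert json_conn["conn_type"] == "teradata"

# URI form: the scheme is the connection type, the query string carries extras.
uri = "teradata://teradata_user:my-password@my-host/my-schema?tmode=TERA&sslmode=verify-ca"
parts = urlsplit(uri)
extras = parse_qs(parts.query)
print(parts.scheme, parts.hostname, extras["tmode"][0])  # teradata my-host TERA
```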
Refer to the Teradata Hook documentation for detailed information on Teradata connections in Airflow.
Define a DAG in Airflow
-
In Airflow, DAGs are defined as Python code.
-
Create a DAG as a Python file, for example sample.py, under the DAG_FOLDER ($AIRFLOW_HOME/dags) directory.

from datetime import datetime

from airflow import DAG
from airflow.providers.teradata.operators.teradata import TeradataOperator

CONN_ID = "teradata_default"

with DAG(
    dag_id="example_teradata_operator",
    max_active_runs=1,
    max_active_tasks=3,
    catchup=False,
    start_date=datetime(2023, 1, 1),
) as dag:
    create = TeradataOperator(
        task_id="table_create",
        teradata_conn_id=CONN_ID,
        sql="""
            CREATE TABLE my_users, FALLBACK (
                user_id decimal(10,0) NOT NULL GENERATED ALWAYS AS IDENTITY (
                    START WITH 1
                    INCREMENT BY 1
                    MINVALUE 1
                    MAXVALUE 2147483647
                    NO CYCLE),
                user_name VARCHAR(30)
            ) PRIMARY INDEX (user_id);
        """,
    )
Load DAG
Once the DAG file is copied to $AIRFLOW_HOME/dags, Apache Airflow displays the DAG in the UI under the DAGs section. It can take 2 to 3 minutes for the DAG to appear in the Apache Airflow UI.
Run DAG
DAGs can be executed in one of two ways:
-
Manually or via the API: You can trigger DAGs manually or through API calls.
-
On a scheduled basis: DAGs can be set to run according to a schedule defined within their configuration.
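For the first option, a run can also be triggered through Airflow's stable REST API. A minimal sketch, assuming the standalone server from earlier is listening on localhost:8080 (the request is built but deliberately not sent, since sending it requires a running server and an Authorization header with the admin credentials printed in the terminal):

```python
import json
from urllib import request

# dagRuns endpoint of Airflow's stable REST API for the example DAG.
url = "http://localhost:8080/api/v1/dags/example_teradata_operator/dagRuns"

# An empty "conf" triggers a run with default parameters.
payload = json.dumps({"conf": {}}).encode()

req = request.Request(
    url,
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# request.urlopen(req) would send the trigger; omitted here because it
# needs a live server and authentication.
print(req.get_method(), req.full_url)
```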
The example_teradata_operator DAG is configured to be triggered manually. To define a schedule, you can use any valid crontab expression as the schedule argument:
with DAG(
    dag_id="example_teradata_operator",
    max_active_runs=1,
    max_active_tasks=3,
    catchup=False,
    schedule="0 0 * * *",
    start_date=datetime(2023, 1, 1),
) as dag:
Run the DAG manually by clicking the trigger button next to example_teradata_operator in the Airflow UI.
Summary
This tutorial demonstrated how to use Apache Airflow along with the Airflow Teradata provider to interact with a Teradata Vantage instance. The example DAG shows how to create the my_users table in the Teradata Vantage instance specified in the connection.
Further reading
If you have any questions or need further assistance, please visit our community forum where you can get support and interact with other community members.