How to migrate from MySQL to TiDB

ChandraSekhar Saripaka
3 min readMar 7, 2023

Considerations:

  • Database size exceeds in exponentially.
  • 99th percentile of the response time is in the order of ms.
  • Scale went 10 times more on all the parameters mentioned above.
  • Real time analytics information for a reporting dashboard.

Introduction:

With reference to the above stated problem, we recommend to use TiDB in a NewSQL system mode with Hybrid Transactional and Analytical capabilities, with a containerized approach. TiDB provides both OLTP and OLAP capabilities in the HTAP mode. This is possible by having a row and columnar stores in TiKV and TiFlash respectively.

Approach:

The first step is to estimate the current capacity of MySQL and gather certain metrics from the production setup, to understand volume of data, QPS, TPS, percentage of growth, projected growth.

The second step would be to design a scalable data architecture. Then plan the initial capacity and sizing required for the TiDB cluster. We propose to setup TiDB on kubernetes cluster.

Overall Architecture:

Block Diagram:

The below block diagram is a high level overview , showcases various components/ practices/ methodologies that are required for building a manageable, secure, scalable, high performance, efficient, elastic, highly available, fault tolerant and recoverable architecture for TiDB on kubernetes.

TiDB K8s Based Architecture:

Custom Resources:

Storage Architecture:

Data Migration:

TiDB uses a data migration module as below.There is a need to set up DM Cluster for migrating from MySQL to TiDB.​​

Data Ingestion:

Spark ecosystem for data processing on TiDB:

  • TiSpark: Data analysis and ETLs
  • TiKV: Data retrieval
  • Scheduling system: Report generation

Data Analysis:

Using both TiFlash and TiSpark, which enables HTAP.

TiSpark is a thin layer built for running Apache Spark on top of TiDB/TiKV to answer the complex OLAP queries. It takes advantages of both the Spark platform and the distributed TiKV cluster and seamlessly glues to TiDB, the distributed OLTP database, to provide a Hybrid Transactional/Analytical Processing (HTAP) solution to serve as a one-stop solution for both online transactions and analysis.

TiFlash is another tool that enables HTAP. Both TiFlash and TiSpark allow the use of multiple hosts to execute OLAP queries on OLTP data. TiFlash stores data in a columnar format, which allows more efficient analytical queries. TiFlash and TiSpark can be used together.

Data Replication:

TiCDC

User scenario

This tool is implemented by pulling TiKV change logs. It can restore cluster data to a consistent state with any upstream TSO, and support other systems to subscribe to data changes.

Upstream : TiDB

Downstream : TiDB, MySQL, Apache Pulsar, Kafka, Confluent

Advantages : Provide TiCDC Open Protocol

Limitation

TiCDC only replicates tables that have at least one valid index. The following scenarios are not supported:

  • The TiKV cluster that uses RawKV alone.
  • The DDL operation CREATE SEQUENCE and the SEQUENCE function in TiDB.

TiDB Binlog

User scenario

Incremental replication between TiDB clusters, such as using one TiDB cluster as the secondary cluster of another TiDB cluster

Upstream: TiDB

Downstream (or the output file): TiDB, MySQL, Kafka, incremental backup files

Advantages

Support real-time backup and restore. Back up TiDB cluster data to be restored for disaster recovery

Limitation

TiDB Binlog is incompatible with some TiDB versions. It is recommended that you use TiCDC.

Backup and Recovery:

--

--

ChandraSekhar Saripaka

Software Engineer with expertise in Bigdata and Distributed Systems.