

# Sqoop2开发指南

Sqoop1一般使用shell脚本方式进行操作，Sqoop2则引入了Sqoop
server，对Connector实现了集中的管理。其访问方式也变得多样化，可以通过
REST API、JAVA API、WEB UI以及CLI控制台方式进行访问，与Sqoop的架构和使用方式完全不同。

如果您创建集群勾选了Sqoop2，Sqoop2将被安装在uhadoop-\*\*\*\*\*\*-master2节点上。

## 1. 从MySQL导出到HDFS

### 1.1 创建mysql数据

Mysql数据源以UDB为例

连接Mysql,并创建数据库sqoop和表sqoop

```
[root@uhadoop-******-master2 ~]# mysql -h10.13.124.35 -uroot -p'密码' sqoop

mysql> create database sqoop;
Query OK, 1 row affected (0.00 sec)

mysql> use sqoop
Database changed
mysql> CREATE TABLE `sqoop` (
    ->   `id` varchar(64) DEFAULT NULL,
    ->   `value` varchar(64) NOT NULL DEFAULT '',
    ->     PRIMARY KEY (`id`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into sqoop values ('1', 'hha'),('2', 'zhang'),('3','hehe');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0
```

### 1.2 启动Sqoop2 CLI

uhadoop-\*\*\*\*\*\*-master2节点上执行

```
/home/hadoop/sqoop2/bin/sqoop.sh client
```

help可查到基本使用命令

```
sqoop:000> help
For information about Sqoop, visit: http://sqoop.apache.org/

Available commands:
  exit    (\x  ) Exit the shell
  history (\H  ) Display, manage and recall edit-line history
  help    (\h  ) Display this help message
  set     (\st ) Configure various client options and settings
  show    (\sh ) Display various objects and configuration options
  create  (\cr ) Create new object in Sqoop repository
  delete  (\d  ) Delete existing object in Sqoop repository
  update  (\up ) Update objects in Sqoop repository
  clone   (\cl ) Create new object based on existing one
  start   (\sta) Start job
  stop    (\stp) Stop job
  status  (\stu) Display status of a job
  enable  (\en ) Enable object in Sqoop repository
  disable (\di ) Disable object in Sqoop repository

For help on a specific command type: help command
```

### 1.3 查看connector

```
sqoop:000> show connector
```

<!-- image-todo -->

### 1.4 创建mysql-link

```
sqoop:000> create link -c 4
Creating link for connector with id 4
Please fill following values to create new link object
Name: mysql-link

Link configuration

JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://10.13.124.35/sqoop
Username: root
Password: **********
JDBC Connection Properties:
There are currently 0 values in the map:
entry# protocol=tcp
There are currently 1 values in the map:
protocol = tcp
entry#
New link was successfully created with validation status OK and persistent id 1
```

### 1.5 创建hdfs-link

```
sqoop:000> create link -c 3
Creating link for connector with id 3
Please fill following values to create new link object
Name: hdfs-link

Link configuration

HDFS URI: hdfs://Ucluster/
New link was successfully created with validation status OK and persistent id 2
```

### 1.6 显示link

```
sqoop:000> show link
```

<!-- image-todo -->

### 1.7 创建job

```
sqoop:000> create job -f 1 -t 2
Creating job for links with from id 1 and to id 2
Please fill following values to create new job object
Name: job-mysql-to-hdfs

From database configuration

Schema name: sqoop
Table name: sqoop
Table SQL statement:
Table column names:
Partition column name: id
Null value allowed for the partition column:
Boundary query:

ToJob configuration

Override null value:
Null value:
Output format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
Choose: 0
Compression format:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom compression format:
Output directory: /tmp/sqoop

Throttling resources

Extractors: 1
Loaders: 1
New job was successfully created with validation status OK  and persistent id 1
```

创建完job如下

```
sqoop:000> show job
```

<!-- image-todo -->

### 1.8 启动job

```
sqoop:000> start job -j 1
Submission details
Job ID: 1
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2016-12-22 14:40:10 CST
Lastly updated by: root
External ID: job_1481968387780_0011
    http://uhadoop-penomi-master1:23188/proxy/application_1481968387780_0011/
2016-12-22 14:40:10 CST: BOOTING  - Progress is not available
```

## 2. 从HDFS导出到MySQL

这里利用MySQL导入HDFS中创建的hdsf-link和mysql-link

### 2.1 创建job

```
sqoop:000> create job -f 2 -t 1
Creating job for links with from id 2 and to id 1
Please fill following values to create new job object
Name: job-hdfs-to-mysql

From Job configuration

Input directory: /tmp/sqoop-input
Override null value:
Null value:

To database configuration

Schema name: sqoop
Table name: sqoop
Table SQL statement:
Table column names:
Stage table name:
Should clear stage table:

Throttling resources

Extractors: 1
Loaders: 1
New job was successfully created with validation status OK  and persistent id 2
```

添加完job如下

<!-- image-todo -->

### 2.2 执行job

```
sqoop:000> start job -j 2
Submission details
Job ID: 2
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2016-12-22 15:38:00 CST
Lastly updated by: root
External ID: job_1481968387780_0012
    http://uhadoop-penomi-master1:23188/proxy/application_1481968387780_0012/
2016-12-22 15:38:00 CST: BOOTING  - Progress is not available
```

### 2.3 查看MySQL数据

```
mysql> select * from sqoop;
+----+--------+
| id | value  |
+----+--------+
| 1  | hha    |
| 2  | zhang  |
| 3  | hehe   |
| 4  | zhen   |
| 5  | u      |
+----+--------+
6 rows in set (0.00 sec)
```
