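Factor Analysis Module: Feature Overview

The factor analysis module of the quantitative investment research service platform provides the FactorAnalyst class, which analyzes the performance of a single factor for alpha strategies. It covers:

Mean return of the factor across quantiles.
Return distribution of the factor within each quantile.
Cumulative mean return of the stocks in each quantile.
The spread between the returns of the top and bottom quantiles, as a gauge of the factor's effectiveness.
IC values of the factor over different forward (observation) periods.
Grouped analysis by industry, to see how well the factor applies to different industries.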

1. Feature walkthrough

1.1. First, import the FactorAnalyst class, set the name of the factor to analyze, and create a FactorAnalyst object.

[1]:

%matplotlib inline
from smartbeta.factor_analyst import FactorAnalyst
from data_provider.datafeed.universe import Universe

[2]:

from_dt = "20170101"
to_dt = "20170901"
factor_name = "roegrowth1"
fa = FactorAnalyst()

1.2. Call the load_data method to read the factor data to analyze together with the market data.

[3]:

help(FactorAnalyst.load_data)

Help on function load_data in module smartbeta.factor_analyst:

load_data(self, factor_name, from_dt, to_dt, tickers=None, industry_group=[], sw_level=1, factor_parameters={})
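    Reads factor and return data and converts them into a format ready for analysis. This method must be called to load data before calling any analysis method.
    @param factor_name: string, name of the factor to analyze
    @param from_dt: string, analysis start date, format '20170101'
    @param to_dt: string, analysis end date, format '20170101'
    @param tickers: an index or an array of individual stocks; the default None selects all A-shares. If this is None and an industry_group is set, that sector within all A-shares is analyzed
    @param industry_group: array, names of the Shenwan (SW) sectors to analyze, e.g. ['申万银行', '申万非银金融']; the names must exist in the Shenwan industry classification at the level given by sw_level.
                           Defaults to an empty array, meaning the whole range given by tickers.
    @param sw_level: int, the Shenwan industry classification level that industry_group belongs to.
                     1 is the level-1 classification, 2 is level 2, 3 is level 3.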

[4]:

fa.load_data(factor_name, from_dt, to_dt, industry_group=[])

loading roegrowth1 from cache, time cost 0.71s
加载行情数据

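1.3. Call the analysis method and set the quantile grouping and the forward periods.

The forward period (observation period) is the horizon over which stocks selected by the factor are evaluated. For example: analyze how the top 10% of stocks ranked by the momentum_1m factor perform over the following month.

Here 10% is the quantile and one month is the forward period.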
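The two ideas can be illustrated without the platform. The following is a minimal sketch in plain Python; the tickers, factor values, prices, and both helper functions are hypothetical and are not part of FactorAnalyst.

```python
def forward_return(prices, start, period):
    """Simple return from day `start` to day `start + period`."""
    return prices[start + period] / prices[start] - 1.0


def top_fraction(factor_values, fraction):
    """Tickers whose factor value ranks in the top `fraction` of the universe."""
    ranked = sorted(factor_values, key=factor_values.get, reverse=True)
    count = max(1, int(len(ranked) * fraction))
    return ranked[:count]


# Hypothetical factor readings for ten made-up tickers on one day.
factor_values = {f"T{i:02d}": float(i) for i in range(10)}
# Hypothetical daily closes for the one ticker that ends up selected.
prices = {"T09": [10.0, 10.5, 11.0, 12.0]}

selected = top_fraction(factor_values, 0.10)  # quantile: top 10%
ret = forward_return(prices[selected[0]], start=0, period=3)  # forward period: 3 days
```

With ten tickers, the top 10% is a single stock, and its 3-day forward return is measured from the day the factor was observed.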

[5]:

help(FactorAnalyst.analysis)

Help on function analysis in module smartbeta.factor_analyst:

analysis(self, quantiles=5, forward_periods=(1, 5, 22), filter_zscore=10)
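    Call this method to set the number of factor-rank groups (quantiles) and the forward periods (forward_periods) to analyze
    @param quantiles: (int) number of factor-rank groups
    @param forward_periods: (tuple) forward periods
    @return: (DataFrame) table of returns by factor group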

[6]:

analy_data = fa.analysis(quantiles=10, forward_periods=(1,5,22,60), filter_zscore=3)

生成因子与预期收益表...
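The help output above does not describe filter_zscore. In comparable factor-analysis tools a parameter with this name is a cutoff, in standard deviations, beyond which forward returns are dropped as outliers. Whether FactorAnalyst behaves the same way is an assumption; the sketch below only shows that common interpretation, with hypothetical data and a hypothetical helper.

```python
from statistics import mean, pstdev


def filter_by_zscore(returns, threshold):
    """Keep returns within `threshold` standard deviations of the mean.

    Assumed semantics only: the platform's actual filter is undocumented.
    """
    mu = mean(returns)
    sigma = pstdev(returns)
    if sigma == 0:
        return list(returns)
    return [r for r in returns if abs(r - mu) <= threshold * sigma]


# Twenty ordinary hypothetical daily returns plus one extreme observation.
returns = [0.01, -0.02, 0.00, 0.02, -0.01] * 4 + [0.90]
kept = filter_by_zscore(returns, threshold=3)
```

Under this reading, filter_zscore=3 would discard the 0.90 observation and keep the other twenty.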

2. Running the analyses

2.1. Mean return of the factor across quantiles

[7]:

help(FactorAnalyst.analy_quantile_return)

Help on function analy_quantile_return in module smartbeta.factor_analyst:

analy_quantile_return(self, plot=True, by_group=False, demeaned=True)
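    Analyzes the return distribution of each quantile over each forward period. The analysis method must be called first to set the factor quantile grouping.
    @param plot: bool, whether to draw the plot
    @param by_group: whether to analyze by industry group
    @param demeaned: whether to apply a demeaning adjustment
    @return: (DataFrame) mean returns of each quantile over each forward period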

[8]:

ret1 = fa.analy_quantile_return(by_group=False)

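The x-axis of the chart shows the factor-value quantile: the leftmost group, 1, holds the 10% of stocks with the smallest factor values. For each quantile the chart shows the mean return after 1, 5, 22, and 60 days (the forward periods), and the y-axis shows that mean return.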
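The grouping behind such a chart can be sketched as follows. This is an illustration in plain Python, not the platform's implementation: the bucketing rule, the reading of demeaned as "subtract the universe mean", and all data are assumptions.

```python
from statistics import mean


def assign_quantiles(factor_values, quantiles):
    """Map each ticker to a bucket 1..quantiles by ascending factor rank."""
    ranked = sorted(factor_values, key=factor_values.get)
    n = len(ranked)
    return {ticker: i * quantiles // n + 1 for i, ticker in enumerate(ranked)}


def mean_return_by_quantile(factor_values, returns, quantiles, demeaned=True):
    """Mean forward return per bucket, optionally net of the universe mean."""
    buckets = assign_quantiles(factor_values, quantiles)
    universe_mean = mean(returns.values()) if demeaned else 0.0
    grouped = {}
    for ticker, q in buckets.items():
        grouped.setdefault(q, []).append(returns[ticker] - universe_mean)
    return {q: mean(values) for q, values in sorted(grouped.items())}


# Hypothetical universe in which returns rise with the factor value.
factor_values = {f"T{i:02d}": float(i) for i in range(20)}
returns = {f"T{i:02d}": 0.001 * i for i in range(20)}
result = mean_return_by_quantile(factor_values, returns, quantiles=10)
```

For a factor that works, the demeaned mean return climbs from negative in bucket 1 to positive in bucket 10, which is the pattern to look for in the chart.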

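2.2. Return distribution of the factor within each quantile

The horizontal width of each violin shows how many stocks in that group have returns at that level, and the three horizontal lines inside it mark the 25%, 50%, and 75% quantiles of the return distribution.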

[9]:

help(FactorAnalyst.analy_quantile_returns_violin)

Help on function analy_quantile_returns_violin in module smartbeta.factor_analyst:

analy_quantile_returns_violin(self, plot=True, demeaned=True)
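    Analyzes the return distribution of each quantile over each forward period as a violin plot. The analysis method must be called first to set the factor quantile grouping.
    @param by_group: whether to analyze by industry group
    @param demeaned: whether to apply a demeaning adjustment
    @return: (DataFrame) daily mean returns of each quantile over each forward period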

[10]:

ret2 = fa.analy_quantile_returns_violin()
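The three inner lines of a violin correspond to the quartiles of the returns in that group. As a rough illustration with hypothetical returns, the standard library computes them like this:

```python
from statistics import quantiles

# Hypothetical demeaned forward returns for the stocks in one factor bucket.
bucket_returns = [-0.04, -0.02, -0.01, 0.00, 0.01, 0.02, 0.03, 0.05, 0.08]

# n=4 splits the data into quartiles; "inclusive" treats the data as the
# whole population, so the cut points are interpolated between observed values.
q1, median, q3 = quantiles(bucket_returns, n=4, method="inclusive")
```

A tall gap between q1 and q3 means returns inside the bucket are widely dispersed, even when the bucket's mean return looks attractive.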

2.3. Cumulative mean return of the factor

[11]:

help(FactorAnalyst.analy_cumulative_returns_by_quantile)

Help on function analy_cumulative_returns_by_quantile in module smartbeta.factor_analyst:

analy_cumulative_returns_by_quantile(self, plot=True, forward_periods=[1], demeaned=True)
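    Return analysis of the different quantiles over a single forward period (line chart). The analysis method must be called first to set the factor quantile grouping.
    @param forward_periods: forward periods; each element must appear in
    @param by_group: whether to analyze by industry group
    @param demeaned: whether to apply a demeaning adjustment
    @return: (DataFrame) daily mean returns of the different quantiles over a single forward period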

[12]:

ret3 = fa.analy_cumulative_returns_by_quantile(forward_periods=[1,5,22,60])
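A cumulative return curve is built from the per-day mean returns of a quantile. The sketch below assumes compounding; the platform may accumulate differently, and the data and helper are hypothetical.

```python
def cumulative_returns(period_returns):
    """Compound a series of per-period returns into a cumulative return path."""
    path, growth = [], 1.0
    for r in period_returns:
        growth *= 1.0 + r
        path.append(growth - 1.0)
    return path


# Hypothetical daily mean returns of the lowest (1) and highest (10) buckets.
daily_mean = {1: [-0.01, 0.00, -0.02], 10: [0.01, 0.02, 0.01]}
curves = {q: cumulative_returns(r) for q, r in daily_mean.items()}
```

Curves that fan out steadily, with the top bucket above the bottom one, suggest the factor separates winners from losers consistently over time.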

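2.4. Distribution of factor returns over time

The chart shows, over time, the spread between the return of the stocks in the highest factor quantile and the return of those in the lowest.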

[13]:

help(FactorAnalyst.analy_quantile_returns_spread_time_series)

Help on function analy_quantile_returns_spread_time_series in module smartbeta.factor_analyst:

analy_quantile_returns_spread_time_series(self, plot=True, lower_quant=1, upper_quant=5)
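    Daily return analysis of the quantiles over a single forward period: the return of the highest quantile minus the return of the lowest quantile (the spread). The analysis method must be called first to set the factor quantile grouping.
    @param plot: whether to draw the chart
    @param lower_quant: the lowest quantile to analyze
    @param upper_quant: the highest quantile to analyze
    @return: (DataFrame) daily returns of the different quantiles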

[14]:

ret4 = fa.analy_quantile_returns_spread_time_series(lower_quant=1, upper_quant=10)
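The spread itself is a simple day-by-day difference. A minimal sketch with hypothetical mean returns for the two extreme buckets:

```python
def quantile_spread(upper_returns, lower_returns):
    """Per-day return of the upper bucket minus that of the lower bucket."""
    return [u - l for u, l in zip(upper_returns, lower_returns)]


# Hypothetical daily mean returns of quantile 10 (top) and quantile 1 (bottom).
top = [0.012, -0.004, 0.009]
bottom = [0.002, -0.010, 0.011]

spread = quantile_spread(top, bottom)
mean_spread = sum(spread) / len(spread)
```

A spread that stays positive on most days, rather than a positive average driven by a few large days, is the stronger sign of an effective factor.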

2.5. IC time series over different forward periods

[15]:

help(FactorAnalyst.analy_ic)

Help on function analy_ic in module smartbeta.factor_analyst:

analy_ic(self, plot=True)
    IC analysis. The analysis method must be called first to set the factor quantile grouping.
    @param plot: whether to draw the chart
    @return: (DataFrame) daily table of IC values

[16]:

ic = fa.analy_ic()

[17]:

ic.head()

	1	5	22	60
date
2017-01-03	0.031261	0.107841	0.150557	0.131578
2017-01-04	0.042074	0.113026	0.143347	0.128513
2017-01-05	0.058258	0.083486	0.132134	0.124512
2017-01-06	0.064028	0.068197	0.128897	0.133586
2017-01-09	0.065005	0.064371	0.122341	0.127639
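Each cell in the table is the IC for one date and one forward period, that is, the cross-sectional correlation between factor values and subsequent returns. The help output does not say which correlation analy_ic uses; the sketch below assumes the common rank (Spearman) form and uses hypothetical data and helpers.

```python
from math import sqrt


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_ic(factor_values, forward_returns):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    return pearson(ranks(factor_values), ranks(forward_returns))


# Hypothetical cross-section of five stocks on one date.
factor_values = [0.1, 0.4, 0.2, 0.9, 0.5]
forward_returns = [-0.02, 0.02, 0.00, 0.03, 0.01]
ic = rank_ic(factor_values, forward_returns)
```

Repeating this for every date and every forward period yields a table shaped like the one above, with one row per date and one column per period.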

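3. Industry analysis

With the FactorAnalyst class, you can analyze one industry or several industries on their own to understand how the factor behaves in different industries.

3.1. Get the industry information for each stock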

[18]:

help(Universe.get_sw_industry)

Help on function get_sw_industry in module data_provider.datafeed.universe:

get_sw_industry(self, trading_day)
    Gets the Shenwan industry classification
    :param trading_day: the date, as a string, datetime, or int
    :return:  pd.DataFrame with columns:
        securityId  swIndustryLv1   swIndustryLv2   swIndustryLv3
        swIndustrycodeLv1   swIndustrycodeLv2   swIndustrycodeLv3 time

[19]:

sw_industry = Universe().get_sw_industry(to_dt)
sw_industry.head()

	securityId	swIndustryLv1	swIndustryLv2	swIndustryLv3	swIndustrycodeLv1	swIndustrycodeLv2	swIndustrycodeLv3	time
0	600000.SH	申万银行	申万银行	申万银行	SW801780	SW801192	SW851911	20170901
1	600004.SH	申万交通运输	申万机场	申万机场	SW801170	SW801174	SW851751	20170901
2	600006.SH	申万汽车	申万汽车整车	申万商用载货车	SW801880	SW801094	SW850912	20170901
3	600007.SH	申万房地产	申万房地产开发	申万房地产开发	SW801180	SW801181	SW851811	20170901
4	600008.SH	申万公用事业	申万水务	申万水务	SW801160	SW801164	SW851621	20170901

3.2. Get all level-1 industries (one row per unique swIndustryLv1 value)

[20]:

all_industry = sw_industry.drop_duplicates('swIndustryLv1')
all_industry.head()

	securityId	swIndustryLv1	swIndustryLv2	swIndustryLv3	swIndustrycodeLv1	swIndustrycodeLv2	swIndustrycodeLv3	time
0	600000.SH	申万银行	申万银行	申万银行	SW801780	SW801192	SW851911	20170901
1	600004.SH	申万交通运输	申万机场	申万机场	SW801170	SW801174	SW851751	20170901
2	600006.SH	申万汽车	申万汽车整车	申万商用载货车	SW801880	SW801094	SW850912	20170901
3	600007.SH	申万房地产	申万房地产开发	申万房地产开发	SW801180	SW801181	SW851811	20170901
4	600008.SH	申万公用事业	申万水务	申万水务	SW801160	SW801164	SW851621	20170901

3.3. Get the level-2 industries under a given level-1 industry

[21]:

vehicle_l2 = sw_industry[sw_industry['swIndustryLv1']=='申万汽车'].drop_duplicates('swIndustryLv2')
vehicle_l2

	securityId	swIndustryLv1	swIndustryLv2	swIndustryLv3	swIndustrycodeLv1	swIndustrycodeLv2	swIndustrycodeLv3	time
2	600006.SH	申万汽车	申万汽车整车	申万商用载货车	SW801880	SW801094	SW850912	20170901
61	600081.SH	申万汽车	申万汽车零部件	申万汽车零部件	SW801880	SW801093	SW850921	20170901
77	600099.SH	申万汽车	申万其他交运设备	申万其他交运设备	SW801880	SW801881	SW858811	20170901
249	600297.SH	申万汽车	申万汽车服务	申万汽车服务	SW801880	SW801092	SW850941	20170901
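The same lookup can be expressed for every level-1 industry at once. The sketch below is plain Python over a few (securityId, swIndustryLv1, swIndustryLv2) rows copied from the tables above; the helper name is hypothetical.

```python
from collections import defaultdict

# A handful of rows taken from the get_sw_industry output shown above.
rows = [
    ("600000.SH", "申万银行", "申万银行"),
    ("600006.SH", "申万汽车", "申万汽车整车"),
    ("600081.SH", "申万汽车", "申万汽车零部件"),
    ("600099.SH", "申万汽车", "申万其他交运设备"),
    ("600297.SH", "申万汽车", "申万汽车服务"),
]


def sub_industries_by_level1(rows):
    """Map each level-1 industry to its distinct level-2 industries, in first-seen order."""
    mapping = defaultdict(list)
    for _security_id, level1, level2 in rows:
        if level2 not in mapping[level1]:
            mapping[level1].append(level2)
    return dict(mapping)


sub_industries = sub_industries_by_level1(rows)
```

A list such as sub_industries["申万汽车"] has the shape that load_data expects for industry_group when sw_level is 2.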

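4. Factor analysis by industry

Use the industry_group and sw_level parameters of load_data to specify the names of the Shenwan industries to analyze and the classification level they belong to.

When industry_group contains several industries, the stocks of all of them are pooled and analyzed together.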

[22]:

fa.load_data(
    factor_name, from_dt, to_dt, industry_group=["申万计算机", "申万化工", "申万轻工制造"], sw_level=1
)
analy_data = fa.analysis(quantiles=10, forward_periods=(1, 5, 22, 60), filter_zscore=3)

loading roegrowth1 from cache, time cost 0.41s
加载行情数据
生成因子与预期收益表...

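The analy_quantile_return method takes a by_group parameter: if it is True, the analysis and plots are produced separately for each industry; if it is False, the stocks of all industries specified in load_data are merged and analyzed together.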

[23]:

ret_industry_by_group = fa.analy_quantile_return(by_group=True)

[24]:

ret_industry = fa.analy_quantile_return(by_group=False)
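The difference between the two calls is pooled versus per-industry aggregation. A minimal sketch of that distinction, with hypothetical returns and a hypothetical helper rather than the platform's code:

```python
from statistics import mean

# Hypothetical (industry, forward return) observations for one factor bucket.
observations = [
    ("申万计算机", 0.02),
    ("申万计算机", 0.04),
    ("申万化工", -0.01),
    ("申万化工", 0.01),
    ("申万轻工制造", 0.03),
]


def mean_return(observations, by_group):
    """One pooled mean, or one mean per industry when by_group is True."""
    if not by_group:
        return mean(r for _, r in observations)
    grouped = {}
    for industry, r in observations:
        grouped.setdefault(industry, []).append(r)
    return {industry: mean(rs) for industry, rs in grouped.items()}


pooled = mean_return(observations, by_group=False)
per_industry = mean_return(observations, by_group=True)
```

The pooled figure can hide an industry where the factor does nothing, as 申万化工 does in this made-up data, which is why the per-industry view is worth checking.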
