æè¿ PingCAP ã«å ¥ç€ŸããŸããããèªåã®ä»äºã«ã€ããŠå šãç解ããŠããªããããããŒã¿ããŒã¹ãšã¹ãã¬ãŒãžã«é¢é£ããç¥èãè£å®ããå¿ èŠããããŸãããããããç§ã«ãšã£ãŠã¯å šãããããªãããšã¯æªãããšã§ã¯ãªããšæããŸãããŸã第äžã«ãæé·ã®äœå°ãéåžžã«å€§ããããšã§ãã第äºã«ãåé¡ãèŠãèŠç¹ãç°ãªããããæ°ããã¢ã€ãã¢ãçãŸããå¯èœæ§ããããŸãã
ããŠãæ¬é¡ã«å ¥ããŸããããä»æ¥ã¯ãããŸã§ã®æµ ãæè¡çãªãèç©ãã掻ãããŠãLSMð²ã«ã€ããŠç°¡åã«è©±ããŠã¿ãŸãã
ãã€ãããèŠããŠããŸããããããäŒç€Ÿã Big Table ãäœããããã«é¢é£ããè«æã䜿çšããŸããããã®è«æã§ã¯ãBð²ã䜿çšããã«æžã蟌ã¿å¹çãèªã¿åãå¹çãããé«ããšããæ°ããã¢ãããŒããææ¡ãããŸããããã®ããŒã¿æ§é 㯠MySQL ãªã©ã®äŒçµ±çãªããŒã¿ããŒã¹ãåé§ããŠããŸããããæ®å¿µãªãã Big Table ã¯ã¯ããŒãºããœãŒã¹ã§ããããããããã®åŸããã«ããªãªãžãã«ã®èè ããã®ã¢ã«ãŽãªãºã ã«åºã¥ããŠäœæããã·ã³ã°ã«ããŒãã®ãªãŒãã³ãœãŒã¹ããŒã¿ããŒã¹LevelDBãå ¬éããããšã³ã·ã¹ãã ãäžæ°ã«æ¡å€§ããŸãããããŸããŸãªåæ£ã¹ãã¬ãŒãžã·ã¹ãã ãè¿œéããŸãããFaceBook ãRocksDBïŒç§ãçŸåšæ åœããŠãããããžã§ã¯ãã§ãç§ã®å®åã¯æ¬åœã«äœãã§ãïŒãäœæããããšã§ãæ°äžä»£ã®åæ£ããŒã¿ããŒã¹ã®ã¹ãã¬ãŒãžãšã³ãžã³ã確ç«ãããŸãããç§ãã¡ã® TiKV ããããããŒã¹ã«ããŠããŸããããŠãä»æ¥ã¯ãã®ããŒã¿æ§é ãå ·äœçã«äœãããŠããã®ããèŠãŠã¿ãŸãããã
ã§ã¯ããã®ð²ã¯äœãããŠããã®ã#
ãŸãããã¹ãŠã®æè¡ã®èªçã«ã¯ãã®æ代ã®å¶çŽããããŸããçŸåšã®æ代ã§ã¯ããµãŒããŒã§ã¯é«ã¹ã«ãŒãããã®ã©ã³ãã èªã¿æžãã® SSD ã䜿ãããšã¯ã§ãããããã©ãŒãã³ã¹ã«åœ±é¿ãäžããã«é«ã¹ã«ãŒãããã®ã©ã³ãã èªã¿æžããå®çŸããããã®ã¹ãã¬ãŒãžã¡ãã£ã¢ã¯ååšããŸããããããã£ãŠããã®æ代ã§ã¯å€§éã®é 次èªã¿æžãã®å¹çãã©ã³ãã èªã¿æžãã®å¹çããã 20 å以äžé«ãïŒå®éã®ããŒã¿ã§ã¯ãªããããããŸãããïŒããšãèæ ®ããŠãããŒã¿ã®æžã蟌ã¿é床ãã¹ãã¬ãŒãžã¡ãã£ã¢ã«å éããããã«ãé 次èªã¿æžãã䜿çšããå¿ èŠããããŸãã
ããã§ãLSMð²ãèªçããŸããïŒ#
LSMð²ã®æ£åŒå称ã¯ãLog-Structured Merge Treeãã§ãã
èŠããã«ãããŒã¿ãæžã蟌ãéã«ãã°ã®ããã«æžã蟌ãããšã§ãé 次æžã蟌ã¿ãä¿èšŒããããšããããšã§ãã
ããããããã«ã¯å€§ããªåé¡ããããŸãããã°ã¯ãªã³ã¯ãªã¹ãã®ãããªãã®ã§ãããã€ã³ããã¯ã¹æ§é ããªãå Žåããªã³ã¯ãªã¹ãã®ã©ã³ãã ã¢ã¯ã»ã¹ã®èšç®é㯠O (N) ã«ãªããŸãããããã©ãããã°ããã®ã§ããããïŒãã®åé¡ã¯åŸã§èªã¿åãæã«è§£æ±ºãããŸãã
åè¿°ãã 404 瀟㮠LevelDB ã§ã¯ãLSMð²ã¯ SSTable ãšçµã¿åãããŠäœ¿çšãããŸãã
SSTable ã®æ£åŒå称ã¯ãSorted-String Tableãã§ãã
404 瀟ã¯æ¬¡ã®ããã«èª¬æããŠããŸãã
SSTableã¯ãããŒããå€ãžã®æ°žç¶çã§é åºä»ããããäžå€ã®ããããæäŸããŸããããŒãšå€ã¯ããããä»»æã®ãã€ãæååã§ãã
ãããã£ãŠãLevelDB ã§ã¯ LSMð²ã¯ãæ°žç¶çã§é åºä»ããããäžå€ã®ããŒãšããŒã¿ã®ãªãã»ãããæ ŒçŽããããŒããªã¥ãŒã¹ãã¢ãšçµã¿åãããŠäœ¿çšãããŸãããªãã»ãããš mmap ãçµã¿åãããããšã§ãLSMð²ã«æ ŒçŽãããæ å ±ã«ç¬æã«ã¢ã¯ã»ã¹ã§ããŸãããããã§ãããïŒ
ãããã£ãŠãLSMð²ã®æžã蟌ã¿ããã»ã¹ã¯æ¬¡ã®ããã«ãªããŸãã
- 誰ããããŒã¿ãæžã蟌ããšäŒããŠããããããŸããã°ãèšé²ããŠæžã蟌ã¿æã«è£ããã®ãé²ããŸãã
- ãã®ããŒã¿ãã¡ã¢ãªå ã® SSTable ã«æžã蟌ã¿ãŸãïŒããŒã¿ãåé€ããå ŽåããããŸããããã°ãèšé²ããŠããã®ã§ã圌ã¯æ»ãã ããšæžã蟌ãããšã«ãªããŸãïŒãåæã«ãã¡ã¢ãªå ã®ããŒãã«ã«ãæžã蟌ã¿ãŸãã
- ã¡ã¢ãªå ã®ããŒãã«ãäžå®ã®ãµã€ãºã«ãªããšãæ°ããããŒãã«ãäœæããå€ãããŒãã«ã¯æžã蟌ã¿äžå¯ã«ãªããŸãã
- å€ãããŒãã«ã¯é©åãªã¿ã€ãã³ã°ã§ãã£ã¹ã¯ã«æžã蟌ãŸããŸãã
- ãã£ã¹ã¯äžã®å°ããªããŒãã«ãå¢ããã«ã€ããŠãé©åãªã¿ã€ãã³ã°ïŒæ·±å€ãªã©ïŒã§ããŒãžãè¡ããã1 ã€ã®å€§ããªããŒãã«ã«çµ±åãããŸããããŒãžã®éçšã§ã¯ãã圌ã¯æ»ãã ããšããŒã¯ãããããŒã¿ãåé€ãããŸãããããã®ããŒãã«ã¯ãã¹ãŠé åºä»ããããŠãããããããŒãžã®ããã»ã¹ã¯åçŽãªãªãã©ã€ã³ããŒãžãœãŒãã§ãã
ã§ã¯ãèªã¿åãã®ããã»ã¹ãç°¡åã§ãã
- ã¡ã¢ãªå ã®ããŒãã«ããã§ãã¯ããŸãã- ãã°ãã
- ååšããªãå Žå - æè¿ã®å°ããªããŒãã«ããã§ãã¯ããŸãã
- ãŸã ååšããªãå Žå - æè¿ã®å€§ããªããŒãã«ããã§ãã¯ããŸãã
- ããã§ãååšããªãå Žå - ãããç¹°ãè¿ããŸãã
ã§ã¯âŠ ãªãèªã¿åãã O (N) ã®ææªã®å Žåã«èŠããã®ã§ããããïŒ
ããã§ã¯ãä»æ¥ã®éèŠãªãã€ã³ãã«å ¥ããŸããããããŒãããã¯ãåãåºããŠãã ãããããã¹ãã¯ãããŸããã
èªã¿åãã®æé©å#
ðŽâ«ð²#
äœè 㯠20 è¡ã® Haskell ã§èµ€é»æšãæžãããšèšã£ãŠããŸãããæžããªããšããããšã¯ããªããäžæã ããã§ãããªã®ã§ã説æããŸããã
ããã·ã¥ããŒãã«#
// ããã¯ååž°ã§ã
GOTO 0x00;
ã¹ããããªã¹ã#
ããŠãããã§ãšãããããŸãðãã»ãšãã©ã®ã¹ãã¬ãŒãžãšã³ãžã³ã§äœ¿çšãããŠããæé©åãã¹ããããªã¹ãã«å°çããŸããã
level 2: [11] [21] [41] [70]
level 1: [11] [14] [21] [31] [41] [58] [70]
level 0: [11] [12] [14] [15] [21] [24] [31] [39] [41] [49] [58] [61] [70]
ã¹ããããªã¹ã㯠N åã®ã¬ãã«ãæã€ããšãã§ããŸãã
LSMð²ã¯é åºä»ããããããŒã¿æ§é ãªã®ã§ãçè«çã«ã¯ O (log N) ã®èªã¿åãé床ã«è¿ã¥ãããšãã§ããŸãã
ã©ã®ããã«æ§ç¯ããã... ããããç§ã®ããã°ãèªãã§ããåäŸãã¡ã¯ãã§ã«ã»ãŒæžããŠãããšæããŸãã®ã§ã倧ãŸãã«èª¬æããŸããã©ããããã®ã¬ãã«ã®ã¹ããããªã¹ããçæãããšãã¹ããŒã¹ã®æµªè²»ãšæéã®å¢å ãçžå¯Ÿçã«ãã©ã³ã¹ãåãããã説æããŸãã
ãŸããçŸå®ã®ããŒã¿ããŒã¹ã§ã¯ãåŠçããããŒã¿éãéåžžã«å€§ããããã2 ã€ã®ããŒããšã« 1 ã€ã®ã¬ãã«ãäœæãããªã©ãéåžžã«åçŽãªæ¹æ³ã§ã¯æ§ç¯ã§ããŸãããããã«ããã倧éã®ã¹ããŒã¹ãç¡é§ã«ãªããŸããå®éã«ã¯ãéåžžã«å·§åŠãªç¢ºçæŠç¥ã䜿çšããŸãã
ã³ã€ã³ãã¹
ã³ã€ã³ã®è¡šè£ã¯ã¡ããã© 50% ã®ç¢ºçãæã€ãããå³å¯ã«ã¹ããããªã¹ãã® log ãšäžèŽããŸãã
int ãã®ããŒãã¯äœçªç®ã®ã¬ãã«ã«ãã£ãã·ã¥ãããã®ã§ãã(){
var level = 0;
while ( true ){
var ã©ã³ãã æ° = random.nextInt();
if ( ã©ã³ãã æ° % 2 == 0 ) {
level += 1;
} else {
break;
}
}
return level;
}
ããã«ãããå ¬å¹³ã§ã¯ãªããããããŸããããããã€ãã®ã¢ããªã«äººããŒããå€ãã¢ã¯ã»ã¹ããããã€ã³ããã¯ã¹ã§ã¯ãªãå ŽåããããŸããã確çã«åºã¥ããŠæ¯èŒçå®å®ããŠããŸãã
次åäºåïŒAgateDB ã® SSD æé©åãš RocksDB ãå§åããæ¹æ³ã«ã€ããŠãBadger ãããžã§ã¯ãããåãã話ããããããããŸãããã楜ãã¿ã«ã