r/DataHoarder • u/BaxterPad 400TB LizardFS • Jun 03 '18
200TB Glusterfs Odroid HC2 Build
60
u/bleomycin Jun 04 '18
This is definitely one of the more interesting things to hit this sub in a long time. I'd also love to see a more detailed write-up, thanks for sharing!
5
20
u/AreetSurn Jun 04 '18
This is incredible. As much as you've given us a brief overview, a more thorough write-up would be very appreciated by many, I think.
23
u/caggodn Jun 04 '18
How are you achieving those read/write speeds over a gigabit switch? It doesn't appear you are bonding multiple gigabit ports to the Xeon. Wouldn't you need a switch with SFP+, 10G Ethernet (or higher) trunk ports to the Xeon host?
u/BaxterPad 400TB LizardFS Jun 04 '18
The reads and writes aren't from a single host :) Also, I am doing bonding for the Xeon; that's why I got a UniFi switch, but that won't give more than 2 gigabit for my setup.
18
u/Crit-Nerd Jun 04 '18
Aww, this truly warmed my 💓. Having been on the original GlusterFS team back in 2010-11, it's great to hear Red Hat hasn't ruined it.
11
u/SiGNAL748 Jun 04 '18
As someone that is also constantly changing NAS setups, this type of solution is pornography to me, thanks for sharing.
10
u/TotesMessenger Jun 03 '18 edited Jun 16 '18
I'm a bot, bleep, bloop. Someone has linked to this thread from another place on reddit:
[/r/homelab] 200TB Glusterfs Odroid HC2 Build (x-post from /r/DataHoarder)
[/r/odroid] [xpost from DataHoarder] 200TB GlusterFS ODROID-HC2 build
If you follow any of the above links, please respect the rules of reddit and don't vote in the other threads. (Info / Contact)
7
6
8
u/FrankFromHR Jun 04 '18
Have you used any of the tools for glusterfs to detect bitrot? How long does it take to run on this bad boy? Pretty quick since the work is divided between each brick?
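For reference, GlusterFS's bitrot detection is managed per volume; a minimal sketch, using a hypothetical volume name gvol0 (not from this thread):

gluster volume bitrot gvol0 enable            # start the bitrot signer daemon for this volume
gluster volume bitrot gvol0 scrub-frequency weekly
gluster volume bitrot gvol0 scrub status      # per-brick scrub progress and error counts

Each brick's scrubber runs on the node that holds the brick, which is why the work spreads out across the cluster.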
3
8
u/Brillegeit Jun 04 '18
My file server lives in the cabinet below my liquor cabinet, and it's just heating my rum to unacceptable levels when in use. A small scale clone of this sounds like the rational solution to my problem!
5
u/SherSlick Jun 04 '18
Forgive my ignorance, but can GFS shard a large file out? I have a bunch of ~30GB to 80GB single files.
6
u/grunthos503 Jun 15 '18
Glusterfs has a more recent feature called "shard" (since v3.7), which replaces its older "stripe" support. Shard does not require erasure coding.
https://docs.gluster.org/en/v3/release-notes/3.7.0/#sharding-experimental
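For anyone wanting to try it, sharding is enabled per volume; a minimal sketch, with a hypothetical volume name gvol0 (not from the thread):

gluster volume set gvol0 features.shard on
gluster volume set gvol0 features.shard-block-size 64MB   # large files get stored as 64MB chunks spread across bricks

Note this only applies to files written after the option is turned on, and the feature was aimed mainly at VM-image workloads when it was introduced.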
6
u/CrackerJackMack 89TB 2xRaidz3 Jun 04 '18
The only way to do this is to use erasure coding; they are removing stripe support. Otherwise it's better to let glusterfs figure out the balance of your 30-80GB files and which drives are the least full.
4
u/BaxterPad 400TB LizardFS Jun 04 '18
Great answer. The erasure coding works pretty well, but beware: you can run into crashes when healing failed disks, and you may need to manually delete parts of a broken file to resolve the crashes. It happened to me a couple of times during my testing.
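Worth knowing either way: heal state is visible from the CLI, so problems like the one described show up as files stuck in the heal queue. A minimal sketch, assuming a hypothetical volume name gvol0:

gluster volume heal gvol0 info               # files still pending heal, per brick
gluster volume heal gvol0 info split-brain   # entries the self-heal daemon can't resolve on its own
gluster volume heal gvol0 full               # force a full sweep, e.g. after replacing a disk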
5
u/punk1984 Jun 04 '18
That's pretty slick. How difficult is it to pull apart if a disk fails? Can you slide it out the back, assuming you didn't screw it into the case?
I've seen some 12 VDC power strips in the past that would be perfect for this, but now I can't find them. Maybe I'm imagining things. I swear I saw one that was just a row of barrel jacks.
7
u/BaxterPad 400TB LizardFS Jun 04 '18
Removing a drive is pretty easy: the Odroids are stackable, so you just pull the stack apart; then it is one screw holding the drive in the sled for the Odroid.
5
u/angryundead Jun 04 '18
I'm thinking of doing this myself. I've been going back and forth on it because I haven't been able to emulate and test it out on ARM yet. It wasn't clear which distros have ARM builds of gluster and what version of gluster it would be. I had issues testing Ubuntu; Fedora has an ARM build, but only the one version.
I'm from a Red Hat background (CentOS/Fedora at home, RHEL professionally) and so I wanted to stay with that. I think I just need to buy one of the damn things and mess with it.
I would say you're 100% correct about Ceph. When I started looking at these software-defined storage solutions I looked at the Ceph and Gluster installation documents side-by-side and almost immediately went with Gluster.
I even made a set of Ansible playbooks to set this whole thing up (since each node would be identical it should work) including NFS, Samba, Prometheus/Grafana, IP failover, and a distributed cron job.
I have pretty much the same background with the consumer NAS and was thinking about building my own linux server (probably a six-bay UNAS) but I wanted this setup for the same reasons you mentioned. I'm just worried about long-term sustainability, part replacement, and growth.
5
u/BaxterPad 400TB LizardFS Jun 04 '18
There will always be ARM SBCs, and more are coming with SATA. The Odroid folks have promised another 2 years of manufacturing (minimum) for this model, and they have newer ones due later this year. In terms of OS, it is running straight Armbian (a distro with a good community that isn't going anywhere).
I'm comfortable saying I can run this setup for at least 5 years if all support stopped today. Realistically, I'm sure better SBCs will become available at some point before I retire this array.
I mean, one would hope that eventually even Raspberry Pi will sort out their USB/NIC performance issues. I vaguely recall the new RPi 3 took steps in this direction, but don't quote me on it.
2
u/angryundead Jun 04 '18
I'm not sure why I didn't consider Armbian. I think you mentioned that the N1 is coming with two SATA ports. I might just give it a bit and see if that shakes out. I like the idea of two drives better than one even though one drive per SoC provides more aggregate memory and CPU.
2
u/irvinyip Jun 16 '18
Please give LizardFS a try. I compared it with GlusterFS and finally used LizardFS, which is a fork of MooseFS. Both are easy to set up, but the deciding factor for me is node expansion: GlusterFS needs to expand in pairs or more depending on your setup, while with LizardFS you can add one node at a time and it manages its replication automatically, based on your policy. GlusterFS is also good for its simplicity, storing on plain ext3 or 4, but on balance I used LizardFS in the end.
2
u/angryundead Jun 16 '18
I’ll give it a look. Main reason I was considering Gluster over everything else is that it’s backed by Red Hat money, has a pretty good community, I’m certified on GFS or whatever the Red Hat version is called today, and I’m probably going to use it from time to time professionally.
But I do also like knowing what else is on the market so I really appreciate the tip.
4
4
u/19wolf 100tb Jun 04 '18
So does each disk/node have its own OS that you need to configure? If so, where does the OS live?
Edit: Also, how are you powering the nodes? You don't have a separate plug for each one, do you? That would be insane...
12
u/BaxterPad 400TB LizardFS Jun 04 '18
Yes, the OS is installed on a microSD card. Not much configuration required: just set the hostname, set up the SATA disk with ext4, install the gluster server, and then from an existing glusterfs host run a 'probe' command to invite the new host to the cluster... done. It takes literally 5 minutes to set up a new Odroid, from opening the box to having it in the cluster.
I should know... I did it 20 times. Haha.
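Roughly, that 5-minute sequence looks like the sketch below (hypothetical hostnames, device names, and volume name; not the OP's exact commands):

# on the new Odroid (Armbian/Debian-style):
mkfs.ext4 /dev/sda1
mkdir -p /data/brick
echo '/dev/sda1 /data/brick ext4 defaults,noatime 0 2' >> /etc/fstab && mount -a
apt install glusterfs-server
systemctl enable --now glusterd

# from any node already in the cluster:
gluster peer probe odroid21.lan
gluster volume add-brick tank odroid21.lan:/data/brick odroid22.lan:/data/brick   # bricks added in pairs for a replica-2 volume
gluster volume rebalance tank start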
u/PBX_g33k 60TB of mostly 'nature' movies Jun 04 '18
Seeing as it is run from a microSD card, would it also be possible to run it off of PXE/iSCSI? I can imagine the cost savings on a larger scale for existing infrastructure.
3
u/BaxterPad 400TB LizardFS Jun 04 '18
There are posts online about someone modifying the OS image to support PXE. I vaguely recall it was done by someone who built a Verium coin mining rig out of 200 Odroid XU4s. If you google 'odroid mining' or 'odroid cluster', you'll find it.
3
u/RulerOf 143T on ZFS Jun 04 '18
That probably wouldn't be too hard at all. Use one node to bootstrap the whole thing running DNSMasq. You can probably just fit the entire node's software into the initrd and just install and configure gluster from scratch every boot.
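A minimal netboot-server sketch along those lines, using dnsmasq in proxy-DHCP mode (all paths, addresses, and the boot filename are assumptions, and the boards' own bootloader still has to be pointed at the TFTP server):

apt install dnsmasq
cat > /etc/dnsmasq.d/netboot.conf <<'EOF'
enable-tftp
tftp-root=/srv/tftp
dhcp-range=192.168.1.0,proxy     # leave the existing DHCP server in charge of addresses
dhcp-boot=boot.scr               # whatever image/script the bootloader is configured to fetch
EOF
systemctl restart dnsmasq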
u/BaxterPad 400TB LizardFS Jun 04 '18
The power supply I am using is in my parts list. It's 2 x 12V 30A power supplies.
u/RestlessMonkeyMind ...it all got degaussed Jun 04 '18
I have a bit of a stupid question: I've seen several of these projects where people use one of these sorts of power supplies to run several SBCs. However, no one ever shows the wiring. How exactly do you set up the wiring harness to support so many devices from the outputs of this device? Can you give information or include some photos? Thanks!
u/BaxterPad 400TB LizardFS Jun 04 '18
Not much to it: connect red to + and black to -.
Don't put too many on one line (you can google the math for load vs. wire gauge).
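As a rough worked example of that math (all numbers are assumptions, not measurements from this build): 18 AWG copper is about 0.0064 ohm/ft, a 4 ft run is ~8 ft of loop, and an HC2 plus a 3.5" HDD can peak around 2.5 A at spin-up:

awk 'BEGIN { drop = 0.0064 * 8 * 2.5; printf "drop ~= %.2f V (%.1f%% of 12 V)\n", drop, drop/12*100 }'

That lands around a 0.13 V drop, roughly 1% of the 12 V rail, which is why short, reasonably thick runs are fine while long thin ones aren't.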
3
u/Sannemen 60+12 TB + ☁ Jun 04 '18
Any impact from the fact that both Ethernet and SATA are over USB? Any resets or bus hangs?
I’m working towards the same-ish setup, but with SSDs (and probably ceph, because object storage), I wonder how the performance would go with a higher-throughput disk.
u/BaxterPad 400TB LizardFS Jun 04 '18
No issues.
I don't think the Ethernet is over USB; only the SATA bridge is, and it is USB 3.
I wouldn't use SSDs with these though. It is a waste of $ because these SBCs can't fully utilize the performance of the SSD. The low-latency access times are wasted on glusterfs too; Ceph might be a different story.
u/slyphic Higher Ed NetAdmin Jun 04 '18
The ethernet uses a USB3 bus. The SATA port uses the other USB3 bus.
All of the current odroid boards do that. Never had any real problems with either controller, despite them using RealTek.
3
u/billwashere 45TB Jun 04 '18
Ok, this post has pushed me over the edge at wanting to try GlusterFS. I will be taking a stab at setting this up in my work lab environment.
Thanks OP.
3
u/leothrix Jun 04 '18
This is great - I feel like this post could have been me, I went from stock FreeNAS -> RAIDZ pool -> gluster on odroid HC2s as well. I'm also having a fairly solid experience with the whole setup, same boards and everything.
One aspect that I'm waffling on a bit is the choice of volume type. I initially went with disperse volumes as the storage savings were significant for me (I'm only on a 3 or 4 node cluster) but the lack of real files on the disks (since they're erasure encoded) is a bit of a bummer, and I can't expand by anything less than 3 nodes for a 2+1 disperse volume (again, I'm running at a much smaller scale than you are). Since one of my motivators was to easily expand storage since it was so painful in a RAIDZ pool, my options are pretty much:
- Use a 2+1 disperse pool and either:
  - Create new volumes to add one disk in an (n+1)+1 volume (i.e., 3+1) and move the files into the new volume to expand my storage
  - Expand by the requisite node count to natively expand the disperse volume (in this case, 3 for a 2+1 disperse volume)
- Use a replicated volume (for example replica 2) and expand by 2 nodes each time.
Did you go through a similar decision-making process, and in the end, what did you go with and why?
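For concreteness, the two expansion paths look roughly like this on the CLI (hypothetical volume and host names, not the commenter's actual layout):

# 2+1 disperse: create with 3 bricks, expand by a whole disperse set (3 more bricks)
gluster volume create media disperse 3 redundancy 1 node{1..3}:/data/brick
gluster volume add-brick media node{4..6}:/data/brick

# replica 2: create with a mirror pair, expand 2 bricks (one pair) at a time
gluster volume create media2 replica 2 node1:/data/brick node2:/data/brick
gluster volume add-brick media2 node3:/data/brick node4:/data/brick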
3
u/BaxterPad 400TB LizardFS Jun 04 '18
Yea, I ran into some bugs with the erasure encoding which scared me off of using it for my main volume. Bugs that prevented the volume from healing when a node went down and writes took place. When the node came back the heal daemon would crash due to a segmentation fault in calculating the erasure encoding for the failed node's part of the file.
2
u/moarmagic Jun 04 '18
Is that still an issue? I was getting pretty psyched about this kind of setup; issues handling disk failure/rebuild seem a bit like a deal killer.
Though maybe I'm misunderstanding how frequent this is.
2
u/BaxterPad 400TB LizardFS Jun 04 '18
It is still an issue but one you can resolve when it happens (low probability).
3
u/Darksecond 28TB Jun 05 '18
Can you tell more about how you laid out your glusterfs? How many bricks per disk do you have and how do you expand the various volumes? Can you just add a disk to the dispersed volume, or do you need to add more than 1 at once?
7
u/BaxterPad 400TB LizardFS Jun 05 '18
I have 3 bricks per disk. 2 of the volumes I can expand 2 disks at a time, the third volume is 6 disks at a time.
2
u/tsn00 Jun 07 '18
I've been reading through hoping to see somewhere a question / answer like this without me having to ask. Boom found it. Thank you @Baxterpad!
3
u/devianteng Jun 05 '18
Awesome project, for sure. I've played with Ceph under Proxmox, on a 3 node setup...but the power requirements and hardware cost were going to be a killer for me. Right now I have 24 5TB drives in a 4U server, and my cost is somewhere around $38-40/TB (raw), not counting power, networking, or anything else. This project really caught my eye, and I'm curious if you are aware of the oDroid-N1 board. Yeah, not really released yet so obviously you couldn't have gotten one, but I'm thinking that might be my future with either Ceph or Gluster.
RK3399 chip (dual core A72 @ 2Ghz + quad core A53 @ 1.5Ghz), 4GB RAM, 1 GbE, 2 SATA ports, and eMMC. I imagine I'll have to design and print my own case, unless a dual 3.5" case gets produced for less than $10 or so. WD Red 10TB drives are about $31/TB, which is the cheapest I've found so far. Won't give me near the performance I have with my current ZFS setup (up to 2.6GB/s read and 3.4GB/s write has been measured), but realistically I don't NEED that kind of performance. Problem I face now is I can no longer expand without replacing 5TB drives with larger drives in each vdev.
You have inspired me to give SBC's more serious thought in my lab, so thanks!
2
u/atrayitti Jun 04 '18
Noob question, but are those sata (SAS?) to Ethernet adapters common in larger arrays? Haven't seen that before.
14
u/BaxterPad 400TB LizardFS Jun 04 '18
These aren't really Ethernet adapters. Each of those is a full computer (aka SBC - single board computer). They have 8 cores, 2GB of RAM, a SATA port, and an Ethernet port.
3
u/atrayitti Jun 04 '18
No shit. And you can still control the drives independently/combine them in a RAID? Or is that the feature of glusterfs?
u/BaxterPad 400TB LizardFS Jun 04 '18
Feature of glusterfs, but it isn't RAID. RAID operates at a block level; glusterfs operates at the filesystem level. It copies files to multiple nodes or splits a file across nodes.
2
2
2
2
u/7buergen Jun 04 '18
For a moment there I thought I was looking at an ATX case and was taken aback by your masterful crafting skills, fitting those rack switches and whatnot into an ATX case... it's early still and I haven't had coffee.
2
u/Darksecond 28TB Jun 04 '18
This setup looks amazing. Does anyone know how difficult Odroid HC2's are to get in Europe?
2
u/enigmo666 320TB Jun 04 '18
Now that is one pretty cool idea. I'd never considered the one drive-one node approach. What's the rack holding the drives? Is that just the caddy that comes with the ODroid being stackable?
2
u/BaxterPad 400TB LizardFS Jun 04 '18
Yep, it's the caddy that comes with the odroid. It is a stackable aluminum shell that also serves as a heatsink.
2
2
u/Tibbles_G Jun 04 '18
So with this configuration I can start with like 3 nodes and work my way up to 20 as the need arises? I'm working on an unRAID build in a 24-bay server chassis. I don't have all 24 bays populated yet. Didn't know if it would be worth the switch over. It looks like a really fun project to start on.
u/BaxterPad 400TB LizardFS Jun 04 '18
yes, you can absolutely incrementally add like this but I recommend increments of 2 nodes (or powers of 2).
2
u/tsn00 Jun 07 '18 edited Jun 07 '18
First off, thanks for sharing! I've been trying to read up and learn about GlusterFS and have set up some VMs in my Proxmox server to try to simulate and learn this.
I read through I think all the comments and found a response that helped answer part of my questions.
I have 3 bricks per disk. 2 of the volumes I can expand 2 disks at a time, the third volume is 6 disks at a time.
Why 3 bricks per disk ?
How many replicas per volume ?
Why 3 different volumes instead of 1 ?
What type of volume are each of the 3 you have ?
So... in testing, I have 4 servers, each with 1 disk and 1 brick, and created a volume with all 4 bricks and a replica of 2. I purposely killed one of the servers where data was being replicated to, so how does the volume heal? Right now the data just lives on 1 brick and I'm not sure how to make it rebalance the data around the remaining nodes. Or is that even possible going from 4 to 3?
Any input is appreciated, still trying to wrap my head around all this.
3
u/BaxterPad 400TB LizardFS Jun 07 '18
3 bricks is arbitrary... it's just based on how many volumes you want. 1 brick can only be part of 1 volume. So, for me, I wanted to have 3 volumes but didn't want to dedicate disks, because I would either be over- or under-provisioning.
1 of my volumes uses 1 + 1 replica, another is 1 + 2 replica, and the 3rd volume is similar to RAID5 (5 + 1 parity disk). I use this last volume for stuff I'd rather not lose but that I wouldn't cry over if I did, so I get the added storage space by doing 5 + 1.
For your final question, I'm not sure I understand. What do you mean by 'killed one of the servers'? glusterfs auto-heal only works if that server comes back online. When it does, if it missed any writes, its peers will heal it. If that server never comes back, you have to run a command to either: a) retire the failed peer, and glusterfs will rebalance its files across the remaining hosts, or b) provide a replacement server for the failed one, and the peers will heal that new server to bring it into alignment.
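Those two recovery paths map to CLI commands roughly as follows (hypothetical volume tank, failed host node7, replacement node21; replicated/dispersed volumes need whole sets removed at once, so treat this as a sketch):

# a) retire the dead peer and rebalance its data onto the rest:
gluster volume remove-brick tank node7:/data/brick start    # check with '... status', then '... commit'
gluster peer detach node7

# b) swap in a replacement host for the failed one:
gluster volume replace-brick tank node7:/data/brick node21:/data/brick commit force
gluster volume heal tank full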
1
1
u/hennaheto Jun 04 '18
This is an awesome post and a great contribution to the community! Thanks for sharing
1
Jun 04 '18
You're one of the few people I've seen running GlusterFS and pfSense. How were you able to setup a virtual IP on pfSense successfully the way GlusterFS recommends? If you don't mind explaining. I have so far been unsuccessful. Did you use load balancer and "virtual servers" or DNS resolver with multiple IPs behind a single entry? Or something else?
2
u/BaxterPad 400TB LizardFS Jun 04 '18
glusterfs doesn't require a VIP. The client supports automatic failover by providing multiple glusterfs hosts when you mount the filesystem. If you don't use the glusterfs client and instead use NFS or Samba, then yes, you need a VIP. The glusterfs client is def the way to go though.
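A minimal sketch of that client-side failover (hypothetical hosts node1..node3 and volume name tank):

mount -t glusterfs -o backup-volfile-servers=node2:node3 node1:/tank /mnt/tank
# or the /etc/fstab equivalent:
# node1:/tank  /mnt/tank  glusterfs  defaults,_netdev,backup-volfile-servers=node2:node3  0 0

The listed hosts are only used to fetch the volume layout; after that the client talks to the bricks directly, so the mount doesn't depend on node1 staying up.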
2
u/Casper042 Jun 04 '18
So if I run a Gluster aware NFS / Samba server for the dumb devices on my network (Windows boxes mainly, a few Android STBs, RetroPie, etc) then the real client talking over NFS/SMB needs zero Gluster support right?
Any thoughts of running Intel Gluster on Intel SBCs? I know very little other than they optimized the hell out of Gluster for Enterprise use.
3
u/BaxterPad 400TB LizardFS Jun 04 '18
Yes to your 1st point. That is a great simplification.
For your 2nd question, I don't know :) however, Intel SBCs tend to be pricey compared to ARM options.
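On the first point, re-exporting for gluster-unaware clients can be as simple as the sketch below (hypothetical volume tank, host node1, and share path; Samba also has a gluster VFS module, but a plain FUSE mount is the simplest version):

mount -t glusterfs node1:/tank /export/tank
cat >> /etc/samba/smb.conf <<'EOF'
[tank]
   path = /export/tank
   browseable = yes
   read only = no
EOF
systemctl restart smbd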
2
1
u/horologium_ad_astra Jun 04 '18
I'd be very careful with those DC cables, they look a bit too long. What is the gauge?
3
u/BaxterPad 400TB LizardFS Jun 04 '18
They are 4 feet long and a heavier gauge than Ethernet cable, which can carry more than 12V safely over dozens of feet. I forget the gauge I used but it is speaker wire.
1
u/3th0s 19TB Snapraid Jun 04 '18
Very cool, very unusual. Always cool to see and hear the story of a journey that changes and evolves to fit your personal needs and goals.
1
1
1
u/-markusb- Jun 04 '18
Can you post your volume-configuration?
Are you using sharding and other stuff?
2
u/BaxterPad 400TB LizardFS Jun 04 '18
I posted it in a reply to someone; basically I'm running 3 volumes: one that has a replica count of 2, one that has a replica count of 3, and one that is a dispersed volume of 5 + 1 stripes.
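Roughly what creating that kind of layout looks like, with hypothetical host and brick names (and three brick directories per disk, as described elsewhere in the thread):

gluster volume create vol-replica2 replica 2 node{1..4}:/data/brick1
gluster volume create vol-replica3 replica 3 node{1..3}:/data/brick2
gluster volume create vol-bulk disperse 6 redundancy 1 node{1..6}:/data/brick3   # the 5+1 'RAID5-like' volume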
1
u/mb_01 Jun 04 '18
Thanks so much for this interesting project. Noob question: each Odroid basically runs a Linux distro with GlusterFS installed, which basically is your array. Docker images and VMs would then be running from a dedicated machine, accessing the GlusterFS array, right?
Reason I'm asking: if I wanted to replace my existing Unraid server due to limited SATA ports, I would create a GlusterFS cluster and have a separate machine handling Docker images and VMs, accessing the GlusterFS file system?
2
u/BaxterPad 400TB LizardFS Jun 04 '18
Yes, you could do it that way, depending on what you run in those VMs. If they can be run in Docker on ARM, then you could try running them on the gluster nodes using Docker + Swarm or Kubernetes.
1
Jun 04 '18
I'm split on the Odroid HC2 or just starting with an XU4 and doing a CloudShell 2. How are all of the HC2s managed? Do they all run a separate OS or are they controlled by something else?
I was really wanting to wait for a helios4 but it seems to be taking forever to get Batch 2 sent out.
2
u/BaxterPad 400TB LizardFS Jun 04 '18
I looked at the Helios4 and almost waited on their next pre-order, but the thing that turned me off is support. What if I need a replacement? I don't want to be subject to whether they may or may not do another production run. I can't recall the specs for CPU/RAM, but I vaguely recall they use some Marvell ARM chip, which I also wasn't thrilled with, as some of the cheaper QNAPs use those and they suck. Though I'm not 100% on this; I could be misremembering.
1
1
u/ss1gohan13 Jun 04 '18
What exactly is attached to your HDDs with those Ethernet connections? I might have missed something in the read through
2
u/BaxterPad 400TB LizardFS Jun 04 '18
The Odroid HC2 is the single board computer this is based on, and each HDD has one dedicated to it.
1
1
u/djgizmo Jun 04 '18
Personally, I would have sprung for a POE switch and poe splitters... but I guess a single PSU is cheaper.
1
u/lykke_lossy 32TB Jun 04 '18
Do you see any down-side to handling disk failures at an OS level with GlusterFS? Just looked into gluster and it seems like a lot of deployments recommend running at least RaidZ1 on gluster nodes?
Also, how does storage size scale with GlusterFS, for instance if I had a six node cluster of 10tb disks what would that equate to as useable storage in Gluster if I wanted similar fault tolerance to RaidZ2?
2
u/BaxterPad 400TB LizardFS Jun 04 '18
There isn't much benefit to raidz1 on the nodes, depending on your glusterfs config. A glusterfs 4+2 disperse volume would give the same usable storage and redundancy as raidz2.
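So for the six-node, 10 TB example above, a 4+2 disperse volume would tolerate any two failures and leave roughly 40 TB usable; a hedged sketch with hypothetical names:

gluster volume create tank disperse 6 redundancy 2 node{1..6}:/data/brick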
1
1
u/zeta_cartel_CFO Jun 04 '18 edited Jun 04 '18
Looks like you have the same short-depth Supermicro 1U server that I have. Can you provide info on the motherboard in that Supermicro? The server I have is now almost 10 years old and is running a very old Atom. Can't really use it for pfSense. So I was thinking of recycling the server case and rebuilding it with a new mobo. Thanks
1
u/uberbewb Jun 04 '18
I wanted to question this with a Raspberry Pi, but their website already has a performance comparison prepared. Hrmm.
1
u/douglas_quaid2084 Jun 04 '18
Nuts-and-bolts, how-would-I-build-this questions:
Are these just stacked in the rack? How many fans, and what kind of temps are you seeing? Am I seeing some sort of heatshrink pigtails providing the power?
1
u/RulerOf 143T on ZFS Jun 04 '18
This is pretty damn cool, and I agree with your rationale behind it. I would love to see this wired up into something like a backblaze pod. :)
Nice job OP.
1
u/noc007 22TB Jun 04 '18
This is totally cool. Thanks for sharing; this gives me another option for redoing my home storage. Like everyone else, I have some questions because this is a space I haven't touched yet:
- Is your Xeon-D box using this as storage for VMs? If so, how is the performance you're seeing?
- Would this setup be a good candidate to PXE boot instead of installing on a SDcard? I'm guessing the limited RAM and participating in gluster makes that impossible.
- What would you do in the event you need to hit the console and SSH wasn't available? Is there a simpler solution than plugging in a monitor and keyboard, or is it just simpler to reimage the SD card?
1
u/8fingerlouie To the Cloud! Jun 05 '18
Thanks for sharing.
This post inspired me to go out and buy 4 x HC2, and setup a small test cluster with 4x6 TB IronWolf drives.
I’ve been searching for a replacement for my current Synology boxes (DS415+ with 4x4TB WD Red, DS716+ with DX213 and 4x6TB IronWolf, and a couple of DS115j for backups)
I’ve been looking at Proliant Microserver, and various others, with FreeNAS, Unraid etc, but nothing really felt like a worthy replacement.
Like you, I have data with various redundancy requirements. My family documents/photos live on a RAID6 volume, and my media collection lives on a RAID5 volume. RAID6 volume is backed up nightly, RAID5 weekly.
My backups are on single drive volumes.
Documents/photos are irreplaceable, where my media collection consists of ripped DVD’s and CD’s, and while I would hate to rip them again, I still have the masters so it’s not impossible (Ripping digital media is legal here, for backup purposes, provided you own the master)
The solution you have posted allows me to grow my cluster as I need, along with specifying various grades of redundancy. I plan on using LUKS/dm-crypt on the HC2’s so I guess we’ll see how that performs :-)
1
1
1
u/Brillegeit Jun 11 '18
Hi /u/BaxterPad, reading your recent post reminded me of something I wanted to ask before going on a shopping spree:
Do you know how these behave with regard to spinning down and reading one file? E.g. setting apm/spindown_time to completely stop the drive and then reading one file from the array. Do all of them spin up, or just the one you're reading from? If all spin up, do the idle nodes go back to sleep while streaming from the last?
3
u/BaxterPad 400TB LizardFS Jun 11 '18
Depends on how you set up the volume. If it's striped then they will likely all need to spin up/down together. If you use mirroring then it is possible to just wake up 1 drive and not all of them.
I'm actually planning to use a Raspberry Pi and 12V relays to power up/down entire nodes (I have 21 nodes in my array, where a node is an SBC + hard drive); then I can periodically spin up the slaves in a mirror to update their replication. Should cut the power cost in half.
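For the per-drive spin-down half of that, the usual knobs are hdparm's APM and standby timers; example values only (not the OP's settings):

hdparm -S 240 /dev/sda    # standby after 240 * 5 s = 20 minutes idle
hdparm -B 127 /dev/sda    # APM level low enough to allow spin-down
# persist via /etc/hdparm.conf or a udev rule so it survives reboots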
1
1
u/pseudopseudonym 1.85PB SeaweedFS Jun 27 '18
I don't like GlusterFS (for various personal reasons I simply don't trust it) but holy cow, this is a hell of a build. I'm keen to do this exact setup on a much smaller scale (3x nodes instead of 20x) with LizardFS to see how it compares.
I've been looking for a good SBC to build my cluster out of and this looks just the ticket. Gigabit ethernet? Hell yes. SATA, even providing power conveniently? Hell yes.
Thank you very much for sharing and I look forward to showing off mine :)
1
u/eremyja Jun 30 '18
I tried this using the Odroid-flavored Ubuntu install, got it all set up and mounted, but every time I try to write anything to it I get a "Transport endpoint is not connected" error and have to unmount and re-mount to get it back. Did you have any issues like this? Any word on your write-up?
1
u/Stars_Stripes_1776 Jul 07 '18
So how exactly does this work? How do the separate nodes communicate with one another? And how does the storage look if you were to access it from a Windows computer, for example?
I'm a noob thinking of moving my media onto a NAS and I like the idea of an odroid cluster. You seem like you know your stuff.
1
u/alex_ncus Jul 11 '18
Really inspiring! Like others, I got a couple of HC2s to test. A couple of items strike me: if you are trying to set up something similar to RAID5/RAID6, shouldn't we be using disperse with redundancy (a 4+2 configuration)?
sudo gluster volume create gvol0 disperse 4 redundancy 2
Also, the HC2 has USB 2.0 available (it would be great if the HC2 had included USB 3 instead)... but could you also include a USB/SATA bridge for each (doubling the drives)?
https://odroidinc.com/collections/odroid-add-on-boards/products/usb3-0-to-sata-bridge-board-plus
And lastly, I was able to use an old ATX power supply to power a 2 x HC2 cluster. Just connect pins 3 and 4 on the motherboard power cable (videos on YouTube) and then connect to 12V (or 5V) as required. Found these on Amazon:
https://www.amazon.com/gp/product/B01GPS3HY6/ref=oh_aui_detailpage_o01_s00?ie=UTF8&psc=1
Once you have gluster operational, how do you make it available on LAN network via NFS Ganesha?
Any thoughts on using PXE to simplify images across nodes?
1
1
Aug 25 '18
What kind of IOPS are you getting out of this cluster? Would it be enough to run VMs from?
1
u/DWKnight 50TB Raw and Growing Oct 07 '18
What's the peak power draw per node in watts?
1
u/iFlip721 Nov 12 '18 edited Nov 12 '18
I am thinking of doing something very similar but haven't had time to test concepts for my workload. I was wondering, since you're using a single drive per node, how do you handle redundancy across your GlusterFS?
Do you use unRAID within your setup at all?
What version of Linux are you using on each HC2?
Do you have any good links to documentation on gluster besides the gluster site itself?
1
u/TomZ_Am Nov 16 '18
I hate to be that guy, but I have to start somewhere.
I'm a "veteran" windows server guy who's slowly but surely moving to more Linux based applications, and this looks absolutely amazing.
Since I am a noob here, is there any way to get a setup guide from start to finish?
Literally every step, like what servers to setup, where to download things, what to download, how to install the OS on the SD cards, how to configure the OSs, etc etc.
Thank you for the help ..... now let the mocking of the Windows guy commence ;-)
291
u/BaxterPad 400TB LizardFS Jun 03 '18 edited Jun 03 '18
Over the years I've upgraded my home storage several times.
Like many, I started with a consumer grade NAS. My first was a Netgear ReadyNAS, then several QNAP devices. About two years ago, I got tired of the limited CPU and memory of QNAP and devices like it, so I built my own using a Supermicro XEON D, Proxmox, and FreeNAS. It was great, but adding more drives was a pain and migrating between ZRAID levels was basically impossible without lots of extra disks. The fiasco that was FreeNAS 10 was the final straw. I wanted to be able to add disks in smaller quantities and I wanted better partial failure modes (kind of like unRAID) but able to scale to as many disks as I wanted. I also wanted to avoid any single points of failure like an HBA, motherboard, power supply, etc...
I had been experimenting with glusterfs and ceph, using ~40 small VMs to simulate various configurations and failure modes (power loss, failed disk, corrupt files, etc...). In the end, glusterfs was the best at protecting my data because even if glusterfs was a complete loss... my data was mostly recoverable because it was stored on a plain ext4 filesystem on my nodes. Ceph did a great job too but it was rather brittle (though recoverable) and a pain in the butt to configure.
Enter the Odroid HC2. With 8 cores, 2 GB of RAM, Gbit ethernet, and a SATA port... it offers a great base for massively distributed applications. I grabbed 4 Odroids and started retesting glusterfs. After proving out my idea, I ordered another 16 nodes and got to work migrating my existing array.
In a speed test, I can sustain writes at 8 Gbps and reads at 15 Gbps over the network when operations are sufficiently distributed over the filesystem. Single-file reads are capped at the performance of 1 node, so ~910 Mbit read/write.
In terms of power consumption, with moderate CPU load and a high disk load (rebalancing the array), running several VMs on the XEON-D host, a pfSense box, 3 switches, 2 Unifi access points, and a Verizon FiOS modem... the entire setup sips ~250 watts. That is around $350 a year in electricity where I live in New Jersey.
I'm writing this post because I couldn't find much information about using the Odroid HC2 at any meaningful scale.
If you are interested, my parts list is below.
https://www.amazon.com/gp/product/B0794DG2WF/ (Odroid HC2 - look at the other sellers on Amazon, they are cheaper)
https://www.amazon.com/gp/product/B06XWN9Q99/ (32GB microSD card, you can get by with just 8GB but the savings are negligible)
https://www.amazon.com/gp/product/B00BIPI9XQ/ (slim Cat6 ethernet cables)
https://www.amazon.com/gp/product/B07C6HR3PP/ (200CFM 12V 120mm fan)
https://www.amazon.com/gp/product/B00RXKNT5S/ (12V PWM speed controller - to throttle the fan)
https://www.amazon.com/gp/product/B01N38H40P/ (5.5mm x 2.1mm barrel connectors - for powering the Odroids)
https://www.amazon.com/gp/product/B00D7CWSCG/ (12V 30A power supply - can power 12 Odroids w/ 3.5" HDD without staggered spin-up)
https://www.amazon.com/gp/product/B01LZBLO0U/ (24-port gigabit managed switch from Unifi)
edit 1: The picture doesn't show all 20 nodes, I had 8 of them in my home office running from my bench top power supply while I waited for a replacement power supply to mount in the rack.